

Re: What is the best way to control espeak?

From: Jeremy Whiting
Subject: Re: What is the best way to control espeak?
Date: Fri, 10 Jun 2022 15:08:13 -0600

Hello Rastislav,

From looking at the output of espeak --help, I think I can answer some of this.

1) To write the output to a WAV file instead of speaking it on the audio device, use the -w filename parameter.
2) To have espeak interpret SSML, use the -m argument. Note that this also makes it ignore < and > characters (they are treated as markup), so any that appear on their own in your text would need to be escaped.
3) For sound effects, I'm not sure whether espeak supports sound icons, but if it does, they could be used for that.
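Points 1 and 2 could be combined roughly as in the minimal Python sketch below. It assumes the binary is called espeak-ng and uses only the -m and -w options mentioned above. The helper names are hypothetical, and the assumption that espeak-ng accepts &lt;, &gt; and &amp; for literal characters in SSML mode should be checked against your version.

```python
import subprocess
from xml.sax.saxutils import escape


def escape_text(text: str) -> str:
    """Escape &, < and > so that -m does not treat them as markup."""
    return escape(text)


def build_command(ssml: str, wav_path: str, program: str = "espeak-ng") -> list[str]:
    """Build the argument list: -m interprets SSML, -w writes a WAV file."""
    # Assumption: the binary name is "espeak-ng"; pass program="espeak" otherwise.
    return [program, "-m", "-w", wav_path, ssml]


def speak_to_wav(ssml: str, wav_path: str) -> None:
    """Run espeak-ng; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(ssml, wav_path), check=True)
```

For example, speak_to_wav("<speak>" + escape_text("Is 3 < 5?") + "</speak>", "out.wav") should produce out.wav with the literal "<" spoken rather than parsed as a tag. Passing the text as a separate argument list element avoids any shell quoting problems.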

As far as I understand it, SSML can also be used to manipulate voice parameters throughout the text, with tags that specify how the voice should change, or even to switch to a different voice entirely.
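A rough sketch of that kind of markup is below, built in Python so the text stays escaped. It uses the standard SSML prosody and voice elements. The pitch value "+20%" and the voice name "en-us" are illustrative assumptions, not values confirmed for espeak-ng, so check which attributes and values your version actually honours. The resulting string could then be passed to espeak-ng with -m.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(inner: str, pitch: str) -> str:
    """Wrap already-built SSML in a <prosody> element that changes pitch."""
    return f"<prosody pitch={quoteattr(pitch)}>{inner}</prosody>"


def voice(inner: str, name: str) -> str:
    """Wrap already-built SSML in a <voice> element selecting another voice."""
    # Assumption: "name" holds a voice identifier that espeak-ng recognises.
    return f"<voice name={quoteattr(name)}>{inner}</voice>"


def speak(*parts: str) -> str:
    """Join the parts into one complete SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


doc = speak(
    escape("Normal text, "),
    prosody(escape("higher pitch here, "), "+20%"),
    voice(escape("and a different voice here."), "en-us"),
)
```

Escaping only the plain text, and never the markup produced by the helpers, keeps user text from being mistaken for tags while allowing the elements to nest.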

Jeremy Whiting

On Fri, Jun 10, 2022 at 2:23 PM Rastislav Kish <rastislav.kish@protonmail.com> wrote:
Hello list,

this question would likely be better suited for the espeak mailing list,
though I'm not subscribed there and it's partially also related to
Speech dispatcher, so I'm posting it here.

When synthesizing text, I would like espeak-ng to do these things:

* Change pitch in specified areas of the text

* Change the voice variant in specified areas of the text, preferably
without changing the on-going intonation, though I suppose it's not an
absolute necessity

* Mix in some kind of sound effect. Note that I don't mean playing
specific sounds, but rather generating various noises, hums, tones
etc. at the exact time the marked text is being spoken in the audio.

It would be best if this were possible simply through markup, for
example the SSML that speechd modules use. Especially for the last
point, though, that would likely be difficult to do this way, so I could
also do it programmatically, as long as espeak could somehow tell me the
exact time marks at which an area of text is spoken, so I could insert
the effect there.

And of course, as long as it could give me the audio. :)

Thank you in advance!

Best regards

