speechd-discuss

RFC: TTSAPI: First subjects


From: Michael Pozhidaev
Subject: RFC: TTSAPI: First subjects
Date: Sun, 23 Jan 2011 03:25:08 +0600

Hello!

Let's look at some subjects to discuss in the TTS API document. These
questions are relevant only as a basis for the interface of a TTS
processing daemon. I mean a daemon that provides a TTS service: it
receives text and sends back the produced audio data.

Let's look at the structure for voice discovery:
typedef struct {
    /* Voice discovery */ 

    /* Prosody parameters */
    bool_t can_set_rate_relative;
    bool_t can_set_rate_absolute;
    bool_t can_get_rate_default;

Are there any ideas why a TTS daemon could be unable to provide such
functions? The requested TTS may be unable to change the rate at all,
but if it can, both relative and absolute selection can always be
implemented in the daemon. So I suggest replacing these parameters with
just one:

    bool_t can_set_rate;
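
To illustrate why one flag may be enough, here is a minimal sketch of how
a daemon could derive an absolute rate from a relative request. The
function name, the percent convention and the RATE_MIN/RATE_MAX limits
are assumptions made for this example, not part of the TTS API draft.

```c
#include <assert.h>

/* Hypothetical rate limits in words per minute; a real synthesizer
 * would report its own range. */
#define RATE_MIN 50
#define RATE_MAX 500

/* Clamp a value into [lo, hi]. */
static long clamp_long(long value, long lo, long hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Convert a relative change in percent (e.g. +20 or -50) into an
 * absolute rate, starting from the synthesizer's default rate.
 * The result is clamped to the assumed supported range. */
int rate_from_relative(int default_rate, int percent)
{
    long scaled;

    assert(default_rate > 0);
    scaled = (long)default_rate * (100 + percent) / 100;
    return (int)clamp_long(scaled, RATE_MIN, RATE_MAX);
}
```

With such a helper the daemon only needs to know whether the synthesizer
can change its rate at all; the relative form is computed on top of the
absolute one.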

And the same for pitch and volume. Next, I suggest replacing all
punctuation fields with one: whether the requested TTS can mark
punctuation (exclamation, question, etc.) by voice intonation. All other
punctuation processing can easily be done on the client side; it is not
a TTS concern. For example, punctuation is often spoken in a language
other than that of the text: Russian users often prefer to hear
punctuation in Russian even in English text. Such a feature in the
daemon is useless. So punctuation processing is a client-side concern.
I suggest adding:

    bool_t can_speak_punctuation_by_intonation;
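
As a rough sketch of what client-side punctuation processing could look
like, the code below replaces punctuation marks with words before the
text is sent to the daemon. The function names and the word table are
hypothetical; a Russian client would simply put Russian words into the
table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: the word a client wants to hear for each mark. */
static const char *punct_name(char c)
{
    switch (c) {
    case '!': return "exclamation";
    case '?': return "question";
    case ',': return "comma";
    case '.': return "dot";
    default:  return NULL;
    }
}

/* Copy `in` to `out`, replacing each known punctuation mark with
 * " name ". Returns 0 on success, or -1 if `out` (of `out_size`
 * bytes) is too small to hold the result and its terminator. */
int expand_punctuation(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    for (; *in != '\0'; ++in) {
        const char *name = punct_name(*in);
        size_t len = name ? strlen(name) : 0;
        size_t need = name ? len + 2 : 1;

        if (used + need + 1 > out_size)
            return -1;
        if (name) {
            out[used++] = ' ';
            memcpy(out + used, name, len);
            used += len;
            out[used++] = ' ';
        } else {
            out[used++] = *in;
        }
    }
    out[used] = '\0';
    return 0;
}
```

The daemon then receives plain text and never needs to know which
punctuation mode the user selected.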

    bool_t can_set_capital_letters_mode_spelling;

OK, really needed.

    bool_t can_set_capital_letters_mode_icon;

OK, really needed too.

    bool_t can_set_capital_letters_mode_pitch;

OK.

    bool_t can_set_number_grouping;

Number processing is the same as punctuation. All required number
processing can easily be done on the client side too, so the TTS daemon
need not take care of numbers; it processes them just as the TTS wants.
For example, numbers can also be spoken in a language different from
the language of the text.

    bool_t can_say_text_from_position;

I think this is not needed.

    bool_t can_say_char;

OK.

    bool_t can_retrieve_audio;
    bool_t can_play_audio;

Since the TTS daemon always works in data retrieval mode, these two
fields are, I suppose, unnecessary.

    bool_t can_report_events_by_sentences;
    bool_t can_report_events_by_words;
    bool_t can_report_custom_index_marks;

OK, these are very much needed, but in terms of the daemon we are not
talking about event reporting. Audio data can be divided into several
chunks at the requested positions. So I suggest replacing the word
"report" with the word "mark".
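
A minimal sketch of the chunking idea, assuming the synthesizer can give
the sample offset of each mark (sentence end, word end or custom index
mark). The audio_chunk_t type and split_at_marks() are hypothetical names
used only for illustration.

```c
#include <stddef.h>

/* One piece of audio between two marks (hypothetical layout). */
typedef struct {
    size_t start;   /* index of the first sample in the chunk */
    size_t length;  /* number of samples in the chunk */
} audio_chunk_t;

/* Split `total` samples at the ascending sample offsets in `marks`.
 * Writes at most `max_chunks` chunks and returns how many were
 * written. Offsets that are out of range or not strictly increasing
 * are skipped. */
size_t split_at_marks(size_t total, const size_t *marks, size_t n_marks,
                      audio_chunk_t *chunks, size_t max_chunks)
{
    size_t count = 0;
    size_t start = 0;
    size_t i;

    for (i = 0; i < n_marks && count < max_chunks; ++i) {
        if (marks[i] <= start || marks[i] >= total)
            continue;
        chunks[count].start = start;
        chunks[count].length = marks[i] - start;
        ++count;
        start = marks[i];
    }
    /* The audio after the last mark forms the final chunk. */
    if (start < total && count < max_chunks) {
        chunks[count].start = start;
        chunks[count].length = total - start;
        ++count;
    }
    return count;
}
```

The client learns that a mark was reached simply by receiving the chunk
that ends at it, so no separate event channel is required.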

    bool_t can_defer_message;

In terms of the daemon, this is not needed.

    bool_t can_parse_ssml;

OK, but it seems to me that SSML is going to be replaced by SABLE.
Maybe can_parse_sable is better?

    bool_t supports_multilingual_utterances;

OK, but I would like to restrict the daemon to processing only one
language per connection. With multiple languages in one connection we
must provide information about the available language set, and that is
very confusing. Let the client side open a new connection for another
language itself.
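
Putting the suggestions above together, the reduced capability structure
might look roughly like the sketch below. The type name
tts_daemon_caps_t, the bool_t typedef, and the can_set_pitch,
can_set_volume and can_mark_* field names are assumptions derived from
the proposals in this message, not text from the TTS API document.

```c
#include <stdbool.h>

/* Stand-in for the bool_t type used in the TTS API draft. */
typedef bool bool_t;

/* Hypothetical reduced capability set for a TTS daemon. */
typedef struct {
    /* Prosody parameters: one flag each instead of three */
    bool_t can_set_rate;
    bool_t can_set_pitch;
    bool_t can_set_volume;

    /* Punctuation: only intonation is a TTS concern */
    bool_t can_speak_punctuation_by_intonation;

    /* Capital letters */
    bool_t can_set_capital_letters_mode_spelling;
    bool_t can_set_capital_letters_mode_icon;
    bool_t can_set_capital_letters_mode_pitch;

    bool_t can_say_char;

    /* "report" renamed to "mark": audio is split at these positions */
    bool_t can_mark_events_by_sentences;
    bool_t can_mark_events_by_words;
    bool_t can_mark_custom_index_marks;

    bool_t can_parse_ssml;
    bool_t supports_multilingual_utterances;
} tts_daemon_caps_t;
```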

Any comments are welcome. If these items are OK, we can move on to the
next ones.

Thanks!

-- 
Michael Pozhidaev. Tomsk, Russia.
Russian info page: http://www.marigostra.ru/


