speechd-discuss

Chinese with IBM TTS driver


From: Gary Cramblitt
Subject: Chinese with IBM TTS driver
Date: Thu, 15 Mar 2007 17:10:36 -0500

On Thursday 15 March 2007 06:10, Tomas Cerha wrote:
> Gary Cramblitt wrote:
> > I guess the first step would be to examine the logs to
> > see if speech-dispatcher is receiving the text OK, then see if the ibmtts
> > module is receiving the text OK.  If both are true, then the bug would be
> > in the way the ibmtts module sends it to the ibm eloquence engine.
>
> I'm attaching the log file with debug level 5 which I got from the
> Chinese user.
>
> Best regards, Tomas.

Thanks Tomas.

It appears that the text is being mangled by Speech Dispatcher before it gets 
to the ibmtts module.  Here is a hex dump of a portion of the log:

000010f0  53 47 20 62 65 66 6f 72  65 20 69 6e 64 65 78 20  |SG before index |
00001100  6d 61 72 6b 69 6e 67 3a  20 7c e9 ab 3f e7 3f 3f  |marking: |..?.??|
00001110  e6 3f ba 7c 2c 20 73 73  6d 6c 5f 6d 6f 64 65 3d  |.?.|, ssml_mode=|
00001120  30 0a 5b 54 68 75 20 4d  61 72 20 31 35 20 31 37  |0.[Thu Mar 15 17|
00001130  3a 30 32 3a 33 30 20 32  30 30 37 20 3a 20 37 33  |:02:30 2007 : 73|
00001140  39 38 39 36 5d 20 73 70  65 65 63 68 64 3a 20 20  |9896] speechd:  |
00001150  20 20 20 4d 53 47 20 61  66 74 65 72 20 69 6e 64  |   MSG after ind|
00001160  65 78 20 6d 61 72 6b 69  6e 67 3a 20 7c 3c 73 70  |ex marking: |<sp|
00001170  65 61 6b 3e ff bf bf bf  bf bf 3f ff bf bf bf bf  |eak>......?.....|
00001180  bf 3f 3f ff bf bf bf bf  bf 3f 3c 2f 73 70 65 61  |.??......?</spea|
00001190  6b 3e 7c 0a 5b 54 68 75  20 4d 61 72 20 31 35 20  |k>|.[Thu Mar 15 |

Notice how the buffer's bytes change between the "before index marking" and 
the "after index marking" (ignoring the speak tags).
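
As a quick check of the dump above, here is a minimal Python sketch (not part
of speechd; the helper name is_valid_utf8 is made up for illustration).  It
copies the payload bytes of the two log lines, transcribed by hand from the
hex dump without the | delimiters and the speak tags, and tests whether each
is well-formed UTF-8.

```python
# Payload bytes transcribed by hand from the hex dump
# (the | delimiters and the <speak> tags are left out).
before = bytes.fromhex("e9 ab 3f e7 3f 3f e6 3f ba")
after = bytes.fromhex(
    "ff bf bf bf bf bf 3f "
    "ff bf bf bf bf bf 3f 3f "
    "ff bf bf bf bf bf 3f"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, buf in (("before", before), ("after", after)):
    print(label, len(buf), "bytes,",
          buf.count(0x3F), "'?' bytes,",
          "valid UTF-8" if is_valid_utf8(buf) else "NOT valid UTF-8")
```

Neither buffer decodes, so the text in the "before index marking" line is
already not well-formed UTF-8 by the time it shows up in the log.  Whether
that happened inside speechd or only when the log line was written can't be
told from this dump alone.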

It is probable that even if speechd did not mangle the buffer, the ibmtts 
module would do some mangling of its own.  :/  To check, we would need the 
log from the ibmtts module in addition to speechd.log.

I'm not sure what explains the mangling.  We do handle non-ASCII languages 
in UTF-8, such as Czech (traditionally encoded as ISO 8859-2).  ??
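
For comparison, a small Python sketch of how the two cases differ at the byte
level.  The sample characters are my own picks for illustration, not taken
from the log: a Czech letter is one byte in ISO 8859-2 and two bytes in UTF-8,
while a typical Chinese character is three bytes in UTF-8 and has no
ISO 8859-2 encoding at all.

```python
# Sample characters chosen for illustration only (assumption, not from the log).
czech = "č"      # U+010D, LATIN SMALL LETTER C WITH CARON
chinese = "高"   # U+9AD8, a common CJK ideograph

czech_utf8 = czech.encode("utf-8")
czech_latin2 = czech.encode("iso8859_2")
chinese_utf8 = chinese.encode("utf-8")

print("Czech   UTF-8     :", czech_utf8.hex(" "))
print("Czech   ISO 8859-2:", czech_latin2.hex(" "))
print("Chinese UTF-8     :", chinese_utf8.hex(" "))

# Chinese cannot be represented in ISO 8859-2 at all.
try:
    chinese.encode("iso8859_2")
    chinese_fits_latin2 = True
except UnicodeEncodeError:
    chinese_fits_latin2 = False
print("Chinese fits ISO 8859-2:", chinese_fits_latin2)
```

So code that happens to work for two-byte sequences, or that converts through
a single-byte charset somewhere along the way, could pass Czech and still
break Chinese.  That is only one possible line of investigation, not a
diagnosis.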

-- 
Gary Cramblitt (aka PhantomsDad)

