

From: Omari Stephens
Subject: [gnuspeech-contact] Understanding diphones.mxml and improving vocalization quality
Date: Fri, 26 Jan 2007 07:09:36 +0000
User-agent: Icedove 1.5.0.9 (X11/20061220)

Hi, all

I'm part of a 5-person team at MIT that is participating in the class 6.189: 
Multicore Programming Primer [1], a project-based class in which we implement a 
computationally intensive application on a parallel processor, the PlayStation 
3's Cell architecture [2].  In short, we are using gnuspeech as the reference 
for our speech synthesis implementation on the PS3.

I'm currently working on a stripped-down, non-interactive analog of Monet to 
generate postures for the tube.  It seems that everything I would need for this 
is catalogued in diphones.mxml, but we're having trouble figuring out how to 
calculate the transitions (that is, we're unsure how to use the rules, 
transitions, and equations sections).  Any specific help on this front or 
pointers to useful spots in the source would be tremendously helpful.

Additionally, other group members are working on finding, implementing, and 
hooking up a more realistic vocal fold model.  From my own poking around on the 
Internet, it seems that most of the models are two-mass models, but I haven't 
read through anything in enough detail to know the differences between them.  
Is there a model someone would recommend that would likely improve the 
vocalization quality but could also be coded in a reasonable amount of time 
(hopefully a day or less)?  We will probably implement this in C or C++, and may 
put more hands on this part of the project if the benefits merit that sort of 
attention.  Our final product is due this coming Friday, 2 Feb.

Lastly, what other changes could we make to improve the vocalization quality?  
I had thought of perhaps emulating smoother transitions between the different 
vocal tract regions, but I don't know whether that is feasible time-wise, or 
whether it would make an appreciable improvement in the output sound quality.

[1] http://cag.csail.mit.edu/ps3/
[2] http://en.wikipedia.org/wiki/Cell_microprocessor

Thanks very much for your time and any help you all may be able to offer.
--xsdg, for the 6.189 Speech Synthesis team

