Re: [gnuspeech-contact] TRM as backend for festival

Monet is an interactive tool for developing data sets for arbitrary languages. Real-time Monet (which has not yet been ported) is the heart of a daemon that uses these data sets to convert text to speech. It is a stripped-down version of Monet, and it would be really nice if someone would take on the task of porting it (please ;-). Without the data sets, and the algorithms for manipulating the parameter tracks, you don't have a speech synthesiser, you have a rather specialised trumpet!

The data sets for synthesis in "diphones.monet" were developed from several years of research in which British English speech was analysed for sound data, rhythmic (duration) data, and intonation data. This research is reported in other papers on the site.

If you would like to hear some samples of gnuspeech, go to my university web site:

http://www.cpsc.ucalgary.ca/~hill

Click on "Published papers" in the left menu, then select the first paper in section "B. National and international invited contributions ..." (it is the one referred to above).

At the bottom of the left-side menu on the resulting page you will find a whole bunch of examples of gnuspeech synthesis, some short, some long.

The tube resonance model parameters are specified in the source code for the TRM. I attach a sample set of parameters: 24 fixed (utterance-rate) parameters, followed by 6 blocks of 16 values that drive the so-called "speech-rate" parameters. The utterance represented is "oi", lasting about 1.5 seconds (the input control rate is 4 Hz and there are 6 blocks).
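The 1.5 second figure follows from the number of blocks divided by the input control rate. A minimal Python sketch of that arithmetic is given here; the function name is hypothetical and is not part of the TRM source, and it assumes each block simply covers one control period.

```python
def utterance_duration(num_blocks: int, control_rate_hz: float) -> float:
    """Seconds covered by num_blocks parameter blocks at control_rate_hz blocks per second."""
    if control_rate_hz <= 0:
        raise ValueError("control rate must be positive")
    # Assumption: every block accounts for exactly one control period.
    return num_blocks / control_rate_hz


if __name__ == "__main__":
    # Values from the sample: 6 blocks at an input control rate of 4 Hz.
    print(utterance_duration(6, 4.0))  # prints 1.5
```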

The 16 speech-rate parameters are, in order:

GlottalPitch, GlottalVolume, AspirationVolume, FricativeVolume, FricativePosition, FricativeCentreFrequency, FricativeBandWidth, radius1, radius2, radius3, radius4, radius5, radius6, radius7, radius8, velumRadius
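To make the ordering concrete, here is a small Python sketch that pairs the 16 names with the values in one block. The names are the ones listed in this message; the helper label_block and the list SPEECH_RATE_NAMES are hypothetical, made up for illustration, and do not come from the TRM source.

```python
from typing import Dict

# The 16 speech-rate parameters, in the order given in this message.
SPEECH_RATE_NAMES = [
    "GlottalPitch", "GlottalVolume", "AspirationVolume", "FricativeVolume",
    "FricativePosition", "FricativeCentreFrequency", "FricativeBandWidth",
    "radius1", "radius2", "radius3", "radius4",
    "radius5", "radius6", "radius7", "radius8",
    "velumRadius",
]


def label_block(line: str) -> Dict[str, float]:
    """Split one whitespace-separated block and attach a name to each value."""
    values = [float(field) for field in line.split()]
    if len(values) != len(SPEECH_RATE_NAMES):
        raise ValueError(f"expected {len(SPEECH_RATE_NAMES)} values, got {len(values)}")
    return dict(zip(SPEECH_RATE_NAMES, values))


if __name__ == "__main__":
    # Third block of the attached "oi" sample.
    block = label_block("9.0 60.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.6 0.6 1.58 1.58 1.13 1.01 0.1")
    for name, value in block.items():
        print(f"{name:26s} {value}")
```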

I hope this helps.

--------

David Hill

Simplicity, patience, compassion. These three are your greatest treasures (Tao Te Ching #67)

---------

On Feb 8, 2007, at 9:20 AM, Nickolay V. Shmyrev wrote:

Heh, since it seems it would be hard to build Monet on Linux I've tried to adapt trm to work with festival. Actually for me it seems it would be interesting work, since festival predicts intonation and duration much more precisely and is able to produce very good annotations.

----------

Parameter set for TRM "oi" (1.5 seconds, falling pitch during diphthong)

------------------------------------------------------------------------------------------------

4 ; input control rate (1 - 1000 Hz)

60.0 ; master volume (0 - 60 dB)

1 ; number of sound output channels (1 or 2)

0.0 ; stereo balance (-1 to +1)

0 ; glottal source waveform type (0 = pulse, 1 = sine)

40.0 ; glottal pulse rise time (5 - 50 % of GP period)

22.0 ; glottal pulse fall time minimum (5 - 50 % of GP period)

45.0 ; glottal pulse fall time maximum (5 - 50 % of GP period)

2.50 ; glottal source breathiness (0 - 10 % of GS amplitude)

10.0 ; nominal tube length (10 - 20 cm)

32 ; tube temperature (25 - 40 degrees Celsius)

1.00 ; junction loss factor (0 - 5 % of unity gain)

3.05 ; aperture scaling radius (3.05 - 12 cm)

0.75 ; mouth aperture coefficient (0 - 0.99)

0.72 ; nose aperture coefficient (0 - 0.99)

1.35 ; radius of nose section 1 (0 - 3 cm)

1.96 ; radius of nose section 2 (0 - 3 cm)

1.91 ; radius of nose section 3 (0 - 3 cm)

1.3 ; radius of nose section 4 (0 - 3 cm)

0.73 ; radius of nose section 5 (0 - 3 cm)

1500.0 ; throat lowpass frequency cutoff (50 - Nyquist Hz)

6.0 ; throat volume (0 - 48 dB)

1 ; pulse modulation of noise (0 = off, 1 = on)

48.0 ; noise crossmix offset (30 - 60 dB)

10.0 0.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.4 0.4 1.78 1.78 1.26 0.8 0.0

9.5 54.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.4 0.4 1.78 1.78 1.26 0.8 0.0

9.0 60.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.6 0.6 1.58 1.58 1.13 1.01 0.1

8.5 60.0 0.0 0.0 4.0 4450 550 0.8 0.8 1.28 1.28 1.0 1.0 1.0 0.8 1.0

8.0 54.0 0.0 0.0 4.0 4500 500 0.8 0.8 1.68 1.58 0.8 0.8 0.5 0.4 1.0

7.0 51.0 0.0 0.0 4.0 4500 500 0.8 0.8 1.78 1.78 0.2 0.2 0.4 0.0 1.0
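For anyone wanting to feed a listing like the one above to another front end, a rough Python sketch of a reader follows. It assumes the layout is exactly as shown in this message: fixed parameters as "value ; description" lines, then blocks of 16 whitespace-separated numbers, with title and separator lines ignored. The function parse_trm_parameters and the SAMPLE text are hypothetical and are not taken from the TRM source.

```python
from typing import List, Tuple

# Shortened, illustrative excerpt in the same layout as the listing above.
SAMPLE = """\
Parameter set for TRM "oi"
--------------------------
4 ; input control rate (1 - 1000 Hz)
60.0 ; master volume (0 - 60 dB)
10.0 0.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.4 0.4 1.78 1.78 1.26 0.8 0.0
9.5 54.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.4 0.4 1.78 1.78 1.26 0.8 0.0
"""


def parse_trm_parameters(text: str) -> Tuple[List[Tuple[float, str]], List[List[float]]]:
    """Return (fixed parameters as (value, description), blocks of 16 floats)."""
    fixed: List[Tuple[float, str]] = []
    blocks: List[List[float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ";" in line:
            # Fixed parameter: the number before ';', the description after it.
            value, _, description = line.partition(";")
            try:
                fixed.append((float(value), description.strip()))
            except ValueError:
                pass  # not a numeric parameter line, skip it
            continue
        try:
            row = [float(field) for field in line.split()]
        except ValueError:
            continue  # title or separator line
        if len(row) == 16:
            blocks.append(row)
    return fixed, blocks


if __name__ == "__main__":
    fixed, blocks = parse_trm_parameters(SAMPLE)
    print(len(fixed), "fixed parameters,", len(blocks), "blocks")
```

Run on the full listing, this should give 24 fixed parameters and 6 blocks, provided the layout assumption holds.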

----------

From:	David Hill
Subject:	Re: [gnuspeech-contact] TRM as backend for festival
Date:	Sat, 10 Feb 2007 15:53:07 -0800