
Re: [gnuspeech-contact] TRM as backend for festival


From: David Hill
Subject: Re: [gnuspeech-contact] TRM as backend for festival
Date: Sun, 11 Feb 2007 16:31:47 -0800

It would probably help your understanding if you were to read the Monet manual.  You wrote (see below):

But I have no idea how Monet
reproduces consonants. There are examples, but no trm files for them.

The .trm files are associated strictly with the tube model ("trm" = "tube resonance model") and are saved and used by the "Synthesiser" application, a GUI application for playing with the tube -- but only with steady-state configurations.  (You should probably read that manual as well.)  Consonants are mostly created by the dynamics of vocal-tract changes, though there are also some continuant sounds involving frication (e.g. /s/), and even for these the transitional cues are important.  Thus it is impossible to create consonants from .trm files alone.  They were really only useful for exploring the vocal tract configurations that define the "postures" (loosely related to "phones") used as anchor points for the varying speech parameters.

The dynamic information needed for complete speech is created from these quasi-steady-state values representing vocal tract postures, plus context-sensitive rules for moving from posture to posture, according to timing information that reflects the rhythmic character of British English.  This information is all held within "diphones.monet" (the rules are actually more complex than diphones in many cases and include triphones and even tetraphones).  Monet has the algorithms to use this information appropriately.  Intonation is applied by varying the pitch (F0) parameter of the stream of tube parameters generated on this basis, according to a model of British English intonation based on work by M.A.K. Halliday and elaborated by our own studies.  These variations are added to small pitch changes created at the posture (segmental) level by constrictions in the vocal tract -- so-called "micro-intonation" -- which provide additional cues for the identification of consonants.  Many of the relevant papers are available on my university web site.
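To make the posture-to-parameter idea concrete, here is a minimal sketch of turning two posture value sets into per-frame tube parameters, assuming a simple linear transition.  It is only an illustration: the parameter names and numbers are invented, and the real rules use context-sensitive transition profiles and rhythm-derived timing rather than a straight line.

/* Illustrative only: linear interpolation between two vocal-tract "postures"
   to produce per-frame tube parameters.  The real Monet rules use
   context-sensitive (diphone/triphone/tetraphone) transition profiles and
   rhythm-derived timing, and add micro-intonation at the segmental level;
   the parameter values here are invented. */
#include <stdio.h>

#define N_PARAMS 4   /* real tube model takes more (radii, glottal, frication, velum) */

typedef struct {
    double p[N_PARAMS];   /* quasi-steady-state values for one posture */
} Posture;

/* Emit one parameter frame every frame_ms while moving from posture a to b. */
static void transition(const Posture *a, const Posture *b,
                       double duration_ms, double frame_ms)
{
    for (double t = 0.0; t < duration_ms; t += frame_ms) {
        double w = t / duration_ms;                 /* 0 -> 1 across the move */
        for (int i = 0; i < N_PARAMS; i++)
            printf("%g ", (1.0 - w) * a->p[i] + w * b->p[i]);
        printf("\n");                               /* one frame for the tube */
    }
}

int main(void)
{
    Posture vowel = { { 120.0, 1.2, 1.6, 0.8 } };   /* made-up "vowel" posture */
    Posture nasal = { { 118.0, 0.9, 0.1, 1.5 } };   /* made-up "nasal" posture */
    transition(&vowel, &nasal, 80.0, 4.0);          /* 80 ms move, 4 ms frames */
    return 0;
}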

The "oi" sound is just a succession of vowel sounds with a varying pitch, so a series of what appear to be .trm values will work.  To produce speech, you need to be able to construct a more complex set of varying parameters reflecting the reality of speech.  This is what Monet does.  This is the part of Monet that needs to be extracted if all you wish to do is convert sound specifications to a speech waveform specification.  The current Monet does much more since it allows you to create the databases as well as listen to the speech that can then be produced.  The extracted part (non-interactive) that would simply use the databases to convert streams of posture symbols to an output waveform is what we call "Real-time Monet".  It has not been ported from the original NeXT implementation yet.

david

On Feb 11, 2007, at 1:06 PM, Nickolay V. Shmyrev wrote:

On Sat, 10/02/2007 at 15:53 -0800, David Hill wrote:
I have tried accessing the samples you provided.  Only one of them
loaded and played.  It did not sound anything like speech.  The TRM is
simply the waveguide model of an acoustic tube, with control regions
applied according to the Distinctive Region Model developed by Carré,
based on earlier work by Fant.  The underlying theory is outlined in
the paper "Real-time articulatory speech-synthesis-by-rules" on my
university web site and referenced from the gnuspeech project site
(see below for the university web site URL).  Manuals for
"Synthesiser" and "Monet" also appear on that web site, towards the
end of section E of the published papers page.  In the Monet manual
there is a table showing the equivalences between IPA symbols and the
Monet symbols.  This should allow you to translate into the Festival
set.
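
(As an aside, the "waveguide model of an acoustic tube" is, in its textbook form, a chain of Kelly-Lochbaum scattering junctions.  The fragment below is only a generic illustration of that idea, with made-up section areas and end reflections; it is not the gnuspeech TRM code, which also models the nasal branch, losses, radiation, noise sources and the DRM control regions.)

/* Generic Kelly-Lochbaum sketch: the tube as a chain of waveguide sections. */
#include <stdio.h>

#define N 8   /* number of cylindrical sections */

int main(void)
{
    double area[N] = { 1.0, 1.2, 1.5, 2.0, 2.5, 2.0, 1.5, 1.0 };  /* cm^2, made up */
    double fwd[N] = { 0 }, bwd[N] = { 0 };   /* pressure waves per section */
    double k[N - 1];                         /* reflection coeff. per junction */

    for (int i = 0; i < N - 1; i++)
        k[i] = (area[i] - area[i + 1]) / (area[i] + area[i + 1]);

    for (int t = 0; t < 200; t++) {
        double in = (t == 0) ? 1.0 : 0.0;    /* unit impulse at the glottis end */
        double new_fwd[N], new_bwd[N];

        new_fwd[0] = in + 0.9 * bwd[0];      /* nearly closed glottis: reflect ~ +1 */
        for (int i = 0; i < N - 1; i++) {    /* scatter at each junction */
            new_fwd[i + 1] = (1.0 + k[i]) * fwd[i] - k[i] * bwd[i + 1];
            new_bwd[i]     = k[i] * fwd[i] + (1.0 - k[i]) * bwd[i + 1];
        }
        new_bwd[N - 1] = -0.9 * fwd[N - 1];  /* open lips: reflect ~ -1 */

        for (int i = 0; i < N; i++) { fwd[i] = new_fwd[i]; bwd[i] = new_bwd[i]; }
        printf("%g\n", fwd[N - 1]);          /* wave at the lip end (impulse response) */
    }
    return 0;
}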

Ok, thanks, I'll do that.

Monet is an interactive tool for developing data sets for arbitrary
languages.  Real-time Monet (which has not yet been ported) is the
heart of a daemon that uses these data sets to convert text to speech.
It is a stripped down version of Monet and it would be really nice if
someone would take on that task (please ;-).  Without the data sets,
and the algorithms for manipulating the parameter tracks, you don't
have a speech synthesiser, you have a rather specialised trumpet!

Well, I can do that.  I just need more explanation.  Is it the part Steve
split out into the Framework dir?  Currently Monet compiles fine, only the
gorm files are missing.  I don't think sound output is required, btw; it's
enough to be able to save an audio file.

The data sets developed for synthesis in "diphones.monet" were
developed based on several years of research in which British English
speech was analysed for sound data, rhythmic (duration) data, and
intonation data.  This research is reported in other papers on the
site.

Btw, have you heard about the MOSHA database?
It seems that Alan has already used it in unit-selection synthesis.  It's
not free, I suppose, which is why that work still isn't available.  If it
were possible to generate a set of prompts (around 1000 would be enough, I
suppose) with Monet and then process the coefficients with unit selection,
that would be an interesting thing.

If you would like to hear some samples of gnuspeech, go to my
university web site:

Yeah, I've downloaded them, but the problem is that I can reproduce
vowels, like the "oi" example you've sent.  But I have no idea how Monet
reproduces consonants.  There are examples, but no .trm files for them.
And the examples I have (for instance the one Steve kindly sent to me)
sound like a trumpet, as you've noticed :)  That's why I suspect there is
a bug in the TRM that makes consonant generation impossible.






