flat assembler
Message board for the users of flat assembler.

Index > Windows > Speech synthesiser

ManOfSteel



Joined: 02 Feb 2005
Posts: 1154
ManOfSteel 16 Jan 2006, 09:22
Hello,
I would like to program a speech synthesiser (text-to-speech) using formants. Since this is done by combining simple sine waves to produce complex (natural) sounds, I will begin with that.
So my question is: how can I generate sine waves? As far as I know, the PIT only generates square waves, so it is probably not the way to go.
Thank you in advance for any help.

PS: any help concerning speech synthesis (using formants) is also welcome.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 16 Jan 2006, 16:41
Hello,
I've seen many programs that actually work under Windows, and they are fine. Even an Estonian :D T2S has been made. The thought of making a full-blown, all-ASM T2S engine or application has crossed my mind, and it's on my lifetime to-do list :P

On to the specifics: The analogue signal is sine wave, but everything concerning digital computing and DSP is square. The only place where you have a hint of sine wave is just before it is sent to the speakers - the output of DAC. Even then it isn't completely sine but square-on-sine.

In speech synthesis what matters is not what the signal looks like, but the output you get once it has been converted to PCM, then to an analogue signal, and so on. You need samples of vowels (a,e,i,o,u,...) and consonants. Then you can modify these samples to produce sounds of the desired length. With vowels it's easy: repeating the same sample and changing its pitch gives the required result. You can do the same with voiced consonants (f,l,m,n,r,s,z,...), but I don't know the magic behind g,b,d,k,p,t,...
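
A minimal Python sketch of the "repeat the sample and pitch it" idea, under some assumptions: the vowel is represented by one recorded cycle held as a list of numbers, and pitch is changed by naive linear-interpolation resampling. The helper names `loop_to_length` and `change_pitch` are made up for illustration and are not taken from any existing T2S engine.

```python
def loop_to_length(cycle, n_samples):
    """Repeat one recorded cycle until n_samples values have been produced."""
    return [cycle[i % len(cycle)] for i in range(n_samples)]


def change_pitch(samples, factor):
    """Read through samples `factor` times faster, interpolating linearly.

    factor > 1 raises the pitch (and shortens the sound),
    factor < 1 lowers the pitch (and lengthens it).
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)          # index of the sample just before pos
        frac = pos - i        # how far pos lies between samples i and i+1
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += factor
    return out


# Hypothetical one-cycle "vowel" (a crude triangle shape), looped and pitched.
cycle = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
held = loop_to_length(cycle, 64)
higher = change_pitch(held, 2.0)
```

Because resampling changes the duration as well as the pitch, the sample is looped first so that enough material is left after pitching.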

I can only go this far, because I can't tell you how formants work, nor can I tell you how to make the computer pronounce words. In Estonian, you can make a T2S by simply mapping letters to samples and stringing the samples together. In English you would have to compare each letter with its neighbours and choose the proper sound sample for it.
FrozenKnight



Joined: 24 Jun 2005
Posts: 128
FrozenKnight 16 Jan 2006, 22:24
Doing what you are asking requires a lot of knowledge about how we recognise sound. People often think that understanding speech is easy. Just think back on how long it took you to learn to speak, then remember that the human mind is the most complex thing we are currently trying to understand.
To give you an example, imagine a window shattering. That probably took you a few seconds, if you were slow. The most advanced CGI still takes hours to render that effect, and the physics took even longer to get right.

If you are still interested in understanding human speech, either to have your computer speak it or to have it understand it, then I suggest you make a study of phonetics: not the phonics you learnt in school, but the real thing, where you break speech down into every sound the human mouth can possibly produce. After you have completed that, make a study of how sounds are produced and recorded digitally. Then you may be ready to begin writing such a program.
r22



Joined: 27 Dec 2004
Posts: 805
r22 17 Jan 2006, 03:24
I don't understand the trouble you're having making sinewaves.
Can't you use the FPU fsin opcode to compute the wave at a lot of points and then manipulate the results? And combining two sinewaves is as easy as adding together the corresponding points that make them up.
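
A small Python sketch of the same idea, with `math.sin` standing in for the FPU's fsin. The sample rate of 8000 Hz and the helper names `sine_wave` and `add_waves` are assumptions chosen for illustration only.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this sketch)


def sine_wave(freq_hz, n_samples, amplitude=1.0, sample_rate=SAMPLE_RATE):
    """Return n_samples points of a sine wave as a list of floats."""
    step = 2.0 * math.pi * freq_hz / sample_rate  # phase advance per sample
    return [amplitude * math.sin(step * n) for n in range(n_samples)]


def add_waves(a, b):
    """Combine two waves by adding their corresponding points."""
    return [x + y for x, y in zip(a, b)]


low = sine_wave(200.0, 80)                   # 200 Hz, full amplitude
high = sine_wave(400.0, 80, amplitude=0.5)   # 400 Hz, half amplitude
mixed = add_waves(low, high)
```

In assembly the loop body would be much the same: load the phase into st(0), execute fsin, scale, and store; the mixing step is one FADD per pair of points.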

Google has a few good pages on the subject
http://www.google.com/search?hl=en&q=synthesis%2C+formants

LAZY WAY TO MAKE TEXT-TO-SPEECH
Linguistically, all you'd technically need to make a working text-to-speech engine is a digital recording of every phoneme and a parser that takes words, breaks them into those phonemes and plays the sounds.

Quote:

Phoneme - The smallest phonetic unit in a language that is capable of conveying a distinction in meaning, as the m of mat and the b of bat in English.


That's the bare-bones, simplest approach. Using a dual dictionary with [WORD][PronounciationKey] entries will make sure text like "one" is pronounced wa-un instead of Ohne` (phonetically speaking).
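
A rough Python sketch of such a dual dictionary, assuming an exception table consulted first and a naive letter-by-letter fallback. The phoneme labels and the table contents are hypothetical placeholders, not a real pronunciation lexicon.

```python
# Hypothetical exception dictionary: word -> list of phoneme labels.
EXCEPTIONS = {
    "one": ["w", "ah", "n"],
}

# Hypothetical fallback table: one phoneme label per letter.
LETTER_SOUNDS = {
    "a": "ae", "b": "b", "e": "eh", "m": "m",
    "n": "n", "o": "oh", "t": "t",
}


def to_phonemes(word):
    """Look the word up in the exception dictionary, else spell it out."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])  # copy so callers cannot alter the table
    # Letters missing from the fallback table are silently skipped.
    return [LETTER_SOUNDS[ch] for ch in word if ch in LETTER_SOUNDS]


print(to_phonemes("one"))  # uses the exception entry
print(to_phonemes("mat"))  # falls back to letter sounds
```

Each phoneme label would then select the recorded sound to play. Without the exception entry, "one" would come out of the fallback as oh-n-eh, which is the problem the dictionary is there to solve.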
ManOfSteel



Joined: 02 Feb 2005
Posts: 1154
ManOfSteel 17 Jan 2006, 21:15
Hello.

Madis731,
Quote:
everything concerning digital computing and DSP is square. The only place where you have a hint of sine wave is just before it is sent to the speakers - the output of DAC. Even then it isn't completely sine but square-on-sine.

But then how do programs such as Steinberg WaveLab or Sonic Foundry Sound Forge generate waves: sine, triangle, square, saw, etc?

Quote:
In speech synthesis what matters is not what the signal looks like, but the output you get once it has been converted to PCM, then to an analogue signal, and so on.

I am not sure I understand this. When you are synthesising sound, or speech in this case, are you not actually designing the sound, drawing every wave it is composed of, then mixing it all into one single, more or less natural sound? And what exactly is PCM? Is it a sound compression format?

Quote:
You need samples of vowels (a,e,i,o,u,...) and consonants. Then you can modify these samples to produce sounds of the desired length. With vowels it's easy: repeating the same sample and changing its pitch gives the required result. You can do the same with voiced consonants (f,l,m,n,r,s,z,...), but I don't know the magic behind g,b,d,k,p,t,...

But this would only be useful for concatenative synthesis. Unfortunately, that requires recordings of every single diphone in every language the synthesiser will support. It takes too much space (German, for example, has 2500 of them) and is thus incompatible with my objective of creating a small synthesiser. That is why I chose to use formants, especially since natural-sounding output is not one of my first goals.
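
To make the formant idea concrete, here is a very rough Python sketch that builds a vowel-like tone additively: harmonics of a fundamental are summed, each weighted by how close it lies to a set of formant peaks. This is only an approximation for illustration; formant synthesisers are more usually described as a source signal passed through resonant filters. The formant centre frequencies and bandwidths below are illustrative placeholders, not measured values, and all function names are made up.

```python
import math


def resonance_gain(freq_hz, centre_hz, bandwidth_hz):
    """Bell-shaped weight that peaks at 1.0 when freq_hz equals centre_hz."""
    x = (freq_hz - centre_hz) / bandwidth_hz
    return 1.0 / (1.0 + x * x)


def vowel_like(f0_hz, formants, n_samples, sample_rate=8000):
    """Sum the harmonics of f0_hz, weighted by nearness to each formant."""
    nyquist = sample_rate / 2.0
    out = [0.0] * n_samples
    k = 1
    while k * f0_hz < nyquist:            # stay below the Nyquist frequency
        freq = k * f0_hz
        gain = sum(resonance_gain(freq, c, bw) for c, bw in formants)
        step = 2.0 * math.pi * freq / sample_rate
        for n in range(n_samples):
            out[n] += gain * math.sin(step * n)
        k += 1
    peak = max(abs(s) for s in out) or 1.0  # avoid dividing by zero
    return [s / peak for s in out]          # normalise to the range -1..1


# (centre frequency in Hz, bandwidth in Hz): placeholder values only.
ILLUSTRATIVE_FORMANTS = [(700.0, 100.0), (1200.0, 120.0), (2500.0, 150.0)]
samples = vowel_like(100.0, ILLUSTRATIVE_FORMANTS, 800)
```

The appeal for a small synthesiser is that each sound is described by a handful of numbers rather than a recording.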


r22,
Quote:
I don't understand the trouble you're having making sinewaves.
Can't you use the FPU fsin opcode to compute the wave at a lot of points and then manipulate the results? And combining two sinewaves is as easy as adding together the corresponding points that make them up.

Easier said than done. Do you have an example of doing all this?

Quote:
Linguistically, all you'd technically need to make a working text-to-speech engine is a digital recording of every phoneme

In other words, a program using concatenative synthesis and a library of a few thousand sound samples. Not really the kind of thing I have in mind, as I already mentioned.
Feryno



Joined: 23 Mar 2005
Posts: 514
Location: Czech republic, Slovak republic
Feryno 18 Jan 2006, 06:22
Here is a sample that produces WAV files.
Disadvantage: the source is for TASM and runs under DOS (as well as under DOS emulators on Linux and Windows). It produces a WAV file playable under various OSes (Linux, Windows, ...).
a01.com produces a 32 Hz WAV for testing a subwoofer, so change the frequency value at the beginning of the source and recompile it.
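
For readers without TASM, the same kind of output can be sketched in Python with the standard `wave` module. This is not a port of the attached source: the function name `write_sine_wav`, the 16-bit mono format and the default sample rate are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave


def write_sine_wav(target, freq_hz=32.0, seconds=1.0, sample_rate=44100):
    """Write a mono 16-bit PCM sine tone to a file name or file-like object."""
    n_samples = round(seconds * sample_rate)
    frames = bytearray()
    for n in range(n_samples):
        value = math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        frames += struct.pack("<h", int(value * 32767))  # little-endian int16
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_samples


# In-memory demonstration; pass a file name instead to get a playable file.
buffer = io.BytesIO()
count = write_sine_wav(buffer, freq_hz=32.0, seconds=0.5, sample_rate=8000)
```

Changing `freq_hz` plays the same role as editing the frequency value at the beginning of the TASM source.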


Attachment: wavgener.ZIP (1.37 KB)
Description: produces a 32 Hz WAV for testing a subwoofer

Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 18 Jan 2006, 12:08
ManOfSteel wrote:

Quote:
everything concerning digital computing and DSP is square. The only place where you have a hint of sine wave is just before it is sent to the speakers - the output of DAC. Even then it isn't completely sine but square-on-sine.

But then how do programs such as Steinberg WaveLab or Sonic Foundry Sound Forge generate waves: sine, triangle, square, saw, etc?


I meant the output of the speakers. If you mean plotting sinewaves and writing a PCM file (Pulse Code Modulation; this is the raw data in a *.wav file), it's as simple as loading consecutive numbers into st(0) and executing the fsin/fcos/fsincos instructions. Then you can convert the results to integers, draw them on screen or write them to a sound file that supports 32-bit floats.
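
The "convert them to integer" step can be sketched as follows in Python, assuming float samples in the range -1.0 to 1.0 and signed 16-bit little-endian PCM as the target format. The helper name `float_to_pcm16` is made up for this example.

```python
import struct


def float_to_pcm16(samples):
    """Convert floats in -1.0..1.0 to signed 16-bit little-endian PCM bytes."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                # clip out-of-range values
        out += struct.pack("<h", int(round(s * 32767)))
    return bytes(out)


pcm = float_to_pcm16([0.0, 1.0, -1.0, 2.0])  # the 2.0 is clipped to 1.0
```

Clipping matters once several waves are added together, because the sum can easily leave the -1.0 to 1.0 range.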

And adding sinewaves IS simple. Just take two arrays of floats or integers holding sine output and add them together element by element (ADD/FADD instructions).

_________________
My updated idol :D http://www.agner.org/optimize/
ManOfSteel



Joined: 02 Feb 2005
Posts: 1154
ManOfSteel 19 Jan 2006, 09:54
Hello.

Feryno,
thank you very much for the code.

Madis731,
I understand, thank you. I will be checking the Intel manuals for more information about the FPU.

Two more things:
1. Do you have any links about sound on computers: sampling frequencies, bit rates, etc.?
2. Where can I get a list of the Windows API functions related to sound?
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.