flat assembler
Message board for the users of flat assembler.
Index > Windows > Speech synthesiser
Madis731 16 Jan 2006, 16:41
Hello,
I've seen many programs that actually work under Windows, and they are fine. Even an Estonian T2S has been made. The thought of making a full-blown, all-ASM T2S engine or application has crossed my mind, and it's on my lifetime to-do list.

On to the specifics: the analogue signal is a sine wave, but everything in digital computing and DSP is square. The only place where you have a hint of a sine wave is just before it is sent to the speakers, at the output of the DAC. Even then it isn't a pure sine but square-on-sine. In speech synthesis it doesn't matter what the signal looks like; what matters is the output you get when you convert it to PCM, then to an analogue signal, and so on.

You need samples of vowels (a, e, i, o, u, ...) and consonants. Then you can modify these samples to produce sounds of the desired length. With vowels it's easy: repeating the same sample and pitching it gives the required results. You can do the same with voiced consonants (f, l, m, n, r, s, z, ...), but I don't know the magic behind g, b, d, k, p, t, ... I can only go this far, because I can't tell you how formants work, nor how to tell the computer how to pronounce words. In Estonian you can make a T2S by simply matching samples to letters. In English you would have to compare each letter with its neighbours and choose the proper sound sample for it.
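A minimal FASM sketch of the "repeat the same sample" idea for sustaining a vowel; the buffer names, the chunk length and the 16-bit mono format are assumptions made up for the example, and pitch shifting is not shown.

Code:
; sketch only: string REPEATS copies of one recorded pitch period together
; to get a longer vowel; vowel_chunk and out_buffer are hypothetical names
CHUNK_LEN = 441                       ; samples in one period (100 Hz at 44.1 kHz)
REPEATS   = 50                        ; about half a second of sustained vowel

section '.data' data readable writeable
  vowel_chunk rw CHUNK_LEN            ; one recorded period, 16-bit mono
  out_buffer  rw CHUNK_LEN*REPEATS    ; the sustained vowel is built here

section '.code' code readable executable
extend_vowel:
        cld                           ; copy forwards
        mov     edi,out_buffer
        mov     ebx,REPEATS
.next_period:
        mov     esi,vowel_chunk       ; rewind to the start of the chunk
        mov     ecx,CHUNK_LEN
        rep     movsw                 ; append one more period
        dec     ebx
        jnz     .next_period
        ret

Looping whole pitch periods keeps the seams click-free; changing the pitch by plain resampling also shifts the formants, which is why real engines use overlap-add (PSOLA-style) or frequency-domain methods for that part.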
FrozenKnight 16 Jan 2006, 22:24
To do what you are asking requires a lot of knowledge about how we recognize sound. People often think that understanding speech is easy. Just think back on how long it took you to learn to speak, and then remember that the human mind is the most complex thing we are currently trying to understand.
To give you an example, imagine a window shattering; picturing that probably took you a few seconds if you were slow. The most advanced CGI still takes hours to render that effect, and the physics took even longer to get right. If you are still interested in understanding human speech, either to have your computer speak it or understand it, then I suggest you make a study of phonics: not the stuff you learnt in school, but real phonics, where you break speech down into every sound the human mouth can possibly produce. After you have completed that, make a study of how sounds are produced and recorded digitally. Then you may be ready to begin writing such a program.
r22 17 Jan 2006, 03:24
I don't understand the trouble you're having making sine waves.
Can't you use the FPU fsin opcode on a lot of points and then manipulate the results? And combining two sine waves is as easy as adding together the corresponding points that make them up. Google has a few good pages on the subject: http://www.google.com/search?hl=en&q=synthesis%2C+formants

LAZY WAY TO MAKE TEXT-TO-SPEECH
Linguistically, all you'd technically need to make a working text-to-speech engine is a digital recording of every phoneme and a parser that takes words, breaks them into those phonemes, and plays the sounds.

That's the bare-bones, simplest approach; using a dual dictionary of [WORD][PronounciationKey] pairs will make sure that text like "one" is pronounced wa-un instead of Ohne` (phonetically speaking).
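A rough FASM sketch of the fsin part, with constants chosen just for the example (440 Hz, 44.1 kHz, one second) and an assumed 32-bit float output buffer.

Code:
; sketch only: fill 'tone' with one second of a 440 Hz sine as 32-bit floats
SAMPLE_RATE = 44100
NSAMPLES    = SAMPLE_RATE             ; one second of samples

section '.data' data readable writeable
  freq    dd 440.0
  srate   dd 44100.0
  two_pi  dd 6.2831853
  phstep  dd ?                        ; phase increment per sample, radians
  tone    rd NSAMPLES                 ; output buffer, 32-bit floats

section '.code' code readable executable
make_tone:
        fld     dword [two_pi]        ; phstep = 2*pi*freq/samplerate
        fmul    dword [freq]
        fdiv    dword [srate]
        fstp    dword [phstep]

        xor     ecx,ecx
        fldz                          ; st0 = running phase
.sample:
        fld     st0                   ; work on a copy, keep the phase
        fsin                          ; st0 = sin(phase)
        fstp    dword [tone+ecx*4]    ; store one sample and pop
        fadd    dword [phstep]        ; phase += phstep
        inc     ecx
        cmp     ecx,NSAMPLES
        jb      .sample
        fstp    st0                   ; discard the phase
        ret

fsin only accepts arguments smaller than 2^63 in magnitude, so for very long buffers it is worth wrapping the running phase back into the 0..2*pi range from time to time.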
ManOfSteel 17 Jan 2006, 21:15
Hello.
Madis731,

Quote: everything in digital computing and DSP is square. The only place where you have a hint of a sine wave is just before it is sent to the speakers, at the output of the DAC. Even then it isn't a pure sine but square-on-sine.
But then how do programs such as Steinberg WaveLab or Sonic Foundry Sound Forge generate waves: sine, triangle, square, saw, etc.?

Quote: In speech synthesis it doesn't matter what the signal looks like; what matters is the output you get when you convert it to PCM, then to an analogue signal, and so on.
I am not sure I understand this. When you are synthesising sound, or speech in this case, are you not actually designing the sound: drawing every wave it is composed of, then mixing it all into one single, more or less natural-sounding result? And what exactly is PCM? Is it a sound compression format?

Quote: You need samples of vowels (a, e, i, o, u, ...) and consonants. Then you can modify these samples to produce sounds of the desired length. With vowels it's easy: repeating the same sample and pitching it gives the required results. You can do the same with voiced consonants (f, l, m, n, r, s, z, ...), but I don't know the magic behind g, b, d, k, p, t, ...
But this would only be useful for concatenative synthesis. Unfortunately, that requires recordings of every single diphone in every language the synthesiser will support. It takes too much space (German, for example, has 2500 of them) and is thus incompatible with my objective of creating a small synthesiser. That is why I chose formants; besides, naturalness of the sound is not one of my first goals.

r22,

Quote: I don't understand the trouble you're having making sine waves.
Easier said than done. Do you have any example of doing all this?

Quote: all you'd technically need to make a working text-to-speech engine is a digital recording of every phoneme
In other words, a program using concatenative synthesis and a library of a few thousand sound samples. Not really the kind of thing I am thinking of, as I already mentioned.
Feryno 18 Jan 2006, 06:22
A sample for producing WAVs.
Disadvantage: the source is for TASM and runs under DOS (as well as under DOS emulators on Linux and Windows). It produces a WAV file playable under various OSes (Linux, Windows, ...). a01.com produces a WAV with a 32 Hz tone for testing a subwoofer, so change the frequency value at the beginning of the source and recompile it.
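The attachment itself is not reproduced above, so here is a hedged FASM/Win32 sketch in the same spirit: the canonical 44-byte PCM WAV header plus one second of a 32 Hz sine, written out with CreateFile/WriteFile. The file name, the amplitude and the 8-bit mono format are choices made for the example, not taken from Feryno's source.

Code:
; sketch: write test.wav = one second of a 32 Hz sine, 8-bit mono PCM, 44100 Hz
format PE console
entry start
include 'win32a.inc'

SAMPLE_RATE = 44100
NSAMPLES    = SAMPLE_RATE             ; one second
DATA_SIZE   = NSAMPLES                ; 8-bit mono: one byte per sample

section '.data' data readable writeable
  ; canonical 44-byte PCM WAV header
  header:
    db 'RIFF'
    dd 36 + DATA_SIZE                 ; size of everything after this field
    db 'WAVE'
    db 'fmt '
    dd 16                             ; size of the fmt chunk
    dw 1                              ; format 1 = uncompressed PCM
    dw 1                              ; one channel (mono)
    dd SAMPLE_RATE
    dd SAMPLE_RATE                    ; byte rate = rate * channels * bits/8
    dw 1                              ; block align
    dw 8                              ; bits per sample
    db 'data'
    dd DATA_SIZE
  HEADER_SIZE = $ - header

  freq     dd 32.0                    ; change this to test other tones
  srate    dd 44100.0
  two_pi   dd 6.2831853
  amp      dd 100.0                   ; keep well inside the 8-bit range
  bias     dd 128.0                   ; 8-bit PCM is unsigned, centred on 128
  phstep   dd ?
  itmp     dd ?
  fname    db 'test.wav',0
  hfile    dd ?
  written  dd ?
  samples  rb NSAMPLES

section '.code' code readable executable
start:
        fld     dword [two_pi]        ; phase step = 2*pi*freq/samplerate
        fmul    dword [freq]
        fdiv    dword [srate]
        fstp    dword [phstep]

        xor     ecx,ecx
        fldz                          ; running phase
.gen:
        fld     st0
        fsin
        fmul    dword [amp]           ; scale to +/-100
        fadd    dword [bias]          ; shift into 28..228
        fistp   dword [itmp]          ; round to integer
        mov     al,byte [itmp]
        mov     [samples+ecx],al
        fadd    dword [phstep]
        inc     ecx
        cmp     ecx,NSAMPLES
        jb      .gen
        fstp    st0

        invoke  CreateFile,fname,GENERIC_WRITE,0,0,CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,0
        mov     [hfile],eax
        invoke  WriteFile,[hfile],header,HEADER_SIZE,written,0
        invoke  WriteFile,[hfile],samples,NSAMPLES,written,0
        invoke  CloseHandle,[hfile]
        invoke  ExitProcess,0

section '.idata' import data readable writeable
  library kernel32,'KERNEL32.DLL'
  import  kernel32,\
          CreateFile,'CreateFileA',\
          WriteFile,'WriteFile',\
          CloseHandle,'CloseHandle',\
          ExitProcess,'ExitProcess'

All the header fields sit at fixed offsets, so changing the sample rate, bit depth or length means updating the byte-rate field and the two size fields to match.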
Madis731 18 Jan 2006, 12:08
ManOfSteel wrote:
I meant the output of the speakers. As for plotting sine waves and writing a PCM file (that is the raw *.wav file; Pulse Code Modulation), it's as simple as loading consecutive numbers into st(0) and calling fsin/fcos/fsincos/etc. Then you can convert the results to integers, draw them on screen, or write them to a sound file format that supports 32-bit floats. And adding sine waves IS simple: just take two arrays of floats or integers of sine output and add them together (ADD/FADD instructions).
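A small FASM sketch of that last part, assuming two float buffers have already been filled (the names and the one-second length are made up for the example): add the corresponding samples with FADD, scale, and round to signed 16-bit PCM with fistp.

Code:
; sketch only: mix two 32-bit float sample buffers into signed 16-bit PCM
NSAMPLES = 44100                      ; one second at 44.1 kHz

section '.data' data readable writeable
  half    dd 0.5                      ; scale the sum back into -1.0..1.0
  scale16 dd 32767.0                  ; full scale for signed 16-bit output
  tone_a  rd NSAMPLES                 ; first sine, floats in -1.0..1.0
  tone_b  rd NSAMPLES                 ; second sine, same format
  mix16   rw NSAMPLES                 ; mixed result, 16-bit integers

section '.code' code readable executable
mix_tones:
        xor     ecx,ecx
.next:
        fld     dword [tone_a+ecx*4]
        fadd    dword [tone_b+ecx*4]  ; add the corresponding points
        fmul    dword [half]          ; keep the sum from clipping
        fmul    dword [scale16]       ; -1.0..1.0  ->  -32767..32767
        fistp   word [mix16+ecx*2]    ; round and store as 16-bit PCM
        inc     ecx
        cmp     ecx,NSAMPLES
        jb      .next
        ret

fistp rounds with the current FPU rounding mode (round-to-nearest by default) and misbehaves if the scaled value no longer fits in 16 bits, which is what the 0.5 factor is guarding against.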
ManOfSteel 19 Jan 2006, 09:54
Hello.
Feryno, thank you very much for the code.

Madis731, I understand, thank you. I will be checking the Intel manuals for more information about the FPU. Two more things:
1. Do you have any links about sound on computers: frequencies, bit rates, etc.?
2. Where can I get a list of the Windows APIs related to sound?