flat assembler
Message board for the users of flat assembler.

Unicode/International Languages question

Gomer73



Joined: 29 Nov 2003
Posts: 151
Gomer73 20 Jun 2004, 17:35
I plan to support multiple languages with my OS.

Unfortunately, I have no idea what is required in order to do this. I know that almost every English text file and HTML page I come across uses only 8-bit characters.

To support multiple languages, I figured I would use 16-bit characters for everything (I think this is called Unicode). But I am not sure whether this will do the job, or whether I should use 8-bit characters and code pages (whatever those are).
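A fixed 16-bit character type on its own is the old UCS-2 scheme. Unicode code points go up to U+10FFFF, so UTF-16 has to spend two 16-bit units (a surrogate pair) on anything above U+FFFF. A minimal Python sketch of that rule, for illustration only (the function name `utf16_units` is hypothetical):

```python
def utf16_units(cp: int) -> list:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    # Surrogates (U+D800..U+DFFF) are not characters, and nothing exists
    # above U+10FFFF.
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a valid Unicode scalar value")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: fits in a single 16-bit unit.
        return [cp]
    # Supplementary planes: subtract 0x10000, leaving a 20-bit value,
    # then split it into a high and a low surrogate of 10 bits each.
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)
    low = 0xDC00 | (v & 0x3FF)
    return [high, low]


print([f"{u:04X}" for u in utf16_units(0x0105)])   # Polish a-ogonek: one unit
print([f"{u:04X}" for u in utf16_units(0x1F600)])  # above U+FFFF: two units
```

So "16 bits per character" only holds for the first 64K code points; a 16-bit string still needs variable-length handling once the OS meets characters outside that range.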

Any help would be appreciated. I even plan to support multiple languages in my function call names, so it will be pretty ingrained into the OS.

...Gomer73
decard



Joined: 11 Sep 2003
Posts: 1092
Location: Poland
decard 20 Jun 2004, 19:14
Maybe you could use the UTF-8 character format? It uses a variable character width: ordinary (English, i.e. ASCII) characters are encoded as usual in one byte, while other characters (like ą, ś, ń, ź in Polish) are encoded in two or more bytes. This system is good, as it saves space and makes life a lot easier when working with the "standard" character set. However, things like a StrLen routine are more complicated... Anyway, it is a good system, and you should consider implementing it.
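To see why StrLen gets more complicated: the byte length and the character count of a UTF-8 string differ, and counting characters means skipping continuation bytes, which always have the bit pattern 10xxxxxx. A minimal Python sketch, assuming the input is already valid UTF-8 and NUL-terminated like a C string (the name `utf8_strlen` is hypothetical):

```python
def utf8_strlen(buf: bytes) -> int:
    """Count code points in a NUL-terminated UTF-8 byte string.

    Assumes the bytes are valid UTF-8. Every code point has exactly one
    lead byte, and continuation bytes always look like 10xxxxxx, so we
    count every byte that is NOT a continuation byte.
    """
    count = 0
    for b in buf:
        if b == 0:                  # terminator, as in a C string
            break
        if (b & 0xC0) != 0x80:      # not 10xxxxxx, so it starts a new code point
            count += 1
    return count


s = "źródło".encode("utf-8") + b"\0"
print(len(s) - 1, utf8_strlen(s))  # 9 bytes, 6 characters
```

The loop is still a single linear scan with one mask and compare per byte, which translates directly into a short assembly routine.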
I can't help you much with a more detailed explanation, as I have only used routines that deal with UTF-8 and never coded my own. Take a look at the Allegro library http://alleg.sourceforge.net/; it has a rich set of UTF-8 routines (AFAIR they are coded in assembly).

regards,
Mateusz
Gomer73



Joined: 29 Nov 2003
Posts: 151
Gomer73 21 Jun 2004, 19:47
Thanks, UTF-8 looks like the way to go for my OS.

I can still use 0 for string termination. Also, I didn't realize that Unicode goes beyond 64K code points, so even two bytes per character aren't sufficient.
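Both observations follow from how UTF-8 encodes a code point: every byte of a multi-byte sequence has its high bit set, so a 0 byte can only ever mean U+0000 and still works as a terminator, and code points above U+FFFF simply take four bytes. A minimal Python sketch of the encoder (the name `utf8_encode` is illustrative; real kernel code would report errors in whatever way the OS prefers):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a valid Unicode scalar value")
    if cp < 0x80:                      # 0xxxxxxx: plain ASCII
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: everything above U+FFFF
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0x0105, 0x4E2D, 0x1F600):
    enc = utf8_encode(cp)
    # No byte of a non-NUL character is ever 0x00.
    print(f"U+{cp:04X} -> {enc.hex(' ')}  zero byte: {0 in enc}")
```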

This way I can write my source in any editor and not have to worry. Otherwise it would be kind of a pain trying to save in a two-byte format.

It takes up a little more space for Asian characters, but oh well.
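For scale: most CJK characters sit in the range U+0800..U+FFFF, which costs three bytes each in UTF-8 against two in UTF-16, while ASCII costs one byte instead of two. A small Python sketch that computes both sizes straight from the code point ranges (the helper names are illustrative):

```python
def utf8_size(cp: int) -> int:
    """Bytes needed for one code point in UTF-8."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def utf16_size(cp: int) -> int:
    """Bytes needed for one code point in UTF-16 (one or two 16-bit units)."""
    return 2 if cp < 0x10000 else 4


for text in ("Hello", "Zażółć", "日本語のテキスト"):
    cps = [ord(ch) for ch in text]
    print(f"{text}: UTF-8 {sum(map(utf8_size, cps))} bytes, "
          f"UTF-16 {sum(map(utf16_size, cps))} bytes")
```

The trade-off is roughly 50% extra for CJK text versus half the size for ASCII-heavy text such as source code and most file names.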
Juras



Joined: 18 Jun 2003
Posts: 23
Location: Belarus
Juras 11 Nov 2004, 19:41
UTF-8 would be the best choice, I think. Supporting 8-bit code pages when editing or viewing text files is the application's job (many text files are still encoded in native code pages). UTF-8 is perfect. However, to write a correct FAT12/16/32 driver you'll need in-kernel code page support in order to encode old DOS 8.3 file names with 8-bit code pages; use something like a DosCP variable, as I plan to do in my kernel.
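A FAT short name is 11 raw bytes (8 for the name, 3 for the extension, space-padded) stored in whatever OEM code page DOS was using, so the driver needs a byte-to-Unicode table before it can hand the name out as UTF-8. A minimal Python sketch of the reading direction, assuming code page 437 and a hypothetical `DOS_CP` setting standing in for the DosCP variable; only the 0x80..0x9F row of CP437 is filled in here, and unmapped bytes become U+FFFD. It also handles the FAT convention that a first byte of 0x05 stands for 0xE5 (because 0xE5 marks a deleted entry).

```python
# Partial CP437 table: bytes 0x80..0x9F only (accented Latin letters and
# currency signs). A real driver would carry all 128 high entries for
# every code page it supports.
CP437_HIGH = {
    0x80 + i: ch for i, ch in enumerate(
        "ÇüéâäàåçêëèïîìÄÅ"
        "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ")
}

CODE_PAGES = {437: CP437_HIGH}
DOS_CP = 437  # hypothetical stand-in for the kernel's DosCP setting


def short_name_to_utf8(raw: bytes, cp: int = DOS_CP) -> bytes:
    """Convert an 11-byte FAT 8.3 directory name to a UTF-8 "NAME.EXT"."""
    if len(raw) != 11:
        raise ValueError("a FAT short name is exactly 11 bytes")
    if raw[0] == 0x05:
        # 0xE5 in the first byte marks a deleted entry, so a real leading
        # 0xE5 is stored as 0x05 and must be translated back.
        raw = b"\xe5" + raw[1:]
    table = CODE_PAGES[cp]

    def decode(part: bytes) -> str:
        # Bytes below 0x80 are treated as ASCII (true for CP437 file names);
        # higher bytes go through the table, unknown ones become U+FFFD.
        return "".join(chr(b) if b < 0x80 else table.get(b, "\ufffd")
                       for b in part)

    name = decode(raw[:8].rstrip(b" "))
    ext = decode(raw[8:].rstrip(b" "))
    return (name + "." + ext if ext else name).encode("utf-8")


print(short_name_to_utf8(b"M\x9aLLER  TXT").decode("utf-8"))  # 0x9A is Ü in CP437
```

Writing a short name is the reverse lookup (Unicode to OEM byte), which is why the kernel, not just applications, has to know the active code page.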
Copyright © 1999-2024, Tomasz Grysztar.