flat assembler
Message board for the users of flat assembler.


Chewy509



Joined: 19 Jun 2003
Posts: 297
Location: Bris-vegas, Australia
I know FASM supports encoding ASCII as Unicode (via the du directive) for developing applications on systems that have Unicode APIs; however, AFAIK the source itself must still be 8-bit ASCII.

Are there any plans to allow source code in UTF-16 format, e.g. to allow symbols/labels to contain non-ASCII characters, and to make Unicode strings easier to write within the source file itself?

For example, to be able to support source code like this:
Quote:

org 100h

старт:
mov ah, 9
mov dx, шнур
int 21h
ret

шнур du "여보세요 세계"
db "$"
шнур2 du "Γειάσου κόσμος"
db "$"


Obviously all directives and operands should remain as they are (in English, as defined by Intel/AMD), but it would be nice to have true support for user-defined labels and strings.

PS. I know the DOS API doesn't support unicode strings, but just used it for the example.
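
To illustrate the limitation described above, here is a rough Python sketch (not fasm code). It assumes that du simply zero-extends each byte of a quoted string to a 16-bit word; the helper names du_zero_extend and utf16_units are made up for this example. Under that assumption ASCII text comes out as valid UTF-16, while the bytes of a UTF-8 encoded character do not.

```python
import struct


def du_zero_extend(source_bytes: bytes) -> list:
    # Assumed behaviour of `du` on an 8-bit source string:
    # every byte becomes one 16-bit word with a zero high byte.
    return list(source_bytes)


def utf16_units(text: str) -> list:
    # Reference result: the real UTF-16 code units of the text.
    raw = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def as_hex(words: list) -> list:
    return [f"{w:04X}" for w in words]


# Pure ASCII: zero-extension and real UTF-16 agree.
print(as_hex(du_zero_extend(b"Hi")), as_hex(utf16_units("Hi")))

# A Greek capital gamma saved as UTF-8 is two bytes, so zero-extension
# yields two unrelated words instead of the single code unit 0393.
print(as_hex(du_zero_extend("Γ".encode("utf-8"))), as_hex(utf16_units("Γ")))
```

This is why the strings in the example above would need the assembler (or a macro layer) to decode the source encoding before emitting words.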
Post 19 Apr 2005, 01:59
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7802
Location: Kraków, Poland
Look here: http://board.flatassembler.net/topic.php?t=307 (and also the older thread which inspired it).
I believe something similar might be done for UTF-16.


Last edited by Tomasz Grysztar on 19 Apr 2005, 21:21; edited 1 time in total
Post 19 Apr 2005, 06:49
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
I never quite understood the point in UTF-8/16 source files - obviously it's necessary when localizing strings, but for source code? Programming is international, and labels/data/comments should be in English, really.

Oh well, just my opinion. GoAsm has full unicode support.
Post 19 Apr 2005, 12:50
Chewy509



Joined: 19 Jun 2003
Posts: 297
Location: Bris-vegas, Australia
f0dder wrote:
I never quite understood the point in UTF-8/16 source files - obviously it's necessary when localizing strings, but for source code?


Well, as in the example, it allows non-English speakers to use labels, comments, and strings in their native language, and to have ALL language scripts available within a single source file.

While I've implemented a few macros to construct UTF-16 strings from ASCII source, that doesn't help when I want to use some of the mathematical characters provided in the Unicode definitions. Generally I end up with:
{label} du "Greater or equal to symbol: ",2265h, 0h
even though I would prefer to see:
{label} du "Greater or equal to symbol: ≥",0h
within the source file.
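
Until the source can hold such characters directly, the first form can at least be generated rather than typed by hand. The following is a minimal Python sketch of that idea, not an existing fasm tool: the function names du_operands and hexword are hypothetical, and it assumes du accepts a comma-separated mix of quoted ASCII strings and hexadecimal words, as in the line above.

```python
import struct


def hexword(unit: int) -> str:
    # fasm-style hex literal; prefix a 0 when it would start with a letter.
    digits = f"{unit:X}"
    if digits[0].isalpha():
        digits = "0" + digits
    return digits + "h"


def du_operands(text: str) -> str:
    """Render text as an operand list for a `du` line (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    units = struct.unpack(f"<{len(raw) // 2}H", raw)

    parts = []
    run = []  # pending printable-ASCII characters

    def flush():
        if run:
            parts.append('"' + "".join(run) + '"')
            run.clear()

    for unit in units:
        # Keep printable ASCII (except the double quote) inside quotes.
        if 0x20 <= unit < 0x7F and unit != 0x22:
            run.append(chr(unit))
        else:
            flush()
            parts.append(hexword(unit))
    flush()
    return ",".join(parts)


print("label du " + du_operands("Greater or equal to symbol: ≥\0"))
```

Characters outside the Basic Multilingual Plane come out as two words (a surrogate pair), because the text is encoded as UTF-16 before it is split into operands.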

f0dder wrote:
Programming is international, and labels/data/comments should be in English, really.


I agree that most of the world's technology base speaks English (or languages that use a Latin-based script), but one has to consider both India and China, which are up-and-coming in the technology arena. Why not cater for those non-English speakers?

I'm not saying it has to be done at all, but it is something that should be added to the TODO list and given some consideration.
Post 19 Apr 2005, 23:49
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Quote:

Why not cater for those non-english speakers?

Partially because of globalization - you have to "work without borders". I would hate being assigned a project with comments or, even worse, variable names, in a language other than English. People who don't speak English should really learn to, if they want to work with programming. I know this sounds harsh and cynical, but it's the reality of this world.

As for "and within a single source file have ALL language scripts available.", well... when projects get large enough, you need some external tool to manage the localized strings. I wouldn't even use unicode resource strings, as I haven't seen any resource editor that handles strings in a proper way. Also, you need to be able to ship the localization data + an editor to your translators.
Post 20 Apr 2005, 00:09
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
Well, lots of software companies promised us "welfare comes with Unicode UTF-16", but it is still either badly supported or takes up a lot of space just for the fonts and other language resources.

While we wait for software standards to slowly catch on, we could start using the full range of 8-bit characters in the meantime. I mean, characters 128-255 have always been supported by fasm, and with a proper keyboard codepage you can even use them, like this:
Code:
ÇüéâäåçêëèïîìÄÅÉæÆôöòûùÿÖÜø£Ø׃:;this is a label
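
One caveat with labels like this is that bytes 128-255 have no fixed meaning: the characters you see depend on the codepage used to view the file. A small Python sketch of that effect (an illustration only; the helper name code_points and the two sample byte values are chosen for this example):

```python
def code_points(raw: bytes, codepage: str) -> list:
    # Decode the same bytes under a given codepage and list the
    # resulting Unicode code points in U+XXXX form.
    return [f"U+{ord(ch):04X}" for ch in raw.decode(codepage)]


sample = bytes([0x80, 0x9B])  # two bytes from the 128-255 range

for codepage in ("cp850", "cp437", "cp1252"):
    print(codepage, code_points(sample, codepage))

# In UTF-8, by contrast, a character such as U+00C7 takes two bytes.
print("utf-8", "\u00c7".encode("utf-8").hex())
```

So the assembler sees the same byte values either way, but two people opening the source with different codepages may read different label names.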
    
Post 20 Apr 2005, 07:19
mike.dld



Joined: 03 Oct 2003
Posts: 235
Location: Belarus, Minsk
Even if I had a compiler that supported Russian characters in identifiers (and actually I CAN declare symbols with Russian characters using FASM), I would never use this feature. First, out of habit. Second, you never know who'll read your source, and English is an international language. So IMHO it's not a good idea to support Unicode source files.
Post 20 Apr 2005, 07:27
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
mike.dld wrote:
Second, you never know who'll read your source, and English is an international language. So IMHO it's not a good idea to support Unicode source files.
This is the most obvious reason for me too. You know, I don't use German identifiers either. (But my senile IT teacher likes to use them, I have no idea why.)
Post 20 Apr 2005, 07:34
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
One of my friends got a job at a company that makes software for the educational sector (exam planning and such). First, they use Danish identifiers (including Danish chars, æøåÆØÅ). Second, they use Smalltalk. It's pretty damn awful. There are some pretty heavy calculations in there as well, and needless to say Smalltalk isn't very fast.

Companies like that should have a thorough beating.
Post 20 Apr 2005, 09:17
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
f0dder wrote:
Companies like that should have a thorough beating.
But it's unlikely that this will happen, since all of those users (the staff of school marks and bulletin offices) don't care: they would have no practical use for better software, nor do they have any specific idea about the software itself. They are just used to sloppy, lame software and won't replace it any time soon, since doing so might actually reduce the time they need for their work, and they might then earn less money. That's the whole sad story.

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 20 Apr 2005, 09:39
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Unfortunately, they *are* used to slow and sloppy software - they don't question that it takes 10+ seconds from clicking a button until the resulting dialog shows up, nor that the computations can take a couple of hours when they should take 15 minutes at most.

*sigh*
Post 20 Apr 2005, 17:18
Copyright © 1999-2020, Tomasz Grysztar.