flat assembler
Message board for the users of flat assembler.

Index > Compiler Internals > unicode.

b1528932



Joined: 21 May 2010
Posts: 287
b1528932
Why doesn't fasm support UTF-8/UTF-16?
Hasn't it been a standard for a couple of years now?
Post 03 Feb 2011, 20:35
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler
Because maintaining assembly is hell, and updating it is even worse. :P
Post 03 Feb 2011, 20:39
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
revolution
b1528932 wrote:
Why doesn't fasm support UTF-8...?
It does.


Attachment: fasmSupportsUTF8.asm (337 bytes)
Description: Showing support for UTF8

Post 03 Feb 2011, 20:49
revolution
See here for previous discussions.
Post 03 Feb 2011, 20:56
b1528932
This example doesn't work; probably fasm doesn't like the BOM, which is part of the Unicode standard.

This exposes that fasm is badly written. I would build any text-I/O-based program on abstract functions like ReadChar or ReadLine, which would support only UTF-8 or maybe UTF-16. UTF-8 is the better choice, since UTF-16 is native only to Windows for some strange reason.
Post 03 Feb 2011, 21:05
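For readers wondering what a ReadChar-style abstraction like the one proposed above might look like, here is a minimal sketch in Python. It is not how fasm reads its input. The name `read_chars` is hypothetical, and the sketch assumes the input is strict UTF-8, using the standard library's incremental decoder to turn a byte stream into whole characters.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def read_chars(stream: BinaryIO) -> Iterator[str]:
    """Yield one decoded character at a time from a UTF-8 byte stream.

    Hypothetical helper for illustration only; raises UnicodeDecodeError
    on an invalid byte or on a sequence cut off at end of input.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    while True:
        byte = stream.read(1)
        if not byte:
            # End of input: this raises if a multi-byte sequence is unfinished.
            decoder.decode(b"", final=True)
            return
        text = decoder.decode(byte)
        if text:  # empty until the last byte of a sequence has arrived
            yield text


if __name__ == "__main__":
    for ch in read_chars(io.BytesIO("aż€".encode("utf-8"))):
        print(repr(ch))
```

A reader built this way rejects anything that is not valid UTF-8, which is exactly the trade-off the thread goes on to argue about.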
revolution
The file I uploaded doesn't have a BOM. And it assembles fine on my system.
Post 03 Feb 2011, 21:07
b1528932
It accepts any byte != 0x00.
I can even feed it illegal UTF-8 sequences. I can also put in an unfinished UTF-8 character, and it swallows it. 0xFC as the name of a label works, when it should be an error.

I bet it uses a 1 byte = 1 character engine. I wouldn't call it support of Unicode.
Post 03 Feb 2011, 21:15
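The difference described above can be sketched in Python. Both functions are hypothetical and deliberately simplified: `byte_engine_accepts` models only the claim "any byte != 0x00" (real fasm names also exclude whitespace and symbol characters), and `utf8_engine_accepts` is what a strict UTF-8 check might add on top. Neither is taken from fasm's source.

```python
def byte_engine_accepts(name: bytes) -> bool:
    """Simplified '1 byte = 1 character' rule: any non-empty run of non-zero bytes."""
    return len(name) > 0 and 0x00 not in name


def utf8_engine_accepts(name: bytes) -> bool:
    """Same rule, plus the requirement that the bytes are well-formed UTF-8."""
    if not byte_engine_accepts(name):
        return False
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "plain ASCII": b"label",
        "lone 0xFC": b"\xfc",
        "unfinished sequence": "ż".encode("utf-8")[:1],
        "complete sequence": "ż".encode("utf-8"),
    }
    for description, name in samples.items():
        print(description, byte_engine_accepts(name), utf8_engine_accepts(name))
```

The lone 0xFC and the unfinished sequence pass the byte-oriented rule and fail the UTF-8 one, which is the behaviour being objected to.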
revolution
If you take a little time to read the thread I linked, you will also see the trick for dealing with the BOM.

Start your file with a single line:
Code:
zxcvbnm = 0    
And it looks like this to fasm:
Code:
<hidden BOM>zxcvbnm = 0    
Post 03 Feb 2011, 21:18
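To see why the trick works, here is a small Python sketch of a byte-oriented reader. The `defined_names` function is hypothetical and far simpler than fasm's real parser: it only collects the bytes to the left of `=` on each line. The second definition, `start = 1`, is made up for the example.

```python
import codecs
from typing import List


def defined_names(source: bytes) -> List[bytes]:
    """Collect the name on the left of '=' on each line, byte for byte."""
    names = []
    for line in source.splitlines():
        name, sep, _value = line.partition(b"=")
        if sep:
            # bytes.strip() removes ASCII whitespace only, so BOM bytes survive.
            names.append(name.strip())
    return names


if __name__ == "__main__":
    body = b"zxcvbnm = 0\nstart = 1\n"
    print(defined_names(body))
    print(defined_names(codecs.BOM_UTF8 + body))
```

With the BOM present, the throwaway first name becomes the three BOM bytes followed by `zxcvbnm`, while every later name is untouched. The dummy line simply absorbs the BOM.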
b1528932
That's called hax, not support.
Post 03 Feb 2011, 21:19
revolution
b1528932 wrote:
I wouldn't call it support of Unicode.
No one said it supports Unicode. But UTF-8 can be used without too much trouble. It is your problem if you want to put in illegal UTF-8 sequences; perhaps your editor needs looking at if it is generating bad text files.
Post 03 Feb 2011, 21:21
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7802
Location: Kraków, Poland
Tomasz Grysztar
fasm itself works on byte-based text, and it doesn't care what actual encoding you use for your regional characters. This follows from the SSSO (Same Source, Same Output) principle: exactly the same file is always interpreted by fasm in the same way. It also cannot treat a BOM in any special way, because it is a sequence of bytes just like any other: you can have a label with such a name at the very beginning of your file, and if fasm ignored these characters and defined some different label (or displayed an error), it would go against its principles.
Also, what is an invalid sequence of bytes in UTF-8 may still be a perfectly valid sequence in some other encoding, and fasm has to handle both.
Post 03 Feb 2011, 21:22
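The last point can be illustrated with a short Python sketch. The `decodes_as` helper and the byte-keyed `symbols` table are hypothetical stand-ins, not fasm internals; the label is the Polish letter ą as saved by an editor using the Windows-1250 code page.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are well-formed text in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A one-letter label as written to disk by a Windows-1250 editor.
label = "ą".encode("cp1250")

# A symbol table keyed by raw bytes never has to decide which encoding is meant.
symbols = {label: 0}

if __name__ == "__main__":
    print(decodes_as(label, "cp1250"))  # valid in the encoding it was written in
    print(decodes_as(label, "utf-8"))   # a lone non-ASCII byte is not valid UTF-8
    print(label in symbols)
```

A reader that insisted on UTF-8 would reject this file, while a byte-keyed table treats it like any other source.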
b1528932
OK, I see the point of this. Fasm supports the 'original' instruction names, as the CPU manufacturer named and documented them, right?
Fasm supports more than just the x86 line, right? Imagine that a new popular CPU emerges and you would like to support its instructions. The manual is written only in Korean, as are the documented instruction names. Or say China buys AMD and introduces SSE7 with only Chinese names. You would then have to rewrite the entire engine to support it. Given the assumption that fasm supports the original instruction names, the lack of a real Unicode parser is a bug.

Everything taken from the user that is character- or name-based must be processed as Unicode.

And seriously, considering what English-speaking countries are doing, I would prepare to support Asian and Arabic languages ASAP.
Post 03 Feb 2011, 22:03
Tyler
Mnemonics are just arbitrary memory aids. What's to say Tomasz can't just give them English-like names? There are enough English-speaking asm programmers that there would be an unofficial English version of the Chinese mnemonics that he could use.
Post 04 Feb 2011, 03:08
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.