flat assembler
Message board for the users of flat assembler.

[feature request] Extended ASCII characters compiler "bug"

EladAshkcenazi335



Joined: 25 Jun 2017
Posts: 2
EladAshkcenazi335 26 Jun 2017, 05:22
If you try to use eASCII characters (typed with Windows' special Alt+0 to Alt+255 keyboard feature) in fasm's text editor and compile, you end up with the hex value of the ASCII character '?' (0x3F). With other text editors you get their Unicode (UTF-8, ...) encoded hex values instead.

So what I'm suggesting is to add a special keyword that tells FASM to translate these specific (Unicode, UTF-8, ...) characters into their eASCII hex values, only when compiling, so that FASM can handle these characters properly.

* eASCII stands for extended ASCII.
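This is a minimal sketch of the translation the proposed keyword would perform. It is written in Python for illustration and is not anything fasm provides. It assumes the source file is UTF-8 and that the intended eASCII code page is CP437, which is consistent with the 0xB2 value given for ▓ later in the thread. The function name `to_eascii` is hypothetical.

```python
# Hypothetical sketch of the proposed keyword's effect on one string literal.
# Assumes: the source file is UTF-8, and the wanted eASCII code page is CP437.

def to_eascii(literal_utf8: bytes) -> bytes:
    """Re-encode the UTF-8 bytes of a literal as CP437 (extended ASCII) bytes.

    Characters with no CP437 equivalent raise UnicodeEncodeError
    (strict mode) instead of being replaced by a wrong byte.
    """
    text = literal_utf8.decode("utf-8")
    return text.encode("cp437", errors="strict")

source = "▓".encode("utf-8")     # the bytes a UTF-8 editor writes for '▓'
print(source.hex())              # e29693
print(to_eascii(source).hex())   # b2
```

Strict error handling is a design choice here. A character with no eASCII equivalent stops the conversion instead of silently becoming a wrong byte.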


Last edited by EladAshkcenazi335 on 26 Jun 2017, 14:44; edited 3 times in total
EladAshkcenazi335



Joined: 25 Jun 2017
Posts: 2
EladAshkcenazi335 26 Jun 2017, 06:55
This video explains why this weird bug exists in the first place:

https://www.youtube.com/watch?v=MijmeoH9LT4
https://www.youtube.com/watch?v=qBex3IDaUbU - explains the bug more specifically.

In short, you can't represent real eASCII characters in non-eASCII forms like ANSI, Unicode, UTF-8, etc.
eASCII characters require bit 7 to be set. UTF-8 uses that same bit to tell whether a character is one byte long (plain, non-extended ASCII) or two or more bytes long. Thus eASCII characters cannot be stored with their real hex values, and FASM puts in the wrong values (their ANSI, Unicode, UTF-8, ... forms) instead of the real eASCII hex values.
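The bit-7 point can be checked directly. This Python snippet is an illustration, not fasm code. It shows that every byte of the UTF-8 encoding of ▓ has bit 7 set, and that a lone 0xB2 byte is rejected by a UTF-8 decoder.

```python
# Illustration only: the bit-7 conflict between eASCII and UTF-8.

def bit7_set(b: int) -> bool:
    """True if the high bit (0x80) of a byte is set."""
    return (b & 0x80) != 0

utf8 = "▓".encode("utf-8")
print([hex(b) for b in utf8])           # ['0xe2', '0x96', '0x93']
print(all(bit7_set(b) for b in utf8))   # True: every byte is part of a multi-byte sequence

# A lone eASCII byte (0xB2 = '▓' in CP437) is not valid UTF-8 by itself.
try:
    bytes([0xB2]).decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print("0xB2 alone is valid UTF-8:", valid)   # False
```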

I'm basically suggesting a special keyword that tells FASM to compile these fake eASCII characters into real eASCII hex values (only when the keyword is used; otherwise the default behavior applies).


For example:

The eASCII character ▓ has an (eASCII) hex value of 0xB2, but in a Unicode-encoded source file FASM will compile

mov al,'▓'

as one of the following, depending on the encoding:

mov al,0x9396E2 ; UTF-8: because this value is greater than 0xFF (the maximum value of a byte), fasm will report an error (try it if you don't believe me), even though the user probably intended the eASCII hex value rather than the UTF-8 one.

mov al,0x2593 ; UTF-16: an error occurs here too

mov al,0x00002593 ; UTF-32: same

etc...


BUT with the keyword, FASM would theoretically compile the original 0xB2 value (mov al,0xB2).
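The numbers in this example follow from reading the literal's bytes as a little-endian number, with the first byte as the lowest one. That is how the UTF-8 bytes E2 96 93 become 0x9396E2. Below is a small Python sketch assuming that rule. It treats 0xFF as the largest value that fits in AL and ignores negative values. The name `literal_value` is hypothetical.

```python
# Illustration only: how the numeric value of a quoted literal depends on
# the encoding of its bytes, assuming the first byte of a quoted string
# becomes the lowest byte of the resulting number (little-endian).

def literal_value(raw: bytes) -> int:
    """Interpret the bytes of a quoted literal as a little-endian number."""
    return int.from_bytes(raw, "little")

ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le", "cp437"]

for enc in ENCODINGS:
    value = literal_value("▓".encode(enc))
    fits_in_al = value <= 0xFF    # unsigned byte range; negatives ignored
    print(f"{enc:10} {value:#x}  fits in al: {fits_in_al}")
```

Only the CP437 byte yields a value that fits in a byte register, which is the point of the proposal.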

sources:
http://www.theasciicode.com.ar/
http://www.fileformat.info/info/unicode/char/2593/index.htm
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 29 Jun 2017, 14:18
fasm is generally encoding-neutral, hence strings of bytes are copied literally from the source text into the output. If you need to convert between different encodings, the right way to do it is with macros, just like the "du" encoding macros that come as includes in the fasm for Windows package.
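To illustrate the macro approach without guessing at fasm macro syntax, the sketch below uses Python. It shows the two steps a conversion macro would plausibly need for this case:

1. Decode the UTF-8 bytes of a literal into code points by hand, since there is no codec library to call.
2. Map each code point through a table of the upper half of CP437.

The function names and the choice of CP437 as the target are assumptions for illustration. This is not the code of fasm's own encoding includes.

```python
# Hypothetical sketch of the logic an encoding macro would need here:
# hand-written UTF-8 decoding plus a lookup table for the upper half of CP437.

# Reverse table: Unicode code point -> CP437 byte, for bytes 0x80..0xFF.
# Derived with Python's codec here; a macro would hard-code these 128 entries.
CP437_HIGH = {ord(bytes([b]).decode("cp437")): b for b in range(0x80, 0x100)}

def decode_utf8(data: bytes):
    """Yield code points from well-formed UTF-8 (no validation, sketch only)."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:            # 0xxxxxxx: 1-byte sequence
            n, cp = 1, lead
        elif lead >= 0xF0:         # 11110xxx: 4-byte sequence
            n, cp = 4, lead & 0x07
        elif lead >= 0xE0:         # 1110xxxx: 3-byte sequence
            n, cp = 3, lead & 0x0F
        else:                      # 110xxxxx: 2-byte sequence
            n, cp = 2, lead & 0x1F
        for cont in data[i + 1:i + n]:      # continuation bytes 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F)
        yield cp
        i += n

def utf8_to_cp437(data: bytes) -> bytes:
    """Convert UTF-8 text to CP437 bytes, failing on unmappable characters."""
    out = bytearray()
    for cp in decode_utf8(data):
        if cp < 0x80:
            out.append(cp)                  # plain ASCII is unchanged
        elif cp in CP437_HIGH:
            out.append(CP437_HIGH[cp])      # e.g. U+2593 -> 0xB2
        else:
            raise ValueError(f"U+{cp:04X} has no CP437 equivalent")
    return bytes(out)
```

Because the conversion is done by the source-level tooling rather than by the assembler core, the assembler itself can stay encoding-neutral.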

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
