flat assembler
Message board for the users of flat assembler.
Unicode in FASM
kohlrak 28 Jul 2007, 10:51
This topic is split from "Things you hate most about FASM".
Except for something a little more friendly towards Windows resources than manual typing (which really isn't Tomasz's duty), my problems with FASM are people-related. Also, the lack of direct Unicode handling (having to make separate files and import them using the file data directive) is annoying (but Tomasz has always included the source, so we could all fix things like that ourselves)... But outside of those two problems I can't argue. I heard talk of people wanting a plugin system for FASMW? We really don't need that.
LocoDelAssembly 29 Jul 2007, 04:54
Perhaps the problem is that FASMW (and maybe FASM itself) handles 8-bit-character source code only?
Note that du is still useful even with this 8-bit source limitation. For example, I can write "myName du 'Hernán'" and the á will be correctly displayed on Russian or Japanese systems (unless I misunderstood the purpose of Unicode).
kohlrak 29 Jul 2007, 05:56
LocoDelAssembly wrote: Perhaps the problem is that FASMW (and maybe FASM itself) handles 8-bit-character source code only?
Yeah, but the last time I tried using Windows functions, there were only A and W versions, not an 8-bit one... Unless FASM turns the 8-bit Unicode into regular Unicode...
LocoDelAssembly 29 Jul 2007, 13:51
Well, du in my example gets translated to the equivalent of "db 'H', 0, 'e', 0, 'r', 0, 'n', 0, 'á', 0, 'n', 0". What I'm not sure about, however, is whether the translation is done properly. I saved a Unicode TXT with my name in Notepad, and apart from the leading FF FE it had the same content as the binary produced by FASM. But will it be translated properly when assembled on other systems?
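For reference, a minimal sketch of the check described above: assemble the line to a flat binary and compare it with Notepad's Unicode file. The file name and the expected bytes are illustration only, and the E1 value assumes a Western (Latin-1-style) locale, which is exactly what the rest of the thread questions.
Code:
; hernan.asm -- assemble with: fasm hernan.asm hernan.bin
myName du 'Hernán'
; expected output (UTF-16LE, without Notepad's FF FE byte-order mark):
; 48 00 65 00 72 00 6E 00 E1 00 6E 00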
f0dder 29 Jul 2007, 18:07
LocoDelAssembly wrote: ...will it be translated properly when assembled on other systems?
That will depend on the locale of the target system...
LocoDelAssembly 29 Jul 2007, 18:15
So I suppose my assumption above that "Hernán" will be correctly displayed on Russian and Japanese systems is also wrong?
kohlrak 29 Jul 2007, 23:16
LocoDelAssembly wrote: So I suppose my assumption above that "Hernán" will be correctly displayed on Russian and Japanese systems is also wrong?
One way to find out: I changed my ANSI settings to Japanese. =p
Quote: As for Unicode, you have to include the proper file from 'include/encoding'. This way is a little weird but allows you to have sources in multiple encodings (UTF-8, Win1250, etc.). This suits FASM's no-command-line design well.
Well, then the problem remains a problem for FASMW, but FASMW isn't really that important for other systems, which don't have it anyway. Though, can you provide an example source for this? I don't completely understand.
peter 29 Jul 2007, 23:23
LocoDelAssembly: No, you should use explicit Unicode codes:
Code:
db 'H', 0, 'e', 0, 'r', 0, 'n', 0, 0xE1, 0, 'n', 0

if you want the source code to be compilable under a Russian version of Windows.
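For what it's worth, the same bytes can also be produced with du by mixing literal text and an explicit code point, since du stores each numeric operand as a 16-bit word; a small sketch (not from the original post):
Code:
myName du 'Hern', 0xE1, 'n'   ; 0xE1 = U+00E1 ('á'), written as a number so the source stays plain ASCII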
LocoDelAssembly 29 Jul 2007, 23:47
Ah, but is "0, $E1" the correct sequence for 'á' even on Russian systems? That is why I am not sure how safe it is: the 8-bit representation is $E1 as well, and if on a Russian system $E1 is used for something else but FASM still emits the "0, $E1" code, then it looks somewhat unsafe to me.
Example:
Code:
du 'This is a Russian char: ', {some Russian char}, 0

Since FASMW will still be 8-bit, can the Russian char get translated into "0, $E1" even though it obviously wasn't 'á'? vid mentioned the encoding includes, so I assume that when the char is prefixed with 0 it means "current locale"? Perhaps my problem is that I am assuming Unicode sequences have a universal representation when that isn't true? I'm starting to believe this is the problem, especially because of the different encodings.
kohlrak 29 Jul 2007, 23:49
W (Unicode) is supposed to be universal while A isn't...
vid 30 Jul 2007, 09:13
Those macros are the proper answer. If your text editor can create files in UTF-8 encoding (the best option), then include "utf8.inc". If your editor uses an 8-bit character set with your locale settings (like FASMW), then you should just include the proper file for your encoding.
For example, I create the following file under the WIN1250 (Central European) character set:
Code:
include "encoding\win1250.inc"
du "piča"

It uses the character "č" (c with caron). In Win1250 this character has code 0xE8; in Unicode it is U+010D. The source file is encoded as Win1250 text, so it contains just a single 0xE8 byte in place of that character. On a system with a different encoding, a text editor will display this source wrongly; for example, in Win1251 (the Cyrillic set) 0xE8 will display as "и". But unlike displaying, compiling this source works with any locale settings: it will always compile to the Unicode character 0x10D in the "du" string. To get rid of all these problems, simply use an editor that supports UTF-8. It is universal.
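For comparison, a minimal sketch of the UTF-8 route vid recommends; it assumes the source file itself is saved as UTF-8, and the word in the string is just a hypothetical example:
Code:
include "encoding\utf8.inc"
text du "čas", 0   ; the two UTF-8 bytes C4 8D for "č" are decoded into the single word 0x010D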
LocoDelAssembly 30 Jul 2007, 15:05
Perfect explanation, I have no doubts this time.
Thank you very much, vid, and the others who tried to make me understand this.
kohlrak 31 Jul 2007, 00:15
vid wrote: Those macros are the proper answer. If your text editor can create files in UTF-8 encoding (the best option), then include "utf8.inc". If your editor uses an 8-bit character set with your locale settings (like FASMW), then you should just include the proper file for your encoding.
I'm still not sure, but I think I get it. So, let's say I use WIN1257.INC and I put in text from a Win1257 system. It would convert all du strings from that system's text to Unicode? If so, if I use the UTF-8 file, it'll convert all the UTF-8 into Unicode? If that's the case, it still leaves a problem when trying to use something other than ANSI for resource files, or does it work on that as well?
peter 31 Jul 2007, 00:27
Quote: ah, but is "0, $E1" the correct sequence for 'á' even on Russian systems?
Yes, the U+00E1 character is 'á' on every system. Unicode is one universal encoding, not a set of different encodings.
Quote: can the Russian char get translated into "0, $E1" even though it obviously wasn't 'á'?
No, the characters U+0080..U+00FF are always the Latin-1 block. Russian (more precisely, Cyrillic) chars are always U+0400..U+04FF. See http://www.unicode.org/charts/ for the list of blocks and the characters in them. For example, my name in Russian would be:
Code:
db 0x1F, 0x04, 0x51, 0x04, 0x42, 0x04, 0x40, 0x04

Of course, UTF-8 source files are better than these "magic" numbers, but your text editor and compiler have to support them. When programming in C, I often use explicit codes for portability, because many C compilers still don't support UTF-8.
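For the record, the same bytes should also come out of a du line with explicit code points (a sketch; I read peter's bytes as U+041F U+0451 U+0442 U+0440, i.e. "Пётр"):
Code:
russianName du 0x041F, 0x0451, 0x0442, 0x0440   ; assembles to 1F 04 51 04 42 04 40 04, matching the db line above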
vid 31 Jul 2007, 00:28
Quote: I'm still not sure, but I think I get it. So, let's say I use WIN1257.INC and I put in text from a Win1257 system. It would convert all du strings from that system's text to Unicode?
You should create the source file on a Win1257 system and include win1257.inc. Then anyone who tries to compile that file, even on a system with a different locale, will get the same result as you. But opening the source in an editor would display it wrongly, because editors usually use the current system's default encoding. Viewing a file encoded in Win1250 as a file encoded in Win1252 doesn't work well, and trying to re-save the file in such a case would probably corrupt some characters.
Quote: If so, if I use the UTF-8 file, it'll convert all the UTF-8 into Unicode?
If you encode your source file in UTF-8 and include "utf8.inc", then "du" will of course produce a proper UTF-16 Unicode string.
Quote: If that's the case, it still leaves a problem when trying to use something other than ANSI for resource files, or does it work on that as well?
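A small sketch of the Win1257 workflow described in the first answer, with a hypothetical Baltic-locale source file (the greeting text is just an example):
Code:
; hypothetical: this file is typed and saved on a system using the Windows-1257 (Baltic) code page
include "encoding\win1257.inc"
greeting du 'Labas, pasauli!', 0   ; the ASCII part passes through unchanged; a letter such as
                                   ; ū (a single byte in the Win1257 source) would assemble to
                                   ; its Unicode code point U+016B in the UTF-16 output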
kohlrak 31 Jul 2007, 00:31
Alright, that makes sense, but then the problem *may* remain with the resource files (dialog boxes and menus, for example), as hexing them in after making the exe is rather annoying.
f0dder 31 Jul 2007, 00:34
If you use some _external_ resource format, rc.exe, and a linker, then Unicode resources will be just fine and work as expected. If you use FASM's PE output and resource macros, then... *shrug*.
IMHO, once you start doing Unicode, you shouldn't be hardcoding strings in your program anymore... either use resources + LoadString, or write your own support code. Keeping localizable strings in external files pays off in the end, trust me.
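A rough sketch of the resources + LoadString route in FASM syntax, assuming the standard Win32 wide-character headers and imports; IDS_GREETING, the buffer size, and the labels are made-up placeholders, and the string itself would live in the resource section (or in an external .res linked in, as suggested above):
Code:
        invoke  GetModuleHandle, 0
        invoke  LoadStringW, eax, IDS_GREETING, buffer, 128   ; copies up to 128 UTF-16 code units
        invoke  MessageBoxW, 0, buffer, 0, MB_OK

; somewhere in the data section:
buffer  rw 128                                                ; room for the loaded UTF-16 string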
kohlrak 31 Jul 2007, 00:40
Quote: If you use FASM's PE output and resource macros, then... *shrug*.
That's where the problem comes in...
Quote: IMHO, once you start doing Unicode, you shouldn't be hardcoding strings in your program anymore... either use resources + LoadString, or write your own support code. Keeping localizable strings in external files pays off in the end, trust me.
If it's a small program, you should just hardcode it. There's no point in including extra files with an already incredibly small program, especially if it was only intended for one target language.
FlierMate 19 Feb 2021, 07:39
Thanks to this thread, I am able to hardcode Unicode chars into my assembly code, though it is a bit tricky: I have to find out the Unicode code points in hexadecimal.
Code:
IS64 du '64',0x4F4D,0x5143,' Windows',0
IS32 du '32',0x4F4D,0x5143,' Windows',0

And I make sure I use the wide-char/string versions of the Win32 API:
Code:
import user32,\
       MessageBox,'MessageBoxW'

before I invoke:
Code:
invoke MessageBox, 0, IS32, '', MB_OK

I found that I cannot type the Unicode chars directly into the source file and save it as UTF-8 or Unicode; FASM can only open an .ASM file saved in ANSI mode.
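As vid described earlier in the thread, the hex code points can be avoided by saving the source as UTF-8 and pulling in the matching encoding include. A sketch (whether a UTF-8 byte-order mark is tolerated may depend on the FASM version, so saving without a BOM is the safer bet):
Code:
include 'encoding\utf8.inc'
IS64 du '64位元 Windows',0   ; the same string as above, typed directly in a UTF-8 source file
IS32 du '32位元 Windows',0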