flat assembler
Message board for the users of flat assembler.
Thread: Unicode in FASM (page 2)
FlierMate 19 Feb 2021, 14:17
Replying to revolution:
Thanks for the brilliant solution. It now compiles, but the Unicode characters in the output are wrong. As you advised, I put a colon followed by a semicolon on the first line:
Code:
:;
format PE GUI 4.0
Without this first-line label, FASM would complain:
Quote:
flat assembler version 1.73.27 (1089705 kilobytes memory)
Please see the screenshot: the Unicode characters in the MsgBox are unreadable and differ from the words I typed in the source file (which is saved as UTF-8, yes). However, with my previous approach, i.e. typing the Unicode code points as hex values and saving the .ASM file as ANSI, the MsgBox shows readable Unicode characters.
revolution 19 Feb 2021, 15:36
If you save the file with UTF-8 encoding, you won't be able to place wide characters with du. The encodings are different.
fasm sees a string of bytes:
Code:
IS64 du 0x36,0x34,0xf3,0xc6,0xa3,... ; UTF-8 encoding
Then du converts it to this:
Code:
0x36,0x00,0x34,0x00,0xf3,0x00,0xc6,0x00,0xa3,0x00 ; fasm just inserts null bytes
What you want is to output the wide characters directly:
Code:
IS64 dw '6','4',0x34a2,... ; manually entered wide character
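To make the mismatch described above concrete, here is a minimal Python sketch (not fasm code). It assumes, as the post says, that a plain du widens each source byte to a 16-bit word. The helper name `zero_extend_bytes` and the sample string are illustrative only.

```python
def zero_extend_bytes(raw):
    """Treat every byte as its own 16-bit word (high byte zero)."""
    return list(raw)

source_text = "64位"                      # what was typed in the editor
utf8_bytes = source_text.encode("utf-8")  # what ends up in a UTF-8 source file

# What byte-by-byte widening produces: one word per UTF-8 byte.
naive_words = zero_extend_bytes(utf8_bytes)

# What a wide-character API expects: one word per character
# (true here because every character is in the Basic Multilingual Plane).
wanted_words = [ord(ch) for ch in source_text]

print([hex(w) for w in naive_words])   # ['0x36', '0x34', '0xe4', '0xbd', '0x8d']
print([hex(w) for w in wanted_words])  # ['0x36', '0x34', '0x4f4d']
```

The ASCII characters survive either way, which is why only the CJK part of the message box text comes out unreadable.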
FlierMate 19 Feb 2021, 15:56
So that is what I did initially.
The following...
Code:
IS64 du '64',0x4F4D,0x5143,' Windows',0
IS32 du '32',0x4F4D,0x5143,' Windows',0
...works exactly the same as...
Code:
IS64 dw '6','4',0x4F4D,0x5143,' ','W','i','n','d','o','w','s',0
IS32 dw '3','2',0x4F4D,0x5143,' ','W','i','n','d','o','w','s',0
Hmm, so it means I still cannot type CJK characters directly in the source file, because FASM gives an error:
Quote:
wow64chs.asm [11]:
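For anyone who stays with the manual approach, the hex values do not have to be looked up by hand. The following Python sketch generates a du line in the style shown above. The function name `fasm_du_line` is hypothetical, and the output format simply mirrors the lines in this post.

```python
def fasm_du_line(label, text):
    """Build a fasm-style du line: ASCII runs stay quoted, other units become hex words."""
    data = text.encode("utf-16-le")
    # Split the UTF-16LE bytes into 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    parts = []
    run = ""
    for unit in units:
        # Printable ASCII (except the single quote) can sit inside a quoted string.
        if 0x20 <= unit < 0x7F and unit != 0x27:
            run += chr(unit)
            continue
        if run:
            parts.append("'" + run + "'")
            run = ""
        parts.append("0x%04X" % unit)
    if run:
        parts.append("'" + run + "'")
    parts.append("0")  # terminating null word
    return label + " du " + ",".join(parts)

print(fasm_du_line("IS64", "64位元 Windows"))
# IS64 du '64',0x4F4D,0x5143,' Windows',0
```

Because the generated line contains only ASCII, the .ASM file can be saved in any single-byte encoding without changing the result.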
revolution 19 Feb 2021, 16:15
Your CJK characters are UTF-8 encoded, so fasm sees all the bytes of the encoding inside the single quotes:
Code:
'<UTF-8 bytes in here>'
And even if they are 2 bytes long, they are still stored wrong because fasm doesn't reverse the bytes of the string characters, so the big-endian encoding of UTF-8 is read as a little-endian wide character by the CPU. So there is no simple way around this; you have to encode manually.
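One detail worth spelling out: a multi-byte UTF-8 sequence also carries marker bits, so swapping the byte order does not by itself recover the code point. The Python sketch below shows this for a 2-byte and a 3-byte sequence. `decode_utf8_sequence` is a hypothetical helper that assumes well-formed input and does no validation.

```python
def decode_utf8_sequence(seq):
    """Decode one well-formed 2- or 3-byte UTF-8 sequence into its code point."""
    if len(seq) == 2:
        # 110xxxxx 10xxxxxx
        return ((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F)
    if len(seq) == 3:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)
    raise ValueError("only 2- and 3-byte sequences are handled in this sketch")

e_acute = "é".encode("utf-8")                      # bytes C3 A9
as_little_endian = int.from_bytes(e_acute, "little")
as_big_endian = int.from_bytes(e_acute, "big")

print(hex(as_little_endian))                       # 0xa9c3
print(hex(as_big_endian))                          # 0xc3a9
print(hex(decode_utf8_sequence(e_acute)))          # 0xe9
print(hex(decode_utf8_sequence("位".encode("utf-8"))))  # 0x4f4d
```

Neither byte order yields 0x00E9, which is why the bytes have to be decoded rather than just reordered.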
Tomasz Grysztar 19 Feb 2021, 16:25
revolution wrote:
If you save as UTF-8 encoding then you won't be able to place wide characters with du. The encodings are different.
Code:
include 'encoding/utf8.inc'
FlierMate 19 Feb 2021, 17:45
Sincere thanks to revolution and Tomasz.
Including utf8.inc works wonders. Now I can put CJK characters directly in the assembly code.
Code:
:;
format PE GUI 4.0
entry start

include '\fasm\include\encoding\utf8.inc'
include '\fasm\include\win32w.inc'

section '.data' data readable writeable
a rb MAX_PATH
IS64 du '64位元 Windows',0
IS32 du '32位元 Windows',0
.....
.....
This solved my problem.
Roman 13 Jan 2024, 17:19
This does not help me with Russian Unicode characters.
Notepad on Windows 10, saving as UTF-8, stores a Russian character as two bytes (0xD090 = А), while English characters are stored as one byte. What I need is 0x0410 = Russian А.
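The two values quoted here are the same character in two different encodings. A short Python check (illustrative only, the variable names are not from the thread) shows the UTF-8 bytes next to the UTF-16 word that the wide-character APIs expect:

```python
cyrillic_a = "\u0410"  # CYRILLIC CAPITAL LETTER A

utf8_form = cyrillic_a.encode("utf-8")         # how a UTF-8 file stores it
utf16le_form = cyrillic_a.encode("utf-16-le")  # how a wide string stores it in memory
wide_value = int.from_bytes(utf16le_form, "little")

print(utf8_form.hex())     # d090
print(utf16le_form.hex())  # 1004
print(hex(wide_value))     # 0x410
```

So a source file containing D0 90 is correct UTF-8. The open question is whether those bytes get converted to the word 0x0410 before they reach the output.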
macomics 13 Jan 2024, 18:43
It must be saved in UTF-16 LE so that Windows functions with the W suffix work correctly with this text. Or convert the strings to the required format using the MultiByteToWideChar function.
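As a rough picture of what such a conversion has to produce, here is a Python sketch that turns a UTF-8 buffer into a null-terminated UTF-16 LE buffer. It is not the Win32 call itself: `utf8_to_wide` is a hypothetical name, and the sample string is only for illustration.

```python
def utf8_to_wide(buffer):
    """Convert UTF-8 bytes to a null-terminated UTF-16 LE byte string."""
    text = buffer.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return text.encode("utf-16-le") + b"\x00\x00"

wide = utf8_to_wide("Привет".encode("utf-8"))

print(len(wide))       # 14 (six 16-bit characters plus the terminator)
print(wide[:2].hex())  # 1f04 (U+041F stored low byte first)
```

Comparing a hex dump of the assembled string against output like this is a quick way to see whether the conversion actually happened.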
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.