flat assembler
Message board for the users of flat assembler.
How do I print unicode string using printf
revolution 02 Jun 2023, 13:08
If by "Unicode" you mean UTF-8 then for Windows you can first convert the UTF-8 encoding to WCHAR (sometimes incorrectly called UTF-16) and then use the *W APIs to print the string.
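As a rough illustration of that route, here is a minimal 32-bit sketch that converts a UTF-8 byte string with MultiByteToWideChar and prints the result with WriteConsoleW. The labels (utf8_text, wide_text, written) and the 64-WCHAR buffer size are assumptions made up for this example, and whether the characters actually render still depends on the console font.

```asm
; Sketch: convert a UTF-8 string to WCHAR at run time, then print it
; with WriteConsoleW. Labels and buffer size are illustrative only.
format PE console
entry start

include 'win32w.inc'

CP_UTF8    = 65001                  ; code page identifier for UTF-8
MAX_WCHARS = 64                     ; capacity of the output buffer, in WCHARs

section '.data' data readable writeable

  utf8_text db 0xE2,0x9C,0x93,' Hello, World!',13,10,0  ; U+2713 followed by ASCII text
  wide_text rw MAX_WCHARS           ; receives the converted string
  written   dd ?

section '.code' code readable executable

start:
        ; cbMultiByte = -1 means the input is zero-terminated, so the
        ; returned WCHAR count includes the terminating zero
        invoke  MultiByteToWideChar,CP_UTF8,0,utf8_text,-1,wide_text,MAX_WCHARS
        test    eax,eax
        jz      finish              ; 0 means the conversion failed
        lea     esi,[eax-1]         ; WCHAR count without the terminator

        invoke  GetStdHandle,-11    ; -11 = STD_OUTPUT_HANDLE
        invoke  WriteConsoleW,eax,wide_text,esi,written,0
finish:
        invoke  ExitProcess,0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL'
  import kernel32,\
         MultiByteToWideChar,'MultiByteToWideChar',\
         GetStdHandle,'GetStdHandle',\
         WriteConsoleW,'WriteConsoleW',\
         ExitProcess,'ExitProcess'
```

The fixed-size buffer keeps the sketch short; calling MultiByteToWideChar first with a zero output size to query the required length would be the more general approach.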
Furs 02 Jun 2023, 13:46
revolution wrote: If by "Unicode" you mean UTF-8 then for Windows you can first convert the UTF-8 encoding to WCHAR (sometimes incorrectly called UTF-16) and then use the *W APIs to print the string.
BTW, newer versions of Windows support the UTF-8 codepage in ANSI functions.
moistMaven 02 Jun 2023, 13:54
Quote:
Maybe because UTF-16 is a character encoding scheme while WCHAR is a data type. 🤷
revolution 02 Jun 2023, 13:56
Furs wrote:
The size of a wide character type does not dictate what kind of text encodings a system can process, as conversions are available. (Old conversion code commonly overlooks surrogates, however.) The historical circumstances of their adoption also decide which types of encoding they prefer.
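To make the surrogate remark concrete, here is a small assembly-time sketch of how one code point above U+FFFF becomes two 16-bit units. The code point U+1F600 and all constant names are arbitrary choices for the example.

```asm
; Assembly-time sketch: encode one code point above U+FFFF as a
; UTF-16 surrogate pair. U+1F600 is only an illustrative choice.
codepoint = 0x1F600
offset20  = codepoint - 0x10000            ; 20-bit value to split
high_sur  = 0xD800 + (offset20 shr 10)     ; top 10 bits    -> 0xD83D
low_sur   = 0xDC00 + (offset20 and 0x3FF)  ; bottom 10 bits -> 0xDE00

wide_char dw high_sur, low_sur, 0          ; two WCHARs for one character
```

Code that assumes one WCHAR per character would count this as two characters, which is the kind of oversight the remark refers to.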
moistMaven 02 Jun 2023, 16:50
Quote:
Researching your suggestion I got this code, but running it just opens and shuts the console immediately without showing any output:
Code:
format PE console
entry _start

section '.data' data readable writeable
unicode_string db 0xE2, 0x9C, 0x93, 0x20, 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21, 0x00 ; UTF-8 encoded Unicode string
wunicode_string dw 0x2654, 0x0020, 0x0048, 0x0065, 0x006C, 0x006C, 0x006F, 0x002C, 0x0020, 0x0057, 0x006F, 0x0072, 0x006C, 0x0064, 0x0021, 0x0000 ; WCHAR (UTF-16) encoded Unicode string

section '.code' code readable executable
include 'win32a.inc'

_start:
push 0 ; dwFlags (must be 0)
push 0 ; dwSrcEncoding (automatic detection)
push unicode_string ; lpSrcStr (UTF-8 encoded string)
push -1 ; cchSrc (determine the length automatically)
push 0 ; lpWideCharStr (output buffer)
push 0 ; cchWideChar (calculate the required buffer size)
ccall [MultiByteToWideChar]
mov esi, eax ; esi holds the length of the WCHAR string

; Allocate console and get its handle
push 0 ; lpSecurityAttributes (not used, set to 0)
push 0x40000000 ; dwDesiredAccess (GENERIC_READ | GENERIC_WRITE)
ccall [GetStdHandle]
mov ebx, eax ; ebx holds the console handle

; Write the WCHAR string to the console using WriteConsoleW function
push 0 ; lpNumberOfCharsWritten (output parameter, not used)
push esi ; nNumberOfCharsToWrite (length of WCHAR string)
push wunicode_string ; lpBuffer (WCHAR string)
push ebx ; hConsoleOutput (console handle)
ccall [WriteConsoleW]

; Prompt the user to press a key
push 0 ; lpBuffer (input buffer)
ccall [GetStdHandle]
mov ebx, eax ; ebx holds the standard input handle
push ebx ; hConsoleInput (standard input handle)
call [FlushConsoleInputBuffer]
push 0 ; lpNumberOfEventsRead (output parameter, not used)
push 1 ; nNumberOfEventsToRead (number of events to read)
push 0 ; lpBuffer (input buffer)
push ebx ; hConsoleInput (standard input handle)
ccall [ReadConsoleInputW]

; Exit the program
xor eax, eax
xor ebx, ebx
ccall [ExitProcess]

section '.idata' import data readable writeable
library kernel32, 'kernel32.dll', \
        user32, 'user32.dll'
import kernel32, \
       GetStdHandle, 'GetStdHandle', \
       WriteConsoleW, 'WriteConsoleW', \
       ExitProcess, 'ExitProcess', \
       MultiByteToWideChar, 'MultiByteToWideChar', \
       FlushConsoleInputBuffer, 'FlushConsoleInputBuffer',\
       ReadConsoleInputW, 'ReadConsoleInputW'
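A few things in this listing are likely causes of the failure: kernel32 functions use the stdcall convention, so `ccall` unbalances the stack; GetStdHandle takes a single argument (-11 for standard output); and WriteConsoleW takes five arguments. Below is a minimal sketch of the same idea with those points changed. The labels (wide_text, wide_len, written) are made up for the example, and using `_getch` from msvcrt.dll to keep the window open is an assumption of this sketch, not part of the original code.

```asm
; Sketch: print a pre-encoded WCHAR string, then wait for a key press.
format PE console
entry start

include 'win32w.inc'

section '.data' data readable writeable

  wide_text du 0x2713,' Hello, World!',13,10   ; U+2713 is what the bytes E2 9C 93 encode
  wide_len = ($ - wide_text) / 2    ; WriteConsoleW expects a count of WCHARs, not bytes
  written dd ?

section '.code' code readable executable

start:
        invoke  GetStdHandle,-11    ; -11 = STD_OUTPUT_HANDLE, the only argument
        invoke  WriteConsoleW,eax,wide_text,wide_len,written,0
        cinvoke _getch              ; C runtime call (cdecl): wait for one key press
        invoke  ExitProcess,0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          msvcrt,'msvcrt.dll'
  import kernel32,\
         GetStdHandle,'GetStdHandle',\
         WriteConsoleW,'WriteConsoleW',\
         ExitProcess,'ExitProcess'
  import msvcrt,\
         _getch,'_getch'
```

Here `invoke` is used for the stdcall API functions and `cinvoke` only for the C runtime function, which is the split the fasm macros are meant for.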
Flier-Mate 03 Jun 2023, 04:15
You can use "du" instead of "db" to specify the Unicode string, as long as you include "encoding/utf8.inc".
My 32-bit example only works with MessageBoxW (GUI). With WriteConsoleW (console) it shows question marks, and I don't know the perfect solution to that.
Code:
;format PE console
format PE GUI
entry start

include "win32a.inc"
include "encoding\utf8.inc"

section ".data" data readable writeable
hindi du "हिन्दी",13,10,0
;_len = $ - hindi

section ".code" code executable readable
start:
;push -11
;call [GetStdHandle]
;push 0
;push 0
;push _len
;push hindi
;push eax
;call [WriteConsoleW]
push 0x40
push hindi
push hindi
push 0 ;Desktop
call [MessageBoxW]
push 0
call [ExitProcess]

section ".idata" import readable writeable
library kernel, "kernel32.dll", \
        user, "user32.dll"
import kernel, \
       GetStdHandle, "GetStdHandle", \
       WriteConsoleW, "WriteConsoleW", \
       ExitProcess, "ExitProcess"
import user, \
       MessageBoxW, "MessageBoxW"
moistMaven 03 Jun 2023, 07:35
I tried to do it using wprintf. Interestingly, all the English characters are shown correctly, but non-English Unicode (both UTF-8 and UTF-16) shows gibberish in the console:
Code:
format PE64 console
entry start

include './include/win64w.inc'
include './include/macro/proc64.inc'
include './include/encoding/utf8.inc'

;======================================
section '.data' data readable writeable
;======================================
wunicode_string du 0x0939, 0x093f, 0x0928, 0x094d, 0x0926, 0x0940

;=======================================
section '.code' code readable executable
;=======================================
start:
mov rax, 0
ccall [wprintf], "%ls", wunicode_string
ccall [getchar] ; I added this line to exit the application AFTER the user pressed any key.
stdcall [ExitProcess],0 ; Exit the application

;====================================
section '.idata' import data readable
;====================================
library kernel,'kernel32.dll', msvcrt,'msvcrt.dll'
import kernel, ExitProcess,'ExitProcess'
import msvcrt, printf,'printf', wprintf, 'wprintf', getchar,'_fgetchar'
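One possible reason for the gibberish is that the C runtime converts wide output according to its own stream mode before it reaches the console. A hedged 32-bit sketch of one way around that is to switch stdout to UTF-16 text mode with `_setmode` before calling wprintf. This assumes the system msvcrt.dll honours the `_O_U16TEXT` flag (value 0x20000, taken from the C runtime's fcntl.h); the labels fmt and hindi are made up here. Note also that the array in the listing above has no terminating zero, which `%ls` relies on.

```asm
; Sketch: switch the C runtime's stdout to UTF-16 text mode before wprintf.
format PE console
entry start

include 'win32w.inc'

_O_U16TEXT = 0x20000                ; assumed value, from the C runtime's fcntl.h

section '.data' data readable writeable

  fmt   du '%ls',10,0
  hindi du 0x0939,0x093F,0x0928,0x094D,0x0926,0x0940,0   ; zero-terminated for %ls

section '.code' code readable executable

start:
        cinvoke _setmode,1,_O_U16TEXT   ; 1 = file descriptor of stdout
        cinvoke wprintf,fmt,hindi
        cinvoke _getch                  ; wait for one key press
        invoke  ExitProcess,0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          msvcrt,'msvcrt.dll'
  import kernel32,\
         ExitProcess,'ExitProcess'
  import msvcrt,\
         _setmode,'_setmode',\
         wprintf,'wprintf',\
         _getch,'_getch'
```

Even when the code units reach the console intact, the glyphs only appear if the console font contains them.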
revolution 03 Jun 2023, 07:40
To print WCHAR you use the *W APIs.
So, for example, wsprintfW (not wsprintfA). It is in user32.dll.
Code:
;...
user_table:
MessageBox dd RVA _MessageBoxA
wsprintf dd RVA _wsprintfW ; <--- use the W version
;...
dd 0
;...
moistMaven 03 Jun 2023, 09:30
I feel so dumb right now: the problem was with the code page used by the console. Now it works even with printf:
Code:
format PE64 console
entry start

include './include/win64a.inc'
include './include/macro/proc64.inc'

;======================================
section '.data' data readable writeable
;======================================
hello_newline db "Hello World!",10,0

;=======================================
section '.code' code readable executable
;=======================================
start:
ccall [SetConsoleOutputCP], 65001
ccall [printf], "%s", "हिन्दी"
ccall [getchar] ; I added this line to exit the application AFTER the user pressed any key.
stdcall [ExitProcess],0 ; Exit the application

;====================================
section '.idata' import data readable
;====================================
library kernel,'kernel32.dll', msvcrt,'msvcrt.dll'
import kernel, ExitProcess,'ExitProcess', SetConsoleOutputCP, 'SetConsoleOutputCP'
import msvcrt, printf,'printf', getchar,'_fgetchar'
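One side effect worth knowing: the code page set by SetConsoleOutputCP stays in effect for that console window after the program exits. A minimal 32-bit sketch of saving and restoring it around the output could look like this. The labels fmt and text are made up for the example, and the Hindi string is spelled out as explicit UTF-8 bytes so the result does not depend on how the source file is saved.

```asm
; Sketch: switch the console to UTF-8 only for the duration of the output.
format PE console
entry start

include 'win32a.inc'

CP_UTF8 = 65001

section '.data' data readable writeable

  fmt  db '%s',10,0
  ; U+0939 U+093F U+0928 U+094D U+0926 U+0940 encoded as UTF-8
  text db 0xE0,0xA4,0xB9, 0xE0,0xA4,0xBF, 0xE0,0xA4,0xA8
       db 0xE0,0xA5,0x8D, 0xE0,0xA4,0xA6, 0xE0,0xA5,0x80, 0

section '.code' code readable executable

start:
        invoke  GetConsoleOutputCP
        mov     ebx,eax                     ; remember the current code page
        invoke  SetConsoleOutputCP,CP_UTF8
        cinvoke printf,fmt,text
        invoke  SetConsoleOutputCP,ebx      ; put the previous code page back
        invoke  ExitProcess,0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          msvcrt,'msvcrt.dll'
  import kernel32,\
         GetConsoleOutputCP,'GetConsoleOutputCP',\
         SetConsoleOutputCP,'SetConsoleOutputCP',\
         ExitProcess,'ExitProcess'
  import msvcrt,\
         printf,'printf'
```

EBX is used for the saved value because the called functions preserve it under both stdcall and cdecl.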
Flier-Mate 08 Jun 2023, 10:22
I used the same code that I posted above, but had no luck seeing Hindi characters in my Windows 10 console.
I typed "chcp 65001" and then ran hindi.exe, a program that uses the WriteConsoleW API function. Does anybody know why?
Picnic 08 Jun 2023, 11:56
It's a bit more confusing on the console.
You have to find and install a monospace font that supports the language you want, select it as the default console font, and also modify the registry. My toy interpreter supports wide characters; below I am using Deja Vu Sans Mono to display Greek and Arabic.
Code:
screen 80,25,300
color 0,7
cls
print "Good morning my friend"
print "Καλημέρα φίλε μου"
print REVERSE("صباح الخير يا صديقي")
end
Flier-Mate 08 Jun 2023, 13:33
Thanks Picnic, it is good that Deja Vu Sans Mono supports Arabic (not Hindi). I tried Lucida Console as suggested on the Internet, but no luck either.
I am facing the same issue:
Quote: The Windows Console only allows you to select fixed pitch fonts. All the Hindi fonts that I have found are variable pitch.
Picnic 08 Jun 2023, 13:49
You're welcome. I did a quick search for a Hindi console font before I posted the message, but I had no luck either.
bitRAKE 08 Jun 2023, 23:24
Flier-Mate wrote: I typed "chcp 65001" then run hindi.exe , a program that uses WriteConsoleW API function.
My Win11 system is configured for UTF-8 system-wide, so it's as simple as sending a byte string to the console with WriteConsoleA. I installed no special fonts. I can use UTF-8 in GUI code with no special handling - it finally works how it should have 10+ years ago, imho.
(If I am understanding that correctly, I've messed up the Arabic? Or right-to-left languages require configuring the console differently.)
Edit: the GUI looks a lot better. I think the console is lacking a lot of language features - it looks like it isn't combining the characters correctly. Playing with the Hebrew, the console switched to right-to-left automatically. I'm missing something to get it to work correctly with those languages.
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Flier-Mate 09 Jun 2023, 03:46
Interesting demonstration, bitRAKE.
I tested with WriteConsoleA and a "db" string; the output is the same as with WriteConsoleW and a "du" string: still question marks on the console. It is nice you got it working on your Windows 11. Maybe what Furs said refers to Windows 11 only (not Windows 10)?
Furs wrote:
It is a lot easier to output a Unicode string in a GUI than in the console.
revolution 09 Jun 2023, 03:48
Win7 supported CP65001 perfectly fine. I'm not sure about ANSI though, perhaps that is new.
Furs 09 Jun 2023, 13:39
Flier-Mate wrote: Interesting demonstration, bitRAKE.
ANSI codepages / Wide versions are a way to encode the glyphs as text. It's just data. But just because you encode them doesn't mean they will render. Try to copy them, for example, and paste them into a browser: you'll likely see the proper characters (unless it copies question marks as a "hack", which would be dumb imo). Most browser fonts tend to support more glyphs because of web pages.
Rendering such glyphs/characters requires the font to support them. It's likely your console font does not, especially if your Windows version is not Hindi in the first place. A font supporting all glyphs would be huge, maybe gigabytes.
UTF-8 is not a rendering thing, it's a way to encode those glyphs. Actually rendering them is up to the font.
Flier-Mate 10 Jun 2023, 05:42
Thanks Furs for the helpful explanation.
Quote: UTF-8 is not a rendering thing, it's a way to encode those glyphs. Actually rendering them is up to the font.
Maybe Windows 11 has the answer to it, both encoding and font rendering. I asked someone to test my hindi.exe on their Windows 11, but again their web browser blocked the download because of a suspected trojan. (Sigh)
bitRAKE 10 Jun 2023, 07:45
Flier-Mate wrote: I asked someone to test my hindi.exe on their Windows 11, but again their web browser blocked the download because of suspected trojan. (Sigh)
The following test program works well on Win11, and if GPT-4 knows anything, it should work well on Win10 too. (To the extent that the font your console is configured with supports Hindi.)
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.