flat assembler
Message board for the users of flat assembler.

Unicode in the console
wyvern 05 Jan 2012, 18:22
I have found some info about this topic in blogs, asm forums, etc. However, I can't make it work. Here is my code:

Code:
=5    ;for UTF-8
FORMAT PE console 
ENTRY  start

INCLUDE 'C:\fasm\include\win32a.inc'

SECTION '.text' code readable executable
start:
        PUSH    msg
        CALL    lstrlenW            ;get string len, this is crashing (new problem)...

        PUSH    eax                 ;string len
        PUSH    msg                 ;message
        CALL    printW              ;print in console

        PUSH    NULL
        CALL    ExitProcess         ;exit

;prints a wide string
printW:
        PUSH    ebp
        MOV     ebp, esp
        SUB     esp, 4

        PUSH    869                 ;code page for greek... tried with UTF-8(65001)
        CALL    SetConsoleOutputCP

        PUSH    STD_OUTPUT_HANDLE
        CALL    GetStdHandle

        LEA     edx, [ebp - 4]
        PUSH    NULL
        PUSH    edx                 ;chars written
        PUSH    dword [ebp + 12]    ;message len
        PUSH    dword [ebp + 8]     ;message
        PUSH    eax                 ;stdout handle
        CALL    WriteConsoleW

        MOV     esp, ebp
        POP     ebp
        RET

SECTION '.data' data readable writeable
        msg du '-B> A>>1I5=85', 00h              ;testing with greek string

SECTION '.idata' import data readable writeable
        library kernel, 'KERNEL32.DLL'   

        import kernel,\
                lstrlenW, 'lstrlenW',\
                SetConsoleOutputCP, 'SetConsoleOutputCP',\
                GetStdHandle,  'GetStdHandle',\
                WriteConsoleW, 'WriteConsoleW',\
                ExitProcess,   'ExitProcess'
    

Tested on Windows 7 (x64) and XP.

I don't know exactly why lstrlenW is crashing, but besides that I'm using the TrueType font "Lucida Console", which is supposed to support Unicode. I tried other languages, such as Japanese, and other code pages, but all I get is empty blocks; even text copied from the console output doesn't convert properly when pasted into a Unicode editor.

Has anybody successfully written Unicode strings to the cmd console?

Edit: the Greek string did not survive being pasted here.

_________________
Thanks



AsmGuru62 05 Jan 2012, 21:27
Why not use the invoke macro?
A plain CALL jumps directly to the given address, but a Win API function has to be called
indirectly, through its address stored in the Import Table(s).

Try CALL [lstrlenW].
Or even better:

invoke lstrlenW, msg



Picnic 05 Jan 2012, 21:59
@wyvern use brackets

call [lstrlenW]
call [ExitProcess]
call [SetConsoleOutputCP]
call [GetStdHandle]
call [WriteConsoleW]


Here is a quick demo that displays Greek text in the console.
Code:
format pe console

    include 'include\encoding\win1253.inc'
    include 'include\win32wx.inc'

.data
        temp dd ?
        text du '                       ',0       ; write some Greek text

.code
main:
        invoke WriteConsole,\
            <invoke GetStdHandle,STD_OUTPUT_HANDLE>,\
            text,\
            <invoke lstrlen, text>,\
            temp,\
            0
        ret

.end main
    


Image


Here is a set of Unicode routines for the console.


Last edited by Picnic on 30 Mar 2015, 20:19; edited 2 times in total



typedef 05 Jan 2012, 23:48
First of all, calls to your imported functions should be written as
Code:
call [ function ]
    


not
Code:
call function
    


Try this:
Code:
;=5    ;for UTF-8
FORMAT PE console
ENTRY  start

INCLUDE 'C:\fasm\include\win32a.inc'

SECTION '.text' code readable executable
start:
        PUSH    msg
        CALL    [lstrlenW]          ;get string len (indirect call through the import table)

        PUSH    eax                 ;string len
        PUSH    msg                 ;message
        CALL    printW              ;print in console

        PUSH    NULL
        CALL    [ExitProcess]       ;exit

;prints a wide string
printW:
        PUSH    ebp
        MOV     ebp, esp
        SUB     esp, 4              ;local variable: chars written

        PUSH    869                 ;code page for greek... tried with UTF-8(65001)
        CALL    [SetConsoleOutputCP]

        PUSH    STD_OUTPUT_HANDLE
        CALL    [GetStdHandle]

        LEA     edx, [ebp - 4]
        PUSH    NULL
        PUSH    edx                 ;chars written
        PUSH    dword [ebp + 12]    ;message len
        PUSH    dword [ebp + 8]     ;message
        PUSH    eax                 ;stdout handle
        CALL    [WriteConsoleW]

        MOV     esp, ebp            ;release the local variable
        POP     ebp
        RET     8                   ;also remove the two arguments pushed by the caller

SECTION '.data' data readable writeable
        msg du '-B> A>>1I5=85', 00h ;testing with greek string
        written dd 0                ;unused

SECTION '.idata' import data readable writeable
        library kernel, 'KERNEL32.DLL'

        import kernel,\
                lstrlenW, 'lstrlenW',\
                SetConsoleOutputCP, 'SetConsoleOutputCP',\
                GetStdHandle,  'GetStdHandle',\
                WriteConsoleW, 'WriteConsoleW',\
                ExitProcess,   'ExitProcess'
    



wyvern 06 Jan 2012, 02:04
Oh yeah, thanks for the correction, guys. I forgot the "[]"! The code was originally written to work with a linker, so I quickly converted it to "pure fasm" before posting here; just a silly mistake.

@Picnic This is your code's output:
Image

@typedef This is your code's output:
Image

The result is the same on Windows 7 and XP, with the "Lucida Console" font. What font are you using?

By the way, I always need the "=5" on the first line of the source when it is saved as UTF-8; otherwise fasm says "illegal instruction".

_________________
Thanks



typedef 06 Jan 2012, 03:12
Well, here's my Vista, running the same code I gave.

Image



Picnic 06 Jan 2012, 11:54
wyvern wrote:
The result is the same on Windows 7 and XP, with the "Lucida Console" font. What font are you using?

I get correct chars with both the Lucida Console and raster fonts.



MHajduk 06 Jan 2012, 12:29
The key is to use the 'UTF8.inc' include file:
Quote:
format pe console

    include 'encoding\UTF8.inc'         ; <-- use UTF-8
    include 'win32wx.inc'

.data
        temp dd ?
        text du 'ελληνικά English русский',0

.code
main:
        invoke WriteConsole,\
            <invoke GetStdHandle, STD_OUTPUT_HANDLE>,\
            text,\
            <invoke lstrlen, text>,\
            temp,\
            0
        ret

.end main

And, of course, save the source file in UTF-8 text format too. Here is a screenshot (font Lucida Console, Windows 7):

Image



wyvern 06 Jan 2012, 15:49
Well, something funny: nobody here seems to use the "=5" prefix at the start of the file... Am I the only one who can't assemble without it?

@typedef The "-B> A>>1I5=85" string is wrong; when I pasted my code here, the original Greek string was converted to that.

@MHajduk Thanks. Using UTF8.inc is equivalent to writing the Unicode code points manually:
Code:
text DW 0x03B5, 0x03BB, 0x03BB ;, ... etc for every letter
    

Although... I don't know exactly why I can't write the chars directly and why I need that conversion. After all, the source file is saved in UTF-8... so?

Another funny thing: Lucida Console doesn't support the glyphs for Chinese, Arabic, and others. I guess I need to add a font with complete Unicode coverage to the console options; if somebody knows one, please let me know.



MHajduk 06 Jan 2012, 16:03
wyvern wrote:
Well, something funny: nobody here seems to use the "=5" prefix at the start of the file... Am I the only one who can't assemble without it?
The '=5' sequence is equivalent, as far as I know, to the BOM put by some editors before the text of the file. I have been using the PSPad editor for working with UTF-8, although plain Notepad can also save files in UTF-8 encoding.
wyvern wrote:
Another funny thing: Lucida Console doesn't support the glyphs for Chinese, Arabic, and others. I guess I need to add a font with complete Unicode coverage to the console options; if somebody knows one, please let me know.
Yes, that's true; it seems that Lucida Console doesn't cover Arabic and Chinese characters. If possible, I'd use Tahoma instead (it works perfectly in the Windows GUI).



Tomasz Grysztar 06 Jan 2012, 16:17
MHajduk wrote:
wyvern wrote:
Well, something funny: nobody here seems to use the "=5" prefix at the start of the file... Am I the only one who can't assemble without it?
The '=5' sequence is equivalent, as far as I know, to the BOM put by some editors before the text of the file. I have been using the PSPad editor for working with UTF-8, although plain Notepad can also save files in UTF-8 encoding.
The "=5" is probably there to "neutralize" the BOM, which fasm sees as an unrecognized label, so by appending "=5" to it you simply create a valid definition of a constant.
As for the editors, some (like Notepad++) let you choose "UTF-8 without BOM" or "UTF-8 with BOM" as the output format, some (like PSPad) have a configuration option to include the BOM (off by default), and others, like Notepad, always write the BOM with no option to disable it.
So if you use Notepad, you need something like "=5". But if some other editor then saves the file without the BOM, the source will no longer compile. So I think putting something like ".utf:" in the first line would be better; it will work both with and without the BOM.
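
To illustrate the ".utf:" idea, here is a minimal, untested sketch. It assumes the file is saved as UTF-8 (with or without a BOM, whose bytes are EF BB BF), that the INCLUDE environment variable points at the fasm include directory, and it reuses the 'encoding\utf8.inc' approach shown earlier in the thread. The names text, text_len and written are only for illustration.
Code:
.utf:                                   ; harmless label; an optional BOM in front of it just becomes part of the label name
format PE console
entry start

include 'encoding\utf8.inc'             ; makes du convert UTF-8 source text to UTF-16
include 'win32a.inc'

section '.text' code readable executable
start:
        invoke  GetStdHandle, STD_OUTPUT_HANDLE
        invoke  WriteConsoleW, eax, text, text_len, written, 0
        invoke  ExitProcess, 0

section '.data' data readable writeable
        text     du 'ελληνικά', 13, 10
        text_len = ($ - text) / 2       ; length in UTF-16 code units
        written  dd ?

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL'
        import  kernel32,\
                GetStdHandle,  'GetStdHandle',\
                WriteConsoleW, 'WriteConsoleW',\
                ExitProcess,   'ExitProcess'

The same source should then assemble whether or not the editor writes the BOM, which is the point of the suggestion.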



Tomasz Grysztar 06 Jan 2012, 16:26
wyvern wrote:
Although... I don't know exactly why I can't write the chars directly and why I need that conversion. After all, the source file is saved in UTF-8... so?
The strings for the *W API functions are not UTF-8; they are UTF-16 (or UCS-2 in older versions of Windows). Hence you need the conversion.

If you wanted to use UTF-8 strings in your program, you would need to use the MultiByteToWideChar API with CP_UTF8 as the first parameter.
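
A minimal, untested sketch of that run-time conversion follows. It assumes the INCLUDE environment variable points at the fasm include directory; the names utf8_text, wide_buf, written and WIDE_MAX are made up for the example, and 65001 is the numeric value of CP_UTF8. The UTF-8 string is given as raw bytes so that the source file encoding does not matter.
Code:
format PE console
entry start

include 'win32a.inc'

WIDE_MAX = 256                          ; hypothetical buffer size, in UTF-16 code units

section '.text' code readable executable
start:
        ; Convert the zero-terminated UTF-8 string to UTF-16.
        ; 65001 = CP_UTF8; -1 means "the input is zero-terminated".
        invoke  MultiByteToWideChar, 65001, 0, utf8_text, -1, wide_buf, WIDE_MAX
        test    eax, eax
        jz      exit                    ; 0 = conversion failed
        dec     eax                     ; the returned count includes the terminating zero
        mov     ebx, eax                ; ebx is preserved across API calls

        invoke  GetStdHandle, STD_OUTPUT_HANDLE
        invoke  WriteConsoleW, eax, wide_buf, ebx, written, 0
exit:
        invoke  ExitProcess, 0

section '.data' data readable writeable
        utf8_text db 0xCE,0xB5, 0xCE,0xBB, 0xCE,0xBB, 0   ; UTF-8 bytes of U+03B5 U+03BB U+03BB
        wide_buf  rw WIDE_MAX
        written   dd ?

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL'
        import  kernel32,\
                MultiByteToWideChar, 'MultiByteToWideChar',\
                GetStdHandle,  'GetStdHandle',\
                WriteConsoleW, 'WriteConsoleW',\
                ExitProcess,   'ExitProcess'

Each two-byte UTF-8 sequence here becomes a single 16-bit unit in wide_buf (CE B5 becomes 0x03B5), which is the form WriteConsoleW expects.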


Last edited by Tomasz Grysztar on 06 Jan 2012, 16:27; edited 1 time in total



wyvern 06 Jan 2012, 16:27
@MHajduk OK, I will check whether "Tahoma" is a valid font for the console; there are some restrictions according to the Microsoft KB...

@Tomasz Yeah, I was using plain Windows Notepad. And thanks for the ".utf:" tip, it is better. By the way, FASM is really cool! Great work.


Last edited by wyvern on 06 Jan 2012, 16:28; edited 1 time in total



MHajduk 06 Jan 2012, 16:27
Tomasz Grysztar wrote:
The "=5" is probably there to "neutralize" the BOM, which fasm sees as an unrecognized label, so by appending "=5" to it you simply create a valid definition of a constant.
This sounds like the best explanation of the observed behavior. ;)

Well, I would suggest choosing a text editor which, like PSPad, lets you configure your programming environment once, so that you don't need to remember all the time what's going on in the FASM compiler's "guts". ;)



wyvern 06 Jan 2012, 16:31
@MHajduk I don't know PSPad. Normally I use RADasm for large things. Do you think PSPad is a better or more configurable environment?



MHajduk 06 Jan 2012, 16:39
wyvern wrote:
@MHajduk I don't know PSPad. Normally I use RADasm for large things. Do you think PSPad is a better or more configurable environment?
Well, PSPad is quite flexible, and I chose it mainly because I need to work with source files encoded in many languages at the same time (the original FASM IDE wasn't good for that). Also, PSPad has predefined syntax highlighting for many languages (to me, PHP and HTML are the most important), so you can code in various languages in the same place. PSPad doesn't have advanced tools such as a visual programming environment, though (I mean, you can't just drag and drop the elements of the GUI you are designing).

I think you may consider using Fresh if you want to work with really complicated projects. :)



f0dder 06 Jan 2012, 19:03
First: if you change the console code page, it's probably a good idea to be a nice citizen and restore the previous code page when exiting your app :) Of course, that means a utility (rather than an interactive) app will seem to produce gibberish. That is something that ought to be documented in a README.TXT (i.e. "make sure to set a Unicode-capable font, and do a `chcp 65001` to set the UTF-8 console encoding").
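
A minimal, untested sketch of the "nice citizen" pattern: read the current code page with GetConsoleOutputCP, switch, write, and switch back before exiting. It assumes the INCLUDE environment variable points at the fasm include directory; old_cp, utf8_text, utf8_len and written are names made up for the example. The text is written as UTF-8 bytes through WriteFile, since the code page matters for byte-oriented output.
Code:
format PE console
entry start

include 'win32a.inc'

section '.text' code readable executable
start:
        invoke  GetConsoleOutputCP              ; remember the code page we were started with
        mov     [old_cp], eax
        invoke  SetConsoleOutputCP, 65001       ; 65001 = UTF-8

        invoke  GetStdHandle, STD_OUTPUT_HANDLE
        invoke  WriteFile, eax, utf8_text, utf8_len, written, 0

        invoke  SetConsoleOutputCP, [old_cp]    ; put the original code page back
        invoke  ExitProcess, 0

section '.data' data readable writeable
        utf8_text db 0xCE,0xB5, 0xCE,0xBB, 0xCE,0xBB, 13, 10  ; UTF-8 bytes of U+03B5 U+03BB U+03BB, CR LF
        utf8_len  = $ - utf8_text               ; length in bytes
        old_cp    dd ?
        written   dd ?

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL'
        import  kernel32,\
                GetConsoleOutputCP, 'GetConsoleOutputCP',\
                SetConsoleOutputCP, 'SetConsoleOutputCP',\
                GetStdHandle, 'GetStdHandle',\
                WriteFile,    'WriteFile',\
                ExitProcess,  'ExitProcess'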

Also, WriteConsoleOutputW() is great for interactive use, but if you want to support stdout redirection (which is often crucial for non-interactive apps), it's a no-go. In that case, you want to use standard WriteFile output with the CP_UTF8 code page... and of course UTF-8 (rather than UCS-2/UTF-16) data, which means a WideCharToMultiByte() call.
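
A minimal, untested sketch of the redirection-friendly route: convert the UTF-16 text to UTF-8 with WideCharToMultiByte, then send the bytes to stdout with WriteFile. It assumes the INCLUDE environment variable points at the fasm include directory; wide_text, WIDE_LEN, utf8_buf, UTF8_MAX and written are names made up for the example, and 65001 is the numeric value of CP_UTF8.
Code:
format PE console
entry start

include 'win32a.inc'

UTF8_MAX = 64                           ; hypothetical output buffer size, in bytes

section '.text' code readable executable
start:
        ; With CP_UTF8 the last two parameters (default char, used-default flag) must be 0.
        invoke  WideCharToMultiByte, 65001, 0, wide_text, WIDE_LEN, utf8_buf, UTF8_MAX, 0, 0
        test    eax, eax
        jz      exit                    ; 0 = conversion failed
        mov     ebx, eax                ; number of UTF-8 bytes produced

        invoke  GetStdHandle, STD_OUTPUT_HANDLE
        invoke  WriteFile, eax, utf8_buf, ebx, written, 0   ; works for console, file or pipe
exit:
        invoke  ExitProcess, 0

section '.data' data readable writeable
        wide_text dw 0x03B5, 0x03BB, 0x03BB, 13, 10         ; UTF-16 code units, then CR LF
        WIDE_LEN  = ($ - wide_text) / 2                     ; length in UTF-16 code units
        utf8_buf  rb UTF8_MAX
        written   dd ?

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL'
        import  kernel32,\
                WideCharToMultiByte, 'WideCharToMultiByte',\
                GetStdHandle, 'GetStdHandle',\
                WriteFile,    'WriteFile',\
                ExitProcess,  'ExitProcess'

When the output is redirected (for example with "> out.txt"), the file should receive the UTF-8 bytes unchanged; on the console itself the text only displays correctly if the output code page is 65001 and the font has the glyphs.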

_________________
Image - carpe noctem



typedef 06 Jan 2012, 21:42
Weird, when I use '=5' I get a compiler error.
Copyright © 1999-2024, Tomasz Grysztar.