flat assembler
Message board for the users of flat assembler.

Unicode string value

yoshimitsu
Joined: 07 Jul 2011
Posts: 96
Hi,
I want to know whether there's a way to have a string literal expand to a Unicode string value.
For example, cmp eax,'ABCD' is equal to cmp eax,44434241h.
I wonder if there's something like cmp eax,'AC'u (i.e. cmp eax,'A\0C\0') that would be equal to cmp eax,00430041h.
Thx :)
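To make the arithmetic concrete: a quoted string used as a numeric operand is just its bytes read as a little-endian integer. The Python snippet below uses Python only as a calculator (`packed` is a hypothetical helper name, and plain ASCII characters are assumed). It computes both the ASCII value from the question and the UTF-16LE value being asked for:

```python
def packed(text: str, encoding: str) -> int:
    """Bytes of `text` in the given encoding, read as a little-endian integer
    (how an x86 immediate built from a quoted string is laid out)."""
    return int.from_bytes(text.encode(encoding), "little")

# cmp eax,'ABCD': bytes 41 42 43 44 -> 44434241h
print(hex(packed("ABCD", "ascii")))      # 0x44434241
# the wished-for 'AC'u: UTF-16LE bytes 41 00 43 00 -> 00430041h
print(hex(packed("AC", "utf-16-le")))    # 0x430041
```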
Post 28 Feb 2013, 01:04
l_inc
Joined: 23 Oct 2009
Posts: 881
yoshimitsu
There's no native way to do that, but you may wanna try something like this:
Code:
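macro u [arg]
{
        ; walk the whole argument list symbol by symbol
        common inst equ
                irps argn,arg
                \{
                        ; append the current symbol to the line being built
                        \forward inst equ inst argn
                                ; succeeds only if argn is a quoted string
                                match \`argn\#'',argn
                                \\{
                                        \\local ustr
                                        ; drop the quoted string just appended
                                        restore inst
                                        ; compute its UTF-16 value (8 bytes max)
                                        virtual
                                                du argn
                                                assert $-$$ <= 8
                                                dq 0
                                                load ustr qword from $$
                                        end virtual
                                        ; append the numeric value instead
                                        inst equ inst ustr
                                \\}
                        ; emit the rewritten line
                        \common match inst,inst \\{ inst \\}
                        ; undo the equ definitions made above
                        \forward restore inst
                \}
                restore inst
}    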
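To see what the macro does without tracing the preprocessor, here is a rough Python model. It is a simplification, not fasm's actual algorithm. The instruction arrives as an already split list of symbols (the job irps does). A symbol counts as a quoted string if it starts and ends with the same quote character. du is assumed to zero-extend each ASCII character (no encoding header included). The names `ustr_value` and `u` are hypothetical.

```python
def ustr_value(literal: str) -> int:
    """Model of the virtual block for one quoted-string symbol."""
    data = literal[1:-1].encode("utf-16-le")    # du argn (ASCII zero-extended)
    if len(data) > 8:                            # assert $-$$ <= 8
        raise ValueError("string too long")
    data += bytes(8)                             # dq 0: pad so a qword can be read
    return int.from_bytes(data[:8], "little")    # load ustr qword from $$

def u(symbols: list[str]) -> str:
    """Model of the symbol walk: quoted strings are replaced by their values."""
    out = []
    for sym in symbols:
        if len(sym) >= 2 and sym[0] in "'\"" and sym[-1] == sym[0]:
            out.append(hex(ustr_value(sym)))
        else:
            out.append(sym)
    return " ".join(out)

print(u(["cmp", "eax", ",", "'AC'"]))   # cmp eax , 0x430041
```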

Your example would then look like this:
Code:
u cmp eax,'AC'    

And an hour of wonderful music can become reality just like that:
Code:
u invoke Beep,'z','07'    
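Why an hour: the Windows Beep function takes a frequency in hertz and a duration in milliseconds, and the macro turns both strings into UTF-16 values. A quick check of the arithmetic in Python, assuming plain ASCII-to-UTF-16 zero extension:

```python
freq_hz = int.from_bytes("z".encode("utf-16-le"), "little")       # 007Ah
duration_ms = int.from_bytes("07".encode("utf-16-le"), "little")  # 00370030h

print(freq_hz)                            # 122
print(duration_ms)                        # 3604528
print(round(duration_ms / 60_000, 2))     # 60.08 (minutes)
```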
Post 28 Feb 2013, 03:57
yoshimitsu
Joined: 07 Jul 2011
Posts: 96
Thank you for the macro, l_inc.
Small question, isn't the '' in \`argn\#'' just an empty literal and doesn't get matched at all, so you can omit it in the first place?

Your macro seems like a good solution, although a nice native way would be more welcome.
Post 28 Feb 2013, 12:59
l_inc
Joined: 23 Oct 2009
Posts: 881
yoshimitsu
Quote:
isn't the '' in \`argn\#'' just an empty literal and doesn't get matched at all, so you can omit it in the first place

'' is an empty string, not an empty literal. In general, this concatenation with an empty string is done to avoid problems when argn is an empty argument: without the concatenation you would get the construction match `, which is expanded into match "," and that in turn is not a valid construction.

But you are right. In this specific case it is not necessary to concatenate with an empty string because irps does not produce empty arguments.

Quote:
Your macro seems like a good solution

I personally would prefer an even simpler solution, though it requires more coding overhead:
Code:
macro ustrdef [arg*]
{
   forward
      match name==val,arg
      \{
         \local ustr
         name equ ustr
         virtual
            du val 
            assert $-$$ <= 8 
            dq 0 
            load ustr qword from $$ 
         end virtual
      \}
}    

And this is the related coding overhead:
Code:
ustrdef x='ab',y='cd'
   cmp eax,x
   mov eax,y
restore x,y    
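The "coding overhead" is mostly the restore line: equ pushes a new definition of a symbolic constant and restore pops it, so without it x and y would stay defined for the rest of the source. A Python model of that behavior follows. It is a simplification (it splits the argument list on commas, so strings containing commas are not handled), and the names mirror the macro rather than any fasm API.

```python
symbols: dict[str, list[int]] = {}   # name -> stack of definitions

def ustr_value(body: str) -> int:
    data = body.encode("utf-16-le")
    if len(data) > 8:                                # assert $-$$ <= 8
        raise ValueError("string too long")
    return int.from_bytes(data.ljust(8, b"\0"), "little")

def ustrdef(spec: str) -> None:
    """Model of `ustrdef x='ab',y='cd'`: each name=value pair is pushed (equ)."""
    for pair in spec.split(","):
        name, val = pair.split("=", 1)
        symbols.setdefault(name, []).append(ustr_value(val.strip("'\"")))

def restore(*names: str) -> None:
    """Model of `restore`: pop the latest definition of each name."""
    for name in names:
        symbols[name].pop()

ustrdef("x='ab',y='cd'")
print(hex(symbols["x"][-1]), hex(symbols["y"][-1]))   # 0x620061 0x640063
restore("x", "y")
print(symbols)                                        # {'x': [], 'y': []}
```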


Quote:
although a nice native way would be more welcome

That's up to the author to decide. I could easily live without this feature, but it would make sense to consider including it in fasm 2.
Post 28 Feb 2013, 15:38
yoshimitsu
Joined: 07 Jul 2011
Posts: 96
Thanks again.
However, I don't quite understand why match 'AC''','AC' matches correctly.
Also does the documentation say anything about matching strings or rather extracting the text out of a string? Because I wouldn't have thought that match `x,x would work.
Post 28 Feb 2013, 16:36
l_inc
Joined: 23 Oct 2009
Posts: 881
yoshimitsu
Quote:
I don't quite understand why match 'AC''','AC' matches correctly.

It doesn't. In my example there's a concatenation operator in between, so it's match `'AC' # '','AC' -> match 'AC','AC', and that does match correctly.
Quote:
Also does the documentation say anything about matching strings or rather extracting the text out of a string?

It does not extract anything out of a string, because a quoted string is a single, standalone, inseparable symbol (though not a character):
1.2.1 Instruction syntax wrote:
If the first character of symbol is either a single or double quote, it integrates any sequence of characters following it, even the special ones, into a quoted string, which should end with the same character, with which it began (the single or double quote)

and:
2.3.6 Conditional preprocessing wrote:
any of symbol characters and any quoted string should be matched exactly as is.

Quote:
Because I wouldn't have thought that match `x,x would work.

This is a common way to check at the preprocessing stage whether x is a quoted string symbol.
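A way to picture the idiom: the ` operator turns x into a quoted string, while a quoted string is left as it is, so both sides of the match are identical only when x was already a quoted string. A Python model of just that logic (the quoting rule here is a simplification of fasm's behavior, and the function names are hypothetical):

```python
def is_quoted(sym: str) -> bool:
    return len(sym) >= 2 and sym[0] in "'\"" and sym[-1] == sym[0]

def backtick(sym: str) -> str:
    """Model of the ` operator: quote a symbol, leave a quoted string as-is."""
    return sym if is_quoted(sym) else "'" + sym + "'"

def match_backtick_x_x(sym: str) -> bool:
    """Model of `match `x,x`: both sides must be the very same symbol."""
    return backtick(sym) == sym

print(match_backtick_x_x("'AC'"))   # True
print(match_backtick_x_x("eax"))    # False
```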
Post 28 Feb 2013, 16:47
yoshimitsu
Joined: 07 Jul 2011
Posts: 96
Sorry, I overlooked a couple of things and had some major brain lag.
Everything's understood now, thanks l_inc ;)

PS:
I wonder what Tomasz thinks of using L, u or sth else in front/behind of a string literal to make it a unicode value, like
cmp eax,L'AC'
cmp eax,u'AC'
cmp eax,'AC'L
cmp eax,'AC'u

It wouldn't add much overhead and one also wouldn't be forced to use it, so nothing breaks.
Post 28 Feb 2013, 17:39
l_inc
Joined: 23 Oct 2009
Posts: 881
yoshimitsu
Quote:
I wonder what Tomasz thinks of using L, u or sth else in front/behind of a string literal to make it a unicode value

I think it's quite problematic to make it fit into the current syntax.
First of all, it's questionable how to handle situations like the following:
Code:
du "ab"
db L"ab"    

The second and more important problem is that the source code is not in Unicode. So you need some rules to convert the string specified between the quotation marks into Unicode, and these rules are called an "encoding". Just appending a zero byte to a character value does not automatically make it Unicode. For the du directive, the encoding problem is solved by including a standard header that redefines du with a macro corresponding to the desired encoding. With the current fasm architecture it is impossible to redefine such an inline L with a macro.
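A concrete case of the encoding problem, assuming a source file saved in the Windows-1251 code page (chosen only as an example of a non-ASCII code page). There, byte C0h means the Cyrillic capital letter A (U+0410), but zero-extending the byte gives U+00C0, the Latin capital A with grave. In Python:

```python
import unicodedata

raw = b"\xc0"                          # one byte of a Windows-1251 source file

naive = raw[0]                         # "append a zero byte": 00C0h
proper = ord(raw.decode("cp1251"))     # decode with the right encoding: 0410h

print(hex(naive), unicodedata.name(chr(naive)))    # 0xc0 LATIN CAPITAL LETTER A WITH GRAVE
print(hex(proper), unicodedata.name(chr(proper)))  # 0x410 CYRILLIC CAPITAL LETTER A
```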
Post 28 Feb 2013, 17:55
yoshimitsu
Joined: 07 Jul 2011
Posts: 96
Thing is, I've never done anything with Unicode so far,
which means I don't have a clue about it :s
At the moment I'm thinking of porting some code to Unicode (Windows Unicode, which is UTF-16, so 2 bytes per char, afaik). In that code I heavily use ASCII strings as dword values for an easier strcmp. Those would break, because they become 8 bytes in size and no longer fit into one instruction, so I'd have to split them. I'd also need to fill the values with zeros, which would be easy with hex values instead of strings, but that would make the source unreadable.
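The size problem is easy to check. In UTF-16 each of these ASCII characters takes 2 bytes (one code unit), so a four-character literal that used to fit one dword immediate needs two, and an odd final character must be zero-padded. A Python sketch (`dword_chunks` is a hypothetical helper name) that computes the immediates such a split would use:

```python
def dword_chunks(text: str) -> list[int]:
    """Split UTF-16LE text into 32-bit little-endian values, two chars each."""
    data = text.encode("utf-16-le")
    data += bytes(-len(data) % 4)      # zero-pad an odd final character
    return [int.from_bytes(data[i:i + 4], "little")
            for i in range(0, len(data), 4)]

print([hex(v) for v in dword_chunks("ABCD")])   # ['0x420041', '0x440043']
print([hex(v) for v in dword_chunks("ABC")])    # ['0x420041', '0x43']
```

So a single comparison against 'ABCD' would become two comparisons, against 00420041h and 00440043h.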

Guess the easiest solution is just to stick with your macro then.
Post 28 Feb 2013, 18:23
l_inc
Joined: 23 Oct 2009
Posts: 881
yoshimitsu
You may need to implement some higher-level macros. I'm not sure whether the following example is suitable, but when I needed to push strings onto the stack, I implemented a more or less high-level construction, which in most cases does not fit into a single assembly instruction:

Code:
;Detects the current code generation mode (2, 4 or 8 bytes)
macro detectMode byteCount
{
        virtual
                mov eax,[0]
                byteCount = 1 shl ($-$$-3)
        end virtual
}
;Pushes a string onto the stack; being a struc, it needs a label,
;which receives the number of bytes pushed
;usage: strSize pushstr "This is gonna be a stack string",0
struc pushstr [arg*]
{
        common local pushCount, buf, modeByteCount
        
        detectMode modeByteCount
        virtual
                db arg
                pushCount = ($-$$+modeByteCount-1)/modeByteCount
        end virtual
        
        . = pushCount*modeByteCount
        
        repeat pushCount
                virtual
                        db arg
                        db (modeByteCount-1) dup 0
                        if modeByteCount = 2
                                load buf word from $$+(pushCount-%)*modeByteCount
                        else if modeByteCount = 4
                                load buf dword from $$+(pushCount-%)*modeByteCount
                        else if modeByteCount = 8
                                if % < pushCount
                                        load buf qword from $$+(pushCount-%-1)*modeByteCount
                                else
                                        load buf qword from $$+(pushCount-1)*modeByteCount
                                end if
                        else
                                display 'Error: unknown code generation mode',13,10
                                err
                        end if
                end virtual
                if modeByteCount = 8
                        push rax
                        mov rax,buf
                else
                        push buf
                end if
        end repeat
        if modeByteCount = 8
                xchg rax,qword[rsp+(pushCount-1)*modeByteCount]
        end if
}    


In this case, porting to Unicode just means replacing the first two db directives (the ones taking arg) with du.
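The size bookkeeping in pushstr can be modeled in Python. For detectMode to produce 2, 4 and 8, mov eax,[0] must assemble to 4, 5 and 6 bytes. pushCount rounds the string length up to whole stack slots, and the chunks are pushed last-first. This sketch assumes the du variant with ASCII text and models only the final stack contents. In 64-bit mode the macro reaches the same layout through its push rax / mov rax / xchg sequence. The function names are hypothetical.

```python
def mode_bytes(insn_len: int) -> int:
    """detectMode: byteCount = 1 shl (length of `mov eax,[0]` - 3)."""
    return 1 << (insn_len - 3)

def pushstr_values(data: bytes, mode: int) -> list[int]:
    """Values pushstr places on the stack, in push order (last chunk first),
    so that the string reads forward from the final stack pointer."""
    count = (len(data) + mode - 1) // mode             # pushCount
    padded = data + bytes(mode - 1)                    # db (modeByteCount-1) dup 0
    chunks = [int.from_bytes(padded[i * mode:(i + 1) * mode], "little")
              for i in range(count)]
    return chunks[::-1]

print([mode_bytes(n) for n in (4, 5, 6)])              # [2, 4, 8]
text = "Hi!".encode("utf-16-le") + b"\0\0"             # du "Hi!",0
print([hex(v) for v in pushstr_values(text, 4)])       # ['0x21', '0x690048']
```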

P.S. With newer fasm features (addressing space labels), this implementation is not the best possible one, but I haven't had much time to update my macros accordingly.
Post 28 Feb 2013, 19:20
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.