[fasmg] fastest way to determine if a token is an fasmg id
MaoKo 15 Nov 2019, 02:07
Hi! I want to implement an irps macro that recognizes a token sequence such as a.b.c? and returns it as a single token, the way fasm 1 does.
I have written my own, but it is very slow, even with a single repeat $100.
I found that my macro spends most of its time on fasmg identifier recognition.
Here is my macro:
Code:
_fasmg_identifier_symbols_table? := $00
namespace _fasmg_identifier_symbols_table?
    repeat $100, i:$00
        define  ?i?
        restore ?i?
        if (i <> "$") & (i <> "%") & (i <> ".") & ((i < "0") | (i > "9"))\
         & (i <> "@") & ((i < "A") | (i > "Z")) & (i <> "^") & (i <> "_")\
         & ((i < "a") | (i > "z"))
            ?i? = $00
        else
            ?i? = $01
        end if
    end repeat
end namespace

macro _fasmg_identifier? result?*, token?&
    _assert_string (token)
    namespace _fasmg_identifier_symbols_table?
        repeat $01, identifier:token
            if (definite ?identifier?)
                result = ?identifier?
                break
            end if
            match =0, identifier
                result = $00
            else
                result = $01
                if ((identifier and $FF) >= "0") & ((identifier and $FF) <= "9")
                    result = $00
                    break
                else if  ((identifier and $FF) = "$")
                    repeat $01, identifier_2:identifier shr $08
                        iterate <lbound,ubound>, "0","9", "A","Z", "a","z"
                            if ((identifier_2 and $FF) >= lbound) & ((identifier_2 and $FF) <= ubound)
                                result = $00
                                break
                            end if
                        end iterate
                    end repeat
                end if
                if (result)
                    _iterate_string char, string (identifier)
                        if (~(?char?))
                            result = $00
                            break
                        end if
                    end _iterate_string
                end if
            end match
            define  ?identifier?
            restore ?identifier?
            ?identifier? = result
        end repeat
    end namespace
end macro
    

I use a kind of lookup table to speed up the process. The string iteration macro is as follows:
Code:
_iterate_string_symbols_table? := $00
namespace _iterate_string_symbols_table?
end namespace

macro _core_iterate_string? parameter?*, text?*
    local start, iterator
    _assert_string (text)
    start = $00
    define iterator parameter
    rawmatch _ =: begin, parameter
        _assert_numeric (begin)
        start = begin
        redefine iterator  _
    end rawmatch
    repeat $01, text_number:text
        namespace _iterate_string_symbols_table?
                if (~(definite ?text_number?))
                    define  ?text_number?
                    restore ?text_number?
                    ?text_number? = $00
                    namespace ?text_number?
                        repeat (lengthof (text) - start), i:start
                            node =: (((text) shr ($08 * i)) and $FF)
                        end repeat
                    end namespace
                end if
        end namespace
        match _, iterator
            outscope irpv _, _iterate_string_symbols_table?.text_number?.node
end macro

macro _iterate_string?! line?*&
    _pairing repeat, match, irpv
    repeat
    match
    irpv
    outscope _core_iterate_string line
end macro

macro end?._iterate_string?!
            end irpv
        end match
    end repeat
end macro
    

_pairing is a macro that ensures proper nesting. And finally, my irps macro.
I also want it to handle spaces.
Code:
macro _core_irps? parameter?*, text?&
    local  iterator, stream, inc_space, inc_ident, identifier, valid_id, last_dot, str_item, buffer, token
    define iterator parameter
    define inc_space 0
    define inc_ident 0
    rawmatch it =[ options =], parameter
        redefine iterator it
        rawmatch =+ option_1 =| =+ option_2, options
            rawmatch =space? =ident?, option_1 option_2
            else rawmatch =ident? =space?, option_1 option_2
            else
                err "syntax error: bad irps option"
            end rawmatch
            redefine inc_space 1
            redefine inc_ident 1
        else rawmatch =+ =space?, options
            redefine inc_space 1
        else rawmatch =+ =ident?, options
            redefine inc_ident 1
        else
            err "syntax error: bad irps option"
        end rawmatch
    end rawmatch
    buffer equ text ; expands all symbolic variable once
    while $01
        match _ remain, buffer
            define token _
            match =1 =_= insert_space, inc_space buffer
                define token
            end match
            redefine buffer remain
        else 
            match _, buffer
                define token _
                match =1 =_= , inc_space buffer
                    define token
                end match
            end match
            break
        end match
    end while
    match =1, inc_ident
        define identifier
        last_dot = $00
        irpv item, token
            rawmatch =1, %
                repeat %%
                    restore token ; purge all values of token
                end repeat
            end rawmatch
            str_item = (`item)
            _fasmg_identifier valid_id, str_item
            if (valid_id) 
                if ((str_item) = ".") | (last_dot)
                    _insert_token identifier, item
                    if ((str_item) = ".")
                        last_dot = $01
                    end if
                else 
                    match _, identifier
                        define token _
                    end match
                    redefine identifier item
                end if
            else
                match _, identifier
                    define token _
                    redefine identifier
                end match
                define token item
            end if
            if (last_dot) & ((str_item) <> ".")
                last_dot = $00
            end if
        end irpv
        match _, identifier
            define token _
        end match
    end match
    match it, iterator
        outscope irpv it, token
end macro

macro irps?! line?*&
    _pairing match, irpv
    match
    irpv
    outscope _core_irps line
end macro

macro end?.irps?!
        end irpv
    end match
end macro
    

_insert_token is just a macro that does an equ with no space in between. These macros work fine with small inputs, but I also have to handle large inputs of more than 1000 lines.
If you have time to suggest some optimizations, I would be grateful :).
Have a good day.


Last edited by MaoKo on 16 Nov 2019, 14:57; edited 1 time in total
Tomasz Grysztar 15 Nov 2019, 07:41
You made me wonder whether there is any trick that would let us reliably recognize whether a token is a name or a number, and I came up with the idea of using the DEFINE directive:
Code:
struc isname? token*
        . = 0
        define __isname.token.
        match , __isname.token
                . = 1
                namespace __isname
                        rawmatch t, token
                                match , t
                                        . = 2
                                end match
                        end rawmatch
                end namespace
        end match
end struc    
When the token is not a name that can become part of an identifier, DEFINE simply treats it as text to assign to the defined variable (which is then identified by "__isname.") without causing any error.

The "isname" macro gives the value 1 if the token is a name that can be used in the middle of an identifier (after a dot) and the value 2 if the token is a name that can start an identifier. Therefore the names identified as 1 should be the numeric ones.

I did a simple test:
Code:
macro test token
        T isname token
        display `token,9,'0'+T,13,10
end macro

test foo
test 123
test $A
test $Z
test *
test _
test +
test ?    
and the results are:
Code:
foo     2
123     1
$A      1
$Z      2
*       0
_       2
+       0
?       0    
Note that you can skip the inner check (the entire NAMESPACE block) when you do not need to distinguish types 1 and 2.
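
A minimal sketch of that reduced check, for readers who only need "can this token appear in an identifier at all". The names "isnamelike" and "__isnamelike" are hypothetical, picked only so they do not clash with "isname" above; everything else is the same DEFINE trick with the NAMESPACE block removed.

```fasmg
; Sketch only: "isnamelike" and "__isnamelike" are assumed names,
; not part of fasmg or of the macro posted above.
struc isnamelike? token*
        . = 0
        ; If token is a name, this defines __isnamelike.token with an
        ; empty value; otherwise the text is assigned to "__isnamelike."
        define __isnamelike.token.
        match , __isnamelike.token
                . = 1           ; token can be part of an identifier
        end match
end struc

macro test token
        T isnamelike token
        display `token,9,'0'+T,13,10
end macro

test foo        ; should give 1
test 123        ; should give 1 (types 1 and 2 are no longer told apart)
test *          ; should give 0
```

As in the full version, every call leaves one more value on the stack of the defined symbol, so this may need a matching RESTORE if it is called a very large number of times.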
Tomasz Grysztar 15 Nov 2019, 11:09
BTW, the same trick can also be used to test whether a sequence of tokens is a single identifier:
Code:
struc isidentifier? text&
        . = 0
        define __isidentifier.text
        namespace __isidentifier
                rawmatch id, text
                        match , id
                                . = 1
                        end match
                end rawmatch
        end namespace
end struc

macro test text&
        T isidentifier text
        display `text,9,'-'+T*('+'-'-'),13,10
end macro

test foo        ; +
test foo.bar    ; +
test foo+1      ; -
test foo?       ; +
test foo!       ; -
test 123        ; -
test $A         ; -
test $Z         ; +
test +          ; -    
But this is likely not useful in your case, because you need to somehow cut out the identifier to test anyway.
MaoKo 16 Nov 2019, 15:20
Yes, you are right. Your implementation is fast and straightforward. Thanks. I may have found a way to speed up the overall irps macro.
In fact, everything hinges on the match statements in the while loop.
If you increase the number of match parameters, everything speeds up, even when going from one to just two:
Code:
macro irps?! parameter?*, text?*&
    local buffer, token
    buffer equ text
    while $01
       match _1 _2 remain, buffer
          define token _1
          define token _2
          redefine buffer remain
       else match _ remain, buffer
          define token _
          redefine buffer remain
       else
         match _, buffer
           define token _ 
         end match   
         break  
      end match   
    end while
    outscope irpv parameter, token
end macro
;macro end?.irps?!
;    end irpv
;end macro
    

This is the base case. So I wrote a macro that generates, as a string, the list of match statements whose parameter counts are powers of two.
A kind of metaprogramming.
Code:
struc _append_string? src?*&
    assert (. eqtype "")
    iterate value, src
        . =: string (((value) shl ($08 * lengthof (.))) or (.))
    end iterate
end struc

struc _int_pow base?*, exp?*
    if (exp)
        . = base
        repeat (exp - $01)
            . = base * .
        end repeat
    else
        . = $01
    end if
end struc

struc _match_n? n?*
    local match_line, begin, count, total_rept
    macro invoker?!
    end macro
    total_rept _int_pow 2, n
    begin = $01
    . = ""
    . _append_string "macro ?! line?&",     $0A
    . _append_string "match =_OFF, line",   $0A
    . _append_string "purge ?",             $0A
    . _append_string "else",                $0A
    . _append_string "esc macro invoker?!", $0A
    . _append_string "esc invoker",         $0A
    . _append_string "line",                $0A
    . _append_string "esc end macro",       $0A
    . _append_string "end match",           $0A
    . _append_string "end macro",           $0A
    repeat (total_rept)
        count = (%% - %) + $01
        if (bsr count) = (bsf count)
            repeat $01, i:$00
                eval "match_line equ _", (`i)
            end repeat
            repeat (count - $01)
                eval "match_line reequ match_line _", (`%)
            end repeat
            match _, match_line
                if (~(begin))
                    . _append_string "else", " "
                end if
                . _append_string "match ", (`_), " remain, buffer"
            end match
            repeat count, i:$00
                . _append_string string($0A), "define token _", (`i)
            end repeat
            . _append_string string($0A), "redefine buffer remain", string($0A)
            begin = $00
        end if
    end repeat
    . _append_string "else",                $0A
    . _append_string "match _, buffer",     $0A
    . _append_string "define token _",      $0A
    . _append_string "end match",           $0A
    . _append_string "break",               $0A
    . _append_string "end match",           $0A
    . _append_string "_OFF",                $0A
end struc

result _match_n $04
eval result

macro irps?! parameter?*, text?*&
    local buffer, token
    buffer equ text
    while $01
        invoker
    end while
    outscope irpv parameter, token
end macro

;macro end?.irps?!
;    end irpv
;end macro

repeat $10000
irps A, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
end irpv
end repeat
    

On my system, this takes only 5 s, while the standard irps takes 13 s. I don't know why, but if I put the resulting string directly in invoker?! without eval, I gain 1 s.
Also, end?.irps?! slows the process down by 2 s.
I have not included the handling of spaces and identifiers yet, but with your macro it will be much faster than my starting point :)
Tomasz Grysztar 16 Nov 2019, 16:43
MaoKo wrote:
Code:
struc _int_pow base?*, exp?*
    if (exp)
        . = base
        repeat (exp - $01)
            . = base * .
        end repeat
    else
        . = $01
    end if
end struc    
Code:
total_rept _int_pow 2, n    
It might be a relic of your other attempts, but this is such an inefficient way of doing this that I should point out that you can compute it simply this way:
Code:
total_rept = 1 shl (n)    

MaoKo wrote:
Code:
    repeat (total_rept)
        count = (%% - %) + $01
        if (bsr count) = (bsf count)
            ; ...
        end if
    end repeat    
This also seems like a really needless repetition (note that fasmg preprocesses the lines in the IF block every time this is repeated). I would recommend doing as few repetitions as possible:
Code:
    repeat bsr total_rept + 1
        count = 1 shl (%%-%)
            ; ...
    end repeat    
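
To see what the shortened loop visits, here is a small standalone sketch. The value n = 4 is an assumption taken from the "result _match_n $04" call earlier in the thread, and the DISPLAY lines are only there for inspection.

```fasmg
; Sketch: n = 4 is assumed, matching "result _match_n $04" above.
n = 4
total_rept = 1 shl (n)          ; 16, replaces the _int_pow call
repeat bsr total_rept + 1       ; 5 passes instead of 16
        count = 1 shl (%%-%)    ; 16, 8, 4, 2, 1
        repeat 1, c:count
                display `c,13,10
        end repeat
end repeat
```

These are the same power-of-two counts that the original loop selected with the "(bsr count) = (bsf count)" test, in the same descending order.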


PS. Also, if you'd ever need to use your "_int_pow" macro with really large numbers (and bases other than 2), I would recommend this variant:
Code:
struc _int_pow base?*, exp?*
        . = 1
        .sq = base
        .xp = exp
        while .xp
                if .xp and 1
                        . = . * .sq
                end if
                .xp = .xp shr 1
                if .xp
                        .sq = .sq * .sq
                end if
        end while
end struc    
But I guess yours is optimized for small numbers.
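
A short usage sketch of the square-and-multiply variant, with the macro repeated so the snippet stands alone. The test values are arbitrary choices for illustration, and the last check assumes only that fasmg integers are not limited to 64 bits.

```fasmg
struc _int_pow base?*, exp?*
        . = 1
        .sq = base
        .xp = exp
        while .xp
                if .xp and 1
                        . = . * .sq     ; this bit of the exponent is set
                end if
                .xp = .xp shr 1
                if .xp
                        .sq = .sq * .sq ; square for the next bit
                end if
        end while
end struc

x _int_pow 3, 5
assert x = 243                  ; 3*3*3*3*3
y _int_pow 7, 0
assert y = 1                    ; zero exponent gives the empty product
z _int_pow 2, 100
assert z = 1 shl 100            ; 7 loop passes, not 99 multiplications
```

The loop runs once per bit of the exponent, which is where the saving over the repeat-based version comes from when the exponent is large.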
MaoKo 19 Nov 2019, 01:35
Thanks, Tomasz. Yes, you are right, this was a relic of my old project. After adding the handling of spaces and dotted identifiers, my irps macro takes around 20 s (without end?.irps?! and with 65536 lines).
Do you know another way to speed up the whole process, or is it impossible? I don't understand why it takes so much time.
In fact, I want to use fasmg as a preprocessor, but I really need to replace my macro.
Also, do you know a workaround for end?.irps?! ?
Thanks
Tomasz Grysztar 19 Nov 2019, 08:34
MaoKo wrote:
Do you know another way to speed up the whole process, or is it impossible?
General advice: reduce the number of lines in the most frequently used macros.
MaoKo wrote:
Also, do you know a workaround for end?.irps?! ?
This might be some unwanted interaction, but I'm not sure. It might just as well be caused by the simple fact that you add another layer of macro that gets called a great many times (if this is your most used macro).
MaoKo 24 Nov 2019, 14:22
Hi! I have found a simpler and better way to mimic the irps of fasm 1 than the one above.
Metaprogramming is fast when you need to iterate token by token, but too slow for identifier recognition.
I rearranged isidentifier? a little so that it can help me match the longest identifier in the input.
Here is the code:
Code:
macro irps?! parameter?*, text?*&
    local buffer, token
    buffer reequ text
    while $01
        match __input?, buffer
            redefine __prefix?
            redefine __prefix?.__input __delimiter__
            match,__prefix
                rawmatch __suffix?, __input
                    match _1 =__delimiter__? _2, __prefix?.__suffix
                        local shift, tname
                        shift = lengthof(`__input) - lengthof(`_2) 
                        tname = string ((($01 shl (shift * $08)) - $01) and (`__suffix))
                        eval "define token ", tname
                        eval "restore __prefix?.", tname
                        redefine buffer _2
                    else match name, __input
                        define token name
                        restore __prefix?.name
                        break
                    end match
                end rawmatch
            else
                rawmatch first remain, __input
                    define token first
                    redefine buffer remain
                else
                    rawmatch remain, __input
                        define token remain
                    end rawmatch
                    break
                end rawmatch
            end match
        end match
   end while
    outscope irpv parameter, token
end macro

macro end?.irps?!
    end irpv
end macro
    

The small drawback is that it treats a sequence such as "1.A.E" as a single identifier.
But for my needs, I can live with it. It is about twice as fast as the macro above, but it can't handle spaces.
If you find another way, don't hesitate to tell me :P.
Tomasz Grysztar 24 Nov 2019, 16:33
Out of curiosity: do you use it for some sources originally written for fasm 1, thus the need for compatible behavior?
MaoKo 26 Nov 2019, 03:53
Hi! In fact, I want to emulate an assembler that mimics the behavior of fasm 1 (for identifier recognition).
It allows the expansion of single-line macros as C/NASM do, but it also allows an optional prefix to form a complex identifier for the macro name.
Another problem is that this assembler allows (C) expressions such as "1<<2" or "A = 1 >= const.B".
This is why I want to implement a kind of fasm 1 irps macro. First, I transform the expressions above into "1 __shl__ 2" and "A = 1 __ge__ const.B", respectively. After this step, I run the shunting-yard algorithm for the computation. For this (preprocessing) problem, I could also do a bunch of matches in a loop. Something like:
Code:
match _1 =<=< _2, buffer
  ; ...
else match _1 =<=<, buffer
  ; ...
else match =<=< _2, buffer
  ; ...
else match =<=<, buffer
  ; ...
end match
    

But this is a little cumbersome, and it doesn't work very well for the recognition of single-line macro names.
Sorry for my English. I hope I was clear :).