flat assembler
Message board for the users of flat assembler.

Index > Tutorials and Examples > Simple pattern matching procedure.

Author
Thread Post new topic Reply to topic
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 07 Nov 2013, 11:15
Here is a simple pattern match program, similar to the DOS/Linux glob matching. Please, if someone wants, make some tests. Also, suggestions about size optimizations are welcome.

Code:
;***************************************************************************************
;
; Provides pattern matching of the [.pattern] in the [.string].
; Returns CF=1 if the pattern MATCH
;         CF=0 if the pattern DOES NOT MATCH
;
; In the pattern, "*" char match 0 or more characters and "?" char match 
; exactly 1 character. If the characters "*" and "?" have to be directly matched
; they can be escaped with "\". "\\" escapes the backslash itself. If the backslash
; is located at the end of the pattern, it is treated as a normal character.
;
; If a character between "@" and "_" is escaped, a controll code is generated equal
; to the ascii code of the char - $40.
;
;***************************************************************************************
proc StrMatchPattern, .pattern, .string
begin
        pushad

        mov     esi, [.pattern]
        mov     edi, [.string]

        xor     ecx, ecx

.main_loop:
        mov     ah, [edi]       ; from the string
        inc     edi

.main_loop2:
        mov     al, [esi]       ; from the pattern
        inc     esi

        test    al, al
        jz      .eo_pattern

        cmp     al, '*'
        jne     .not_star

        mov     ecx, esi        ; the address after the star.
        mov     ebx, edi
        jmp     .main_loop2

.not_star:
        test    ah, ah
        jz      .not_match    ; the string ended, but the pattern not.

        cmp     al, '?'
        je      .main_loop

        cmp     al, '\'
        jne     .compare

        cmp     byte [esi], 0
        je      .compare

        mov     al, [esi]
        inc     esi

        cmp     al, $40
        jb      .compare
        cmp     al, $60
        jae     .compare

        sub     al, $40         ; the control codes escaped.

.compare:
        cmp     al, ah
        je      .main_loop

        jecxz   .not_match

.char_ne:
        mov     edi, ebx
        mov     esi, ecx
        inc     ebx
        jmp     .main_loop

.eo_pattern:
        test    ah, ah
        jz      .match

        jecxz   .not_match

        cmp     byte [ecx], 0
        jne     .char_ne

.match:
        stc
        popad
        return

.not_match:
        clc
        popad
        return
endp    

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 07 Nov 2013, 11:15
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
MHajduk



Joined: 30 Mar 2006
Posts: 6115
Location: Poland
MHajduk 07 Nov 2013, 11:20
JohnFound wrote:
Here is a simple pattern match program, similar to the DOS/Linux glob matching. Please, if someone wants, make some tests. Also, suggestions about size optimizations are welcome.

Code:
;***************************************************************************************
;
; Provides pattern matching of the [.pattern] in the [.string].
; Returns CF=1 if the pattern MATCH
;         CF=0 if the pattern DOES NOT MATCH
;
; In the pattern, "*" char match 0 or more characters and "?" char match 
; exactly 1 character. If the characters "*" and "?" have to be directly matched
; they can be escaped with "\". "\\" escapes the backslash itself. If the backslash
; is located at the end of the pattern, it is treated as a normal character.
;
; If a character between "@" and "_" is escaped, a controll code is generated equal
; to the ascii code of the char - $40.
;
;***************************************************************************************
proc StrMatchPattern, .pattern, .string
begin
        pushad

        mov     esi, [.pattern]
        mov     edi, [.string]

        xor     ecx, ecx

.main_loop:
        mov     ah, [edi]       ; from the string
        inc     edi

.main_loop2:
        mov     al, [esi]       ; from the pattern
        inc     esi

        test    al, al
        jz      .eo_pattern

        cmp     al, '*'
        jne     .not_star

        mov     ecx, esi        ; the address after the star.
        mov     ebx, edi
        jmp     .main_loop2

.not_star:
        test    ah, ah
        jz      .not_match    ; the string ended, but the pattern not.

        cmp     al, '?'
        je      .main_loop

        cmp     al, '\'
        jne     .compare

        cmp     byte [esi], 0
        je      .compare

        mov     al, [esi]
        inc     esi

        cmp     al, $40
        jb      .compare
        cmp     al, $60
        jae     .compare

        sub     al, $40         ; the control codes escaped.

.compare:
        cmp     al, ah
        je      .main_loop

        jecxz   .not_match

.char_ne:
        mov     edi, ebx
        mov     esi, ecx
        inc     ebx
        jmp     .main_loop

.eo_pattern:
        test    ah, ah
        jz      .match

        jecxz   .not_match

        cmp     byte [ecx], 0
        jne     .char_ne

.match:
        stc
        popad
        return

.not_match:
        clc
        popad
        return
endp    
What the "begin" word in the procedure body is doing there? Looks a bit like Pascal syntax. Wink
Post 07 Nov 2013, 11:20
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 07 Nov 2013, 11:27
It is FreshLib syntax. The procedure header can be easily edited for whatever macro library you use.
Post 07 Nov 2013, 11:27
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
MHajduk



Joined: 30 Mar 2006
Posts: 6115
Location: Poland
MHajduk 07 Nov 2013, 12:25
JohnFound wrote:
It is FreshLib syntax. The procedure header can be easily edited for whatever macro library you use.
Well, I had an impression it must be a part of a bigger thing, now it's clear to me. Wink

Another question. Why don't use 'lodsb' here:
Code:
.main_loop2:
        mov     al, [esi]       ; from the pattern
        inc     esi    
and in the following places? This instruction shouldn't change the ah register content, so what is the reason?
Post 07 Nov 2013, 12:25
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 07 Nov 2013, 13:03
MHajduk wrote:
Another question. Why don't use 'lodsb' here:


It better corresponds with the next instruction pair "mov ah, [edi]". As long as both make the same (load char from the string), I considered they have to look the same. It makes the code more readable.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 07 Nov 2013, 13:03
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
MHajduk



Joined: 30 Mar 2006
Posts: 6115
Location: Poland
MHajduk 07 Nov 2013, 13:16
Well, to substitute these two instructions:
Code:
        mov     ah, [edi]       ; from the string
        inc     edi    
we can do something like the following:
Code:
        xchg    esi, edi
        xchg    ah, al
        lodsb
        xchg    ah, al
        xchg    esi, edi    
but I'm not sure it's more readable or more compact than original, at least it is in the same convention... Wink
Post 07 Nov 2013, 13:16
View user's profile Send private message Visit poster's website Reply with quote
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 07 Nov 2013, 13:17
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 19:10; edited 1 time in total
Post 07 Nov 2013, 13:17
View user's profile Send private message Reply with quote
ejamesr



Joined: 04 Feb 2011
Posts: 52
Location: Provo, Utah, USA
ejamesr 09 Nov 2013, 22:32
If you're not too concerned about alignment and would prefer to save space, use 'align 1' in front of your procedures. This will save, on average, a couple bytes per procedure.
Post 09 Nov 2013, 22:32
View user's profile Send private message Send e-mail Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 10 Nov 2013, 05:10
ejamesr wrote:
If you're not too concerned about alignment and would prefer to save space, use 'align 1' in front of your procedures. This will save, on average, a couple bytes per procedure.


Well, "align 1" is actually NOP. The assembler always align on 1.

The alignment of the procedures in FreshLib is optional, so the user of the library can choose the alignment with "options.AlignCode".

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 10 Nov 2013, 05:10
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
ejamesr



Joined: 04 Feb 2011
Posts: 52
Location: Provo, Utah, USA
ejamesr 11 Nov 2013, 21:55
'align 1' does nothing; it will not insert a NOP into the code. But if it replaces a larger-sized 'align' statement, it can eliminate NOPs.

My previous comment assumed you used 'align 4' or even 'align 16' for your procedures. If not, please disregard my comment.

Since I'm always more interested in speed than size (using assembly pretty much guarantees my code will be smaller than when using a different language), I periodically consult the latest version of the "IntelĀ® 64 and IA-32 Architectures Optimization Reference Manual" (July 2013). On page 3-9:
Quote:
Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch targets should be 16-byte aligned.
and then on page 3-46:
Quote:
Alignment of code can be an issue for the Pentium M, Intel Core Duo and Intel Core 2 Duo processors.
Alignment of branch targets will improve decoder throughput.
Since I'm using an Intel Core 2 Duo laptop, I try to align branch targets, especially in inner loops that go through thousands (or more) iterations. And that certainly makes my code size grow.
Post 11 Nov 2013, 21:55
View user's profile Send private message Send e-mail Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.