flat assembler
Message board for the users of flat assembler.

Index > Projects and Ideas > Is that algorythm fastest & workable?

Author
Thread Post new topic Reply to topic
ProMiNick



Joined: 24 Mar 2012
Posts: 636
Location: Russian Federation, Sochi
ProMiNick
resolved case - finding position of pattern(could be sized block or zero term string) in memory (memory is size limited). algorythm should be fastest on single char patterns and any multy char pattern without penalty to each other speed. algorythm code should be minimal with both: execution time & code size.
Code:
GetStrLength:
; string:                   [in]        edi
; string length:            [out]       ecx
;                           [unchanged] ebx, edx, esi, edi, ebp, esp
        or      ecx, -1
        xor     eax, eax
        cld
        repne scasb
        not     ecx
        sub     edi,ecx
        retn
;-----------------------------------------------------------------------
MemoryPos:
; memory:       [in]            esi
; pattern:      [in]            edi
; memorysize:   [in]            edx
; patternpos:   [out]           eax , -1 mean notfound
;               [unchanged] ebp, esp 
        call    GetStrLength
        xchg    edx, ecx
        xchg    edi, esi
MemoryBlockPos:
; memory:       [in]            edi
; pattern:      [in]            esi
; memorysize:   [in]            ecx
; patternsize:  [in]            edx
; patternpos:   [out]           eax , -1 mean notfound
;               [unchanged] ebp, esp 
        lodsb
        push    ecx
        repne scasb
        jne     .notfound
        dec     edx
        jz      .found
      .searchloop:
        push    edx
        push    edi
        push    esi
        xchg    edx, ecx
        repe cmpsb
        mov     ecx, edx
        pop     esi
        pop     edi
        pop     edx
        je      .found
        repne scasb
        je      .searchloop
      .notfound:
        mov     [esp], ecx
      .found:
        pop     eax
        sub     eax, ecx
        dec     eax
        retn
; P.S. 72 chars are opimal to fit in one line for printing code for ex.-
; It is one ";" and 71 "-" ---------------------------------------------
; at A4 list 80x56 of CourrierNew size 9 will fit 56 lines -------------
; with left border = 3 sm (6 chars) and right one = 1 sm (2 chars) -----    

_________________
I don`t like to refer by "you" to one person.
My soul requires acronim "thou" instead.
Post 06 May 2021, 22:34
View user's profile Send private message Send e-mail Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3196
Location: vpcmipstrm
bitRAKE
Code:
.searchloop:
        push    ecx
        mov     ecx, edx
        push    edx
        push    edi
        push    esi
        repz cmpsb
        pop     esi
        pop     edi
        pop     edx
        pop     ecx
        jz      found
        repnz scasb
        jz      .searchloop
.notfound:
        pop     eax
        push    ecx    

_________________
¯\(°_o)/¯ unlicense.org
Post 07 May 2021, 03:13
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18067
Location: In your JS exploiting you and your system
revolution
Do you mean the "fastest" implementation, or the fastest algorithm? You ask for an algorithm but present an implementation.

Algorithms and their implementations are very different things. And their relative performances are only vaguely linked to each other. One is measured by seconds, the other with the big-O notation.

Finding the fastest big-O value is easy. There are about a billion papers and studies published regarding searching for substrings.

Finding the "fastest" general implementation is almost impossible. The sheer number of combinations makes it intractable.
Post 07 May 2021, 03:16
View user's profile Send private message Visit poster's website Reply with quote
ProMiNick



Joined: 24 Mar 2012
Posts: 636
Location: Russian Federation, Sochi
ProMiNick
Code:
      .searchloop: ; bitRAKE, I`m understand what thou mean - preserving edx is useless, just preserve ecx
        push    ecx
        mov     ecx, edx
        push    edi
        push    esi
        repe cmpsb
        pop     esi
        pop     edi
        pop     ecx
        je      .found
        repne scasb
        je      .searchloop
      .notfound:
        pop     ecx
        push    ecx     
mov ecx, [esp] - 3 bytes - 1 tact
vs
pop ecx
push ecx - 2 bytes - 2 tacts
I guess it is case of preference. I prefer to ensure that in top of stack & ecx would be same value by reading it, pop-push looks less readable.
Post 07 May 2021, 06:27
View user's profile Send private message Send e-mail Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3196
Location: vpcmipstrm
bitRAKE
I'm more apt to use flag value return for such a leaf. It might cost an extra instruction, but it save an instruction every use.
Code:
call MemoryPos
jnz not_found    
verses ...
Code:
call MemoryPos
inc eax
jz not_found    
Yet, if one needs to interface another language then flags are probably not an option.

_________________
¯\(°_o)/¯ unlicense.org
Post 07 May 2021, 07:10
View user's profile Send private message Visit poster's website Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 1057
Roman
Quote:

.searchloop: ; bitRAKE, I`m understand what thou mean - preserving edx is useless, just preserve ecx
push ecx
mov ecx, edx
push edi
push esi
repe cmpsb
pop esi
pop edi
pop ecx
je .found
repne scasb
je .searchloop
.notfound:
pop ecx
push ecx

This code bad if we want 64 bit program.
My solution:
Code:
.searchloop: ; bitRAKE, I`m understand what thou mean - preserving edx is useless, just preserve ecx
        movd   xmm0,ecx
        mov     ecx, edx
        movd   xmm1,edi
        movd   xmm2,esi
        repe cmpsb
        movd   edi,xmm1
        movd   esi,xmm2
        movd   ecx,xmm0
        je      .found
        repne scasb
        je      .searchloop
      .notfound:
        movd   ecx,xmm0
        
    
Post 07 May 2021, 08:07
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3196
Location: vpcmipstrm
bitRAKE
The 64-bit version is a couple bytes larger, but fine otherwise.
Code:
GetStrLength:
; string:                   [in]        rdi
; string length:            [out]       ecx
;                           [unchanged] rbx, rdx, rsi, rdi, rbp, rsp
        or      ecx, -1
        xor     eax, eax
        repnz scasb
        not     ecx
        sub     rdi,rcx
        retn
;-----------------------------------------------------------------------
MemoryPos:
; memory:       [in]            rsi
; pattern:      [in]            rdi
; memorysize:   [in]            rdx
; patternpos:   [out]           eax , -1 mean notfound
;               [unchanged] rbp, rsp
        call    GetStrLength
        xchg    edx, ecx
        xchg    rdi, rsi
MemoryBlockPos:
; memory:       [in]            rdi
; pattern:      [in]            rsi
; memorysize:   [in]            ecx
; patternsize:  [in]            edx
; patternpos:   [out]           eax , -1 mean notfound
;               [unchanged] rbp, rsp
        lodsb
        push    rcx

        repnz scasb
        jnz     .notfound
        dec     edx
        jz      .found
.searchloop:
        push    rcx
        mov     ecx, edx
        push    rdi
        push    rsi
        repz cmpsb
        pop     rsi
        pop     rdi
        pop     rcx
        jz      found
        repnz scasb
        jz      .searchloop
.notfound:
        pop     rax
        push    rcx
.found:
        pop     rax
        sub     eax, ecx
        dec     eax
        retn
; P.S. 72 chars are opimal to fit in one line for printing code for ex.-
; It is one ";" and 71 "-" ---------------------------------------------
; at A4 list 80x56 of CourrierNew size 9 will fit 56 lines -------------
; with left border = 3 sm (6 chars) and right one = 1 sm (2 chars) -----    
We assume strings will not exceed 2^32-1, which seems reasonable.

_________________
¯\(°_o)/¯ unlicense.org
Post 07 May 2021, 08:34
View user's profile Send private message Visit poster's website Reply with quote
DimonSoft



Joined: 03 Mar 2010
Posts: 937
Location: Belarus
DimonSoft
ProMiNick wrote:
mov ecx, [esp] - 3 bytes - 1 tact
vs
pop ecx
push ecx - 2 bytes - 2 tacts

I guess, this needs a reference to the source of the information and detailed analysis of its applicability to modern processor models with all the hardware optimizations involved that make equations of “instruction = N ticks” type… obsolete?
Post 07 May 2021, 10:48
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.