flat assembler
Message board for the users of flat assembler.

String length
JohnFound



Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
JohnFound
Sasha wrote:
The smoothness is what I really wanted to gain. The worst thing is this "sub eax,9" and then scanning again.
I tried to extract the information about where the zero byte is without scanning again. I saw that all 8 bytes are already in registers.


Yes, good work! BTW, may I use your code in FreshLib? You can of course submit it yourself, if you prefer.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 19 Jun 2014, 12:58
r22



Joined: 27 Dec 2004
Posts: 805
r22
@Sasha

Code:
        and     ecx,$80808080 
        and     edx,$80808080 

        test    ecx,ecx         ;!! 
        jnz     .sub8 
        test    edx,edx 
        jz      .scan 
    

Can't you combine the AND and TEST to just be a TEST reg32, imm32?
Code:
        test    ecx,$80808080
        jnz     .sub8 
        test    edx,$80808080
        jz      .scan 
    
Post 19 Jun 2014, 17:47
JohnFound
r22 wrote:
Can't you combine the AND and TEST to just be a TEST reg32, imm32?


No, because it needs the result later in the code, where the last bytes are scanned byte by byte.

Post 19 Jun 2014, 18:50
r22
JohnFound wrote:
r22 wrote:
Can't you combine the AND and TEST to just be a TEST reg32, imm32?


No, because it needs the result later in the code, where the last bytes are scanned byte by byte.

Couldn't you simply use the TEST reg8, imm8 encodings (TEST cl, $80) later in the code? Speed-wise, REG,REG and REG,IMM should be similar; the encoding will be one byte larger (some shuffling of registers could be done so that you use EAX instead of ECX, which would make the TEST al, $80 2 bytes instead of 3).
Post 19 Jun 2014, 18:59
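For reference, the sizes being compared here can be read off the standard IA-32 encodings. The listing below is only an illustrative sketch (it is not code from the thread), with the expected bytes written out in the comments:
Code:
use32                                   ; encodings below assume 32-bit mode

        test    cl,cl                   ; 84 C9      (2 bytes)
        test    cl,80h                  ; F6 C1 80   (3 bytes)
        test    al,80h                  ; A8 80      (2 bytes, short accumulator form)

        and     ecx,80808080h           ; 81 E1 80 80 80 80   (6 bytes)
        test    ecx,ecx                 ; 85 C9               (2 bytes)
        test    ecx,80808080h           ; F7 C1 80 80 80 80   (6 bytes, leaves ecx unchanged)

So folding AND + TEST into one TEST reg32, imm32 saves two bytes per register, but the masked value is then no longer left in ecx for the byte-by-byte tail.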
JohnFound
@r22 - well, it probably can be done this way...
Post 19 Jun 2014, 20:05
Sasha



Joined: 17 Nov 2011
Posts: 93
Sasha
r22 wrote:
@Sasha

Code:
        and     ecx,$80808080 
        and     edx,$80808080 

        test    ecx,ecx         ;!! 
        jnz     .sub8 
        test    edx,edx 
        jz      .scan 
    

Can't you combine the AND and TEST to just be a TEST reg32, imm32?
Code:
        test    ecx,$80808080
        jnz     .sub8 
        test    edx,$80808080
        jz      .scan 
    


We can remove the test at the end of the loop. It makes the loop smaller.

Code:
align 32
proc    strlen_freshlib_opt_2 uses ebx esi edi,str

        mov     eax,[str]
        mov     ebx,-01010101h

  .aligning:
        test    eax,3
        jz      .scan

        mov     dl,[eax]
        test    dl,dl
        jz      .found

        inc     eax

        jmp     .aligning

align 32
  .scan:
        mov     esi,[eax]
        mov     edi,[eax+4]

        lea     eax,[eax+8]


        lea     ecx,[esi+ebx]   ;!
        lea     edx,[edi+ebx]

        not     esi
        not     edi

        and     ecx,esi
        and     edx,edi

        and     ecx,$80808080           ;!!!! 
        jnz     .sub8
        and     edx,$80808080
        jz      .scan

; byte 0 was found: so search by bytes.
        lea     eax,[eax-4]
        mov     ecx,edx
        jmp     .bytesearch

  .sub8:
        lea     eax,[eax-8]

  .bytesearch:                  ;!!!
        test    cl,cl
        jnz     .found
        inc     eax
        test    ch,ch
        jnz     .found
        shr     ecx,16
        inc     eax
        test    cl,cl
        jnz     .found
        inc     eax

  .found:
        sub     eax,[str]
        ret
endp         
    

But it gives poorer results. I don't know why.
Post 20 Jun 2014, 13:46
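For readers following the loop above, the zero-byte test it relies on can be isolated as shown below. This is a minimal sketch rather than code from the thread: the label zero_byte_mask and its register convention are made up for illustration. It also shows why .bytesearch has to walk the mask from the low byte upward: only the lowest flagged byte is guaranteed to be a real zero, because a borrow out of a zero byte can also flag a 01h byte above it.
Code:
use32                                   ; plain 32-bit code, no include files needed

; Hypothetical helper (not from the thread).
; in:  eax = four bytes of the string
; out: ecx = mask; bit 7 of the lowest flagged byte marks the first zero byte
;      ZF  = 1 when none of the four bytes is zero
zero_byte_mask:
        lea     ecx,[eax-01010101h]     ; subtract 1 from every byte (borrows may ripple upward)
        not     eax                     ; bit 7 of ~x is set only for bytes below 80h
        and     ecx,eax
        and     ecx,80808080h           ; keep just the per-byte flag bits
        ret

; Worked values:
;   eax = 41424344h (no zero byte)            -> ecx = 00000000h, ZF=1
;   eax = 00434241h ('A','B','C',0 in memory) -> ecx = 80000000h, ZF=0
;   eax = 00000100h (bytes 00,01,00,00)       -> ecx = 80808080h; the flag on byte 1 (01h) is false

In the third case the lowest flagged byte is still a genuine zero, which is all that the byte-by-byte tail needs.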
Sasha
r22 wrote:
Couldn't you simply use the TEST reg8, imm8 encodings (TEST cl, $80) later in the code? Speed-wise, REG,REG and REG,IMM should be similar; the encoding will be one byte larger (some shuffling of registers could be done so that you use EAX instead of ECX, which would make the TEST al, $80 2 bytes instead of 3).

I didn't understand the part about reg8,imm8.
Did you mean this?
Code:
  .bytesearch:                  ;!!!
        test    cl,80h
        jnz     .found
        inc     eax
        test    ch,80h
        jnz     .found
        shr     ecx,16
        inc     eax
        test    cl,80h
        jnz     .found
        inc     eax 
    
Post 20 Jun 2014, 13:49
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1180
Location: Unknown
HaHaAnonymous
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 18:10; edited 1 time in total
Post 20 Jun 2014, 14:25
Sasha
JohnFound wrote:

Yes, good work! BTW, may I use your code in FreshLib? You can of course submit it yourself, if you prefer.


Of course you may use it. I don't know yet how to submit it there. What's more, you will need to integrate it back into your function, as I've removed some of the preceding code.
Post 21 Jun 2014, 00:17
Sasha
HaHaAnonymous wrote:
The biggest disadvantage I see is that this function uses many more bytes (which of course will not be a problem if speed is more important).
Good work!

Thanks. There are many in-between variations, if you want less code and more speed. Like:
Code:
proc    strlen str

        mov     eax,[str]

  .loop:
        mov     dx,[eax]
        inc     eax
        test    dl,dl
        jz      .found
        inc     eax
        test    dh,dh
        jnz     .loop

  .found:
        sub     eax,[str]
        dec     eax
        ret
endp       
    
Post 21 Jun 2014, 01:31
Sasha
Now I want to think about how to add a maximum length check and make the function search for any desired byte. Like this:
Code:
proc    strchar str,char,len

        mov     ecx,[len]
        mov     edi,[str]
        mov     eax,[char]
        repne   scasb

        sub     edi,[str]
        lea     eax,[edi-1]
        ret
endp             
    
Post 21 Jun 2014, 01:45
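One thing the repne scasb version above cannot report is "not found": when no byte matches within len, it returns len-1, the same value as a match in the last position. A minimal sketch of one way to separate the two cases follows. It assumes the stdcall convention with the arguments read directly from the stack instead of through the proc macro, and the name strchar_checked and the -1 return value are made up for illustration.
Code:
use32

; Hypothetical variant (not from the thread).
; in:  [esp+4] = str, [esp+8] = char, [esp+12] = len
; out: eax = index of the first matching byte, or -1 if none within len bytes
strchar_checked:
        push    edi                     ; edi is callee-saved under stdcall
        mov     edi,[esp+8]             ; str  (offsets are +4 because of the push)
        mov     eax,[esp+12]            ; char (only al is compared)
        mov     ecx,[esp+16]            ; len
        mov     edx,edi                 ; remember the start of the buffer
        cld                             ; scan upward
        test    ecx,ecx
        jz      .notfound               ; len = 0: repne would not run at all
        repne   scasb                   ; stops on a match (ZF=1) or when ecx reaches 0
        jne     .notfound               ; ZF=0 here means the last compare failed too
        lea     eax,[edi-1]             ; edi is one past the matching byte
        sub     eax,edx                 ; convert the pointer to an index
        pop     edi
        ret     12
  .notfound:
        or      eax,-1                  ; hypothetical "no match" marker
        pop     edi
        ret     12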
HaHaAnonymous
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 18:10; edited 1 time in total
Post 21 Jun 2014, 05:20
JohnFound
Sasha wrote:
Of course you may use it. I don't know yet how to submit it there. What's more, you will need to integrate it back into your function, as I've removed some of the preceding code.

OK, I will integrate it back into the library and add a comment about your contribution. I will use the nickname Sasha. If you prefer another nickname, or your real name, let me know by PM.

Submitting to the repository is not so hard, but it requires using the Fossil version control system.
Then you have to register in the main repository to get the needed permissions.

Post 21 Jun 2014, 15:32
JohnFound
HaHaAnonymous wrote:
This method is interesting. I decided to test it and it was a little less than 2 times faster than my ordinary "byte-by-byte" method (608 / 315 ms). The biggest disadvantage I see is that this function uses many more bytes (which of course will not be a problem if speed is more important).


HaHaAnonymous, what kind of strings do you use in the benchmarks? The speed of this procedure depends heavily on the string length. As a rule, the longer the string, the higher the speed gain.

Post 21 Jun 2014, 15:37
HaHaAnonymous
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 18:10; edited 1 time in total
Post 21 Jun 2014, 15:47
Sasha
HaHaAnonymous, it happens when the buffer is not aligned. The word-by-word routine is sensitive to misalignment.
Post 22 Jun 2014, 20:14
Sasha
There is some strange behavior even on aligned strings (unaligned strings behave really unpredictably). Look at the chart below.
Upd: The question is why yellow is faster than blue, as the code is the same (the string IS aligned, so you jump right after test eax,1), and blue is even smaller.


[Attached chart: strlen_strange4.png, 42.85 KB]




Last edited by Sasha on 22 Jun 2014, 23:36; edited 2 times in total
Post 22 Jun 2014, 22:03
AsmGuru62



Joined: 28 Jan 2004
Posts: 1409
Location: Toronto, Canada
AsmGuru62
Maybe try a straightforward version:
Code:
strlen:
        mov     edx, [esp + 4]
        xor     eax, eax
        xor     ecx, ecx

align 16
@@:
        cmp     [edx + eax], cl
        je      .done

        add     eax, 1
        jmp     @r

.done:
        ret
    
Post 22 Jun 2014, 22:57
Sasha
AsmGuru62, do you want to execute all those nops? I think you must first align the entire procedure, and then decide whether you need to fix the alignments inside.
Post 23 Jun 2014, 00:16
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
revolution
How does it compare to rep cmpsb?
Post 23 Jun 2014, 00:22
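For such a comparison, note that cmpsb compares two buffers; the string instruction normally used to find a terminator is scasb. A minimal sketch of the classic repne scasb length routine is given below as a possible baseline. The name strlen_scasb and the stdcall stack layout are assumptions made for illustration, and no timings are implied.
Code:
use32

; Hypothetical baseline (not from the thread).
; in:  [esp+4] = pointer to a zero-terminated string
; out: eax = length, not counting the terminator
strlen_scasb:
        push    edi                     ; edi is callee-saved under stdcall
        mov     edi,[esp+8]             ; str (+4 because of the push)
        xor     eax,eax                 ; al = 0, the byte to look for
        or      ecx,-1                  ; ecx = 0FFFFFFFFh, no practical limit
        cld                             ; scan upward
        repne   scasb                   ; ecx drops by one per byte, terminator included
        not     ecx                     ; = number of bytes scanned, i.e. length + 1
        lea     eax,[ecx-1]
        pop     edi
        ret     4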
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.