flat assembler
Message board for the users of flat assembler.

flat assembler > Main > String length

Goto page 1, 2, 3, 4, 5, 6  Next
Author
Thread Post new topic Reply to topic
Sasha



Joined: 17 Nov 2011
Posts: 93
I've wrote a proc, that computes the length of zero terminated string with the MaxLen parameter, that should prevent buffer overflows in some manner. The main goal is to optimize it for speed as possible(as always:))
Code:
proc    StrzLen1 Strz,MaxLen

        mov     eax,[Strz]
        mov     edx,[MaxLen]
        add     edx,eax
        dec     eax              
  .loop:        ;5 instructions inside the loop
        inc     eax
        cmp     byte[eax],0
        jz      .finish_ok
        cmp     eax,ecx
        jb      .loop
  .too_big:     ;The given string is bigger than MaxLen
        or      eax,-1
        jmp     .finish
  .finish_ok:
        sub     eax,[Strz]
  .finish:
        ret
endp                                                 
    

Code:
proc    StrzLen2 Strz,MaxLen

        mov     ecx,[Strz]
        mov     edx,[MaxLen]
        xor     eax,eax
        dec     eax
  .loop:        ;The same number instructions inside the loop, but less preparations
        inc     eax
        cmp     byte[ecx+eax],0
        jz      .finish
        cmp     eax,edx
        jb      .loop
  .too_big:     ;The given string is bigger than MaxLen
        or      eax,-1
  .finish:
        ret
endp                                                     
    

Code:
proc    StrzLen Strz,MaxLen

        mov     edx,[nMaxLen]

        xor     eax,eax
        sub     eax,edx
        dec     eax

        mov     ecx,[Strz]
        add     ecx,edx
        inc     ecx
  .loop:        ;Due to all this manipulations we've got 4 instructions inside the loop, but does it really faster?
        cmp     byte[ecx+eax],0
        jz      .finish_ok
        inc     eax
        jnz     .loop
  .too_big:     ;The given string is bigger than MaxLen
        or      eax,-1
        jmp     .finish
  .finish_ok:
        inc     eax
  .finish:
        ret
endp                                        
    

Help me to choose the best option, or to optimize it more.
No matter how the parameters are passed(this can be changed), I think only the loop is important.
And yes, no repne =) May be it is good for this particular example, but I'm going to write more complex procs, where I have to check more options(parser), and i'm going to use this functions as a templates.
Post 26 Oct 2013, 17:08
View user's profile Send private message Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3494
Location: Bulgaria
If you want to optimize for speed you definitely should not compare byte by byte. It is the slowest possible solution. Below is the version of StrLen procedure from FreshLib. As you can see, it reads and compares the string by double words.

But what is more important, using ASCIIZ is not the best option both for speed and security. As you can see, when the input string is native FreshLib string (With handle >= $c0000000) it uses format where on the address-4 the actual length is stored (so called Pascal strings). In this case, the speed of the StrLen is O(1), which is the fastest possible at all.

Code:
proc StrLen, .hString  
begin
        mov     eax, [.hString]
        cmp     eax, $c0000000
        jb      .pointer

        stdcall StrPtr, eax
        jc      .error

        mov     eax, [eax+string.len]
        clc
        return

.error:
        xor     eax, eax
        stc
        return

.pointer:
        push    ecx edx esi edi

; align on dword
.byte1:
        test    eax, 3
        jz      .scan

        cmp     byte [eax], 0
        je      .found

        inc     eax
        jmp     .byte1

.scan:
        mov     ecx, [eax]
        mov     edx, [eax+4]

        lea     eax, [eax+8]

        lea     esi, [ecx-$01010101]
        lea     edi, [edx-$01010101]

        not     ecx
        not     edx

        and     esi, ecx
        and     edi, edx

        and     esi, $80808080
        and     edi, $80808080

        or      esi, edi
        jz      .scan

        sub     eax, 9

; byte 0 was found: so search by bytes.
.byteloop:
        lea     eax, [eax+1]
        cmp     byte [eax], 0
        jne     .byteloop

.found:
        sub     eax, [.hString]
        clc
        pop     edi esi edx ecx
        return
endp    
Post 26 Oct 2013, 17:56
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
Scanning by dwords is fast, but I still can't understand how to do it, even looking at your code. And this is just to find a zero byte, and what about more complex scan, like parser? Fasm is very fast, it works with bytes as far as I could understand.
Post 26 Oct 2013, 18:37
View user's profile Send private message Reply with quote
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1181
Location: Unknown
Stupid post removed.


Last edited by HaHaAnonymous on 28 Feb 2015, 19:40; edited 2 times in total
Post 26 Oct 2013, 19:53
View user's profile Send private message Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3494
Location: Bulgaria
Well, it actually uses quad words, by using the parallel instruction execution.
Post 26 Oct 2013, 20:11
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
HaHaAnonymous wrote:
Off-topic note: Scanning byte by byte you can get length of millions of strings in less than a second if I'm not mistaken (human memory can fail).

Let's take a javascript parser. Do you think it is optimized so hard, to scan by dwords and in parallel?)
Btw, I still can't imagine how to scan text by dwords... parallel instruction execution, yes... May be sse?
Note, also, that dwords must be aligned!
Post 26 Oct 2013, 20:30
View user's profile Send private message Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3494
Location: Bulgaria
Sasha wrote:
Btw, I still can't imagine how to scan text by dwords...

Hm, did you read my source? What is not clear? Ask specific questions.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 27 Oct 2013, 04:21
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
Trying to understand the .scan: section. I understand less or more what should be there, but still don't see it. You make manipulations and than or esi,edi gives you an answer if there was a zero byte inside the qword.
But when I sad "can't imagine" I meant pars string for words and expressions.
Post 27 Oct 2013, 07:29
View user's profile Send private message Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3494
Location: Bulgaria
"Parsing" strings usually means comparing one string with another. It is possible as well and pretty easy. You only need to carefully care about the end cases, when the string length is not multiply of 4 (8 ).
Post 27 Oct 2013, 08:14
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
uart777



Joined: 17 Jan 2012
Posts: 369
John: Not true. Byte copy is not the slowest and what you presented is far from the fastest. With that "stdcall StrPtr", your method may be the slowest. And those are not parallel instructions.

Sasha: For small text, inline 8-32BIT copy/scan is fast. For long text, 64BIT-128BIT MMX/XMM copy/scan is recommended. N-01010101h subtracts 1 from each byte in 32BIT, and if it contains zero/s, sign bit/s will be set. Search for Agner Fog's site, optimization tutorial/PDF, one of the best references.
Post 27 Oct 2013, 08:28
View user's profile Send private message Reply with quote
AsmGuru62



Joined: 28 Jan 2004
Posts: 1389
Location: Toronto, Canada
If we talking about parsing, then it is possible that the length of a string is not needed at all.
There is a method of parsing called 'indestructive parsing', where the string being parsed is not destroyed,
i.e. zero bytes are not written into the string.
Instead, the parser simply picks tokens into following structure:
Code:
struct TOKEN
        pchValue dd ?   ; points to token text within parsed string
        nLength  dd ?   ; # of characters in the token
ends
    

This eliminates the need for taking the token length, since it is already will
be known by scanning the parsed string.

Also, an advantage of such method is that original string is left intact. Usual
parsing will force a full copy of an original string to be parsed and destroyed by null characters.
Post 27 Oct 2013, 12:53
View user's profile Send private message Send e-mail Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
This a meant by parser. This is the way, I search for the begining of the token, while space or tab are delimiters, zero is a terminator, and I don't want to exceed the maxlen. Tested, working.
Code:
proc    FindToken Offset,MaxLen

        mov     eax,[Offset]
        mov     ecx,[MaxLen]
        add     ecx,eax
        xor     edx,edx
        dec     eax
  .loop:
        inc     eax
        cmp     eax,ecx
        ja      .finish_too_big
        mov     dl,byte[eax]
        cmp     edx,020h
        je      .loop
        cmp     edx,09h
        je      .loop
        or      edx,edx
        je      .finish_no_token

  .finish:      ;Returns an offset in eax
        ret

  .finish_too_big:
        or      eax,-1
        ret

  .finish_no_token:
        xor     eax,eax
        ret    
endp      
    


Last edited by Sasha on 28 Oct 2013, 16:55; edited 2 times in total
Post 27 Oct 2013, 13:06
View user's profile Send private message Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
Add this proc, and we are ready to parse the command line!(tested,fixed,working)
Code:
proc    GetTokenLength Offset,MaxLen

        mov     eax,[Offset]
        mov     ecx,[MaxLen]
        add     ecx,eax
        dec     eax
        xor     edx,edx
        cmp     byte[eax+1],022h
        jne     .loop
        inc     eax

  .quotedstring:
        inc     eax
        cmp     eax,ecx
        ja      .finish_too_big
        mov     dl,byte[eax]
        cmp     edx,022h
        je      .finish_quote
        or      edx,edx
        jne     .quotedstring
        sub     eax,[Offset]
        ret

  .loop:
        inc     eax
        cmp     eax,ecx
        ja      .finish_too_big
        mov     dl,byte[eax]
        cmp     edx,020h
        je      .finish
        cmp     edx,09h
        je      .finish
        or      edx,edx
        jne     .loop
  .finish:
        sub     eax,[Offset]
        ret

  .finish_quote:
        inc     eax
        sub     eax,[Offset]
        ret

  .finish_too_big:
        or      eax,-1
        ret                             
endp       
    

Some problems found, edited. Now the token can have a zero length, if we pointed to space.
Post 27 Oct 2013, 13:34
View user's profile Send private message Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
Quote:

There is a method of parsing called 'indestructive parsing'

This is what I've done.
Quote:
Usual parsing will force a full copy of an original string to be parsed and destroyed by null characters.

Full copy doesn't destroy the original string, too. Also the full copy of a token has one good advantage: it will become aligned!
Post 28 Oct 2013, 17:00
View user's profile Send private message Reply with quote
ejamesr



Joined: 04 Feb 2011
Posts: 52
Location: Provo, Utah, USA
Sasha,

You can speed up small loops by eliminating a comparison operation. The idea is to have one register point to the end of the data you are scanning (your code currently sets ecx to point to the end, which is good). Then, use another register as a negative counter; set it equal to the negative count, and then increment until it is no longer negative, at which point the end of the data has been met.

Here's the basic idea you can adapt:
Code:
   mov    eax, [Offset]
   mov    ecx, [MaxLen]
   add    eax, ecx      ; point to end of data
   neg    ecx           ; negative counter
.newLoop:
   mov    dl, [eax+ecx] ; at first iteration, this grabs the very first byte
   ; ... do your tests here; use 'dl' rather than edx, as it's faster
   ;     in this context.
   ; ... but if you want to use edx, change above
   ;     to 'movzx edx, byte [eax+ecx]
   inc    eax           ; this one operations updates the counter AND
                        ; checks to see if it is within bounds
   js     .newLoop      ; keep going as long as eax is negative

; loop flows through when end is reached
.doFinish:
   add    eax, [MaxLen] ; eax is neg, so add rather than sub
   ret    


I havne't compiled the above so there may be an error or two, it's up to you to adapt and test it.

Enjoy!

Eric

P.S. In the .loop section of your code, ecx points to the first invalid position at the end of the string; you might consider using 'jae .finish_too_big', since if eax is equal to ecx, it's already past the end of the string.
Post 29 Oct 2013, 16:07
View user's profile Send private message Send e-mail Reply with quote
ejamesr



Joined: 04 Feb 2011
Posts: 52
Location: Provo, Utah, USA
John,

Since Sasha was focusing on speed here, I would like to point out two things that could make your code slightly faster and smaller (the speed difference is negligible and will not be seen in the vast majority of scenarios, so you can certainly disregard my comments).

In the first snippet:
Code:
proc StrLen, .hString  
begin
        mov     eax, [.hString]
        cmp     eax, $c0000000
        jb      .pointer

        stdcall StrPtr, eax
        jc      .error

        mov     eax, [eax+string.len]
        clc
        return    
the carry flag has already been tested, so it will be clear before the return ('mov' does not affect the CF); you can eliminate the 'clc' instruction. However, in my code I will typically keep the clc, "just in case" in the future I change the code in a way that might set the CF (as I see you've done here).

In the second snippet:
Code:
; byte 0 was found: so search by bytes.
.byteloop:
        lea     eax, [eax+1]
        cmp     byte [eax], 0
        jne     .byteloop    
you could use 'inc eax' for the first line, which is just one byte (instead of three) and could be faster such as the case where the extra two bytes cause the code to spill over to another cache line of the instruction cache.

Just my unasked-for two cents.

Regards,

Eric
Post 29 Oct 2013, 16:38
View user's profile Send private message Send e-mail Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3494
Location: Bulgaria
ejamesr, thanks for the suggestions. These are really things I missed.
Post 29 Oct 2013, 17:26
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Sasha



Joined: 17 Nov 2011
Posts: 93
Quote:

You can speed up small loops by eliminating a comparison operation.

Seems it's what I've done in the third example of strlen.
Quote:

P.S. In the .loop section of your code, ecx points to the first invalid position at the end of the string; you might consider using 'jae .finish_too_big', since if eax is equal to ecx, it's already past the end of the string.

First, I thought, that MaxLen will be a maximum length of the string excluding the zero byte. But then I have understood, that it is more intuitive to pass the maximum available memory in the MaxLen, and have already changed the ja to jae. Thank you for your attention.
PS. What do you think, is it more optimal and intuitive to use MaxAddr instead of MaxLen? MaxLen needs to be decremented every time, but MaxAddr always stays the same.
Post 04 Nov 2013, 20:54
View user's profile Send private message Reply with quote
ejamesr



Joined: 04 Feb 2011
Posts: 52
Location: Provo, Utah, USA
Correct... but only for your third example of strlen. And, Oops!, my untested code snippet had some errors, it should have been more like this:
Code:
   mov    eax, [Offset]
   mov    ecx, [MaxLen]
   add    eax, ecx      ; point to end of data
   neg    ecx
   dec    ecx           ; use as negative counter
.newLoop:
   inc    ecx           ; this one operation updates the counter AND
                        ; checks to see if it is within bounds
   jns    .finishTooBig ; exit when ecx no longer negative
   mov    dl, [eax+ecx] ; at first iteration, this grabs the very first byte
   ; ... do your tests here; use 'dl' rather than edx, as it's faster
   ;     in this context.
   ; ... but if you want to use edx, change above
   ;     to 'movzx edx, byte [eax+ecx]

; loop flows through when token is reached
.doFinish:
   lea    eax, [eax+ecx] ; make eax point to token start
   ret    

My comment was actually directed to your implementations of FindToken and GetTokenLength. In both of those you use the following code:
Code:
        inc     eax
        cmp     eax,ecx
        ja      .finish_too_big    

If you use the 'negative counter' method I described here, you could eliminate the 'cmp eax, ecx' line and it would be slightly faster (using nine lines instead of ten for the inner loop). Study it carefully and you can see.

And there's still at least a couple more small changes that could speed up your loop even more, if you wanted.

Best wishes,

Eric
Post 04 Nov 2013, 23:31
View user's profile Send private message Send e-mail Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16513
Location: M87*
Where are the speed measurement figures? Without measurement how do you know if you are improving anything?
Post 04 Nov 2013, 23:35
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2, 3, 4, 5, 6  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2019, Tomasz Grysztar.

Powered by rwasa.