flat assembler
Message board for the users of flat assembler.

Index > Main > Optimize str.len for FASMLIB

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 02 Jan 2007, 09:49
Hi, as the string library in FASMLIB already got to stage where i won't change it's functionality, maybe it's time to optimize.

You could challenge yourself and try to make it faster. Here is original interface and code. You can copy-paste it to your project for measuring speed.

Of course, you will stay noted as author in sources, as long as your version is fastest.

Code:
;; str.len
; desc: gets length of string
; args: string - string pointer
;       buflen - length of string buffer
; ret: CF set on error, otherwise
;      OF set and EAX=buflen if ending 0 wasn't found in buffer, otherwise
;      OF clear and EAX=length of string
; error: ERR_ZERO_SIZE - buflen=0
;================================
proc str.len string, buflen
        push    ecx edi
        pushf
        cld

        ;ERR_ZERO_SIZE if buflen=0
        cmp     [buflen], 0
        je      .error_buflen_zero

        ;clear OF in stack
        and     word [esp], not (1 shl 11)

        ;find ending zero in buffer
        mov     ecx, [buflen]
        mov     edi, [string]
        xor     al,al
        repne   scasb

        ;set OF if ending 0 wasn't found
        je      @f
        or      word [esp], 1 shl 11
        dec     ecx             ;to disable effect of "dec eax", if there was no ending zero (ugly, i know...)
@@:

        ;EAX = number of readen chars
        mov     eax, [buflen]
        sub     eax, ecx
        dec     eax             ;ending zero

        ;clear CF
        and     byte [esp], not 1

.r:     popf
        pop     edi ecx
        ret

.error_buflen_zero:
        mov     eax, ERR_ZERO_SIZE
.rc:    or      byte [esp], 1   ;set CF in stack
        jmp     .r

endp
    


You can use instruction up to pentium. Even versions with usage of MMX, SSE etc. are welcomed.
Post 02 Jan 2007, 09:49
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 02 Jan 2007, 09:50
as an inspiration, here is very fast version of strlen, unfortunatelly it is oldschool C strlen() without buffer size checking:

Code:
strlen  proc
        mov     ecx,[esp+4]
        test    ecx,3
        je      short main_loop

str_misaligned:
        mov     al,byte ptr [ecx]
        inc     ecx
        test    al,al
        je      short byte_3
        test    ecx,3
        jne     short str_misaligned

        add     eax,dword ptr 0

        align   16

main_loop:
        mov     eax,dword ptr [ecx]
        mov     edx,7efefeffh
        add     edx,eax
        xor     eax,-1
        xor     eax,edx
        add     ecx,4
        test    eax,81010100h
        je      short main_loop
        ; found zero byte in the loop
        mov     eax,[ecx - 4]
        test    al,al                   ; is it byte 0
        je      short byte_0
        test    ah,ah                   ; is it byte 1
        je      short byte_1
        test    eax,00ff0000h           ; is it byte 2
        je      short byte_2
        test    eax,0ff000000h          ; is it byte 3
        je      short byte_3
        jmp     short main_loop         ; taken if bits 24-30 are clear and bit
                                        ; 31 is set

byte_3:
        lea     eax,[ecx - 1]
        mov     ecx,string
        sub     eax,ecx
        ret
byte_2:
        lea     eax,[ecx - 2]
        mov     ecx,string
        sub     eax,ecx
        ret
byte_1:
        lea     eax,[ecx - 3]
        mov     ecx,string
        sub     eax,ecx
        ret
byte_0:
        lea     eax,[ecx - 4]
        mov     ecx,string
        sub     eax,ecx
        ret
strlen  endp    
Post 02 Jan 2007, 09:50
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 02 Jan 2007, 20:53
If the purpose is a very fast string len then using 16byte aligned string buffers with enough padding at the end of the string would allow you to use SSE which would be the fastest method (on large strings).

That MMX string len (which is just a port of the C routine above from general purpose to mmx, on a 64bit system you could do the same using the general purpose registers) that's floated around these forums would probably be teh fastest taking into account all situations (large, small strings, etc).
Post 02 Jan 2007, 20:53
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 02 Jan 2007, 23:18
You need to make some thought about what string lengths and alignments you're expecting to see most. If you're only getting short strings, there might be too much startup overhead in MMX/SSE. If all your input strings are aligned, there's no reason to do alignment yourself, et cetera.

My guesstimate is that most strings in most apps are going to be rather short, so for a generic routine something similar to what you posted should do fairly well, leaving MMX/SSE routines to the specific cases where you know you'll be dealing with huge strings.
Post 02 Jan 2007, 23:18
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 02 Jan 2007, 23:34
do you think i should make some "str.len.MMX"? This wouldn't be of that much importance now Smile

still, the way suggested in second post is much faster.
Post 02 Jan 2007, 23:34
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 03 Jan 2007, 06:34
You might also want to check this one out, from Donkey's string lib (goasm syntax):
Code:
; from donkey's string lib, see www.assembler.ca
lszLen FRAME pString
        uses ebx,edx,ecx,esi

                mov     eax,[pString]
                mov esi,eax
                test eax,3
                jz >N1
                        add eax,3
                        and eax,-4
ALIGN 4
                :
                        mov dl,[esi]
                        or dl,dl
                        jz >E1
                        inc esi
                        cmp esi,eax
                jl <

                N1:
                lea             edx,[eax+3]

ALIGN 4
        :  
                mov             ebx,[eax]
                add             eax,4
                lea             ecx,[ebx-01010101h]
                not             ebx
                and             ecx,ebx
                and             ecx,80808080h    
                jz              <
                test    ecx,00008080h
                jnz             >
                shr             ecx,16
                add             eax,2
        :
                shl             cl,1
                sbb             eax,edx

        add eax,esi
        sub eax,[pString]
        ret

        E1:
        mov eax,esi
        sub eax,[pString]
        ret
ENDF
    
Post 03 Jan 2007, 06:34
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 03 Jan 2007, 06:43
reminds the version i posted in second post. thanks
Post 03 Jan 2007, 06:43
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 03 Jan 2007, 23:17
Wouldn't it be nice to have a program benchmark these functions (strlen, memcpy, etc) with different input sizes in bytes and graph the results...
Post 03 Jan 2007, 23:17
View user's profile Send private message Visit poster's website Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 04 Jan 2007, 02:26
hi,
might worth a try
Fast Way to Get String Length (StrLen)

i vote for StrLen32Z
thank revolution for the input,
i just tweaked it a bit.
Post 04 Jan 2007, 02:26
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 04 Jan 2007, 06:48
thanks. but note that i still must also check buffer size (buflen), and never access memory behind it:

Code:
;; str.len 
; desc: gets length of string 
; args: string - string pointer 
;       buflen - length of string buffer 
; ret: CF set on error, otherwise 
;      OF set and EAX=buflen if ending 0 wasn't found in buffer, otherwise 
;      OF clear and EAX=length of string 
; error: ERR_ZERO_SIZE - buflen=0 
;=================================    
Post 04 Jan 2007, 06:48
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 04 Jan 2007, 17:19
vid wrote:
thanks. but note that i still must also check buffer size (buflen), and never access memory behind it:

Code:
;; str.len 
; error: ERR_ZERO_SIZE - buflen=0 
;=================================    


Code:
.error_buflen_zero:     mov     eax, ERR_ZERO_SIZE
    



what is the purpose of this? ;/
Post 04 Jan 2007, 17:19
View user's profile Send private message Visit poster's website Reply with quote
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 04 Jan 2007, 18:20
vid wrote:
You can use instruction up to pentium. Even versions with usage of MMX, SSE etc. are welcomed.

SSE set was released with Pentium III...
Post 04 Jan 2007, 18:20
View user's profile Send private message Visit poster's website Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 05 Jan 2007, 13:57
One optimization of design is to use "length-prefixed" strings, aka:

1 byte (or more if the string is long) representing the length
then follows the string

Finding the length requires the trivial load from memory operation (via mov or any other optimization you need).

But that's different design so probably it's not such a good idea Wink
Post 05 Jan 2007, 13:57
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 05 Jan 2007, 14:20
grey beast: that is another area. such strings are planned too, but we also must support classical strings too.
Post 05 Jan 2007, 14:20
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 05 Jan 2007, 14:22
Two words: hybrid strings.
Post 05 Jan 2007, 14:22
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 05 Jan 2007, 14:24
vid wrote:
; ret: CF set on error, otherwise
; OF set and EAX=buflen if ending 0 wasn't found in buffer, otherwise
; OF clear and EAX=length of string
; error: ERR_ZERO_SIZE - buflen=0

Shouldn't buflen=0 just cause the OF set with EAX=0? It appears to me as a logical conclusion from the above: in buffer of length 0 the ending 0 cannot be found, and thus we should get OF=1 and EAX=0.
Post 05 Jan 2007, 14:24
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 05 Jan 2007, 14:27
The_Grey_Beast, f0dder: the problem is to give a solution without changing the presumptions. Wink
Post 05 Jan 2007, 14:27
View user's profile Send private message Visit poster's website Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 05 Jan 2007, 15:22
I know, that's why I stated it as a design optimization, rather than as the optimization itself (on the same design). I won't be of much help since I am unfamiliar with most MMX and SSE yet...
Post 05 Jan 2007, 15:22
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 05 Jan 2007, 16:34
Quote:
Shouldn't buflen=0 just cause the OF set with EAX=0? It appears to me as a logical conclusion from the above: in buffer of length 0 the ending 0 cannot be found, and thus we should get OF=1 and EAX=0.
my intention was to help catching bugs. Such case (buflen=0) is rarely needed by user, but can happen as bug. I already reduced few such zero-size catching in other places, but i am not conviced if this one should be too.

Especially for strlen, which would be mostly used on entire strings (not just some part of them, that can happen to be of zero length).
Post 05 Jan 2007, 16:34
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 06 Jan 2007, 12:44
Thus maybe you should rewrite the description to something like:
"OF set and EAX=buflen if buffer is not empty, but doesn't contain ending 0". Wink
Post 06 Jan 2007, 12:44
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.