flat assembler
Message board for the users of flat assembler.

Index > Main > Fast Way to Get String Length (StrLen)

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
Code:
macro bdisplay .__stringvar002 {
mov si,.__stringvar002
mov bl,10
call bwritestring
}
macro dw_strlength .__stringvar001 {dw ($-.__stringvar001-1) and $ffff}
org 256

mov si,msg
call StrLen32R ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags

push cx
bdisplay lengthtext
pop cx

mov ax,cx
call bwriteint

xor ah,ah
int 16h

ret

msg: db 'this is a test string.',0
dw_strlength msg ;length_msg dw endmsg-msg-1
lengthtext: db 'Test string length:',0

StrLen32R: ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags
        push    eax
        mov     cx,si
.byte:  test    si,11b
        jz      .dword
        lodsb
        or      al,al
        jnz     .byte
        jmp     .done
.dword: lodsd
        or      al,al
        jz      .lastdd
        or      ah,ah
        jz      .thisdd
        test    eax,0ff0000h
        jz      .thirdd
        test    eax,0ff000000h
        jnz     .dword
        inc     si
.thirdd:inc     si
.thisdd:inc     si
.lastdd:sub     si,3
.done:  dec     si
        sub     cx,si
        neg     cx
        pop     eax
ret

bwriteint:     ; AX = signed number, BL = base
  or ax,ax
  jns .printit  ; if it's not negative, we don't print any signs
  push ax
  mov ax,$e2d
  int 10h
  pop ax
  neg ax        ; do the number positive
  .printit:
  push ax bx cx dx
  and bx,$ff
  cmp bl,2 ; base can't be less than 2 Smile
  jge .start
  mov bl,10     ; using bx = 10 instead
  .start:
  xor cx,cx     ; cx = 0
  .new:
  xor dx,dx     ; dx = 0
  div bx        ; number / base
  push dx       ; push the remainder
  inc cx        ; increase the "digit-count"
  or ax,ax      ; if the quotient still is not 0, do it once more
  jnz .new
  .loop:
  pop ax        ; pop the remainder
cmp al,10
sbb al,69h
das
mov ah,$e
int 10h
  loop .loop
  pop dx cx bx ax
ret

bwritestring: ; Writes a 0 terminated string to screen ( string is at ds:si )
push ax bx
mov bx,7 ; use video page 0, normal white
mov ah,$e
localloop:
lodsb
or al,al
jnz next_char
pop bx ax
ret
next_char:
int 10h
jmp localloop
    

[Matrix_Spell_Checking=adjust ['lenght' to 'length'],reason="For Better searching"]Matrix[/Matrix_Spell_Checking]


Last edited by Matrix on 21 Nov 2004, 09:14; edited 2 times in total
Post 18 Nov 2004, 05:47
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17248
Location: In your JS exploiting you and your system
revolution
You need to optimize the instructions you use to take full advantage ot the later processors like P4.

INC --> ADD
OR --> TEST

And remove the difficult to predict branches.

Agner Fogg has published a paper about this topic.

Basically he suggests this:

Code:
;edx=string start
 lea     ecx,[edx+4]             ;load and increment pointer
 mov     ebx,[edx]               ;read first 4 bytes
 lea     edx,[edx+7]             ;pointer+7 used in the end
._1:  lea     eax,[ebx-01010101h]     ;subtract 1 from each byte
  xor     ebx,-1                  ;invert all bytes
   and     eax,ebx                 ;and these two
      mov     ebx,[ecx]               ;read next 4 bytes
  add     ecx,4                   ;increment pointer
  and     eax,80808080h           ;test all sign bits
 jz      ._1                     ;no zero bytes, continue loop
       test    eax,00008080h           ;test first two bytes
       jnz     ._2
 shr     eax,16                  ;not in the first 2 bytes
   add     ecx,2
._2:       shl     al,1                    ;use carry flag to avoid a branch
   sbb     ecx,edx                 ;compute length
     lea     edx,[edx-7]             ;restore pointer
    
Post 18 Nov 2004, 08:33
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
You really ARE a revolution Very Happy . A man worthy of his name.

Tight and fast - gotta luv those lea's.
Post 18 Nov 2004, 13:52
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
hy revolution,
its cool Cool
Post 19 Nov 2004, 03:17
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
I re-read the title (FAST) so short C-compiler code doesn't help much
Code:
    mov     edx,ebx
    cmp     byte ptr [ebx],0
    je      l2
l1: mov     ah,[ebx+1]       ; U
    inc     ebx              ;   V
    test    ah,ah            ; U
    jne     l1               ;   V +1brt
l2: sub     ebx,edx
;Pre PIII optimized I think
    

so I found this DWORD scanning algorithm:
Code:
; by Paul Hsieh

    lea     ecx,[ebx-1]
l1: inc     ecx
    test    ecx,3
    jz      l2
    cmp     byte ptr [ecx],0
    jne     l1
    jmp     l6
l2: mov     eax,[ecx]        ; U
    add     ecx,4            ;   V
    test    al,al            ; U
    jz      l5               ;   V
    test    ah,ah            ; U
    jz      l4               ;   V
    test    eax,0ff0000h     ; U
    jz      l3               ;   V
    test    eax,0ff000000h   ; U
    jnz     l2               ;   V +1brt
    inc     ecx
l3: inc     ecx
l4: inc     ecx
l5: sub     ecx,4
l6: sub     ecx,ebx
    

OK, I won't post all the code here
Go and check out http://www.azillionmonkeys.com/qed/asmexample.html
for more code (it under 5.)

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 19 Nov 2004, 15:28
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
hy Madis731
your code seems familiar to me from somewhere Smile
(its nearly the same i have posted)
Post 19 Nov 2004, 16:51
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
Oh, sorry, I can see sarcasm in your post :$

Now I see you indeed have posted the same code - I didn't look it that closely before. How come it's the same solution with different registers ^o)
even with different types (WORD/DWORD) of register?
Post 19 Nov 2004, 19:15
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
well exactly i wrote that function i posted,
i wrote it just before i posted it Smile
and its in my style, they are just familiar,
same structure, but i see the system in my codes,
in other's codes there used to be difference - however i'm on
doing some class codes too. Cool
my codes usually are what i design them to be, depennding the purpose of code, i guess some of you are doing familiar.
in some cases two codes can be the same without sarcasm.
especially when the code is short ( not a 10 MB source Smile )
Post 19 Nov 2004, 20:47
View user's profile Send private message Visit poster's website Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
hy all,
hy revolution
you avoided a branch, (however it didn't help much)
and i have removed some unneccessary instructions Smile
(StrLen32M)
and after dword aligning it could be faster: (StrLen32Z)
Code:
macro bdisplay .__stringvar002 {
mov si,.__stringvar002
mov bl,10
call bwritestring
}
macro dw_strlength .__stringvar001 {dw ($-.__stringvar001-1) and $ffff}
org 256

mov si,msg
call StrLen32R ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags

push cx
bdisplay lengthtext
pop cx

mov ax,cx
call bwriteint
call bendline

mov edx,msg
call StrLen32 ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags

push cx
bdisplay lengthtext
pop cx

mov ax,cx
call bwriteint
call bendline

mov esi,msg
call StrLen32M ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags

push cx
bdisplay lengthtext
pop cx

mov ax,cx
call bwriteint
call bendline

mov si,msg
call StrLen32Z ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags

push cx
bdisplay lengthtext
pop cx

mov ax,cx
call bwriteint
call bendline

xor ah,ah
int 16h

ret

msg: db 'this is a test string.',0
;msg: db '123',0
dw_strlength msg ;length_msg dw endmsg-msg-1
lengthtext: db 'Test string length:',0

StrLen32R: ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags
        mov     cx,si
.byte:  test    si,11b
        jz      .dword
        lodsb
        or      al,al
        jnz     .byte
        jmp     .done
.dword: lodsd
        or      al,al
        jz      .lastdd
        or      ah,ah
        jz      .thisdd
        test    eax,0ff0000h
        jz      .thirdd
        test    eax,0ff000000h
        jnz     .dword
        inc     si
.thirdd:inc     si
.thisdd:inc     si
.lastdd:sub     si,3
.done:  dec     si
        sub     cx,si
        neg     cx
ret

StrLen32: ;edx=string start, returns: ecx=length
        lea     ecx,[edx+4]             ;load and increment pointer
        mov     ebx,[edx]               ;read first 4 bytes
        lea     edx,[edx+7]             ;pointer+7 used in the end
._1:    lea     eax,[ebx-01010101h]     ;subtract 1 from each byte
        xor     ebx,-1                  ;invert all bytes
        and     eax,ebx                 ;and these two
        mov     ebx,[ecx]               ;read next 4 bytes
        add     ecx,4                   ;increment pointer
        and     eax,80808080h           ;test all sign bits
        jz      ._1                     ;no zero bytes, continue loop
        test    eax,00008080h           ;test first two bytes
        jnz     ._2
        shr     eax,16                  ;not in the first 2 bytes
        add     ecx,2
._2:    shl     al,1                    ;use carry flag to avoid a branch
        sbb     ecx,edx                 ;compute length
        lea     edx,[edx-7]             ;restore pointer
ret

StrLen32M: ;esi=string start, returns: ecx=length
    mov    ecx,esi
._1:lodsd
    lea    ebx,[eax-$01010101]
    xor    eax,-1
    and    ebx,eax
    and    ebx,$80808080
    jz     ._1
    add    ecx,4
.lp:shr    ebx,8
    jc     .sb
    dec    ecx
    jnz    .lp
.sb:neg    ecx
    add    ecx,esi
ret

StrLen32Z: ; input: esi=offset to string ; output: ecx=length (w/o 0) esi=pointer to end of string ; destroys: eflags
        mov     eax,esi
        and     eax,11b
        jz      .dword
        mov     ecx,4
        sub     ecx,eax
.byte:  repnz   scasb
        or      cx,cx
        jz      .dword
        mov     ecx,esi
        jmp     .sb
.dword: mov     ecx,esi
.loop:  lodsd
        lea     ebx,[eax-$01010101]
        xor     eax,-1
        and     ebx,eax
        and     ebx,$80808080
        jz     .loop
    add    ecx,4
.lp:shr    ebx,8
    jc     .sb
    dec    ecx
    jnz    .lp
.sb:neg    ecx
    add    ecx,esi
ret

bwriteint:     ; AX = signed number, BL = base
  or ax,ax
  jns .printit  ; if it's not negative, we don't print any signs
  push ax
  mov ax,$e2d
  int 10h
  pop ax
  neg ax        ; do the number positive
  .printit:
  push ax bx cx dx
  and bx,$ff
  cmp bl,2 ; base can't be less than 2 Smile
  jge .start
  mov bl,10     ; using bx = 10 instead
  .start:
  xor cx,cx     ; cx = 0
  .new:
  xor dx,dx     ; dx = 0
  div bx        ; number / base
  push dx       ; push the remainder
  inc cx        ; increase the "digit-count"
  or ax,ax      ; if the quotient still is not 0, do it once more
  jnz .new
  .loop:
  pop ax        ; pop the remainder
cmp al,10
sbb al,69h
das
mov ah,$e
int 10h
  loop .loop
  pop dx cx bx ax
ret

bendline: ; ends the line ( adds y loc = 1 x loc = 0 )
push ax bx ; returns:  AX = 0xE0A ; BX = 7
mov bx,7
mov ax,$E0D
int 10h
mov al,$A
int 10h
pop bx ax
ret

bwritestring: ; Writes a 0 terminated string to screen ( string is at ds:si )
push ax bx
mov bx,7 ; use video page 0, normal white
mov ah,$e
localloop:
lodsb
or al,al
jnz next_char
pop bx ax
ret
next_char:
int 10h
jmp localloop
    

ps.: its a good feeling to know what am i doing Smile
Madis731 i like LEA too, but i don't use it when i don't need it
Post 21 Nov 2004, 11:41
View user's profile Send private message Visit poster's website Reply with quote
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Octavio
This is not the kind of things that needs optimization,but just for fun here is my version, the main loop is similar to the one used by Agner fog ,but
the pointer is aligned ,and from my tests if the code is in the adecuate position it can process 4 bytes in 3 cpu clocks.

use32
strlen: ;uses the 'C' calling convention the pointer is pushed on the stack

;registers are saved and result is returned in eax
push ecx
push esi
mov esi,[esp+4+8]
mov eax,[esi]
and esi,-4
.l1:
add esi,4
lea ecx,[eax-1010101h]
not eax
and ecx,eax
mov eax,[esi]
and ecx,80808080h
jz .l1
bsf eax,ecx
sub esi,[esp+4+8]
sar eax,3
lea eax,[eax+esi-4]
pop esi
pop ecx
ret ;replace by 'ret 4' to use the Pascal calling convention
Post 21 Nov 2004, 17:10
View user's profile Send private message Visit poster's website Reply with quote
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
roticv
Matrix what kind of strings are you interested in? Long strings?
Post 21 Nov 2004, 17:20
View user's profile Send private message Visit poster's website MSN Messenger Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
thnx for all for ideas and everything,
for me this thread is complete with my StrLen32Z version, of course if someone wanna continue optimizing, feel free to do so.

looks like MMX is next step, but for a 0 terminated string? Smile
well i don't see any use of it yet.
Post 21 Nov 2004, 17:39
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
Didn't you look under the link I gave.
It already has an MMX solution! Oh, I guess I have to post it anyways.
You people are so lazy Razz
Code:
; MMX version by Ryan Mack
; Roughly 13 + 3n + BRANCH clocks on a P-II

const unsigned __int64 STRINGTBL[8] = {0, 0xff,
        0xffff, 0xffffff, 0xffffffff, 0xffffffffff,
        0xffffffffffff, 0xffffffffffffff}

/* ... */

    pxor     mm1, mm1
    mov      ecx, eax
    mov      edx, eax
    and      ecx, -8
    and      eax, 7
    movq     mm0, [ecx]
    por      mm0, [STRINGTBL+eax*8]
MAIN:
    add      ecx, 8
    pcmpeqb  mm0, mm1
    packsswb mm0, mm0
    movd     eax, mm0
    movq     mm0, [ecx]
    test     eax, eax
    jz       MAIN

    bsf      eax, eax
    shr      eax, 2
    lea      ecx, [ecx+eax-8]
    sub      ecx, edx
    
Post 21 Nov 2004, 17:57
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
Madis731 wrote:
Didn't you look under the link I gave.
It already has an MMX solution! Oh, I guess I have to post it anyways.
You people are so lazy Razz

Smile
you're so right
Post 21 Nov 2004, 18:01
View user's profile Send private message Visit poster's website Reply with quote
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85
Care should be taken in the MMX version if the string spans a page(8-byte access may cause a page fault).

Not that I like this kind of strings(they're overall slow per nature), but will be a big win "align 8" string allocations.
Post 03 Feb 2005, 12:24
View user's profile Send private message Reply with quote
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S.
Nope, MMX data may be located at any address, the alignment must be done for data accesed by some SSE instruction like MOVAPS.
However, data shouldn't straddle a cache-line boundary because this gives speed penalty.
Post 04 Feb 2005, 10:11
View user's profile Send private message Reply with quote
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85
S.T.A.S., I'm refering on general memory access. What about if four bytes fall into your valid page, and next 4 into an unmapped page? AV, eh?
Post 04 Feb 2005, 13:36
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
This is a rare occasion but might happen - like when you reserve 1007 Bytes in the end of a page and pass 126QW to your MMX routine. I haven't tryed but maybe MMX fills unknown values with zeroes Very Happy or random Wink
Post 04 Feb 2005, 14:57
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85
The solution is so simple, align your C-strings on qword boundary, or better, forget C(reep)-strings.
Post 04 Feb 2005, 18:41
View user's profile Send private message Reply with quote
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85
Madis731 wrote:
I haven't tryed but maybe MMX fills unknown values with zeroes Very Happy or random Wink

That would be silly. MMX gives the same feedback.

Code:
invoke VirtualAlloc, 0, 8192, MEM_RESERVE, PAGE_NOACCESS
invoke VirtualAlloc, eax, 4096, MEM_COMMIT, PAGE_READWRITE

add eax, 4096-4
movq mm0, [eax]    

_________________
"I assemble, therefore I am"

If you got some spare time, visit my blog: http://www.beppe.theblog.com.br/ and sign my guestmap
Post 04 Feb 2005, 20:05
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.