Finding string length with the help of mmx instructions.




Mac2004 12 Nov 2008, 18:02
Hi!

Here's an example of how to use the MMX extensions to find the length of a null-terminated string.

Code:
align 8
;********************************************************************************************
; MMX_Find_String_Length        Finds string length in bytes (excluding terminating null) by
;                               using mmx cpu extensions. The procedure requires mmx support
;                               (bit 23 in CPUID standard function 1) to be detected before
;                               using this.
;
; Input:  esi--> A source string pointer, which must be 8 byte aligned.
;
; Output: ecx--> String length in bytes.
;
; Note:   edx, mm1 and mm2 are modified. No emms is executed here, so the caller must
;         execute emms before using x87 FPU instructions.
;
;DISCLAIMER:    You can use this code only at your own risk. There is no warranty of any kind,
;               either express or implied! You acknowledge that this code may contain bugs,
;               although not intentional ones.
;
;********************************************************************************************

MMX_Find_String_Length:

        push eax
        push ebx
        push esi

        mov ebx,temp_qword                      ;Point ebx at the temp buffer that receives the compare result.
        pxor mm1,mm1                            ;Clear mmx register.
        xor ecx,ecx                             ;Clear the length counter.

        ;--------------------------
        ;Compare 8 bytes each time.
        ;--------------------------
.compare_loop:

        movq mm2,qword[esi]                     ;Get 8 bytes.
        pcmpeqb mm2,mm1                         ;Compare all 8 bytes with zero. pcmpeqb overwrites mm2
                                                ;with the result: 0xff for each zero byte, else 0x00.
        movq qword[ebx],mm2                     ;Store the result qword to memory for integer testing.

        cmp dword[ebx],0                        ;Anything from the first dword?
        jnz .first_dword_has_a_hit              ;If non zero value is detected, we have a hit..

        cmp dword[ebx+4],0                      ;Anything from the second dword?
        jnz .second_dword_has_a_hit             ;If non zero value is detected, we have a hit..

        add ecx,8                               ;No luck with these eight bytes.
        add esi,8                               ;Update source pointer.
        jmp .compare_loop                       ;Test next qword.

        ;----------------------------------------
        ;Now we need to finalize our scan by
        ;inspecting which byte produced the hit.
        ;----------------------------------------

.second_dword_has_a_hit:
        add ecx,4                               ;Detected length is at least four bytes longer..
        mov eax,[ebx+4]                         ;Get the second dword under inspection.
        jmp .start_byte_testing

.first_dword_has_a_hit:
        mov eax,[ebx]                           ;Get the first dword under inspection.
.start_byte_testing:

        ;Test individual bytes.
        ;---------------------
        mov edx,eax                             ;Load original dword.
        and edx,0x000000ff                      ;First byte?
        jnz .done
        inc ecx

        mov edx,eax                             ;Restore original dword.
        and edx,0x0000ff00                      ;Second byte?
        jnz .done
        inc ecx

        mov edx,eax                             ;Restore original dword.
        and edx,0x00ff0000                      ;Third byte?
        jnz .done
        inc ecx

        mov edx,eax                             ;Restore original dword.
        and edx,0xff000000                      ;Fourth byte? It must be the hit if we get here.
;       jnz .done

.done:
        pop esi
        pop ebx
        pop eax
        ret

align 8

        temp_qword      dq 0                    ;A temporary buffer reservation.
    


Bug reports, optimizations and other comments are by all means welcome. I think there are ways to optimize the code further, but this is intended to be an example.

regards,
Mac2004
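
For readers who want to step through the logic of the routine above outside an assembler, here is a rough portable C model of it. It is only a sketch: the names `pcmpeqb_zero` and `qword_strlen` are made up for illustration, the compare is emulated with plain integer code instead of MMX, and the buffer is assumed to be padded with zeros up to a multiple of 8 bytes so that reading whole qwords never runs past it.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of PCMPEQB against an all-zero register: each result byte is
   0xFF where the source byte is 0x00, and 0x00 otherwise. */
static uint64_t pcmpeqb_zero(uint64_t q)
{
    uint64_t mask = 0;
    for (int i = 0; i < 8; i++) {
        if (((q >> (8 * i)) & 0xFF) == 0)
            mask |= (uint64_t)0xFF << (8 * i);
    }
    return mask;
}

/* Hypothetical helper: length of a zero-terminated string, scanned one
   qword at a time. Assumes the buffer is zero-padded to a multiple of 8. */
static size_t qword_strlen(const char *s)
{
    size_t len = 0;
    for (;;) {
        /* Build the qword in little-endian order, as movq would on x86. */
        uint64_t q = 0;
        for (int i = 0; i < 8; i++)
            q |= (uint64_t)(unsigned char)s[len + i] << (8 * i);

        uint64_t mask = pcmpeqb_zero(q);
        uint32_t lo = (uint32_t)mask;          /* bytes 0..3 of the result */
        uint32_t hi = (uint32_t)(mask >> 32);  /* bytes 4..7 of the result */

        if (lo == 0 && hi == 0) {              /* no terminator in this qword */
            len += 8;
            continue;
        }

        uint32_t hit = lo;
        if (lo == 0) {                         /* terminator is in the second dword */
            len += 4;
            hit = hi;
        }
        while ((hit & 0xFF) == 0) {            /* walk to the first 0xFF byte */
            hit >>= 8;
            len++;
        }
        return len;
    }
}
```

With a buffer declared as `char buf[16] = "Hello World";`, calling `qword_strlen(buf)` gives 11, which matches what the test program in the next post prints.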



DJ Mauretto 12 Nov 2008, 19:26
Good :)
Here is a program to test it:
Code:
format PE CONSOLE 4.0 
entry start 

Include 'win32a.inc'   


;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; CODE ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
section ".code" code readable writeable executable

start:
      mov     esi,Hello
   call    MMX_Find_String_Length
      push    ecx
 push    Hello                   ; Offset String zero terminated
     call    [printf]
    add     esp,4
       push    String                  ; Offset String zero terminated
     call    [printf]
    add     esp,8
       ret


;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; PROC ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================

align 8
;********************************************************************************************
; MMX_Find_String_Length        Finds string length in bytes (excluding terminating null) by
;                               using mmx cpu extensions. The procedure requires mmx support
;                               (bit 23 in CPUID standard function 1) to be detected before
;                               using this. 
;
; Input:  esi--> A source string pointer, which must be 8 byte aligned.
;
; Output: ecx--> String length in bytes.
;
;
;DISCLAIMER:    You can use this code only at your own risk. There is no warranty of any kind,
;               either express or implied! You acknowledge that this code may contain bugs,
;               although not intentional ones.
;
;********************************************************************************************

MMX_Find_String_Length:

        push eax
        push ebx
        push esi
        
        mov ebx,temp_qword                      ;Point ebx at the temp buffer that receives the compare result.
        pxor mm1,mm1                            ;Clear mmx register.
        xor ecx,ecx                             ;Clear the length counter.

        ;--------------------------
        ;Compare 8 bytes each time.
        ;--------------------------
.compare_loop:

        movq mm2,qword[esi]                     ;Get 8 bytes.
        pcmpeqb mm2,mm1                         ;Compare all 8 bytes with zero. pcmpeqb overwrites mm2
                                                ;with the result: 0xff for each zero byte, else 0x00.
        movq qword[ebx],mm2                     ;Store the result qword to memory for integer testing.
        
        cmp dword[ebx],0                        ;Anything from the first dword?
        jnz .first_dword_has_a_hit              ;If non zero value is detected, we have a hit.. 

        cmp dword[ebx+4],0                      ;Anything from the second dword?
        jnz .second_dword_has_a_hit             ;If non zero value is detected, we have a hit.. 

        add ecx,8                               ;No luck with these eight bytes.
        add esi,8                               ;Update source pointer.                                 
        jmp .compare_loop                       ;Test next qword.

        ;----------------------------------------
        ;Now we need to finalize our scan by
        ;inspecting which byte produced the hit.
        ;----------------------------------------

.second_dword_has_a_hit:        
        add ecx,4                               ;Detected length is at least four bytes longer..
        mov eax,[ebx+4]                         ;Get the second dword under inspection.
        jmp .start_byte_testing

.first_dword_has_a_hit: 
        mov eax,[ebx]                           ;Get the first dword under inspection.
.start_byte_testing:

        ;Test individual bytes.
        ;---------------------
        mov edx,eax                             ;Load original dword.                                   
        and edx,0x000000ff                      ;First byte?
        jnz .done
        inc ecx

        mov edx,eax                             ;Restore original dword.
        and edx,0x0000ff00                      ;Second byte?
        jnz .done
        inc ecx

        mov edx,eax                             ;Restore original dword.
        and edx,0x00ff0000                      ;Third byte?
        jnz .done
        inc ecx

        mov edx,eax                             ;Restore original dword.
        and edx,0xff000000                      ;Fourth byte?
;       jnz .done                                       

.done:
        pop esi
        pop ebx
        pop eax
        ret

;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; DATA ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
section '.data' data readable writeable

align 8
 
        temp_qword      dq 0                    ; A Temporary buffer reservation.

Hello   DB "Hello World",0
String  DB " = %d",13,10,0
 
;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; IDATA ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
section '.idata' import data readable writeable   
  
  library  msvcrt,'msvcrt.dll'       
  
  import msvcrt,\   
      printf,'printf'
    

_________________
Nil Volentibus Arduum :P



Mac2004 12 Nov 2008, 19:45
Quote:
Good :)
here a program to test it


DJ Mauretto: Thanks, you were pretty fast :)

regards,
Mac2004


LocoDelAssembly 12 Nov 2008, 19:59
Moved to main.

BTW this part
Code:
        movq qword[ebx],mm2                     ;Get result qword back to general register.
        
        cmp dword[ebx],0    

Could this be done in a way that avoids using memory? (Not with "movd eax, mm2", since it is an AMD extension to MMX and an SSE2 instruction.)

[edit] I was wrong: "movd eax, mm2" is a plain MMX instruction, so I suggest changing to this:

Code:
movd eax, mm2
cmp eax, 0    



Mac2004 12 Nov 2008, 22:40
If I remember correctly, the AMD64 optimization manual discourages the
movd eax,mm1 instruction form because accessing 'half' of an MMX register
causes a stall. Some stall problems also occur when mixing MMX and general-purpose
registers. The manual recommends saving the MMX register to memory instead
of to a general-purpose register.

I'm not sure whether Intel CPUs have similar problems or not.

regards,
Mac2004



baldr 13 Nov 2008, 08:05
Mac2004,

Things are somewhat worse.
Software Optimization Guide for AMD64 wrote:
Rationale
The register-to-register forms of the MOVD instruction are either VectorPath or DirectPath Double
instructions. When compared with DirectPath Single instructions, VectorPath and DirectPath Double
instructions have comparatively longer execution latencies. In addition, VectorPath instructions
prevent the processor from simultaneously decoding other instructions.

Example
Avoid code like this, which copies a value directly from an MMX register to a general-purpose
register:
Code:
movd eax, mm2    
If it is absolutely necessary to copy a value from an MMX register to a general-purpose register (or
vice versa), use separate store and load instructions, separating them by at least 10 instructions:
Code:
movd DWORD PTR temp, mm2 ; Store the value in memory.
...
; At least 10 other instructions appear here.
...
mov eax, DWORD PTR temp ; Load the value from memory.    



baldr 13 Nov 2008, 09:25
I just thought: "Why do we need to store that qword? To simply test each byte of it? Hmm…". This is what I came up with:
Code:
strlen:
; expects:
;   esi == address of ASCIIZ
;
; modifies:
;   mm1
;
; returns:
;   ecx == length of ASCIIZ
;   mm0 == 0

        push    esi
        pxor    mm0, mm0

.compare_64:
        movq    mm1, qword [esi]
        add     esi, 8
        pcmpeqb mm1, mm0                ; mm1.byte[i] == -1 if byte [esi+i] == 0, 0 otherwise
        pmovmskb ecx, mm1               ; cl.bit[i] == mm1.byte[i].bit[7]
        bsf     ecx, ecx                ; cl == index of rightmost 1 bit, ZF == 1 if none
        jz      .compare_64             ; ZF == 1 if no zero bytes in the qword just read
        lea     ecx, [esi-8+ecx]        ; ecx points to zero byte
        pop     esi
        sub     ecx, esi
        ret    
I'm not sure that pmovmskb is available on P-MMX though…
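
The pmovmskb and bsf steps in the code above can be modelled in portable C for experimenting without an SSE-capable CPU. This is a sketch under assumptions: the names `zero_byte_bits`, `lowest_set_bit` and `movmsk_strlen` are invented for illustration, the instructions are emulated with loops rather than executed, and the buffer is assumed to be zero-padded to a multiple of 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of PCMPEQB with zero followed by PMOVMSKB: bit i of the result
   is set when byte i of the 8-byte group is zero. */
static unsigned zero_byte_bits(const unsigned char *p)
{
    unsigned bits = 0;
    for (int i = 0; i < 8; i++) {
        if (p[i] == 0)
            bits |= 1u << i;
    }
    return bits;
}

/* Model of BSF: index of the lowest set bit. The caller must make sure
   that bits is non-zero (BSF itself reports that case through ZF). */
static unsigned lowest_set_bit(unsigned bits)
{
    unsigned index = 0;
    while ((bits & 1u) == 0) {
        bits >>= 1;
        index++;
    }
    return index;
}

/* Hypothetical helper: string length using one bit mask per qword.
   Assumes the buffer is zero-padded to a multiple of 8 bytes. */
static size_t movmsk_strlen(const char *s)
{
    const unsigned char *start = (const unsigned char *)s;
    const unsigned char *p = start;
    for (;;) {
        unsigned bits = zero_byte_bits(p);
        if (bits != 0)                      /* terminator is in this qword */
            return (size_t)(p - start) + lowest_set_bit(bits);
        p += 8;                             /* no zero byte, try the next qword */
    }
}
```

The bit index found in the mask is directly the byte offset of the terminator inside the qword, which is why no per-byte testing is needed after the scan.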


LocoDelAssembly 13 Nov 2008, 13:27
Quote:
The PMOVMSKB instruction is an AMD extension to MMX™ instruction set and is an
SSE instruction. The presence of this instruction set is indicated by CPUID feature
bits. (See “CPUID” in Volume 3.)
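
A feature test along these lines might look like the C sketch below. It takes the EDX value returned by CPUID standard function 1 as a parameter, so how that value is obtained is left to the caller. Bit 23 (MMX) is the bit mentioned earlier in this thread and bit 25 is the standard SSE flag; the function names are made up for illustration, and the check deliberately ignores AMD's separate MMX-extensions flag, so it is a simplification rather than a complete test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX after CPUID standard function 1. */
#define CPUID1_EDX_MMX (UINT32_C(1) << 23)
#define CPUID1_EDX_SSE (UINT32_C(1) << 25)

/* Hypothetical helpers: edx is the EDX value returned by CPUID function 1. */
static bool cpu_has_mmx(uint32_t edx)
{
    return (edx & CPUID1_EDX_MMX) != 0;
}

static bool cpu_has_sse(uint32_t edx)
{
    return (edx & CPUID1_EDX_SSE) != 0;
}

/* Simplified check: treat pmovmskb on MMX registers as usable only when
   both MMX and SSE are reported (AMD's MMX extensions are not considered). */
static bool can_use_pmovmskb(uint32_t edx)
{
    return cpu_has_mmx(edx) && cpu_has_sse(edx);
}
```

A dispatcher could call `can_use_pmovmskb` once at startup and fall back to the plain MMX routine when it returns false.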



baldr 13 Nov 2008, 13:47
LocoDelAssembly,

Thanks for the info. Does it imply that the Pentium MMX doesn't have pmovmskb? Probably yes, given the SSE reference. I've searched MazeGen's x86 reference and found no match to shed some light (no smoking then ;)).


LocoDelAssembly 13 Nov 2008, 13:57
Yep, that's what it means, but I could later bring my old 200 MHz PMMX back from the dead to confirm this ;)

I searched http://softpixel.com/~cwright/programming/simd/ before digging into the AMD64 manuals. I'm not sure about the correctness of that site, but it seems to be right on this one.



MazeGen 13 Nov 2008, 14:45
baldr wrote:
I've searched MazeGen's x86 reference, no match to shed some light (no smoking then ;)).

Heck, how did you search? ;)

It is there, and it clearly says that PMOVMSKB is P3+, SSE1:

http://ref.x86asm.net/coder32.html#x0FD7

(PMMX is indicated by PX code)



baldr 13 Nov 2008, 14:57
MazeGen,

I've searched x86reference.xml, downloaded 2008-11-07. You're right, the online version contains it. Sorry.

LocoDelAssembly,

Anyway, there's some use for bsf… Should I cross-post that code in mattst88's thread? ;)



MazeGen 13 Nov 2008, 15:21
baldr, I'm sorry, I forgot to upload the most recent version of the XML. I will upload it in a few days. And it's good to hear that someone is using the XML :)



baldr 13 Nov 2008, 15:57
MazeGen,

I've already thought of writing some .XSL to transform it to my taste… Would you be interested if I make it?



Mac2004 13 Nov 2008, 16:41
baldr wrote:
I'm not sure that pmovmskb is available on P-MMX though…


I chose to stick with the basic MMX instructions because MMX is pretty well supported these days.
The MMX instructions have been around for over a decade and are widely supported by x86 CPUs.

SSE instructions are not so widely supported. :(
Your code seems to be nice though. :)

regards,
Mac2004



MazeGen 14 Nov 2008, 09:51
baldr wrote:
MazeGen,

I've already thought of writing some .XSL to transform it to my taste… Would you be interested if I make it?

baldr, check your e-mail, please.