flat assembler
Message board for the users of flat assembler.
DJ Mauretto 12 Nov 2008, 19:26
Good. Here is a program to test it:
Code:
format PE CONSOLE 4.0
entry start
Include 'win32a.inc'

;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; CODE ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
section ".code" code readable writeable executable

start:
        mov     esi,Hello
        call    MMX_Find_String_Length
        push    ecx
        push    Hello                   ; Offset String zero terminated
        call    [printf]
        add     esp,4
        push    String                  ; Offset String zero terminated
        call    [printf]
        add     esp,8
        ret

;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; PROC ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
align 8
;********************************************************************************************
; MMX_Find_String_Length   Finds string length in bytes (excluding terminating null) by
;                          using mmx cpu extensions. The procedure requires mmx support
;                          (bit 23 in CPUID standard function 1) to be detected before
;                          using this.
;
; Input:    esi--> A source string pointer, which must be 8 byte aligned.
;
; Output:   ecx--> String length in bytes.
;
; Clobbers: edx, mm1, mm2. No emms is executed here, so run emms before any x87 code.
;
;
;DISCLAIMER: You can use this code only at your own risk. There is no warranty of any kind,
;            neither express nor implied! You acknowledge that this code may contain bugs,
;            although not intentional ones.
;
;********************************************************************************************
MMX_Find_String_Length:
        push    eax
        push    ebx
        push    esi

        mov     ebx,temp_qword          ;Set ebx to point temp buffer. This way we avoid overwriting [ebx] contents in memory.
        pxor    mm1,mm1                 ;Clear mmx register.
        xor     ecx,ecx                 ;Clear the length counter.

        ;--------------------------
        ;Compare 8 bytes each time.
        ;--------------------------
.compare_loop:
        movq    mm2,qword[esi]          ;Get 8 bytes.
        pcmpeqb mm2,mm1                 ;Compare 8 bytes each time. Pcmpeqb instruction trashes
                                        ;mm2 register contents.
        movq    qword[ebx],mm2          ;Get result qword back to general register.
        cmp     dword[ebx],0            ;Anything from the first dword?
        jnz     .first_dword_has_a_hit  ;If non zero value is detected, we have a hit..
        cmp     dword[ebx+4],0          ;Anything from the second dword?
        jnz     .second_dword_has_a_hit ;If non zero value is detected, we have a hit..
        add     ecx,8                   ;No luck with these eight bytes.
        add     esi,8                   ;Update source pointer.
        jmp     .compare_loop           ;Test next qword.

        ;----------------------------------------
        ;Now we need to finalize our scan by
        ;inspecting which byte produced the hit.
        ;----------------------------------------
.second_dword_has_a_hit:
        add     ecx,4                   ;Detected length is at least four bytes longer..
        mov     eax,[ebx+4]             ;Get the second dword under inspection.
        jmp     .start_byte_testing

.first_dword_has_a_hit:
        mov     eax,[ebx]               ;Get the first dword under inspection.

.start_byte_testing:
        ;Test individual bytes.
        ;---------------------
        mov     edx,eax                 ;Load original dword.
        and     edx,0x000000ff          ;First byte?
        jnz     .done
        inc     ecx
        mov     edx,eax                 ;Restore original dword.
        and     edx,0x0000ff00          ;Second byte?
        jnz     .done
        inc     ecx
        mov     edx,eax                 ;Restore original dword.
        and     edx,0x00ff0000          ;Third byte?
        jnz     .done
        inc     ecx
        mov     edx,eax                 ;Restore original dword.
        and     edx,0xff000000          ;Fourth byte?
;       jnz     .done
.done:
        pop     esi
        pop     ebx
        pop     eax
        ret

;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; DATA ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
section '.data' data readable writeable
align 8
temp_qword      dq 0                    ; A Temporary buffer reservation.
Hello           DB "Hello World",0
String          DB " = %d",13,10,0

;=============================================================================
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; IDATA ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;=============================================================================
section '.idata' import data readable writeable

library msvcrt,'msvcrt.dll'
import msvcrt,\
       printf,'printf'
_________________
Nil Volentibus Arduum
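For checking the routine's results without an assembler at hand, the scan can be modeled in portable C. This is only a sketch of the same idea (compare eight bytes against zero, then find which byte produced the hit), not a translation of the listing; the names `pcmpeqb_zero_model` and `mmx_strlen_model` are made up for illustration, and the caller is assumed to pass a buffer that is readable up to the end of the 8-byte block holding the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of PCMPEQB against an all-zero register: each result byte is
   0xFF where the source byte is 0, and 0x00 everywhere else. */
void pcmpeqb_zero_model(const uint8_t src[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (src[i] == 0) ? 0xFF : 0x00;
}

/* Hypothetical reference model of the 8-bytes-at-a-time length scan.
   Assumption: the buffer is readable in whole 8-byte blocks up to and
   including the block that contains the terminating zero. */
size_t mmx_strlen_model(const char *s)
{
    size_t len = 0;

    for (;;) {
        uint8_t chunk[8];
        uint8_t hit[8];

        memcpy(chunk, s + len, 8);          /* like movq mm2,[esi]   */
        pcmpeqb_zero_model(chunk, hit);     /* like pcmpeqb mm2,mm1  */

        for (int i = 0; i < 8; i++)         /* which byte hit?       */
            if (hit[i])
                return len + (size_t)i;

        len += 8;                           /* no zero in this block */
    }
}
```

For the test string, `mmx_strlen_model("Hello World\0\0\0\0")` gives 11; the explicit padding keeps both 8-byte reads inside the 16-byte literal.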
Mac2004 12 Nov 2008, 19:45
Quote: Good
DJ Mauretto: Thanks, you were pretty fast.
regards, Mac2004
LocoDelAssembly 12 Nov 2008, 19:59
Moved to main.
BTW, could this part
Code:
        movq    qword[ebx],mm2          ;Get result qword back to general register.
        cmp     dword[ebx],0
be done in a way that avoids using memory? (Not "movd eax, mm2", since it is an AMD extension of MMX and an SSE2 instruction.)
[edit] I was wrong... so I suggest changing it to this:
Code:
        movd    eax, mm2
        cmp     eax, 0
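One thing to keep in mind with the `movd` variant: `movd eax, mm2` copies only the low 32 bits of the register, so the second dword still has to be fetched some other way, for example by shifting the register right by 32 with `psrlq` first. A small C model of that, with hypothetical helper names and the 64-bit compare result passed in as a plain integer:

```c
#include <stdint.h>

/* Model of "movd r32, mm": only the low 32 bits reach the register. */
uint32_t movd_model(uint64_t mm)
{
    return (uint32_t)mm;
}

/* Model of "psrlq mm, count": logical right shift of the whole qword.
   Shift counts above 63 produce zero. */
uint64_t psrlq_model(uint64_t mm, unsigned count)
{
    return (count > 63) ? 0 : (mm >> count);
}

/* Returns 0 if the hit is in bytes 0..3, 1 if it is in bytes 4..7,
   and -1 if the compare result contains no hit at all. */
int first_hit_dword(uint64_t cmp_result)
{
    if (movd_model(cmp_result) != 0)
        return 0;
    if (movd_model(psrlq_model(cmp_result, 32)) != 0)
        return 1;
    return -1;
}
```

On little-endian x86 the low dword corresponds to the first four bytes in memory, which is why a hit in the low half takes priority.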
Mac2004 12 Nov 2008, 22:40
If I remember correctly, the AMD64 optimization manual discourages the movd eax,mm1 style of instruction, because accessing 'half' of an MMX register causes a stall. Some stall problems also occur when mixing MMX and general registers. The manual recommended saving the MMX register to memory instead of to a general register. I'm not sure whether Intel CPUs have similar problems.
regards, Mac2004
baldr 13 Nov 2008, 08:05
Mac2004,
Things are somewhat worse. Software Optimization Guide for AMD64 wrote: Rationale
baldr 13 Nov 2008, 09:25
I just thought: "Why do we need to store that qword? To simply test each byte of it? Hmm…". This is what I came up with:
Code:
strlen:
; expects:
;   esi == address of ASCIIZ
;
; modifies:
;   mm1
;
; returns:
;   ecx == length of ASCIIZ
;   mm0 == 0
        push    esi
        pxor    mm0, mm0
.compare_64:
        movq    mm1, qword [esi]
        add     esi, 8
        pcmpeqb mm1, mm0                ; mm1.byte[i] == -1 if byte [esi-8+i] == 0, 0 otherwise
        pmovmskb ecx, mm1               ; cl.bit[i] == mm1.byte[i].bit[7]
        bsf     ecx, ecx                ; cl == index of rightmost 1 bit, ZF == 1 if none
        jz      .compare_64             ; ZF == 1 if no zero bytes in qword [esi-8]
        lea     ecx, [esi-8+ecx]        ; ecx points to zero byte
        pop     esi
        sub     ecx, esi
        ret
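The mask-and-scan step can be sketched in portable C to see what ends up in ecx. This is a rough model under stated assumptions, not the routine itself: `zero_byte_mask`, `lowest_set_bit` and `strlen_movemask_model` are hypothetical names, and the buffer is assumed to be readable in whole 8-byte blocks up to the one holding the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of PCMPEQB against zero followed by PMOVMSKB:
   bit i of the result is set exactly when chunk[i] is zero. */
unsigned zero_byte_mask(const uint8_t chunk[8])
{
    unsigned mask = 0;

    for (int i = 0; i < 8; i++)
        if (chunk[i] == 0)
            mask |= 1u << i;
    return mask;
}

/* Model of BSF: index of the lowest set bit. The caller must make sure
   mask is non-zero; the real instruction sets ZF and leaves the
   destination undefined for a zero source. */
unsigned lowest_set_bit(unsigned mask)
{
    unsigned index = 0;

    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* Length scan built from the two helpers above. */
size_t strlen_movemask_model(const char *s)
{
    const char *p = s;

    for (;;) {
        uint8_t chunk[8];
        unsigned mask;

        memcpy(chunk, p, 8);                /* movq mm1,[esi]        */
        p += 8;                             /* add esi,8             */
        mask = zero_byte_mask(chunk);       /* pcmpeqb + pmovmskb    */
        if (mask != 0)                      /* bsf found a set bit   */
            return (size_t)(p - 8 - s) + lowest_set_bit(mask);
    }
}
```

The byte-by-byte testing of the stored qword collapses into one mask plus one bit scan, which is the point of the rewrite.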
LocoDelAssembly 13 Nov 2008, 13:27
Quote: The PMOVMSKB instruction is an AMD extension to MMX™ instruction set and is an
baldr 13 Nov 2008, 13:47
LocoDelAssembly,
Thanks for the info. Does it imply that Pentium MMX doesn't have pmovmskb? Probably yes, because of the SSE reference. I've searched MazeGen's x86 reference, but found no match to shed some light (no smoking then).
LocoDelAssembly 13 Nov 2008, 13:57
Yep, that's what it means, but I could later bring my old PMMX 200 MHz back from the dead to confirm this.
I searched http://softpixel.com/~cwright/programming/simd/ before digging into the AMD64 manuals. I'm not sure about the correctness of that site, but it seems it was right on this one.
MazeGen 13 Nov 2008, 14:45
baldr wrote: I've searched MazeGen's x86 reference, no match to shed some light (no smoking then
Heck, how did you search? It is there, and it clearly says that PMOVMSKB is P3+, SSE1: http://ref.x86asm.net/coder32.html#x0FD7 (PMMX is indicated by the PX code)
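Put in terms of feature bits: MMX is reported in bit 23 of EDX from CPUID standard function 1 (as the comment in the first listing says), SSE in bit 25 of the same register, and AMD's MMX extensions in bit 22 of EDX from extended function 0x80000001. A minimal C sketch of the resulting check follows; the function and macro names are invented for illustration, and the raw EDX values are passed in as parameters so the code compiles anywhere (executing CPUID itself is left out).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_MMX        (UINT32_C(1) << 23)  /* standard function 1   */
#define CPUID1_EDX_SSE        (UINT32_C(1) << 25)  /* standard function 1   */
#define CPUID_EXT_EDX_MMXEXT  (UINT32_C(1) << 22)  /* function 0x80000001   */

/* Plain MMX (movq, pcmpeqb, pxor, ...) is enough for the first routine. */
bool can_use_mmx(uint32_t std_edx)
{
    return (std_edx & CPUID1_EDX_MMX) != 0;
}

/* PMOVMSKB on MMX registers needs SSE or AMD's MMX extensions.
   Assumption: the caller passes 0 for ext_edx when the extended
   function is not supported or the CPU is not an AMD part, because
   the bit is only meaningful there. */
bool can_use_pmovmskb(uint32_t std_edx, uint32_t ext_edx)
{
    if (!can_use_mmx(std_edx))
        return false;
    return (std_edx & CPUID1_EDX_SSE) != 0 ||
           (ext_edx & CPUID_EXT_EDX_MMXEXT) != 0;
}
```

A Pentium MMX would report the MMX bit without either of the other two, so `can_use_pmovmskb` comes out false for it.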
baldr 13 Nov 2008, 14:57
MazeGen,
I've searched x86reference.xml, downloaded 2008-11-07. You're right, the online version contains it. Sorry.
LocoDelAssembly,
Anyway, there's some use for bsf… Should I cross-post that code in mattst88's thread?
MazeGen 13 Nov 2008, 15:21
baldr, I'm sorry, I forgot to upload the most recent version of the XML. I will upload it in a few days. And it's good to hear that someone is using the XML.
baldr 13 Nov 2008, 15:57
MazeGen,
I've already thought of writing some .XSL to transform it to my taste… Would you be interested if I make it?
Mac2004 13 Nov 2008, 16:41
baldr wrote: I'm not sure that pmovmskb is available on P-MMX though…
I chose to stick with the basic MMX instructions because MMX is pretty well supported these days. The MMX instructions have been around for over a decade and are widely supported by x86 CPUs. SSE instructions are not as widely supported.
Your code seems nice though.
regards, Mac2004
MazeGen 14 Nov 2008, 09:51
baldr wrote: MazeGen,
baldr, check your e-mail, please.
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.