flat assembler
Message board for the users of flat assembler.
r22 19 Aug 2007, 02:50
I don't know about tutorial information, but you can download the opcode descriptions from Intel and AMD. Intel and AMD also have optimization manuals that cover some SSE (xmm) material.
A few notes:
- Make sure data is 16 byte aligned.
- Usually 64 bytes / iteration is the sweet spot for unrolling SSE loops that load and/or store data from/to memory.
- Optimize the algorithm first, then attempt to make it parallel, then implement it using the SSE instructions.

Here's a simple StringLength function using XMM registers.

Code:
strlen:
        ;;string len with aligned and padded buffer
        ;;[esp+4] string pointer
        mov     eax,[esp+4]
        push    ebx
        test    eax,eax
        jz      .fail
        pxor    xmm0,xmm0
        mov     ebx,eax
.len:
        ;;load 32 bytes of the string
        movdqa  xmm2,[eax+16]
        movdqa  xmm1,[eax]
        ;;search for a null character
        pcmpeqb xmm2,xmm0
        pcmpeqb xmm1,xmm0
        ;;move results to 32bit registers
        pmovmskb edx,xmm2
        pmovmskb ecx,xmm1
        ;;combine the bit flagged results and increment the loop
        shl     edx,16
        add     eax,32
        or      ecx,edx
        jz      .len
        ;;calculate the length (end-start + byte_offset - 32)
        sub     eax,ebx
        bsf     ecx,ecx
        lea     eax,[eax+ecx-32]
        pop     ebx
        ret     4
.fail:
        xor     eax,eax
        pop     ebx
        ret     4
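The "aligned and padded buffer" requirement is easy to miss, so here is a minimal sketch of how the routine might be called. It is only a fragment: the format and section directives are left out, 32-bit code and an SSE2-capable CPU are assumed, and the names msg and msg_len are made up for illustration.

```asm
;; Hypothetical caller for the strlen routine in this post.
;; msg and msg_len are illustrative names, not part of the original code.

        push    msg             ; one stack argument: pointer to the string
        call    strlen          ; the routine ends with ret 4, so it removes the argument itself
        mov     [msg_len],eax   ; eax = length in bytes, 5 for 'hello'

        ;; ... rest of the program ...

align 16                        ; movdqa faults if the address is not a multiple of 16
msg     db 'hello',0
        times 32 db 0           ; assumed padding: each iteration reads 32 bytes,
                                ; so the reads past the terminator stay inside the buffer
msg_len dd ?
```

The 32 bytes of padding are a deliberately generous choice. Any amount that keeps the last 32-byte read inside memory you own would do.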
FrozenKnight 20 Aug 2007, 09:48
Thanks, I was looking for something like this. Are there more?
realcr 23 Aug 2007, 21:40
Thanks, r22. I will also go over the Intel manual to see what's inside.
realcr.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.