flat assembler
Message board for the users of flat assembler.
Main > Constant generation, SIMD
revolution 16 Feb 2010, 13:56
IIRC the AMD manuals explain how to generate various constants. Or maybe it is the Intel manuals? Or maybe both? Anyhow, there are some tricks around that help with the tricky business of constant generation in MMX.
bitRAKE 16 Feb 2010, 16:16
Code:
  mov eax,$AABBCCDD
  movd mm0,eax
  punpcklbw mm0,mm0 ; AA|AA|BB|BB|CC|CC|DD|DD
  punpcklbw mm0,mm0 ; CC|CC|CC|CC|DD|DD|DD|DD
  punpcklbw mm0,mm0 ; DD|DD|DD|DD|DD|DD|DD|DD
Code:
  pcmpeqb mm0,mm0 ;8x(11111111b)
  pabsb mm0,mm0 ;8x(00000001b) # SSSE3 #
hopcode 17 Feb 2010, 03:29
bitRAKE wrote: Hm...I thought there was a two instruction method to generate the high bit...
Yes, but it needs one more register too
Code:
  pcmpeqb mm0,mm0 ;FFFF FFFF FFFF FFFF
  pavgb mm1,mm0 ;8080 8080 8080 8080 ;<--- average on packed bytes, assumes mm1 = 0 beforehand
  ;pavgw mm1,mm0 ;8000 8000 8000 8000
My alternative way, using 1 register
Code:
  pcmpeqb mm0,mm0
  paddsb mm0,mm0 ;FEFE FEFE FEFE FEFE
  packsswb mm0,mm0 ;8080 8080 8080 8080
bitRAKE wrote: ...really two cache misses...
Among which instructions precisely? Is there a general rule (in a few words) to avoid a cache "miss"?
Thanks, hopcode
EDIT: OK. 100% confirmed for sure. SIMD creates dependencies
hopcode 20 Feb 2010, 07:33
hopcode wrote: Is there a general rule (in few words) to avoid cache "miss".
Yes, A. Fog Chapter 11.1 Optimizing
Then considering
- Number of processes/threads/processors/cores accessing the L1/L2 cache
- General data alignment
bitRAKE wrote: ...really two cache misses: one for loading constant data, and second for having to reload the data that was already in the cache.
Why? Considering the following
Code:
  pcmpeqb mm0,mm0
  paddsb mm0,mm0 ;FEFE FEFE FEFE FEFE
  packsswb mm0,mm0 ;8080 8080 8080 8080
I am not 100% sure, but
1) MMX registers are renameable
2) As they alias FPU registers, execution will be pipelined
Regards, hopcode
Madis731 20 Feb 2010, 15:36
After pcmpeqb xmm,xmm you could use shifting (left or right) to get the highest or lowest bits, or even any value of the form 2^n-1.
I'm not sure if there are shifts for all data widths (B, W, D, Q, O), but with some tricks it's playable.
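Editor's note: a minimal sketch of the shifting idea above, in fasm syntax. It assumes SSE2 (xmm registers); the register choices and the particular shift counts are arbitrary illustrations, not something taken from the thread.

```asm
; fasm syntax, SSE2 assumed; register choice is arbitrary
pcmpeqw  xmm0,xmm0   ; 8 x FFFFh
psrlw    xmm0,15     ; 8 x 0001h      lowest bit of each word

pcmpeqw  xmm1,xmm1   ; 8 x FFFFh
psllw    xmm1,15     ; 8 x 8000h      highest bit of each word

pcmpeqd  xmm2,xmm2   ; 4 x FFFFFFFFh
psrld    xmm2,25     ; 4 x 0000007Fh  2^7-1 in each dword

pcmpeqw  xmm3,xmm3   ; 8 x FFFFh
psrlw    xmm3,15     ; 8 x 0001h
packuswb xmm3,xmm3   ; 16 x 01h       byte constant without a byte shift
```

On the width question: MMX/SSE2 have logical shifts for words, dwords and qwords (psllw/pslld/psllq, psrlw/psrld/psrlq), arithmetic right shifts for words and dwords (psraw/psrad), and whole-register byte shifts (pslldq/psrldq), but no per-byte bit shift. That is why the last sequence packs words down to bytes instead.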
edemko 14 Mar 2010, 05:47
bitRAKE
shuffle instructions in newer processors
Code:
  mov eax,'spam'
  movd xmm0,eax
  shufps xmm0,xmm0,0 ; 4x('spam')
I've understood it only now (after some wasm.ru boarding), thank you man :)
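Editor's note: shufps is a floating-point shuffle, so for integer data an SSE2 integer shuffle is a possible alternative. The sketch below is in fasm syntax and assumes SSE2 is available. Whether it is actually faster than shufps depends on the CPU, since only some microarchitectures add a delay when integer data passes through a floating-point instruction.

```asm
; fasm syntax, SSE2 assumed
mov    eax,'spam'
movd   xmm0,eax      ; low dword = 'spam', upper three dwords = 0
pshufd xmm0,xmm0,0   ; 4 x 'spam'
```

The immediate 0 selects source dword 0 for all four destination dwords, the same selector as in the shufps version.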
revolution 14 Mar 2010, 05:50
serfasm wrote:
edemko 14 Mar 2010, 06:10
+1, thanx and pardon for those before
edemko 23 Mar 2010, 11:31
updated
Code:
format pe gui 4.0
include 'win32ax.inc'

section '' code import readable writable executable
  library kernel32, 'kernel32.dll'
  include 'api\kernel32.inc'

  szBuf = 456976*5
  buf rb szBuf
  fnm db 'keygen_aaaa..zzzz.txt',0
  ioResult dd ?

entry $
  cld
  mov edi,buf
  stdcall keygen_aaaa..zzzz,feedback
  invoke CreateFileA,fnm,GENERIC_WRITE,0,0,CREATE_ALWAYS,0,0
  mov edi,eax
  invoke SetFilePointer,edi,0,0,FILE_BEGIN
  invoke WriteFile,edi,buf,szBuf,ioResult,0
  invoke CloseHandle,edi
exit:
  invoke ExitProcess,0
fail:
  hlt

proc feedback
  bswap eax
  stosd
  bswap eax
  mov byte[edi],13
  inc edi
  ret
endp

; Funny keygen made for scientific research.
; It runs a generation cycle over the range 'aaaa'..'zzzz',
; so there are ('z'-'a'+1)^4 variants. Each one, once
; generated, is shown to you through a "feedback" proc,
; which must restore eax, ebx and ecx if it ever changes them.
; Note: unintentionally, eax carries bswap'ed values!
proc keygen_aaaa..zzzz; feedback:dword
  xchg ebx,[esp+4]
  push eax ecx
  pushfd
  mov eax,'aaaa' ; like 0000 in decimal
.0:
  mov ecx,'z'-'a' ; like 9-0 in decimal
  call ebx
.1:
  inc eax ; like 0000+1 in decimal
  call ebx ; inform new value
  loop .1 ; tick-tock
  mov ecx,'zzzz' ; like 9999 in decimal
  sub ecx,eax ; ... -0009 in decimal
  jz .4 ; ... =9990 in decimal
  dec ecx ; ... =9989 in decimal
  mov eax,10000000b shl 8 or ('z'-'a')
.2:
  test cl,ah ; borrow?
  jz .3
  mov cl,al ; like 9 in decimal
.3:
  ror ecx,8 ; like 9989 -> 9998 in decimal
  add eax,01000000'00000000'00000000'00000000b
  jnc .2 ; 00b will give carry at step 4
  mov eax,'zzzz' ; like 9999 in decimal
  sub eax,ecx ; ... -9989 in decimal
  jmp .0 ; ... =0010 in decimal
.4:
  popfd
  pop ecx eax
  mov ebx,[esp+4]
  ret 4
endp
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.