Constant generation, SIMD

edemko 16 Feb 2010, 11:29
Code:
;1 generation
pxor    mm0,mm0 ;8x(00000000b)
pcmpeqd mm1,mm1 ;8x(11111111b)
psubb   mm0,mm1 ;8x(00000001b)

;2^n generation
pxor    mm0,mm0 ;8x(00000000b)
pcmpeqd mm1,mm1 ;8x(11111111b)
psubb   mm0,mm1 ;8x(00000001b)
psllq   mm0,1   ;8x(00000010b); a shift count n in 0..7 gives 8x(2^n)

    

They are a lot...
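For anyone who wants to check the lane arithmetic without a debugger, here is a minimal scalar sketch in Python. It is only a model of what the sequence above computes on a 64-bit MMX register; the helper names (`psubb`, `psllq`, `byte_power_of_two`) are made up for illustration and simply mirror the instruction mnemonics.

```python
MASK64 = (1 << 64) - 1

def psubb(a, b):
    """Packed byte subtract with wraparound on eight 8-bit lanes."""
    out = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        out |= ((x - y) & 0xFF) << (8 * i)
    return out

def psllq(a, count):
    """Logical left shift of the whole 64-bit register."""
    return (a << count) & MASK64 if count < 64 else 0

def byte_power_of_two(n):
    """Model of pxor / pcmpeqd / psubb / psllq for n in 0..7."""
    if not 0 <= n <= 7:
        raise ValueError("n must be in 0..7, otherwise bits cross byte lanes")
    mm0 = 0                   # pxor    mm0,mm0
    mm1 = MASK64              # pcmpeqd mm1,mm1
    mm0 = psubb(mm0, mm1)     # 0 - (-1) = 1 in every byte
    return psllq(mm0, n)      # psllq   mm0,n

print(hex(byte_power_of_two(1)))
```

The quadword shift is safe here only because each byte holds a single set bit at position 0, so a count of up to 7 never moves it into the neighbouring byte.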

revolution 16 Feb 2010, 13:56
IIRC the AMD manuals explain how to generate various constants. Or maybe it is the Intel manuals? Or maybe both? Anyhow, there are some tricks around that help with the tricky business of constant generation in MMX.

bitRAKE 16 Feb 2010, 16:16
Code:
mov eax,$AABBCCDD
movd mm0,eax
punpcklbw mm0,mm0 ; AA|AA|BB|BB|CC|CC|DD|DD
punpcklbw mm0,mm0 ; CC|CC|CC|CC|DD|DD|DD|DD
punpcklbw mm0,mm0 ; DD|DD|DD|DD|DD|DD|DD|DD    
...the four low/high unpack combinations can be used to broadcast any one of the bytes, or shuffle instructions on newer processors. Faster than a cache miss - really two cache misses: one for loading constant data, and second for having to reload the data that was already in the cache.
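The unpack trick can be checked with a small scalar model. The Python sketch below is an illustration only: `punpcklbw` and `punpckhbw` model the MMX instructions on a 64-bit integer, and `broadcast_byte` is a hypothetical helper showing how the low/high choice in the second and third unpack selects which of the four bytes ends up in every lane.

```python
def punpcklbw(dst, src):
    """Interleave the low four bytes of dst and src (dst byte first)."""
    out = 0
    for i in range(4):
        d = (dst >> (8 * i)) & 0xFF
        s = (src >> (8 * i)) & 0xFF
        out |= d << (16 * i)
        out |= s << (16 * i + 8)
    return out

def punpckhbw(dst, src):
    """Same interleave, but taking the high four bytes of each operand."""
    return punpcklbw(dst >> 32, src >> 32)

def broadcast_byte(eax, index):
    """Broadcast byte `index` of eax (0 = lowest) to all eight lanes."""
    mm0 = eax & 0xFFFFFFFF                       # movd mm0,eax
    mm0 = punpcklbw(mm0, mm0)                    # AA|AA|BB|BB|CC|CC|DD|DD
    second = punpckhbw if index & 2 else punpcklbw
    third = punpckhbw if index & 1 else punpcklbw
    mm0 = second(mm0, mm0)
    return third(mm0, mm0)

print(hex(broadcast_byte(0xAABBCCDD, 0)))
```

With index 0 this is exactly the three `punpcklbw` steps from the post; the other three indices correspond to the low/high, high/low and high/high combinations.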
Code:
pcmpeqb mm0,mm0 ;8x(11111111b)
pabsb   mm0,mm0 ;8x(00000001b) # SSSE3 #    
Hm...I thought there was a two instruction method to generate the high bit, but can't recall it atm. PCMPEQ/PSLL works for all element sizes except bytes. For bytes, the three instruction method PCMPEQ/PSLLW (shift count 7 to 15)/PACKSSWB works to set the high bit.
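Why any shift count from 7 to 15 works is easiest to see by modelling the signed saturation. The following Python sketch is a scalar approximation of PCMPEQ/PSLLW/PACKSSWB on a 64-bit register; the function names are illustrative and not part of any library.

```python
ALL_ONES = (1 << 64) - 1

def to_signed16(word):
    """Reinterpret a 16-bit pattern as a signed value."""
    return word - 0x10000 if word & 0x8000 else word

def saturate_int8(value):
    """Clamp to the signed byte range and return the 8-bit pattern."""
    return max(-128, min(127, value)) & 0xFF

def psllw(reg, count):
    """Logical left shift of each of the four 16-bit lanes."""
    out = 0
    for lane in range(4):
        word = (reg >> (16 * lane)) & 0xFFFF
        shifted = (word << count) & 0xFFFF if count <= 15 else 0
        out |= shifted << (16 * lane)
    return out

def packsswb(dst, src):
    """Pack signed words to signed bytes: dst fills the low half, src the high half."""
    out = 0
    for lane in range(4):
        lo = to_signed16((dst >> (16 * lane)) & 0xFFFF)
        hi = to_signed16((src >> (16 * lane)) & 0xFFFF)
        out |= saturate_int8(lo) << (8 * lane)
        out |= saturate_int8(hi) << (8 * (lane + 4))
    return out

def high_bit_per_byte(count):
    mm0 = ALL_ONES              # pcmpeqw  mm0,mm0
    mm0 = psllw(mm0, count)     # psllw    mm0,count
    return packsswb(mm0, mm0)   # packsswb mm0,mm0

print(hex(high_bit_per_byte(7)))
```

A count of 7 leaves FF80h (-128) in each word, which packs unchanged; larger counts give more negative words that saturate to -128. A count of 6 leaves FFC0h (-64), which packs to C0h, so 7 is the lower limit.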

hopcode 17 Feb 2010, 03:29
bitRAKE wrote:
Hm...I thought there was a two instruction method to generate the high bit...

Yes, but it needs one more register, which must already be zero:
Code:
pcmpeqb mm0,mm0   ;FFFF FFFF FFFF FFFF
pavgb mm1,mm0       ;8080 8080 8080 8080  ;<--- byte-wise average, assumes mm1 = 0
;pavgw mm1,mm0      ;8000 8000 8000 8000 
    

my alternative way, using 1 register
Code:
pcmpeqb mm0,mm0        
paddsb mm0,mm0     ;FEFE FEFE FEFE FEFE
packsswb mm0,mm0  ;8080 8080 8080 8080
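Both variants can be sanity-checked with a scalar model. The Python sketch below is only an approximation of the instruction semantics on a 64-bit register, with made-up helper names; it also shows that the PAVGB variant relies on the second register being zero.

```python
ALL_ONES = (1 << 64) - 1

def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a signed value."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def saturate_int8(value):
    return max(-128, min(127, value)) & 0xFF

def pavgb(dst, src):
    """Rounded unsigned average of each byte pair: (a + b + 1) >> 1."""
    out = 0
    for i in range(8):
        a = (dst >> (8 * i)) & 0xFF
        b = (src >> (8 * i)) & 0xFF
        out |= ((a + b + 1) >> 1) << (8 * i)
    return out

def paddsb(dst, src):
    """Signed saturating add of each byte pair."""
    out = 0
    for i in range(8):
        a = to_signed((dst >> (8 * i)) & 0xFF, 8)
        b = to_signed((src >> (8 * i)) & 0xFF, 8)
        out |= saturate_int8(a + b) << (8 * i)
    return out

def packsswb(dst, src):
    """Pack signed words to signed bytes with saturation."""
    out = 0
    for lane in range(4):
        lo = to_signed((dst >> (16 * lane)) & 0xFFFF, 16)
        hi = to_signed((src >> (16 * lane)) & 0xFFFF, 16)
        out |= saturate_int8(lo) << (8 * lane)
        out |= saturate_int8(hi) << (8 * (lane + 4))
    return out

def high_bit_pavgb(mm1=0):
    mm0 = ALL_ONES               # pcmpeqb mm0,mm0
    return pavgb(mm1, mm0)       # pavgb   mm1,mm0

def high_bit_paddsb():
    mm0 = ALL_ONES               # pcmpeqb  mm0,mm0
    mm0 = paddsb(mm0, mm0)       # paddsb   mm0,mm0 -> FEh per byte
    return packsswb(mm0, mm0)    # packsswb mm0,mm0 -> FEFEh saturates to 80h

print(hex(high_bit_pavgb()), hex(high_bit_paddsb()))
```

In the one-register variant each word FEFEh is -258 as a signed value, so the pack saturates it to -128, which is 80h.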
    

bitRAKE wrote:
...really two cache misses...

Between which instructions, precisely? Is there a general rule (in few words) to avoid cache "miss".

Thanks,
hopcode

EDIT: OK. 100% confirmed for sure. SIMD instructions create dependencies.

hopcode 20 Feb 2010, 07:33
hopcode wrote:
Is there a general rule (in few words) to avoid cache "miss".

Yes, see A. Fog, Chapter 11.1, Optimizing.
Then consider:

    - Number of processes/threads/processors/cores accessing the L1/L2 cache
    - General data alignment


bitRAKE wrote:
...really two cache misses: one for loading constant data, and second for having to reload the data that was already in the cache.

Why? Considering the following:
Code:
pcmpeqb mm0,mm0  
paddsb mm0,mm0     ;FEFE FEFE FEFE FEFE
packsswb mm0,mm0  ;8080 8080 8080 8080
    

I am not 100% sure, but

    1) MMX registers are renameable
    2) As they alias the FPU registers, execution will be pipelined

Or?

Regards,
hopcode

Madis731 20 Feb 2010, 15:36
After pcmpeqb xmm,xmm you could use shifting (left or right) to get the highest or lowest bits, or even any value of the form 2^n-1.
I'm not sure if there are shifts for all data widths (B, W, D, Q, O), but with some tricks it's doable.
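As a rough illustration of the shifting idea, here is a Python sketch that models per-lane logical shifts on a 128-bit value. As far as I know, SSE2 provides per-lane shifts for words, dwords and qwords (PSLLW/PSLLD/PSLLQ and the PSRL counterparts) plus whole-register byte shifts (PSLLDQ/PSRLDQ), but no per-byte-lane shift. The helper names `shift_lanes`, `low_mask` and `high_bit` are hypothetical.

```python
ALL_ONES_128 = (1 << 128) - 1

def shift_lanes(reg, lane_bits, count, left):
    """Logical shift of every lane; counts >= lane width give zero."""
    mask = (1 << lane_bits) - 1
    out = 0
    for i in range(128 // lane_bits):
        lane = (reg >> (lane_bits * i)) & mask
        if count >= lane_bits:
            lane = 0
        elif left:
            lane = (lane << count) & mask
        else:
            lane >>= count
        out |= lane << (lane_bits * i)
    return out

def low_mask(lane_bits, n):
    """2^n - 1 in every lane: all ones shifted right by lane_bits - n."""
    return shift_lanes(ALL_ONES_128, lane_bits, lane_bits - n, left=False)

def high_bit(lane_bits):
    """Only the top bit of every lane: all ones shifted left by lane_bits - 1."""
    return shift_lanes(ALL_ONES_128, lane_bits, lane_bits - 1, left=True)

print(hex(low_mask(32, 4)), hex(high_bit(16)))
```

For byte lanes the same constants need one of the workarounds shown earlier in the thread, since the shift has to be done at word width or wider.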

edemko 14 Mar 2010, 05:47
bitRAKE wrote:
shuffle instructions in newer processors
Code:
        mov     eax,'spam'
        movd    xmm0,eax
        shufps  xmm0,xmm0,0  ; 8x('spam')
    

I've understood it only now (after some time on the wasm.ru board), thank you man :)

revolution 14 Mar 2010, 05:50
serfasm wrote:
Code:
        mov     eax,'spam'
        movd    xmm0,eax
        shufps  xmm0,xmm0,0  ; 8x('spam')
    
I think you mean 4x('spam').
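The count is easy to confirm with a small model of the shuffle. The Python sketch below treats an XMM register as a 128-bit integer with four dword lanes; `shufps` follows the documented selector layout (two bits per result lane, low two lanes taken from the destination, high two from the source), while `broadcast_dword` is a made-up helper. It assumes fasm places the first character of 'spam' in the lowest byte of eax.

```python
def shufps(dst, src, imm8):
    """Model of SHUFPS: result lanes 0-1 come from dst, lanes 2-3 from src."""
    def lane(reg, i):
        return (reg >> (32 * i)) & 0xFFFFFFFF
    picks = [
        lane(dst, imm8 & 3),
        lane(dst, (imm8 >> 2) & 3),
        lane(src, (imm8 >> 4) & 3),
        lane(src, (imm8 >> 6) & 3),
    ]
    out = 0
    for i, value in enumerate(picks):
        out |= value << (32 * i)
    return out

def broadcast_dword(text):
    if len(text) != 4:
        raise ValueError("expected exactly four bytes")
    eax = int.from_bytes(text, "little")   # mov    eax,'spam'
    xmm0 = eax                             # movd   xmm0,eax (upper lanes zeroed)
    return shufps(xmm0, xmm0, 0)           # shufps xmm0,xmm0,0

result = broadcast_dword(b"spam")
print(result.to_bytes(16, "little"))
```

A 128-bit register holds four dwords, so selector 0 yields four copies of the low dword.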

edemko 14 Mar 2010, 06:10
+1, thanks, and sorry for the earlier mistakes

edemko 23 Mar 2010, 11:31
updated

Code:
format pe gui 4.0
include 'win32ax.inc'


section '' code import readable writable executable
library kernel32, 'kernel32.dll'
include 'api\kernel32.inc'


szBuf    = 456976*5
buf      rb szBuf
fnm      db 'keygen_aaaa..zzzz.txt',0
ioResult dd ?


entry $
        cld
        mov     edi,buf
        stdcall keygen_aaaa..zzzz,feedback

        invoke  CreateFileA,fnm,GENERIC_WRITE,0,0,CREATE_ALWAYS,0,0
        mov     edi,eax
        invoke  SetFilePointer,edi,0,0,FILE_BEGIN
        invoke  WriteFile,edi,buf,szBuf,ioResult,0
        invoke  CloseHandle,edi

  exit: invoke  ExitProcess,0
  fail: hlt


proc feedback
        bswap   eax
        stosd
        bswap   eax
        mov     byte[edi],13
        inc     edi
        ret
endp


; A toy keygen made for research purposes.
; It walks the range 'aaaa'..'zzzz', i.e. ('z'-'a'+1)^4
; variants, and reports each generated value through the
; "feedback" callback. The callback must preserve eax, ebx
; and ecx. Note that eax holds the key byte-swapped (last
; character in the lowest byte), hence the bswap there.
proc keygen_aaaa..zzzz; feedback:dword
        xchg    ebx,[esp+4]
        push    eax ecx
        pushfd

        mov     eax,'aaaa'  ; like 0000 in decimal
  .0:   mov     ecx,'z'-'a' ; like 9-0 in decimal
        call    ebx
  .1:   inc     eax         ; like 0000+1 in decimal
        call    ebx         ; inform new value
        loop    .1          ; tick-tock
        mov     ecx,'zzzz'  ; like 9999 in decimal
        sub     ecx,eax     ; ... -0009 in decimal
        jz      .4          ; ... =9990 in decimal
        dec     ecx         ; ... =9989 in decimal
        mov     eax,10000000b shl 8 or ('z'-'a')
  .2:   test    cl,ah       ; borrow?
        jz      .3
        mov     cl,al       ; like 9 in decimal
  .3:   ror     ecx,8       ; like 9989 -> 9998 in decimal
        add     eax,01000000'00000000'00000000'00000000b
        jnc     .2          ; 00b will give carry at step 4
        mov     eax,'zzzz'  ; like 9999 in decimal
        sub     eax,ecx     ; ... -9989 in decimal
        jmp     .0          ; ... =0010 in decimal
  .4:
        popfd
        pop     ecx eax
        mov     ebx,[esp+4]
        ret     4
endp
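To check the file this program writes, a reference model of the expected output may help. The Python sketch below does not reproduce the register-level borrow trick; it only builds what the buffer should contain if I read the loop correctly: every four-letter string from 'aaaa' to 'zzzz' with the last character changing fastest, each followed by a CR byte (13). The name `expected_buffer` is made up for this example.

```python
from itertools import product
from string import ascii_lowercase

def expected_buffer():
    """Build the byte sequence the feedback callback is expected to store."""
    out = bytearray()
    for letters in product(ascii_lowercase, repeat=4):
        out += "".join(letters).encode("ascii")  # four characters per key
        out.append(13)                           # CR, as in mov byte[edi],13
    return bytes(out)

buf = expected_buffer()
print(len(buf))
```

The length should match szBuf = 456976*5, since 26^4 = 456976 keys take five bytes each.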

    