flat assembler
Message board for the users of flat assembler.

Challenge for MMX/SSE experts: PINSR

DOS386 26 Jul 2010, 01:38
Code:
0FC4C100       pinsrw mm0,cx,0
    


[1] Trivial task: find an MMX-only (no SSE) alternative to the above code with as little bloat as possible.

[2] Suggest or write a tool that checks instruction compatibility (SSE in MMX code, CMOVNTQ in 80386 code, MOVSD in 8086 code, ...), preferably not limited to FASM code.


revolution 26 Jul 2010, 01:57
DOS386 wrote:
CMOVNTQ
There is no such instruction. Where do you get that from?



edemko 28 Jul 2010, 12:40
Quote:

[1] Trivial task: find an MMX-only (no SSE) alternative to the above code with as little bloat as possible.

Code:
        movd    mm1,ecx
        psllq   mm1,0
        por     mm0,mm1
    

or GeneralPurpose?
or MMX?
or why to bloat?
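A minimal sketch, in Python rather than FASM, of what the three instructions above compute compared with `pinsrw mm0,ecx,0`. It treats an MMX register as a plain 64-bit integer; the helper names `pinsrw_ref` and `movd_shift_or` are invented for this illustration and are not part of any library.

```python
MASK64 = (1 << 64) - 1

def pinsrw_ref(mm0, ecx, index=0):
    """Reference model: replace 16-bit word (index & 3) of mm0 with the low word of ecx."""
    shift = (index & 3) * 16
    word_mask = 0xFFFF << shift
    return ((mm0 & ~word_mask) | ((ecx & 0xFFFF) << shift)) & MASK64

def movd_shift_or(mm0, ecx):
    """Model of: movd mm1,ecx / psllq mm1,0 / por mm0,mm1."""
    mm1 = ecx & 0xFFFFFFFF       # movd zero-extends ecx to 64 bits
    mm1 = (mm1 << 0) & MASK64    # psllq by 0 changes nothing
    return mm0 | mm1             # por only sets bits, it never clears any

# The two agree, for example, when the low dword of mm0 is zero and ecx fits in 16 bits.
print(hex(movd_shift_or(0x1111222200000000, 0xABCD)))  # 0x111122220000abcd
print(hex(pinsrw_ref(0x1111222200000000, 0xABCD)))     # 0x111122220000abcd

# They differ as soon as word 0 of mm0 holds old data: the old bits are ORed in.
print(hex(movd_shift_or(0x1111222233334444, 0xABCD)))  # 0x111122223333efcd
print(hex(pinsrw_ref(0x1111222233334444, 0xABCD)))     # 0x111122223333abcd
```

The sequence also writes to mm1, which the real instruction does not.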



asmfan 28 Jul 2010, 14:28
modified
Code:
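        pcmpeqd mm1,mm1         ; mm1 = FFFFFFFFFFFFFFFFh
        psllq   mm1,16          ; mm1 = FFFFFFFFFFFF0000h (keep mask)
        pand    mm0,mm1         ; clear word 0 of mm0

        pxor    mm1,mm1         ; redundant: movd already zero-extends
        movd    mm1,ecx         ; mm1 = 00000000:ecx
        pslld   mm1,16          ; shift the high half of ecx out ...
        psrld   mm1,16          ; ... leaving only cx in word 0
        por     mm0,mm1         ; insert cx into word 0 of mm0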
    

Or
Code:
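        pcmpeqd mm1,mm1         ; mm1 = FFFFFFFFFFFFFFFFh
        psllq   mm1,16          ; mm1 = FFFFFFFFFFFF0000h (keep mask)
        pand    mm0,mm1         ; clear word 0 of mm0

        movd    mm2,ecx         ; mm2 = 00000000:ecx
        pandn   mm1,mm2         ; mm1 = (not mm1) and mm2 = cx in word 0
        por     mm0,mm1         ; insert cx into word 0 of mm0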
    

Or uniq
Code:
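mm0 PINSRw (mm0, ecx, imm8)
{
pcmpeqd mm1,mm1     ; mm1 = all ones
psrlq   mm1,48      ; mm1 = 000000000000FFFFh (word mask)

movd    mm2,ecx
pand    mm2,mm1     ; mm2 = cx in word 0, rest zero

and     imm8,3      ; word index 0..3
shl     imm8,4      ; bit offset = index * 16

psllq   mm1,imm8    ; move the mask to the target word
psllq   mm2,imm8    ; move cx to the target word

pandn   mm1,mm0     ; mm1 = mm0 with the target word cleared
por     mm1,mm2     ; merge cx in

movq    mm0,mm1
}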
    

Should work i guess
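A minimal sketch, in Python rather than FASM, that walks through the generic mask-and-merge routine above one step at a time and compares it with a reference model of PINSRW. MMX registers are modelled as 64-bit integers, and the bit offset is taken as the word index times 16. The names `pinsrw_ref` and `pinsrw_mmx_steps` are made up for this illustration.

```python
MASK64 = (1 << 64) - 1

def pinsrw_ref(mm0, ecx, imm8):
    """Reference model: replace 16-bit word (imm8 & 3) of mm0 with the low word of ecx."""
    shift = (imm8 & 3) * 16
    word_mask = 0xFFFF << shift
    return ((mm0 & ~word_mask) | ((ecx & 0xFFFF) << shift)) & MASK64

def pinsrw_mmx_steps(mm0, ecx, imm8):
    """Step-by-step model of the MMX-only sequence (uses mm1 and mm2 as scratch)."""
    mm1 = MASK64                      # pcmpeqd mm1,mm1
    mm1 >>= 48                        # psrlq   mm1,48   -> 0x000000000000FFFF
    mm2 = ecx & 0xFFFFFFFF            # movd    mm2,ecx  (zero-extended)
    mm2 &= mm1                        # pand    mm2,mm1  -> only cx survives
    shift = (imm8 & 3) * 16           # word index to bit offset
    mm1 = (mm1 << shift) & MASK64     # psllq   mm1,shift
    mm2 = (mm2 << shift) & MASK64     # psllq   mm2,shift
    mm1 = (~mm1 & MASK64) & mm0       # pandn   mm1,mm0  -> mm0 with target word cleared
    mm1 |= mm2                        # por     mm1,mm2
    return mm1                        # movq    mm0,mm1

# Compare both models for every word index on one sample value.
for index in range(4):
    got = pinsrw_mmx_steps(0x1111222233334444, 0xDEADBEEF, index)
    assert got == pinsrw_ref(0x1111222233334444, 0xDEADBEEF, index)
    print(index, hex(got))
```

Unlike the real instruction, the sequence overwrites two scratch registers, so it does not meet a "no registers trashed" requirement without extra saves.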

DOS386 29 Jul 2010, 04:34
> movd mm1,ecx
> psllq mm1,0
> por mm0,mm1

Does this work? What does "psllq mm1,0" do that a NOP wouldn't?

I forgot an important detail: no registers should be trashed.

What does PINSRW do? Does it just copy 16 bits from CX into the lower 16 bits of MM0, leaving the upper 48 bits untouched?

Code:
PUSH EAX
MOVD EAX, MM0
MOV  AX, CX
MOVD MM0, EAX
POP  EAX
    


Would this work (11 bytes of bloat)?


LocoDelAssembly 29 Jul 2010, 06:02
PINSRW is functionally equivalent to first applying a mask that clears the target word in dest and then ORing in the src shifted left by (imm8 mod 4) * 16 bits.

Your code does not seem to provide such functionality because of the following:
Intel Vol2 wrote:
MOVD instruction when destination operand is MMX technology register:
DEST[31:0] ← SRC;
DEST[63:32] ← 00000000H;
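A minimal sketch, in Python rather than FASM, of why the MOVD round trip loses data. It models an MMX register as a 64-bit integer and follows the quoted MOVD behaviour, where bits 63:32 are zeroed on the way back. The names `pinsrw_ref` and `movd_roundtrip` are invented for this illustration.

```python
MASK64 = (1 << 64) - 1

def pinsrw_ref(mm0, src, imm8):
    """Reference model of PINSRW on a 64-bit MMX value."""
    shift = (imm8 & 3) * 16                  # word index times 16 bits
    mask = 0xFFFF << shift                   # selects the target word
    return ((mm0 & ~mask) | ((src & 0xFFFF) << shift)) & MASK64

def movd_roundtrip(mm0, ecx):
    """Model of: movd eax,mm0 / mov ax,cx / movd mm0,eax (push/pop eax omitted)."""
    eax = mm0 & 0xFFFFFFFF                       # movd eax,mm0 copies bits 31:0 only
    eax = (eax & 0xFFFF0000) | (ecx & 0xFFFF)    # mov ax,cx
    return eax                                   # movd mm0,eax zeroes bits 63:32

mm0, ecx = 0x1111222233334444, 0x0000ABCD
print(hex(pinsrw_ref(mm0, ecx, 0)))    # 0x111122223333abcd
print(hex(movd_roundtrip(mm0, ecx)))   # 0x3333abcd
```

The low 32 bits come out right, but the upper two words of mm0 are gone, which is exactly the difference the quoted pseudocode predicts.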



sinsi 29 Jul 2010, 06:16
Code:
sub esp,8
movq [esp],mm0
mov [esp],cx
movq mm0,[esp]
add esp,8
    

bloated though...
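A minimal sketch, in Python rather than FASM, of why the store/patch/reload approach works: x86 is little-endian, so the word at the lowest address of the 8-byte slot is word 0 of the register. The function name `pinsrw_via_memory` and its `index` parameter (a generalisation to the other three words, patching at offset 2 * index) are assumptions for this illustration, not part of the posted code.

```python
def pinsrw_via_memory(mm0, ecx, index=0):
    """Model of: movq [esp],mm0 / mov [esp+2*index],cx / movq mm0,[esp]."""
    buf = bytearray(mm0.to_bytes(8, "little"))                 # movq [esp],mm0
    offset = 2 * (index & 3)                                   # byte offset of the target word
    buf[offset:offset + 2] = (ecx & 0xFFFF).to_bytes(2, "little")  # mov [esp+offset],cx
    return int.from_bytes(buf, "little")                       # movq mm0,[esp]

print(hex(pinsrw_via_memory(0x1111222233334444, 0xABCD)))      # 0x111122223333abcd
print(hex(pinsrw_via_memory(0x1111222233334444, 0xABCD, 3)))   # 0xabcd222233334444
```

No register other than mm0 changes in this scheme, at the cost of three memory accesses.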



DOS386 29 Jul 2010, 06:24
Quote:
PINSRW is functionally equivalent to first applying a mask that clears the target word in dest and then ORing in the src shifted left by (imm8 mod 4) * 16 bits.


Complicated, but apparently irrelevant since my "imm8" is ZERO ...

> Your code does not seem to provide such functionality
> because of the following:
> Intel Vol2 wrote:
> > MOVD instruction when destination operand is
> > MMX technology register:
> > DEST[31:0] ← SRC;
> > DEST[63:32] ← 00000000H;

So my "MOVD MM0, EAX" is in fact "MOVZX MM0, EAX" :(

Quote:
Code:
sub esp,8 
movq [esp],mm0 
mov [esp],cx 
movq mm0,[esp] 
add esp,8 
    
bloated though...


Looks good, just ugly and bloated ... :|


LocoDelAssembly 29 Jul 2010, 06:29
Note that for full equivalence the SUB/ADD pair should be replaced with LEAs, since PINSRW does not affect EFLAGS (this replacement adds even more bloat, of course).



sinsi 29 Jul 2010, 06:39
With lea the size goes from 18 to 20 bytes.
Code:
movq [esp-8],mm0
mov [esp-8],cx
movq mm0,[esp-8]
    

15 bytes
living dangerously (the data sits below ESP)...



edemko 29 Jul 2010, 12:24
"We are the borg", do not bloat.
Code:
        mov     cx,':)'
        pinsrw  mm0,ecx,0              ;mm0 = $xxxx'xxxx'xxxx'CX
        pinsrw  mm0,ecx,1              ;mm0 = $xxxx'xxxx'CX  'CX
        pinsrw  mm0,ecx,2              ;mm0 = $xxxx'CX  'CX  'CX
        pinsrw  mm0,ecx,3              ;mm0 = $CX  'CX  'CX  'CX

        lea     esp,[esp-9 -3]         ;to align
        mov     byte[esp+8],0          ;#0
        movq    [esp],mm0              ;':):):):)'
        mov     ecx,esp
        invoke  MessageBoxA,0,ecx,0,0
        lea     esp,[esp+9 +3]         ;restore stack

        ret
    
DOS386 30 Jul 2010, 04:42
> lea gives 18->20 bytes
> 15 bytes
> living dangerously...

Right. Assuming functions are aligned to 16 bytes, the probability of encountering a hole of more than 7 bytes is less than 1/2, and for a hole of more than 15 bytes the probability is ZERO :o

> lea esp,[esp-9 -3] ;to align
> movq [esp],mm0

1. Are you sure that gives better alignment than 4?
2. Wouldn't "esp-12" also work?



edemko 30 Jul 2010, 04:51
"sub esp,12" works and even saves one byte compared to "lea".
System functions fail whenever ESP is not aligned.
I do not care about code alignment.
I just showed a sample: sizeof.mm0 (8 bytes) + sizeof.NULL (1 byte) = 9 bytes; 9 mod 4 = 1, so 3 bytes of padding are added to reach 12.

lea was erroneous
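A minimal sketch, in Python, of the padding arithmetic behind the 9 + 3 = 12 bytes reserved on the stack. The helper name `padding` is invented for this illustration, and the 4-byte alignment is the one assumed in the post.

```python
def padding(size, align=4):
    """Bytes to add so that `size` becomes a multiple of `align`."""
    return (-size) % align

size = 8 + 1                        # 8 bytes for mm0 plus 1 byte for the terminating zero
pad = padding(size)
print(size % 4, pad, size + pad)    # 1 3 12
```

The remainder itself (1) is not the padding; the padding is what is missing to reach the next multiple of 4.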


edemko 30 Jul 2010, 04:53
yes, i failed with "lea"



DOS386 30 Jul 2010, 04:54
> System functions fail whenever ESP not aligned.

NOT on ME :o



edemko 30 Jul 2010, 04:57
try calling MessageBox with sub esp,9
or are you staying with DOS?



DOS386 30 Jul 2010, 04:59
edemko wrote:
try calling MessageBox with sub esp,9


works on ME (among others)

Quote:
or are you staying with DOS?


YES :)



edemko 30 Jul 2010, 05:05
win xp sp3
[image]
which DOS?
for now i must leave, good luck



edemko 30 Jul 2010, 05:07
"enter 12,0"



sinsi 30 Jul 2010, 06:07
>Considering a function alignment of 16 bytes
That has nothing to do with the stack.

If you are staying with DOS, it should be wrapped in cli/sti then.


Is there a point to this?

Copyright © 1999-2023, Tomasz Grysztar.