flat assembler
Message board for the users of flat assembler.
  
Challenge for MMX/SSE experts: PINSR

revolution 26 Jul 2010, 01:57
DOS386 wrote: CMOVNTQ

edemko 28 Jul 2010, 12:40
Quote:
Code:
    movd mm1,ecx
    psllq mm1,0
    por mm0,mm1
or GeneralPurpose? or MMX? or why to bloat?

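asmfan 28 Jul 2010, 14:28
modified
Code:
    pcmpeqd mm1,mm1     ; mm1 = all ones
    psllq mm1,16        ; mm1 = 0FFFFFFFFFFFF0000h
    pand mm0,mm1        ; clear the low word of mm0
    pxor mm1,mm1
    movd mm1,ecx        ; mm1 = ecx, zero-extended
    pslld mm1,16
    psrld mm1,16        ; keep only cx
    por mm0,mm1
Or
Code:
    pcmpeqd mm1,mm1
    psllq mm1,16
    pand mm0,mm1        ; clear the low word of mm0
    movd mm2,ecx
    pandn mm1,mm2       ; mm1 = (not mm1) and mm2 = cx
    por mm0,mm1
Or uniq
Code:
    mm0 PINSRw (mm0, ecx, imm8)
    {
    pcmpeqd mm1,mm1
    psrlq mm1,48        ; mm1 = 000000000000FFFFh
    movd mm2,ecx
    pand mm2,mm1        ; mm2 = cx
    and imm8,3
    shl imm8,4          ; (imm8 and 3) * 16 = bit offset of the target word
    psllq mm1,imm8
    psllq mm2,imm8
    pandn mm1,mm0       ; mm1 = mm0 with the target word cleared
    por mm1,mm2
    movq mm0,mm1
    }
Should work, I guess.
_________________
Any offers?
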
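The mask, shift and merge idea in the sequences above can be checked on plain integers. The following is only a sketch in C, not MMX code: the name `insert_word` and its parameters are made up for illustration, and a `uint64_t` stands in for the MMX register.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-integer model of the mask/shift/merge idea (hypothetical helper).
   mm0  : 64-bit destination value
   ecx  : 32-bit source, only its low 16 bits (CX) are used
   imm8 : word selector, only its low 2 bits are used */
static uint64_t insert_word(uint64_t mm0, uint32_t ecx, unsigned imm8)
{
    unsigned shift = (imm8 & 3u) * 16u;                   /* bit offset of the target word */
    uint64_t mask  = (uint64_t)0xFFFFu << shift;          /* ones over the target word */
    uint64_t word  = (uint64_t)(ecx & 0xFFFFu) << shift;  /* CX moved into place */
    return (mm0 & ~mask) | word;                          /* clear the word, then OR CX in */
}

/* A few hand-checked cases. */
static void insert_word_selftest(void)
{
    assert(insert_word(0x1122334455667788ULL, 0xDEADBEEFu, 0) == 0x112233445566BEEFULL);
    assert(insert_word(0x1122334455667788ULL, 0xDEADBEEFu, 3) == 0xBEEF334455667788ULL);
    /* selector 5 wraps to word 1 because only the low 2 bits count */
    assert(insert_word(0x1122334455667788ULL, 0xDEADBEEFu, 5) == 0x11223344BEEF7788ULL);
}
```

With a selector of 0 this matches the first two sequences, which only ever touch the low word.
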
DOS386 29 Jul 2010, 04:34
> movd    mm1,ecx
> psllq mm1,0
> por mm0,mm1
Does this work ??? What does the "psllq mm1,0" do that a NOP wouldn't ?
I forgot an important detail: no registers should be trashed.
What does the PINSR do ? Just copy 16 bits from CX into the 16 lower bits of MM0, leaving the upper 48 bits untouched ?
Code:
    PUSH EAX
    MOVD EAX, MM0
    MOV AX, CX
    MOVD MM0, EAX
    POP EAX
Would this work (11 bytes of bloat) ?

LocoDelAssembly 29 Jul 2010, 06:02
PINSRW is functionally equivalent to first applying a mask that clears the target word in dest to zero and then ORing the src, shifted left (imm8 mod 4) * 16 bits, into dest.
Your code does not seem to provide such functionality because of the following:
Intel Vol2 wrote:
MOVD instruction when destination operand is MMX technology register:
DEST[31:0] ← SRC;
DEST[63:32] ← 00000000H;

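To see why the zero-extending MOVD matters, here is a small C model of the PUSH EAX / MOVD / MOV AX,CX / MOVD / POP EAX sequence next to the intended result. It is a sketch on plain integers only; all function names are hypothetical, and the MOVD behaviour is taken from the Intel text quoted in this thread (DEST[63:32] ← 00000000H).

```c
#include <assert.h>
#include <stdint.h>

/* MOVD mm, r32: low dword = src, high dword = 0 (as quoted from the manual). */
static uint64_t movd_to_mmx(uint32_t src) { return (uint64_t)src; }

/* MOVD r32, mm: takes the low dword only. */
static uint32_t movd_from_mmx(uint64_t mm) { return (uint32_t)mm; }

/* Model of the sequence that goes through EAX. */
static uint64_t via_eax(uint64_t mm0, uint32_t ecx)
{
    uint32_t eax = movd_from_mmx(mm0);            /* MOVD EAX, MM0 */
    eax = (eax & 0xFFFF0000u) | (ecx & 0xFFFFu);  /* MOV AX, CX    */
    return movd_to_mmx(eax);                      /* MOVD MM0, EAX */
}

/* What inserting CX into word 0 should leave behind. */
static uint64_t pinsrw0(uint64_t mm0, uint32_t ecx)
{
    return (mm0 & ~(uint64_t)0xFFFFu) | (ecx & 0xFFFFu);
}

static void via_eax_selftest(void)
{
    uint64_t mm0 = 0x1122334455667788ULL;
    assert(pinsrw0(mm0, 0xDEADBEEFu) == 0x112233445566BEEFULL);
    assert(via_eax(mm0, 0xDEADBEEFu) == 0x000000005566BEEFULL); /* upper dword lost */
}
```

In this model the two agree only when the upper 32 bits of the destination were already zero.
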
sinsi 29 Jul 2010, 06:16
Code:
    sub esp,8
    movq [esp],mm0
    mov [esp],cx
    movq mm0,[esp]
    add esp,8
bloated though...

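The store, patch and reload trick can also be modelled in C with an 8-byte buffer standing in for the stack slot. This is a sketch under assumptions: the helper names are invented, the byte order is written out as little-endian to mirror x86, and the `index` parameter is an addition for illustration (the posted code only patches offset 0).

```c
#include <assert.h>
#include <stdint.h>

/* Write a 64-bit value into 8 bytes, least significant byte first (x86 order). */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read the 8 bytes back into a 64-bit value. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical model: index 0 corresponds to the posted sequence. */
static uint64_t insert_via_memory(uint64_t mm0, uint32_t ecx, unsigned index)
{
    uint8_t slot[8];                       /* the 8 bytes reserved by sub esp,8 */
    unsigned off = (index & 3u) * 2u;      /* byte offset of the target word */
    store_le64(slot, mm0);                 /* movq [esp],mm0 */
    slot[off]     = (uint8_t)ecx;          /* mov [esp+off],cx, low byte */
    slot[off + 1] = (uint8_t)(ecx >> 8);   /* mov [esp+off],cx, high byte */
    return load_le64(slot);                /* movq mm0,[esp] */
}

static void insert_via_memory_selftest(void)
{
    assert(insert_via_memory(0x1122334455667788ULL, 0xDEADBEEFu, 0) == 0x112233445566BEEFULL);
    assert(insert_via_memory(0x1122334455667788ULL, 0xDEADBEEFu, 2) == 0x1122BEEF55667788ULL);
}
```

Because the whole quadword goes through memory, the upper 48 bits survive, which is what the register-only route through EAX could not guarantee.
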
DOS386 29 Jul 2010, 06:24
Quote: PINSRW is functionally equivalent to first applying a mask that clears the target word in dest to zero and then ORing the src, shifted left (imm8 mod 4) * 16 bits, into dest.
Complicated, but apparently irrelevant since my "imm8" is ZERO ...
> Your code does not seem to provide such functionality
> because of the following:
> Intel Vol2 wrote:
> > MOVD instruction when destination operand is
> > MMX technology register:
> > DEST[31:0] ← SRC;
> > DEST[63:32] ← 00000000H;
So my "MOVD MM0, EAX" is in fact "MOVZX MM0, EAX"
Quote:
Looks good, just ugly and bloated ...

LocoDelAssembly 29 Jul 2010, 06:29
Note that for extra equivalence SUB/ADD should be replaced with LEAs, as PINSRW does not affect EFLAGS (even more bloat will result from this replacement, of course).

sinsi 29 Jul 2010, 06:39
lea gives 18->20 bytes
Code:
    movq [esp-8],mm0
    mov [esp-8],cx
    movq mm0,[esp-8]
15 bytes
living dangerously...

edemko 29 Jul 2010, 12:24
"We are the borg", do not bloat.
Code:
    mov cx,'

DOS386 30 Jul 2010, 04:42
> lea gives 18->20 bytes
> 15 bytes
> living dangerously...
Right. Considering a function alignment of 16 bytes, the probability of encountering a hole of > 7 bytes is less than 1/2, and for a hole of > 15 bytes the probability is ZERO
> lea esp,[esp-9 -3] ;to align
> movq [esp],mm0
1. Are you sure that gives better alignment than 4 ???
2. Wouldn't "esp-12" also work ???

edemko 30 Jul 2010, 04:51
"sub esp,12" works, even saving one byte, unlike "lea".
System functions fail whenever ESP is not aligned.
I do not care about code alignment.
Just showed a sample: sizeof.mm0 = 8 bytes + sizeof.NULL = 1 byte = 9 bytes; 9 mod 4 = 1, so 3 bytes to ADD to reach a multiple of 4.
lea was erroneous
Last edited by edemko on 30 Jul 2010, 04:55; edited 1 time in total

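The padding arithmetic behind "sub esp,12" (9 bytes needed, rounded up to a multiple of 4) can be written as a tiny C sketch. The helper names `align_up` and `padding_for` are made up here, and the power-of-two restriction is an assumption of this particular formula.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of alignment (alignment must be a power of two). */
static size_t align_up(size_t n, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Number of extra bytes needed to reach that multiple. */
static size_t padding_for(size_t n, size_t alignment)
{
    return align_up(n, alignment) - n;
}

static void align_selftest(void)
{
    assert(align_up(9, 4) == 12);    /* 8 bytes for mm0 + 1 byte, rounded up */
    assert(padding_for(9, 4) == 3);  /* the 3 bytes to add */
    assert(padding_for(8, 4) == 0);  /* already aligned, nothing to add */
}
```
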
edemko 30 Jul 2010, 04:53
yes, i failed with "lea"

DOS386 30 Jul 2010, 04:54
> System functions fail whenever ESP not aligned.
NOT on ME

edemko 30 Jul 2010, 04:57
try calling MessageBox with sub esp,9
or are you staying DOS?

DOS386 30 Jul 2010, 04:59
edemko wrote: try calling MessageBox with sub esp,9
works on ME (among others)
Quote: or you staying DOS?
YES

edemko 30 Jul 2010, 05:05
win xp sp3
which DOS?
for now i must leave, good luck

edemko 30 Jul 2010, 05:07
"enter 12,0"

sinsi 30 Jul 2010, 06:07
> Considering a function alignment of 16 bytes
Nothing to do with stack
Staying DOS, a cli/sti wrap should be done then.
Is there a point to this?

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.