flat assembler
Message board for the users of flat assembler.

Move immediate into xmm?

Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 06 Aug 2009, 06:53
What's wrong with this macro?
Code:
macro       fillXmm reg,imm{
       mov     reg,(imm shl 8*0)+(imm shl 8*1)+(imm shl 8*2)+(imm shl 8*3)+(imm shl 8*4)+(imm shl 8*5)+(imm shl 8*6)+(imm shl 8*7)+(imm shl 8*8)+(imm shl 8*9)+(imm shl 8*10)+(imm shl 8*11)+(imm shl 8*12)+(imm shl 8*13)+(imm shl 8*14)+(imm shl 8*15)
}

fillXmm xmm0,5    
Why does it say invalid argument, and how do I fix it?

P.S. I also tried movd and movq and movdq..
bitRAKE



Joined: 21 Jul 2003
Posts: 4079
Location: vpcmpistri
bitRAKE 06 Aug 2009, 06:57
The MMX/SSE instructions have no move-immediate form. Some people avoid data fetches by using PCMPEQ* reg,reg plus a shift to create a mask. Another combination:
Code:
mov eax,$AABBCCDD
movd mm0,eax    
...and some shuffling to extract parts.
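
A minimal sketch of both ideas, assuming SSE2 and XMM registers instead of mm0. The register choices and constants are only illustrative:
Code:
; build constants in XMM registers without touching memory
pcmpeqd xmm0,xmm0        ; xmm0 = all bits set
psrld   xmm0,1           ; every dword = 7FFFFFFFh (mask that clears the sign bit)

mov     eax,05050505h    ; byte 5 replicated into all four bytes of EAX
movd    xmm1,eax         ; low dword = 05050505h, upper three dwords zeroed
pshufd  xmm1,xmm1,0      ; broadcast dword 0 into all four dwords

After the last instruction every byte of xmm1 holds 5, at the cost of one GPR.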
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 06 Aug 2009, 07:08
Thanks.. is there at least a faster way than this?
Code:
macro   fillXmm reg,imm{
        ; the same 8-byte pattern goes into both 64-bit halves
        mov     rax,(imm shl 8*0)+(imm shl 8*1)+(imm shl 8*2)+(imm shl 8*3)+(imm shl 8*4)+(imm shl 8*5)+(imm shl 8*6)+(imm shl 8*7)
        mov     rcx,(imm shl 8*0)+(imm shl 8*1)+(imm shl 8*2)+(imm shl 8*3)+(imm shl 8*4)+(imm shl 8*5)+(imm shl 8*6)+(imm shl 8*7)
        movq    reg,rax         ; low 64 bits
        pinsrq  reg,rcx,1       ; high 64 bits (PINSRQ needs SSE4.1)
}

fillXmm xmm0,5    
Or something the same speed that doesn't trash 2 GPRs?
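
One way to keep it to a single GPR, as a sketch only: multiply the byte by 0101010101010101h to replicate it across a qword, then duplicate the low qword with PUNPCKLQDQ (SSE2). The macro name fillXmm1 is made up for illustration. It assumes 64-bit mode and imm in the range 0..255, and it still clobbers rax:
Code:
macro   fillXmm1 reg,imm{
        mov        rax,imm*0101010101010101h   ; imm copied into all 8 bytes of RAX
        movq       reg,rax                     ; low 64 bits of reg
        punpcklqdq reg,reg                     ; copy the low qword into the high qword
}

fillXmm1 xmm0,5

Whether this is actually faster than the two-GPR version has not been measured here.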
pal



Joined: 26 Aug 2008
Posts: 227
pal 06 Aug 2009, 08:47
Well you can move a memory location into a multimedia register e.g.

Code:
align 16        ; movdqa needs a 16-byte-aligned address
bignum  dq      0xDEADC0DEDEADBEEF
        dq      0x1234567890ABCDEF

movdqa  xmm0,dqword [bignum]

So if the value is a constant or is already in memory, this is faster.

P.S. Don't mean to steal your thread but I have a quick question. The difference between aligned and unaligned is that with aligned it must be aligned on a 16-byte boundary, but I don't get exactly what that means. Is that what all the align 16 stuff is for?
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 06 Aug 2009, 09:16
Thanks. I can't believe I didn't think of that lol.



P.S. aligned to 16 means address modulo 16 must = 0

i.e. the address is perfectly divisible by 16 (no remainder)


...I think
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 06 Aug 2009, 12:14
@pal:
There's more: MOVDQA can only fetch data from "align 16" addresses, which you can always check with "TEST mem_pointer,15" (a zero result means the address is aligned).
MOVDQU can fetch data from any location, but you must be careful with it because you will pay the performance penalty even if your data is aligned.

You can recognize a 16-byte-aligned address by its last hex digit being 0:
401040h or 12340h or 0FFFFFFF0h
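
A rough sketch of such a check, assuming 64-bit mode and a pointer in rsi. The label load16 and the register choice are hypothetical:
Code:
load16:
        test    rsi,15          ; are the low four address bits all zero?
        jnz     .unaligned
        movdqa  xmm0,[rsi]      ; aligned address: the A-version is safe
        ret
.unaligned:
        movdqu  xmm0,[rsi]      ; any other address: use the U-version
        ret

In practice it is usually simpler to put "align 16" in front of the data and use MOVDQA unconditionally.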
pal



Joined: 26 Aug 2008
Posts: 227
pal 06 Aug 2009, 15:28
Ahh yeah so it is address % 16 = 0. So it is like how PE headers are meant to be 512 byte aligned.

So is using movdqa on aligned data (including the cost of the test) faster than just using movdqu?
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 06 Aug 2009, 15:34
movdqa is faster. I read somewhere that it will crash on unaligned data, but this doesn't happen for me with my E8400. movntdq does, though. I think maybe it varies from CPU to CPU or something? But anyway, yes, movdqa is definitely faster than movdqu on 16-byte-aligned data.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 07 Aug 2009, 07:21
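You will get a general-protection exception (crash) in every case where the address is not aligned and you use the A-version (the aligned one). The U-version (unaligned) does not have this restriction. The A-version should always be used for moves between registers (xmm0..xmm7, or up to xmm15 in 64-bit mode).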

The i7 CPU made U-versions faster, but it still crashes if you do movdqa xmm0,[401001h] or something. In Feryno's debugger it says:
Code:
Address=0000000000401001h ExceptionCode=C0000005h=EXCEPTION_ACCESS_VIOLATION
    

This applies to Core, Core 2 and Core i7 (I don't know for sure about AMD).


Last edited by Madis731 on 07 Aug 2009, 11:38; edited 1 time in total
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 07 Aug 2009, 08:33
Hmm.. it crashes for me in compatibility mode.. but not in x64 mode..
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 08 Aug 2009, 04:53
Code:
format pe gui 4.0
movdqa xmm0, [$+1]
ret    

Crash on an Athlon64 Venice. I can't test 64-bit code now because I'm not in Linux at this moment.

[edit]
Code:
loco@athlon64:~/Desktop$ cat movdqa.asm 
format elf64 executable
align 16
movdqa xmm0, [$+1]
xor edi,edi
mov eax,60
syscall
loco@athlon64:~/Desktop$ fasm movdqa.asm 
flat assembler  version 1.69.03  (16384 kilobytes memory)
1 passes, 145 bytes.
loco@athlon64:~/Desktop$ ./movdqa 
Segmentation fault    
Code:
loco@athlon64:~/Desktop$ cat movdqa.asm 
format elf executable
align 16
movdqa xmm0, [$+1]
xor edi,edi
mov eax,1
syscall
loco@athlon64:~/Desktop$ fasm movdqa.asm 
flat assembler  version 1.69.03  (16384 kilobytes memory)
1 passes, 113 bytes.
loco@athlon64:~/Desktop$ ./movdqa 
Segmentation fault    

[/edit]
Copyright © 1999-2025, Tomasz Grysztar.