flat assembler
Message board for the users of flat assembler.
Move immediate into xmm?
bitRAKE 06 Aug 2009, 06:57
The MMX/SSE instructions have no immediate-operand moves. Some people avoid data fetches by using PCMPEQ* reg,reg (which sets every bit) followed by a shift to create a mask, or another combination such as:
Code:
    mov eax,$AABBCCDD
    movd mm0,eax
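The same two tricks as a minimal C sketch, assuming an x86/x86-64 compiler with the SSE2 intrinsics from <emmintrin.h>; the function names are hypothetical. The comment beside each intrinsic names the instruction it usually compiles to, but a compiler is free to pick another sequence or fold the constant into a memory load. In C the register has to be zeroed before the compare (comparing an uninitialized variable is undefined), whereas in assembly PCMPEQD xmm0,xmm0 works on whatever the register holds. The sketch uses an XMM register rather than bitRAKE's MMX mm0.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86/x86-64 only) */
#include <stdint.h>

/* All-ones then shift, with no data fetch:
     pcmpeqd xmm0,xmm0   ; every bit set
     psrld   xmm0,1      ; each dword becomes 7FFFFFFFh (e.g. a float abs mask) */
static __m128i abs_mask_no_load(void)
{
    __m128i zero = _mm_setzero_si128();        /* pxor xmm0,xmm0 */
    __m128i ones = _mm_cmpeq_epi32(zero, zero); /* pcmpeqd: all bits set */
    return _mm_srli_epi32(ones, 1);             /* psrld by 1 */
}

/* mov eax,imm / movd xmm0,eax: low dword = imm, upper 96 bits zeroed. */
static __m128i dword_from_imm(uint32_t imm)
{
    return _mm_cvtsi32_si128((int)imm);         /* movd xmm0,eax */
}
```

Storing abs_mask_no_load() to memory gives four dwords of 0x7FFFFFFF, and dword_from_imm(0xAABBCCDD) gives 0xAABBCCDD followed by three zero dwords.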
Azu 06 Aug 2009, 07:08
Thanks. Is there at least a faster way than this?
Code:
    macro fillXmm reg,imm {
      ; replicate the low byte of imm into all 8 bytes of rax
      mov rax,(imm shl 8*0)+(imm shl 8*1)+(imm shl 8*2)+(imm shl 8*3)+(imm shl 8*4)+(imm shl 8*5)+(imm shl 8*6)+(imm shl 8*7)
      movq reg,rax        ; low qword
      pinsrq reg,rax,1    ; high qword (SSE4.1)
    }
    fillXmm xmm0,5
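For comparison, a sketch of an SSE2-only way to get the same broadcast (PINSRQ needs SSE4.1): replicate the byte in a general-purpose register, move it into the XMM register with MOVD or MOVQ, then copy that element across with PSHUFD or PUNPCKLQDQ. For a byte-sized imm, multiplying by 01010101h (or 0101010101010101h) gives the same value as the sum of shifted copies in the macro. Written in C with <emmintrin.h> intrinsics, assuming x86-64 (_mm_cvtsi64_si128 exists only in 64-bit builds); the function names are hypothetical and the comments show the instruction each step normally compiles to.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* 32-bit route:
     mov    eax,imm*01010101h  ; byte replicated into eax
     movd   xmm0,eax
     pshufd xmm0,xmm0,0        ; dword 0 copied into all four dwords */
static __m128i broadcast_byte(uint8_t imm)
{
    uint32_t rep = (uint32_t)imm * 0x01010101u;   /* e.g. 5 -> 0x05050505 */
    __m128i x = _mm_cvtsi32_si128((int)rep);      /* movd */
    return _mm_shuffle_epi32(x, 0);               /* pshufd xmm0,xmm0,0 */
}

/* 64-bit route, closest to the macro:
     mov        rax,imm*0101010101010101h
     movq       xmm0,rax
     punpcklqdq xmm0,xmm0      ; low qword copied into the high qword */
static __m128i broadcast_byte64(uint8_t imm)
{
    uint64_t rep = (uint64_t)imm * 0x0101010101010101ull;
    __m128i x = _mm_cvtsi64_si128((long long)rep); /* movq xmm0,rax */
    return _mm_unpacklo_epi64(x, x);               /* punpcklqdq */
}
```

Both functions leave the same byte in all 16 lanes; the 32-bit route also works in 32-bit code.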
pal 06 Aug 2009, 08:47
Well, you can load a memory operand into a multimedia register, e.g.:
Code:
    align 16                      ; movdqa needs a 16-byte-aligned operand
    bignum dq 0xDEADC0DEDEADBEEF
           dq 0x1234567890ABCDEF

    movdqa xmm0,dqword [bignum]

So if it is a constant value or already in memory, then that is faster.
P.S. I don't mean to steal your thread, but I have a quick question. The difference between aligned and unaligned is that aligned data must sit on a 16-byte boundary, but I don't get exactly what that means. Is that what all the align 16 stuff is for?
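A C sketch of loading such a constant, assuming a C11 compiler (for _Alignas) and SSE2 intrinsics; the names are hypothetical. _Alignas(16) does the job of fasm's align 16, which is what makes the aligned load safe. The first dq sits at the lower address, so on little-endian x86 it becomes the low qword of the register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Same layout as:
     align 16
     bignum dq 0xDEADC0DEDEADBEEF   ; low qword
            dq 0x1234567890ABCDEF   ; high qword */
static _Alignas(16) const uint64_t bignum[2] = {
    0xDEADC0DEDEADBEEFull,
    0x1234567890ABCDEFull,
};

static __m128i load_bignum(void)
{
    /* movdqa xmm0,dqword [bignum]; faults if bignum were not 16-aligned */
    return _mm_load_si128((const __m128i *)bignum);
}
```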
Azu 06 Aug 2009, 09:16
Thanks. I can't believe I didn't think of that lol.
P.S. Aligned to 16 means the address modulo 16 must equal 0, i.e. the address is exactly divisible by 16 (no remainder)... I think.
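Because 16 is a power of two, address % 16 == 0 is the same as requiring the low four bits of the address to be zero, i.e. the last hex digit is 0. A minimal C sketch (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* "Aligned to 16": address % 16 == 0, equivalently the low 4 bits are clear. */
static bool is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;   /* same result as addr % 16 == 0 */
}
```

For example, 401040h passes and 401001h does not.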
Madis731 06 Aug 2009, 12:14
@pal:
There's more: MOVDQA can only fetch data from "align 16" addresses, which you can always check with "TEST reg,15" on the pointer held in a register (ZF is set when the address is aligned). MOVDQU can fetch data from any location, but be careful with it: even if your data is aligned, using MOVDQU still costs you the performance penalty. You can recognize a 16-aligned address by its last hex digit being 0: 401040h, 12340h or 0FFFFFFF0h.
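A sketch of choosing between the two loads at run time, in C with SSE2 intrinsics (x86/x86-64 assumed; function names hypothetical). The check (addr & 15) == 0 is what "test reg,15 / jz" computes; ANDing with -16 (the same bits as ~15) instead rounds the address down to the previous 16-byte boundary. Whether the branch beats a plain MOVDQU depends on the CPU, so this only illustrates the mechanics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Use movdqa when the pointer is 16-aligned, movdqu otherwise. */
static __m128i load16_any(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    if ((addr & 15u) == 0)                             /* test reg,15 / jz */
        return _mm_load_si128((const __m128i *)p);     /* movdqa */
    return _mm_loadu_si128((const __m128i *)p);        /* movdqu */
}

/* Round an address down to the previous 16-byte boundary (and reg,-16). */
static uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}
```

Loading from a 16-aligned buffer takes the MOVDQA path; loading from buffer+1 takes the MOVDQU path and returns bytes 1..16.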
pal 06 Aug 2009, 15:28
Ahh yeah, so it is address % 16 = 0. So it is like how PE headers are meant to be 512-byte aligned.
Is using movdqa on aligned data (including the cost of the alignment test) faster than just using movdqu, then?
Azu 06 Aug 2009, 15:34
movdqa is faster. I read somewhere that it will crash on unaligned data, but this doesn't happen for me with my E8400; movntdq does, though. Maybe it varies from CPU to CPU? But anyway, yes, movdqa is definitely faster than movdqu on 16-byte-aligned data.
Madis731 07 Aug 2009, 07:21
You will get a general-protection exception (crash) in every case where the address is not aligned and you use the A-version (aligned); the U-version (unaligned) accepts any address. For register-to-register moves between xmm0..xmm7 (xmm15 in 64-bit mode), the A-version should always be used.
The Core i7 made the U-versions faster, but it still crashes if you do movdqa xmm0,[401001h] or similar. In Feryno's debugger it says:
Code:
    Address=0000000000401001h
    ExceptionCode=C0000005h=EXCEPTION_ACCESS_VIOLATION
This happens on Core, Core 2 and Core i7 (I don't know for sure about AMD).
Last edited by Madis731 on 07 Aug 2009, 11:38; edited 1 time in total
Azu 07 Aug 2009, 08:33
Hmm... it crashes for me in compatibility mode, but not in x64 mode.
LocoDelAssembly 08 Aug 2009, 04:53
Code:
    format pe gui 4.0
    movdqa xmm0, [$+1]
    ret

Crash on an Athlon64 Venice. I can't test 64-bit code now because I'm not in Linux at the moment.
[edit]
Code:
    loco@athlon64:~/Desktop$ cat movdqa.asm
    format elf64 executable
    align 16
    movdqa xmm0, [$+1]
    xor edi,edi
    mov eax,60
    syscall
    loco@athlon64:~/Desktop$ fasm movdqa.asm
    flat assembler version 1.69.03 (16384 kilobytes memory)
    1 passes, 145 bytes.
    loco@athlon64:~/Desktop$ ./movdqa
    Segmentation fault

Code:
    loco@athlon64:~/Desktop$ cat movdqa.asm
    format elf executable
    align 16
    movdqa xmm0, [$+1]
    xor edi,edi
    mov eax,1
    syscall
    loco@athlon64:~/Desktop$ fasm movdqa.asm
    flat assembler version 1.69.03 (16384 kilobytes memory)
    1 passes, 113 bytes.
    loco@athlon64:~/Desktop$ ./movdqa
    Segmentation fault
[/edit]
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.