flat assembler
Message board for the users of flat assembler.
revolution 28 Feb 2010, 11:03
Code: VBROADCASTSS xmm1,[m32]
revolution 28 Feb 2010, 11:40
If you don't want to wait for the later CPUs to be released, you can have it now with:
Code: vld1.32 {d0[],d1[]},[r0]
although you will need an ARM CPU for it to work.
tthsqe 28 Feb 2010, 20:15
OK, I get it. It is not possible right now. Ha - is ARM ahead of x86?
revolution 01 Mar 2010, 01:50
tthsqe wrote: is arm ahead of x86?
ass0 01 Mar 2010, 01:52
Anyway, you are implying that some are evolving faster than others...
Madis731 01 Mar 2010, 07:44
Of course there is:
Code:
shufps xmm0, dword [],0
pshufd xmm0, dword [],0
Why wouldn't you use them? It looks just like you are looking for an IMUL4 eax shortcut instruction when you already have SHL eax,2.
LocoDelAssembly 01 Mar 2010, 16:37
Quote:
Because the operand size in your code is not really available?
baldr 01 Mar 2010, 23:22
tthsqe,
A single pshufd xmmx, dqword [mem32], 0 would suffice, if you don't mind a #GP when mem32 is not 16-byte aligned.
LocoDelAssembly 02 Mar 2010, 06:17
BTW, besides the problem pshufd has with memory alignment (which makes its use with float arrays impossible) and its extra requirement for SSE2, could it incur a performance hit? pshufd will probably mark the two halves of the xmm register as INT, so the next floating-point operation MAY be penalized for that, no?
Madis731 02 Mar 2010, 07:14
If you want to load unaligned dwords, you can do this:
Code:
pshufd xmm0,[mem32],00000000b ;first dword
pshufd xmm0,[mem32],01010101b ;second dword
pshufd xmm0,[mem32],10101010b ;---
pshufd xmm0,[mem32],11111111b ;last (4th)
I know they're immediates, but there's always a way in your code to determine where you load your data. pshufd will indeed switch to INT and you pay a clock for that, but in my experience it's too small to notice. The MOVSS intrinsic is not encouraged by Intel and they say the use of MOVPS/MOVPD is better in this case. And if you don't want to shuffle between INT/FPU, you can always do it the all-INT way (and still use MOV*):
Code:
movdqa xmm0,[mem32]
pshufd xmm0,xmm0,0
but I don't see it beating pshufd xmm0,[mem32],0 in speed or size.
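To make the four immediates above concrete, here is a rough plain-C model of the data movement only. It is a sketch, not real SSE code: pshufd_model, broadcast_imm and broadcast_element are hypothetical names, the array is assumed to start on a 16-byte boundary, and alignment faults and timing are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of PSHUFD dst, src, imm8: result dword i is
   src[(imm8 >> (2*i)) & 3]. Index 0 is the lowest dword. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];            /* copy last so dst may alias src */
}

/* Hypothetical helper: the immediate that replicates one lane (0..3)
   into all four positions: 00000000b, 01010101b, 10101010b, 11111111b. */
uint8_t broadcast_imm(unsigned lane)
{
    assert(lane < 4);
    return (uint8_t)(lane * 0x55u);
}

/* Hypothetical helper: broadcast element `index` of an array that is
   ASSUMED to start on a 16-byte boundary. The aligned 4-dword block
   holding the element starts at index & ~3; the lane is index & 3. */
void broadcast_element(uint32_t dst[4], const uint32_t *array, size_t index)
{
    const uint32_t *block = array + (index & ~(size_t)3);
    pshufd_model(dst, block, broadcast_imm((unsigned)(index & 3)));
}
```

In real assembly the immediate is encoded in the instruction, so the lane has to be known at assembly time, for example by unrolling the loop four times with one immediate per copy.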
tthsqe 02 Mar 2010, 08:03
Silly me for thinking that the integer version
Code: pshufd xmm0,[mem32],0
of
Code: shufps xmm0,[mem32],0
would shuffle them the same way. I just assumed they would be consistent. I think I'll accept the fourfold increase in code size and go with that last one by Madis.
Madis731 02 Mar 2010, 08:42
Oh dear - of course - shufps will take BOTH inputs and shuffle them. That is why I always use the packed one. Oops!
Who's the bad boy here, Intel? AMD? Both?
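The difference is easier to see in a small plain-C model of the two shuffles. This is only a sketch of the lane selection: shufps_model and pshufd_model are hypothetical names, and lanes are stored as raw 32-bit values with index 0 as the lowest dword.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SHUFPS dst, src, imm8: the two low result lanes are picked
   from dst, the two high result lanes are picked from src. */
void shufps_model(uint32_t dst[4], const uint32_t src[4], unsigned imm8)
{
    assert(imm8 < 256);
    uint32_t r0 = dst[imm8 & 3];
    uint32_t r1 = dst[(imm8 >> 2) & 3];
    uint32_t r2 = src[(imm8 >> 4) & 3];
    uint32_t r3 = src[(imm8 >> 6) & 3];
    dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3;
}

/* Model of PSHUFD dst, src, imm8: all four result lanes come from src. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], unsigned imm8)
{
    assert(imm8 < 256);
    uint32_t r0 = src[imm8 & 3];
    uint32_t r1 = src[(imm8 >> 2) & 3];
    uint32_t r2 = src[(imm8 >> 4) & 3];
    uint32_t r3 = src[(imm8 >> 6) & 3];
    dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3;
}
```

With dst = {a0,a1,a2,a3}, src = {b0,b1,b2,b3} and an immediate of 0, the shufps model gives {a0,a0,b0,b0}, while the pshufd model gives {b0,b0,b0,b0}. The shufps model only broadcasts when both operands are the same register, which is why the value has to be loaded first.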
revolution 02 Mar 2010, 08:51
Madis731 wrote: Who's the bad boy here, Intel? AMD? Both?
LocoDelAssembly 02 Mar 2010, 13:45
Quote: I know they're immediates, but there's always a way in your code to determine where you load your data.
Madis731 02 Mar 2010, 18:05
This simple SSE question has gotten out of hand.
You can first load xmm0 with movdqa const[1.0,2.0,3.0,4.0], then add const[4.0,4.0,4.0,4.0] to this register on every loop iteration. Now you can effectively use these constants to calculate linearly every number you want. Actually, the Intel C Compiler will optimize float loop counters all by itself. Of course, it prefers integer loop indexes.
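A scalar sketch of that counter scheme, assuming four float lanes and one add per loop iteration (the job ADDPS would do on a real xmm register). The function name linear_ramp and its scale and offset parameters are hypothetical, added only to show the "calculate linearly" step.

```c
#include <assert.h>
#include <stddef.h>

/* Fills out[0 .. 4*n_blocks-1] with scale * k + offset for k = 1, 2, 3, ...
   The counter "register" starts at {1,2,3,4} and is advanced by
   {4,4,4,4} once per block, so no int-to-float conversion is needed. */
void linear_ramp(float *out, size_t n_blocks, float scale, float offset)
{
    float counter[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float step[4] = {4.0f, 4.0f, 4.0f, 4.0f};

    assert(out != NULL || n_blocks == 0);
    for (size_t b = 0; b < n_blocks; b++) {
        for (int lane = 0; lane < 4; lane++) {
            out[4 * b + lane] = scale * counter[lane] + offset;
            counter[lane] += step[lane];   /* one packed add per block */
        }
    }
}
```

Calling linear_ramp(buf, 3, 1.0f, 0.0f) on a 12-element buffer writes the values 1.0 through 12.0 in order.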
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.