flat assembler
Message board for the users of flat assembler.
Adding up individual floating point values in an XMM register
revolution 02 Apr 2006, 07:23
Assuming your source register is xmm0:
Code:
    movhlps xmm1,xmm0
    addps   xmm0,xmm1
    shufps  xmm1,xmm0,1
    addps   xmm0,xmm1
Madis731 02 Apr 2006, 09:49
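revolution, you have an error in your code. The result of a shuffle is defined by BOTH operands: SHUFPS takes the two low result elements from the destination and the two high ones from the source, so you can't get the source's low element into the destination's low element.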
Given 1.0 2.0 3.0 4.0 as input:
Code:
    movhlps xmm1,xmm0
    addps   xmm0,xmm1
    movaps  xmm1,xmm0
    shufps  xmm1,xmm1,1
    addps   xmm0,xmm1   ;or even addss, because it's 1 µop faster
you get X X X 10.0 as output (only the lowest element holds the sum).

Given 1.0 2.0 3.0 4.0 as input:
Code:
    movaps  xmm1,xmm0
    shufps  xmm1,xmm1,00011011b
    addps   xmm0,xmm1
    movaps  xmm1,xmm0
    shufps  xmm1,xmm1,01001110b
    addps   xmm0,xmm1
you get 10.0 10.0 10.0 10.0 as output. Maybe useful sometimes. You can use the result in XMM0 starting at the 11th and 13th clock, respectively.

I looked at SSE3, and it really is simpler to implement in instructions:
Code:
    haddps xmm0,xmm0
    haddps xmm0,xmm0
The advantage is that all four DWORDs get filled with the same result.
Raedwulf 02 Apr 2006, 11:49
Thanks!
This really helps. I wrote some really crappy SSE code to add them up, lol, using too many SHUFPS instructions, because my head really gets spun round trying to work out which element goes where. Btw, what debugger do you guys use? I use OllyDbg, but as it doesn't support XMM registers, I have to copy the values into temporary memory locations to view their contents at debug time. Cheers!
_________________
Raedwulf
Raedwulf 02 Apr 2006, 12:04
Yeah, SSE3 is bloody useful in these situations; unfortunately my desktop doesn't even support SSE2! Any willing donors for my humble cause are appreciated, LOL. Cheers!
revolution 02 Apr 2006, 13:34
Quote:
revolution, you have an error in your code. The result of a shuffle gets defined by BOTH operands and you can't get the source's lower into destination's lower.
Code:
    movhlps xmm1,xmm0
    addps   xmm0,xmm1
    pshufd  xmm1,xmm0,1
    addps   xmm0,xmm1
Raedwulf 02 Apr 2006, 16:38
Thanks, revolution.
I'll keep note of that, because it appears to be faster. However, it uses SSE2 (PSHUFD), and my desktop is antique and doesn't support it. Cheers anyway... I'll add it to my program once I have better access to an SSE2 computer.
Reverend 26 May 2006, 16:06
Raedwulf: OllyDbg can show SSE registers. Just go to 'Debugging options' -> 'Registers' and check 'Decode SSE registers'. But there is a bug in OllyDbg, which I found and have already reported to Oleh Yuschuk: when SSE register decoding is enabled, the debugger fails to catch some exceptions. Most of the time it isn't even noticeable, but I was once tracing something and was furious that it didn't catch exceptions! The author himself admitted it's a bug, but as version 1.10d is no longer being developed, the bug remains unfixed.
Copyright © 1999-2024, Tomasz Grysztar.
Website powered by rwasa.