flat assembler
Message board for the users of flat assembler.

XORPD/XORPS, ORPD/ORPS, and others

mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
XORPS's opcode is 0F 57 /r, and XORPD's is the same with a 66 prefix. They both perform exactly the same bitwise operation, do they not? That is, the result is the same regardless of whether you treat the two XMM registers as two doubles or as sixteen 8-bit integers, correct?

If this is the case, the only reason you'd use XORPD would be if you needed another byte of padding or something.

Is all this correct?

If so, what other instructions have this property?

XORP{D,S}, ORP{D,S}, ANDP{D,S}, what else?

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal
Post 24 Jul 2007, 17:05
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
I don't know whether there is a difference either, and I also don't know why the AMD Software Optimization Guide gives examples with both.

Example:
Quote:
9.12 Use XOR Operations to Negate Operands of SSE, SSE2, and 3DNow!™ Instructions
Optimization
For AMD Athlon, AMD Athlon 64, and AMD Opteron processors, use instructions that perform
XOR operations (PXOR, XORPS, and XORPD) instead of multiplication instructions to change the
sign bit of operands of SSE, SSE2, and 3DNow! instructions.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
On the AMD Athlon 64 and AMD Opteron processors, using XOR-type instructions allows for more
parallelism, as these instructions can execute in either the FADD or FMUL pipe of the floating-point
unit.
Single Precision
For single-precision, you can use either 3DNow! or SSE SIMD XOR operations. The latency of
multiplying by –1.0 in 3DNow! is 4 cycles, while the latency of using the PXOR instruction is only
2 cycles. Similarly, the latency of the MULPS instruction is 5 cycles, while the latency of the XORPS
instruction is 3 cycles. The following code example illustrates how to toggle the sign bit of a number
using 3DNow! instructions:
signmask DQ 8000000080000000h
pxor mm0, [signmask] ; Toggle sign bits of both floats.
This example does the same thing using SSE instructions:
signmask DQ 8000000080000000h,8000000080000000h
xorps xmm0, [signmask] ; Toggle sign bits of all four floats.
Double Precision
To perform double-precision arithmetic, you can use the XORPD instruction, similar to the single-precision
example, to flip the sign of packed double-precision floating-point operands. The XORPD
instruction takes 3 cycles to execute, whereas the MULPD instruction requires 5 cycles.
signmask DQ 8000000000000000h,8000000000000000h
xorpd xmm0, [signmask] ; Toggle sign bit of both doubles.

The guide has other examples as well, such as clearing registers, that also show both variants as if both were really needed.
Post 24 Jul 2007, 17:42
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17665
Location: In your JS exploiting you and your system
mattst88 wrote:
If this is the case, the only reason you'd use XORPD would be if you needed another byte of padding or something.
There is a section in one of the Intel manuals that explains why you should use the proper flavour of instruction for the data that you are using. I don't have the manual available to me now to quote but it seems that the processor keeps an internal flag to say what "type" of data the XMM register contains (int, single or double). The manual also describes that there can be a performance hit by switching to a differently "typed" instruction. This would only make a difference to optimised code timings, the actual data manipulation is still the same regardless of the instruction you use.
Post 25 Jul 2007, 05:39
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
...and if you also read Agner's manuals, you'll notice that you can sometimes use the "wrong" type of instruction to gain performance, because of which execution ports the instructions use and which ports are free at that specific moment.

Optimizing is a fine art, so don't throw SUB out just because you can do ADD eax, negative constant. A lot of instructions have two or even three equivalent forms.

MOVDQA loads aligned data, but if you care about speed, use MOVAPS instead, because it's one byte shorter. If you want padding, you can use PSHUFD xmm,xmm/m128,11100100b to do the trick.
Post 25 Jul 2007, 14:44
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Yesterday I read about what revolution described, in volume 1 of the AMD64 manuals. However, it only says that there is a hidden bit that tells whether the type is integer or floating point; I haven't found (maybe because I got tired of reading) any reference saying that the CPU also keeps track of the width of the type.

I think I'll start using Intel's manuals, because AMD's don't seem to be as clear.
Post 25 Jul 2007, 15:37
0.1



Joined: 24 Jul 2007
Posts: 474
Location: India
I got this when I clicked the link to My x86 Instruction Reference:
Code:
XML Parsing Error: junk after document element
Location: http://mattst88.com/programming/asmref/
Line Number 2, Column 1:<b>Warning</b>:  include(../../header.inc.php) [<a href='function.include'>function.include</a>]: failed to open stream: No such file or directory in <b>/var/www/mattst88.com/htdocs/programming/asmref/index.php</b> on line <b>11</b><br />
^    
Post 09 Aug 2007, 08:25
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
0.1 wrote:
I got this when I clicked the link to My x86 Instruction Reference:
Code:
XML Parsing Error: junk after document element
Location: http://mattst88.com/programming/asmref/
Line Number 2, Column 1:<b>Warning</b>:  include(../../header.inc.php) [<a href='function.include'>function.include</a>]: failed to open stream: No such file or directory in <b>/var/www/mattst88.com/htdocs/programming/asmref/index.php</b> on line <b>11</b><br />
^    


Fixed, thanks for the heads up.

Post 09 Aug 2007, 14:16
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
revolution wrote:
There is a section in one of the Intel manuals that explains why you should use the proper flavour of instruction for the data that you are using. I don't have the manual available to me now to quote but it seems that the processor keeps an internal flag to say what "type" of data the XMM register contains (int, single or double). The manual also describes that there can be a performance hit by switching to a differently "typed" instruction. This would only make a difference to optimised code timings, the actual data manipulation is still the same regardless of the instruction you use.

Nevertheless, there is no such bit in the MMX/3DNow! registers. Does this really speed up execution of the instructions?
I'm very sceptical about its benefit. I wouldn't have introduced such a thing if I had designed the SSE instructions.
Post 29 Oct 2007, 14:36
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
Some researchers, such as A. Fog, claim (as far as I remember) that PXOR is preferable to XORPS/XORPD because it breaks the dependency chain on some CPU families while XORPS/XORPD don't. But other sources, as written above, say that the appropriate instruction should be chosen by context, that is, by what kind of data is being used. Only profiling on a particular CPU can show which variant is right.
Post 29 Oct 2007, 17:09


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.