XORPD/XORPS, ORPD/ORPS, and others

mattst88

Joined: 12 May 2006
Posts: 260

mattst88 24 Jul 2007, 17:05

XORPS's opcode is 0F 57 /r and XORPD's is the same with a 66 prefix. They both perform exactly the same operation, do they not? I mean, regardless of whether you're treating each XMM register as two doubles or sixteen 8-bit integers, correct?

If this is the case, the only reason you'd use XORPD would be if you needed another byte of padding or something.

Is all this correct?

If so, what are other instructions that have this property?

XORP{D,S}, ORP{D,S}, ANDP{D,S}, what else?

_________________
Assembly Programmer's Journal


LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4623
Location: Argentina

LocoDelAssembly 24 Jul 2007, 17:42

I don't know whether there is a difference either.

I also don't know why the AMD software optimization guide gives examples with both.

Example:

Quote:

9.12 Use XOR Operations to Negate Operands of SSE, SSE2, and 3DNow!™ Instructions

Optimization
For AMD Athlon, AMD Athlon 64, and AMD Opteron processors, use instructions that perform XOR operations (PXOR, XORPS, and XORPD) instead of multiplication instructions to change the sign bit of operands of SSE, SSE2, and 3DNow! instructions.

Application
This optimization applies to:
• 32-bit software
• 64-bit software

Rationale
On the AMD Athlon 64 and AMD Opteron processors, using XOR-type instructions allows for more parallelism, as these instructions can execute in either the FADD or FMUL pipe of the floating-point unit.

Single Precision
For single-precision, you can use either 3DNow! or SSE SIMD XOR operations. The latency of multiplying by -1.0 in 3DNow! is 4 cycles, while the latency of using the PXOR instruction is only 2 cycles. Similarly, the latency of the MULPS instruction is 5 cycles, while the latency of the XORPS instruction is 3 cycles. The following code example illustrates how to toggle the sign bit of a number using 3DNow! instructions:

signmask DQ 8000000080000000h
pxor mm0, [signmask] ; Toggle sign bits of both floats.

This example does the same thing using SSE instructions:

signmask DQ 8000000080000000h,8000000080000000h
xorps xmm0, [signmask] ; Toggle sign bits of all four floats.

Double Precision
To perform double-precision arithmetic, you can use the XORPD instruction, similar to the single-precision example, to flip the sign of packed double-precision floating-point operands. The XORPD instruction takes 3 cycles to execute, whereas the MULPD instruction requires 5 cycles.

signmask DQ 8000000000000000h,8000000000000000h
xorpd xmm0, [signmask] ; Toggle sign bit of both doubles.

There are other examples too, such as clearing registers, that also show both variants as if both were really needed.
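For anyone who wants to see the sign-bit trick from the quoted guide without firing up an assembler, here is a minimal sketch in Python. It only models what the XOR does to one lane's bit pattern; the function names and the constants `SIGN_MASK_F32` and `SIGN_MASK_F64` are made up for this illustration, and the real instructions of course operate on all lanes of the register at once.

```python
import struct

# Sign-bit masks for one lane (hypothetical names, mirroring the guide's signmask).
SIGN_MASK_F32 = 0x80000000          # one single-precision lane
SIGN_MASK_F64 = 0x8000000000000000  # one double-precision lane


def flip_sign_f32(x: float) -> float:
    """Negate a single-precision value by XORing bit 31 of its bit pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (result,) = struct.unpack("<f", struct.pack("<I", bits ^ SIGN_MASK_F32))
    return result


def flip_sign_f64(x: float) -> float:
    """Negate a double-precision value by XORing bit 63 of its bit pattern."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", bits ^ SIGN_MASK_F64))
    return result


print(flip_sign_f32(1.5))    # -1.5
print(flip_sign_f64(-2.25))  # 2.25
print(flip_sign_f64(0.0))    # -0.0
```

Only the top bit of each lane changes, which is why no multiplier is needed.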


revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20690
Location: In your JS exploiting you and your system

revolution 25 Jul 2007, 05:39

mattst88 wrote:

If this is the case, the only reason you'd use XORPD would be if you needed another byte of padding or something.

There is a section in one of the Intel manuals that explains why you should use the proper flavour of instruction for the data that you are using. I don't have the manual available to me now to quote, but it seems that the processor keeps an internal flag to say what "type" of data the XMM register contains (int, single or double). The manual also describes that there can be a performance hit by switching to a differently "typed" instruction. This would only make a difference to optimised code timings; the actual data manipulation is still the same regardless of the instruction you use.
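The point that the data manipulation is the same can be checked with a small model. The following Python sketch (not real SSE code; `xor_lanes` is a helper invented for this example) XORs the same two 16-byte values while treating them as sixteen bytes, four dwords, or two qwords, and confirms that the resulting bits are identical in every case. It says nothing about timing, which is where the typed variants may differ.

```python
import struct


def xor_lanes(a: bytes, b: bytes, fmt: str) -> bytes:
    """XOR two 16-byte values lane by lane; fmt is a struct layout such as '<4I'."""
    lanes_a = struct.unpack(fmt, a)
    lanes_b = struct.unpack(fmt, b)
    return struct.pack(fmt, *(x ^ y for x, y in zip(lanes_a, lanes_b)))


# A register image holding four floats, and a mask with every 32-bit sign bit set.
a = struct.pack("<4f", 1.0, -2.0, 3.5, 0.25)
b = struct.pack("<2Q", 0x8000000080000000, 0x8000000080000000)

as_bytes = xor_lanes(a, b, "<16B")   # sixteen 8-bit integers (a PXOR-style view)
as_dwords = xor_lanes(a, b, "<4I")   # four 32-bit lanes (an XORPS-style view)
as_qwords = xor_lanes(a, b, "<2Q")   # two 64-bit lanes (an XORPD-style view)

# A bitwise XOR has no carries between bits, so the lane width cannot matter.
assert as_bytes == as_dwords == as_qwords

print(struct.unpack("<4f", as_dwords))  # (-1.0, 2.0, -3.5, -0.25)
```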


Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 25 Jul 2007, 14:44

...and if you also read Agner's manuals, you will notice that you can use the "wrong" format of an instruction to gain performance, because of the ports the instructions use and which ports are available at that specific moment.

Optimizing is a fine art, so don't throw SUB out just yet merely because you can do ADD eax, negative constant.

A lot of instructions are doubled or even tripled.

MOVDQA loads aligned data, but if you care about speed then use MOVAPS because it's one byte shorter. If you want padding, you can use PSHUFD xmm, xmm/m128, 11100100b (the identity shuffle) to do the trick.
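To make the size differences concrete, here is a small Python table of register-to-register encodings. This is a sketch for comparison only: the operands xmm0 and xmm1 are an arbitrary choice (giving the ModRM byte C1), and the PSHUFD immediate 0E4h (11100100b, the identity shuffle) is an assumption made here so that the instruction behaves as a plain copy.

```python
# Machine-code bytes for a few "duplicate" SSE/SSE2 instructions, xmm0 <- xmm1.
ENCODINGS = {
    "xorps  xmm0, xmm1":       bytes([0x0F, 0x57, 0xC1]),
    "xorpd  xmm0, xmm1":       bytes([0x66, 0x0F, 0x57, 0xC1]),
    "pxor   xmm0, xmm1":       bytes([0x66, 0x0F, 0xEF, 0xC1]),
    "movaps xmm0, xmm1":       bytes([0x0F, 0x28, 0xC1]),
    "movapd xmm0, xmm1":       bytes([0x66, 0x0F, 0x28, 0xC1]),
    "movdqa xmm0, xmm1":       bytes([0x66, 0x0F, 0x6F, 0xC1]),
    "pshufd xmm0, xmm1, 0E4h": bytes([0x66, 0x0F, 0x70, 0xC1, 0xE4]),
}

# Print each instruction with its length and hex bytes.
for text, code in ENCODINGS.items():
    print(f"{text:<26} {len(code)} bytes  {code.hex(' ').upper()}")
```

The packed-single forms are the shortest because they carry no 66 prefix, and the PSHUFD form adds an immediate byte on top of the prefix, which is what makes it usable as padding.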


LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4623
Location: Argentina

LocoDelAssembly 25 Jul 2007, 15:37

Yesterday I read what revolution described in volume 1 of the AMD64 manuals. However, it said that there is a hidden bit that tells whether the type is integer or float, but I did not find (maybe because I got tired of reading) any reference saying that the CPU also keeps track of the width of the type.

I think I'll start using the Intel manuals, because the AMD ones seem to be less clear.


0.1

Joined: 24 Jul 2007
Posts: 474
Location: India

0.1 09 Aug 2007, 08:25

I got this when I clicked the link to My x86 Instruction Reference:

Code:

XML Parsing Error: junk after document element
Location: http://mattst88.com/programming/asmref/
Line Number 2, Column 1:<b>Warning</b>:  include(../../header.inc.php) [<a href='function.include'>function.include</a>]: failed to open stream: No such file or directory in <b>/var/www/mattst88.com/htdocs/programming/asmref/index.php</b> on line <b>11</b><br />
^


mattst88

Joined: 12 May 2006
Posts: 260

mattst88 09 Aug 2007, 14:16

0.1 wrote:

I got this when I clicked the link to My x86 Instruction Reference:

Code:

XML Parsing Error: junk after document element
Location: http://mattst88.com/programming/asmref/
Line Number 2, Column 1:<b>Warning</b>:  include(../../header.inc.php) [<a href='function.include'>function.include</a>]: failed to open stream: No such file or directory in <b>/var/www/mattst88.com/htdocs/programming/asmref/index.php</b> on line <b>11</b><br />
^

Fixed, thanks for the heads up.


MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany

MCD 29 Oct 2007, 14:36

revolution wrote:

There is a section in one of the Intel manuals that explains why you should use the proper flavour of instruction for the data that you are using. I don't have the manual available to me now to quote, but it seems that the processor keeps an internal flag to say what "type" of data the XMM register contains (int, single or double). The manual also describes that there can be a performance hit by switching to a differently "typed" instruction. This would only make a difference to optimised code timings; the actual data manipulation is still the same regardless of the instruction you use.

Nevertheless, there is no such bit in the MMX/3DNow! registers. Does this really speed up execution of the instructions?
I'm very sceptical about its benefit. I wouldn't have introduced such a thing if I had designed the SSE instructions.


asmfan

Joined: 11 Aug 2006
Posts: 392
Location: Russian

asmfan 29 Oct 2007, 17:09

Some researchers, such as A. Fog, claim (as far as I recall) that PXOR is preferable to XORPS/XORPD because it breaks the dependency chain on some CPU families while XORPS/XORPD do not. But other sources, as written above, claim that the appropriate instruction should be used depending on context, that is, on what kind of data is in use. Only profiling on a specific CPU can show which variant is right.
