flat assembler
Message board for the users of flat assembler.

Redundant logical SSE instructions?

MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 21 Jun 2005, 13:59
This is just a small thingy:

It is said that ORPS/ANDPS/XORPS/ANDNPS perform the corresponding bitwise logical operations on the 4 singles held in two xmm registers.
ORPD/ANDPD/XORPD/ANDNPD do the same logical operations, but on 2 doubles. But since all of these instructions simply operate on all 128 bits, each single version is completely equivalent to its double version, so the two can be freely interchanged, or one of them dropped altogether.
They even seem to leave the same MXCSR flags and have the same exception behaviour.

This leads me to the conclusion that the 4 double-precision logical instructions introduced with SSE2 are completely superfluous. (No need to fill the already bloated datasheet with more unnecessary instructions.)
Furthermore, the double-precision instructions are 1 byte longer.

Anyway, the best we can do is ignore those instructions.
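
Editor's illustration of the size difference mentioned above, as a minimal sketch rather than anything posted in the thread. It assumes the register-to-register forms with xmm0..xmm7 only, and the helper name `encode` is hypothetical. The PD form is the PS encoding preceded by a 66h prefix byte, which is where the extra byte comes from.

```python
# Second opcode byte (after the 0F escape) of the packed logical instructions.
LOGICAL_OPCODES = {"and": 0x54, "andn": 0x55, "or": 0x56, "xor": 0x57}


def encode(op, suffix, dst, src):
    """Encode e.g. XORPS xmm<dst>, xmm<src> (hypothetical helper, reg-reg form only)."""
    if not (0 <= dst <= 7 and 0 <= src <= 7):
        raise ValueError("this sketch only handles xmm0..xmm7")
    # ModRM byte: mod=11 (register operand), reg=destination, rm=source.
    modrm = 0xC0 | (dst << 3) | src
    body = bytes([0x0F, LOGICAL_OPCODES[op], modrm])
    if suffix == "ps":
        return body
    if suffix == "pd":
        # The PD form is the same encoding with a 66h prefix in front.
        return bytes([0x66]) + body
    raise ValueError("suffix must be 'ps' or 'pd'")


def hexdump(data):
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    for op in LOGICAL_OPCODES:
        ps = encode(op, "ps", 0, 1)
        pd = encode(op, "pd", 0, 1)
        print(f"{op}ps xmm0,xmm1 = {hexdump(ps)}    {op}pd xmm0,xmm1 = {hexdump(pd)}")
```

For example, `encode("xor", "ps", 0, 1)` gives the bytes 0F 57 C1, and the `"pd"` variant gives 66 0F 57 C1.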

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 21 Jun 2005, 14:30
Hi MCD,
single and double are floating-point numbers with different precisions...

but if you can use instructions that are one byte shorter, you can optimize 256-byte intros, for example.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 21 Jun 2005, 22:33
If they were integers, there would be no difference between operating on 128 bits at a time or 1 bit at a time.
But like Matrix said, these work on floating-point values, so there is a precision difference. Also, if you have doubles, for example, you don't need to convert them to singles first, because you already have the appropriate instructions.
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 22 Jun 2005, 11:22
Madis731 wrote:
If they were integers, there would be no difference between operating on 128 bits at a time or 1 bit at a time.
But like Matrix said, these work on floating-point values, so there is a precision difference
As far as I know, the "or", "and", "andn" and "xor" SSE instructions simply perform logical operations on each of the 128 bits of two xmm registers; no floating-point calculation is involved. This means that mantissa, exponent and sign are all treated the same way, as if the register were just one big 128-bit integer.

I just verified this with OllyDbg v1.1 (on a Pentium 4 machine, with the "double precision" versions of the instructions produced by putting a "db 66h" before the single-precision ones, since Olly doesn't support SSE2/3).
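
Editor's illustration of the point above, as a rough software model and not a substitute for the debugger check. It assumes nothing beyond the bitwise semantics described in this post: the register is treated as one 128-bit integer, and the single/double view only matters when building operands or reading results. All function names are hypothetical.

```python
import struct

MASK128 = (1 << 128) - 1


def from_singles(a, b, c, d):
    """View 4 singles as one 128-bit integer (little-endian lanes)."""
    return int.from_bytes(struct.pack("<4f", a, b, c, d), "little")


def from_doubles(a, b):
    """View 2 doubles as one 128-bit integer (little-endian lanes)."""
    return int.from_bytes(struct.pack("<2d", a, b), "little")


def to_singles(x):
    return struct.unpack("<4f", x.to_bytes(16, "little"))


def to_doubles(x):
    return struct.unpack("<2d", x.to_bytes(16, "little"))


def logical(op, dst, src):
    """One model for both the PS and the PD forms: no lane width appears here."""
    if op == "and":
        return dst & src
    if op == "or":
        return dst | src
    if op == "xor":
        return dst ^ src
    if op == "andn":
        return (~dst & MASK128) & src  # dest = (NOT dest) AND src
    raise ValueError("unknown operation: " + op)


if __name__ == "__main__":
    # Clearing the sign bits (absolute value) with the same 128-bit operation.
    sign_pd = from_doubles(-0.0, -0.0)
    sign_ps = from_singles(-0.0, -0.0, -0.0, -0.0)
    print(to_doubles(logical("andn", sign_pd, from_doubles(-1.5, 2.25))))
    print(to_singles(logical("andn", sign_ps, from_singles(-1.0, 2.0, -3.0, 4.0))))
```

The lane layout shows up only in the pack/unpack helpers; `logical` itself never needs to know whether the bits hold singles or doubles.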

r22



Joined: 27 Dec 2004
Posts: 805
r22 22 Jun 2005, 17:30
If ANDPS, ORPS, XORPS and ANDNPS do the same as their respective PD-suffixed opcodes, then they are redundant.

At first I thought maybe they affect SIMD FP exceptions, but after looking at the documentation I found this was not the case.
NaNs don't seem to be a factor with these instructions.

MAYBE the PD-suffixed instructions are faster than the PS-suffixed ones.
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 23 Jun 2005, 12:02
r22 wrote:
At first I thought maybe they affect SIMD FP exceptions, but after looking at the documentation I found this was not the case.
NaNs don't seem to be a factor with these instructions.

MAYBE the PD-suffixed instructions are faster than the PS-suffixed ones.

Well, actually, my TSCBENCW program showed 1 clock cycle for both versions of those instructions. It is also unlikely that different bitwise logical instructions would have different timings, because they are among the simplest operations to implement in an ALU.

SDragon



Joined: 13 Sep 2005
Posts: 19
Location: Siberia
SDragon 14 Sep 2005, 14:04
From Intel manuals, vol. 1, chapter 11.6.9:
... In this example, XORPS or PXOR can be used instead of XORPD and yield the same correct result. However, because of type mismatch between the operand data type and the instruction data type, a latency penalty will be incurred due to implementations of the instructions at the microarchitecture level.

So the two logical instructions in each pair (XORPS/XORPD, ANDPS/ANDPD) are functionally equivalent, but using the single-precision instruction on double values is slower than using the double-precision instruction on double values. At least, Intel says so.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.