flat assembler
Message board for the users of flat assembler.

SSE4 / CMP - PTEST Floating Point Optimisation Question

Kuemmel



Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel 31 May 2009, 16:59
Hi guys,

I'm currently trying to make use of the SSE4.1 instruction PTEST, which works much like TEST in normal x86 code. I tried to use it to get rid of the CMPLEPD / MOVMSKPD sequence in my Mandelbrot Bench. That sequence checks whether either of the two double iteration values in xmm2 exceeds 4.0:
Code:
CMPLEPD  xmm2, [.local_radiant]      ; xmm2 <= 4.0 | 4.0 ?
MOVMSKPD edi, xmm2

...snip...

CMP  edi,11b
JNE .end_of_iteration_12_diverged   ; point 1 or 2 diverged -> exit iteration
    


With PTEST, the best I have come up with so far is:
Code:
SUBPD xmm2, [.local_radiant]       ; xmm2 - 4.0 | 4.0

...snip...

PTEST xmm2, [.xmm_neg1]             ; xmm2 AND negative FLAG of Double 0 (Bit 63 set in xmm_neg1)
JZ   .end_of_iteration_12_diverged  ; point 1 diverged -> exit iteration
PTEST xmm2, [.xmm_neg2]             ; xmm2 AND negative FLAG of Double 1 (Bit 127 set in xmm_neg2)
JZ   .end_of_iteration_12_diverged  ; point 2 diverged -> exit iteration
    


Of course this doesn't make it much faster... so my question is whether some clever idea would let me get rid of one of the PTEST instructions and use only one of them, or even get rid of the SUBPD. Maybe there's a direct PTEST trick to check whether a double value is > 4.0?
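
A minimal Python sketch (not from the thread) that models the two exit checks above on plain floats, assuming each lane of xmm2 is an ordinary IEEE-754 double. The helper names `sign_bit`, `diverged_cmp` and `diverged_sub` are made up for illustration. It suggests the two variants agree except when a lane is exactly 4.0, where the subtraction yields +0.0 with a clear sign bit.

```python
import struct

RADIUS_SQ = 4.0  # stands in for the two 4.0 doubles at .local_radiant


def sign_bit(x):
    """Return bit 63 of the IEEE-754 double x (1 = negative)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 63


def diverged_cmp(lanes):
    """Model of CMPLEPD + MOVMSKPD + CMP edi,11b / JNE.

    The mask gets one bit per lane that is <= 4.0; the jump is taken
    unless both bits are set.
    """
    mask = sum(1 << i for i, x in enumerate(lanes) if x <= RADIUS_SQ)
    return mask != 0b11


def diverged_sub(lanes):
    """Model of SUBPD followed by one PTEST/JZ pair per lane.

    The jump is taken when the sign bit of (x - 4.0) is clear in either lane.
    """
    return any(sign_bit(x - RADIUS_SQ) == 0 for x in lanes)


samples = [(1.0, 2.0), (5.0, 2.0), (1.0, 9.0), (5.0, 9.0), (4.0, 1.0)]
results = [(s, diverged_cmp(s), diverged_sub(s)) for s in samples]
for lanes, by_cmp, by_sub in results:
    print(lanes, "cmp:", by_cmp, "sub:", by_sub)
```

Whether the difference at exactly 4.0 matters for the benchmark is not something the thread addresses.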

Maybe the solution is obvious and my mind is just blocked today ;)

By the way, during my search I found a good page for playing with floating-point numbers and their binary representation, though it didn't help me this time:
http://babbage.cs.qc.edu/IEEE-754/Decimal.html
bitRAKE



Joined: 21 Jul 2003
Posts: 4024
Location: vpcmpistri
bitRAKE 31 May 2009, 18:39
A SUBRPD would fix it, but that instruction doesn't exist. :D Then the branch condition would need to be reversed, and the sign bits could be combined into a single PTEST. Is there an optimal way to reverse the subtraction?

Z <- Z^2 + C; until ABS(Z)>=2 also works.
Isn't the high bit of the exponent set for numbers >= 2?
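
A small Python sketch (not from the thread) for checking the exponent question on a few sample doubles. It assumes the IEEE-754 binary64 layout, where bit 62 is the top bit of the 11-bit exponent field. The name `exp_high_bit_set` is hypothetical.

```python
import struct

EXP_HIGH_BIT = 1 << 62  # top bit of the exponent field of a double


def bits(x):
    """Return the 64-bit pattern of the double x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def exp_high_bit_set(x):
    """True when bit 62 of the double's bit pattern is set."""
    return (bits(x) & EXP_HIGH_BIT) != 0


for value in (0.0, 1.0, 1.9999999, 2.0, 4.0, -1.5, -2.0, float("inf")):
    print(value, hex(bits(value)), exp_high_bit_set(value))
```

On these samples the bit is set exactly for magnitudes of 2.0 and above, and also for infinity. Note that it marks a threshold of 2.0: applied directly to the values in xmm2, which the first post compares against 4.0, the test would fire earlier than the original check.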
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 31 May 2009, 19:03
Intel® SSE4 Programming Reference wrote:
PTEST sets the ZF flag only if all bits in the result are 0 of the bitwise AND of the
destination operand (first operand) and the source operand (second operand). PTEST
sets the CF flag if all bits in the result are 0 of the bitwise AND of the source operand
(second operand) and the logical NOT of the destination operand.


Code:
SIGN_BIT = 1 shl 63

.xmm_neg dq SIGN_BIT
         dq SIGN_BIT
.
.
.

ptest xmm2, dqword [.xmm_neg]
jnc   .end_of_iteration_12_diverged    


If I understand correctly, the jump is taken when either half of the register does not have its sign bit set. Isn't this the same behavior as your old code?
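
A minimal Python model (not from the thread) of the flag rules in the quoted Intel text, using 128-bit integers in place of XMM registers. The names `ptest`, `pack` and `jnc_taken` are illustrative, and only the sign bits of the two lanes are modelled.

```python
SIGN_BIT = 1 << 63
MASK128 = (1 << 128) - 1
XMM_NEG = SIGN_BIT | (SIGN_BIT << 64)  # the two "dq SIGN_BIT" values at .xmm_neg


def ptest(dest, src):
    """Return (ZF, CF) following the quoted description.

    ZF = 1 when (dest AND src) is all zero.
    CF = 1 when ((NOT dest) AND src) is all zero.
    """
    zf = int((dest & src) == 0)
    cf = int(((~dest & MASK128) & src) == 0)
    return zf, cf


def pack(low_negative, high_negative):
    """Build a 128-bit value whose only set bits are the requested sign bits."""
    low = SIGN_BIT if low_negative else 0
    high = (SIGN_BIT << 64) if high_negative else 0
    return low | high


def jnc_taken(xmm2):
    """True when 'ptest xmm2, [.xmm_neg]' followed by 'jnc' would jump."""
    _, cf = ptest(xmm2, XMM_NEG)
    return cf == 0


for low_neg in (True, False):
    for high_neg in (True, False):
        print(low_neg, high_neg, jnc_taken(pack(low_neg, high_neg)))
```

In this model the jump is skipped only when both sign bits are set, which matches the reading given above.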


Last edited by LocoDelAssembly on 31 May 2009, 20:30; edited 1 time in total
Kuemmel 31 May 2009, 19:57
@LocoDelAssembly: I think that doesn't work. My old code branches when one or both of the two values are > 4.0, so after the SUBPD I need to branch when only one of the sign bits is set and also when neither of them is set (that's why I think I needed two PTESTs):
Code:
(xmm2 - 4) = gives: 
- | -   => continue
+ | -   => branch (ptest signs = neg)  
- | +   => branch (ptest signs = neg)
+ | +   => branch (ptest signs = pos)
    

@bitRAKE: Yeah, that's one idea, I think. To do it I need another MOV xmm, [4.0 | 4.0], but my first tests suggest there's still a benefit. So with the subtraction reversed:
Code:
(4 - xmm2) = gives:
- | -   => branch (ptest signs = neg)
+ | -   => branch (ptest signs = neg)  
- | +   => branch (ptest signs = neg)
+ | +   => continue
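
A short Python sketch (not from the thread) that enumerates the four sign combinations from the two tables for both single-PTEST ideas. It assumes the reversed subtraction is followed by one PTEST against both sign bits and a JNZ, which the thread does not spell out, and that the unreversed subtraction uses the carry-flag test with JNC. All names are hypothetical.

```python
from itertools import product

SIGN_BIT = 1 << 63
BOTH_SIGNS = SIGN_BIT | (SIGN_BIT << 64)
MASK128 = (1 << 128) - 1


def signs_to_xmm(low_negative, high_negative):
    """Build a 128-bit value carrying only the two lane sign bits."""
    low = SIGN_BIT if low_negative else 0
    high = (SIGN_BIT << 64) if high_negative else 0
    return low | high


def exit_reversed_sub(low_over, high_over):
    """(4 - x) is negative when x > 4; assumed exit on JNZ (ZF = 0)."""
    value = signs_to_xmm(low_over, high_over)
    return (value & BOTH_SIGNS) != 0


def exit_sub_jnc(low_over, high_over):
    """(x - 4) is negative when x < 4; exit on JNC (CF = 0)."""
    value = signs_to_xmm(not low_over, not high_over)
    return ((~value & MASK128) & BOTH_SIGNS) != 0


table = [
    (lo, hi, exit_reversed_sub(lo, hi), exit_sub_jnc(lo, hi))
    for lo, hi in product([False, True], repeat=2)
]
for lo, hi, reversed_exit, jnc_exit in table:
    print("lane0 > 4:", lo, "lane1 > 4:", hi, "->", reversed_exit, jnc_exit)
```

Under these assumptions both variants leave the loop exactly when at least one lane is over the limit. The case of a lane being exactly 4.0 is not covered here.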
    
LocoDelAssembly 31 May 2009, 20:30
Kuemmel, are you sure? Please note that I'm testing the carry flag, so I'm actually testing whether [(NOT xmm2) AND .xmm_neg] is zero or not. If either half doesn't have its sign bit set, the NOT performed by PTEST sets it, the AND result is non-zero, and CF is cleared, so jnc jumps.

I did make a mistake though: it should be jnc, not jc. I'll correct that.
Kuemmel 31 May 2009, 20:50
...oh, I just copied and pasted your code without checking it too closely. Now with the JNC it works! Thanks... though I still have to get that NOT logic into my head. At the moment, even though yours needs fewer instructions, both yours and bitRAKE's seem to deliver the same benefit...
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
