flat assembler
Message board for the users of flat assembler.
Windows > SSE4 / CMP - PTEST Floating Point Optimisation Question
Kuemmel 31 May 2009, 16:59
Hi guys,
I'm currently trying to make use of the SSE4.1 instruction PTEST, which is similar to TEST in normal x86 code. I tried to use it to get rid of the CMPLEPD / MOVMSKPD sequence in my Mandelbrot Bench. That sequence checks whether one of the two double iteration values in xmm2 exceeds 4.0:
Code:
    CMPLEPD  xmm2, [.local_radiant]          ; xmm2 <= 4.0 | 4.0 ?
    MOVMSKPD edi, xmm2
    ...snip...
    CMP      edi, 11b
    JNE      .end_of_iteration_12_diverged   ; point 1 or 2 diverged -> exit iteration
Now with PTEST I only came up with:
Code:
    SUBPD xmm2, [.local_radiant]             ; xmm2 - 4.0 | 4.0
    ...snip...
    PTEST xmm2, [.xmm_neg1]                  ; xmm2 AND negative FLAG of Double 0 (Bit 63 set in xmm_neg1)
    JZ    .end_of_iteration_12_diverged      ; point 1 diverged -> exit iteration
    PTEST xmm2, [.xmm_neg2]                  ; xmm2 AND negative FLAG of Double 1 (Bit 127 set in xmm_neg2)
    JZ    .end_of_iteration_12_diverged      ; point 2 diverged -> exit iteration
Of course this doesn't help much to make it faster...so my question is whether some clever idea lets me get rid of one of the PTEST instructions and use only one of them...or even get rid of the SUBPD...maybe there's a direct PTEST idea to check whether a double value is > 4.0?
Maybe the solution is obvious and my mind is just blocked today.
By the way...during my search I found a good page to play with floating point numbers and their binary representation, it just didn't help me at the moment: http://babbage.cs.qc.edu/IEEE-754/Decimal.html
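To make the two checks above easier to compare, here is a small Python model of them. It is only a sketch of the flag logic, not the assembly itself: the function names are made up for illustration, and `RADIUS` stands in for the 4.0 stored at `.local_radiant`.

```python
import struct

SIGN = 1 << 63   # sign bit of one double
RADIUS = 4.0     # stand-in for the 4.0 | 4.0 constant at .local_radiant

def bits(x: float) -> int:
    """Raw 64-bit IEEE-754 pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def diverged_cmp_movmsk(a: float, b: float) -> bool:
    # CMPLEPD gives an all-ones lane where value <= 4.0; MOVMSKPD packs the
    # two lane sign bits into the low two bits of edi.
    edi = (1 if a <= RADIUS else 0) | ((1 if b <= RADIUS else 0) << 1)
    return edi != 0b11                      # CMP edi,11b / JNE

def diverged_two_ptest(a: float, b: float) -> bool:
    # SUBPD, then one PTEST per lane; JZ is taken when that lane's sign bit
    # is clear, i.e. when value - 4.0 is not negative.
    d0, d1 = bits(a - RADIUS), bits(b - RADIUS)
    return (d0 & SIGN) == 0 or (d1 & SIGN) == 0
```

In this model the two versions agree for values below and above 4.0. They differ only when a value is exactly 4.0: CMPLEPD counts it as not diverged, while 4.0 - 4.0 is +0.0 with a clear sign bit, so the PTEST version exits.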
LocoDelAssembly 31 May 2009, 19:03
Intel® SSE4 Programming Reference wrote: PTEST sets the ZF flag only if all bits in the result are 0 of the bitwise AND of the
Code:
    SIGN_BIT = 1 shl 63
    .xmm_neg dq SIGN_BIT
             dq SIGN_BIT
    .
    .
    .
    ptest xmm2, dqword [.xmm_neg]
    jnc   .end_of_iteration_12_diverged
If I understand correctly, the jump will be executed when any of the register parts does not have the sign bit set. Isn't this the same behavior your old code has?
Last edited by LocoDelAssembly on 31 May 2009, 20:30; edited 1 time in total
Kuemmel 31 May 2009, 19:57
@LocoDelAssembly: I think that doesn't work. My old code branches when one OR both of the two values are > 4.0, so after the SUBPD I need to branch when only one of the sign bits is set AND also when neither of them is set (that's why I think I needed two PTESTs):
Code:
    (xmm2 - 4) gives:
    - | -  => continue
    + | -  => branch (ptest signs = neg)
    - | +  => branch (ptest signs = neg)
    + | +  => branch (ptest signs = pos)
@Bitrake: Yeah, that's one idea, I think. To do it I need another MOV xmm, [4.0 | 4.0], but from my first tests there still seems to be a benefit...so with
Code:
    (4 - xmm2) gives:
    - | -  => branch (ptest signs = neg)
    + | -  => branch (ptest signs = neg)
    - | +  => branch (ptest signs = neg)
    + | +  => continue
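The two tables can be checked with a small Python model. This is a sketch under an assumption: the thread does not show how the reversed subtraction is tested, so the model assumes a single PTEST against a mask with both sign bits set, followed by a ZF-based jump. All names are hypothetical.

```python
SIGN_LO = 1 << 63            # sign bit of double 0
SIGN_HI = 1 << 127           # sign bit of double 1
BOTH = SIGN_LO | SIGN_HI     # assumed mask: both sign bits set

def zf_after_ptest(lo_diverged: bool, hi_diverged: bool, reversed_sub: bool) -> bool:
    # xmm2 - 4 is negative when a point has NOT diverged;
    # 4 - xmm2 is negative when it HAS diverged (value > 4).
    neg_lo = lo_diverged if reversed_sub else not lo_diverged
    neg_hi = hi_diverged if reversed_sub else not hi_diverged
    xmm = (SIGN_LO if neg_lo else 0) | (SIGN_HI if neg_hi else 0)
    return (xmm & BOTH) == 0     # ZF = 1 when no masked bit is set

cases = [(lo, hi) for lo in (False, True) for hi in (False, True)]
must_branch = {c: c[0] or c[1] for c in cases}

# 4 - xmm2: JNZ (ZF == 0) is taken exactly when a point diverged.
reversed_jnz_ok = all((not zf_after_ptest(lo, hi, True)) == must_branch[(lo, hi)]
                      for lo, hi in cases)

# xmm2 - 4: neither JZ nor JNZ on ZF alone matches all four rows.
plain_jz_ok = all(zf_after_ptest(lo, hi, False) == must_branch[(lo, hi)]
                  for lo, hi in cases)
plain_jnz_ok = all((not zf_after_ptest(lo, hi, False)) == must_branch[(lo, hi)]
                   for lo, hi in cases)
```

In this model ZF alone cannot separate the mixed-sign rows from the continue row after `xmm2 - 4`, which matches the reasoning for needing two PTESTs there. After `4 - xmm2`, one ZF test is enough.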
LocoDelAssembly 31 May 2009, 20:30
Kuemmel, are you sure? Please note that I'm testing the carry flag, so I'm actually testing whether [(NOT xmm2) AND .xmm_neg] is zero or not. If any of the parts does not have the sign bit set, then the NOT applied by PTEST will set it, and that is detected via CF.
I have made a mistake though: it is jnc, not jc. I'll correct that.
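The NOT logic can be walked through with a Python model of the two flags PTEST sets. It is only a sketch: 128-bit registers are modelled as Python integers, and the helper names are made up for illustration.

```python
SIGN_BIT = 1 << 63
XMM_NEG = SIGN_BIT | (SIGN_BIT << 64)   # .xmm_neg: sign bit of both doubles
MASK128 = (1 << 128) - 1

def ptest(dst: int, src: int):
    """Return (ZF, CF) as set by PTEST dst, src."""
    zf = (dst & src) == 0                 # ZF: dst AND src is all zero
    cf = ((~dst & MASK128) & src) == 0    # CF: (NOT dst) AND src is all zero
    return zf, cf

def jnc_taken(xmm2: int) -> bool:
    # jnc jumps when CF = 0, i.e. when at least one masked sign bit
    # is clear in xmm2.
    _, cf = ptest(xmm2, XMM_NEG)
    return not cf
```

With both sign bits set (both values of `xmm2 - 4.0` negative), `jnc_taken` is False and the iteration continues. If either sign bit is clear, it is True, which covers the three branch rows of the `(xmm2 - 4)` table with a single PTEST.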
Kuemmel 31 May 2009, 20:50
...oh, I just copied and pasted your code without checking too much, so now with the JNC it works!!! Thanks...though I still have to get that NOT logic into my head...at the moment, even with fewer instructions, both yours and Bitrake's seem to deliver the same benefit...
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.