flat assembler
Message board for the users of flat assembler.

fastest register zero test?
lazer1



Joined: 24 Jan 2006
Posts: 185
lazer1
What is the best way to test whether a long mode
register is 0?

Currently I have been using, e.g.:

Code:
          cmp rax, 0
          jne .nonzero
.zero:
          ............
          jmp .done
.nonzero:
          ............
.done:
    


But there must be other ways of doing it, e.g.:

Code:
         add rax,0
         jnz .nonzero

.zero:
          ...........
          jmp .done


.nonzero:
        ..............

.done:
    

And presumably you put the more likely of .zero or .nonzero
as the first case?



Post 01 Mar 2007, 23:18
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
TEST RAX, RAX ?
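
A minimal sketch of how that would slot into the original layout, assuming fasm syntax; check_zero is a hypothetical enclosing label and the local labels are the placeholders from the first post:

```asm
check_zero:
        test rax, rax       ; ANDs rax with itself: ZF = 1 if rax = 0, result discarded
        jnz .nonzero
.zero:
        ; ... code for rax = 0 ...
        jmp .done
.nonzero:
        ; ... code for rax <> 0 ...
.done:
```

Unlike add rax,0 it leaves rax untouched by definition, and unlike cmp rax,0 it needs no immediate operand.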
Post 01 Mar 2007, 23:29
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
It's best to order the instructions so that you can check ZF directly after the last arithmetic instruction.
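
As a rough illustration (the registers and the count_down label are arbitrary choices, not taken from the thread): arithmetic and logic instructions such as sub, dec, and, or already set ZF from their result, so the branch can follow them directly. Instructions like mov and lea do not touch the flags, so this only works when the last flag-setting instruction produced the value being tested.

```asm
count_down:
        sub rcx, rdx        ; sets ZF if the new value of rcx is 0
        jz .zero            ; no separate cmp or test needed
        ; ... rcx <> 0 ...
.zero:
```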
Post 01 Mar 2007, 23:44
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
In most cases, you don't accelerate your code using ADD/SUB/TEST/AND/OR/whatever instead of CMP.

As for a conditional forward jump, it should be taken in the less likely case: if a zeroed register is the less likely outcome, the jump should go to the .zero label. It depends on the algorithm.
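
A sketch of that layout, under the assumption that rax = 0 is the rare case; process is a hypothetical function label. The common path falls through and the rare path is moved out of line behind a forward jump:

```asm
process:
        test rax, rax
        jz .zero            ; forward jump, taken only in the (assumed) rare case
        ; ... common path: rax <> 0 ...
.done:
        ret
.zero:
        ; ... rare path: rax = 0 ...
        jmp .done
```

If zero were the common case instead, the same shape applies with jnz and the two bodies swapped.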
Post 02 Mar 2007, 13:33
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
MazeGen: does it matter even if you use branch hints?
Post 02 Mar 2007, 13:36
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
Branch hint overrides processor's assumption about code flow, so it doesn't matter.
Post 02 Mar 2007, 13:38
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
MazeGen wrote:
Branch hint overrides processor's assumption about code flow, so it doesn't matter.


How many processors support (and honor) those hints, though?

_________________
Image - carpe noctem
Post 04 Mar 2007, 12:27
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
f0dder wrote:
MazeGen wrote:
Branch hint overrides processor's assumption about code flow, so it doesn't matter.


How many processors support (and honor) those hints, though?

As far as I know, they have been documented at least since the (plain) Pentium. I don't know if they were documented earlier because I don't own the i486 manual :(
Post 05 Mar 2007, 08:19
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Documented since pplain?!

I don't think I saw them appear in the Intel PDFs until the P4?

Post 05 Mar 2007, 13:58
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
You're right, f0dder, I must have been drunk or something.

The funny thing is that I already have the first beta version of the x86 reference (where they are listed since the P4) and I hadn't even looked at it.
Post 05 Mar 2007, 15:26
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
I think I read somewhere that even though the hints were introduced in the P4 reference, they weren't actually honored by the processor... but that might just have been a drunken dream :)
Post 05 Mar 2007, 15:46
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Guys, maybe what you read is that those prefixes are reserved for future use, and that it is not guaranteed they will keep working as segment overrides in the future.
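
For reference, the hints under discussion are single prefix bytes placed in front of a conditional jump: 2Eh (the CS override encoding) hints "not taken" and 3Eh (the DS override encoding) hints "taken". A sketch, assuming the byte is emitted by hand with db; hinted is a hypothetical label, and whether a given CPU honors the hint is exactly the open question in this thread:

```asm
hinted:
        test rax, rax
        db 3Eh              ; hint: branch likely taken (2Eh would hint not taken)
        jnz .nonzero
        ; ... rax = 0 ...
.nonzero:
```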
Post 05 Mar 2007, 15:54
r22



Joined: 27 Dec 2004
Posts: 805
r22
If you can avoid the conditional branch, that would probably be the fastest way.

Code:
Function:
MOV R15,.ZERO_LABEL
MOV R14,.NOTZERO_LABEL
...
TEST RAX,RAX
CMOVZ R14,R15 ; if RAX = 0 the jump target becomes .ZERO_LABEL
JMP QWORD R14
.ZERO_LABEL:
...
JMP .DONE
.NOTZERO_LABEL:
...
.DONE:
...
RET
    


Now, in a one-pass function I'm not sure it will actually be faster than a jz or jnz, but in a loop I could see it beating branch prediction and conditional jumps. Someone should probably test this kind of thing.

Code:
Function:
MOV R15,.LOOP_LABEL
MOV R14,.DONE
...
.LOOP_LABEL:
MOV R13,R14
...
TEST RAX,RAX
CMOVNZ R13,R15
JMP QWORD R13
.DONE:
...
RET
    
Post 06 Mar 2007, 03:01
lazer1



Joined: 24 Jan 2006
Posts: 185
lazer1
r22 wrote:
If you can avoid the conditional branch, that would probably be the fastest way. [...]


That's a weird trick.

I'll try to benchmark it sometime.
Post 23 Mar 2007, 22:43
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat
r22 wrote:
If you can avoid the conditional branch, that would probably be the fastest way. [...] JMP QWORD R14
Conditional jumps are only slow when they aren't correctly predicted. Indirect jumps aren't predicted at all, so they're even worse.
Post 25 Mar 2007, 19:26
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Quote:

Indirect jumps aren't predicted at all

They aren't predicted anymore? PPro and newer CPUs (except the PMMX) are supposed to be able to predict them, because the BTB also stores the destination address. But of course the first execution will surely be mispredicted.
Post 25 Mar 2007, 19:37
r22



Joined: 27 Dec 2004
Posts: 805
r22
My code snippets were only a suggestion; I believe I made it clear that they were untested.

A case where branch prediction is impossible is where my solution would ***probably*** be faster. For instance, looping to check every bit of a randomly set register and performing a different task depending on whether the bit is set or not. In that case you could expect a misprediction 50% of the time.

Although, isn't there an optimization on the Core 2 architecture along the lines of cmp and conditional jmp pairs? If so, maybe my suggestion is slower in all cases on newer hardware.
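
For the random-bits case, a different branch-free variant is possible when the two "tasks" differ only in a value rather than in control flow: select the data with CMOVcc and keep only the well-predicted loop branch. This is an untested sketch with made-up constants (3 and 5) and a hypothetical label select_by_bit, not a measurement:

```asm
select_by_bit:
        mov r8, 3           ; hypothetical value used when the bit is set
        mov r9, 5           ; hypothetical value used when the bit is clear
        mov ecx, 64         ; number of bits in rax
.next_bit:
        mov r10, r9         ; default: bit clear
        shr rax, 1          ; CF = lowest bit of rax
        cmovc r10, r8       ; bit set: pick the other value, no jump involved
        add rbx, r10        ; the "task": accumulate the selected value
        dec ecx             ; sets ZF when the counter reaches 0
        jnz .next_bit       ; taken 63 times out of 64, easy to predict
        ret
```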
Post 26 Mar 2007, 00:19
Hayden



Joined: 06 Oct 2005
Posts: 132
Hayden
Just personal opinion.

test r/m, -1 is the best way to test for zero. The instruction is small, the result is discarded (unlike and r/m, -1) and ZF can be used as a bool (unlike cmp r/m, -1).
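
For comparison, the register forms and their 64-bit encodings as far as I can tell; the byte counts are worth checking against your own assembler listing:

```asm
        test rax, rax       ; 48 85 C0           3 bytes
        cmp  rax, 0         ; 48 83 F8 00        4 bytes (sign-extended imm8)
        test rax, -1        ; 48 A9 FF FF FF FF  6 bytes (7 with the general F7 /0 form)
```

TEST has no sign-extended 8-bit immediate form for 64-bit operands, so the immediate always occupies four bytes. All three leave ZF = 1 exactly when rax is zero.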

Also, the best way to get the hardware to initiate branch prediction is to test for the MOST LIKELY scenario and then jump if that condition is met. This test/jump arrangement gives a major speed improvement on some CPUs with hardware-initiated branch prediction.

Example: if eax is zero and you use test eax, -1, then jnz would be the hardware branch prediction, since we are testing eax against -1
(testing for non-zero).
Post 26 Mar 2007, 01:41
Hayden



Joined: 06 Oct 2005
Posts: 132
Hayden
P.S. Footnote: branch prediction comes down to the CPU stepping. The prefix byte isn't needed anymore.
Post 26 Mar 2007, 01:44
vapourmile



Joined: 30 Mar 2007
Posts: 4
vapourmile
I am an ex-6510 programmer; this answer worked on that CPU, so you may like to try it on yours! : )

On the 6502 you would rarely have to CMP with #0, even when you needed to branch on it, because if the result of the previous instruction was #0 the zero flag would already be set. So it's all about what you're doing immediately before your CMP, e.g.:

Code:
MESSAGE.Length = END - MESSAGE - 1  ; Index of the last character (length - 1).
        LDX #MESSAGE.Length         ; Load index value.
LOOP    LDA MESSAGE,X               ; Load from message address + X register.
        STA SCREEN,X                ; Store at screen address + X register.
        DEX                         ; Decrement X.
        BPL LOOP                    ; Branch while still positive.
        RTS                         ; Return.
MESSAGE .BYTE "Hello you lot!"
END

You don't need to use a CPX.

This also works with a BNE (branch on non-zero) that exits when X hits zero, provided you also subtract one from the addresses used by LDA and STA (because the lowest value X then reaches inside the loop is 1).
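
A rough x86-64 rendition of the same idea in fasm syntax, as an untested sketch. The names copy_message, message, message_len and screen are hypothetical, screen is just a plain buffer here, and the absolute addressing assumes the data sits at an address that fits in 32 bits:

```asm
copy_message:
        mov ecx, message_len - 1    ; index of the last byte (upper half of rcx is zeroed)
.loop:
        mov al, [message + rcx]     ; load from message address + index
        mov [screen + rcx], al      ; store at screen address + index
        dec rcx                     ; decrement index, sets SF and ZF
        jns .loop                   ; loop while the index is still >= 0
        ret

message db 'Hello you lot!'
message_len = $ - message
screen rb message_len
```

Here jns plays the role of BPL: the flags left by dec are used directly, with no cmp before the branch.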

Hope this inspires some experimentation, most of all, I hope it works!

_________________
You're groovy. I think you're just great!
Post 31 Mar 2007, 00:10
Copyright © 1999-2020, Tomasz Grysztar.
