flat assembler
Message board for the users of flat assembler.
fastest register zero test?
LocoDelAssembly 01 Mar 2007, 23:29
TEST RAX, RAX ?
vid 01 Mar 2007, 23:44
It's best to order the instructions so that you can check ZF directly after the last arithmetic instruction.
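A minimal sketch of that point in fasm syntax, assuming a hypothetical countdown routine with the counter in RCX (the routine name and register choice are illustrative, not from the thread). DEC already sets ZF from its result, so the conditional jump can follow it directly and no separate TEST or CMP is needed:

```asm
use64
count_down:                     ; hypothetical routine: RCX = iteration count (> 0)
        xor     eax, eax        ; RAX = running total
.next:
        add     rax, rcx        ; the "work"; done before DEC so DEC sets the flags last
        dec     rcx             ; DEC sets ZF when RCX reaches zero
        jnz     .next           ; ZF is checked directly, no TEST RCX,RCX needed
        ret
```

The only requirement is that no flag-modifying instruction sits between the arithmetic instruction and the jump.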
MazeGen 02 Mar 2007, 13:33
In most cases, you don't accelerate your code using ADD/SUB/TEST/AND/OR/whatever instead of CMP.
As for a conditional forward jump, it should be taken in the less likely case: if a zeroed register is the less likely outcome, you should jump to the .zero label. It depends on the algorithm.
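A sketch of that layout in fasm syntax, assuming zero is the rare case and using a hypothetical routine name. The common path falls through and the rare path sits behind a forward jump. This helps on processors that, when they have no history for a branch, predict forward conditional jumps as not taken:

```asm
use64
check_value:                    ; hypothetical routine: RAX = value to check
        test    rax, rax
        jz      .zero           ; forward jump, taken only in the rare case
        mov     eax, 1          ; common path (RAX != 0) falls through
        ret
.zero:                          ; rare path, kept out of the fall-through flow
        xor     eax, eax
        ret
```

If zero were the likely case, the two paths would simply swap places and the jump would become JNZ.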
vid 02 Mar 2007, 13:36
MazeGen: does it matter even if you use branch hints?
MazeGen 02 Mar 2007, 13:38
A branch hint overrides the processor's assumption about code flow, so it doesn't matter.
f0dder 04 Mar 2007, 12:27
MazeGen wrote: Branch hint overrides processor's assumption about code flow, so it doesn't matter.
How many processors support (and honor) those hints, though?
_________________
- carpe noctem
MazeGen 05 Mar 2007, 08:19
f0dder wrote:
As far as I know, they have been documented at least since the (plain) Pentium. I don't know whether they were documented earlier because I don't own the i486 manual.
f0dder 05 Mar 2007, 13:58
Documented since pplain?!
I don't think I saw them appear in the Intel PDFs until the P4?
MazeGen 05 Mar 2007, 15:26
You're right, f0dder, I must have been drunk or something. The funny thing is that I already have the first beta version of the x86 reference (where they are listed since P4) and I hadn't even looked at it.
f0dder 05 Mar 2007, 15:46
I think I read somewhere that even though the hints were introduced in the P4 reference, they weren't actually honored by the processor... but that might just have been a drunken dream.
LocoDelAssembly 05 Mar 2007, 15:54
Guys, maybe you read that those prefixes are reserved for future use, and that it is not guaranteed they will remain segment overrides in the future.
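For reference, the hints under discussion are the prefix bytes 2Eh (branch not taken) and 3Eh (branch taken) placed before a conditional jump, as documented for the Pentium 4. They are the same bytes as the CS and DS segment override prefixes. A sketch in fasm syntax that emits the prefix by hand with db (the routine name is hypothetical, and whether a given processor honors the hint is the open question in this thread):

```asm
use64
hinted_check:                   ; hypothetical routine: RAX = value to check
        test    rax, rax
        db      2Eh             ; hint "not taken" (same byte as a CS override)
        jz      .zero
        mov     eax, 1          ; expected path
        ret
.zero:
        xor     eax, eax
        ret
```

Replacing 2Eh with 3Eh would express the opposite hint.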
r22 06 Mar 2007, 03:01
If you can avoid the conditional branch, that would probably be the fastest way.
Code:
Function:
        MOV     R15,.ZERO_LABEL
        MOV     R14,.NOTZERO_LABEL
        ...
        TEST    RAX,RAX
        CMOVZ   R14,R15         ; RAX = 0: select .ZERO_LABEL, else keep .NOTZERO_LABEL
        JMP     QWORD R14
.ZERO_LABEL:
        ...
        JMP     .DONE
.NOTZERO_LABEL:
        ...
.DONE:
        ...
        RET

Now, in a one-pass function I'm not sure if it'll actually be faster than a jz or jnz, but in a loop I could see it beating branch prediction and conditional jumps. SOMEONE SHOULD PROBABLY TEST THIS KIND OF THING.

Code:
Function:
        MOV     R15,.LOOP_LABEL
        MOV     R14,.DONE
        ...
.LOOP_LABEL:
        MOV     R13,R14         ; default target: leave the loop
        ...
        TEST    RAX,RAX
        CMOVNZ  R13,R15         ; RAX != 0: go around again
        JMP     QWORD R13
.DONE:
        ...
        RET
lazer1 23 Mar 2007, 22:43
r22 wrote: If you can avoid the conditional branch, that would probably be the fastest way.
That's a weird trick. I'll try and benchmark it sometime.
Goplat 25 Mar 2007, 19:26
r22 wrote: If you can avoid the conditional branch, that would probably be the fastest way. [...] JMP QWORD R14
LocoDelAssembly 25 Mar 2007, 19:37
Quote:
Are they not predicted anymore? PPro and newer CPUs (except PMMX) are supposed to be able to predict it, because the BTB also stores the destination address. But of course it will surely mispredict the first time.
r22 26 Mar 2007, 00:19
My code snippets were only a suggestion; I believe I made it clear that they were untested.
A case where branch prediction is impossible is where my solution would ***probably*** be faster. For instance, looping to check every bit of a randomly set register and performing a different task depending on whether the bit is set or NOT set. In that case you could expect a misprediction 50% of the time. Although, on the Core2 architecture, isn't there an optimization along the lines of cmp and conditional jmp pairs? So maybe my suggestion is slower in all cases on newer hardware.
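A sketch of the bit-scanning case described here, written in fasm syntax with the same CMOV plus indirect JMP idea. The routine name, register assignments and the two counters standing in for the "different tasks" are assumptions made for illustration. Whether this beats a plain JC/JNC is exactly what would have to be measured:

```asm
use64
scan_bits:                      ; hypothetical routine: RAX = bits to scan
        mov     r15, .bit_set   ; target when the bit is 1
        mov     r14, .bit_clear ; target when the bit is 0
        xor     edx, edx        ; RDX counts set bits   (stand-in for task A)
        xor     esi, esi        ; RSI counts clear bits (stand-in for task B)
        mov     ecx, 64         ; 64 bits to examine
.next:
        mov     r13, r14        ; default target: bit clear
        shr     rax, 1          ; CF = the bit shifted out
        cmovc   r13, r15        ; CF = 1: select .bit_set instead
        jmp     r13             ; indirect jump replaces the conditional branch
.bit_set:
        inc     rdx
        jmp     .step
.bit_clear:
        inc     rsi
.step:
        dec     ecx
        jnz     .next
        ret
```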
Hayden 26 Mar 2007, 01:41
Just personal opinion.
test r/m, -1 is the best way to test for zero. The instruction is small, the result is discarded (unlike and r/m, -1), and ZF can be used as a bool (unlike cmp r/m, -1).
Also, the best way to get the hardware to initiate branch prediction is to test for the MOST LIKELY scenario and then jump if that condition is met. This method of test/jump gives a major speed improvement on some CPUs with hardware-initiated branch prediction.
Example: if eax is zero and you use test eax, -1, then jnz would be the hardware branch prediction, since we are testing eax against -1 (testing for non-zero).
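For comparison, a sketch of three register zero tests with their usual shortest encodings in the comments (byte values from the standard opcode tables; the label is hypothetical). TEST has no sign-extended 8-bit immediate form, so the -1 is stored as a full 32-bit immediate:

```asm
use64
zero_tests:                     ; each instruction sets ZF = 1 when EAX is zero
        test    eax, eax        ; 85 C0            (2 bytes)
        test    eax, -1         ; A9 FF FF FF FF   (5 bytes)
        cmp     eax, 0          ; 83 F8 00         (3 bytes)
        ret
```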
Hayden 26 Mar 2007, 01:44
ps. footnote: branch prediction comes down to the CPU stepping. The prefix byte isn't needed anymore.
vapourmile 31 Mar 2007, 00:10
I am an ex 6510 programmer, this answer worked on that CPU, so you may like to try it on yours! : )
On the 6502 you would rarely have to CMP with #0, even when you needed to branch on it, because if the result of the previous instruction was #0, the zero flag would be set. So it's all about what you're doing immediately before your CMP, e.g.:

MESSAGE.Length = END - MESSAGE - 1      ; Calculate string length.
        LDX #MESSAGE.Length             ; Load index value.
LOOP    LDA MESSAGE,X                   ; Load from message-address + X-register.
        STA SCREEN,X                    ; Store at screen-address + X-register.
        DEX                             ; Decrement X.
        BPL LOOP                        ; Branch while still positive.
        RTS                             ; Return.
MESSAGE .BYTE "Hello you lot!"
END

You don't need to use a CPX. This also works with a BNE (branch on non-zero), which stops when X hits zero, if you also subtract one from the addresses used by LDA and STA (because the lowest value X then takes in the loop is 1). Hope this inspires some experimentation. Most of all, I hope it works!
_________________
You're groovy. I think you're just great!
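A rough x86-64 counterpart of the same idea in fasm syntax, as a sketch only: the routine name and the message and screen buffers are hypothetical stand-ins for the 6502 example. DEC sets the sign flag, so JNS plays the role of BPL and no CMP is needed:

```asm
use64
copy_message:                           ; hypothetical routine
        mov     ecx, message_length - 1 ; index of the last byte
.loop:
        mov     al, [message + rcx]     ; load from message + index
        mov     [screen + rcx], al      ; store at screen + index
        dec     rcx                     ; sets SF from the new index
        jns     .loop                   ; loop while the index is still >= 0
        ret

message db 'Hello you lot!'
message_length = $ - message
screen  rb message_length               ; assumed destination buffer
```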
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.