flat assembler
Message board for the users of flat assembler.
 Home   FAQ   Search   Register 
 Profile   Log in to check your private messages   Log in 
flat assembler > Main > branch prediction

Goto page 1, 2, 3  Next
Author
Thread Post new topic Reply to topic
vivik



Joined: 29 Oct 2016
Posts: 190
branch prediction
So, 0x2E and 0x3E prefixes are actually useful or not?
Post 08 Aug 2017, 10:25
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15299
Location: Bigweld Industries
It depends. I expect in most cases they have negligible effect and have unmeasurable effects. But in some special code situations they can be useful.

I suggest that if you want to know if they improve your code that you measure it both with and without and see if the differences are consistent and significant.
Post 08 Aug 2017, 10:33
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 190
You keep telling "try for yourself", but testing on just one or two computers is pretty useless.

Maybe I'll ask people here to test those, on their machines... Wonder how many people are willing to do that.
Post 08 Aug 2017, 10:47
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15299
Location: Bigweld Industries
Without the tests you can't know if it is worthwhile. You might be including code that has no effect, or has a negative effect.

Test in the expected end-use case. If you don't know the end-use case then I don't know what to suggest. But guessing at the outcome would not be productive, there are too many interacting subsystems inside the CPUs that things become complicated very quickly.
Post 08 Aug 2017, 10:56
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 190
I'm making a game, it's expected to run on all shitty computers without problems. I own only two relatively shitty ones, not shitty enough though.
Post 08 Aug 2017, 11:13
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15299
Location: Bigweld Industries
If it is code that relies upon realtime responses then I would suggest that if the tiny improvement you might get from using branch hints is a make-or-break situation then you need to rethink the approach. There should be sufficient headroom to allow for unpredictable variances.

I'd expect branch hints to be useful where you have some long running task and would like it to finish a bit faster. In such cases the hardware is already known and appropriate tests can easily be run.
Post 08 Aug 2017, 11:22
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 190
I just don't want to do something stupid like ignoring something that cpu makers did for me.

Okay, so, this link
https://stackoverflow.com/questions/14332848/intel-x86-0x2e-0x3e-prefix-branch-prediction-actually-used
tells that those are no longer used, since pentium 4.

But this page still metions those, for some reason
https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts

So, pentium 4 was the first and last processor that actually used those prefixes?
Post 08 Aug 2017, 11:53
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 889
I was going to mention http://www.agner.org/optimize/, but it was already linked in the first link on stackoverflow you saw. Yes they're no longer used.

It's a good thing because it abstracts the whole thing, making newer CPUs able to do more without having to recompile the code at all. Plus, it's one byte less bloat. Wink
Post 08 Aug 2017, 11:57
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15299
Location: Bigweld Industries
The P4 was "special" in that the pipeline was very long to facilitate higher clock speeds during the clock speed wars. Also the branch table, and other enhancements, were much more limited. So Intel decided to pass the responsibility on to the programmer to ensure good performance in critical loops. Since then things have changed, radically, multiple times. Things are very different today. And this is why I encourage testing. Because tomorrow when someone encounters this discussion with their new-fangled CPU the answer will still be valid: Test it. Don't rely on some long since outdated Intel article. Also, there is no guarantee the branch hints won't be reintroduced in the future.
Post 08 Aug 2017, 12:06
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 190
His microarchitecture.pdf is so detailed, I wouldn't be able to digest it. I guess I'll believe stackoverflow then.


I found some C and C++ code that uses a lot of likely() unlikely() in it. I thought they actually add those prefixes, but looks like they just rearrange branches around. What's the point of those when, I can rearrange it by hand, it'll even be more readable.
Post 08 Aug 2017, 12:11
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 889
Likely/unlikely are high-level code hints, nothing to do with prefixes (unless you compile for P4 I guess). What do you mean with re-arranging by hand? The compiler performs a lot of transformations of your code. A branch could even be eliminated. You can't know that or shouldn't rely on that.

They don't always have an effect, but with HLL code it's to specify intent more so than an actual, direct effect in the output. It could re-arrange it or it could not, it depends on its optimizations. Don't rely on this behavior.

For example a branch doesn't need to be marked unlikely for it to be marked "cold" -- if it calls a cold function only, it is also marked cold in GCC.
Post 08 Aug 2017, 12:24
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 190
Um, that intel link says "when writing if-else or switch statements, check the most common cases first and work progressively down to the least common".

Doesn't that mean that this code

Code:
if (someFunc()) {
    expectedThis();
else {
    notExpectedThis();
}


and this code

Code:
if (likely(someFunc())) {
    expectedThis();
else {
    notExpectedThis();
}


are equivalent?

And maybe this one is same too.


Code:
if (unlikely(someFunc())) {
    notExpectedThis();
else {
    expectedThis();
}



The library I've seen it is supposed to be well tested, so I wonder wth.

Yes, I'll test it, I'll test it. This one is actually possible to test. Not sure how I'm supposed to study compiler's output, gdb is a torture device.


Last edited by vivik on 08 Aug 2017, 12:41; edited 1 time in total
Post 08 Aug 2017, 12:37
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15299
Location: Bigweld Industries
If you test it I think you will find the difference is unmeasurable. Really. These things make no measurable difference for most cases. These are meant for micro-improvements in critical tight loops.
Post 08 Aug 2017, 12:40
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 889

vivik wrote:
Um, that intel link says "when writing if-else or switch statements, check the most common cases first and work progressively down to the least common".

An if-chain short-circuits. The first if, if true, will never evaluate all the other "else if" chains, so of course it's faster to be first.

likely() is a hint to the compiler's optimizer. A compiler's optimizer is allowed to change and transform your code as long as the output is the same as defined by the language. It cannot change the chain of ifs differently because then the program can do something else (imagine if a function has side effects, like printing something) and the language explicitly forbids it.

likely() won't change the if-else chain, it will only hint the optimizer to optimize differently -- for example, unrolling the likely branch even if it doesn't happen as long as no side-effects are done (i.e. function calls with side effects won't participate in this optimization).


For switch statements, it doesn't make much sense, especially for large ones. They will be mostly implemented as a lookup jump table. Confused
Post 08 Aug 2017, 16:17
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 190
Tried an example from this page http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html

Assembly looks the same for correct and incorrect prediction. Result is also mostly same. Weird.
Post 09 Aug 2017, 19:28
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 889
__builtin_expect (what likely/unlikely use) is not about branch predicting in the CPU. It's about the compiler. Of course, the compiler could use such information to improve branch prediction in the CPU, but it does it anyway if it knows which path is likely or not -- various optimizations provide such information (and there's also profiling). That's in the optimization passes of course.

You can use __builtin_expect on variables as well though, if you expect them to have a certain value or be lower than something, the compiler could in theory generate code using such assumption as "likely".

Either way it's a compiler thing and doesn't translate to anything in the CPU directly. For certain code, it may do nothing at all. (I mean, literally, same asm code output)
Post 09 Aug 2017, 20:34
View user's profile Send private message Reply with quote
system error



Joined: 01 Sep 2013
Posts: 673
@vivik

The design of your code's branch prediction is NOT centered around the if-else constructs (or anything similar) per-se. The design is done much earlier and much bigger than that. By the time it reaches the if-else statement, it is probably, and at some point, useless to think about prediction/misprediction because the overall design of such code in question is probably not prediction-friendly to begin with in the first place.

Let's take FuncA(). This is designed around a "Yes" logical path. So whatever you do in your if-else statement, this function will always be prediction-friendly function because it was designed around the "truth". Any "false" jumps will not mess up with the main path most of the time.

FuncB(). No hot path consideration. It's up to the variables to decide the jumps. This will suffer the most and any micro design of if-else constructs inside it will not improve the speed much.
Post 10 Aug 2017, 03:15
View user's profile Send private message Reply with quote
system error



Joined: 01 Sep 2013
Posts: 673
EDIT:
To reduce the risk of branch mispredictions, you need to introduce some bias in your code design, favoring the most likely path that can do the main task to be solved at hand. Letting the code freely decides what branches to take, without any bias from the programmers will not help you crafting a better if-else constructs. The main strategy is always to lay out the main sequence of logical constructs that can achieve the final objective and insert any "unlikely" cases later in the code, where necessary.

For example, lets take a main "solution" sequence, biased (True case) towards a positive input from (RAX) and see how it can avoid misprediction that may occur.



Code:
        mov     rbx,10
        xor     rdx,rdx     
        mul     rbx
        nop
        nop             
        add rax,1               
        ret



Now, let's build an if-else construct around it (False case)


Code:
format elf64 

public main
main:
        mov     rcx,100_000_000
@@:     mov     rax,5 ;-5       ;test value     
        call    t_biased_function
        sub     rcx,1
        jnz     @b
        ret

;------------------------------------
;In favor of T
;Design: RAX is around +ve value
;------------------------------------
t_biased_function:
        mov     rbx,10
        cmp     rax,0     
        jl      .ok             ;don't take the jump
        xor     rdx,rdx     
        mul     rbx
        nop
        nop             
.ok:    add rax,1               
        ret




The result

Code:
perf stat -B ./testcode
0.21xxxx secs for T (true)
0.26xxxx secs for F (false)



This despite the facts that;

i) T-case has far more codes
ii) T-case has an obvious dependency chain (MUL/ADD)
iii) T-case has a MUL instruction


Last edited by system error on 10 Aug 2017, 13:12; edited 2 times in total
Post 10 Aug 2017, 10:57
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 889
I have no idea what your test is supposed to accomplish? Of course the mixed case is slower because the CPU fails to branch predict it. It's faster since in that case the CPU will branch predict it properly in all cases. Branch mispredictions are extremely expensive because the entire predicted operations in the pipeline have to be flushed when it is proven wrong. The more a branch is taken in one way (i.e. always taken or always not taken) the more the CPU can predict it the next time to be the same way. If it is the same way, it's 10-20+ clock cycles faster. Read Agner's pdf it explains this and the mispredicted hits.

The branch is not slower by itself. The fact is that the CPU does not wait for the operations to finish. Even if they have a dependency. It looks for more instructions to decode and execute. But which path should it decode? That's the whole point of branch prediction. If it's proven false, everything it decoded past that point is discarded. All that time was wasted. It has to start at the branch again after such a long time. Of course it's slower.

Note that most CPUs assume the branch by default is not taken. So it's better to make it so that it's not taken in the hot case. After a large amount of branches the differences get smaller though.


PS: your code is extremely badly written, I hope it was just for demonstration, it has a lot of needless bloat and useless instructions (even for what it does), a compiler would easily outdo it. Especially Intel's Compiler.

xor rdx, rdx is useless also and can be removed completely -- it doesn't even make the code slower, except for worse cache (in this case it doesn't matter since it's small enough), since it just renames the register and doesn't use any port. (nevermind that it should be xor edx, edx -- one byte less to decode, better cache utilization). In general avoid the r* registers unless you absolutely need them, since they require REX prefix, and all operations on e* registers zero-extend by design anyway (GCC even emits xor r8d, r8d for example, even if it also uses REX prefix anyway). This is why "int" is 32-bit in C on amd64, since it's what type promotion would use (almost all operations, except those using int64_t or pointers, are 32-bit).

Also add reg,1 is a stupid antiquated practice from the days of P4, please use inc/dec. The only time they will ever be slower is when the add would break the code (depending on carry flag being unchanged). All decent processors track flag bits individually, except P4. Intel's Compiler, obviously, emits inc/dec. Fairly sure they know their CPUs best.
Post 10 Aug 2017, 11:22
View user's profile Send private message Reply with quote
system error



Joined: 01 Sep 2013
Posts: 673
^
^

as usual, ALL TALK. No proof. Wait till that guy above me quoting something from Microsoft / Intel to make him looks "smart" and "important" while he himself remain largely clueless and probably takes 2 full threads to finally understand the simplest thing even a Gorilla can understand! xD
Post 10 Aug 2017, 11:36
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2, 3  Next

< Last Thread | Next Thread >

Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Powered by phpBB © 2001-2005 phpBB Group.

Main index   Download   Documentation   Examples   Message board
Copyright © 2004-2016, Tomasz Grysztar.