flat assembler
Message board for the users of flat assembler.
Branch prediction prefixes, are they of any use?
revolution 13 Feb 2009, 11:44
In short, yes, they are useful in the right situations. I doubt that the CPU makers would include them if the net benefit were zero; that just wouldn't make sense.
MazeGen 13 Feb 2009, 11:53
They only have an effect on the trace cache (NetBurst microarchitecture), which makes them useful only on Pentium 4 processors.
revolution 13 Feb 2009, 11:59
MazeGen wrote: They only have an effect on the trace cache (NetBurst microarchitecture), which makes them useful only on Pentium 4 processors.
MazeGen 13 Feb 2009, 13:24
A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel Pentium® 4 Processor wrote: Branch hints are interpreted by the translation engine, and are used to assist branch prediction and trace construction
I have also asked Agner Fog about this by e-mail. He said:
Quote: branch hints don't work in core 1 or core 2.
revolution 13 Feb 2009, 13:49
So, in that case, I guess the question is: will they harm decoding on non-P4 CPUs?
If Intel removed support for them in Core 1/2, then I imagine they found no benefit with the prediction engine used there. It seems kind of weird to remove them, though, because if you predict wrongly there is a major penalty to pay to recover.
MazeGen 13 Feb 2009, 15:02
Well, I don't know the differences between the P4 decoder and the PM/Core decoder in great detail. I'm not sure how the prefixes affected the BTB in the P4, but I assume that there is no connection between them and the BTB.
The answer should be in Agner's famous manuals. As I understand it, those prefixes took effect only shortly before the micro-ops were stored into the trace cache. Since there is no trace cache in the PM/Core microarchitecture, they became obsolete. On non-P4 CPUs, they just make the jcc instruction encoding longer. It is similar to:
Code:
ds mov eax, ebx
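For reference, the hints discussed here are the byte 2Eh (branch not taken) and the byte 3Eh (branch taken) placed in front of a conditional jump; they are the same bytes as the CS and DS segment overrides. The following is only a minimal fasm sketch with hypothetical labels. It emits the prefix with db so that it does not depend on any particular prefix syntax in the assembler.

```asm
use32
        cmp     eax, ebx
        db      3Eh                     ; hint: branch likely taken (same byte as a DS override)
        jz      likely_target           ; encoded as 3E 74 rel8, one byte longer than a plain jz
        nop
likely_target:
        db      2Eh                     ; hint: branch likely not taken (same byte as a CS override)
        jnz     unlikely_target
        nop
unlikely_target:
        ret
```

On a CPU that ignores the hints, the only visible difference is the extra byte per hinted jump.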
f0dder 15 Feb 2009, 01:32
revolution wrote: In short, yes, they are useful in the right situations. I doubt that the CPU makers would include them if the net benefit were zero; that just wouldn't make sense.
revolution 15 Feb 2009, 01:50
But loop has a different effect than dec/jnz, doesn't it?
|
rugxulo 18 Feb 2009, 22:36
revolution wrote: But loop has a different effect than dec/jnz, doesn't it?
Flags? (That's not much.) What about "add si,1" and "inc si"? (x86-64 doesn't count) Or "lodsb" vs. "mov al,[si] ; inc si"? Or "mov eax,0" vs. "xor eax,eax"? (and tom_tobias rises from his hibernation to inform us all ... heh)
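To illustrate the flags difference between the two loop forms, here is a small fasm sketch. The labels and the placeholder loop body are assumptions made up for the example.

```asm
use32
; Counted loop using LOOP: ECX is decremented and tested, no flags are written.
        mov     ecx, 4
with_loop:
        add     eax, eax                ; placeholder body
        loop    with_loop
        ; here the flags still hold the result of the last "add eax, eax"

; The same count using DEC/JNZ: DEC writes OF, SF, ZF, AF and PF.
        mov     ecx, 4
with_dec:
        add     eax, eax                ; placeholder body
        dec     ecx
        jnz     with_dec
        ; here ZF = 1 because the final DEC produced zero
        ret
```

Both versions run the body four times; they differ only in which flags the code after the loop can rely on.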
revolution 19 Feb 2009, 02:39
rugxulo wrote: Flags? (That's not much.)
bitRAKE 19 Feb 2009, 03:17
Luckily DEC/INC don't affect the carry flag. So, in the most common flag case LOOP is still not needed. If CL is needed for a shift instruction then LOOP also loses its usefulness. If we are talking size optimization then LOOP has that going for it. Or, if we consider the other LOOP instructions that also query the Z-flag (LOOPE/LOOPNE), then LOOP could replace two branches.
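A minimal fasm sketch of both points, with made-up routine names and register assignments (treat them as assumptions, not as anyone's actual code). The first routine keeps a carry chain alive across DEC/JNZ; the second uses LOOPNE where a DEC/JNZ version would need two conditional branches.

```asm
use32
; Hypothetical inputs: ESI -> source dwords, EDI -> destination dwords,
; ECX = number of dwords (must be non-zero).
add_multi:
        clc                             ; start with no carry
.next:
        mov     eax, [esi]
        adc     [edi], eax              ; add with the carry from the previous dword
        lea     esi, [esi+4]            ; LEA changes no flags, so CF survives
        lea     edi, [edi+4]
        dec     ecx                     ; DEC writes ZF but leaves CF alone
        jnz     .next                   ; "loop .next" would also work here
        ret

; Hypothetical inputs: ESI -> buffer, ECX = length (must be non-zero),
; AL = byte to search for.
find_byte:
        dec     esi
.scan:
        inc     esi
        cmp     [esi], al
        loopne  .scan                   ; repeat while ECX <> 0 and ZF = 0
        ret                             ; ZF = 1 means the byte was found at [esi]
```

In find_byte, the DEC/JNZ equivalent would need a separate jz after the cmp to leave on a match, plus the dec ecx / jnz pair for the count.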
Copyright © 1999-2024, Tomasz Grysztar.