flat assembler
Message board for the users of flat assembler.

Branch prediction prefixes, are they of any use?

Plue



Joined: 15 Dec 2005
Posts: 151
Plue
Are the branch taken/not taken prefixes actually useful for speeding up code, or does the extra time used for instruction decoding outweigh the benefits? Are there certain situations where they should be used?

_________________
Roses are red
Violets are blue
Some poems rhyme
And some don't.
Post 13 Feb 2009, 11:40
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
In short, yes, they are useful in the right situations. I doubt that the CPU makers would include them if the net benefit were zero; that just wouldn't make sense.
Post 13 Feb 2009, 11:44
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
They have an effect only on the trace cache (NetBurst microarchitecture), which makes them useful only on Pentium 4 processors.
Post 13 Feb 2009, 11:53
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
MazeGen wrote:
They have an effect only on the trace cache (NetBurst microarchitecture), which makes them useful only on Pentium 4 processors.
Erm, I thought they were for the BTB? That comes before the instruction gets to the decoder, and well before the trace cache. It would seem too late for them to be useful in the trace cache.
Post 13 Feb 2009, 11:59
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel Pentium® 4 Processor wrote:
Branch hints are interpreted by the translation engine, and are used to assist branch prediction and trace construction hardware. They are only used at trace build time, and have no effect within already-built traces.

I have also asked Agner Fog about this by e-mail. He said:
Quote:
branch hints don't work in core 1 or core 2.
core 1 is very similar to pentium m.
Post 13 Feb 2009, 13:24
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
So, in that case, I guess the question is: will they harm decoding on non-P4 CPUs?

If Intel removed support for them in Core 1/2, then I imagine they found no benefit with the prediction engine used there. It seems kind of weird to remove it, though, because if you predict wrongly there is a major penalty to pay to recover.
Post 13 Feb 2009, 13:49
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
Well, I don't know the differences between the P4 decoder and the PM/Core decoder in deep detail. I'm not sure how the prefixes affected the BTB in the P4, but I assume that there is no connection between them and the BTB.

The answer should be in Agner's famous manuals.

As I understand it, those prefixes took effect only shortly before the micro-ops were stored into the trace cache. Since there is no trace cache in the PM/Core microarchitectures, they became obsolete.

On non-P4 CPUs, they just make the Jcc instruction encoding longer. It is similar to:
Code:
ds mov eax, ebx    ; DS override (3Eh) on a register-to-register move: one extra byte, no effect
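
For comparison, here is a minimal sketch of what hinted jumps look like in fasm syntax. It assumes the prefix values documented by Intel (2Eh = hint not taken, 3Eh = hint taken, the same byte values as the CS and DS segment overrides) and emits them with db so that it does not rely on any assembler-specific hint syntax. The labels and the loop body are made up for illustration.

```asm
use32

hot_loop:
        add     eax, [esi]      ; accumulate the next dword
        add     esi, 4
        test    eax, eax
        db      2Eh             ; hint: not taken (same byte as a CS override)
        js      bail_out        ; rarely taken exit; encodes as 2E 78 rel8
        dec     ecx
        db      3Eh             ; hint: taken (same byte as a DS override)
        jnz     hot_loop        ; usually taken back edge; encodes as 3E 75 rel8
bail_out:
        ret
```

Each hint costs exactly one byte per jump. On a CPU that ignores the hints, that extra byte is the only difference from the unhinted code.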
Post 13 Feb 2009, 15:02
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
revolution wrote:
In short, yes, they are useful in the right situations. I doubt that the CPU makers would include them if the net benefit were zero; that just wouldn't make sense.
LOOP vs. DEC/JNZ? :)

_________________
- carpe noctem
Post 15 Feb 2009, 01:32
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
But LOOP has a different effect than DEC/JNZ, doesn't it?
Post 15 Feb 2009, 01:50
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
revolution wrote:
But LOOP has a different effect than DEC/JNZ, doesn't it?
Flags? (That's not much.)

What about "add si,1" and "inc si"? (x86-64 doesn't count)

Or "lodsb" vs. "mov al,[si] ; inc si"?

Or "mov eax,0" vs. "xor eax,eax" (and tom_tobias rises from his hibernation to inform us all ... heh)
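
To make the comparison concrete, here is a rough sketch of the three pairs side by side in fasm syntax, with the encoded size and flag behaviour noted in comments. The sizes assume the shortest encodings in 16-bit code for the SI examples and 32-bit code for the EAX examples. The lines are meant to be compared, not executed as a sequence.

```asm
use16
        add     si, 1           ; 3 bytes, writes CF along with the other flags
        inc     si              ; 1 byte, leaves CF alone

        lodsb                   ; 1 byte, changes no flags, direction depends on DF
        mov     al, [si]        ; 2 bytes
        inc     si              ; + 1 byte, changes flags, always counts up

use32
        mov     eax, 0          ; 5 bytes, changes no flags
        xor     eax, eax        ; 2 bytes, rewrites the arithmetic flags
```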
Post 18 Feb 2009, 22:36
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
rugxulo wrote:
Flags? (That's not much.)
Enough of a difference to make it worth an entire extra set of instructions.
Post 19 Feb 2009, 02:39
bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
bitRAKE
Luckily DEC/INC don't affect the carry flag. So, in the most common flag case, LOOP is still not needed. If CL is needed for a shift instruction then LOOP also loses its usefulness. If we are talking size optimization then LOOP has that going for it. Or, if we consider the other LOOP instructions that also query the Z-flag, then LOOP could replace two branches.
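
Two small sketches of these points in fasm syntax follow. The routine names, register assignments and calling assumptions (ECX > 0, pointers in ESI/EDI) are illustrative and not taken from the thread. The first shows a carry chain surviving DEC/JNZ; the second shows LOOPNE standing in for a count check plus a condition check.

```asm
use32

; Adds the ECX-dword number at ESI into the one at EDI,
; least significant dword first. Assumes ECX > 0.
add_bignum:
        clc                     ; start with no carry
.next:
        mov     eax, [esi]
        adc     [edi], eax      ; consumes and produces CF
        lea     esi, [esi+4]    ; LEA changes no flags
        lea     edi, [edi+4]
        dec     ecx             ; DEC sets ZF but leaves CF alone
        jnz     .next           ; so the carry survives into the next ADC
        ret

; Looks for a zero byte within at most ECX bytes starting at ESI.
; Assumes ECX > 0. Returns with ZF=1 and ESI at the zero byte if found.
find_zero:
        dec     esi
.scan:
        inc     esi
        cmp     byte [esi], 0
        loopne  .scan           ; decrements ECX, loops while ECX <> 0 and ZF = 0
        ret                     ; LOOPNE changes no flags, so ZF is still from CMP
```

In add_bignum, a plain LOOP could replace the DEC/JNZ pair and save a byte, but it is not required for correctness, which is the point made above.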
Post 19 Feb 2009, 03:17
Copyright © 1999-2020, Tomasz Grysztar.
