flat assembler
Message board for the users of flat assembler.
revolution
In short, yes they are useful in the right situations. I doubt that the CPU makers would include them if the net benefit was zero, that just wouldn't make sense.
MazeGen
They have an effect only on the trace cache (NetBurst microarchitecture), which makes them useful only on Pentium 4 processors.
revolution
MazeGen wrote: They have effect only on trace cache (NetBurst microarchitecture) what makes them useful only on Pentium 4 processors.
MazeGen
A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel Pentium® 4 Processor wrote: Branch hints are interpreted by the translation engine, and are used to assist branch prediction and trace construction
I have also asked Agner Fog about this by e-mail. He said:
Quote: branch hints don't work in core 1 or core 2.
revolution
So, in that case, I guess the question is: will they harm decoding on non-P4 CPUs?
If Intel removed support for them in the Core 1/2, then I imagine they found no benefit with the prediction engine used there. It seems kind of weird to remove them, though, because if you predict wrongly there is a major penalty to pay to recover.
MazeGen
Well, I don't know the differences between the P4 decoder and the PM/Core decoder in great detail. I'm not sure how they affected the BTB in the P4, but I assume that there is no connection between them and the BTB.
The answer should be in Agner's famous manuals. As I understand it, those prefixes took effect only shortly before the micro-ops were stored into the trace cache. Since there is no trace cache in the PM/Core microarchitecture, they became obsolete. On non-P4 CPUs, they just make the jcc instruction encoding longer. It is similar to:
Code:
ds mov eax, ebx
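For anyone who wants to see what the hinted encoding looks like, here is a minimal sketch, assuming fasm syntax and 32-bit code. The hint bytes are the same as the segment override prefixes (2Eh = predict not taken, 3Eh = predict taken), so they are emitted with db here. The labels and the surrounding instructions are made up for illustration only.

```asm
use32
        test    eax, eax
        db      3Eh             ; hint "taken" (same byte as a DS override)
        jnz     common_case     ; hinted conditional jump
        mov     eax, 1          ; fall-through path, expected to be rare
common_case:
        cmp     eax, ecx
        db      2Eh             ; hint "not taken" (same byte as a CS override)
        ja      rare_case       ; hinted conditional jump
        ret
rare_case:
        xor     eax, eax
        ret
```

Each hinted jcc is one byte longer than the plain form. On a CPU that ignores the hint, that extra byte is the only difference.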
f0dder
revolution wrote: In short, yes they are useful in the right situations. I doubt that the CPU makers would include them if the net benefit was zero, that just wouldn't make sense.
revolution
But loop has a different effect than dec/jnz, doesn't it?
|
rugxulo
revolution wrote: But loop has a different effect than dec/jnz doesn't it.
Flags? (That's not much.) What about "add si,1" and "inc si"? (x86-64 doesn't count) Or "lodsb" vs. "mov al,[si] ; inc si"? Or "mov eax,0" vs. "xor eax,eax" (and tom_tobias rises from his hibernation to inform us all ... heh)
revolution
rugxulo wrote: Flags? (That's not much.)
bitRAKE
Luckily DEC/INC don't affect the carry flag. So, in the most common flag case, LOOP is still not needed. If CL is needed for a shift instruction then LOOP also loses its usefulness. If we are talking about size optimization then LOOP has that going for it. Or, if we consider the other LOOP instructions that also query the Z-flag, then LOOP could replace two branches.
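To illustrate the carry-flag point, here is a rough sketch of a multi-dword addition, assuming fasm syntax and 32-bit code. The register assignment (ESI = source, EDI = destination, ECX = dword count, which must be non-zero) and the label names are assumptions for the example, not anything from the posts above.

```asm
use32
; Adds ECX dwords at [ESI] into the dwords at [EDI], propagating carry.
add_multi:
        clc                     ; start with no carry
.next:
        mov     eax, [esi]
        adc     [edi], eax      ; add with the carry from the previous dword
        lea     esi, [esi+4]    ; LEA advances the pointers without touching flags
        lea     edi, [edi+4]
        dec     ecx             ; DEC changes ZF but leaves CF alone
        jnz     .next           ; so the carry survives into the next ADC
        ret
```

Replacing the dec ecx / jnz .next pair with loop .next would also work here, since LOOP modifies no flags at all. In 32-bit code that saves one byte (2 bytes instead of 3), which is the size argument mentioned above.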
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.