flat assembler
Message board for the users of flat assembler.
speculative execution and unconditional branches
sylware 01 Mar 2026, 12:16
This is a follow-up to my previous thread:
In the general case, does speculative execution have "barriers" (such as a second conditional branch that is "too close")? By barriers I mean unconditional branches. What happens when a speculatively executed machine instruction is an unconditional branch? Will the CPU speculatively follow it? That could imply some serious work if the branch target is far away: cache loading, a new instruction fetch window, etc. In other words, there is more at stake than a pipeline flush, such as cache loading.

I ask because arranging code to please static branch prediction involves many more unconditional branches in some specific cases, for instance when checking the return value of an external call/syscall: basically, I would have to put the return value check code before the external call/syscall.

(AMD has a manual for software developers in order to be friendly with speculative execution, but I cannot download it: https://www.amd.com/system/files/documents/software-techniques-for-managing-speculation.pdf)

(XXX: This is unrelated to speculative execution hardware vulnerabilities, i.e. not "SLS".)

EDIT: After a lot of reading, it seems that on large CPU implementations the code at the target of an unconditional branch does get speculatively executed, with everything that involves: cache loading, prefetching, etc.
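To make the return value check case concrete, here is a minimal C sketch rather than fasm source, assuming GCC or Clang. The `unlikely` macro and the `write_all` helper are hypothetical names introduced for illustration. `__builtin_expect` is only a hint: it asks the compiler to lay out the expected path as the fall-through and to move the error path out of line, but it guarantees nothing about the generated branches. It also assumes the usual description of static prediction, where a forward conditional branch is taken to be "not taken".

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* GCC/Clang builtin: tells the compiler the condition is expected to be
   false. It is a layout hint only, not a guarantee. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: write the whole buffer to fd.
   Returns 0 on success, -1 on the first failed or empty write. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (unlikely(n <= 0))
            return -1;          /* unexpected path: error handling */
        buf += n;               /* expected path: keep writing */
        len -= (size_t)n;
    }
    return 0;
}
```

Compiling with `gcc -O2 -S` and reading the output shows where the compiler actually placed the error path and whether it needed an extra unconditional jump to do so.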
Mike Gonta 01 Mar 2026, 14:46
sylware wrote: (AMD has a manual for software developers in order to be friendly with speculative execution
https://web.archive.org/web/20230127145939/https://www.amd.com/system/files/documents/software-techniques-for-managing-speculation.pdf
sylware 01 Mar 2026, 16:15
Mike Gonta wrote:
Thx! Unfortunately this document only covers the mitigations for the bazillions of speculative execution exploits.

I found more on the web (noscript/basic (x)html please): why static branch prediction works the way it does, and I even got a recent Verilog implementation of a new branch predictor that dwarfs the efficiency of current neural branch prediction on "common" code (on _their_ corpus of common code, and for 32-bit RISC-V processors).

Now I am looking at how much address space the BHT and BTB span (I guess the instruction cache size or a code prefetch window, but I am probably wrong). This is what I got for Zen4:
"Each BTB entry can hold up to two branches if the branches reside in the same 64-byte aligned cache line and the first branch is a conditional branch."
"... branch targets tracked when branches are spaced by 8 bytes."
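The first quoted rule can be written down as a small check. This is only a sketch of the rule as quoted, not a model of the real hardware. The `struct branch` type and the `can_share_btb_entry` name are made up for illustration. It assumes that a branch is identified by the address of its first byte and that `first` comes before `second` in the code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* from the quote: 64-byte aligned cache line */

/* Assumed description of a branch for this sketch. */
struct branch {
    uint64_t addr;      /* address of the first byte of the branch (assumption) */
    bool conditional;   /* true for a conditional branch */
};

/* Returns true if, under the quoted rule, both branches could be held by
   one BTB entry: same 64-byte aligned line, and the first is conditional. */
bool can_share_btb_entry(struct branch first, struct branch second)
{
    bool same_line = (first.addr / CACHE_LINE_SIZE) ==
                     (second.addr / CACHE_LINE_SIZE);
    return same_line && first.conditional;
}
```

For example, a conditional branch at 0x1000 followed by an unconditional one at 0x1030 satisfies the rule, while a pair at 0x1030 and 0x1040 does not, because 0x1040 starts the next 64-byte line.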
Copyright © 1999-2026, Tomasz Grysztar. Also on GitHub, YouTube.