flat assembler
Message board for the users of flat assembler.
Index
> Main > register allocation strategies |
Author |
|
revolution 01 Oct 2019, 09:12
I think the Itanium idea of using a rotating pool of registers is good. Then you only need to spill to memory when needed. If there are enough registers available then you might be able to span three or four call levels deep before spilling values from further up the call chain.
The Itanium renaming was done in hardware so it could dynamically allocate the physical registers with perfect knowledge of the call chain at runtime. If we try to do that with macros at compile time then you'd have more restrictions because the call chain would be unknown at that time. But it is doable I guess. |
|||
01 Oct 2019, 09:12 |
|
redsock 01 Oct 2019, 09:32
Doesn't the Itanium idea of a rotating pool of registers bear similarities to the "internal x86_64" register renaming process (as in, physical register renaming/pooling?)
How then can that affect a procedure that willingly allocates space on the stack for its local variables and reads/writes to it ... surely the Itanium couldn't get rid of same... Or, are you saying that Itanium compiler optimisations were like the MIPSPro guys' version and did the same kinda thing? |
|||
01 Oct 2019, 09:32 |
|
revolution 01 Oct 2019, 09:39
The Itanium had a way to tell the CPU "this procedure uses ten registers" and the CPU would spill the ten eldest registers to memory (if needed) and allocate your procedure those ten registers. Within the procedure you would use a fixed set of names, say a0-a9 or whatever, and those would then be invisibly mapped to whatever the CPU had allocated to you.
Each time you enter the procedure you could get a completely different set of physical registers made available. It depended upon where in the call chain it was currently at. This mechanism was separate from the register renaming we have in the current x86 CPUs. And it could be used in addition to it. |
|||
01 Oct 2019, 09:39 |
|
redsock 01 Oct 2019, 09:43
I had the pleasure of my then-C/C++ code running on Itanium, and it was truly incredible given its relatively lower clockrates ... when compared to the MIPS64 architecture that preceded it, it was also very cool.
I don't feel the same "interconnect" between programming level and CPU with x86 ... the enter/leave instructions (does -anyone- use them still?) were the only real effort toward that, no? |
|||
01 Oct 2019, 09:43 |
|
revolution 01 Oct 2019, 09:48
It is not the same as enter/leave. It wasn't designed to allow access to values from higher in the call chain.
And, I don't think anyone could figure out how to use enter to its full potential. It is just too weird. |
|||
01 Oct 2019, 09:48 |
|
redsock 01 Oct 2019, 10:09
Any recollection of how this Itanium magic actually worked? Was it a specialised instruction that said in advance "here are how many local registers I wish to use?" ... While I totally get the idea, trying to imagine how it physically worked is mysterious ... can this kinda thing be simulated externally in a meaningful way? (would satisfy my initial question of whether a macro-fasmg-madness thing might work)
|
|||
01 Oct 2019, 10:09 |
|
revolution 01 Oct 2019, 10:39
I don't know the internals but it would seem to be relatively simple. Just add an offset to the register numbers, this would be the start point for the procedure's registers. And keep an internal value of which registers are currently spilled and where the head of the queue is.
As for the instruction. I don't remember. But it was kind of like the ENTER where you specify various things you want the CPU to do for you. Note that it is designed to decrease the spill/restore overhead. Whereas the x86 renaming is designed to break the dependency chains. Two different goals. You can't do this at compile time unless all procedures are always called in the same way from the same points. Then you could statically allocate. But for any non-trivial program such things can't be done at compile time with such finesse as the CPU could do it. |
|||
01 Oct 2019, 10:39 |
|
st 01 Oct 2019, 11:56
I guess Itanium's register file is somehow similar to x87 FPU stack. Besides it is bigger, provides flexible access and automatically swaps out to RAM when it is full. At least MCST Elbrus (which is also VLIW) works this way.
|
|||
01 Oct 2019, 11:56 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.