flat assembler
Message board for the users of flat assembler.

Index > Main > register allocation strategies

Author
Thread Post new topic Reply to topic
redsock



Joined: 09 Oct 2009
Posts: 430
Location: Australia
redsock 01 Oct 2019, 07:35
Do we have any compiler devs lurking here I wonder?

Back in the "olden days", I used to use a lot of SGI IRIX machines in production, and one of the cool features of the MIPSPro C++ Compiler back then was that it did this thing called "Interprocedural Optimisation" ... -IPO or something from memory was the compiler option that enabled it. My understanding then (and now) is that this mode of the compiler brought in all of the shared library functions, static functions, and of course static local functions, and then excepting the main entry point, would happily completely disregard the ABI, and figure out leaf function register allocation, and optimise to minimise stack usage for function locals.

AFAIK there are similar options in modern day tooling, though I don't think any of them offer to violate the shared library spec and bring them in (happy to be corrected here).

This is one of the easier optimisations to do when looking at a program: determine actual call traces, leaf functions, and sprinkle register allocation correctly to minimise push/mov [rsp]/etc (because even though they are on the stack, they still carry normal memory access penalties in most cases).

So, two questions based on this line of thinking:

1) Would it be theoretically possible for a -macro- or some madness in fasmg that we could do semi-automatic register allocation based on call depth/leaf functions/etc? (this is an unfinished but 'awe-inspiring' rabbit hole IMO) haha

and 2) When you design a program (in assembly language), do you adhere to the ABI (as I mostly did with my HeavyThing library) so that it can interop with other languages/etc, or do you completely ignore the ABI and do your own thing (parameter passing, returns, etc)

Your commentary is most appreciated Smile

_________________
2 Ton Digital - https://2ton.com.au/
Post 01 Oct 2019, 07:35
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Oct 2019, 09:12
I think the Itanium idea of using a rotating pool of registers is good. Then you only need to spill to memory when needed. If there are enough registers available then you might be able to span three or four call levels deep before spilling values from further up the call chain.

The Itanium renaming was done in hardware so it could dynamically allocate the physical registers with perfect knowledge of the call chain at runtime. If we try to do that with macros at compile time then you'd have more restrictions because the call chain would be unknown at that time. But it is doable I guess.
Post 01 Oct 2019, 09:12
View user's profile Send private message Visit poster's website Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 430
Location: Australia
redsock 01 Oct 2019, 09:32
Doesn't the Itanium idea of a rotating pool of registers bear similarities to the "internal x86_64" register renaming process (as in, physical register renaming/pooling?)

How then can that affect a procedure that willingly allocates space on the stack for its local variables and reads/writes to it ... surely the Itanium couldn't get rid of same...

Or, are you saying that Itanium compiler optimisations were like the MIPSPro guys' version and did the same kinda thing?

_________________
2 Ton Digital - https://2ton.com.au/
Post 01 Oct 2019, 09:32
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Oct 2019, 09:39
The Itanium had a way to tell the CPU "this procedure uses ten registers" and the CPU would spill the ten eldest registers to memory (if needed) and allocate your procedure those ten registers. Within the procedure you would use a fixed set of names, say a0-a9 or whatever, and those would then be invisibly mapped to whatever the CPU had allocated to you.

Each time you enter the procedure you could get a completely different set of physical registers made available. It depended upon where in the call chain it was currently at.

This mechanism was separate from the register renaming we have in the current x86 CPUs. And it could be used in addition to it.
Post 01 Oct 2019, 09:39
View user's profile Send private message Visit poster's website Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 430
Location: Australia
redsock 01 Oct 2019, 09:43
I had the pleasure of my then-C/C++ code running on Itanium, and it was truly incredible given its relatively lower clockrates ... when compared to the MIPS64 architecture that preceded it, it was also very cool.

I don't feel the same "interconnect" between programming level and CPU with x86 ... the enter/leave instructions (does -anyone- use them still?) were the only real effort toward that, no?

_________________
2 Ton Digital - https://2ton.com.au/
Post 01 Oct 2019, 09:43
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Oct 2019, 09:48
It is not the same as enter/leave. It wasn't designed to allow access to values from higher in the call chain.

And, I don't think anyone could figure out how to use enter to its full potential. It is just too weird.
Post 01 Oct 2019, 09:48
View user's profile Send private message Visit poster's website Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 430
Location: Australia
redsock 01 Oct 2019, 10:09
Any recollection of how this Itanium magic actually worked? Was it a specialised instruction that said in advance "here are how many local registers I wish to use?" ... While I totally get the idea, trying to imagine how it physically worked is mysterious ... can this kinda thing be simulated externally in a meaningful way? (would satisfy my initial question of whether a macro-fasmg-madness thing might work)

_________________
2 Ton Digital - https://2ton.com.au/
Post 01 Oct 2019, 10:09
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Oct 2019, 10:39
I don't know the internals but it would seem to be relatively simple. Just add an offset to the register numbers, this would be the start point for the procedure's registers. And keep an internal value of which registers are currently spilled and where the head of the queue is.

As for the instruction. I don't remember. But it was kind of like the ENTER where you specify various things you want the CPU to do for you.

Note that it is designed to decrease the spill/restore overhead. Whereas the x86 renaming is designed to break the dependency chains. Two different goals.

You can't do this at compile time unless all procedures are always called in the same way from the same points. Then you could statically allocate. But for any non-trivial program such things can't be done at compile time with such finesse as the CPU could do it.
Post 01 Oct 2019, 10:39
View user's profile Send private message Visit poster's website Reply with quote
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 01 Oct 2019, 11:56
I guess Itanium's register file is somehow similar to x87 FPU stack. Besides it is bigger, provides flexible access and automatically swaps out to RAM when it is full. At least MCST Elbrus (which is also VLIW) works this way.
Post 01 Oct 2019, 11:56
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.