flat assembler
Message board for the users of flat assembler.

the register dance
sylware



Joined: 23 Oct 2020
Posts: 81
Location: Marseille/France
While coding my app, instead of keeping my variables mostly in memory (heap and stack), I wanted to try the following "policy": fill the registers as much as possible, sparing a few scratch ones (general purpose and vector), with some smart shuffling based on how their content is used and on the call/jump graph of the code "unit". Ofc, once I ran out of room in the register space I would use a bit of stack space. While filling the registers, I would "roughly" prioritize the callee-saved registers (rbx rbp r12-r15), just in case a future external C ABI call slips in, then fill the argument-passing registers in reverse order (r9 r8 r10/rcx rdx rsi rdi, xmm7->xmm0), excluding some scratch registers: I was ok with 3 or 4 general purpose scratch registers (rax rcx r11 r9) and sometimes 4 vector ones (xmm7->xmm4). Leaf code paths would use all the argument-passing registers as they see fit.
Ok, my app is memory bound, so putting as many variables as possible in registers is not really worth it: the loads/stores related to those variables should be negligible compared to the loads/stores of the bulk. But... training!
To do that, I tracked the content of the registers in the source, written down all along the call/jump graph.

And you guys? What is your register management policy?

P.S. I wish my laptop was recent enough to handle AVX2 like my workstation...
Post 25 Sep 2021, 23:40
sts-q



Joined: 29 Nov 2018
Posts: 44
Even after years, an unsolved open question!

I like programming in assembler. But there are two things I haven't found a good solution for:
* when to push to the stack
* (nested) if-then-else constructions

What I do is:
have low-level, mid-level and higher-level functions:
low-level uses: a b c d
mid-level uses: si di k v ( source index, destination index, key, value )
higher-level uses: tos sos bp ... and RAM
I do a lot of register renaming:
k equ r8
v equ r8
...

I think a better solution would be to do more paper-and-pencil programming, that is, first understand
what I would like to do, then hack it into the computer.

https://codeberg.org/sts-q/minal/src/branch/master/fasm/declarations.fasm
Post 26 Sep 2021, 04:47
bitRAKE



Joined: 21 Jul 2003
Posts: 3307
Location: vpcmipstrm
It's like an ebb and flow when I am trying to optimize register usage: top->down, then bottom->up, and repeat. External APIs are like a boundary condition: they are very strict and much of the state afterward is undefined, so we can work backward from the API call as well as start over once it has completed.

I too like to use register renaming as well as memory renaming with VIRTUAL blocks.
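
A rough sketch of what memory renaming with a VIRTUAL block can look like, assuming 64-bit code. The node.* names and the choice of rbx as the base register are made up for illustration; they are not from any code in this thread:
Code:
virtual at rbx                          ; emits no bytes, only defines labels relative to rbx
        node.next       dq ?            ; offset 0
        node.key        dd ?            ; offset 8
        node.value      dd ?            ; offset 12
end virtual

        mov     rax,[node.next]         ; same as mov rax,[rbx]
        mov     ecx,[node.key]          ; same as mov ecx,[rbx+8]

If the base register changes later, only the "virtual at" line has to be edited; the instructions keep using the names.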

Macros can be scaled (i.e. called by other macros) if all the operands are constant or parametric. These types of macros are quite universal, whereas using registers or memory directly in a macro restricts its utility.

Another pattern is to code for both positive and negative logic as a type of control flow optimization. Sometimes the code works out better with the polarity reversed. It's possible to encapsulate this within a macro as well (contrived example):
Code:
macro PARITY? true,false,regmem*,bits
        assert 0 < bits & bits < 9
        test regmem,(1 shl bits) - 1
        match ,true
                jpo false
        else match ,false
                jpe true
        else
                err
        end match
end macro

macro ODD? even*,regmem*
        PARITY even,<>,regmem,1
end macro
macro EVEN? odd*,regmem*
        PARITY <>,odd,regmem,1
end macro    
Post 26 Sep 2021, 05:57
sylware



@sts-q This is roughly what I wrote down in the code, but for each significant code path (usually following the call/jump graph). I did use the "define" directive for tracking though, like "define node_p r12".
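
A minimal sketch of this kind of "define" tracking, assuming fasm 1 syntax and 64-bit code. Only node_p/r12 comes from the post; count, tmp and the walk routine are hypothetical names invented for the example:
Code:
; register map for this code path (hypothetical)
define node_p   r12                     ; current node pointer, callee-saved
define count    r13                     ; remaining nodes, callee-saved
define tmp      rax                     ; scratch

walk:
        test    node_p,node_p           ; stop on a null pointer
        jz      .done
        mov     tmp,[node_p]            ; load the next pointer (first field of the node)
        mov     node_p,tmp
        dec     count
        jnz     walk                    ; or stop once count reaches zero
.done:

The block of defines doubles as the written-down register map for that code path.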

@bitRAKE yep, multi-pass seems kind of mandatory, and "external" calls are "expensive" if many registers have to be "saved". What I try to do is have a "first pass" that already roughly optimizes register usage, using this "fill them all" policy.

I also noticed that I tend to avoid using "call", which goes to the stack to save RIP; more and more I use a "link" register with jumps instead.
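
A minimal sketch of the "link" register pattern, assuming 64-bit code. Using r11 as the link register is an assumption made for the example, and caller/leaf are hypothetical labels:
Code:
caller:
        lea     r11,[.back]             ; return address goes into the link register
        jmp     leaf                    ; instead of: call leaf
.back:
        ret                             ; caller itself was entered with a normal call

leaf:
        add     rax,1                   ; leaf body, must not clobber r11
        jmp     r11                     ; instead of: ret

This only works as-is for leaf code: a routine that wants to "link-call" another one first has to move or spill its own link register.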

And I follow the paranoid "nops align everything" approach I saw in the HeavyThing assembly project.
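
For reference, a small sketch of that kind of alignment in fasm, assuming 64-bit code; hot_loop and the loop body are hypothetical:
Code:
        align 16                        ; fasm pads with nop bytes up to the next 16-byte boundary
hot_loop:
        add     rax,[rsi]
        add     rsi,8
        dec     rcx
        jnz     hot_loop

Each "align 16" can add up to 15 bytes of padding in front of the label, which is where the code size cost comes from.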
Post 26 Sep 2021, 12:03
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
The "nops to align" can be beneficial, and it can be harmful also.

Be aware of your cache wastage when you start inserting nops. Some code segments might see an advantage, others might see a disadvantage. A lot of it is CPU dependent. On some CPUs it makes no difference, on others it can have an impact, either negative and/or positive and/or opposite from other CPUs.

IME putting nops everywhere is a bad strategy overall. I found it just made code larger and more cluttered, with any "benefits" so tiny as to be unmeasurable. But I guess on some critical code there might be a use case that can see something useful.
Post 26 Sep 2021, 12:22
sylware



@revolution about the "nops everything" thing: I got my hands on the AMD and Intel optimization manuals, and the 16-byte "paragraph" seems really critical for both microarchitectures.
I did set up some RDTSC-related macros, just in case. I may fool around and torture the alignment of my code to see if it has an obvious impact (on 2 CPUs, zen2 and oooold intel). To be fair, since this code is memory bound, I am not supposed to see anything significant though.
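
A hedged sketch of what such an RDTSC macro could look like, assuming fasm 1 macro syntax and 64-bit code. The name tsc_read and the registers used are hypothetical, and placing lfence before rdtsc is just one common convention, not something taken from the manuals mentioned above:
Code:
macro tsc_read dest                     ; dest: a 64-bit register other than rax/rdx
{
        lfence                          ; keep rdtsc from running ahead of earlier instructions
        rdtsc                           ; edx:eax = time-stamp counter
        shl     rdx,32
        or      rax,rdx                 ; rax = full 64-bit counter
        mov     dest,rax
}

        tsc_read r14                    ; start (clobbers rax and rdx)
        ; ... code under test ...
        tsc_read r15                    ; stop
        sub     r15,r14                 ; r15 = elapsed TSC ticks

The result is in TSC ticks, which on recent CPUs advance at a constant rate rather than following the actual core clock, so it is mostly useful for comparing two variants on the same machine.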

Back to how to track register usage: I already posted something about a software assistant for that. Now it starts to look like overkill, since writing down the register usage at pertinent points in the code felt like a better compromise.
Post 26 Sep 2021, 20:51
sylware



I have a pseudo-mechanical way (meaning it varies a lot with the context) of using the general registers for variables: first the callee-saved regs, with rbp last (it is used to save rsp around external C ABI calls), then the argument-passing registers in reverse order. The basic scratch general registers would be rax, rcx and r11.
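
A minimal sketch of the rbp/rsp part, assuming the System V AMD64 ABI; external_c_function is a hypothetical symbol:
Code:
        mov     rbp,rsp                 ; rbp is callee-saved, so it survives the call
        and     rsp,-16                 ; the ABI wants rsp 16-byte aligned at the call
        call    external_c_function     ; hypothetical external C function
        mov     rsp,rbp                 ; restore the original stack pointer

Variables kept in rbx and r12-r15 survive the call as well; anything living in the argument-passing or scratch registers has to be spilled or recomputed.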
Post 25 Oct 2021, 20:58


Copyright © 1999-2020, Tomasz Grysztar.

Website powered by rwasa.