flat assembler
Message board for the users of flat assembler.

Intel plans doubling 16 general purpose registers to 32

Feryno



Joined: 23 Mar 2005
Posts: 508
Location: Czech republic, Slovak republic
Feryno 28 Jul 2023, 10:54
Furs



Joined: 04 Mar 2016
Posts: 2453
Furs 28 Jul 2023, 11:06
Doubling the registers for 10% fewer loads. Sounds like typical bloat with such diminishing returns. Dumb.

Let's not forget the instruction encodings will be larger with these registers, so it will put more pressure on instruction fetch and the instruction cache.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20142
Location: In your JS exploiting you and your system
revolution 28 Jul 2023, 11:13
This is a win with no loss.

The extra REX2 prefix uses 0xd5 (legacy instruction AAD) so does not impact existing x64 instructions. AAD has always been invalid in 64-bit code, so now it will be repurposed to serve as REX2.

If you can find a benefit to using 32 registers in your code then use REX2, else if you can find no benefit you can ignore REX2 and continue as normal.
bitRAKE



Joined: 21 Jul 2003
Posts: 3946
Location: vpcmipstrm
bitRAKE 28 Jul 2023, 14:36
Absolutely, the internal registers already exceed 16. This is just a way to remove loads and control renaming. Total win. (Compilers can rejoice, lol.)
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 29 Jul 2023, 03:43
Any guesses as to whether R16-R31 will be volatile or non-volatile in the major ABIs?
bitRAKE



Joined: 21 Jul 2003
Posts: 3946
Location: vpcmipstrm
bitRAKE 29 Jul 2023, 06:50
With the SIMD registers YMM6+ and ZMM, Microsoft has made them volatile (non-preserved), but for general purpose registers R12+ they are non-volatile. I'm going to wager R16-R31 will be non-volatile as well. As wonky as the Windows x64 ABI is already - it's a very small wager. Seeing some sort of split would not surprise me one bit.

Isn't there some sort of systems theory that favors balancing resources on both sides of the caller/callee boundary, assuming infinite layers in the system? It feels like something like that should exist. There aren't infinite layers, though; not every call is a deep dive. Interesting to think about.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
AsmGuru62



Joined: 28 Jan 2004
Posts: 1588
Location: Toronto, Canada
AsmGuru62 29 Jul 2023, 12:12
Question: these additional registers will be available only for x64 coding?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20142
Location: In your JS exploiting you and your system
revolution 29 Jul 2023, 12:48
AsmGuru62 wrote:
Question: these additional registers will be available only for x64 coding?
See the encoding of REX2 (0xd5), it is impossible to encode in 16-bit or 32-bit. You will get AAD instead.

Same for REX, you get INC and DEC instead.
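To make the mode dependence concrete, here is a minimal sketch of how the same leading byte is read in different modes. The function name and the `apx` flag are made up for illustration; the byte ranges are the ones mentioned above (0x40-0x4F for INC/DEC versus REX, 0xd5 for AAD versus REX2).

```python
def classify_prefix_byte(byte: int, mode_bits: int, apx: bool = False) -> str:
    """Say how a single leading byte is interpreted in a given CPU mode.

    `apx` is a hypothetical flag meaning "this CPU understands REX2".
    """
    if mode_bits not in (16, 32, 64):
        raise ValueError("mode_bits must be 16, 32 or 64")
    if 0x40 <= byte <= 0x47:
        # One-byte INC in legacy modes, REX prefix in 64-bit mode.
        return "REX prefix" if mode_bits == 64 else "INC r16/r32"
    if 0x48 <= byte <= 0x4F:
        # One-byte DEC in legacy modes, REX prefix in 64-bit mode.
        return "REX prefix" if mode_bits == 64 else "DEC r16/r32"
    if byte == 0xD5:
        if mode_bits != 64:
            return "AAD"
        # AAD was never valid in 64-bit mode, so the byte was free to reuse.
        return "REX2 prefix" if apx else "invalid opcode (#UD)"
    return "other"


for mode in (32, 64):
    print(mode, classify_prefix_byte(0x48, mode), "|",
          classify_prefix_byte(0xD5, mode, apx=True))
```

In 16-bit and 32-bit code there is simply no spare byte left to carry the extra register bits, which is why the new registers stay 64-bit only.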

Don't expect any new instructions or registers for 32-bit code. The future is 64-bit only, apparently.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1588
Location: Toronto, Canada
AsmGuru62 29 Jul 2023, 13:12
Thanks.
I wonder about the compatibility.
There are a lot of x64 CPUs which do not have these registers, so let's say I code an app for something.
I want it to run on old CPUs (no registers R16-R31) and on new ones, so I need to detect the CPU and then code some (complex) functions in two 'incarnations'.
I guess, this is the way to go.
Should not be very hard -- make the older CPU version first, where some variables are in local memory (stack).
Then just copy/paste and replace the locals with R16-R31.
Seems reasonable.
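A rough sketch of that "two incarnations" plan, written in Python only to show the shape of the dispatch. All names are hypothetical, and `cpu_has_apx` is assumed to come from a CPUID check plus confirmation that the OS saves the new registers; how that check is done is left out here.

```python
def sum_squares_baseline(values):
    # Stand-in for the routine that keeps some temporaries on the stack.
    return sum(v * v for v in values)


def sum_squares_apx(values):
    # Stand-in for the copy that would keep those temporaries in R16-R31.
    # Same result by construction; only the register usage would differ.
    return sum(v * v for v in values)


def select_implementation(cpu_has_apx: bool):
    """Pick one variant once at startup; callers never re-check the CPU."""
    return sum_squares_apx if cpu_has_apx else sum_squares_baseline


# Assumption: detection result is passed in as a plain flag.
sum_squares = select_implementation(cpu_has_apx=False)
print(sum_squares([1, 2, 3]))
```

Selecting once and calling through a pointer keeps the detection cost out of the hot path; only the complex functions need a second copy.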
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20142
Location: In your JS exploiting you and your system
revolution 29 Jul 2023, 13:17
The only time you need to detect anything is if you want to use R16-R31. If you don't use them then there is nothing for you to do, it will run as normal.
bitRAKE



Joined: 21 Jul 2003
Posts: 3946
Location: vpcmipstrm
bitRAKE 29 Jul 2023, 14:35
bitRAKE wrote:
With the SIMD registers YMM6+ and ZMM, Microsoft has made them volatile (non-preserved), but for general purpose registers R12+ they are non-volatile. I'm going to wager R16-R31 will be non-volatile as well. As wonky as the Windows x64 ABI is already - it's a very small wager. Seeing some sort of split would not surprise me one bit.
The bottom-up thinking would wonder why preserve additional state for context switches? R16-R31 should be volatile for this reason alone.

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20142
Location: In your JS exploiting you and your system
revolution 29 Jul 2023, 14:42
bitRAKE wrote:
... why preserve additional state for context switches?
If you don't then using R16+ becomes unreliable (and unusable). Context switches are the one place where you have to preserve everything.
Ali.Z



Joined: 08 Jan 2018
Posts: 660
Ali.Z 29 Jul 2023, 15:11
bitRAKE wrote:
I'm going to wager R16-R31 will be non-volatile as well. As wonky as the Windows x64 ABI is already - it's a very small wager. Seeing some sort of split would not surprise me one bit.


Based on the current x64 ABIs, they will likely claim most of the registers and leave only a few for the application itself; this goes for Unix, Unix-like systems and Windows alike.
They tend to be very aggressive on their side.

But it is still a win for both sides, except that care must be taken with future software that also has to run on CPUs without REX2. For me nothing will change; I still use 32-bit.

_________________
Asm For Wise Humans
Furs



Joined: 04 Mar 2016
Posts: 2453
Furs 29 Jul 2023, 17:02
revolution wrote:
This is a win with no loss.

The extra REX2 prefix uses 0xd5 (legacy instruction AAD) so does not impact existing x64 instructions. AAD has always been invalid in 64-bit code, so now it will be repurposed to serve as REX2.
That's 1 extra byte already, and then you have encoding the registers themselves which uses more bits.

Sure, loads take offset to encode as well (from rbp or rsp), but they said 10% loads versus doubling the regs. So smells like bloat to me.
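The size trade-off can be put into rough numbers. This is a back-of-the-envelope sketch, not a measurement, and it rests on two assumptions: a typical stack load such as `mov rax,[rbp-8]` takes 4 bytes (48 8B 45 F8), and an instruction that already needed a one-byte REX grows by 1 byte when it switches to the two-byte REX2. Instructions that needed no REX at all would grow by 2, which the `rex2_extra` parameter can model.

```python
def net_code_bytes(loads_removed: int, rex2_instructions: int,
                   load_size: int = 4, rex2_extra: int = 1) -> int:
    """Net change in code size. Positive: code grew. Negative: code shrank.

    load_size and rex2_extra are assumed typical values, not fixed facts.
    """
    return rex2_instructions * rex2_extra - loads_removed * load_size


# One removed 4-byte load pays for four instructions that gained a byte.
print(net_code_bytes(loads_removed=1, rex2_instructions=4))
# Using R16+ without removing any load is pure growth.
print(net_code_bytes(loads_removed=0, rex2_instructions=5))
```

Under these assumptions both sides of the argument can be right: code that touches R16-R31 often but removes few loads gets bigger, while code that trades a spill for a register gets smaller.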
Furs



Joined: 04 Mar 2016
Posts: 2453
Furs 29 Jul 2023, 17:04
bitRAKE wrote:
Absolutely, the internal registers already exceed 16. This is just a way to remove loads and control renaming. Total win. (Compilers can rejoice, lol.)
"Total win"? Why not give out 128 addressable registers then?

Think carefully for the answer to that question.

"Total win" means literally no downsides. Longer encodings are a downside.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20142
Location: In your JS exploiting you and your system
revolution 29 Jul 2023, 17:58
Furs wrote:
Sure, loads take offset to encode as well (from rbp or rsp), but they said 10% loads versus doubling the regs. So smells like bloat to me.
That is a consequence of the diminishing returns of providing more registers. You don't get half the loads by doubling registers. This applies regardless of how you encode the instructions. All architectures experience this same effect.

Going from 1 reg to 2 regs gives a great boost. Going from 2 to 4, a good boost. From 4 to 8, a moderate boost, and so on. Going from 1G regs to 2G regs you get effectively zero benefit (and probably a big loss from all the overheads).
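A toy model shows the effect. It treats the register file as an LRU cache over a made-up variable access trace and counts how often a variable has to be (re)loaded. The trace generator, its geometric weights and all names are assumptions chosen for illustration; real register allocation is far smarter, and the exact curve depends entirely on the access pattern.

```python
import random
from collections import OrderedDict


def count_loads(trace, num_regs):
    """Count loads when num_regs registers hold the most recently used variables."""
    regs = OrderedDict()
    loads = 0
    for var in trace:
        if var in regs:
            regs.move_to_end(var)          # already in a register
        else:
            loads += 1                     # must be loaded from memory
            if len(regs) >= num_regs:
                regs.popitem(last=False)   # evict the least recently used
            regs[var] = True
    return loads


def make_trace(length=20000, num_vars=64, seed=1):
    # Assumed access pattern: a few hot variables, a long tail of cold ones.
    rng = random.Random(seed)
    weights = [0.8 ** i for i in range(num_vars)]
    return rng.choices(range(num_vars), weights=weights, k=length)


trace = make_trace()
loads_by_regs = {n: count_loads(trace, n) for n in (1, 2, 4, 8, 16, 32, 64)}
for n, loads in loads_by_regs.items():
    print(f"{n:3d} registers: {loads} loads")
```

Once the working set fits, the load count bottoms out at one load per distinct variable, and further doubling buys nothing no matter how the instructions are encoded.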
bitRAKE



Joined: 21 Jul 2003
Posts: 3946
Location: vpcmipstrm
bitRAKE 29 Jul 2023, 19:20
revolution wrote:
bitRAKE wrote:
... why preserve additional state for context switches?
If you don't then using R16+ becomes unreliable (and unusable). Context switches are the one place where you have to preserve everything.
Too bad they don't implement something like CR0.TS (Task Switched) flag for these other registers.
Intel wrote:
We propose to define the new GPRs as caller-saved (volatile) state in application binary interfaces (ABIs), facilitating interoperability with legacy binaries.

bitRAKE



Joined: 21 Jul 2003
Posts: 3946
Location: vpcmipstrm
bitRAKE 29 Jul 2023, 19:56
PUSH2/POP2 require the stack to be 16-byte aligned (0 mod 16). Longer encoding.

SETcc.zu is usually what I'm after anyhow. All the different Boolean types are silly. MS BOOL is 32-bit, HRESULT S_OK = 0, S_FALSE = 1, 32-bit, C/C++ bool is 8-bit. Using the flags directly is still a better option.
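For anyone wondering what the zero-upper form changes, here is a small model of a 64-bit register. It assumes the usual behaviour that plain SETcc writes only the low byte and leaves the rest of the register as it was, while the .zu form clears everything above it. Function names are made up for the sketch.

```python
MASK64 = (1 << 64) - 1


def setcc_legacy(reg: int, condition: bool) -> int:
    """Write only the low 8 bits; bits 8-63 keep their old contents."""
    return (reg & ~0xFF & MASK64) | int(condition)


def setcc_zu(reg: int, condition: bool) -> int:
    """Zero-upper form: the whole register becomes 0 or 1."""
    return int(condition)


stale = 0x1122334455667788
print(hex(setcc_legacy(stale, True)))  # upper bits are still the stale value
print(hex(setcc_zu(stale, True)))      # directly usable as a wide boolean
```

With the legacy form you need an extra XOR beforehand or a MOVZX afterwards to get a clean 32-bit BOOL; the zero-upper form folds that into one instruction.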

Fixing CMOVcc memory access is nice, revolution has mentioned this before a couple times.

Is there really a need for JMPABS?

The three operand forms are quite a large benefit.
(Intel calls it "new data destination (NDD)".)

* This might accelerate my move to FASM2 - needing more control over instruction encoding.


Last edited by bitRAKE on 29 Jul 2023, 20:25; edited 3 times in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20142
Location: In your JS exploiting you and your system
revolution 29 Jul 2023, 20:20
bitRAKE wrote:
Fixing CMOVcc memory access is nice, revolution has mentioned this before a couple times.
Ya. And it only took a few decades to fix.
sylware



Joined: 23 Oct 2020
Posts: 422
Location: Marseille/France
sylware 29 Jul 2023, 21:28
CMOVcc memory access fixed? You mean it won't segfault when pointing at invalid memory even though the move is not supposed to be executed (speculation fix)?

And seriously, I am thinking about the assembly code I wrote lately, and doing that with 32 regs... yeah... it does turn me on.

But my eyes are now looking at 64-bit RISC-V...
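The difference in question can be modelled like this. It is a sketch under two assumptions: legacy CMOVcc with a memory source always performs the read, so it can fault even when the condition is false, and the new form (CFCMOVcc, as I understand the APX documents) skips the access in that case. `Fault`, `load` and both function names are invented for the example.

```python
class Fault(Exception):
    """Stands in for a page fault / access violation."""


def load(memory: dict, address: int) -> int:
    if address not in memory:
        raise Fault(hex(address))
    return memory[address]


def cmov_legacy(dest: int, memory: dict, address: int, condition: bool) -> int:
    value = load(memory, address)   # always performed, may fault
    return value if condition else dest


def cmov_fault_suppressing(dest: int, memory: dict, address: int,
                           condition: bool) -> int:
    if not condition:
        return dest                 # no access, so nothing can fault
    return load(memory, address)


memory = {0x1000: 42}
print(cmov_fault_suppressing(7, memory, 0xDEAD, condition=False))
try:
    cmov_legacy(7, memory, 0xDEAD, condition=False)
except Fault as fault:
    print("legacy form faulted at", fault)
```

So the old form cannot safely replace a branch around a possibly null pointer dereference, while the fault-suppressing form can.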
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.