flat assembler
Message board for the users of flat assembler.

RISC-V is still gaining momentum.

sylware 05 Jan 2023, 11:18
We cannot predict the future, but RISC-V is only gaining momentum.

If everything goes well, fasmg will need a clean 64-bit RISC-V port, or fasmg's super macro language will need a "mathematical grade" specification to keep the x86_64 and RISC-V implementations in sync.



al_Fazline 05 Jan 2023, 22:44
I suspect the answer is going to be the same: use qemu user mode.


revolution 06 Jan 2023, 03:47
It feels weird to me to have an assembly architecture without a carry or other flags. For me those things have always been a defining advantage over HLLs like C that omit them.

My feeling might have been made worse by the omission of almost all conditional instructions. Only the lonely branch can be executed conditionally, and then only by directly comparing two values. So the branch predictor needs to be really good to keep performance high.

I like the extensible-by-design nature of the architecture. Very forward thinking. It is unlikely anyone will run out of space to add more instructions.

The truly orthogonal register/instruction layout gives programs a lot of flexibility to optimise some cases. But the recommended ABI specifies fixed registers, and thus somewhat nullifies that advantage. Hardware features will probably be optimised around the ABI. For example, a hidden return stack will probably only activate for the recommended x1/x5 usage.

I am more excited by the prospect of lower power usage. With the simplified ISA the amount of logic required is markedly reduced. Almost like the original version of the ARM chips. I wonder if RISC-V will also evolve to become bloated just like x86 and ARM have. It almost feels like if a product is not continually changing and becoming more complex people think it is dead.



sylware 06 Jan 2023, 12:51
If people think a "finished" product is a "dead" product... the problem lies with the people. Innovation is often brutal planned obsolescence, and sometimes it is really hard to tell the two apart.

If I understood correctly, dropping the "carry flag" removes a lot of design complexity from "out-of-order" CPUs.

In other words, from an assembly programmer's point of view, some RISC-V design tradeoffs will feel off, and the real reasons lie deep in advanced CPU design.

And RISC-V is mostly an "average" modern ISA, but without the toxic IP that comes with x86_64/ARM, and that holds worldwide.

ABIs are a pain whatever the architecture, but I know I'll breathe a lot easier with twice as many registers. In my everyday coding, 16 registers is often a few too "short", and I don't like spilling to the stack. I have also started writing some common code paths that don't use the ABI at all, using a "link" register instead of call/ret.



Furs 06 Jan 2023, 13:54
I don't really get its call/ret implementation.


revolution 06 Jan 2023, 14:06
Furs wrote:
I don't really get its call/ret implementation.
Call: You can copy the PC into a register and branch to a new address. This is done automatically by a single instruction.

Ret: Jump to the return address saved in the register.

If you need nested calls then you need to save the return address before calling again. Or if you don't care about standard ABIs, you can save to a new register. Eventually you will run out of registers, but perhaps you could go 30 deep into a call chain before needing to save to stack. Depends upon what you are doing.



DimonSoft 06 Jan 2023, 22:13
But since at least a few registers might be needed to even decide if the next level of nested calls is necessary, the real maximum feels closer to 15 or even less for real-world code.

BTW, the register-based return address treatment, although quite popular in non-x86 ISAs, reminds me of a “great” feature of C-based languages, which inherited C’s stupid string literals and their escaping. Like, y’know, we can’t just have one special character, the double quotation mark ", and make it escape itself:
Code:
"This ""text"" contains self-escaped quotes"    
We’d better introduce another special character and put all our effort into escaping this one too:
Code:
"\\\\?\\some\\long\\path\\with\\file\name\\at\\the.end"    
See that two-line filename? Magnifique !

We can’t just save the return address to the stack; we should introduce another place to save it. That way you will save it onto the stack yourself and take the blame for it, not us. No, having the memory near the top of the stack cached all the time is not enough. All programs for our architecture are going to suffer from call/ret penalties, and the real calculations will be ε→0 of the whole execution time. Why am I not surprised that C is among the most frequently used programming languages for such architectures?



sylware 07 Jan 2023, 12:35
...

The code paths using a link register, which usually don't follow the ABI, are for code (usually leaf code) that sits in the middle ground between "inline" code and a "full-blown ABI-ed" function (a non-static function from a C compilation unit).

Basically, you will probably still have some register dance in the "jumper" (aka the caller) to fit that code, so the caller will be assembly... unless you have added a new "ABI" attribute for this "calling" convention to your favorite C compiler (ez, lol).

BTW, on x86_64 my link register is usually r10, since syscall clobbers r11 (and rcx), and you don't want a callee-saved register from the full-blown ABI.

In the end, the caller may have to do less register/stack-spilling dance to fit this code, and you may spare some of the work usually done by call/ret: the store/load on the stack (L1 cache) and some rsp arithmetic.

Yeah, "middle ground".

I use the C preprocessor _conservatively_, and sometimes I have several instances of those code paths with slightly different register usage (register renaming). This helps reduce the register dance in some of the "jumpers" (callers) if the code path is moderately hot.



Furs 07 Jan 2023, 18:09
revolution wrote:
Furs wrote:
I don't really get its call/ret implementation.
Call: You can copy the PC into a register and branch to a new address. This is done automatically by a single instruction.

Ret: Jump to the return address saved in the register.

If you need nested calls then you need to save the return address before calling again. Or if you don't care about standard ABIs, you can save to a new register. Eventually you will run out of registers, but perhaps you could go 30 deep into a call chain before needing to save to stack. Depends upon what you are doing.
I see. I'm not sure if this is such a good idea honestly.

Sure, for leaf functions it helps, but the majority of functions are not leaf functions, and they'll have to save the register themselves, resulting in code bloat and more code cache pressure.

Sounds to me like the idiotic 16-byte stack alignment requirement in the x86_64 ABIs. Very few functions even make use of 128-bit SSE (and the alignment doesn't even cover 256-bit AVX or wider!), yet all functions pay the price of the alignment.

Instead of, you know, letting those few functions that actually use 128-bit SSE re-align the stack themselves. Truly a tragedy.

Functions that use 256-bit AVX already have to realign the stack anyway, so if your program doesn't use 128-bit SSE at all but only AVX and newer, the ABI alignment provides ZERO benefit. So not only is it bad for functions that don't use vectors at all, it's bad even for new functions using newer CPU vectors.

It's already obsolete, and just pathetic bloat in 99.99999% of cases.

Maybe I'm missing the point of "call register" other than for leaf functions, but the 16-byte alignment has no excuse. Who the hell designs this shit? I hope he gets shot and ends up in hell.



sylware 07 Jan 2023, 20:08
OK, now I am lost; we are going in all directions.

That said, AVX-512 feels right, since 512 bits is 64 bytes, hence one cache line.

There may be a 16-byte aligned pick window, while the branch predictor works on a cache line (from the Zen optimization manual).

Put that spinlock into a cache line! :P


revolution 08 Jan 2023, 00:13
Furs wrote:
I see. I'm not sure if this is such a good idea honestly.

Sure, for leaf functions it helps, but the majority of functions are not leaf functions, and they'll have to save the register themselves, resulting in code bloat and more code cache pressure.
It is RISC, so as a general guiding principle the architecture won't do more than the minimum. So the code has to be more explicit about all the things it needs done. If you need to access the memory subsystem, then don't expect a jump instruction to do that.

In x86 we have a single instruction:
Code:
add [mem32], imm32    
In RISC ISAs you need much more code (written here with x86-style opcodes):
Code:
mov reg1, imm32 and 0xffff
or  reg1, imm32 and 0xffff0000
mov reg2, mem32 and 0xffff
or  reg2, mem32 and 0xffff0000
mov reg3, [reg2]
add reg3, reg1
mov [reg2], reg3    
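
On RISC-V specifically, the split is not into two 16-bit halves. `lui` supplies the upper 20 bits and `addi` adds a sign-extended 12-bit immediate. So when bit 11 of the constant is set, the upper part has to be rounded up by one to compensate. Below is a minimal C sketch of that split for 32-bit values, as on RV32. On RV64 the sign extension of `lui` adds wrinkles this sketch ignores. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Splits v so that (hi20 << 12) + lo12 == v (mod 2^32), where lo12 is the
   sign-extended 12-bit addi immediate and hi20 is the 20-bit lui immediate. */
static void split_imm32(uint32_t v, uint32_t *hi20, int32_t *lo12)
{
    int32_t lo = (int32_t)(v & 0xFFF);
    if (lo >= 0x800)
        lo -= 0x1000;                       /* addi will sign-extend bit 11 */
    *lo12 = lo;
    *hi20 = ((v - (uint32_t)lo) >> 12) & 0xFFFFF;  /* rounds up when lo < 0 */
}

/* What "lui rd, hi20" followed by "addi rd, rd, lo12" computes, in 32 bits. */
static uint32_t join_imm32(uint32_t hi20, int32_t lo12)
{
    return (hi20 << 12) + (uint32_t)lo12;
}

int main(void)
{
    const uint32_t samples[] = { 0x12345678u, 0x12345FFFu, 0x00000800u,
                                 0xFFFFFFFFu, 0x7FFFFFFFu, 0 };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t hi;
        int32_t lo;
        split_imm32(samples[i], &hi, &lo);
        assert(join_imm32(hi, lo) == samples[i]);
        printf("0x%08lx: lui 0x%05lx, addi %ld\n", (unsigned long)samples[i],
               (unsigned long)hi, (long)lo);
    }
    return 0;
}
```
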
The code gets even worse if you need atomic access. In x86 we just need to prefix with lock, in RISC ... it's complicated.

Welcome to the world of RISC. :)


revolution 08 Jan 2023, 01:32
Furs wrote:
... but the majority of functions are not leaf functions,
Expanding on this. I guess you might be correct that the majority of written code is not leaf. But I think that this is reversed when you consider the majority of executed code likely will be leaf functions. So for runtime performance it can be a big win.


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.