flat assembler
Message board for the users of flat assembler.
Non-x86 architectures > RISC-V is still gaining momentum.
al_Fazline 05 Jan 2023, 22:44
I suspect that answer is going to be the same, to use qemu-usermode.
revolution 06 Jan 2023, 03:47
It feels weird to me to have an assembly architecture without a carry or other flags. For me those things have always been a defining advantage over HLLs like C that omit them.
My feeling might have been made worse by the omission of almost all conditional instructions. Only the lonely branch can be executed conditionally, and then only by directly comparing two values. So the branch predictor needs to be really good to keep performance high.
I like the extensible-by-design nature of the architecture. Very forward thinking. It is unlikely anyone will run out of space to add more instructions.
The truly orthogonal register/instruction layout gives programs a lot of flexibility to optimise some cases. But the recommended ABI specifies fixed registers, and thus somewhat nullifies that advantage. Hardware features will probably be optimised around the ABI. For example a hidden return stack will probably only activate for the recommended x1/x5 usage.
I am more excited by the prospect of lower power usage. With the simplified ISA the amount of logic required is markedly reduced. Almost like the original version of the ARM chips.
I wonder if RISC-V will also evolve to become bloated just like x86 and ARM have. It almost feels like if a product is not continually changing and becoming more complex people think it is dead.
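Regarding the missing carry flag mentioned at the top of this post: a minimal sketch in C of how a carry can be recovered with an unsigned compare, which is roughly what an add followed by sltu does on RISC-V. The function names add_with_carry and add128 are hypothetical, made up for this illustration, and the instruction names in the comments are only meant to show the correspondence.

```c
#include <stdint.h>

/* Hypothetical helper: add two 64-bit limbs plus a carry-in and report the
   carry-out without any flags register. */
uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned carry_in,
                        unsigned *carry_out)
{
    uint64_t partial = a + b;            /* add  partial, a, b          */
    unsigned c1 = partial < a;           /* sltu c1, partial, a         */
    uint64_t sum = partial + carry_in;   /* add  sum, partial, carry_in */
    unsigned c2 = sum < partial;         /* sltu c2, sum, partial       */
    *carry_out = c1 | c2;                /* at most one of them is 1    */
    return sum;
}

/* Hypothetical 128-bit add on two-limb values (low limb first).
   Returns the final carry-out. */
unsigned add128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    unsigned carry;
    out[0] = add_with_carry(a[0], b[0], 0, &carry);
    out[1] = add_with_carry(a[1], b[1], carry, &carry);
    return carry;
}
```

The cost is visible here: each limb of a multi-word add needs extra compare instructions where an architecture with flags would use a single add-with-carry.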
sylware 06 Jan 2023, 12:51
If "ppl" think a "finished" product is a "dead" product... the problem lies with the "ppl": innovation is often brutal planned obsolescence, and sometimes it is really hard to tell the two apart.
If I understood correctly, without the "carry flag", a lot of design complexity is literally trashed for "out-of-order" CPUs. Namely, from an assembly programmer's point of view, some RISC-V design tradeoffs will feel off, and the real reasons lie deep in advanced CPU design. And RISC-V is mostly an "average" modern ISA, but without the toxic IP that comes with x86_64/arm, and that holds worldwide.
ABIs, whatever the architecture, are a pain, but I know I'll breathe a lot better with twice the number of registers. In my everyday coding, 16 registers is often a few registers too "short", and I don't like spilling to the stack. And I have started to have some common code paths that do not use the ABI at all, namely using a "link" register instead of call/ret.
Furs 06 Jan 2023, 13:54
I don't really get its call/ret implementation.
revolution 06 Jan 2023, 14:06
Furs wrote:
I don't really get its call/ret implementation.
Ret: Jump to the return address saved in the register. If you need nested calls then you need to save the return address before calling again. Or if you don't care about standard ABIs, you can save to a new register. Eventually you will run out of registers, but perhaps you could go 30 deep into a call chain before needing to save to stack. Depends upon what you are doing.
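The save-before-calling-again rule can be sketched as a toy model in plain C. This is only an illustration under stated assumptions: the variable ra stands in for the link register (x1 in the recommended ABI), saved_ra is a one-slot stand-in for the stack or a spare register, and the RET_TO_* tokens and the function name run_nested_call are hypothetical. Real code would hold an address in ra, not an enum value.

```c
/* Hypothetical return-address tokens standing in for real code addresses. */
enum { RET_TO_CALLER, RET_TO_OUTER };

/* Toy model of link-register calls: outer() calls leaf(), both "return"
   by jumping to whatever ra currently holds. */
int run_nested_call(int x)
{
    int ra;          /* link register                       */
    int saved_ra;    /* spill slot (stack or a new register) */
    int result = x;

    ra = RET_TO_CALLER;   /* call outer: store return address, then jump */
    goto outer;

outer:
    saved_ra = ra;        /* non-leaf: save ra before calling again      */
    ra = RET_TO_OUTER;    /* call leaf                                   */
    goto leaf;
after_leaf:
    result += 1;
    ra = saved_ra;        /* restore the original return address         */
    goto ret;

leaf:
    result *= 2;          /* leaf: never needs to save ra anywhere       */
    goto ret;

ret:                      /* ret: jump to the address held in ra         */
    switch (ra) {
    case RET_TO_OUTER:
        goto after_leaf;
    default:
        return result;
    }
}
```

The leaf path touches no memory for its return address, while the non-leaf path pays one save and one restore. That is the trade-off debated in the rest of the thread.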
DimonSoft 06 Jan 2023, 22:13
But since at least a few registers might be needed to even decide if the next level of nested calls is necessary, the real maximum feels closer to 15 or even less for real-world code.
BTW, the register-based return address treatment, although quite popular in non-x86 ISAs, reminds me of a “great” feature of C-based languages which inherit its stupid string literals and their escaping. Like, y’know, we can’t just have one special character, the double quotation mark " and make it escape itself:
Code:
"This ""text"" contains self-escaped quotes"
Code:
"\\\\?\\some\\long\\path\\with\\file\name\\at\\the.end"
We can’t just save the return address to the stack, we should introduce another place to save it. So that you will save it onto the stack and be self-blamed for that, not us. No, having memory near the top of the stack cached all the time is not enough. All programs for our architecture are going to suffer from the call-ret penalties, the real calculations are ε→0 in the whole execution time.
Why am I not surprised that C is among the frequently used programming languages for such architectures?
sylware 07 Jan 2023, 12:35
...
The code paths using a link register, which usually don't follow the ABI, are for code (usually leaf code) which is a middle ground between "inline" code and a "full-blown ABI-ed" function (a non-static function from a C compilation unit). Basically, you will probably still have some register dance to fit that code from the "jumper" (aka the caller), which will be assembly... or you would have coded into your favorite C compiler a new "ABI" attribute for this code "calling" convention (ez, lol). BTW, the link register on x86_64 is usually r10, since syscall does clobber r11 and you don't want a callee-saved register from the full-blown ABI.
In the end, the caller may have to do less register/stack-spilling dance to fit this code, and you may spare some work usually done by call/ret: the store/load on the stack (L1 cache) and some rsp arithmetic. Yeah, "middle ground".
I use the C preprocessor _conservatively_, and sometimes I have several instances of those code paths with slightly different register usage (register renaming). It helps reduce the register dance for some of the "jumpers" (callers) if it is a "mildly" hot code path.
Furs 07 Jan 2023, 18:09
revolution wrote:
Sure, for leaf functions it helps, but the majority of functions are not leaf, and they'll have to save the register themselves. Resulting in code bloat and more code cache issues.
Sounds to me like the retarded 16-byte alignment requirement on x86_64 ABIs. Very few functions even make use of 128-bit SSE (but not even 256-bit AVX or more!), but then all functions pay the price of the alignment. Instead of, you know, letting those few functions that actually use 128-bit SSE re-align the stack themselves. Truly a tragedy.
Functions using 256-bit AVX already have to do it anyway, so if your program doesn't even use 128-bit SSE but only uses AVX+, the ABI alignment provides ZERO benefits. So not only is it bad for functions not using vectors at all, it's bad even for new functions using newer CPU vectors. It's even obsoleted already and just a pathetic bloat for 99.99999% of cases.
Maybe I'm missing the point of "call register" other than for leaf functions, but the 16-byte alignment has no excuse. Who the hell designs this shit? I hope he gets shot and ends up in hell.
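For reference, re-aligning the stack inside the function that needs it comes down to masking off the low bits of the stack pointer. A minimal sketch in C, assuming a power-of-two alignment; align_down is a hypothetical helper name, and for an alignment of 16 it has the same effect as `and rsp, -16` in x86_64 assembly.

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to a power-of-two boundary.
   The alignment argument is assumed to be a power of two. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    return addr & ~(alignment - 1);
}
```

A function using 256-bit vectors would call for an alignment of 32 here, which is why the 16-byte guarantee from the ABI does not save it the work.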
sylware 07 Jan 2023, 20:08
ok, now I am lost, we are going in all directions.
That said, AVX-512 feels right, since 512 bits is 64 bytes, hence a cache line. There may be a 16-byte-aligned pick window; the branch predictor works on a cache line (from the Zen optimization manual). Put that spinlock into a cache line!
revolution 08 Jan 2023, 00:13
Furs wrote:
I see. I'm not sure if this is such a good idea honestly.
In x86 we have a single instruction:
Code:
add [mem32], imm32
Code:
mov reg1, imm32 and 0xffff
or reg1, imm32 and 0xffff0000
mov reg2, mem32 and 0xffff
or reg2, mem32 and 0xffff0000
mov reg3, [reg2]
add reg3, reg1
mov [reg2], reg3
Welcome to the world of RISC.
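The 16/16 split in the listing above is schematic. On RV32 a 32-bit constant is normally built from a 20-bit upper part (lui) and a sign-extended 12-bit lower part (addi), so the upper part has to be rounded to compensate for the sign extension. A minimal sketch in C of that arithmetic; hi20, lo12 and materialize are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Upper 20 bits for lui, rounded up when the low 12 bits will be
   sign-extended to a negative value by addi. */
uint32_t hi20(uint32_t imm)
{
    return (imm + 0x800u) >> 12;
}

/* Low 12 bits as addi sees them: a signed value in [-2048, 2047]. */
int32_t lo12(uint32_t imm)
{
    int32_t low = (int32_t)(imm & 0xFFFu);
    return low >= 0x800 ? low - 0x1000 : low;
}

/* Rebuild the constant the way a lui + addi pair would. */
uint32_t materialize(uint32_t imm)
{
    uint32_t reg = hi20(imm) << 12;   /* lui  reg, hi20(imm)      */
    reg += (uint32_t)lo12(imm);       /* addi reg, reg, lo12(imm) */
    return reg;
}
```

For 0xDEADBEEF this gives an upper part of 0xDEADC and a lower part of -273, which sum back to the original value.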
revolution 08 Jan 2023, 01:32
Furs wrote: ... but majority of functions are not leaf,
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.