flat assembler
Message board for the users of flat assembler.
Main > instruction memory bandwidth
revolution 10 Apr 2026, 08:40
What does "instruction memory bandwidth" mean? Is it the overhead of reading instruction bytes into the CPU from memory?
Unrolled loops, and very long sections of code without branches, can render the caches useless. A cache is only useful when the same bytes are read more than once; code that runs straight through is never re-read, so the CPU is always reading from DRAM. I imagine that will create huge slowdowns. AKA cache thrashing.
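A rough way to see the point about re-reading is a toy model of a direct-mapped instruction cache. This is only a sketch: the 32 KiB size, the 64-byte line, and the function names `icache_hit_rate` and `loop_fetches` are assumptions chosen for illustration, and the model ignores prefetching and the other cache levels that a real CPU has.

```python
def icache_hit_rate(addresses, cache_bytes=32 * 1024, line_bytes=64):
    """Hit rate of a toy direct-mapped cache for a sequence of fetch addresses."""
    n_slots = cache_bytes // line_bytes
    tags = [None] * n_slots           # line number currently held by each slot
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // line_bytes     # which line this fetch falls in
        slot = line % n_slots         # direct-mapped: one possible slot per line
        total += 1
        if tags[slot] == line:
            hits += 1
        else:
            tags[slot] = line         # miss: the line is (re)loaded from memory
    return hits / total if total else 0.0


def loop_fetches(code_bytes, iterations, line_bytes=64):
    """Yield one fetch address per line of a code body, repeated `iterations` times."""
    for _ in range(iterations):
        for addr in range(0, code_bytes, line_bytes):
            yield addr


if __name__ == "__main__":
    # 4 KiB loop body run 100 times: fits in the cache, lines are re-read.
    print(icache_hit_rate(loop_fetches(4 * 1024, 100)))
    # The same amount of work fully unrolled: every line is read exactly once.
    print(icache_hit_rate(loop_fetches(400 * 1024, 1)))
    # A 64 KiB loop body: larger than the cache, lines evict each other.
    print(icache_hit_rate(loop_fetches(64 * 1024, 100)))
```

In this model only the first case ever hits; the unrolled and oversized cases miss on every fetch, which is the thrashing described above. How close real hardware gets to that worst case is a separate question.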
sylware 11 Apr 2026, 08:32
I meant real-life benchmarks which show a significant speed impact of code density for modern ISAs.
Is that even a thing?
revolution 11 Apr 2026, 08:42
sylware wrote: Is that even a thing?
bitRAKE 11 Apr 2026, 10:36
I'm confused as well. "code density" seems like you want to benchmark the decoder. Compiled code isn't very dense. Size optimized code can be very dense.
Usually what you want to do is pick some part of the pipeline you want to test and then design or search for a test that does that -- it probably exists.
sylware 11 Apr 2026, 17:34
It translates to this question: on a modern CPU, can memory bandwidth be an issue for feeding instructions to the CPU? That is, in real-life use cases.
revolution 11 Apr 2026, 21:37
Isn't the first reply the answer? If not then there needs to be more clarity on exactly what is meant to be measured and what criteria are used to decide what is an issue.
sylware 12 Apr 2026, 01:50
Basically: the CPU stalls because it cannot load machine instructions from memory fast enough, in real-life use cases.
(RISC-V and ARM have "compressed" and "Thumb" machine instructions, but I am really not convinced they make a performance difference on non-niche hardware)
bitRAKE 12 Apr 2026, 02:01
sylware wrote: On a modern CPU, can memory bandwidth be an issue for feeding instructions to the CPU? That is, in real-life use cases.
Code:
use64
align 64
        shr eax, 1              ; eax = n shr 1, CF = low bit of n
        jz @F                   ; n was 0 or 1: done
        jnc .even               ; n even: eax = n/2
        lea eax, [3*rax+2]      ; n odd: eax = (3n+1)/2
.even:
        imul edx, eax, 64       ; offset = eax * 64
        add rdx, rbx            ; memory base
        jmp rdx
@@:
        retn
Edit: actually, the trajectories seem too sparse for a given bit length - a more complex attenuation of the signal would be needed, or multiple trajectories, which is easier.
_________________
¯\(°_o)/¯ AI may [not] have aided with the above reply.
Last edited by bitRAKE on 12 Apr 2026, 02:26; edited 4 times in total
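To get a feel for which addresses the block above would jump through, here is a small Python model of its arithmetic. It is a sketch under assumptions: it reads the code as one 64-byte slot per value, replicated from the memory base in `rbx` (the post does not say how memory is filled), and the names `next_slot`, `trajectory` and `footprint_bytes` are made up for illustration. It also ignores the 32-bit wrap of the `imul` offset.

```python
def next_slot(n):
    """One pass through the block: the next slot index, or None at 'retn'."""
    x = n >> 1                          # shr eax, 1
    if x == 0:                          # jz @F: n was 0 or 1
        return None
    if n & 1:                           # carry set: n was odd
        x = (3 * x + 2) & 0xFFFFFFFF    # lea eax, [3*rax+2], kept to 32 bits
    return x


def trajectory(start):
    """All slot indices visited when execution starts with eax = start."""
    slots = [start]
    while (nxt := next_slot(slots[-1])) is not None:
        slots.append(nxt)
    return slots


def footprint_bytes(start, slot_bytes=64):
    """Bytes from the memory base up to the end of the highest slot visited."""
    return (max(trajectory(start)) + 1) * slot_bytes


if __name__ == "__main__":
    print(trajectory(6))
    print(footprint_bytes(6))
```

Starting from 6 the model visits slots 6, 3, 5, 8, 4, 2, 1 and then returns, so only seven of the nine slots in that span are ever fetched.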
revolution 12 Apr 2026, 02:08
ARM Thumb (16-bit) instructions are generally slower than the full-length 32-bit instructions for a given task. Thumb is more constrained and requires more instructions to complete a task. Thumb wasn't designed for performance though; it was intended for memory-constrained systems. So any performance comparison is flawed, because performance was never the design goal.
x86 has single-byte instructions, but a single byte can only encode a small amount of detail, and thus those instructions are limited in scope. Adding more bytes gives more expressiveness, reducing instruction counts, but increases byte counts. There is a trade-off. Where should that trade-off be made? Depends upon the task.
revolution 12 Apr 2026, 02:12
bitRAKE wrote: ... the Collatz (3x+1) ... always reach one ...
bitRAKE 12 Apr 2026, 02:25
pcbarina.fit.vutbr.cz wrote: 2025-01-15 the convergence of all numbers below 2^71 is verified
(Until the processor manufacturer starts copying benchmark methodologies.)
revolution 12 Apr 2026, 05:29
I am reminded of the Itanium EPIC instruction encoding: 128 bits per "bundle".
It was very powerful and could potentially be quite compact. But it turned out to be too complex, and too hard for compilers to generate good code for. Maybe 32 bits/instruction is the sweet spot? Many RISC encodings use 32 bits. For non-performance applications the field is much more diverse, especially the older Z80, 6502, etc. and the current PIC (8, 10, 12, 14, 16 bit) and AVR (16 bit).
sylware 12 Apr 2026, 10:25
So, this is what I expected: those benchmarks do not exist, and memory bandwidth is hardly a thing when dealing with CPU machine instructions.
While I am thinking of it, what about the memory alignment of the machine instruction fetch window? If I am not mistaken, modern CPUs use this instruction fetch window for their own optimizations (branch prediction, etc.). Maybe it is worth a forum thread of its own, as it may be what matters in the end?
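The alignment question can at least be framed with simple arithmetic: how many aligned fetch windows does a given piece of code touch? The sketch below assumes a fixed, naturally aligned window; the 16-byte default is a placeholder, since the real size depends on the microarchitecture, and `fetch_windows` and `padding_to_align` are hypothetical helper names.

```python
def fetch_windows(start, length, window=16):
    """Number of aligned windows touched by the code range [start, start + length)."""
    if length <= 0:
        return 0
    first = start // window
    last = (start + length - 1) // window
    return last - first + 1


def padding_to_align(start, alignment=16):
    """Padding bytes needed so that code at `start` begins on an alignment boundary."""
    return -start % alignment


if __name__ == "__main__":
    # A 20-byte loop body at offset 60 straddles two 64-byte windows...
    print(fetch_windows(60, 20, window=64))
    # ...and 4 bytes of padding would move it to the start of one window.
    print(padding_to_align(60, 64))
    print(fetch_windows(64, 20, window=64))
```

Whether saving a window is worth the padding bytes is the kind of thing that would have to be measured on the target CPU.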
revolution 12 Apr 2026, 10:58
Benchmarks are useless anyway, so I don't think it matters much.
Copyright © 1999-2026, Tomasz Grysztar.