flat assembler
Message board for the users of flat assembler.
Roman 09 Apr 2021, 18:37
Why do you not question 64-bit code using rcx, rdx, r8 and r9 for Call?
And I propose, for CallPrLoop, the registers pr0-pr15 plus regLoop and regBreak.
revolution 09 Apr 2021, 19:01
So we know what you are suggesting.
Now please show your design. I suggest you start with the instruction encoding. How will you encode the instructions?
DimonSoft 09 Apr 2021, 19:08
Not for Call but for the Microsoft x64 calling convention. I didn’t ask Microsoft about that. They chose to do so, but they’re also known to make bad decisions from a performance point of view quite often. We can’t say anything about performance for sure, but the complications are quite severe. I’m not sure they’re worth the effort, so I’m happy 32-bit applications are more “cross-platform” than 64-bit ones for now and for the next 10+ years.
I gave you an explanation of why register-based calling conventions might be slower in certain cases. Good caching capabilities really make messing around with registers to pass parameters less worthwhile.
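For readers who do not have the convention in their head, here is a rough Python sketch of where the Microsoft x64 calling convention puts scalar arguments. The function name `place_args` and the `"int"`/`"float"` labels are made up for illustration, and the model ignores structs, varargs and return values. Stack offsets are relative to rsp just before the `call` instruction, after the caller has reserved the 32-byte shadow space.

```python
# Simplified model of Microsoft x64 argument placement (scalars only).
INT_REGS = ["rcx", "rdx", "r8", "r9"]
FLOAT_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]
SHADOW_SPACE = 32  # bytes reserved by the caller for the four register slots


def place_args(kinds):
    """kinds is a list of "int" or "float"; returns one location per argument."""
    places = []
    for position, kind in enumerate(kinds):
        if position < 4:
            # The slot is picked by position: a float in slot 1 goes to xmm1
            # and rdx is simply left unused for that call.
            regs = FLOAT_REGS if kind == "float" else INT_REGS
            places.append(regs[position])
        else:
            # Fifth and later arguments sit on the stack above the shadow space.
            offset = SHADOW_SPACE + 8 * (position - 4)
            places.append(f"[rsp+{offset}]")
    return places


print(place_args(["int", "float", "int", "int", "int", "int"]))
# → ['rcx', 'xmm1', 'r8', 'r9', '[rsp+32]', '[rsp+40]']
```

This is also why some API functions take parameters in xmm registers: a float or double among the first four arguments lands in xmm0-xmm3 instead of an integer register.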
revolution 09 Apr 2021, 19:23
From what I have seen in the past, the most important feature in a CPU that gives the most performance boost is ... cache.
Everything else might possibly give a few percent here or there, but the cache can easily give up to ten times the performance. You can test this yourself. Disable the cache and then run some code. Notice how it now runs really really slowly. Adding extra register banks? Meh, whatever. Increasing the clock by 100 MHz? Meh, whatever. Buying 4000 MHz RAM? Meh, whatever. Disabling the cache? Don't you dare alter my caching!
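The size of the effect can be estimated with the usual average memory access time formula: hit time plus miss rate times miss penalty. The sketch below is only a back-of-the-envelope model. The 4-cycle L1 hit, the 200-cycle DRAM access and the 5% miss rate are assumed numbers for illustration, not measurements of any particular CPU.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time: hit time plus the expected cost of misses."""
    return hit_cycles + miss_rate * miss_penalty_cycles


L1_HIT_CYCLES = 4    # assumption: L1 hit latency
DRAM_CYCLES = 200    # assumption: main-memory latency

# With a cache: most accesses hit L1, 5% (assumed) go out to DRAM.
with_cache = average_access_cycles(L1_HIT_CYCLES, 0.05, DRAM_CYCLES)

# Cache disabled: every access pays the full DRAM latency.
without_cache = average_access_cycles(DRAM_CYCLES, 0.0, 0)

print(f"with cache:    {with_cache:.1f} cycles per access")
print(f"without cache: {without_cache:.1f} cycles per access")
print(f"slowdown:      {without_cache / with_cache:.1f}x")
```

Under these assumptions a memory access becomes roughly 14 times more expensive without the cache. How much of that shows up in total run time depends on how memory-bound the code is.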
Furs 10 Apr 2021, 13:58
revolution wrote: From what I have seen in the past, the most important feature in a CPU that gives the most performance boost is ... cache.
Roman 10 Apr 2021, 14:49
revolution, you are not afraid of a 32 KB L1 cache, but you are afraid of 18 additional registers.
This is very strange. L1 cache is about 3 times slower than a register.
revolution 10 Apr 2021, 17:02
Roman: Are you not aware that adding 18 registers has other consequences that make the whole CPU slower and more power hungry?
L1 is more than 200 times larger than 18 registers.
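The "more than 200 times" figure is easy to check, assuming the 18 proposed registers are 64 bits (8 bytes) wide like the existing general-purpose registers. That width is an assumption, since the proposal does not state it.

```python
L1_BYTES = 32 * 1024     # the 32 KB L1 cache mentioned above
REGISTER_BYTES = 8       # assumption: 64-bit registers
EXTRA_REGISTERS = 18     # the proposed additional registers

extra_bytes = EXTRA_REGISTERS * REGISTER_BYTES
ratio = L1_BYTES / extra_bytes

print(extra_bytes, "bytes of new register storage")
print(f"L1 is about {ratio:.1f} times larger")
```

That gives 144 bytes of register storage against 32768 bytes of L1, a ratio of about 227.6.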
Roman 10 Apr 2021, 17:19
Quote:
And what about 64-bit and its 8 new registers? Did they slow down the new CPUs a lot in 2006?
Last edited by Roman on 10 Apr 2021, 17:41; edited 1 time in total
Overclick 10 Apr 2021, 17:40
Roman, it is not that simple to add new registers. Each register multiplies the mesh of multiway connections, enlarges the decoding blocks, etc.
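One concrete piece of the decoding cost is the register index itself. Classic x86 names 8 registers with 3-bit fields, x86-64 reached 16 by adding extension bits in the REX prefix, and AVX-512 reached 32 vector registers with further bits in the EVEX prefix. The sketch below counts the index bits per operand. Treating the 18 proposed registers as sharing one index space with the existing 16 is an assumption made here for illustration.

```python
def index_bits(register_count):
    """Bits needed to name one of register_count registers in an operand field."""
    return max(1, (register_count - 1).bit_length())


for label, count in [
    ("8 registers (32-bit x86)", 8),
    ("16 registers (x86-64)", 16),
    ("32 registers (AVX-512 vector file)", 32),
    ("16 + 18 registers (assumed shared index space)", 16 + 18),
]:
    print(f"{label}: {index_bits(count)} bits per operand")
```

Going from 16 to 34 registers would need 6 bits per register operand instead of 4, and every instruction with register operands would have to find room for the extra bits somewhere in its encoding.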
Roman 10 Apr 2021, 17:41
I know this.
But if we stopped developing new CPUs, we would be stuck on the ZX Spectrum CPU!
Last edited by Roman on 10 Apr 2021, 18:07; edited 1 time in total
revolution 10 Apr 2021, 18:07
Roman wrote:
Randomly adding more registers necessarily means removing or reducing other functionality and/or lowering clock speeds.
BTW: You still haven't shown us your design.
Overclick 10 Apr 2021, 18:28
Quote:
Actually they make improvements all the time. CPUs are no longer scalar, for example. Decoding is the biggest problem, caught between backward compatibility and the RISC inside. Intel tried Itanium as something new and it was a dead end. A huge army of ARM manufacturers is also improving in that direction. x86 cannot be too flexible, but look at SSE/AVX: it is almost what you are looking for. Registers are growing not in quantity but in size.
Roman 11 Apr 2021, 04:15
By the way, AVX-512 has 32 xmm registers!
We could use them to store values for procs. Some Direct2D functions use xmm registers for params.
Overclick 11 Apr 2021, 07:15
All you can do is prepare the params in the right registers to avoid the redirection done by Invoke itself.
Try opening any WinAPI function in a debugger. There are tons of operations and a lot of register reuse. All Call optimization is pointless until Microsoft redesigns their functions more carefully. You can also rewrite some WinAPI functions yourself as macros, to avoid extra code and compile only what the exact task needs. But that is a huge amount of work, so it is better to go this way only where such optimization is really required. Actually, we already do some tasks with our own code instead of WinAPI, don't we? ))
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.