flat assembler
Message board for the users of flat assembler.

Index > Main > Modern CPU about registers.

DimonSoft



Joined: 03 Mar 2010
Posts: 958
Location: Belarus
Roman wrote:
Very simple. If you need to, push the regs. If you don't need to, don't push.

So, basically you just ask for more registers and the whole thing about procedure parameters is just an attempt to tie them to particular use cases.

Now imagine a program where Proc1 has parameters a, b, c and d and calls Proc2 with parameters e, f, g and h which are somehow related to Proc1’s parameters but are not the same. It is quite uncommon to call another procedure with (partially?) the same parameter set. And in the common case like above one would have to spill register values every now and then.

fastcall used to be a good thing on old hardware. These days it’s arguable whether fastcall is actually faster than a plain stack-based calling convention. After all, in the case above one has to put parameters into registers just to push them onto the stack a moment later, instead of putting them there directly. The time and code size spent on moving data into registers have their impact.

Roman wrote:
The proc could change the number of loops (if needed) or set the break reg to stop CallPrLoop.

Frankly speaking, I’m not sure I understand what behaviour you expect from CallPrLoop… instruction?

Roman wrote:
Quote:
Is that often enough to make everyone pay for a more complex and expensive processor?

Do you think AVX-512 comes free on the chip?
But AMD put AVX-512 into its new CPU.

DimonSoft, how often do you use AVX-512?

We’re not talking about me, we’re talking about the world as a whole. There are many tasks that involve processing large amounts of data, where SIMD and similar features are useful. Multimedia has gone from a luxury feature to a common everyday necessity. Making audio/image/video processing faster pays off. Making certain scientific calculations faster pays off. It’s optimizing for the common case, not the rare one. In general.
Post 09 Apr 2021, 18:17
Roman



Joined: 21 Apr 2012
Posts: 1082
Why, in 64-bit, do you not question rcx, rdx, r8 and r9 being used for Call?
And I propose regs pr0-pr15, regLoop and regBreak for CallPrLoop.
Post 09 Apr 2021, 18:37
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
So we know what you are suggesting.

Now please show your design. I suggest you start with the instruction encoding. How will you encode the instructions?
Post 09 Apr 2021, 19:01
DimonSoft



Joined: 03 Mar 2010
Posts: 958
Location: Belarus
Not for Call but for the Microsoft x64 calling convention. I didn’t ask Microsoft about that. They chose to do so, but they’re also known to make decisions that are bad from a performance point of view quite often. We can’t say anything about performance for sure, but the complications are quite severe. I’m not sure they’re worth the effort, so I’m happy 32-bit applications are more “cross-platform” than 64-bit ones for now and for the next 10+ years.

I gave you an explanation of why register-based calling conventions might be slower in certain cases. Good caching capabilities make juggling registers to pass parameters much less worthwhile.
Post 09 Apr 2021, 19:08
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
From what I have seen in the past, the most important feature in a CPU that gives the most performance boost is ... cache.

Everything else might possibly give a few percent here or there, but the cache can easily give up to ten times the performance.

You can test this yourself. Disable the cache and then run some code. Notice how it now runs really really slowly.

Adding extra register banks? Meh, whatever.
Increasing the clock by 100 MHz? Meh, whatever.
Buying 4000 MHz RAM? Meh, whatever.
Disabling the cache? Don't you dare alter my caching!

Laughing
Post 09 Apr 2021, 19:23
Furs



Joined: 04 Mar 2016
Posts: 1647
revolution wrote:
From what I have seen in the past, the most important feature in a CPU that gives the most performance boost is ... cache.

Everything else might possibly give a few percent here or there, but the cache can easily give up to ten times the performance.

You can test this yourself. Disable the cache and then run some code. Notice how it now runs really really slowly.

Adding extra register banks? Meh, whatever.
Increasing the clock by 100 MHz? Meh, whatever.
Buying 4000 MHz RAM? Meh, whatever.
Disabling the cache? Don't you dare alter my caching!

Laughing
Yeah, and speculative execution.
Post 10 Apr 2021, 13:58
Roman



Joined: 21 Apr 2012
Posts: 1082
revolution, you are not afraid of a 32 KB L1 cache, but you are afraid of 18 additional regs.
This is very strange.

L1 cache is 3 times slower than a register.
Post 10 Apr 2021, 14:49
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
Roman: Are you not aware that adding 18 registers has other consequences that make the whole CPU slower and more power hungry?

L1 is more than 200 times larger than 18 registers.
Post 10 Apr 2021, 17:02
Roman



Joined: 21 Apr 2012
Posts: 1082
Quote:

Are you not aware that adding 18 registers has other consequences that make the whole CPU slower and more power hungry?

And how about 64-bit and the 8 new regs? Did they slow down the new CPUs a lot in 2006?


Last edited by Roman on 10 Apr 2021, 17:41; edited 1 time in total
Post 10 Apr 2021, 17:19
Overclick



Joined: 11 Jul 2020
Posts: 394
Location: Ukraine
Roman, it is not that simple to add new registers. Each register multiplies the mesh of multiway connections, enlarges the decoding blocks, etc.
Post 10 Apr 2021, 17:40
Roman



Joined: 21 Apr 2012
Posts: 1082
I know this.
But if we stopped developing new CPUs, we would be stuck on the ZX Spectrum CPU!


Last edited by Roman on 10 Apr 2021, 18:07; edited 1 time in total
Post 10 Apr 2021, 17:41
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
Roman wrote:
Quote:

Are you not aware that adding 18 registers has other consequences that make the whole CPU slower and more power hungry?

And how about 64-bit and the 8 new regs? Did they slow down the new CPUs a lot in 2006?
Yes, most probably. Heat generation increased because of the extra decoder circuitry and the register addressing circuitry. Plus it might have taken up space that could have been used for cache. So the combination means slower clocks, less cache and other stuff. Without that we could have higher-clocked, lower-power 32-bit CPUs. So there is a trade-off to make.

Randomly adding more registers necessarily means removing or reducing other functionality and/or lowering clock speeds.

BTW: You still haven't shown us your design.
Post 10 Apr 2021, 18:07
Overclick



Joined: 11 Jul 2020
Posts: 394
Location: Ukraine
Quote:

But if we stopped developing new CPUs, we would be stuck on the ZX Spectrum CPU

Actually they make improvements all the time. CPUs are no longer scalar, for example. Decoding is the biggest problem: it sits between backward compatibility and the RISC-like core inside.
Intel tried Itanium as something new and it was a dead end. A huge army of ARM manufacturers is also improving things in that direction. x86 cannot be very flexible, but look at SSE/AVX: it is almost what you are looking for. Registers are growing not in quantity but in size.
Post 10 Apr 2021, 18:28
Roman



Joined: 21 Apr 2012
Posts: 1082
By the way.
AVX-512 has 32 xmm registers!
And we could use them to store values for procs.
Some Direct2D functions take params in xmm registers.
Post 11 Apr 2021, 04:15
Overclick



Joined: 11 Jul 2020
Posts: 394
Location: Ukraine
All you can do is prepare the params in the right registers to avoid the shuffling done by Invoke itself.

Try opening any WinAPI function in a debugger. There are tons of operations and a lot of register reuse. All Call optimization is pointless until Microsoft redesigns its functions more carefully.
Also, some WinAPI functions can be reimplemented by yourself as macros, to avoid extra code and compile down to the exact task. But that is a huge amount of work, and it is better to go that way only where such optimization is really required.
Actually, some tasks we already do with our own code instead of WinAPI, don't we? ))
Post 11 Apr 2021, 07:15

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
