flat assembler
Message board for the users of flat assembler.

Modern CPU about registers.

Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
How are registers represented in a modern CPU?

I mean, is it a buffer or separate blocks?

For example, in pseudocode:
Code:
RegsBuffer dq 0,0,0,0,0,0,0,0
RegEAX = 0
RegEBX = 8
RegECX = 16
mov EAX,ECX
; translates to this
mov CPUreg1,[RegsBuffer+RegECX]
mov [RegsBuffer+RegEAX],CPUreg1

    


If it looks like this pseudocode,
then I don't see any problem with implementing many registers in a CPU.


Last edited by Roman on 13 Jan 2021, 10:14; edited 1 time in total
Post 13 Jan 2021, 10:10
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17997
Location: In your JS exploiting you and your system
revolution
Actually, that is quite close to what it is.

The register file is a very fast, and very small, region of SRAM inside the CPU.

Although there are many more details (like renaming, multi-porting, etc.) that complicate it a bit, but on a basic level it is just RAM.
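
To make the "it is just RAM" picture concrete, here is a minimal Python sketch of a register file as a small indexed array. The class name RegisterFile, the register numbering and the method names are invented for this illustration; a real register file is multi-ported and sits behind renaming logic, which this toy model ignores.

```python
class RegisterFile:
    """Toy model: architectural registers as slots in a small array."""

    # Hypothetical numbering, chosen for this sketch only.
    INDEX = {"eax": 0, "ebx": 1, "ecx": 2, "edx": 3}

    def __init__(self):
        self.slots = [0] * len(self.INDEX)

    def read(self, name):
        return self.slots[self.INDEX[name]]

    def write(self, name, value):
        # Keep values within 32 bits, as a 32-bit register would.
        self.slots[self.INDEX[name]] = value & 0xFFFFFFFF

    def mov(self, dst, src):
        """mov dst,src: one read of the array, then one write to it."""
        self.write(dst, self.read(src))


regs = RegisterFile()
regs.write("ecx", 1234)
regs.mov("eax", "ecx")
print(regs.read("eax"))  # prints 1234
```

This is the same idea as the RegsBuffer pseudocode in the first post: the register name is only an index into a small, fast array.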
Post 13 Jan 2021, 10:14
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
Quote:

but on a basic level it is just RAM.

I think the same way.
Post 13 Jan 2021, 10:16
MaoKo



Joined: 07 May 2019
Posts: 91
Location: Paris/French
MaoKo
revolution wrote:

Although there are many more details (like renaming, multi-porting, etc.) that complicate it a bit

Do you have some documentation at hand?
Post 13 Jan 2021, 21:35
sts-q



Joined: 29 Nov 2018
Posts: 35
sts-q
Post 14 Jan 2021, 05:53
MaoKo



Joined: 07 May 2019
Posts: 91
Location: Paris/French
MaoKo
sts-q, thank you :)
Post 14 Jan 2021, 17:19
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
I read that AMD and Intel use 128 virtual registers in modern CPUs.

But are those 128 registers for all CPU cores or for only one core?

If the 128 registers are shared by all CPU cores, that is very bad:
128 regs / 6 cores = 21 regs per core.
Post 05 Apr 2021, 14:12
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17997
Location: In your JS exploiting you and your system
revolution
Each core has its own set of registers. They are not shared.
Post 05 Apr 2021, 14:25
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
revolution wrote:
Each core has its own set of registers. They are not shared.

128 registers for each core?
Post 06 Apr 2021, 05:09
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17997
Location: In your JS exploiting you and your system
revolution
Yes, something like that. We discussed register renaming in another thread.

You are using them without even realising, it all happens automatically in the background.
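
As a rough illustration of what "automatically in the background" means, here is a minimal Python sketch of register renaming. The Renamer class, its method names and the tiny register counts are assumptions made for this example; real hardware tracks in-flight instructions and frees a physical register only when the overwriting instruction retires.

```python
class Renamer:
    """Toy register renamer: architectural names map to physical slots."""

    def __init__(self, arch_regs, num_physical):
        # Assumption: each architectural register starts mapped to its own
        # physical register, and the remaining ones form the free list.
        self.physical = [0] * num_physical
        self.mapping = {name: i for i, name in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_physical))

    def read(self, name):
        return self.physical[self.mapping[name]]

    def write(self, name, value):
        # Every write gets a fresh physical register, so an older reader of
        # the previous value is not disturbed.
        new_slot = self.free.pop(0)
        old_slot = self.mapping[name]
        self.physical[new_slot] = value
        self.mapping[name] = new_slot
        # Simplification: the old slot is recycled immediately.
        self.free.append(old_slot)
        return new_slot


names = ["eax", "ebx", "ecx", "edx"]
core0 = Renamer(names, num_physical=8)
core1 = Renamer(names, num_physical=8)  # each core has its own set

first = core0.write("eax", 1)
second = core0.write("eax", 2)
print(first, second, core0.read("eax"), core1.read("eax"))  # prints 4 5 2 0
```

The program only ever names eax, yet two different physical slots were used for the two writes, and the second core's registers were never touched.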
Post 06 Apr 2021, 05:22
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
I thought it would be nice to have this asm instruction in the CPU.
Code:
Val1 dd 0
Val2 dd 2
Val3 dd 1
Val4 dd 0
AdrTab dd Val1,Val2,Val3,Val4
; in code
; proposed instruction: the count (3) may be an immediate, mem or reg; the table may be a label or reg
; it takes each address from AdrTab and does mov [Val1],eax then mov [Val2],eax etc.
movArrReg eax,3,AdrTab
; another proposed instruction
movArrToArr AdrTab2,3,AdrTab
    

We can dynamically change the number of output values and the AdrTab offset! Very good instruction!
We can dynamically change the pointers in AdrTab or in another table of values!
No need for loops! Good for ifs and cases!

One CPU tick for millions of values! :D
Post 08 Apr 2021, 07:15
Furs



Joined: 04 Mar 2016
Posts: 1587
Furs
Roman wrote:
One CPU tick
Just because it's an instruction in the CPU doesn't mean it's one CPU tick. Only the simplest instructions are one CPU tick.

You want magic, which doesn't exist.
Post 08 Apr 2021, 11:46
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17997
Location: In your JS exploiting you and your system
revolution
Furs wrote:
You want magic, which doesn't exist.
fsin is a single instruction, and so are fsqrt and fdiv. People could use them more. But they don't, right? Why? Because they are complex and "slow", taking many many ticks to run.

Roman's suggested instructions could be implemented. But they would also be slow and no one would use them for that reason.
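
To see why, here is a small Python model of what the proposed movArrReg would have to do. The function mov_arr_reg and its operand order are only my reading of the pseudocode above, not a real instruction; the sketch counts memory writes and says nothing about actual cycle counts.

```python
def mov_arr_reg(memory, value, count, addr_table):
    """Store value at the first `count` addresses listed in addr_table.

    Returns the number of memory writes performed.
    """
    stores = 0
    for addr in addr_table[:count]:  # one table read per element...
        memory[addr] = value         # ...and one store per element
        stores += 1
    return stores


# Memory is modelled as a dict keyed by label name (an assumption of this sketch).
memory = {"Val1": 0, "Val2": 2, "Val3": 1, "Val4": 0}
adr_tab = ["Val1", "Val2", "Val3", "Val4"]

stores = mov_arr_reg(memory, 7, 3, adr_tab)
print(stores, memory)  # prints 3 {'Val1': 7, 'Val2': 7, 'Val3': 7, 'Val4': 0}
```

Even as a single instruction, the work still grows with the count: every element needs its own address fetch and its own store.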
Post 08 Apr 2021, 15:35
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
fdiv slow but sse divss faster !
If Intel wants the best result, then they are doing well!
Post 08 Apr 2021, 16:38
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17997
Location: In your JS exploiting you and your system
revolution
Roman wrote:
fdiv slow but sse divss faster ...
... and less precise. There is always a trade-off to be made.
Post 08 Apr 2021, 16:41
bitRAKE



Joined: 21 Jul 2003
Posts: 3132
Location: vpcmipstrm
bitRAKE
The performance of all div instructions has changed over the years. Currently, it's optimized in surprising ways. For example, try dividing by a power of two (we know a shift could be used). Now plot a graph of timing versus the number of bits set in the divisor. The graph will be different on different cores.

_________________
¯\(°_o)/¯ unlicense.org
Post 08 Apr 2021, 23:44
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
Another idea of mine is 16 special registers for Call.
Let's say registers pr0 to pr15, where pr means params.

Profit: no need to push (because esp changes!) or to use rcx\rdx\r8\r9!
And a second call can be repeated without duplicating the params and pushes!
Code:
Val1 dd 5

proc applyValue
       mov  eax,[pr1]
       add   eax,pr0
       mov  [pr1],eax
       ret
;or more simple variant applyValue
       add   [pr1],pr0
       ret
endp

mov   pr0,10
mov   pr1,Val1
Call  applyValue ;after Val1 = 15
Call  applyValue ;after Val1 = 25
mov   pr0,2
Call  applyValue ;after Val1 = 27
mov   pr0,1
; CallLoop changes a special register, loopReg!
; We could use loopReg in procs! And a special register, Break, to cancel CallLoop.
CallLoop applyValue,3 ;after Val1 = 30
    


And it is easy to get the current params for a call!
And we get more registers. Profit: faster code and more comfortable programming!
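
The intended behaviour can be checked with a small Python model. Everything here (the pr list, apply_value, call_loop, and memory as a dict keyed by label) is a hypothetical stand-in for the proposed registers and the CallLoop instruction, not an existing feature.

```python
# Hypothetical parameter registers pr0..pr15, shared by all calls.
pr = [0] * 16
memory = {"Val1": 5}


def apply_value():
    """Model of: add [pr1],pr0 / ret"""
    memory[pr[1]] += pr[0]


def call_loop(proc, times):
    """Model of the proposed CallLoop: call proc `times` times."""
    for _ in range(times):
        proc()


pr[0] = 10
pr[1] = "Val1"
apply_value()              # Val1 = 15
apply_value()              # Val1 = 25, params were not set up again
pr[0] = 2
apply_value()              # Val1 = 27
pr[0] = 1
call_loop(apply_value, 3)  # Val1 = 30
print(memory["Val1"])      # prints 30
```

The parameters survive between calls because nothing is pushed or popped; only pr0 is rewritten when the value changes.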


Last edited by Roman on 09 Apr 2021, 06:58; edited 1 time in total
Post 09 Apr 2021, 05:30
DimonSoft



Joined: 03 Mar 2010
Posts: 913
Location: Belarus
DimonSoft
So, you basically ask for more registers. Take a look at Dalvik/ART and its parameter registers: it’s a lot more interesting but is generally possible only for software (virtual) machines since implementation would require “unlimited” managed memory which is too expensive to implement in hardware.

(Not) duplicating parameters is not really a thing, since even procedures with similar parameter sets tend to have them a bit different: sequences of parameters might be the same but their positions in the overall parameter lists might not. So, the next thing to ask for is a special instruction to shift parameters here and there. And then we quickly get an x87-like stack architecture. And then you suddenly realize that 16 parameters are too few: a procedure with 3 parameters that calls CreateFont in Windows (14 parameters) asks for a mechanism to spill some registers to memory, so we get back to where we started (plain stack for parameter passing), just with another mechanism that needs to be implemented in hardware (additional cost), needs to be supported by compilers (additional cost), etc. to… not really solve any problem, just make it occur a few nested calls later at the cost of an additional mechanism.

Intel once tried to implement a processor with cool features in hardware (Itanium), and it failed. I guess they won’t make the same mistake in the near future.
Post 09 Apr 2021, 06:10
Roman



Joined: 21 Apr 2012
Posts: 1024
Roman
The stack is slow, and sometimes it is not comfortable to work with (because esp changes)!
I showed this in my previous post.

PS: And I am not forbidding the stack. Sometimes a program needs the stack.


Last edited by Roman on 09 Apr 2021, 06:31; edited 2 times in total
Post 09 Apr 2021, 06:15
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17997
Location: In your JS exploiting you and your system
revolution
The Itanium had a scheme similar to that.

But why do you want it? Why do you think it would be "faster"?

If you look into CPU design more you might see why just adding more addressable registers won't necessarily help, and it probably would make it slower. You can't simply add registers and suddenly everything is awesomely fast. If it was so easy it would already have been done.

And those registers need space in the instruction encoding, where would you place those bit encodings?
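
A quick back-of-the-envelope calculation in Python shows the encoding cost. The helper register_field_bits is just a name for this sketch; for reference, the x86 ModRM byte uses 3-bit register fields, and x86-64 supplies a fourth bit through the REX prefix.

```python
def register_field_bits(num_registers):
    """Bits one operand field needs to name any of num_registers registers."""
    return (num_registers - 1).bit_length()


for count in (8, 16, 32, 128):
    bits = register_field_bits(count)
    print(f"{count} registers: {bits} bits per operand, "
          f"{2 * bits} bits for a two-operand instruction")
```

Going from 16 to 128 addressable registers would take each register field from 4 to 7 bits, and every instruction that names a register would have to find room for them.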
Post 09 Apr 2021, 06:18

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.