flat assembler
Message board for the users of flat assembler.
Modern CPU about registers
revolution 13 Jan 2021, 10:14
Actually, that is quite close to what it is.
The register file is a very fast, and very small, region of SRAM inside the CPU. There are many more details (like renaming, multi-porting, etc.) that complicate it a bit, but on a basic level it is just RAM.
Roman 13 Jan 2021, 10:16
I think the same way.
MaoKo 13 Jan 2021, 21:35
revolution wrote:
Do you have some documentation at hand?
sts-q 14 Jan 2021, 05:53
MaoKo 14 Jan 2021, 17:19
sts-q, thank you.
Roman 05 Apr 2021, 14:12
I read that modern AMD and Intel CPUs use 128 virtual registers.
But are those 128 registers for all CPU cores or only for one core? If the 128 registers are shared by all cores, that is very bad: 128 regs / 6 cores = about 21 regs for one core.
revolution 05 Apr 2021, 14:25
Each core has its own set of registers. They are not shared.
Roman 06 Apr 2021, 05:09
revolution wrote: Each core has its own set of registers. They are not shared.
128 registers for each core?
revolution 06 Apr 2021, 05:22
Yes, something like that. We have discussed register renaming in another thread.
You are using them without even realising; it all happens automatically in the background.
Roman 08 Apr 2021, 07:15
I thought it would be nice to have this asm instruction on the CPU.
Code:
Val1 dd 0
Val2 dd 2
Val3 dd 1
Val4 dd 0
AdrTab dd Val1,Val2,Val3,Val4

;in code (movArrReg is a proposed instruction, it does not exist)
movArrReg eax,3 or mem or reg,AdrTab or reg ;get adr from AdrTab and do mov [Val1],eax then mov [Val2],eax etc

;another proposed instruction
movArrToArr AdrTab2,3 or mem or reg,AdrTab or reg

We can dynamically change the number of output values and the AdrTab offset! Very good command! We can dynamically change the pointers in AdrTab or in other tables of values! No loops needed! Good for ifs and cases! One CPU tick for millions of values!
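For comparison, here is a minimal C sketch of what the proposed movArrReg would have to do. The instruction does not exist; the function name mov_arr_reg, its parameters and the variable names are assumptions made for illustration only. The sketch suggests that every table entry still needs its own pointer load and its own store, so the work grows with the count even if it is hidden behind one mnemonic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Data laid out like the post: four dwords and a table of their addresses. */
static uint32_t val1 = 0, val2 = 2, val3 = 1, val4 = 0;
static uint32_t *const adr_tab[] = { &val1, &val2, &val3, &val4 };

/* Hypothetical software equivalent of the proposed movArrReg:
   store `value` through the first `count` pointers of `table`. */
static void mov_arr_reg(uint32_t value, size_t count,
                        uint32_t *const table[], size_t table_len)
{
    assert(count <= table_len);      /* never read past the pointer table */
    for (size_t i = 0; i < count; i++) {
        *table[i] = value;           /* one pointer load and one store per entry */
    }
}
```

Calling mov_arr_reg(7, 3, adr_tab, 4) writes 7 to val1, val2 and val3 and leaves val4 untouched.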
Furs 08 Apr 2021, 11:46
Roman wrote: One CPU tick
You want magic, which doesn't exist.
revolution 08 Apr 2021, 15:35
Furs wrote: You want magic, which doesn't exist.
Roman's suggested instructions could be implemented. But they would also be slow, and no one would use them for that reason.
Roman 08 Apr 2021, 16:38
fdiv is slow, but SSE divss is faster!
If Intel wants the best result, then they manage it well!
revolution 08 Apr 2021, 16:41
Roman wrote: fdiv slow but sse divss faster ...
bitRAKE 08 Apr 2021, 23:44
All div performance has changed over the years. Currently, it's optimized in surprising ways. For example, try dividing by a power of two (we know a shift could be used). Now plot a graph of timing versus the number of bits set in the divisor. The graph will be different on different cores.
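A rough way to try this experiment is sketched below in C. It is only a sketch under stated assumptions: time_divisions and bits_set are hypothetical helper names, clock() has coarse resolution, and the loop measures the throughput of independent divisions rather than the latency of a single one. The divisor is read through a volatile so the compiler cannot replace the division with a shift or a multiply.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Count the bits set in x (the x-axis of the suggested graph). */
static unsigned bits_set(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;                  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Time `iterations` 64-bit unsigned divisions by `divisor`.
   Returns elapsed CPU seconds; the accumulated quotients go to *sink
   so the loop cannot be optimized away. */
static double time_divisions(uint64_t divisor, uint32_t iterations,
                             uint64_t *sink)
{
    assert(divisor != 0);
    assert(sink != 0);
    volatile uint64_t d = divisor;   /* value unknown to the compiler */
    uint64_t acc = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; i++) {
        acc += (UINT64_MAX - i) / d; /* a real divide on every iteration */
    }
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To reproduce the plot, one could call time_divisions for divisors such as 8, 0xFF and 0xFFFFFFFF with a large iteration count and pair each timing with bits_set(divisor). The absolute numbers depend on the core, the compiler and the clock resolution.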
Roman 09 Apr 2021, 05:30
Another idea of mine is 16 special registers for Call.
Let's say regs pr0 to pr15; pr means params. Profit: no need to push (because esp changes!) or to use rcx\rdx\r8\r9! And a second call can be repeated without duplicating the params and pushes!
Code:
Val1 dd 5

proc applyValue
mov eax,[pr1]
add eax,pr0
mov [pr1],eax
ret
;or a simpler variant of applyValue
add [pr1],pr0
ret
endp

mov pr0,10
mov pr1,Val1
Call applyValue ;after Val1 = 15
Call applyValue ;after Val1 = 25
mov pr0,2
Call applyValue ;after Val1 = 27
mov pr0,1
;CallLoop changes the special register loopReg !
;We could use loopReg in procs ! And a special reg Break to cancel CallLoop.
CallLoop applyValue,3 ;after Val1 = 30

And it is easy to get the current params for a call! And we get more regs. Profit: faster code and more comfortable programming!
Last edited by Roman on 09 Apr 2021, 06:58; edited 1 time in total
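The call sequence in this post can be modelled in C to check its arithmetic. This is a software model only: ParamRegs, apply_value and call_loop are hypothetical names standing in for the proposed pr0..pr15 registers, the applyValue procedure and the CallLoop instruction, none of which exist on real CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the proposed parameter registers pr0..pr15.
   Their contents persist across calls, as the post suggests. */
typedef struct {
    intptr_t pr[16];
} ParamRegs;

/* Model of applyValue: add pr0 to the dword that pr1 points at. */
static void apply_value(ParamRegs *regs)
{
    int32_t *target = (int32_t *)regs->pr[1];
    *target += (int32_t)regs->pr[0];
}

/* Model of the proposed CallLoop: call `proc` `count` times
   with the same parameter registers. */
static void call_loop(void (*proc)(ParamRegs *), ParamRegs *regs,
                      unsigned count)
{
    assert(proc != NULL && regs != NULL);
    for (unsigned i = 0; i < count; i++) {
        proc(regs);
    }
}
```

Starting from a value of 5 with pr0 = 10 and pr1 holding its address, two calls to apply_value give 15 and 25; setting pr0 = 2 and calling again gives 27; setting pr0 = 1 and running call_loop(apply_value, &regs, 3) gives 30, matching the comments in the post.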
DimonSoft 09 Apr 2021, 06:10
So, you are basically asking for more registers. Take a look at Dalvik/ART and its parameter registers: it’s a lot more interesting but is generally possible only for software (virtual) machines, since an implementation would require “unlimited” managed memory, which is too expensive to implement in hardware.
(Not) doubling parameters is not really a thing, since even procedures with similar parameter sets tend to have them a bit different: sequences of parameters might be the same but their positions in the overall parameter lists might not. So, the next thing to ask for is a special instruction to shift parameters here and there. And then we quickly get an x87-like stack architecture.
And then you suddenly realize that 16 parameters are too few: a procedure with 3 parameters that calls CreateFont in Windows (14 parameters) asks for a mechanism to spill some registers to memory, so we get back to where we started (a plain stack for parameter passing), just with another mechanism that needs to be implemented in hardware (additional cost), needs to be supported by compilers (additional cost), etc. to… not really solve any problem, just make it occur a few nested calls later at the cost of implementing an additional mechanism.
Intel once tried to implement a processor with cool features in hardware (Itanium), and it failed. I guess they won’t make the same mistake in the near future.
Roman 09 Apr 2021, 06:15
The stack is slow, and sometimes it is not comfortable to work with (because esp changes)!
I showed this in my previous post. PS: I am not forbidding the stack. Sometimes a program needs the stack. Last edited by Roman on 09 Apr 2021, 06:31; edited 2 times in total
revolution 09 Apr 2021, 06:18
The Itanium had a scheme similar to that.
But why do you want it? Why do you think it would be "faster"? If you look into CPU design more you might see why just adding more addressable registers won't necessarily help, and it probably would make it slower. You can't simply add registers and suddenly everything is awesomely fast. If it were so easy it would already have been done. And those registers need space in the instruction encoding; where would you place those bits?
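The encoding cost can be estimated with a little arithmetic: naming one of N registers takes ceil(log2 N) bits per operand. x86-64 already spends 4 bits per register operand on its 16 general-purpose registers (a 3-bit field plus 1 extension bit from the REX prefix). A minimal C sketch of the calculation, where bits_for_registers is a hypothetical helper name:

```c
#include <assert.h>

/* Smallest number of bits that can name `count` distinct registers,
   i.e. ceil(log2(count)). */
static unsigned bits_for_registers(unsigned count)
{
    assert(count >= 1 && count <= 65536);
    unsigned bits = 0;
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}
```

bits_for_registers(16) is 4. If 16 parameter registers shared the same operand field with the existing 16 (32 names in total), each operand would need 5 bits, and naming 128 registers directly would need 7 bits per operand.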
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.