flat assembler
Message board for the users of flat assembler.
Intel plans doubling 16 general purpose registers to 32
bitRAKE 29 Jul 2023, 23:00
Intel has tipped their hand at how they will be implementing the 64-bit only processor.
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Furs 30 Jul 2023, 13:58
revolution wrote:
Not sure what 3-operand instructions are for, considering moves are renamed and cost 0 cycles; it's not like you have a million moves due to the destination operand. But at least it won't bloat the instruction stream, since the mov costs bytes to encode as well.
I have a bad feeling Intel will just disable a lot of the current optimizations, expecting you to use their new bullshit instructions/encodings, and so current code will run like shit on their new CPUs. In that case I will permanently switch to AMD until it changes.
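The point about the mov costing bytes can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: it counts prefix, opcode and ModRM bytes for register-to-register forms, taking REX as 1 byte, REX2 as 2 bytes and the EVEX-style prefix used by the new three-operand (NDD) forms as 4 bytes. The names `PREFIX_BYTES` and `encoded_length`, and the `add rax, rbx, rcx` spelling in the comments, are hypothetical and not taken from any assembler.

```python
# Prefix sizes in bytes (assumption: REX = 1, REX2 = 2, EVEX-style = 4).
PREFIX_BYTES = {"REX": 1, "REX2": 2, "EVEX": 4}


def encoded_length(prefix, opcode_bytes=1, modrm_bytes=1):
    """Length of a register-to-register instruction with no
    displacement and no immediate: prefix + opcode + ModRM."""
    return PREFIX_BYTES[prefix] + opcode_bytes + modrm_bytes


# Legacy pair: mov rax, rbx ; add rax, rcx (two REX.W instructions).
legacy_pair = encoded_length("REX") + encoded_length("REX")

# One three-operand add (hypothetical spelling: add rax, rbx, rcx).
ndd_single = encoded_length("EVEX")

# A two-operand add that touches R16-R31 needs REX2 instead of REX.
rex2_add = encoded_length("REX2")

print("mov + add (REX):   ", legacy_pair, "bytes")
print("three-operand add: ", ndd_single, "bytes")
print("two-operand, REX2: ", rex2_add, "bytes")
```

Under these assumptions the three-operand form is no smaller than the mov plus add pair it replaces; the saving, if any, is one instruction rather than bytes.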
revolution 31 Jul 2023, 01:22
Furs wrote: Not sure what 3 operand instructions are for, considering moves are renamed and cost 0 cycles, not like you have a million moves due to destination operand. But at least it won't bloat the instruction stream since the mov costs bytes to encode as well.
Furs wrote: I have a bad feeling Intel will just disable a lot of the current optimizations, expecting you to use their new bullshit instructions/encodings, and so current code will run like shit on their new CPUs.
revolution 31 Jul 2023, 03:36
revolution wrote:
Furs 31 Jul 2023, 13:04
revolution wrote: Not sure what your beef is. If you don't want to "bloat the instruction stream" then simply don't use any of the new instructions or registers. Your code can continue to be "un-bloated" and you don't have to do anything different.
Is this social media where only "upvotes" exist and people aren't allowed to express disapproval now?
revolution wrote: What is your basis for assuming Intel will sabotage themselves by making their CPUs undesirable?
In hindsight, it's easy to act like a know-it-all about why it failed. But that was not the sentiment back then, except for a few people like me. Most people were super hyped about it. Guess how it turned out?
revolution 31 Jul 2023, 13:32
Furs wrote: And I'm allowed to complain and explain why they're stupid and I wouldn't use them in the first place.
Furs wrote: Itanium
Furs 01 Aug 2023, 13:12
revolution wrote: No one said you can't complain. But your "complaint" is silly. You aren't forced to "bloat" your code at all. You can choose to "bloat" your code if you want to, by using R16-R31, or the NDD thing. And you can also choose to never use R16-R31 or NDD. How does adding choice hurt you?
revolution wrote: You are moving the goal posts and talking about a different thing. Your suggestion above was that Intel would deliberately make non-REX2 stuff worse. But there is at least one competitor, AMD, so any deliberate reduction in performance would be suicide. That is entirely different from Itanium, a whole new architecture, with no competitor, and no idea if it would work. So let's get back to the original question before you distracted the discussion: what makes you think Intel will sabotage themselves by deliberately making their stuff worse than their competitor's?
Intel could make it "slow" to get more transistor budget for this new bloated bs, and a reason could be "we now have 3 operand instructions, no need for mov to be fast". It's not rocket science to figure it out. Look what happened to x87 because it has a "replacement" (SSE). So what about old apps using x87 and their performance? They had no reason huh?
revolution 01 Aug 2023, 14:45
If you want to suggest that Intel intends to use REX2/EEVEX to replace all other encodings, then you have to show your evidence. If you did replace everything with REX2/EEVEX then currently you would only get a very small subset of available instructions, and almost none of the "normal" simple instructions.
So your argument that it is like Itanium makes no sense. Itanium was a replacement, not an extension. It didn't work out, but that is the way of things, sometimes things just don't go as planned. AMD came along and extended x86 with the x86-64. That was a great success. And now Intel have extended x86-64 to this new thing. Suggesting that Intel will sabotage themselves by somehow making their CPUs worse is silly; like I mentioned, it has no basis in reality. Plus the estimates of 10% improvement make your whole argument moot. If it is "bloat", then it is bloat that works to make stuff better. Embrace the bloat if it works, reject the bloat if it harms.
bitRAKE 01 Aug 2023, 14:55
My perspective on Intel is a little different, but with a similar conclusion:
Intel has optimized for their business position, which means building processors for compilers. Initially this meant their own compiler, but later it followed from compiler research more generally. The majority of code is compiled, so this is an efficient way to produce better results for their customers. Do they intentionally have poor performance elsewhere in their processor designs? No, this is a result of low priority and neglect. Could they do better? Sure.
To make it more concrete, let us look at just control flow instructions. Compilers don't use LOOPcc or J[RE]CXZ == very low priority for Intel. AMD has a more "wholistic" approach in their design, imho, which results in these instructions still being performant. This isn't something new - it's been this way for decades. The "knock-on" effect is that compilers aren't going to use these low priority instructions in the future either. (Should be a caveat here, but that's another discussion.)
Code: uops.info - Table
                    | Alder Lake-P                                      | AMD Zen+/2/3/4
Instruction         | Lat    TP           Uops     Ports                | Lat  TP    Uops  Ports
LOOP (Rel8) BASE    | 2      2.00 / 4.94  7 / 6    1*p0156B+4*p06+1*p1  | 1    0.50  1
LOOPE (Rel8) BASE   | [1;3]  3.00 / 6.00  12 / 10  3*p0156B+6*p06+1*p1  | 1    0.50  1
LOOPNE (Rel8) BASE  | [1;3]  3.00 / 5.97  12 / 10  3*p0156B+6*p06+1*p1  | 1    0.50  1
JRCXZ (Rel8) BASE   |        0.50 / 0.50  2 / 2    1*p0156B+1*p06       |      0.50  1
(LOOPcc on Intel is shit. Yet, on AMD same as CMP/Jcc - wow!)
Go through the whole ISA and find a similar result.
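To put the table above into one number per instruction, here is a minimal sketch that divides the Alder Lake-P throughput figure by the Zen figure. It assumes the TP column is reciprocal throughput in cycles per instruction and that the first figure of each Intel pair is the one to compare; both are assumptions, and the names `intel_tp`, `amd_tp` and `slowdown` are made up for the illustration.

```python
# Reciprocal throughput in cycles per instruction, copied from the table.
# For Alder Lake-P the first figure of each "a / b" pair is used (assumption).
intel_tp = {"LOOP": 2.00, "LOOPE": 3.00, "LOOPNE": 3.00, "JRCXZ": 0.50}
amd_tp = {"LOOP": 0.50, "LOOPE": 0.50, "LOOPNE": 0.50, "JRCXZ": 0.50}


def slowdown(instr):
    """How many times more cycles per instruction Intel needs than AMD."""
    return intel_tp[instr] / amd_tp[instr]


ratios = {name: slowdown(name) for name in intel_tp}

for name, ratio in ratios.items():
    print(f"{name:7s} {ratio:.1f}x")
```

Taken at face value, the gap only shows up on the LOOPcc forms; JRCXZ comes out even on both vendors.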
Roman 01 Aug 2023, 18:57
16 new registers
And 16 new xmm ones, xmm16 to xmm31. Nice
revolution 01 Aug 2023, 19:28
Latency and throughput numbers are meaningless on their own.
It all depends upon how they interact within the entire code stream, mixed in with all the other instructions surrounding them. Combine that with the previous states and the contents of the caches, buffers and so on; that is where the real performance benefits and hazards come from.
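A toy model shows why the same latency and throughput figures can give very different totals depending on the surrounding code. This is only a first-order sketch: it ignores caches, port contention and the front end, and the function names and the example figures (latency 2, reciprocal throughput 0.5) are hypothetical.

```python
def chain_cycles(n, latency):
    """n instructions where each one needs the previous result:
    the run is bound by latency."""
    return n * latency


def independent_cycles(n, recip_throughput, latency):
    """n instructions with no dependencies between them: one can start
    every recip_throughput cycles, and the last one still has to finish."""
    return (n - 1) * recip_throughput + latency


n = 100
dependent = chain_cycles(n, latency=2)
independent = independent_cycles(n, recip_throughput=0.5, latency=2)

print("dependent chain:", dependent, "cycles")
print("independent:    ", independent, "cycles")
```

With identical per-instruction numbers the dependent chain takes roughly four times as long, which is the kind of interaction a table entry alone cannot show.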
Ali.Z 01 Aug 2023, 20:01
calling the option to use the extended register set bloat is invalid.
why didn't anyone complain about some instruction sets rather than extensions? if adding an option causes the entire architecture to be slow, then modern CPUs should be slower than the 8086. what would you say, invalid comparison? surely it is, for obvious reasons: I didn't take into account that modern CPUs are much faster, can execute instructions in parallel, OoOE, tiny transistors, different internal design, cache... among many other optimizations. so if you call my argument invalid, then so is yours, as you didn't take into account what intel would change... as if intel tells you and keeps you up to date with all of their internal secret design of the architecture, which is bs nonsense. (and intel has always been an ass in sharing details, docs, secrets, and when they say a word it is likely to be vague) ...
_________________ Asm For Wise Humans
Last edited by Ali.Z on 01 Aug 2023, 21:00; edited 2 times in total
bitRAKE 01 Aug 2023, 20:01
Interpreting my post as being about performance is to completely miss the point.
revolution 01 Aug 2023, 20:14
bitRAKE wrote: Interpreting my post as being about performance is to completely miss the point.
bitRAKE wrote: Which results in these instructions still being performant. <snip latency and throughput numbers>
bitRAKE 01 Aug 2023, 20:34
revolution wrote:
revolution 01 Aug 2023, 20:49
bitRAKE wrote: By reading all the words that came before it.
bitRAKE wrote: Do they intentionally have poor performance elsewhere in their processor designs? ...
My English isn't perfect, but I think it is okay enough for most purposes. But your comment about it not being about performance completely baffles me.
bitRAKE 01 Aug 2023, 21:08
If I had a thesis it would be, "Intel designs for the compiler and neglects instructions not used by the compiler." The metrics presented are an indication of this. Look at the other non-compiler instructions in the ISA and you will see a similar pattern.
We are both aware of the complexity of measuring performance, but to claim that LOOPcc performs similarly on Intel and AMD is dishonest. We don't need to get lost down that alley though - that's not the point. We can just look at non-compiler instructions to see what Intel does.
Furs 02 Aug 2023, 13:12
revolution wrote: If you want to suggest that Intel intends to use REX2/EEVEX to replace all other encodings, then you have to show your evidence. If you did replace everything with REX2/EEVEX then currently you will only get a very small subset of available instructions, and almost none of the "normal" simple instructions.
Itanium might have replaced x86, but the point from an end user perspective was that existing apps (x86) were slow, due to the emulator. They don't care that it was emulated. All they cared about is that they were slow, and to get performance they'd have to recompile for it. So I simply said, if they drop optimizations (such as 0 latency move renames) because of this new crap (3 operand instructions for instance), then they will have a very similar situation to Itanium. Existing apps will be slow. "Recompiling" will make them fast. And so on. Same with x87 (and I mean scalar SSE obviously). Existing apps would become slow (though much later down the line), "recompiling" to scalar SSE would make them fast, and so on. What's so confusing about what I said?
Furs 02 Aug 2023, 13:15
Ali.Z wrote: calling the option to use extended register set as a bloat is invalid.
x87 is mostly micro-coded right now hence extremely slow. You need more proof…?
Copyright © 1999-2024, Tomasz Grysztar.