flat assembler
Message board for the users of flat assembler.
Main > The most useless instruction
revolution 02 Jan 2016, 18:05
shutdownall wrote: What about mov eax,eax?
Xorpd! 03 Jan 2016, 07:08
Actually, mov eax, eax can be more useful than a NOP. Look in Agner Fog's stuff, where he shows that it can sometimes speed up loops.
Xorpd! 05 Jan 2016, 06:26
You would think so, but there is a strange issue here. Normally one struggles to schedule instructions so that the result of one instruction is available by the time the instruction that uses it needs it. But there is also the opposite problem: after an instruction produces a result and writes it to a register, that result is available to other instructions pretty much at will until the register is actually written back to the physical register file.

For registers whose values are available only in the physical register file, there are only a limited number of register read ports (two or three), so if more reads than that are required in a single clock cycle, the processor will stall until a register read port becomes available. If eax needs to be read a couple of times in one clock cycle in a loop, and the loop never writes eax, putting a mov eax, eax instruction shortly before those reads of eax can prevent a stall, because eax will not yet have been written back and so doesn't take up register read port bandwidth.

This is all in http://agner.org/optimize/microarchitecture.pdf. I notice that he says this is not a problem from Sandy Bridge on up, but in fact the issue can still slow down code on Haswell processors.

So mov eax, eax and mov ebx, ebx are different in that one makes eax available to a few subsequent instructions without requiring a register read port and the other does the same for ebx. Also, in 64-bit code those instructions clear the high 32 bits of the register in question.
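To make the last point concrete, here is a minimal sketch in C that models only the architectural effect of mov eax, eax (the zero-extension in 64-bit code), not the register read port behaviour, which cannot be shown without timing real hardware. The function names model_mov_eax_eax, model_mov_eax_eax_32 and demo are hypothetical and are not from the thread or any library; C is used instead of assembly purely so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (assumption, not from the thread): the effect of
   `mov eax, eax` on the full 64-bit register rax in 64-bit mode.
   Writing a 32-bit register zero-extends the result, so the high
   32 bits of rax become zero and the low 32 bits are unchanged. */
uint64_t model_mov_eax_eax(uint64_t rax)
{
    return (uint64_t)(uint32_t)rax;  /* keep low 32 bits, clear high 32 */
}

/* Hypothetical model of the same instruction in 32-bit code: only eax
   exists there, and its value is left unchanged. */
uint32_t model_mov_eax_eax_32(uint32_t eax)
{
    return eax;
}

/* Small self-check of the two models. */
void demo(void)
{
    /* High half is cleared, low half survives. */
    assert(model_mov_eax_eax(0xFFFFFFFF12345678ULL) == 0x0000000012345678ULL);
    /* A value that already fits in 32 bits is unchanged. */
    assert(model_mov_eax_eax(0x00000000DEADBEEFULL) == 0x00000000DEADBEEFULL);
    /* In 32-bit code the instruction has no architectural effect. */
    assert(model_mov_eax_eax_32(0x12345678u) == 0x12345678u);
}
```

So in 64-bit code mov eax, eax is not a true no-op unless the high half of rax is already zero, which is worth remembering before dropping it into a loop as a scheduling trick.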
revolution 05 Jan 2016, 06:33
Erm, all those CPU internal details are going to be different for the AMD stuff. Also, such things change regularly with newer models coming out. So, as usual, if you really need to save that last nanosecond, make sure that you test it on the target system. And don't simply assume your code will run best on every system just because it runs well on one.
Copyright © 1999-2025, Tomasz Grysztar.