flat assembler
Message board for the users of flat assembler.

The most useless instruction

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20454
Location: In your JS exploiting you and your system
revolution 02 Jan 2016, 18:05
shutdownall wrote:
What about mov eax,eax ?
That is good as a NOP.
Xorpd!



Joined: 21 Dec 2006
Posts: 161
Xorpd! 03 Jan 2016, 07:08
Actually, mov eax, eax can be more useful than a NOP. See Agner Fog's optimization manuals, where he shows that it can sometimes speed up loops.
l4m2



Joined: 15 Jan 2015
Posts: 674
l4m2 05 Jan 2016, 03:51
Xorpd! wrote:
Actually, mov eax, eax can be more useful than a NOP. See Agner Fog's optimization manuals, where he shows that it can sometimes speed up loops.
mov eax, eax is the same as mov ebx, ebx, isn't it?
Xorpd! 05 Jan 2016, 06:26
You would think so, but there is a strange issue here. Normally one struggles to schedule instructions so that the result of one instruction is available by the time the instruction that uses it needs it. But there is also the opposite problem: after an instruction produces a result and writes it to a register, that result is available for other instructions to use pretty much at will, but only until the register is actually written back to the physical register file.

For registers whose values are available only in the physical register file, there is a limited number of register read ports (only two or three), so if more reads than that are required in a single clock cycle, the processor stalls until a register read port becomes available.

If eax needs to be read a couple of times in one clock cycle in a loop, and the loop never writes eax, putting a mov eax, eax instruction shortly before those reads of eax can prevent a stall, because eax will not yet have been written back and so its reads don't take up register read port bandwidth.
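
A minimal fasm sketch of where such a refresh would go, assuming a 32-bit Linux target. The loop body, the register roles, the iteration count and the exit sequence are all made up for illustration and are not from Agner Fog's text. Whether the extra mov helps at all depends on the CPU, so it has to be timed on the target machine.

```asm
format ELF executable 3
entry start

segment readable executable

start:
        mov     eax, 3              ; loop-invariant value, never changed inside the loop
        mov     ebx, 5              ; second loop-invariant value
        xor     edx, edx            ; accumulator
        mov     ecx, 1000           ; iteration count (arbitrary)
.next:
        mov     eax, eax            ; no architectural effect in 32-bit code; rewrites eax
                                    ; so that it counts as a recently produced result
        lea     esi, [eax+ebx]      ; reads eax and ebx
        lea     edi, [eax+eax*2]    ; reads eax again
        add     edx, esi
        add     edx, edi
        dec     ecx
        jnz     .next

        xor     ebx, ebx            ; exit status 0
        mov     eax, 1              ; sys_exit (32-bit Linux ABI, an assumption of this sketch)
        int     0x80
```

Timing the loop with and without the mov eax, eax line is the only way to tell whether register read ports are the bottleneck on a given processor.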

This is all in http://agner.org/optimize/microarchitecture.pdf. I notice he says that this is not a problem from Sandy Bridge onward, but in fact the issue can still slow down code on Haswell processors.

So mov eax, eax and mov ebx, ebx are different in that one makes eax available to a few subsequent instructions without requiring a register read port and the other does the same for ebx. Also in 64-bit code those instructions clear the high 32 bits of the register in question.
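
A minimal fasm sketch of that last point, assuming a 64-bit Linux target. The test constant and the exit sequence are choices made for this example only. It shows why mov eax, eax is not a true NOP in 64-bit code.

```asm
format ELF64 executable 3
entry start

segment readable executable

start:
        mov     rax, 0xFFFFFFFF12345678   ; upper 32 bits set on purpose
        mov     eax, eax                  ; 32-bit write zero-extends: rax = 0x0000000012345678
        shr     rax, 32                   ; keep only what used to be the upper half
        mov     edi, eax                  ; exit status = former upper half (0 if it was cleared)
        mov     eax, 60                   ; sys_exit (64-bit Linux ABI, an assumption of this sketch)
        syscall
```

The program exits with status 0 because the upper half was cleared. Without the mov eax, eax line the status would be 255, the low byte of 0xFFFFFFFF.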
revolution 05 Jan 2016, 06:33
Erm, all those CPU internal details are going to be different for the AMD stuff. Also such things change regularly with newer models coming out. So as usual if you really need to save that last nanosecond make sure that you test it on the target system. And don't simply assume your code will run the best on every system just because it runs well on one system.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
