flat assembler
Message board for the users of flat assembler.

Choosing MOV or LEA - optimization

Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 09 Jan 2005, 22:53
Hi,
A question that has been on my mind for a long time :)
OK, I happened to stumble upon a LEA instruction, which I optimized
to LEA edi,[esi]; that is equivalent to MOV edi,esi. I started thinking
philosophically about which one is better. They both take two bytes, they
act the same, they both issue in 1 clock (taking 1 µop), and
neither of them changes any flags or causes exceptions. Is there
some reason, like LEA being newer, that it switches fewer transistors or
produces less heat, averaging less CPU usage? :P I know the difference
is so small that hardly anyone would notice, but it would be interesting to
know. The same question goes for xor eax,eax versus sub eax,eax.
Of course, it should be obvious that XOR's circuitry is simpler.
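This is a minimal Python sketch of why both pairs are interchangeable at the architectural level, that is, in everything a program can observe. The helpers `mov_rr`, `lea_base`, `xor_rr` and `sub_rr` are hypothetical. They model only 32-bit registers and the ZF/SF/PF/CF/OF flags. AF is left out because XOR leaves it undefined. The model says nothing about heat or transistor switching, which is the micro-architectural part of the question.

```python
# Toy architectural model: a dict of 32-bit registers and a dict of flags.
# All helper names are hypothetical; AF is omitted (XOR leaves it undefined).
MASK = 0xFFFFFFFF


def mov_rr(regs, flags, dst, src):
    """MOV dst,src: copy a register; flags are left untouched."""
    regs = dict(regs)
    regs[dst] = regs[src]
    return regs, dict(flags)


def lea_base(regs, flags, dst, base, disp=0):
    """LEA dst,[base+disp]: store the computed address; no memory access, no flags."""
    regs = dict(regs)
    regs[dst] = (regs[base] + disp) & MASK
    return regs, dict(flags)


def result_flags(result):
    """ZF, SF and PF depend only on the result (PF = even parity of the low byte)."""
    return {
        "ZF": result == 0,
        "SF": bool(result >> 31),
        "PF": bin(result & 0xFF).count("1") % 2 == 0,
    }


def xor_rr(regs, flags, dst, src):
    """XOR dst,src: bitwise result; CF and OF are always cleared."""
    regs = dict(regs)
    result = regs[dst] ^ regs[src]
    regs[dst] = result
    new_flags = dict(flags)
    new_flags.update(result_flags(result))
    new_flags["CF"] = False
    new_flags["OF"] = False
    return regs, new_flags


def sub_rr(regs, flags, dst, src):
    """SUB dst,src: 32-bit wraparound; CF = borrow out, OF = signed overflow."""
    regs = dict(regs)
    a, b = regs[dst], regs[src]
    result = (a - b) & MASK
    regs[dst] = result
    new_flags = dict(flags)
    new_flags.update(result_flags(result))
    new_flags["CF"] = a < b
    new_flags["OF"] = bool(((a ^ b) & (a ^ result)) >> 31)
    return regs, new_flags
```

Start MOV and LEA from the same state and they yield identical registers and flags. XOR and SUB both leave eax = 0 with ZF=1, SF=0, PF=1, CF=0 and OF=0.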

Hypothetically, say I have a 1 Hz computer with the x86 instruction set 8), built
using only vacuum tubes (the pre-transistor kind).

_________________
My updated idol :D http://www.agner.org/optimize/
vbVeryBeginner



Joined: 15 Aug 2004
Posts: 884
Location: \\world\asia\malaysia
vbVeryBeginner 10 Jan 2005, 03:42
You built your own computer, Madis731?! :shock:

Oh man, you are great :) I wish to learn more from you :)
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 10 Jan 2005, 07:59
Hello,
I guess LEA is a complex microcode instruction,
while MOV is a simple one, so I'd prefer MOV instead. If you have a PC processor you can benchmark, you could measure the heat generated while executing each of those instructions at the same rate.

I'm not sure how many °C the difference would be. (Another method would be to measure the processor's current; I wouldn't do that to my motherboard :) )
proveren



Joined: 24 Jul 2004
Posts: 68
Location: Bulgaria
proveren 10 Jan 2005, 10:18
I wondered about this, though it might seem stupid:
why XOR EAX,EAX or SUB EAX,EAX?
Why not just MOV EAX,0, LEA EAX,[0] or AND EAX,0? They take more bytes ;), but aren't they faster?
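To put numbers on "more bytes", this sketch lists the usual 32-bit encodings of each candidate. The byte values are one common choice of opcode. Some assemblers emit an equivalent form of the same length, e.g. 33 C0 for XOR. `ZEROING_IDIOMS` and `PRESERVES_FLAGS` are just illustrative names.

Two points stand out. AND does have a 3-byte form with a sign-extended 8-bit immediate. MOV and LEA are the only candidates here that leave the flags alone.

```python
# Usual 32-bit encodings of each zeroing candidate (one common choice of opcode;
# some assemblers emit an equivalent form of the same length, e.g. 33 C0 for XOR).
ZEROING_IDIOMS = {
    "xor eax,eax": bytes([0x31, 0xC0]),
    "sub eax,eax": bytes([0x29, 0xC0]),
    "and eax,0": bytes([0x83, 0xE0, 0x00]),  # 83 /4 ib: imm8 sign-extended to 32 bits
    "mov eax,0": bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),  # B8+rd id: full imm32
    "lea eax,[0]": bytes([0x8D, 0x05, 0x00, 0x00, 0x00, 0x00]),  # ModRM 05h: disp32 only
}

# MOV and LEA leave EFLAGS alone; XOR, SUB and AND all rewrite them.
PRESERVES_FLAGS = {
    "xor eax,eax": False,
    "sub eax,eax": False,
    "and eax,0": False,
    "mov eax,0": True,
    "lea eax,[0]": True,
}

# Print the candidates from shortest to longest encoding.
for name, code in sorted(ZEROING_IDIOMS.items(), key=lambda item: len(item[1])):
    print(f"{name:<12} {len(code)} bytes  {code.hex(' ')}  flags kept: {PRESERVES_FLAGS[name]}")
```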
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 10 Jan 2005, 12:41
Hmm, now you've got a question there.
It should be faster, but...
not with those extra bytes. Still, I haven't seen any advantages; generally, you move 0 into a register to zero it out.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Jan 2005, 15:11
:P I didn't build a computer, I'm just interested.

Obviously I can't start measuring currents on my current PC, but I think a serialized stream of issued instructions would make a good test. Of course, it can't be just one LEA and a JMP back to it; it should be more like times 1024 LEA followed by a JMP to the first LEA.
C, though, is not an option here because it's too complex. If I have time and any ideas, I'll try booting a diskette with this kind of code and measuring the CPU heating.

EDIT: to proveren: XOR is simple circuitry, so you can have your result at the same time as you input your signal. SUB, on the contrary, is much more complicated and needs carries (borrows), and each bit's carry can't be calculated before the carry from the previous bit is known, and so on. Using "fast carries" (carry-lookahead) can improve speed, but the heat generated is about 4 times greater.
MOV, AND and LEA with a 32-bit constant take 5-6 bytes (I think AND and LEA have short forms too, but I'm not sure...), which means the CPU has to make at least two 32-bit reads. XOR/SUB r32,r32 take 2 bytes, meaning a maximum of two reads.
I'd rather not think about cache fills from RAM or beyond, but either way it's slower and/or heats the CPU more.
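This toy bit-level model illustrates the XOR vs SUB argument. It is a sketch only: the function names are hypothetical, and real ALUs use faster carry schemes than a plain ripple. In `xor_bits`, each result bit uses only the two input bits at that position. `ripple_sub` instead has to pass a borrow from each bit to the next, so the slowest case, such as 0 - 1, ripples through all 32 bits.

```python
# Toy bit-serial model, not how any particular CPU is wired.

def xor_bits(a, b, width=32):
    """Each result bit depends only on the two input bits at the same position."""
    result = 0
    for i in range(width):
        result |= (((a >> i) & 1) ^ ((b >> i) & 1)) << i
    return result


def ripple_sub(a, b, width=32):
    """Ripple-borrow subtraction a - b (mod 2**width).

    Returns (difference, longest_chain), where longest_chain is the longest run of
    consecutive bit positions that produced a borrow, i.e. how far a borrow rippled.
    """
    borrow = 0
    result = 0
    run = longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ borrow) << i  # the difference bit needs the incoming borrow
        borrow = ((1 - x) & y) | ((1 - (x ^ y)) & borrow)  # full-subtractor borrow-out
        run = run + 1 if borrow else 0
        longest = max(longest, run)
    return result, longest
```

`ripple_sub(0, 1)` returns `(0xFFFFFFFF, 32)`: the borrow generated at bit 0 passes through every bit. By contrast, `ripple_sub(x, x)` never produces a borrow at all. A plain ripple circuit cannot know that in advance, so its timing has to allow for the 32-bit worst case.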
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 10 Jan 2005, 15:50
MOV can be larger than XOR, but according to some docs it takes 0 ticks;
however, decoding it in the background can consume more power because of the extra bytes... :) It's a good thing processors are getting more RISC-like.
Photonic processors are on the way! (THz CPU 8) )
proveren



Joined: 24 Jul 2004
Posts: 68
Location: Bulgaria
proveren 12 Jan 2005, 06:37
Thanks for the responses.
I thought there couldn't be a simpler circuit than MOV, and MOV can also be executed in either pipe (U or V on the Pentium).
But almost everybody uses xor eax,eax, even C/C++ compilers, so now I understand the idea.
By the way, instruction size is no small matter in my opinion; it really does matter.
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85 12 Jan 2005, 14:55
When selecting instructions, Intel recommends the shorter instruction or sequence.
snifit



Joined: 10 Dec 2004
Posts: 12
Location: Sweden
snifit 19 Jan 2005, 22:11
LEA doesn't modify any flags.
LEA reg,mem takes 2+EA clock cycles on an 8086/8088.

MOV doesn't modify any flags.
MOV reg,mem takes 8+EA clock cycles on an 8086/8088.

(EA is the extra time needed to calculate the effective address, which depends on the addressing mode.)
This info is taken from a Microsoft help file called "44op.hlp".
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85 19 Jan 2005, 22:47
This is no longer representative. Current pipelines are very deep, and together with other tricks, these instructions are faster overall.

Also, MOV reg,mem fetches another word from memory, while LEA reg,mem only acts upon registers, even with a memory operand: it computes the address but never reads from it.
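This small sketch shows that distinction with a toy register file (a dict) and memory (a bytearray). The helpers `effective_address`, `lea` and `mov_load` are hypothetical. Both LEA and MOV compute the same base + index*scale + disp address, but only the MOV load then reads memory. This is also why LEA edi,[esi] behaves like MOV edi,esi rather than MOV edi,[esi].

```python
# Toy model: registers in a dict, memory in a bytearray. Helper names are hypothetical.

def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """base + index*scale + disp, wrapped to 32 bits, as in [base+index*scale+disp]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index] * scale
    return ea & 0xFFFFFFFF


def lea(regs, dst, **operand):
    """LEA dst,[...]: store the address itself; memory is never touched."""
    regs[dst] = effective_address(regs, **operand)


def mov_load(regs, memory, dst, **operand):
    """MOV dst,[...]: compute the same address, then read a little-endian dword there."""
    ea = effective_address(regs, **operand)
    regs[dst] = int.from_bytes(memory[ea:ea + 4], "little")
```

For example, take esi = 8 and ecx = 2. Then `lea(regs, "eax", base="esi", index="ecx", scale=4)` stores 16 in eax. `mov_load` with the same operand stores whatever dword sits at address 16.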
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 27 Jan 2005, 18:48
According to AMD's doc (22007.pdf), on the K7 core LEA has a latency of 2, while MOV has a latency of 1.
My measurements with CodeAnalyst showed that the simple form of LEA (e.g. LEA edi,[esi]) sometimes takes 0..1 ticks (as MOV does), so I believe it's as fast as MOV...
But personally I prefer to use MOV wherever possible, because lea eax,[ebp] and lea eax,[esp] have 3-byte encodings.
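The size difference comes from the ModRM encoding rules. With mod=00, rm=100 means a SIB byte follows, and rm=101 means a bare 32-bit displacement with no base. So [esp] needs an extra SIB byte, and [ebp] needs an explicit zero disp8.

Here is a sketch covering just these cases. The helper names are hypothetical. MOV uses the 89 /r form, though some assemblers emit the equally long 8B /r form.

```python
# 32-bit register numbers as used in the ModRM reg and rm fields.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack the three ModRM fields into one byte: mod(2 bits) reg(3) rm(3)."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_rr(dst, src):
    """MOV dst,src using opcode 89 /r (r/m32 <- r32): always 2 bytes."""
    return bytes([0x89, modrm(0b11, REG32[src], REG32[dst])])


def encode_lea_base(dst, base):
    """LEA dst,[base] with 32-bit addressing and no displacement in the source."""
    reg = REG32[dst]
    if base == "esp":
        # rm=100 means "SIB byte follows"; SIB 24h = no index, base ESP
        return bytes([0x8D, modrm(0b00, reg, 0b100), 0x24])
    if base == "ebp":
        # mod=00 rm=101 means "disp32, no base", so [ebp] needs mod=01 with disp8 = 0
        return bytes([0x8D, modrm(0b01, reg, 0b101), 0x00])
    return bytes([0x8D, modrm(0b00, reg, REG32[base])])
```

For example, `encode_lea_base("eax", "esp").hex(" ")` gives `8d 04 24`, while `encode_mov_rr("eax", "esp")` is only 2 bytes.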

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
