flat assembler
Message board for the users of flat assembler.

lea vs add?

Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 22 May 2009, 04:20
In situations where either will work and they're the same size, is one better than the other? And if so, why?


E.g. add eax,4 vs lea eax,[eax+4]
manfred



Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred 22 May 2009, 05:50
Better in what way?

_________________
Sorry for my English...
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 22 May 2009, 06:13
Same clocks, but lea won't change flags.
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 22 May 2009, 12:51
I mean which one is optimized better, i.e. which one can most often run at the same time as other instructions?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20486
Location: In your JS exploiting you and your system
revolution 22 May 2009, 14:15
It depends. You have to profile your code to see which is faster for your situation. Sometimes it will be one and sometimes it will be the other and also sometimes they will be the same.

Just make two versions of your code with the only difference being the lea/add, and time them to see which works best for that code on that machine.

My personal guess is that you will never notice the difference.
bitRAKE



Joined: 21 Jul 2003
Posts: 4121
Location: vpcmpistri
bitRAKE 22 May 2009, 17:14
One of the great things about FASM is the ability to redefine instructions. So, to simplify testing, ADD could be used and then an ADD macro (changing ADD to LEA where applicable) could be conditionally included to compare the results.
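
A minimal sketch of the macro idea above, not code posted in the thread. It assumes a hypothetical switch named USE_LEA and only converts the register-plus-immediate case. Because LEA leaves the flags untouched, it is only safe where nothing after the ADD reads the flags.

```asm
use32

; Hypothetical switch, not from the thread: set to 0 to get plain ADD everywhere.
USE_LEA = 1

; Inside its own definition, "add" still means the original instruction,
; so the else branch does not recurse.
macro add dest, src
{
  if USE_LEA = 1 & dest in <eax,ebx,ecx,edx,esi,edi,ebp> & src eqtype 0
    lea dest, [dest+src]      ; register + immediate: flags are NOT updated
  else
    add dest, src             ; everything else stays a real ADD
  end if
}

  add eax, 4                  ; becomes lea eax, [eax+4] when USE_LEA = 1
  add esp, 12                 ; esp is not in the list, stays ADD
  add ebx, eax                ; source is a register, stays ADD
```

Keeping the switch inside the macro means the test code itself never changes between the two builds; only the value of USE_LEA does.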
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 23 May 2009, 01:35
Thank you :)


Lea is faster when I test it, but I don't have a bunch of CPUs to test on, so I was wondering which is faster on average.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20486
Location: In your JS exploiting you and your system
revolution 23 May 2009, 02:59
I don't think it is correct to ask "on average". What is an average computer?
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 23 May 2009, 03:03
Like if lea is faster in 30% of the cases and add is faster in 70% of them, then add is faster on average.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20486
Location: In your JS exploiting you and your system
revolution 23 May 2009, 03:09
But there is no average case in general. It all depends upon your code, your algorithms and your computer that you are testing everything on.


Last edited by revolution on 23 May 2009, 14:21; edited 1 time in total
pal



Joined: 26 Aug 2008
Posts: 227
pal 23 May 2009, 13:50
Why not benchmark a few pieces of code where you change which instruction you use? E.g. execute the same arbitrary statement a million times or so and time the difference using a high-precision timer. If it depends on the situation, then you would need a variety of different code samples, I guess.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 23 May 2009, 22:53
Don't they result in the same micro-ops anyway, except for the flags?
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 23 May 2009, 22:58
It depends on the registers, actually. I don't have the AMD manuals with me now, but if I recall correctly "lea reg, [reg*1+reg]" incurs two cycles of latency. So even if you don't use a scaled index, the index can still be there (multiplied only by 1), because encoding the address requires the SIB byte.

When the above does not apply (i.e., no register is implicitly scaled by one), perhaps the code could run faster because you free up an ALU, since the address is calculated by an AGU?
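
To make the SIB-byte point concrete, here is a small sketch of how the forms discussed in this thread are normally encoded in 32-bit code. The byte values in the comments are the standard x86 encodings as I would expect fasm to emit them; they are not taken from the thread, so check them against a hex dump of your own output.

```asm
use32                         ; 32-bit code, flat binary output

  add eax, 4                  ; 83 C0 04  (3 bytes)
  lea eax, [eax+4]            ; 8D 40 04  (3 bytes: ModRM + disp8, no SIB byte)

  add ebx, eax                ; 2 bytes
  lea ebx, [ebx+eax]          ; 3 bytes: 8D 1C + SIB byte (base + index*1)
  lea ebx, [ebx+eax+1]        ; 4 bytes: 8D 5C + SIB byte + disp8
```

The first pair is the same-size case from the original question. As soon as two registers appear in the address, the SIB byte is required and one of them is encoded as an index with scale 1.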
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 24 May 2009, 19:03
OK, what's happening here?

Code:
format PE console 4.0

entry start

include 'win32ax.inc'

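macro tester func
{
  local ..loop

  ; Pause for one second before each measurement
  invoke Sleep, 1000

  ; Serialize before reading the start time
  xor eax, eax
  cpuid

  call [GetTickCount]
  mov [timestart], eax
  mov ebx, $80000000    ; has no effect: every test routine begins with xor ebx, ebx

  call func

; Serialize
  xor eax, eax
  cpuid

  ; Elapsed milliseconds = current tick count - start tick count
  call [GetTickCount]
  sub eax, [timestart]
  push eax              ; printf argument for %d
  call @f               ; pushes the address of the name string below, for %s
  db `func, 0
@@:
  push fmt
  call [printf]
  add esp, 12           ; printf is cdecl: remove the three arguments
  align 16
}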

  fmt db "%s: ", "%dms", 10, 0
  timestart dd 0

start:

  invoke  GetCurrentProcess
  invoke  SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
  invoke  GetCurrentThread
  invoke  SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL

  tester  lea_adder
  tester  add_adder

  tester  lea_adder_longer_chain
  tester  add_adder_longer_chain


  invoke  ExitProcess, 0


lea_adder:
  xor     ebx, ebx
  mov     eax, 1

align 16
.loop:
  lea     ebx, [ebx+eax+1]
  imul    ecx, ebx, 3
  imul    ecx, ebx, 3
  test    ecx, ecx
  jnz     .loop

ret

add_adder:
  xor     ebx, ebx
  mov     eax, 2

align 16
.loop:
  add     ebx, eax
  imul    ecx, ebx, 3
  imul    ecx, ebx, 3
  test    ecx, ecx
  jnz     .loop

ret



add_adder_longer_chain:
  xor     ebx, ebx
  mov     eax, 2

align 16
.loop:
  add     ebx, eax
  imul    ecx, ebx, 3
  imul    edx, ecx, 3
  test    edx, edx
  jnz     .loop

ret

lea_adder_longer_chain:
  xor     ebx, ebx
  mov     eax, 1

align 16
.loop:
  lea     ebx, [ebx+eax+1]
  imul    ecx, ebx, 3
  imul    edx, ecx, 3
  test    edx, edx
  jnz     .loop

ret

align 4
data import

  library msvcrt, 'msvcrt.dll',\
    kernel32, 'kernel32.dll'
    
  import msvcrt,\
    printf, 'printf'
    
  include 'api/kernel32.inc'
end data    


Results:
Code:
C:\Documents and Settings\Hernan\Escritorio\Assembly>bench.exe
lea_adder: 3781ms
add_adder: 5406ms
lea_adder_longer_chain: 4328ms
add_adder_longer_chain: 3781ms

C:\Documents and Settings\Hernan\Escritorio\Assembly>bench.exe
lea_adder: 3781ms
add_adder: 5406ms
lea_adder_longer_chain: 4328ms
add_adder_longer_chain: 3782ms

C:\Documents and Settings\Hernan\Escritorio\Assembly>bench.exe
lea_adder: 3781ms
add_adder: 5390ms
lea_adder_longer_chain: 4312ms
add_adder_longer_chain: 3781ms    

(AMD Athlon64 Venice 2.0 GHz)

[edit]Added two more tests[/edit]


Last edited by LocoDelAssembly on 24 May 2009, 20:55; edited 1 time in total
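
A rough conversion of the Athlon64 timings above into cycles per iteration. It assumes the CPU ran at its nominal 2.0 GHz for the whole test, which the thread does not confirm, and GetTickCount is a coarse timer, so treat the figures as approximate. Each loop advances ebx by 2 and exits when ebx wraps back to 0. Since 3 is odd, 3*ebx mod 2^32 is zero only when ebx is zero, so every routine runs 2^31 iterations.

```latex
\begin{align*}
N &= 2^{31} \approx 2.147 \times 10^{9} \text{ iterations} \\
\text{lea\_adder:} \quad
  & \frac{3.781\,\mathrm{s} \times 2.0 \times 10^{9}\,\mathrm{Hz}}{N} \approx 3.5 \text{ cycles per iteration} \\
\text{add\_adder:} \quad
  & \frac{5.406\,\mathrm{s} \times 2.0 \times 10^{9}\,\mathrm{Hz}}{N} \approx 5.0 \text{ cycles per iteration} \\
\text{lea\_adder\_longer\_chain:} \quad
  & \frac{4.328\,\mathrm{s} \times 2.0 \times 10^{9}\,\mathrm{Hz}}{N} \approx 4.0 \text{ cycles per iteration} \\
\text{add\_adder\_longer\_chain:} \quad
  & \frac{3.781\,\mathrm{s} \times 2.0 \times 10^{9}\,\mathrm{Hz}}{N} \approx 3.5 \text{ cycles per iteration}
\end{align*}
```

Under that assumption the gaps between the variants are on the order of one cycle per iteration.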
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 24 May 2009, 20:01
That's really interesting. My guess is that the micro-ops are different?

_________________
Previously known as The_Grey_Beast
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 24 May 2009, 20:39
Actually, Core 2 can schedule three consecutive ADDs in the same time as LEA. What happened here is unfortunate micro-op scheduling. LEA always has to go to port 0, while ADD can go to any of ports 0, 1 or 5. If for whatever reason ADD "chose" port 1 or 5 and forced the IMULs to be scheduled in the following clocks instead, then the loop took 1 clock longer for that reason.

My tests on 65nm Core 2 (T7200) showed:
Code:
D:\Programs\FASM\Proged\Bench_ADD_LEA>bench
lea_adder: 2360ms
add_adder: 2281ms

D:\Programs\FASM\Proged\Bench_ADD_LEA>bench
lea_adder: 2235ms
add_adder: 2234ms

D:\Programs\FASM\Proged\Bench_ADD_LEA>bench
lea_adder: 2344ms
add_adder: 2234ms
    


If I changed the line LEA EBX,[EBX+EAX+1] to LEA EBX,[EBX+EAX] then the bench showed:
Code:
D:\Programs\FASM\Proged\Bench_ADD_LEA>bench
lea_adder: 4375ms
add_adder: 2235ms

D:\Programs\FASM\Proged\Bench_ADD_LEA>bench
lea_adder: 4547ms
add_adder: 2234ms

D:\Programs\FASM\Proged\Bench_ADD_LEA>bench
lea_adder: 4391ms
add_adder: 2234ms
    
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 24 May 2009, 20:57
I have edited my post, please check.
pal



Joined: 26 Aug 2008
Posts: 227
pal 24 May 2009, 21:20
Intel Core2 Quad CPU 2.40 GHz:

Code:
J:\My Files\Programming\fasmw16738\Codes>leaaddbench.exe
lea_adder: 1794ms
add_adder: 1904ms
lea_adder_longer_chain: 1872ms
add_adder_longer_chain: 1856ms

J:\My Files\Programming\fasmw16738\Codes>leaaddbench.exe
lea_adder: 1856ms
add_adder: 1826ms
lea_adder_longer_chain: 1856ms
add_adder_longer_chain: 1825ms

J:\My Files\Programming\fasmw16738\Codes>leaaddbench.exe
lea_adder: 1856ms
add_adder: 1841ms
lea_adder_longer_chain: 1857ms
add_adder_longer_chain: 1841ms
    


So they are around the same. I would work out the standard deviation but I ain't that bored.

Gawd damn thing, I had to log in as an administrator and disable my AV to get this to work. I'll maybe have a test with my PS3 later if I can be bothered (just I'll have to configure it all).
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 24 May 2009, 21:55
Seems that Intel does not exhibit the same behavior then.

Madis wrote:

If I changed the line LEA EBX,[EBX+EAX+1] to LEA EBX,[EBX+EAX] then the bench showed:

But have you changed the "mov eax, 1" to "mov eax, 2" to keep the comparison fair?

[edit] If I change the lea occurrences to "lea ebx, [ebx+eax]" and the occurrences of "mov eax, 1" to "mov eax, 2", I get these results:
Code:
C:\Documents and Settings\Hernan\Escritorio\Assembly>bench.exe
lea_adder: 5406ms
add_adder: 5406ms
lea_adder_longer_chain: 3782ms
add_adder_longer_chain: 3782ms

C:\Documents and Settings\Hernan\Escritorio\Assembly>bench.exe
lea_adder: 5406ms
add_adder: 5406ms
lea_adder_longer_chain: 3781ms
add_adder_longer_chain: 3781ms

C:\Documents and Settings\Hernan\Escritorio\Assembly>bench.exe
lea_adder: 5391ms
add_adder: 5391ms
lea_adder_longer_chain: 3781ms
add_adder_longer_chain: 3781ms    
[/edit]
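
For reference, a sketch of what the adjusted lea_adder described in the edit above would look like, meant as a drop-in replacement for the routine in the benchmark listing rather than a complete program. The reason the constant matters: with "mov eax, 1" and no +1 displacement, ebx advances by 1 per iteration and needs 2^32 iterations to wrap to zero instead of 2^31, which by itself would roughly double the run time. That would be consistent with the roughly 4.4 s versus 2.2 s figures Madis731 posted, although the thread does not confirm whether he changed the constant.

```asm
lea_adder:
  xor     ebx, ebx
  mov     eax, 2              ; was 1, with the +1 displacement supplying the rest of the step

align 16
.loop:
  lea     ebx, [ebx+eax]      ; was lea ebx, [ebx+eax+1]; ebx still advances by 2
  imul    ecx, ebx, 3
  imul    ecx, ebx, 3
  test    ecx, ecx
  jnz     .loop               ; exits after 2^31 iterations, same as add_adder

ret
```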
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20486
Location: In your JS exploiting you and your system
revolution 25 May 2009, 01:24
All of those tests are artificial. None of those results will help you in a real program.