flat assembler
Message board for the users of flat assembler.

add eax,ebx VS lea eax[esi+ebx]

OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
which is faster?
C compilers usually get this code:
Code:
int x=2;
int y=3;
z=x+y;
    

into this code:
Code:
mov esi,2
mov ebx,3
lea eax,[esi+ebx]
    

is it really faster than:
Code:
mov eax,2
mov ebx,3
add eax,ebx
    

?

Thanks
Post 12 Aug 2007, 05:06
rCX



Joined: 29 Jul 2007
Posts: 166
Location: Maryland, USA
rCX
I'm not sure myself, but I think it's important to point out that the C compiler's code modifies eax, ebx and esi, while the second version modifies only eax and ebx. If esi contained a value you wanted to keep, you would have to put it somewhere else, increasing the number of instructions needed.

Edit: maybe you could use something like this...
Code:
mov esi,2 
mov ebx,3 
lea ebx,[esi+ebx] 
    


(If anyone knows where to find documentation on the number of clock ticks per instruction, please post it. I've been looking for it for a long time and have not been able to find any :( )
Post 12 Aug 2007, 17:34
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen
Ozzy, don't bother yourself with ADD vs. LEA speed. This hardly speeds up your algorithm.

rCX wrote:

(If anyone knows where to find documentation on the number of clock ticks per instruction, please post it. I've been looking for it for a long time and have not been able to find any :( )

http://www.agner.org/optimize/#manuals

instruction_tables.pdf
Post 13 Aug 2007, 08:15
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
OzzY: for your code sequence, the compiler should generate neither ADD nor LEA, it should simply move the constant 5 into the destination. I guess a reason to use three registers could be to avoid a dependency.

Anyway, to avoid the it's-a-constant optimization, let's see what code VC2005sp1 generates for this code:
Code:
extern int x, y, z;
int main()
{
  z = x + y;
  return z;
}
    

With /Ox optimization, it turns into the following:
Code:
mov     eax, DWORD PTR ?x@@3HA                  ; x
mov     ecx, DWORD PTR ?y@@3HA                  ; y
add     eax, ecx
mov     DWORD PTR ?z@@3HA, eax                  ; z
ret     0
    


Instead of saying "C compilers usually get this code", you should really state which compiler generated the code.
Post 13 Aug 2007, 08:27
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
I've got instruction_tables.pdf printed out :D ...for Core 2 only...for now :)

I think Agner's site has been mentioned before repeatedly on these boards!!!
Post 13 Aug 2007, 11:15
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
BTW, is Ozzy's code just this?
Code:
void main()
{
   int x=2;
   int y=3;
   z=x+y;
}
    


Because if there is more code below, the reason could be that X and Y will be used later. Look at the registers the compiler chose for those variables, EBX and ESI: those are preserved across calls.
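
To make that scenario concrete, here is a minimal C sketch. It is not Ozzy's actual code: `consume` is a hypothetical stand-in for "more code below", and `z` is assumed to be a global. Because x and y are read again after the sum, a compiler may prefer to keep both sources intact, which is one plausible reason for a three-operand lea and callee-saved registers.

```c
int z; /* assumption: z is a global, as the original snippet never declares it */

/* Hypothetical stand-in for "more code below" that still needs x and y. */
static int consume(int a, int b)
{
    return a * 10 + b;
}

int sum_then_reuse(int x, int y)
{
    z = x + y;            /* x and y must both survive this addition */
    return consume(x, y); /* ...because they are read again afterwards */
}
```

Whether a given compiler really picks lea here depends on the compiler and its settings, so the generated assembly is worth checking rather than assuming.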
Post 13 Aug 2007, 15:47
rCX



Joined: 29 Jul 2007
Posts: 166
Location: Maryland, USA
rCX
I was looking at the instruction table. Are "uops" or "latency" the best measure of an instruction's speed?
Post 13 Aug 2007, 22:11
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
LocoDelAssembly: good question - and another good question would be which compiler + settings he used :)

For the code snippet you posted, assuming that 'z' is an extern, any decent compiler will simply generate a "mov [z], 5".
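
A small sketch of the case being described, assuming `z` is a global (the function name `store_sum` is made up for illustration). Both operands are known at compile time, so an optimizing compiler is expected to fold the addition away and store the constant directly.

```c
int z; /* assumption: treated as a global/extern, as in the post above */

void store_sum(void)
{
    int x = 2;
    int y = 3;
    z = x + y; /* both operands are compile-time constants: equivalent to z = 5 */
}
```

With optimization enabled, neither ADD nor LEA should be needed for this function, which is why it is a poor test case for comparing the two.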
Post 13 Aug 2007, 23:20
OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
I really meant that code to be wrapped in a function, not with x and y already declared.

Example:
Code:
int add(int x, int y)
{
   return x+y;
}    
Post 14 Aug 2007, 02:25
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Interesting, that code snippet generates the following assembly with VC2005:
Code:
mov     eax, DWORD PTR _y$[esp-4]
mov     ecx, DWORD PTR _x$[esp-4]
add     eax, ecx
    


Probably a compiler heuristic that says "load arguments to register before using"?
Post 14 Aug 2007, 11:36
OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
Code:
int add(int x, int y) 
{ 
   return x+y; 
}
    

From this, Pelles C generates this:
Code:
[global _add]
[section .text]
#line 1 "test.c"
[function _add]
_add:
push ebp
mov ebp,esp
mov eax,dword [ebp+(8)]
mov edx,dword [ebp+(12)]
lea eax,[edx+eax]
@1:
pop ebp
ret
..?X_add:
[section .drectve]
db " -defaultlib:crt"
[cpu pentium]    

using
Quote:
pocc -Tx86-asm test.c -Fotest.asm


But if I use -Os to optimize for size, it generates:
Code:
[global _add]
[section .text]
#line 1 "test.c"
[function _add]
_add:
[fpo _add, ..?X_add-_add, 0, 2, 0, 0, 0, 0]
mov eax,dword [esp+(4)]
mov edx,dword [esp+(8)]
add eax,edx
@1:
ret
..?X_add:
[section .drectve]
db " -defaultlib:crt"
[cpu pentium]
    


With -Ot (for speed) it generates the same code as with -Os.

GCC always generates this:
Code:
        .file   "test.c"
        .text
.globl _add
        .def    _add;   .scl    2;      .type   32;     .endef
_add:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        addl    8(%ebp), %eax
        popl    %ebp
        ret
    
Post 14 Aug 2007, 14:29
OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
If I use 3 numbers (int x, int y, int z), Pelles C still loads every argument into a register before adding:
Code:
[global _add]
[section .text]
#line 1 "test.c"
[function _add]
_add:
[fpo _add, ..?X_add-_add, 0, 3, 0, 0, 0, 0]
mov eax,dword [esp+(4)]
mov edx,dword [esp+(8)]
add eax,edx
mov edx,dword [esp+(12)]
add eax,edx
@1:
ret
..?X_add:
[align 16]
[section .drectve]
db " -defaultlib:crt"
[cpu ppro]
    


While GCC adds directly from memory to the register:
Code:
        .file   "test.c"
        .text
.globl _add
        .def    _add;   .scl    2;      .type   32;     .endef
_add:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        addl    8(%ebp), %eax
        addl    16(%ebp), %eax
        popl    %ebp
        ret
    


I wonder which is faster: adding directly from memory to EAX, or MOVing everything to registers and then adding? I suppose in that sample GCC wins, right?
But if we kept adding numbers in a loop, it would be better to keep them in registers, right?
Post 14 Aug 2007, 14:36
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Silly that GCC sets up a stack frame since no locals are used...

Personally I see no reason to move into registers for this code snippet, so I'm guessing that it's a generic compiler heuristic. It makes sense if the arguments are reused or if they're pointers (you need to load them into a register then, for indirection), and I guess most code is closer to that than to this silly little snippet :)
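
The three cases can be sketched in C as follows. The function names are made up for illustration, and the comments describe what a compiler would plausibly do, not what any particular one is guaranteed to emit.

```c
/* Each argument is used once: adding straight from the stack slot is enough. */
int add_once(int x, int y)
{
    return x + y;
}

/* Arguments are reused: loading them into registers first avoids reading
   the same stack slot twice. */
int add_reused(int x, int y)
{
    return (x + y) * (x - y);
}

/* Arguments are pointers: each one has to be loaded into a register
   before it can be dereferenced. */
int add_indirect(const int *p, const int *q)
{
    return *p + *q;
}
```

A "load arguments first" heuristic costs little in the first case and is the natural choice in the other two, which may be why a compiler applies it everywhere.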
Post 15 Aug 2007, 11:21
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Ozzy probably forgot to pass the proper parameters. GCC does suppress it; just use "-fomit-frame-pointer" (I don't remember whether that option is included in -O3, -O2 or -O1).
Post 15 Aug 2007, 13:33
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Loco: then it suppresses it for all functions though, including those with local variables... which might not always be what you want...
Post 15 Aug 2007, 14:27
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Optimize-Options.html#Optimize-Options wrote:
-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os.


Also says
Quote:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
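
One way to see what the flag does on a given setup is to compile a small test file both ways and compare the assembly. The sketch below is only an assumption about a convenient test case: the file name `test.c` and the function `sum_squares` are made up, and the two command lines in the comment use the standard GCC options `-S`, `-O2`, `-fomit-frame-pointer` and `-fno-omit-frame-pointer`.

```c
/*
 * Compile to assembly with and without a frame pointer, then compare:
 *   gcc -S -O2 -fomit-frame-pointer    test.c -o omit.s
 *   gcc -S -O2 -fno-omit-frame-pointer test.c -o keep.s
 */

/* No locals: nothing here needs a stack frame. */
int add(int x, int y)
{
    return x + y;
}

/* Has a local array, so the function uses stack space of its own. */
int sum_squares(int n)
{
    int buf[8];
    int i, total = 0;

    if (n > 8)
        n = 8; /* stay inside buf */
    for (i = 0; i < n; i++)
        buf[i] = i * i;
    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Looking for the `pushl %ebp` / `movl %esp, %ebp` prologue in each output shows which functions still set up a frame with each setting.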
Post 15 Aug 2007, 15:03
Rahsennor



Joined: 07 Jul 2007
Posts: 61
Rahsennor
Quote:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.


It interferes with debugging on x86, so you need to turn it on manually. :D
Post 21 Aug 2007, 09:27
xspeed



Joined: 16 Aug 2007
Posts: 22
xspeed
What kind of debugging program are you talking about?

You should check the instruction set chapter of Randall Hyde's book. He discusses the advantages and disadvantages of lea compared with other instructions like mov/add.
I'm not sure which edition of the book.
Post 21 Aug 2007, 14:41
Hayden



Joined: 06 Oct 2005
Posts: 132
Hayden
you would think that a good compiler would do something like this:
Code:
; sample compiler output for z = y + z

mov eax, [ds:y_var]
add [ds:z_var], eax    ; eax destroyed
    


I suppose it all depends on the compiler's anti-AGI-stall strategy...
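
For reference, a C source that would correspond to the z = y + z case above might look like this. The variable names mirror the assembly labels and the function name `accumulate` is made up; whether a compiler emits a load followed by an add to memory is up to the compiler.

```c
int y_var = 3;
int z_var = 4;

void accumulate(void)
{
    z_var += y_var; /* z = y + z: read y, then add it into z in place */
}
```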

_________________
New User.. Hayden McKay.
Post 26 Aug 2007, 19:54

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.