flat assembler
Message board for the users of flat assembler.
add eax,ebx VS lea eax[esi+ebx]
rCX 12 Aug 2007, 17:34
I'm not sure myself, but I think it's important to point out that the C code modifies eax, ebx and esi, while the 2nd one modifies only eax and ebx. If esi contained a value you wanted to keep, you would have to put it somewhere else, increasing the number of instructions needed.
Edit: maybe you could use something like this...
Code:
    mov esi,2
    mov ebx,3
    lea ebx,[esi+ebx]
(If anyone knows where to find documentation on the number of clock ticks per instruction, please post it. I've been looking for one for a long time and have not been able to find one.)
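The point about which registers get overwritten can be modeled in plain C. This is only a sketch, not real register access: the `Regs` struct and the helpers `op_add`, `op_lea`, `sum_with_add` and `sum_with_lea` are hypothetical names introduced for illustration, and the model assumes plain 32-bit wrap-around arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of three x86 registers; not real register access. */
typedef struct {
    uint32_t eax, ebx, esi;
} Regs;

/* add dst, src: two-operand form, the destination is also a source. */
static void op_add(uint32_t *dst, uint32_t src) {
    *dst += src;
}

/* lea dst, [base+index]: the sum goes to a separate destination,
   so both source registers keep their values. */
static void op_lea(uint32_t *dst, uint32_t base, uint32_t index) {
    *dst = base + index;
}

/* Models: mov eax,esi / add eax,ebx (a copy first, then the add). */
static Regs sum_with_add(Regs r) {
    Regs out = r;
    out.eax = out.esi;
    op_add(&out.eax, out.ebx);
    /* Sanity check for the model: the sources are unchanged. */
    assert(out.ebx == r.ebx && out.esi == r.esi);
    return out;
}

/* Models: lea eax,[esi+ebx] (one instruction, no copy needed). */
static Regs sum_with_lea(Regs r) {
    Regs out = r;
    op_lea(&out.eax, out.esi, out.ebx);
    assert(out.ebx == r.ebx && out.esi == r.esi);
    return out;
}
```

Starting from ebx=3 and esi=2, both helpers end with eax=5 and leave ebx and esi alone; the difference is that the add version needs the extra mov when the sum has to land in a third register. The real lea also leaves the flags untouched, which this model does not represent.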
MazeGen 13 Aug 2007, 08:15
Ozzy, don't bother yourself with ADD vs. LEA speed. This hardly speeds up your algorithm.
rCX wrote:
http://www.agner.org/optimize/#manuals (instruction_tables.pdf)
f0dder 13 Aug 2007, 08:27
OzzY: for your code sequence, the compiler should generate neither ADD nor LEA; it should simply move the constant 5 into the destination. I guess a reason to use three registers could be to avoid a dependency.
Anyway, to avoid the it's-a-constant optimization, let's see what code VC2005sp1 generates for this code:
Code:
    extern int x, y, z;
    int main()
    {
        z = x + y;
        return z;
    }
With /Ox optimization, it turns into the following:
Code:
    mov eax, DWORD PTR ?x@@3HA ; x
    mov ecx, DWORD PTR ?y@@3HA ; y
    add eax, ecx
    mov DWORD PTR ?z@@3HA, eax ; z
    ret 0
Instead of saying "C compilers usually generate this code", you should really state which compiler generates the code.
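To make the constant-folding point concrete, here is a small C sketch. `sum_constants` and `sum_globals` are hypothetical names. Note one assumption that differs from the post: `x`, `y` and `z` are defined in this file only so that the example links on its own, whereas the post declares them `extern`. With the definitions visible, a compiler that optimizes across the whole program could in principle still fold the addition, so the generated code may not match the listing above.

```c
/* Globals with external linkage. Another translation unit could change
   them, so a compiler looking at this file alone normally has to load
   them at run time. */
int x = 2, y = 3, z;

/* Both operands are compile-time constants: an optimizing compiler can
   replace the addition with the constant 5. */
int sum_constants(void) {
    int a = 2;
    int b = 3;
    return a + b;
}

/* Operands come from memory: the addition has to happen at run time. */
int sum_globals(void) {
    z = x + y;
    return z;
}
```

Both functions return 5 with the initial values, but only `sum_globals` follows later changes to `x` and `y`, which is why the compiler cannot reduce it to a constant.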
Madis731 13 Aug 2007, 11:15
I've got instruction_tables.pdf printed out... for Core 2 only... for now.
I think Agner's site has been mentioned repeatedly on these boards before!!!
LocoDelAssembly 13 Aug 2007, 15:47
BTW, is Ozzy's code just this?
Code:
    void main(){int x=2; int y=3; z=x+y; }
Because if there is more code below, then the reason could be that x and y will be used later. Look at the registers the compiler chose for those variables, EBX and ESI; those are preserved across calls.
rCX 13 Aug 2007, 22:11
I was looking at the instruction table. Are "uops" or "latency" the best measure of an instruction's speed?
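Neither number settles the question on its own: latency matters when each instruction has to wait for the previous result, while for independent instructions it matters more how many can be in flight at once. As a hedged illustration in C (the function names are made up for this sketch, and an optimizing compiler is free to rearrange either loop), the first function forms one long dependency chain and the second splits the same work into four independent chains.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every addition needs the previous result, so the
   loop is a single dependency chain and is bounded by the latency of
   the add. */
double sum_one_chain(const double *v, size_t n) {
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: the four chains do not depend on each other, so a
   CPU that can overlap them is limited less by latency. */
double sum_four_chains(const double *v, size_t n) {
    assert(v != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

Timing the two on a large array is one way to see the difference on a given CPU. Keep in mind that floating-point addition is not associative, so the two functions are only guaranteed to return identical results when every partial sum is exactly representable.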
f0dder 13 Aug 2007, 23:20
LocoDelAssembly: good question - and another good question would be which compiler + settings he used.
For the code snippet you posted, assuming that 'z' is an extern, any decent compiler will simply generate a "mov [z], 5".
OzzY 14 Aug 2007, 02:25
I really meant for that code to be wrapped in a function, not with x and y already declared.
Example:
Code:
    int add(int x, int y)
    {
        return x+y;
    }
f0dder 14 Aug 2007, 11:36
Interesting, that code snippet generates the following assembly with VC2005:
Code:
    mov eax, DWORD PTR _y$[esp-4]
    mov ecx, DWORD PTR _x$[esp-4]
    add eax, ecx
Probably a compiler heuristic that says "load arguments to register before using"?
OzzY 14 Aug 2007, 14:29
Code:
    int add(int x, int y)
    {
        return x+y;
    }
From this, Pelles C generates this:
Code:
    [global _add]
    [section .text]
    #line 1 "test.c"
    [function _add]
    _add:
    push ebp
    mov ebp,esp
    mov eax,dword [ebp+(8)]
    mov edx,dword [ebp+(12)]
    lea eax,[edx+eax]
    @1:
    pop ebp
    ret
    ..?X_add:
    [section .drectve]
    db " -defaultlib:crt"
    [cpu pentium]
using
Quote:
    pocc -Tx86-asm test.c -Fotest.asm
But if I use -Os to optimize for size, it generates:
Code:
    [global _add]
    [section .text]
    #line 1 "test.c"
    [function _add]
    _add:
    [fpo _add, ..?X_add-_add, 0, 2, 0, 0, 0, 0]
    mov eax,dword [esp+(4)]
    mov edx,dword [esp+(8)]
    add eax,edx
    @1:
    ret
    ..?X_add:
    [section .drectve]
    db " -defaultlib:crt"
    [cpu pentium]
With -Ot (for speed) it generates the same code as with -Os.
GCC always generates this:
Code:
    .file "test.c"
    .text
    .globl _add
    .def _add; .scl 2; .type 32; .endef
    _add:
    pushl %ebp
    movl %esp, %ebp
    movl 12(%ebp), %eax
    addl 8(%ebp), %eax
    popl %ebp
    ret
OzzY 14 Aug 2007, 14:36
If I use 3 numbers (int x, int y, int z), Pelles C still tries to keep everything in different registers:
Code:
    [global _add]
    [section .text]
    #line 1 "test.c"
    [function _add]
    _add:
    [fpo _add, ..?X_add-_add, 0, 3, 0, 0, 0, 0]
    mov eax,dword [esp+(4)]
    mov edx,dword [esp+(8)]
    add eax,edx
    mov edx,dword [esp+(12)]
    add eax,edx
    @1:
    ret
    ..?X_add:
    [align 16]
    [section .drectve]
    db " -defaultlib:crt"
    [cpu ppro]
While GCC adds directly from memory to the register:
Code:
    .file "test.c"
    .text
    .globl _add
    .def _add; .scl 2; .type 32; .endef
    _add:
    pushl %ebp
    movl %esp, %ebp
    movl 12(%ebp), %eax
    addl 8(%ebp), %eax
    addl 16(%ebp), %eax
    popl %ebp
    ret
I wonder which is faster: adding directly from memory to EAX, or MOVing everything to registers and then adding? I suppose in that sample GCC wins, right? But if we kept adding numbers in a loop, then it would be better to keep them in registers, right?
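For anyone who wants to repeat the comparison, the three cases discussed so far fit in one small test.c. This is a sketch under a few assumptions: `add3` and `add_many` are names made up here (the thread calls both fixed-size versions `add`), and the loop version is only one possible reading of "adding numbers in a loop".

```c
#include <assert.h>
#include <stddef.h>

/* To look at the generated assembly:
     gcc -S -O2 test.c                   (writes test.s)
     pocc -Tx86-asm test.c -Fotest.asm   (command quoted in the thread) */

/* Two-argument version from the thread. */
int add(int x, int y) {
    return x + y;
}

/* Three-argument version; the name add3 is made up for this sketch. */
int add3(int x, int y, int z) {
    return x + y + z;
}

/* Summing in a loop: total is a local, so the compiler is free to keep
   it in a register for the whole loop and only read values[i] from
   memory on each pass. */
int add_many(const int *values, int count) {
    assert(values != NULL || count <= 0);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

In `add` and `add3` each argument is read exactly once, so loading it into a register first buys nothing by itself. In `add_many` the running total is reused on every iteration, which is the case where keeping a value in a register pays off.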
f0dder 15 Aug 2007, 11:21
Silly that GCC sets up a stack frame since no locals are used...
Personally I see no reason to move into registers for this code snippet, so I'm guessing that it's a generic compiler heuristic. It makes sense if the arguments are reused or if they're pointers (you need to load them into a register then, for indirection), and I guess most code is closer to that than to this silly little snippet.
LocoDelAssembly 15 Aug 2007, 13:33
Ozzy probably forgot to use the proper params. GCC does suppress it, just use "-fomit-frame-pointer" (I don't remember if that param is included in -O3, -O2 or -O1).
f0dder 15 Aug 2007, 14:27
Loco: then it suppresses it for all functions though, including those with local variables... which might not always be what you want...
LocoDelAssembly 15 Aug 2007, 15:03
http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Optimize-Options.html#Optimize-Options wrote:
    -fomit-frame-pointer
Also says
Quote:
    -O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
Rahsennor 21 Aug 2007, 09:27
Quote:
    -O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
It interferes with debugging on x86, so you need to turn it on manually.
xspeed 21 Aug 2007, 14:41
What kind of debugging program are you talking about?
You should check the instruction set chapter of Randall Hyde's book. He discusses the advantages and disadvantages of lea compared with other instructions like mov/add. I'm not sure which version of the book.
Hayden 26 Aug 2007, 19:54
You would think that a good compiler would do something like this:
Code:
    ; sample compiler output for z = y + z
    mov eax, [ds:y_var]
    add [ds:z_var], eax ; eax destroyed
I suppose it all depends on the compiler's anti-AGI-stall strategy...
_________________
New User.. Hayden McKay.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.