flat assembler
Message board for the users of flat assembler.
Error when using the lea instruction? (page 2 of 2)
Mino 24 Aug 2018, 12:27
The optimization option is simply -O. As for the rest, I don't know; I'm just an ordinary user of this tool, sorry.
DimonSoft 24 Aug 2018, 16:36
Furs wrote: -O is not optimized. Not even -O1 should be considered as optimized. And even then, I doubt the tool leaves the code unmodified. After all, it has to show the disassembly of the procedure, so it might add something to prevent inlining. Or, judging from the registers used to pass parameters, it might just be a single line of code written as if it were a procedure.
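For comparison, GCC lets you prevent inlining yourself. A minimal sketch, assuming GCC or Clang (both accept the `noinline` function attribute); the names `add_noinline`, `add_inlinable` and `use_both` are hypothetical. Compiling it with `gcc -O2 -S` and reading the listing should show `add_noinline` kept as its own procedure and reached with a real `call`, while `add_inlinable` is typically folded into its caller.

```c
/* GCC/Clang extension: forbid inlining, so callers must emit a real
   `call` and the function keeps its own body in the listing. */
__attribute__((noinline)) int add_noinline(int a, int b)
{
    return a + b;
}

/* No attribute and internal linkage: at -O2 GCC is free to inline this
   into its callers and drop the out-of-line copy. */
static int add_inlinable(int a, int b)
{
    return a + b;
}

/* Hypothetical caller used only to compare the two in the listing. */
int use_both(int x)
{
    return add_noinline(x, 2) + add_inlinable(x, 2);
}
```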
Mino 24 Aug 2018, 19:07
Try the tool yourself if you don't believe me.
-O (with this tool) performs maximum-level optimization of the code: whether you set -O2 or -O3, the result is the same.
DimonSoft 24 Aug 2018, 19:55
Mino wrote: Try the tool yourself if you don't believe me.
I do not “believe” or “don’t believe”. I just say that the experiment is almost definitely affected by the tool itself.
Mino 24 Aug 2018, 20:17
It is possible.
Still, I find it very good and practical.
Furs 25 Aug 2018, 15:55
Mino wrote: Try the tool yourself if you don't believe me.
But you know there's a difference between emitting the function and actually calling it, right? If it isn't static (i.e. doesn't have internal linkage), the compiler has to emit the function even though it inlines the calls to it. Make the function static, use -O2, and behold.
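The linkage point can be checked with one small translation unit. A minimal sketch, assuming GCC at -O2 and that you inspect the output of `gcc -O2 -S`; the names `add_ext`, `add_static` and `caller` are hypothetical. One would expect `add_ext` to still appear as a label in the listing while `add_static` has no body of its own, and `caller` to contain no `call` instruction; the exact instructions depend on the GCC version.

```c
/* External linkage: other translation units may call add_ext, so GCC
   must emit its body even if every call in this file gets inlined. */
int add_ext(int a, int b)
{
    return a + b;
}

/* Internal linkage: only this file can call add_static, so once its
   calls are inlined GCC normally emits no separate body for it. */
static int add_static(int a, int b)
{
    return a + b;
}

/* Both calls are candidates for inlining; at -O2 this should reduce
   to plain arithmetic on the argument register. */
int caller(int x)
{
    return add_ext(x, 2) + add_static(x, 3);
}
```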
rugxulo 26 Aug 2018, 02:01
GCC has -finline-functions (which is only auto-enabled at -O3). Of course, it has its own arbitrary heuristics for determining whether that is feasible or not.
Mino 26 Aug 2018, 11:37
GCC optimizes to such an extent that it directly replaces the data in the program if it is known, even if it means not generating unnecessary code.
For example:
Code:
    static int add(int a, int b) { return a + b; }
    int main() { int foo = add(8, 2); }
gives just (with -O2):
Code:
    main:
            rep ret
To get a better idea of how the 'add' function is optimized, I used the parameter of the main function:
Code:
    static int add(int a, int b) { return a + b; }
    int main(int argc) { return add(argc, 2); }
gives (still with the optimization option):
Code:
    main:
            lea     eax, [rdi+2]
            ret
It also seems that GCC no longer takes into account the concept of "stack" in the generated code. Example:
Code:
    static int fibs(int n)
    {
        return ((n == 0 || n == 1) ? n : (fibs(n - 1) + fibs(n - 2)));
    }

    int main()
    {
        int n = 20;
        int fib[n];
        /* note: with i <= n, the last iteration writes fib[20], one past the end of fib */
        for (int i = 0; i <= n; ++i)
            fib[i] = fibs(i);
        return 0;
    }
gives:
Code:
    fibs:
            cmp     edi, 1
            push    r12
            mov     r12d, edi
            push    rbp
            push    rbx
            jbe     .L4
            mov     ebx, edi
            xor     ebp, ebp
    .L3:
            lea     edi, [rbx-1]
            sub     ebx, 2
            call    fibs
            add     ebp, eax
            cmp     ebx, 1
            ja      .L3
            and     r12d, 1
    .L2:
            lea     eax, [rbp+0+r12]
            pop     rbx
            pop     rbp
            pop     r12
            ret
    .L4:
            xor     ebp, ebp
            jmp     .L2
    main:
            push    rbx
            xor     ebx, ebx
    .L9:
            mov     edi, ebx
            add     ebx, 1
            call    fibs
            cmp     ebx, 21
            jne     .L9
            xor     eax, eax
            pop     rbx
            ret
Without optimization:
Code:
    fibs:
            push    rbp
            mov     rbp, rsp
            push    rbx
            sub     rsp, 24
            mov     DWORD PTR [rbp-20], edi
            cmp     DWORD PTR [rbp-20], 0
            je      .L2
            cmp     DWORD PTR [rbp-20], 1
            je      .L2
            mov     eax, DWORD PTR [rbp-20]
            sub     eax, 1
            mov     edi, eax
            call    fibs
            mov     ebx, eax
            mov     eax, DWORD PTR [rbp-20]
            sub     eax, 2
            mov     edi, eax
            call    fibs
            add     eax, ebx
            jmp     .L3
    .L2:
            mov     eax, DWORD PTR [rbp-20]
    .L3:
            add     rsp, 24
            pop     rbx
            pop     rbp
            ret
    main:
            push    rbp
            mov     rbp, rsp
            push    rbx
            sub     rsp, 40
            mov     rax, rsp
            mov     rbx, rax
            mov     DWORD PTR [rbp-24], 20
            mov     eax, DWORD PTR [rbp-24]
            movsx   rdx, eax
            sub     rdx, 1
            mov     QWORD PTR [rbp-32], rdx
            movsx   rdx, eax
            mov     r8, rdx
            mov     r9d, 0
            movsx   rdx, eax
            mov     rsi, rdx
            mov     edi, 0
            cdqe
            sal     rax, 2
            lea     rdx, [rax+3]
            mov     eax, 16
            sub     rax, 1
            add     rax, rdx
            mov     edi, 16
            mov     edx, 0
            div     rdi
            imul    rax, rax, 16
            sub     rsp, rax
            mov     rax, rsp
            add     rax, 3
            shr     rax, 2
            sal     rax, 2
            mov     QWORD PTR [rbp-40], rax
            mov     DWORD PTR [rbp-20], 0
            jmp     .L6
    .L7:
            mov     eax, DWORD PTR [rbp-20]
            mov     edi, eax
            call    fibs
            mov     ecx, eax
            mov     rax, QWORD PTR [rbp-40]
            mov     edx, DWORD PTR [rbp-20]
            movsx   rdx, edx
            mov     DWORD PTR [rax+rdx*4], ecx
            add     DWORD PTR [rbp-20], 1
    .L6:
            mov     eax, DWORD PTR [rbp-20]
            cmp     eax, DWORD PTR [rbp-24]
            jle     .L7
            mov     eax, 0
            mov     rsp, rbx
            mov     rbx, QWORD PTR [rbp-8]
            leave
            ret
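Reading the optimized fibs listing back into C helps show what happened to the "stack": the fibs(n - 2) call has been turned into a loop, and the working values live in callee-saved registers (rbx, rbp, r12) that are saved with push/pop, so the stack is still used, just not for local variables. The function below is a hand translation of that listing (hypothetical name `fibs_as_listed`), not actual GCC output, and `fill_fib` is a hypothetical replacement for the loop in main that stops at i < n to stay inside the array.

```c
/* Naive definition from the post, kept for comparison. */
static int fibs(int n)
{
    return (n == 0 || n == 1) ? n : fibs(n - 1) + fibs(n - 2);
}

/* Hand translation of the -O2 listing: one recursive call per loop
   iteration, with the fibs(n - 2) recursion replaced by the loop. */
int fibs_as_listed(int n)
{
    unsigned m = (unsigned)n;   /* ebx; the listing uses unsigned jbe/ja */
    if (m <= 1)                 /* cmp edi, 1 / jbe .L4 */
        return n;
    int acc = 0;                /* ebp */
    do {
        acc += fibs_as_listed((int)(m - 1)); /* lea edi, [rbx-1] / call fibs */
        m -= 2;                              /* sub ebx, 2 */
    } while (m > 1);                         /* cmp ebx, 1 / ja .L3 */
    return acc + (n & 1);       /* and r12d, 1 / lea eax, [rbp+0+r12] */
}

/* Fills fib[0..n-1]; unlike the post's i <= n loop, it never writes fib[n]. */
void fill_fib(int *fib, int n)
{
    for (int i = 0; i < n; ++i)
        fib[i] = fibs_as_listed(i);
}
```

The loop works because fib(n) = fib(n-1) + fib(n-3) + fib(n-5) + ... plus fib(0) or fib(1) at the end, and that last term is exactly n & 1.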
DimonSoft 26 Aug 2018, 15:20
Mino wrote: It also seems that GCC no longer takes into account the concept of "stack" in the generated code. <…>
Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.
Mino 26 Aug 2018, 16:49
The code is certainly smaller, but it seems less readable to me.
DimonSoft wrote: Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.
I'm not sure I understand what you mean by this remark.
Furs 26 Aug 2018, 16:51
Mino wrote: GCC optimizes to such an extent that it directly replaces the data in the program if it is known, even if it means not generating unnecessary code.
The lea in your example is already the inlined add: it is part of main, not of add, so the call has already been inlined.
DimonSoft 26 Aug 2018, 18:05
Mino wrote:
I was just impressed by the spaghetti GCC produces. Jumps are generally slower than non-branching code. This doesn’t really relate to the main topic.
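As an aside on branching versus non-branching code, a common trick is a mask-based select. A minimal sketch with hypothetical names; whether `max_branchy` actually compiles to a conditional jump or to `cmov` depends on the compiler and flags, so compare the listings rather than assume.

```c
/* Branching version: the compiler may emit a conditional jump here,
   or turn it into cmov on its own. */
int max_branchy(int a, int b)
{
    if (a < b)
        return b;
    return a;
}

/* Branchless version: -(a < b) is all ones when a < b and 0 otherwise,
   so the mask keeps either (a ^ b) or 0, and the final xor yields b or a. */
int max_branchless(int a, int b)
{
    return a ^ ((a ^ b) & -(a < b));
}
```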
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.