Error when using the lea instruction?

Mino

Joined: 14 Jan 2018
Posts: 163

Mino 24 Aug 2018, 12:27

The optimization option is simply -O. As for the rest, I don't know. I'm just an ordinary user of this tool, sorry.

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 24 Aug 2018, 15:07

-O is not optimized. Not even -O1 (which is what plain -O means) should be considered optimized.

You need at least -O2 (there are also -O3 and -Ofast, the latter being even more aggressive by relaxing float rules and such).

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 24 Aug 2018, 16:36

Furs wrote:

-O is not optimized. Not even -O1 (which is what plain -O means) should be considered optimized.

You need at least -O2 (there are also -O3 and -Ofast, the latter being even more aggressive by relaxing float rules and such).

And even then I doubt that the tool performs no modifications of its own. After all, it has to show the disassembly of the procedure, so it might add something to prevent inlining. Or, judging from the registers used to pass parameters, it might just be a single line of code written as if it were a procedure.

Mino

Joined: 14 Jan 2018
Posts: 163

Mino 24 Aug 2018, 19:07

Try the tool yourself if you don't believe me.

-O (with this tool) performs maximum-level optimization of the code; whether you set -O2 or -O3, the result is the same.

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 24 Aug 2018, 19:55

Mino wrote:

Try the tool yourself if you don't believe me.

-O (with this tool) performs maximum-level optimization of the code; whether you set -O2 or -O3, the result is the same.

I neither “believe” nor “disbelieve”. I'm just saying that the experiment is almost certainly affected by the tool itself.

Mino

Joined: 14 Jan 2018
Posts: 163

Mino 24 Aug 2018, 20:17

That's possible.
Still, I find the tool very good and practical.

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 25 Aug 2018, 15:55

Mino wrote:

Try the tool yourself if you don't believe me.

-O (with this tool) performs maximum-level optimization of the code; whether you set -O2 or -O3, the result is the same.

Can you copy-paste the code here? godbolt doesn't work for me right now.

But you know there's a difference between emitting the function and actually calling it, right? If it doesn't have internal linkage (e.g. it isn't static), the compiler has to emit a standalone copy of the function even when it inlines every call to it.

Make the function static, use -O2 and behold.
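A minimal sketch of the linkage point, assuming GCC on x86-64 at -O2. The names add_extern, add_static and caller are made up for illustration. With external linkage the compiler still has to keep a standalone body, because another translation unit might call it; with internal linkage and every call inlined, no separate body is needed.

```c
/* External linkage: another translation unit may call this, so the
   compiler has to keep a standalone copy even if it inlines every
   call it can see. */
int add_extern(int a, int b) {
    return a + b;
}

/* Internal linkage: only this file can call it, so once all calls are
   inlined the compiler is free to drop the standalone body. */
static int add_static(int a, int b) {
    return a + b;
}

/* Hypothetical caller; at -O2 both calls are expected to be inlined. */
int caller(int x) {
    return add_extern(x, 2) + add_static(x, 3);
}
```

Compiling this with gcc -O2 -S and searching the output for an add_static label is one way to check whether the standalone body was dropped.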

rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)

rugxulo 26 Aug 2018, 02:01

GCC has -finline-functions (which is only auto-enabled at -O3). Of course, it has its own arbitrary heuristics for determining whether that is feasible or not.
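When the heuristics decide differently from what you want, function attributes can override them. A small sketch, assuming GCC or Clang (these attributes are compiler extensions, not standard C); the function names are hypothetical:

```c
/* noinline: ask the compiler never to inline this function, so calls
   to it remain real calls in the generated code. */
__attribute__((noinline))
static int add_kept(int a, int b) {
    return a + b;
}

/* always_inline: ask the compiler to inline this function even where
   its heuristics would not (GCC reports an error if it cannot). */
static inline __attribute__((always_inline))
int add_forced(int a, int b) {
    return a + b;
}

/* Hypothetical caller using one function of each kind. */
int use_both(int x) {
    return add_kept(x, 2) + add_forced(x, 3);
}
```

This is handy when inspecting disassembly: noinline keeps the function you want to look at from dissolving into its caller.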

Mino

Joined: 14 Jan 2018
Posts: 163

Mino 26 Aug 2018, 11:37

GCC optimizes to such an extent that it substitutes values directly when they are known at compile time, even to the point of generating no code at all for work that turns out to be unnecessary.
For example:

Code:

static int add(int a, int b) {
    return a + b;
}

int main() {
    int foo = add(8, 2);
}

gives just (with -O2):

Code:

main:
        rep ret

To get a better idea of how the 'add' function itself is optimized, I passed it a parameter of the main function:

Code:

static int add(int a, int b) {
    return a + b;
}

int main(int argc, char **argv) {
    return add(argc, 2);
}

gives (again with the optimization option):

Code:

main:
        lea     eax, [rdi+2]
        ret

It also seems that GCC no longer takes the concept of a "stack" into account in the generated code. Example:

Code:

static int fibs(int n) {
    return ((n == 0 || n == 1) ? n : (fibs(n - 1) + fibs(n - 2)));
}

int main() {
  int n = 20;
  int fib[n];
  for (int i = 0; i <= n; ++i) fib[i] = fibs(i);
  return 0;
}

Gives:

Code:

fibs:
        cmp     edi, 1
        push    r12
        mov     r12d, edi
        push    rbp
        push    rbx
        jbe     .L4
        mov     ebx, edi
        xor     ebp, ebp
.L3:
        lea     edi, [rbx-1]
        sub     ebx, 2
        call    fibs
        add     ebp, eax
        cmp     ebx, 1
        ja      .L3
        and     r12d, 1
.L2:
        lea     eax, [rbp+0+r12]
        pop     rbx
        pop     rbp
        pop     r12
        ret
.L4:
        xor     ebp, ebp
        jmp     .L2
main:
        push    rbx
        xor     ebx, ebx
.L9:
        mov     edi, ebx
        add     ebx, 1
        call    fibs
        cmp     ebx, 21
        jne     .L9
        xor     eax, eax
        pop     rbx
        ret

Without optimization:

Code:

fibs:
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 24
        mov     DWORD PTR [rbp-20], edi
        cmp     DWORD PTR [rbp-20], 0
        je      .L2
        cmp     DWORD PTR [rbp-20], 1
        je      .L2
        mov     eax, DWORD PTR [rbp-20]
        sub     eax, 1
        mov     edi, eax
        call    fibs
        mov     ebx, eax
        mov     eax, DWORD PTR [rbp-20]
        sub     eax, 2
        mov     edi, eax
        call    fibs
        add     eax, ebx
        jmp     .L3
.L2:
        mov     eax, DWORD PTR [rbp-20]
.L3:
        add     rsp, 24
        pop     rbx
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 40
        mov     rax, rsp
        mov     rbx, rax
        mov     DWORD PTR [rbp-24], 20
        mov     eax, DWORD PTR [rbp-24]
        movsx   rdx, eax
        sub     rdx, 1
        mov     QWORD PTR [rbp-32], rdx
        movsx   rdx, eax
        mov     r8, rdx
        mov     r9d, 0
        movsx   rdx, eax
        mov     rsi, rdx
        mov     edi, 0
        cdqe
        sal     rax, 2
        lea     rdx, [rax+3]
        mov     eax, 16
        sub     rax, 1
        add     rax, rdx
        mov     edi, 16
        mov     edx, 0
        div     rdi
        imul    rax, rax, 16
        sub     rsp, rax
        mov     rax, rsp
        add     rax, 3
        shr     rax, 2
        sal     rax, 2
        mov     QWORD PTR [rbp-40], rax
        mov     DWORD PTR [rbp-20], 0
        jmp     .L6
.L7:
        mov     eax, DWORD PTR [rbp-20]
        mov     edi, eax
        call    fibs
        mov     ecx, eax
        mov     rax, QWORD PTR [rbp-40]
        mov     edx, DWORD PTR [rbp-20]
        movsx   rdx, edx
        mov     DWORD PTR [rax+rdx*4], ecx
        add     DWORD PTR [rbp-20], 1
.L6:
        mov     eax, DWORD PTR [rbp-20]
        cmp     eax, DWORD PTR [rbp-24]
        jle     .L7
        mov     eax, 0
        mov     rsp, rbx
        mov     rbx, QWORD PTR [rbp-8]
        leave
        ret
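Two details of the fibs example are worth noting. The array fib is written but never read, which is presumably why the optimized main contains no stores at all. Also, the loop writes fib[n] although the array has only n elements. A sketch of a variant that avoids both, assuming the intent was to fill fib[0] through fib[n]; the function name fill_fibs and its parameters are made up for illustration:

```c
static int fibs(int n) {
    return (n == 0 || n == 1) ? n : fibs(n - 1) + fibs(n - 2);
}

/* Writes fibs(0) .. fibs(n) into out, which must have room for n + 1
   ints. The array belongs to the caller, so the stores are visible
   outside the function and cannot simply be discarded. */
void fill_fibs(int *out, int n) {
    for (int i = 0; i <= n; ++i)
        out[i] = fibs(i);
}
```

A caller would pass an array of n + 1 elements, for example int buf[21] for n = 20.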

_________________
The best way to predict the future is to invent it.

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 26 Aug 2018, 15:20

Mino wrote:

It also seems that GCC no longer takes the concept of a "stack" into account in the generated code. <…>

Code:

fibs:
        cmp     edi, 1
        push    r12
        mov     r12d, edi
        push    rbp
        push    rbx
        jbe     .L4
        mov     ebx, edi
        xor     ebp, ebp
.L3:
        lea     edi, [rbx-1]
        sub     ebx, 2
        call    fibs
        add     ebp, eax
        cmp     ebx, 1
        ja      .L3
        and     r12d, 1
.L2:
        lea     eax, [rbp+0+r12]
        pop     rbx
        pop     rbp
        pop     r12
        ret
.L4:
        xor     ebp, ebp
        jmp     .L2
main:
        push    rbx
        xor     ebx, ebx
.L9:
        mov     edi, ebx
        add     ebx, 1
        call    fibs
        cmp     ebx, 21
        jne     .L9
        xor     eax, eax
        pop     rbx
        ret

Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.

Mino

Joined: 14 Jan 2018
Posts: 163

Mino 26 Aug 2018, 16:49

The code is certainly smaller, but it seems to me less practical to read.

DimonSoft wrote:

Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.

I'm not sure I understand the meaning of this remark :)

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 26 Aug 2018, 16:51

Mino wrote:

GCC optimizes to such an extent that it substitutes values directly when they are known at compile time, even to the point of generating no code at all for work that turns out to be unnecessary.
For example:

Code:

static int add(int a, int b) {
    return a + b;
}

int main() {
    int foo = add(8, 2);
}

gives just (with -O2):

Code:

main:
        rep ret

Of course it does; your main function doesn't do anything. The call to add is useless since its return value is never used, so all that's left is a "ret". (The rep prefix is useless here; AFAIK it's only there as tuning for an old architecture.)

The lea in your second example is the inlined add: it's part of main, and no separate add function is emitted.
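To see the constant being substituted rather than an empty function, the result has to be used. A sketch, with the hypothetical function folded standing in for main; at -O2 one would expect GCC to reduce it to loading the constant 10 and returning, with no addition performed at run time:

```c
static int add(int a, int b) {
    return a + b;
}

/* The result is returned, so it is observable and cannot be thrown
   away; both arguments are constants, so the sum can be computed at
   compile time. */
int folded(void) {
    return add(8, 2);
}
```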

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 26 Aug 2018, 18:05

Mino wrote:

DimonSoft wrote:
Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.

I'm not sure I understand the meaning of this remark :)

I was just impressed by the spaghetti GCC produces. Jumps are generally slower than non-branching code. This doesn’t really relate to the main topic.
