flat assembler
Message board for the users of flat assembler.

flat assembler > Main > Error when using the lea instruction?

Mino



Joined: 14 Jan 2018
Posts: 160
The optimization option is simply -O. For the rest, I don't know. I'm just an ordinary user of this tool, sorry.
Post 24 Aug 2018, 12:27
Furs



Joined: 04 Mar 2016
Posts: 1315
-O is not optimized. Not even -O1 (which is what plain -O means in GCC) should be considered optimized.

You need at least -O2 (but there's also -O3, and -Ofast, which is even more aggressive with floating-point rules and such).
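One way to see what GCC was actually asked to do is to check its predefined macros. This is a sketch assuming GCC (Clang defines the same macros), and `optimization_mode` / `report_optimization_mode` are hypothetical names: `__OPTIMIZE__` is defined for any optimizing build, plain -O as well as -O2 and -O3, and `__OPTIMIZE_SIZE__` is additionally defined for -Os. So the macros tell you whether optimization is on, but not which level.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reports what GCC's predefined macros say about this build.
   -O, -O1, -O2, -O3 and -Ofast all define __OPTIMIZE__, so the level itself
   cannot be recovered from it; -Os additionally defines __OPTIMIZE_SIZE__. */
const char *optimization_mode(void) {
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size (-Os)";
#elif defined(__OPTIMIZE__)
    return "optimizing (some level from -O/-O1 upwards)";
#else
    return "not optimizing (-O0)";
#endif
}

/* Usage: call from main() and build the same file with -O0, -O and -O2. */
void report_optimization_mode(void) {
    const char *mode = optimization_mode();
    assert(mode != NULL);
    printf("%s\n", mode);
}
```

Building the same file with -O and with -O2 prints the same line, which is consistent with Furs's point that the level has to be checked on the command line, not inferred.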
Post 24 Aug 2018, 15:07
DimonSoft



Joined: 03 Mar 2010
Posts: 451
Location: Belarus
Furs wrote:
-O is not optimized. Not even -O1 should be considered as optimized.

You need at least -O2 (but there's also -O3 and -Ofast which is even more aggressive with float rules and such)

And even then I doubt the tool leaves the code unmodified. After all, it has to show the disassembly of the procedure, so it might add something to prevent inlining. Or, judging from the registers used to pass parameters, it might just be a single line of code written as if it were a procedure.
Post 24 Aug 2018, 16:36
Mino



Joined: 14 Jan 2018
Posts: 160
Try the tool yourself if you don't believe me.

-O (with this tool) performs maximum-level optimization of the code: whether you set -O2 or -O3, the result is the same.
Post 24 Aug 2018, 19:07
DimonSoft



Joined: 03 Mar 2010
Posts: 451
Location: Belarus
Mino wrote:
Try the tool yourself if you don't believe me.

-O (with this tool) performs maximum-level optimization of the code: whether you set -O2 or -O3, the result is the same.

I neither “believe” nor “disbelieve”. I’m just saying that the experiment is almost certainly affected by the tool itself.
Post 24 Aug 2018, 19:55
Mino



Joined: 14 Jan 2018
Posts: 160
It is possible.
Still, I find it very good and practical.
Post 24 Aug 2018, 20:17
Furs



Joined: 04 Mar 2016
Posts: 1315
Mino wrote:
Try the tool yourself if you don't believe me.

-O (with this tool) performs maximum-level optimization of the code: whether you set -O2 or -O3, the result is the same.
Can you copy-paste the code here? Godbolt doesn't work for me right now.

But you know there's a difference between emitting the function and actually calling it, right? If the function isn't static (i.e. it has external rather than internal linkage), the compiler still has to emit it, even though it inlines the calls to it.

Make the function static, use -O2 and behold.
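A minimal sketch of this point, assuming GCC at -O2 on x86-64 (the names `add_extern`, `add_two` and `check_add` are hypothetical): the static `add` has internal linkage, so once its only call is inlined GCC usually emits no separate copy of it, while the non-static `add_extern` must still be emitted because another translation unit could call it.

```c
#include <assert.h>

/* Internal linkage: only this file can call it, so after inlining
   GCC at -O2 usually drops the standalone copy entirely. */
static int add(int a, int b) {
    return a + b;
}

/* External linkage: other translation units may call it, so an
   out-of-line copy must be emitted even if local calls are inlined. */
int add_extern(int a, int b) {
    return a + b;
}

/* The call to add() is typically inlined down to lea eax, [rdi+2]. */
int add_two(int x) {
    return add(x, 2);
}

void check_add(void) {
    assert(add_two(8) == 10);
    assert(add_extern(8, 2) == 10);
}
```

Compiling this with `gcc -O2 -S` and looking at the generated assembly should show a label for `add_extern` and `add_two` but, typically, none for `add`.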
Post 25 Aug 2018, 15:55
rugxulo



Joined: 09 Aug 2005
Posts: 2311
Location: Usono (aka, USA)
GCC has -finline-functions (which is only auto-enabled at -O3). Of course, it has its own arbitrary heuristics for determining whether that is feasible or not.
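To study inlining without depending on those heuristics, GCC (and Clang) accept the `always_inline` and `noinline` function attributes. This is a sketch with hypothetical function names; `noinline` is one plausible way a tool could keep a procedure visible as such in a disassembly, although nothing in the thread confirms what the tool actually does.

```c
#include <assert.h>

/* GCC/Clang extension: inline this at every call site, regardless of the
   -finline-functions heuristics (normally paired with the inline keyword). */
static inline __attribute__((always_inline)) int add_inlined(int a, int b) {
    return a + b;
}

/* GCC/Clang extension: never inline calls to this function, even at -O3,
   so it remains a separate procedure that is actually called. */
__attribute__((noinline)) int add_called(int a, int b) {
    return a + b;
}

int sum_both(int x) {
    return add_inlined(x, 1) + add_called(x, 2);
}

void check_sum_both(void) {
    assert(sum_both(3) == 9); /* (3 + 1) + (3 + 2) */
}
```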
Post 26 Aug 2018, 02:01
Mino



Joined: 14 Jan 2018
Posts: 160
GCC optimizes to such an extent that it substitutes values directly into the program when they are known at compile time, and skips generating code that isn't needed.
For example:

Code:
static int add(int a, int b) {
    return a + b;
}

int main() {
    int foo = add(8, 2);
}
    


gives just (with -O2):

Code:
main:
        rep ret
    


To get a better idea of how the 'add' function gets optimized, I used a parameter of the main function:

Code:
static int add(int a, int b) {
    return a + b;
}

int main(int argc, char *argv[]) {
    (void)argv; /* unused; only argc (passed in edi) matters here */
    return add(argc, 2);
}
    


gives (still with the optimization option):

Code:
main:
        lea     eax, [rdi+2]
        ret
    


It also seems that GCC no longer takes the concept of a "stack" into account in the generated code. Example:

Code:
static int fibs(int n) {
    return ((n == 0 || n == 1) ? n : (fibs(n - 1) + fibs(n - 2)));
}

int main() {
  int n = 20;
  int fib[n];  /* note: only n elements, yet the loop writes fib[0]..fib[n], one past the end */
  for (int i = 0; i <= n; ++i) fib[i] = fibs(i);
  return 0;
}
    


Gives:

Code:
fibs:
        cmp     edi, 1
        push    r12
        mov     r12d, edi
        push    rbp
        push    rbx
        jbe     .L4
        mov     ebx, edi
        xor     ebp, ebp
.L3:
        lea     edi, [rbx-1]
        sub     ebx, 2
        call    fibs
        add     ebp, eax
        cmp     ebx, 1
        ja      .L3
        and     r12d, 1
.L2:
        lea     eax, [rbp+0+r12]
        pop     rbx
        pop     rbp
        pop     r12
        ret
.L4:
        xor     ebp, ebp
        jmp     .L2
main:
        push    rbx
        xor     ebx, ebx
.L9:
        mov     edi, ebx
        add     ebx, 1
        call    fibs
        cmp     ebx, 21
        jne     .L9
        xor     eax, eax
        pop     rbx
        ret
    


Without optimization:

Code:
fibs:
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 24
        mov     DWORD PTR [rbp-20], edi
        cmp     DWORD PTR [rbp-20], 0
        je      .L2
        cmp     DWORD PTR [rbp-20], 1
        je      .L2
        mov     eax, DWORD PTR [rbp-20]
        sub     eax, 1
        mov     edi, eax
        call    fibs
        mov     ebx, eax
        mov     eax, DWORD PTR [rbp-20]
        sub     eax, 2
        mov     edi, eax
        call    fibs
        add     eax, ebx
        jmp     .L3
.L2:
        mov     eax, DWORD PTR [rbp-20]
.L3:
        add     rsp, 24
        pop     rbx
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 40
        mov     rax, rsp
        mov     rbx, rax
        mov     DWORD PTR [rbp-24], 20
        mov     eax, DWORD PTR [rbp-24]
        movsx   rdx, eax
        sub     rdx, 1
        mov     QWORD PTR [rbp-32], rdx
        movsx   rdx, eax
        mov     r8, rdx
        mov     r9d, 0
        movsx   rdx, eax
        mov     rsi, rdx
        mov     edi, 0
        cdqe
        sal     rax, 2
        lea     rdx, [rax+3]
        mov     eax, 16
        sub     rax, 1
        add     rax, rdx
        mov     edi, 16
        mov     edx, 0
        div     rdi
        imul    rax, rax, 16
        sub     rsp, rax
        mov     rax, rsp
        add     rax, 3
        shr     rax, 2
        sal     rax, 2
        mov     QWORD PTR [rbp-40], rax
        mov     DWORD PTR [rbp-20], 0
        jmp     .L6
.L7:
        mov     eax, DWORD PTR [rbp-20]
        mov     edi, eax
        call    fibs
        mov     ecx, eax
        mov     rax, QWORD PTR [rbp-40]
        mov     edx, DWORD PTR [rbp-20]
        movsx   rdx, edx
        mov     DWORD PTR [rax+rdx*4], ecx
        add     DWORD PTR [rbp-20], 1
.L6:
        mov     eax, DWORD PTR [rbp-20]
        cmp     eax, DWORD PTR [rbp-24]
        jle     .L7
        mov     eax, 0
        mov     rsp, rbx
        mov     rbx, QWORD PTR [rbp-8]
        leave
        ret
    



_________________
The best way to predict the future is to invent it.
Post 26 Aug 2018, 11:37
DimonSoft



Joined: 03 Mar 2010
Posts: 451
Location: Belarus
Mino wrote:
It also seems that GCC no longer takes the concept of a "stack" into account in the generated code. <…>
Code:
fibs:
        cmp     edi, 1
        push    r12
        mov     r12d, edi
        push    rbp
        push    rbx
        jbe     .L4
        mov     ebx, edi
        xor     ebp, ebp
.L3:
        lea     edi, [rbx-1]
        sub     ebx, 2
        call    fibs
        add     ebp, eax
        cmp     ebx, 1
        ja      .L3
        and     r12d, 1
.L2:
        lea     eax, [rbp+0+r12]
        pop     rbx
        pop     rbp
        pop     r12
        ret
.L4:
        xor     ebp, ebp
        jmp     .L2
main:
        push    rbx
        xor     ebx, ebx
.L9:
        mov     edi, ebx
        add     ebx, 1
        call    fibs
        cmp     ebx, 21
        jne     .L9
        xor     eax, eax
        pop     rbx
        ret    

Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.
Post 26 Aug 2018, 15:20
Mino



Joined: 14 Jan 2018
Posts: 160
It's certainly smaller code, but it seems less readable to me.
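One way to make the -O2 listing readable is to translate it back into C. The sketch below is a hand translation, not GCC output, and `fibs_as_compiled` and `check_fibs` are hypothetical names. As far as the listing shows, GCC turned the `fibs(n - 2)` recursion into a loop that steps n down by 2, keeping only `fibs(n - 1)` as a real call, and finishes by adding n & 1, because the loop always stops at 0 or 1 with the same parity as n. It assumes n >= 0, as in the original program.

```c
#include <assert.h>

/* The original recursive definition from the post. */
static int fibs(int n) {
    return ((n == 0 || n == 1) ? n : (fibs(n - 1) + fibs(n - 2)));
}

/* Hand translation of the -O2 listing, valid for n >= 0. */
static int fibs_as_compiled(int n) {
    unsigned k = (unsigned)n;   /* ebx; cmp edi, 1 / jbe is an unsigned n <= 1 test */
    int acc = 0;                /* ebp accumulates the fibs(k - 1) results */
    while (k > 1) {             /* .L3 ... cmp ebx, 1 / ja .L3 */
        acc += fibs_as_compiled((int)(k - 1)); /* lea edi, [rbx-1] / call fibs */
        k -= 2;                 /* sub ebx, 2 */
    }
    return acc + (n & 1);       /* k is now n & 1 (r12d): lea eax, [rbp+0+r12] */
}

void check_fibs(void) {
    for (int i = 0; i <= 20; ++i)
        assert(fibs_as_compiled(i) == fibs(i));
    assert(fibs_as_compiled(20) == 6765);
}
```

Written this way, the optimized code is just the identity fib(n) = fib(n-1) + fib(n-3) + fib(n-5) + ... + fib(n mod 2), applied one call at a time.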

DimonSoft wrote:
Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.


I'm not sure I understand what you mean by that remark :)
Post 26 Aug 2018, 16:49
Furs



Joined: 04 Mar 2016
Posts: 1315
Mino wrote:
GCC optimizes to such an extent that it substitutes values directly into the program when they are known at compile time, and skips generating code that isn't needed.
For example:

Code:
static int add(int a, int b) {
    return a + b;
}

int main() {
    int foo = add(8, 2);
}
    


gives just (with -O2):

Code:
main:
        rep ret
    
Of course it does: your main function doesn't do anything. The call to add is useless since its return value is never used, so all that's left is a "ret". (The rep prefix is useless too; it's only there because GCC tunes for an old architecture, AFAIK.)
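To see the constant folding without it being thrown away as dead code, the result has to be used. A sketch assuming GCC at -O2 on x86-64; `discard_result` and `use_result` are hypothetical names.

```c
#include <assert.h>

static int add(int a, int b) {
    return a + b;
}

/* Result discarded: nothing observable happens, so at -O2 this
   typically compiles to a bare ret. */
void discard_result(void) {
    int foo = add(8, 2);
    (void)foo; /* silence the unused-variable warning */
}

/* Result returned: both operands are compile-time constants, so at -O2
   the call is typically folded to mov eax, 10 followed by ret. */
int use_result(void) {
    return add(8, 2);
}
```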

The lea in your second example shows the call is already inlined: it's part of main, not of add.
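Since the thread is about lea: the instruction only performs the address arithmetic base + index*scale + displacement and writes the result to the destination register, without accessing memory or changing the flags. That lets GCC compute argc + 2 straight into eax in one instruction, without a separate mov. The C model below is a sketch of those semantics for a 32-bit destination; `lea32` and `check_lea32` are hypothetical names, and the function does not validate `scale`.

```c
#include <assert.h>
#include <stdint.h>

/* Models lea r32, [base + index*scale + disp]: pure address arithmetic,
   no memory access, no flag updates. A 32-bit destination keeps only the
   low 32 bits. Real encodings allow scale = 1, 2, 4 or 8 only. */
uint32_t lea32(uint64_t base, uint64_t index, unsigned scale, int32_t disp) {
    /* unsigned arithmetic wraps modulo 2^64, like the CPU's address adder */
    uint64_t addr = base + index * scale + (uint64_t)(int64_t)disp;
    return (uint32_t)addr;
}

void check_lea32(void) {
    assert(lea32(8, 0, 1, 2) == 10);              /* lea eax, [rdi+2] with rdi = 8 */
    assert(lea32(0x100000008ULL, 0, 1, 2) == 10); /* high bits of rdi are dropped */
    assert(lea32(5, 1, 1, 0) == 6);               /* lea eax, [rbp+0+r12] */
}
```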
Post 26 Aug 2018, 16:51
DimonSoft



Joined: 03 Mar 2010
Posts: 451
Location: Belarus
Mino wrote:
DimonSoft wrote:
Funny that students who miss nearly all my lectures on assembly programming tend to write the same spaghetti code. They also tend to blame Basic for some reason. I “love” modern compilers.


I'm not sure I understand what you mean by that remark :)

I was just impressed by the spaghetti code GCC produces. Jumps are generally slower than straight-line, non-branching code. This doesn’t really relate to the main topic, though.
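To illustrate the branching vs. straight-line distinction, here are two versions of max in C, with hypothetical names. The second avoids a conditional jump by building an all-ones or all-zero mask. Note that at -O2 GCC often compiles the first one to a branch-free cmov anyway, so the C source alone doesn't decide whether jumps appear in the output.

```c
#include <assert.h>

/* Written with an if: may compile to a compare and conditional jump,
   although optimizing compilers frequently turn this into cmov. */
int max_branchy(int a, int b) {
    if (a > b)
        return a;
    return b;
}

/* Branch-free: -(a > b) is -1 (all bits set) when a > b, otherwise 0,
   and the mask selects a or b with plain bitwise operations. */
int max_branchless(int a, int b) {
    int mask = -(a > b);
    return (a & mask) | (b & ~mask);
}

void check_max(void) {
    assert(max_branchless(3, 5) == 5);
    assert(max_branchless(5, 3) == 5);
    assert(max_branchless(-2, -7) == max_branchy(-2, -7));
}
```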
Post 26 Aug 2018, 18:05


Copyright © 1999-2018, Tomasz Grysztar.

Powered by rwasa.