flat assembler
Message board for the users of flat assembler.

Surprised...!

flash



Joined: 11 Mar 2006
Posts: 55
Location: Cuba
flash 31 May 2010, 06:15
Hi!
I ran some tests at the last minute to present in class an example of how assembly code runs faster than any other code. I tried fasm ELF code like:
Code:
format ELF executable 3
entry start

segment readable executable

start:
        mov     ecx,1000000000
 CICLO: 
        fldpi
        fld     [r]
        fmul    st0,st0
        fmul    st0,st1
        fstp    [area]
        fwait
        loopd   CICLO

        mov     eax,1
        xor     ebx,ebx
        int     0x80

segment readable writeable

r       dd 1.25
area    dd 0.0
    

Against this C code:
Code:
int main(void)
{
        int i;
        float r=1.25;
        float area=0.0;
        float pi=3.14159265;
        for ( i=0 ; i<1000000000 ; i++)
        {
                area=pi*r*r;
        }
        return 0;
}
    

But, surprisingly, the C program is far faster than the assembly one. Why is this happening? What kind of optimization does gcc perform? How can I optimize the fasm code to achieve at least the same performance as the C code? Thanks in advance.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20621
Location: In your JS exploiting you and your system
revolution 31 May 2010, 06:28
Get a disassembly listing of the C generated code. I expect you will find that the loop is completely eliminated from the code since it produces no output.

Try using something like this:
Code:
        for ( i=0 ; i<1000000000 ; i++)
        {
                area+=pi*r*r;
        }
        printf ("area=%f\n",area);
}    
You might also need to mark r or pi as volatile to force the compiler to generate the loop code.
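A minimal sketch of the two suggestions above combined, assuming a helper function that is not part of the original post: the name sum_area_volatile and the iterations parameter are illustrative only. The radius is marked volatile so the compiler has to re-read it on every use, and the result is accumulated and returned so that each iteration contributes to something the caller can print.

```c
/* Sketch only: sum_area_volatile and its parameter are illustrative
   names, not code from the thread. */
float sum_area_volatile(long iterations)
{
    volatile float r = 1.25f;   /* volatile: every use of r is a real memory read */
    float pi = 3.14159265f;
    float area = 0.0f;
    long i;

    for (i = 0; i < iterations; i++)
        area += pi * r * r;     /* accumulate so each iteration feeds the result */

    return area;                /* the caller should print or otherwise use this */
}
```

Calling it from main and printing the return value with printf, as in the snippet above, should keep the loop in the generated code. Checking the disassembly is still the only way to be sure what a given compiler and optimization level actually emitted.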
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 31 May 2010, 07:42
Still, I'm quite sure the C compiler is likely to do a better job than the assembly version. I don't think, for instance, that "area" will be left out of a register, that pi will be reloaded every time, and that pi*r*r will be recalculated in each iteration instead of being moved to the loop "prolog". GCC will optimize out at least one of those inefficiencies.

PS: What I've said above is only valid when the for-loop output is consumed. Using volatile variables could indeed let the assembly version beat the compiler (for instance, GCC may be forced to read r from memory twice when multiplying, instead of once as the assembly code does).

PS2 (the last, I hope): The assembly version pushes two FPU registers per iteration but pops only one. After enough iterations the register stack overflows, the result will be just a NaN, and the processor will probably take more time executing the FPU instructions because of that (besides the inefficiencies mentioned before). Note that you have to fix this for correctness' sake, as you are not producing a valid result.
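To make the point about the FPU registers concrete, here is a rough model in C that tracks only how many of the eight x87 stack registers are in use while the posted loop runs. It is a simplification under stated assumptions: the stack is taken to be empty when the loop starts, and the function name first_overflow_iteration is made up for this illustration.

```c
/* Simplified model: counts occupied x87 registers only, no values.
   Assumes the register stack is empty when the loop starts. */
#define X87_STACK_SIZE 8

/* Returns the 1-based iteration in which a load first finds no free
   register, or 0 if that does not happen within max_iterations. */
long first_overflow_iteration(long max_iterations)
{
    int depth = 0;
    long i;

    for (i = 1; i <= max_iterations; i++) {
        if (depth == X87_STACK_SIZE)    /* fldpi: push */
            return i;
        depth++;
        if (depth == X87_STACK_SIZE)    /* fld [r]: push */
            return i;
        depth++;
        /* fmul st0,st0 and fmul st0,st1 do not change the depth */
        depth--;                        /* fstp [area]: pop */
    }
    return 0;
}
```

Under this model, first_overflow_iteration(1000) returns 8, so the loop runs cleanly for only seven iterations out of a billion. One possible fix, offered as a suggestion rather than as something from the thread, is to pop the leftover value as well, for example with an extra fstp st0 after the store.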
Endre



Joined: 29 Dec 2003
Posts: 215
Location: Budapest, Hungary
Endre 31 May 2010, 08:45
I don't know how you've compiled your C file with gcc, but depending on whether you use any optimization switches or not, the result can be very different. For instance, if you apply -O2 then nothing will be computed at run time, since "pi" and "r" are assumed to be constants (they don't change), so the compiler eliminates the whole loop and the computation.

If you want to see what gcc generates then just use objdump -S my_file | less

On the other hand, if you don't apply any optimization switch, then it shouldn't be quicker than your assembly program.
If you really want to demonstrate how much faster your assembly program is, then try writing C code that makes the compiler believe it is really a sane C program that cannot be optimized to zero (as the compiler did with your current C program).
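One way to do that, sketched here under assumptions, is to take the radius from outside the program so it cannot be treated as a compile-time constant, and to hand the accumulated result back to the caller. The names parse_radius and sum_area are hypothetical and do not come from the thread.

```c
#include <stdlib.h>

/* Hypothetical helper: turns text such as argv[1] into the radius,
   so the value is only known at run time. */
float parse_radius(const char *text)
{
    return strtof(text, NULL);
}

/* Hypothetical benchmark body: same arithmetic as the original loop,
   but accumulated and returned so the work has a visible result. */
float sum_area(float r, long iterations)
{
    const float pi = 3.14159265f;
    float area = 0.0f;
    long i;

    for (i = 0; i < iterations; i++)
        area += pi * r * r;

    return area;
}
```

A main function would pass parse_radius(argv[1]) to sum_area and print the result. The compiler may still hoist pi*r*r out of the loop, and a float sum loses precision long before a billion iterations, so the printed number is best treated only as a way to keep the computation alive, not as a meaningful total.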
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
