flat assembler
Message board for the users of flat assembler.

fix and or optimize my FPU code ? (on going)

RedGhost



Joined: 18 May 2005
Posts: 443
Location: BC, Canada
RedGhost 11 Dec 2006, 20:32
Well, I haven't coded a "pure asm" project in a while, and this one requires a heavy amount of FPU code. Needless to say, in all the time I've coded in assembly I haven't used the FPU much, so I've basically been tabbing between fasmw and the Intel manuals. Some of these routines I am converting from C, so I will post the C code below the assembly for those. I haven't been able to test some of them at run-time yet (I'm not that far into the project), but I would appreciate some help/input.

Angle Vectors ->
Code:

PITCH = 0
YAW   = 4
ROLL  = 8

; eax = angles
; ebx = forward
; ecx = right
; edx = up
angle_vectors:
    ; local floats
    sy equ ebp-$4
    cy equ ebp-$8
    sp equ ebp-$C                 ; probably should rename this one
    cp equ ebp-$10
    sr equ ebp-$14
    cr equ ebp-$18

    enter $18, $0                 ; storage for local floats

    push  $BF800000               ; -1.0f
    push  $3C8E8A72               ; PI/180 (0.0174f)

    ; angle = angles[YAW]*deg2rad
    fld   dword [eax+YAW]
    fmul  dword [esp]
    ; sy = sine( angle ) & cy = cosine( angle )
    fsincos                       ; cosine = st0, sine = st1
    fstp  dword [cy]
    fstp  dword [sy]

    ; angle = angles[PITCH]*deg2rad
    fld   dword [eax+PITCH]
    fmul  dword [esp]
    ; sp = sine( angle ) & cp = cosine( angle )
    fsincos
    fstp  dword [cp]
    fstp  dword [sp]

    ; angle = angles[ROLL]*deg2rad
    fld   dword [eax+ROLL]
    fmul  dword [esp]
    ; sr = sine( angle ) &  cr = cosine( angle )
    fsincos
    fstp  dword [cr]
    fstp  dword [sr]

    ; forward[0] = cp*cy
    fld   dword [cp]
    fld   st0                     ; push a copy of cp (st1 was empty, so no fst st1)
    fmul  dword [cy]
    fstp  dword [ebx]             ; cp now in st0 again
    ; forward[1] = cp*sy
    fmul  dword [sy]
    fstp  dword [ebx+$4]
    ; forward[2] = -sp
    fld   dword [sp]
    fchs
    fstp  dword [ebx+$8]

    ; right[0] = -1*sr*sp*cy+-1*cr*-sy
    fld   dword [sy]
    fchs
    fmul  dword [cr]
    fmul  dword [esp+$4]          ; st0 = -1*cr*-sy
    fld   dword [cy]              ; the push moves the first term down to st1
    fmul  dword [sp]
    fmul  dword [sr]
    fmul  dword [esp+$4]          ; st0 = -1*sr*sp*cy
    faddp st1, st0
    fstp  dword [ecx]
    ; right[1] = -1*sr*sp*sy+-1*cr*cy
    fld   dword [cy]
    fmul  dword [cr]
    fmul  dword [esp+$4]          ; st0 = -1*cr*cy
    fld   dword [sy]              ; the push moves the first term down to st1
    fmul  dword [sp]
    fmul  dword [sr]
    fmul  dword [esp+$4]          ; st0 = -1*sr*sp*sy
    faddp st1, st0
    fstp  dword [ecx+$4]
    ;  right[2] = -1*sr*cp
    fld   dword [cp]
    fmul  dword [sr]
    fmul  dword [esp+$4]
    fstp  dword [ecx+$8]

    ; up[0] = cr*sp*cy+-sr*-sy
    fld   dword [sy]
    fmul  dword [sr]              ; -sr*-sy is the same as sr*sy, so no fchs is needed
    fld   dword [cy]
    fmul  dword [sp]
    fmul  dword [cr]
    faddp st1, st0                ; add and pop, so nothing is left behind on the FPU stack
    fstp  dword [edx]
    ; up[1] = cr*sp*sy+-sr*cy
    fld   dword [sr]
    fchs
    fmul  dword [cy]              ; st0 = -sr*cy
    fld   dword [sy]              ; the push moves the first term down to st1
    fmul  dword [sp]
    fmul  dword [cr]
    faddp st1, st0
    fstp  dword [edx+$4]
    ; up[2] = cr*cp
    fld   dword [cr]
    fmul  dword [cp]
    fstp  dword [edx+$8]

    add   esp, $8                 ; +$8 for deg2rad & -1
    leave
    ret

    restore sy, cy, sp, cp, sr, cr
;---                                    
    


and in C
Code:
static void AngleVectors( const vec3_t angles, vec3_t forward, vec3_t right, vec3_t up ) {
        float sr, sp, sy, cr, cp, cy, angle;

        angle = (float) angles[YAW]*0.0174f; //( (float) M_PI*2 / 360 );
        sy    = (float) sin( angle );
        cy    = (float) cos( angle );

        angle = (float) angles[PITCH]*0.0174f;
        sp    = (float) sin( angle );
        cp    = (float) cos( angle );

        angle = (float) angles[ROLL]*0.0174f;
        sr    = (float) sin( angle );
        cr    = (float) cos( angle );

        if ( forward ) {
                forward[0] = cp*cy;
                forward[1] = cp*sy;
                forward[2] = -sp;
        }
        
        if ( right ) {
                right[0] = (-1*sr*sp*cy+-1*cr*-sy);
                right[1] = (-1*sr*sp*sy+-1*cr*cy);
                right[2] = -1*sr*cp;
        }
        
        if ( up ) {
                up[0] = (cr*sp*cy+-sr*-sy);
                up[1] = (cr*sp*sy+-sr*cy);
                up[2] = cr*cp;
        }
}
//---
    


With the C compiler set to full optimization, its output was at least 20% more code than my version. It also never used fsincos; fsin and fcos were used separately, even though the optimization guide recommends fsincos. So if I get this working right, it should end up better all around.
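
The two immediates pushed at the top of angle_vectors ($BF800000 and $3C8E8A72) are raw IEEE-754 single-precision bit patterns. A small C sketch for converting between a float and its bit pattern makes them easy to double-check; `bits_to_float` and `float_to_bits` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, the way fld dword [esp] reads a pushed immediate. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse direction: get the immediate to push for a given float value. */
uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

By this check $3C8E8A72 is 0.0174f, which matches the C listing but is about 0.3% below pi/180. If the exact conversion factor is wanted, `float_to_bits((float)(3.14159265358979323846 / 180.0))` gives the immediate to push instead.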

Dot Product ->
Code:
; eax = x
; ebx = y
; returns product in st0
dot_product:
    fld   dword [eax]
    fmul  dword [ebx]
    fld   dword [eax+$4]
    fmul  dword [ebx+$4]
    fld   dword [eax+$8]
    fmul  dword [ebx+$8]
    faddp st1, st0
    faddp st1, st0

    ret
;--- 
    


Vector Subtract ->
Code:
; eax = a
; ebx = b
; ecx = c
vector_subtract:
    ; c[0] = a[0]-b[0]
    fld   dword [eax]
    fsub  dword [ebx]
    fstp  dword [ecx]
    ; c[1] = a[1]-b[1]
    fld   dword [eax+$4]
    fsub  dword [ebx+$4]
    fstp  dword [ecx+$4]
    ; c[2] = a[2]-b[2]
    fld   dword [eax+$8]
    fsub  dword [ebx+$8]
    fstp  dword [ecx+$8]

    ret
;---  
    


Vector Length ->
Code:
; eax = v
; returns length in st0
vector_length:
    ; sqrt( (v[0]*v[0]) + (v[1]*v[1]) + (v[2]*v[2]) )
    fld   dword [eax]
    fmul  st0, st0
    fld   dword [eax+$4]
    fmul  st0, st0
    fld   dword [eax+$8]
    fmul  st0, st0
    faddp st1, st0
    faddp st1, st0
    fsqrt

    ret
;--- 
    

_________________
redghost.ca


Last edited by RedGhost on 13 Dec 2006, 19:09; edited 2 times in total
Big Red



Joined: 25 Feb 2005
Posts: 43
Big Red 12 Dec 2006, 05:18
You need to go over:
Dot Product ->
Vector Subtract ->
Vector Length ->

In each of them you load values onto the stack without ever popping them. Use faddp/fstp. Example:

Code:
dot_product:
    fld  dword [eax]
    fmul dword [ebx]
    fld  dword [eax+$4]
    fmul dword [ebx+$4]
    fld  dword [eax+$8]
    fmul dword [ebx+$8]
    faddp st1, st0
    faddp st1, st0

; returns in st0
    ret    
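
A quick way to check balance by hand is to add up how much each instruction changes the stack depth. The C sketch below does that for the handful of mnemonics used in these routines. `fpu_depth_after` and its table are hypothetical, and the table assumes the memory-operand forms of fmul/fadd/fsub; register-form stores such as fst st1 are not modeled.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *mnemonic;
    int need;    /* values that must already be on the stack */
    int delta;   /* net change in depth */
} fpu_effect;

static const fpu_effect effects[] = {
    {"fld",     0, +1},
    {"fsincos", 1, +1},   /* replaces st0 with the sine, then pushes the cosine */
    {"fmul",    1,  0},
    {"fadd",    1,  0},
    {"fsub",    1,  0},
    {"fchs",    1,  0},
    {"fsqrt",   1,  0},
    {"fstp",    1, -1},
    {"faddp",   2, -1},
};

/* Returns the depth after running ops, or -1 on underflow, overflow (more than
   8 values) or a mnemonic that is not in the table. */
int fpu_depth_after(const char *const *ops, size_t count, int depth) {
    const size_t n = sizeof effects / sizeof effects[0];
    for (size_t i = 0; i < count; i++) {
        size_t j;
        for (j = 0; j < n; j++)
            if (strcmp(ops[i], effects[j].mnemonic) == 0)
                break;
        if (j == n || depth < effects[j].need)
            return -1;
        depth += effects[j].delta;
        if (depth > 8)
            return -1;
    }
    return depth;
}
```

Fed the instruction sequence of the dot_product above, starting from an empty stack, it ends at depth 1, which is the single return value in st0. A routine that stores all its results to memory, like vector_subtract, should end at 0.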
RedGhost 12 Dec 2006, 06:38
Doh, I wasn't even considering keeping the FPU stack balanced.

Big Red 12 Dec 2006, 09:12
Yeah, I can't stand it. At some point I had written a sort of partial reverse-compiler to generate equations from pure asm FPU code and to detect stack overflow/underflow errors before compilation, but I was making so many FPU stack errors in the process that I figured it wasn't worth it and gave up.

Anyway, here's a bunch of random 3d procs, umm some along the lines of what you have there. It's missing definitions, but you get the idea, feel free to copy+paste if there's anything interesting. Also a 3dnow version for some.


Description: 3dnow 3d procs
Download
Filename: codeprocs.inc
Filesize: 11.19 KB
Downloaded: 375 Time(s)

Description: fpu 3d procs
Download
Filename: codeprocs.inc
Filesize: 22.68 KB
Downloaded: 409 Time(s)

RedGhost 12 Dec 2006, 19:18
Thanks for the examples, but I don't think I'll venture into MMX/3DNow just yet. I will update the first post with what (hopefully) doesn't wreck the FPU stack.

I don't know if it's an error with the forums or a mistake but you seem to have uploaded the same file twice (both download as the 3DNow versions, and the files are identical).

r22



Joined: 27 Dec 2004
Posts: 805
r22 13 Dec 2006, 03:37
Most x86 processors built this century have the SSE and SSE2 instruction sets.
Doing math with SSE instructions on the XMM registers is much better:
-Faster execution
-No annoying FPU stack
-Clearer code
-Using iterative approximations for Single precision SIN/COS/TAN functions can also be faster than the FPU opcodes.

Simple example:
Code:
;;;C = AX + BX ;;;C = X(A+B)
.data
X dd -9.0
A dd 20.0
B dd 5.0
C dd 0.0
.code
call Foo
ret 0
Foo:
movss xmm1,dword[A]
movss xmm0,dword[X]
addss xmm1,dword[B]
mulss xmm0,xmm1
movss [C],xmm0
ret 0
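
On the point about iterative approximations for sine and cosine: one simple form is range reduction followed by a short polynomial. The C sketch below uses a plain Taylor polynomial through x^11, which on [-pi/2, pi/2] has a truncation error of roughly 6e-8. It is not claimed to be the method anyone in this thread had in mind, and `approx_sin` / `approx_cos` are hypothetical names. It does the arithmetic in double for simplicity; a single-precision SSE version would use the same steps.

```c
#define PI_D 3.14159265358979323846

/* Taylor polynomial for sin(x) on [-pi/2, pi/2], Horner form, terms through x^11. */
static double sin_poly(double x) {
    double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0
             + x2 * (1.0 / 362880.0 + x2 * (-1.0 / 39916800.0))))));
}

/* Reduce x to [-pi/2, pi/2], then evaluate the polynomial. */
static double sin_reduced(double x) {
    /* subtract the nearest multiple of 2*pi, leaving x in [-pi, pi] */
    long k = (long)(x / (2.0 * PI_D) + (x >= 0.0 ? 0.5 : -0.5));
    x -= (double)k * 2.0 * PI_D;
    /* fold the outer quarters inward: sin(pi - x) = sin(x), sin(-pi - x) = sin(x) */
    if (x > PI_D / 2.0)
        x = PI_D - x;
    else if (x < -PI_D / 2.0)
        x = -PI_D - x;
    return sin_poly(x);
}

float approx_sin(float x) {
    return (float)sin_reduced((double)x);
}

float approx_cos(float x) {
    /* cos(x) = sin(x + pi/2) */
    return (float)sin_reduced((double)x + PI_D / 2.0);
}
```

Whether something like this actually beats fsincos depends on the processor and on how much accuracy is needed, so it is worth timing both before committing to either.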
    
RedGhost 13 Dec 2006, 19:07
I got a friend of mine to give me the algorithms to get sine/cosine to the 6th decimal place on a single number, but this type of math is beyond my current knowledge, so fsincos or fsin/fcos is a must. I've updated the main post and everything seems to be working, so I guess it's just optimization time? (Unless I am still unbalancing the stack in angle_vectors.)

Big Red 14 Dec 2006, 04:14
Quote:
I don't know if it's an error with the forums or a mistake but you seem to have uploaded the same file twice (both download as the 3DNow versions, and the files are identical).


Odd, must be because both have the same filename. Clearly shows two different filesizes though... second upload must have replaced the first. Bug...

Anyhow, sorry about that; I have a website/forum curse. I've attached here the one that was originally meant to be there.

Code looks good to me, except maybe the "fst st1"s. I'm not sure how standard that is when no value has been loaded into st1 beforehand; I suppose you could use "fld st0" instead in most of those places. I could be wrong though.
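
To see why "fst st1" into an empty register matters, here is a tiny C model of the x87 register stack with tag bits. It is a sketch under one stated assumption: that fst st(i) simply overwrites the destination and marks it valid, which is how I read the Intel manual. All the `fpu_*` names are hypothetical.

```c
#include <assert.h>

typedef struct {
    double reg[8];    /* physical registers */
    int    valid[8];  /* tag: 1 = holds a value, 0 = empty */
    int    top;       /* physical index of st0 */
} fpu_model;

static int phys(const fpu_model *f, int i) { return (f->top + i) & 7; }

void fpu_init(fpu_model *f) {
    for (int i = 0; i < 8; i++) { f->reg[i] = 0.0; f->valid[i] = 0; }
    f->top = 0;
}

/* fld: push a value; the register that becomes st0 must be empty. */
void fpu_fld(fpu_model *f, double v) {
    f->top = (f->top + 7) & 7;
    assert(!f->valid[f->top]);          /* real hardware signals stack overflow here */
    f->reg[f->top] = v;
    f->valid[f->top] = 1;
}

/* fst st(i): copy st0 into st(i) and mark it valid (the assumption above). */
void fpu_fst_sti(fpu_model *f, int i) {
    assert(f->valid[phys(f, 0)]);
    f->reg[phys(f, i)] = f->reg[phys(f, 0)];
    f->valid[phys(f, i)] = 1;
}

/* fstp to memory: return st0 and pop it. */
double fpu_fstp(fpu_model *f) {
    int p = phys(f, 0);
    assert(f->valid[p]);
    double v = f->reg[p];
    f->valid[p] = 0;
    f->top = (f->top + 1) & 7;
    return v;
}

/* faddp st1, st0: st1 = st1 + st0, then pop. */
void fpu_faddp(fpu_model *f) {
    int p0 = phys(f, 0), p1 = phys(f, 1);
    assert(f->valid[p0] && f->valid[p1]);
    f->reg[p1] += f->reg[p0];
    f->valid[p0] = 0;
    f->top = (f->top + 1) & 7;
}

/* Number of registers currently holding a value. */
int fpu_depth(const fpu_model *f) {
    int n = 0;
    for (int i = 0; i < 8; i++) n += f->valid[i];
    return n;
}
```

In this model the pattern fld a / fst st1 / fld b / faddp / fstp produces the right sum but leaves one register occupied afterwards, while fld a / fld b / faddp / fstp ends with an empty stack. That is the kind of leak that only shows up later, as a stack overflow in some unrelated routine.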

Well, good luck.


Description: random 3d fpu
Download
Filename: codeprocs.inc
Filesize: 23.33 KB
Downloaded: 401 Time(s)

Copyright © 1999-2025, Tomasz Grysztar.