flat assembler
Message board for the users of flat assembler.

Critique this. Algorithm to add vectors




mattst88 20 Sep 2006, 01:29
Here's some code I wrote for fun to add vectors. It's really one of the first things I've ever written in assembly. Advice, improvements, etc wanted.

Code:
format PE

entry main

section '.code' code readable executable
main:   finit
        fldpi                   ; pi

        ; Calculate Ax and Ay
        fld     [a_mag]         ; a_mag pi
        fld     [a_rad]         ; a_rad a_mag pi
        fmul    st0,st2         ; a_rad a_mag pi
        fld     st0             ; a_rad a_rad a_mag pi
        fcos                    ; cos(a_rad) a_rad a_mag pi
        fmul    st0,st2         ; Ax a_rad a_mag pi
        fxch    st2             ; a_mag a_rad Ax pi
        fxch    st1             ; a_rad a_mag Ax pi
        fsin                    ; sin(a_rad) a_mag Ax pi
        fmulp   st1,st0         ; Ay Ax pi

        ; Calculate Bx and By
        fld     [b_mag]         ; b_mag Ay Ax pi
        fld     [b_rad]         ; b_rad b_mag Ay Ax pi
        fmul    st0,st4         ; b_rad b_mag Ay Ax pi
        ffree   st4             ; b_rad b_mag Ay Ax
        fld     st0             ; b_rad b_rad b_mag Ay Ax
        fcos                    ; cos(b_rad) b_rad b_mag Ay Ax
        fmul    st0,st2         ; Bx b_rad b_mag Ay Ax
        fxch    st2             ; b_mag b_rad Bx Ay Ax
        fxch    st1             ; b_rad b_mag Bx Ay Ax
        fsin                    ; sin(b_rad) b_mag Bx Ay Ax
        fmulp   st1,st0         ; By Bx Ay Ax

        ; Calculate Rx and Ry
        faddp   st2,st0         ; Bx Ry Ax
        faddp   st2,st0         ; Ry Rx

        ; Calculate direction of resultant
        fld     st0             ; Ry Ry Rx
        fld     st2             ; Rx Ry Ry Rx
        fpatan                  ; r_rad Ry Rx   (fpatan computes atan2(st1,st0))
        fstp    [r_rad]         ; Ry Rx
                                ; resultant direction (in radians) is stored in memory

        ; Calculate magnitude of resultant
        fmul    st0,st0         ; Ry^2 Rx
        fxch    st1             ; Rx Ry^2
        fmul    st0,st0         ; Rx^2 Ry^2
        faddp   st1,st0         ; Rx^2+Ry^2
        fsqrt                   ; R
        fstp    [r_mag]         ; resultant magnitude is stored in memory

        ret

section '.data' data readable writeable

a_mag dq 1.0    ; Magnitude of vector A
a_rad dq 1.5    ; Direction of vector A in terms of Pi. 2.0 would be 2.0*Pi
b_mag dq 1.0    ; Magnitude of vector B
b_rad dq 0.0    ; Direction of vector B in terms of Pi. 2.0 would be 2.0*Pi
r_mag dq ?      ; Magnitude of the resultant
r_rad dq ?      ; Direction of the resultant, in radians (not in multiples of Pi)


The input values have to be hard-coded. The results are stored in r_mag and r_rad. I checked them by fld'ing them back onto the stack and inspecting the FPU registers in OllyDbg.

The comments out to the side of the instructions just keep track of what's on the FPU stack after that instruction is executed.



Goplat 20 Sep 2006, 02:14
There's an "fsincos" instruction that calculates both sine and cosine at the same time; it's faster than doing them separately.
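A minimal sketch of that substitution for vector A alone, reusing a_mag and a_rad from the first post. The a_x and a_y outputs are hypothetical names added only so the result can be inspected. fsincos leaves the cosine in st0 and the sine in st1, so one load of the magnitude can scale both:

```asm
format PE
entry main

section '.code' code readable executable
main:   finit
        fldpi                   ; pi
        fmul    [a_rad]         ; angle in radians
        fsincos                 ; cos sin
        fld     [a_mag]         ; a_mag cos sin
        fmul    st2,st0         ; a_mag cos Ay
        fmulp   st1,st0         ; Ax Ay
        fstp    [a_x]           ; Ay
        fstp    [a_y]           ; (stack empty)
        ret

section '.data' data readable writeable
a_mag dq 1.0    ; magnitude, as in the first post
a_rad dq 1.5    ; direction in multiples of Pi, as in the first post
a_x   dq ?      ; hypothetical output: x component
a_y   dq ?      ; hypothetical output: y component
```

Doing the same for B would remove both fxch pairs and the duplicated angle from the original.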



mattst88 20 Sep 2006, 02:51
Ahh, you are correct. That will save quite a few clocks.

Thanks :)

More please :)



Octavio 20 Sep 2006, 11:00
mattst88 wrote:
Ahh, you are correct. That will save quite a few clocks.

Thanks :)

More please :)

I don't understand what you are doing. Why do you use trigonometric functions to add vectors?
Isn't it enough to sum the components?



Garthower 20 Sep 2006, 12:36
You can use 3DNow!, or, if your CPU is new enough, SSE and/or SSE2. Even a straightforward translation to these instructions can give a speed gain of 20-60%.
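A rough sketch of what such a translation could look like with SSE2, covering only the addition and magnitude steps and assuming the vectors are already in rectangular form. SSE and SSE2 have no sine, cosine or arctangent instructions, so the polar conversions would still need the x87 or a software routine. The names a_xy, b_xy, r_xy and r_mag are hypothetical:

```asm
format PE
entry main

section '.code' code readable executable
main:   movupd  xmm0,dqword [a_xy]      ; xmm0 = (Ax, Ay)
        movupd  xmm1,dqword [b_xy]      ; xmm1 = (Bx, By)
        addpd   xmm0,xmm1               ; xmm0 = (Rx, Ry)
        movupd  dqword [r_xy],xmm0      ; store both components
        mulpd   xmm0,xmm0               ; xmm0 = (Rx^2, Ry^2)
        movapd  xmm1,xmm0               ; copy the squares
        unpckhpd xmm1,xmm1              ; xmm1 = (Ry^2, Ry^2)
        addsd   xmm0,xmm1               ; low half = Rx^2 + Ry^2
        sqrtsd  xmm0,xmm0               ; low half = magnitude of R
        movsd   [r_mag],xmm0            ; store the magnitude
        ret

section '.data' data readable writeable
a_xy  dq 0.0,-1.0       ; (Ax, Ay) for a_mag=1.0, a_rad=1.5
b_xy  dq 1.0,0.0        ; (Bx, By) for b_mag=1.0, b_rad=0.0
r_xy  dq ?,?            ; (Rx, Ry)
r_mag dq ?              ; magnitude of the resultant
```

With only two vectors the trigonometric instructions are likely to dominate the run time, so any gain would probably show up mainly when many vectors are summed in a loop.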


LocoDelAssembly 20 Sep 2006, 13:12
I think the FPU is more precise: it does a lot of the intermediate calculations with 80-bit precision, while SSE offers at most 64 bits. I know that he uses qword memory values, but he does many calculations before storing the final result to memory, so those extra bits can make some difference compared to the result obtained with SSE (not an extremely big difference, of course).
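One way to get a feel for that difference without rewriting anything for SSE might be to lower the x87 precision control to a 53-bit significand and compare the stored results with a default run. A minimal sketch, where cw is a hypothetical scratch variable. The precision-control field affects only the basic arithmetic instructions (fadd, fsub, fmul, fdiv, fsqrt), not fsin, fcos or fpatan, so this only approximates what an SSE2 version would produce:

```asm
format PE
entry main

section '.code' code readable executable
main:   finit                           ; control word = 037Fh: 64-bit significand
        fstcw   [cw]                    ; save the current control word
        and     word [cw],0FCFFh        ; clear the precision-control bits (8-9)
        or      word [cw],0200h         ; PC = 10b: 53-bit significand
        fldcw   [cw]                    ; fadd/fmul/fsqrt now round to 53 bits
        ; ... vector code from the first post goes here, without its finit ...
        ret

section '.data' data readable writeable
cw dw ?                                 ; hypothetical scratch word for the control word
```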



Garthower 20 Sep 2006, 13:28
Perhaps; it needs checking, since I have not tested it. In any case, the discrepancy between the FPU and SSE results would be on the order of millionths (if not less). It could make an interesting experiment, and I should run it soon.



Goplat 20 Sep 2006, 16:14
Octavio wrote:
I don't understand what you are doing. Why do you use trigonometric functions to add vectors?
Isn't it enough to sum the components?


That's how you add vectors that are in rectangular form. In this case the vectors are in polar form, so they have to be converted, added, then converted back.
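Written out, the conversion chain looks like this. The symbols are my own shorthand, not names from the code: a and b stand for a_mag and b_mag, and alpha and beta for the two directions after they have been multiplied by Pi:

```latex
\begin{aligned}
A_x &= a\cos\alpha, & A_y &= a\sin\alpha \\
B_x &= b\cos\beta,  & B_y &= b\sin\beta \\
R_x &= A_x + B_x,   & R_y &= A_y + B_y \\
|R| &= \sqrt{R_x^2 + R_y^2}, & \theta &= \operatorname{atan2}(R_y, R_x)
\end{aligned}
```

For the values in the first post (a = 1, alpha = 1.5 Pi, b = 1, beta = 0) this gives A = (0, -1), B = (1, 0) and R = (1, -1), so r_mag should come out as sqrt(2), about 1.4142, and r_rad as -Pi/4, about -0.7854 radians.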



Madis731 21 Sep 2006, 18:11
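Try a more compact approach built on fsincos:
NB! Still far from optimal
Code:
        finit
        fldpi                           ; pi
        fmul            [a_rad]         ; a_ang
        fsincos                         ; cos(a) sin(a)
        fld             [a_mag]         ; a_mag cos(a) sin(a)
        fmul            st2,st0         ; a_mag cos(a) Ay
        fmulp           st1,st0         ; Ax Ay
        fldpi                           ; pi Ax Ay
        fmul            [b_rad]         ; b_ang Ax Ay
        fsincos                         ; cos(b) sin(b) Ax Ay
        fld             [b_mag]         ; b_mag cos(b) sin(b) Ax Ay
        fmul            st2,st0         ; b_mag cos(b) By Ax Ay
        fmulp           st1,st0         ; Bx By Ax Ay
        faddp           st2,st0         ; By Rx Ay
        faddp           st2,st0         ; Rx Ry
        fld             st1             ; Ry Rx Ry
        fld             st1             ; Rx Ry Rx Ry
        fpatan                          ; r_rad Rx Ry   (atan2(st1,st0) = atan2(Ry,Rx))
        fstp            [r_rad]         ; Rx Ry
        fmul            st0,st0         ; Rx^2 Ry
        fxch            st1 ;Hint! This should be optimized out...
        fmul            st0,st0         ; Ry^2 Rx^2
        faddp           st1,st0         ; Rx^2+Ry^2
        fsqrt                           ; R
        fstp            [r_mag]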
    