flat assembler
Message board for the users of flat assembler.
Critique this. Algorithm to add vectors
Goplat 20 Sep 2006, 02:14
There's an "fsincos" instruction that calculates both sine and cosine at the same time; it's faster than doing them separately.
mattst88 20 Sep 2006, 02:51
Ahh, you are correct. That will save quite a few clocks.
Thanks! More, please.
Octavio 20 Sep 2006, 11:00
mattst88 wrote: Ahh, you are correct. That will save quite a few clocks.
I don't understand what you are doing. Why do you use trigonometric functions to add vectors? Isn't it enough to sum the components?
Garthower 20 Sep 2006, 12:36
You can use 3DNow!, or, if your CPU is new enough, SSE and/or SSE2. Even a straightforward translation into these instructions can give a speed gain of 20% to 60%.
LocoDelAssembly 20 Sep 2006, 13:12
I think the FPU is more precise: the code does a lot of intermediate calculation in 80-bit precision, while SSE offers at most 64 bits. I know that he uses qword memory values, but he does a lot of calculation before storing the final result to memory, so those extra bits can make some difference in the final result compared to the result obtained with SSE (not an extremely big difference, of course).
Garthower 20 Sep 2006, 13:28
Perhaps. It needs to be checked, since I have not tested it. But in any case, the discrepancy between the results from the FPU and SSE instructions would be on the order of millionths, if not less. It could make an interesting experiment; I will have to run it soon.
Goplat 20 Sep 2006, 16:14
Octavio wrote: I don't understand what you are doing. Why do you use trigonometric functions to add vectors?
Summing the components is how you add vectors that are in rectangular form. In this case the vectors are in polar form, so they have to be converted, added, then converted back.
Madis731 21 Sep 2006, 18:11
Try a very unusual approach:
NB! Still far from optimal
Code:
    finit
    fldpi
    fmul [a_rad]
    fsincos
    faddp st1,st0
    fmul [a_mag]
    fldpi
    fmul [b_rad]
    fsincos
    fld [b_mag]
    fmul st2,st0
    fmulp st1,st0
    fincstp
    faddp st1,st0
    fld st6
    fdecstp
    fld st2
    fxch st1
    fpatan
    fstp [r_rad]
    fmul st0,st0
    fxch st1 ;Hint! This should be optimized out...
    fmul st0,st0
    faddp st1,st0
    fsqrt
    fstp [r_mag]
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.