flat assembler
Message board for the users of flat assembler.

 Index > Main > Need input: sin function using FPU
Author
mattst88

Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 26 Jan 2007, 01:34
Code:
```sin:
fld     [x]     ; x
fmul    st0,st0 ; x^2
fld     [rf9]   ; 1/9! x^2
fmul    st0,st1 ; x^2/9! x^2
fsub    [rf7]   ; (-1/7! + x^2/9!) x^2
fmul    st0,st1 ; (-x^2/7! + x^4/9!) x^2
fadd    [rf5]   ; (1/5! - x^2/7! + x^4/9!) x^2
fmul    st0,st1 ; (x^2/5! - x^4/7! + x^6/9!) x^2
fsub    [rf3]   ; (-1/3! + x^2/5! - x^4/7! + x^6/9!) x^2
fmulp   st0,st1 ; (-x^2/3! + x^4/5! - x^6/7! + x^8/9!) x^2
fmul    [x]     ; (-x^3/3! + x^5/5! - x^7/7! + x^9/9!)
fadd    [x]     ; (x - x^3/3! + x^5/5! - x^7/7! + x^9/9!)
fstp    [x]     ;

rf9     dq      2.7557319223985890651862166557528e-6
rf7     dq      0.0001984126984126984126984126984127
rf5     dq      0.0083333333333333333333333333333333
rf3     dq      0.16666666666666666666666666666667

x       dq      ?    ```

I wrote this after figuring out Goplat's awesome cosine function he wrote in this thread.

I've never figured out how you people count the clocks experimentally so if someone would like to A) do it and tell me the results or B) show me how I'd be very appreciative.

Any errors? Input is more thank welcome.

Thanks guys

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal
26 Jan 2007, 01:34
LocoDelAssembly

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 26 Jan 2007, 03:07
You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but much better first link )
26 Jan 2007, 03:07
MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 26 Jan 2007, 05:38
You may view this old Fasm thread where I have posted an interesting algorithm, it's based of combining values of a LUT with some less further Taylor refinements. Also an examples of a very fast double and single precision cosine and sine function done with SSE (maybe not what you need) is included
see about in the middle of the page http://board.flatassembler.net/topic.php?t=3841

I especially mean this
Code:
```cos(a) = cos( a-round(a)   +   round(a) )

using this trigonometric equation cos(a+b) = cos(a)*cos(b) - sin(a)*sin(b)

=> cos(a-round(a)) * costab[round(a)] - sin(a-round(a)) * sintab[round(a)]
```
26 Jan 2007, 05:38
Jack

Joined: 16 Feb 2005
Posts: 21
Jack 26 Jan 2007, 11:43
you should use a minimax approximation instead of taylor series, for example the following approximation is accurate to six digits in the interval -Pi/2 .. Pi/2
Code:
```.99999661590855105827518776*x- .16664828382099226703400879*x^3+ .0083063252288911864431576062*x^5- .00018363654014537635070098734*x^7
```
26 Jan 2007, 11:43
MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 26 Jan 2007, 18:10
Jack wrote:
you should use a minimax approximation instead of taylor series, for example the following approximation is accurate to six digits in the interval -Pi/2 .. Pi/2
Code:
```.99999661590855105827518776*x- .16664828382099226703400879*x^3+ .0083063252288911864431576062*x^5- .00018363654014537635070098734*x^7
```
ahh, you mou meant something like minimizing the maximum difference between the approximated function and the approximation polynom instead of juat Taylor polynom? Well, I have simply used the Taylor one cause it's easier to explain.

But you're right, those other polynoms are even more effective than the taylor ones cause you try to get the best approximation in an interval(like [0,2*pi]), and not only in 1 point like 0.

But best would be to combine those minimax polynom with the algorithm I have posted above.
I meane decomposing the sin/cosine into a bigger LUT part and into a smaller remainder that is passed to the minimax polynom which must then be set up to settle down very quickly in a different but smaller intervall, e.g. not [0,2*pi] but rather [0,d] with d very close to 0.
The resulting polynom will then be almost same as the taylor one, but only with very little change in parameters, but these small changes are what makes the difference!

Finally I got a SSE float example
Code:
```CosSSE:
movss   xmm0,[esi]
cvtss2si        eax,xmm0
cvtsi2ss        xmm1,eax
and     eax,0FFh
shl     eax,4
subss   xmm0,xmm1
movaps  xmm1,xmm0
mulss   xmm0,xmm0
movlhps xmm0,xmm0
mulss   xmm0,xmm1
movhps  xmm1,[_1.0]
mulps   xmm0,[SinCosCoeff]
mulps   xmm0,[SinCosLUT+eax]
movhlps xmm1,xmm0
subss   xmm1,xmm0
movss   [edi],xmm1
ret

align   4
_1.0:
dd      1.0,0

align   16
SinCosCoeff:
dd      -0.16666230,0, -0.49997921,0;~ -1/3!, ~ -1/2!
SinCosLUT:
rb      256*4*4;Have to be setup before first use
;The format of the LUT is as follows:
;[Single sine(0*(2pi/256))] [0] [Single cosine(0*(2pi/256))] [0]
; 0                          4   8                            0Ch
;[Single sine(1*(2pi/256))] [0] [Single cosine(1*(2pi/256))] [0]
; 10h                        14h 18h                          1Ch
;               . . .                   . . .
```

I know this math stuff isn't easy if you aren't into that, anyway, that task/algorithm could be clarified very well with a function-plot, but I'm too lazy for that now

p.s. this code is over a year old, so it's kinda unmainted, but it workeed at the time I wrote it

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||

Last edited by MCD on 28 Jan 2007, 13:16; edited 1 time in total
26 Jan 2007, 18:10
rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 28 Jan 2007, 04:23
LocoDelAssembly wrote:
You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but much better first link )

I would not recommend downloading that, especially because of PMCDOS.ZIP (ugh, that Agner guy just can't take a hint).
28 Jan 2007, 04:23
mattst88

Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 28 Jan 2007, 19:14
rugxulo wrote:
LocoDelAssembly wrote:
You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but much better first link )

I would not recommend downloading that, especially because of PMCDOS.ZIP (ugh, that Agner guy just can't take a hint).

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal
28 Jan 2007, 19:14
rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 28 Jan 2007, 21:06
mattst88, let's just say that neither Richard Stallman nor Linus Torvalds would EVER approve, m'kay? DON'T DO IT!
28 Jan 2007, 21:06
FrozenKnight

Joined: 24 Jun 2005
Posts: 128
FrozenKnight 31 Jan 2007, 21:32
to test clocks i use the following code
Code:
```  sub   esp, 8
RDTSC
mov   [esp], eax
mov   [esp+4], edx     ```
then when my code segment is done i use
Code:
```  RDTSC
sub   eax, [esp]
sbb   edx, [esp+4]
a few clock cycles are added because of the code involved but generally this will tell you about how much time it took to run a code segment. be warned you should limit the code to one cpu for proper testing. and try to keep the stack properly balanced within the tested code it adjust my code for you stack.
if everything works then the result will be stored in edx:eax.
31 Jan 2007, 21:32
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8344
Location: Kraków, Poland
Tomasz Grysztar 31 Jan 2007, 21:53
You may also find this interesting.
31 Jan 2007, 21:53
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum