flat assembler
Message board for the users of flat assembler.
Need input: sin function using FPU

mattst88
Code:

    sin:                        ; comments show the FPU stack as  st0 | st1
            fld     [x]         ; x
            fmul    st0,st0     ; x^2
            fld     [rf9]       ; 1/9! | x^2
            fmul    st0,st1     ; x^2/9! | x^2
            fsub    [rf7]       ; (-1/7! + x^2/9!) | x^2
            fmul    st0,st1     ; (-x^2/7! + x^4/9!) | x^2
            fadd    [rf5]       ; (1/5! - x^2/7! + x^4/9!) | x^2
            fmul    st0,st1     ; (x^2/5! - x^4/7! + x^6/9!) | x^2
            fsub    [rf3]       ; (-1/3! + x^2/5! - x^4/7! + x^6/9!) | x^2
            fmulp   st1,st0     ; (-x^2/3! + x^4/5! - x^6/7! + x^8/9!)
            fmul    [x]         ; (-x^3/3! + x^5/5! - x^7/7! + x^9/9!)
            fadd    [x]         ; (x - x^3/3! + x^5/5! - x^7/7! + x^9/9!)
            fstp    [x]         ; result stored back into x, FPU stack empty again
            ret

    rf9 dq 2.7557319223985890651862166557528e-6  ; 1/9!
    rf7 dq 0.0001984126984126984126984126984127   ; 1/7!
    rf5 dq 0.0083333333333333333333333333333333   ; 1/5!
    rf3 dq 0.16666666666666666666666666666667     ; 1/3!
    x   dq ?

I wrote this after figuring out Goplat's awesome cosine function he wrote in this thread. I've never figured out how you people count the clocks experimentally, so if someone would like to A) do it and tell me the results or B) show me how, I'd be very appreciative. Any errors? Input is more than welcome. Thanks guys
_________________
My x86 Instruction Reference - includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal
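For checking the polynomial outside the FPU, here is a small C sketch that evaluates the same degree-9 Taylor polynomial in the same Horner order as the listing, one statement per FPU instruction. The function name taylor_sin9 is made up for this example. Like the assembly, it does no range reduction, so the error grows quickly as |x| moves away from 0: the first omitted term, x^11/11!, is already about 3.6e-6 at x = pi/2. C rounds to double after every step, while the x87 may keep extended-precision intermediates, so the last bits can differ from the FPU result.

```c
#include <assert.h>
#include <math.h>   /* sin() and fabs(), for comparing against libm */

/* Reciprocal factorials: the same constants as rf3..rf9 in the FASM code. */
static const double RF3 = 1.0 / 6.0;       /* 1/3! */
static const double RF5 = 1.0 / 120.0;     /* 1/5! */
static const double RF7 = 1.0 / 5040.0;    /* 1/7! */
static const double RF9 = 1.0 / 362880.0;  /* 1/9! */

/* Degree-9 Taylor polynomial of sin(x), evaluated by Horner's rule in x^2.
   Each statement corresponds to one instruction of the FPU listing.
   No range reduction: accurate only for small |x|. */
double taylor_sin9(double x)
{
    double x2 = x * x;      /* fld [x] / fmul st0,st0   */
    double p  = RF9 * x2;   /* fld [rf9] / fmul st0,st1 */
    p -= RF7;               /* fsub [rf7]               */
    p *= x2;                /* fmul st0,st1             */
    p += RF5;               /* fadd [rf5]               */
    p *= x2;                /* fmul st0,st1             */
    p -= RF3;               /* fsub [rf3]               */
    p *= x2;                /* fmulp st1,st0            */
    p *= x;                 /* fmul [x]                 */
    return p + x;           /* fadd [x] / fstp [x]      */
}
```

Comparing taylor_sin9(x) with sin(x) over the input range you actually need is a quick way to decide whether the x^9 term is enough or whether range reduction has to come first.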

26 Jan 2007, 01:34 

LocoDelAssembly
You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but the first link is much better).


26 Jan 2007, 03:07 

MCD
You may want to look at this old Fasm thread, where I posted an interesting algorithm: it combines values from a LUT (lookup table) with a few further Taylor refinement terms. It also includes examples of very fast double- and single-precision cosine and sine functions done with SSE (maybe not what you need).
See about the middle of the page: http://board.flatassembler.net/topic.php?t=3841. I especially mean this (round(a) is a rounded to the nearest LUT angle):

Code:

    cos(a) = cos( (a-round(a)) + round(a) )

    using the trigonometric identity

    cos(a+b) = cos(a)*cos(b) - sin(a)*sin(b)

    => cos(a-round(a)) * costab[round(a)] - sin(a-round(a)) * sintab[round(a)]

26 Jan 2007, 05:38 

MCD
Jack wrote: you should use a minimax approximation instead of taylor series, for example the following approximation is accurate to six digits in the interval -Pi/2 .. Pi/2

But you're right: those other polynomials are even more effective than the Taylor ones, because they give the best approximation over an interval (like [0,2*pi]) rather than only at one point like 0. Best of all would be to combine such a minimax polynomial with the algorithm I posted above. I mean decomposing the sine/cosine into a bigger LUT part and a smaller remainder that is passed to the minimax polynomial, which then only has to settle down very quickly on a different, much smaller interval, e.g. not [0,2*pi] but rather [0,d] with d very close to 0. The resulting polynomial will be almost the same as the Taylor one, with only very small changes in the coefficients, but those small changes are what make the difference!

Finally, here is an SSE single-precision (float) example:

Code:

    CosSSE:                             ; in: [esi] = angle a (float, radians)
                                        ; out: [edi] = cos(a); step = 2*pi/256
            movss    xmm0,[esi]         ; a
            mulss    xmm0,[Rad2LUT]     ; t = a*256/(2*pi)
            cvtss2si eax,xmm0           ; n = t rounded (MXCSR mode, nearest by default)
            cvtsi2ss xmm1,eax           ; (float)n
            and      eax,0FFh           ; n mod 256
            shl      eax,4              ; byte offset, 16 bytes per LUT entry
            subss    xmm0,xmm1          ; t - n
            mulss    xmm0,[LUT2Rad]     ; r = (t - n)*step, remainder in radians
            movaps   xmm1,xmm0          ; xmm1 = [r, 0, 0, 0]
            mulss    xmm0,xmm0          ; [r^2, 0, 0, 0]
            movlhps  xmm0,xmm0          ; [r^2, 0, r^2, 0]
            mulss    xmm0,xmm1          ; [r^3, 0, r^2, 0]
            movhps   xmm1,[_1.0]        ; [r, 0, 1, 0]
            mulps    xmm0,[SinCosCoeff] ; [c3*r^3, 0, c2*r^2, 0]
            addps    xmm0,xmm1          ; [~sin(r), 0, ~cos(r), 0]
            mulps    xmm0,[SinCosLUT+eax] ; [sin(r)*sin(n*step), 0, cos(r)*cos(n*step), 0]
            movhlps  xmm1,xmm0          ; xmm1 low = cos(r)*cos(n*step)
            subss    xmm1,xmm0          ; cos(r)*cos(n*step) - sin(r)*sin(n*step) = cos(a)
            movss    [edi],xmm1
            ret

    align 4
    Rad2LUT     dd 40.743665            ; 256/(2*pi)
    LUT2Rad     dd 0.024543692          ; 2*pi/256
    _1.0:       dd 1.0,0
    align 16
    SinCosCoeff: dd -0.16666230,0, -0.49997921,0 ; ~ -1/3!, ~ -1/2!
    SinCosLUT:  rb 256*4*4              ; has to be set up before first use
    ; The format of the LUT is as follows (byte offsets below each entry):
    ; [single sine(0*(2pi/256))] [0] [single cosine(0*(2pi/256))] [0]
    ;  0                          4   8                            0Ch
    ; [single sine(1*(2pi/256))] [0] [single cosine(1*(2pi/256))] [0]
    ;  10h                        14h 18h                          1Ch
    ;  ...

I know this math stuff isn't easy if you aren't into it; the algorithm could be explained very well with a function plot, but I'm too lazy for that now.

p.s. This code is over a year old, so it's somewhat unmaintained, but it worked at the time I wrote it.
_________________
MCD - the inevitable return of the Mad Computer Doggy

Last edited by MCD on 28 Jan 2007, 13:16; edited 1 time in total

26 Jan 2007, 18:10 

rugxulo
LocoDelAssembly wrote: You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but much better first link)

I would not recommend downloading that, especially because of PMCDOS.ZIP (ugh, that Agner guy just can't take a hint).

28 Jan 2007, 04:23 

mattst88
rugxulo wrote:
Elaborate please.

28 Jan 2007, 19:14 

rugxulo
mattst88, let's just say that neither Richard Stallman nor Linus Torvalds would EVER approve, m'kay? DON'T DO IT!


28 Jan 2007, 21:06 

FrozenKnight
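To time code in clock cycles, I use the following.

Before the code being measured:

Code:

    sub esp, 8
    RDTSC               ; edx:eax = time-stamp counter
    mov [esp], eax      ; save low dword
    mov [esp+4], edx    ; save high dword

After it:

Code:

    RDTSC
    sub eax, [esp]      ; low dword difference, CF = borrow
    sbb edx, [esp+4]    ; high dword difference minus the borrow
    add esp, 8

If everything works (the code in between must leave esp where it found it), the elapsed count is left in edx:eax.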
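The sub/sbb pair in the second snippet is a 64-bit subtraction done in two 32-bit halves. The C sketch below spells out what it computes, including the borrow that sbb carries from the low dword into the high one when the low half of the counter wraps between the two RDTSC reads. The function name tsc_elapsed is hypothetical; it models the arithmetic only and does not read the counter.

```c
#include <assert.h>
#include <stdint.h>

/* Models the epilogue above: the start count was saved as two 32-bit
   halves ([esp] = low, [esp+4] = high) and the end count is in edx:eax. */
uint64_t tsc_elapsed(uint32_t start_lo, uint32_t start_hi,
                     uint32_t end_lo, uint32_t end_hi)
{
    uint32_t lo = end_lo - start_lo;                 /* sub eax, [esp]          */
    uint32_t borrow = (end_lo < start_lo) ? 1u : 0u; /* carry flag after sub    */
    uint32_t hi = end_hi - start_hi - borrow;        /* sbb edx, [esp+4]        */
    return ((uint64_t)hi << 32) | lo;                /* result as edx:eax       */
}
```

For example, going from 0x1_FFFFFFF0 to 0x2_00000010 the low subtraction wraps to 0x20 and sets the borrow, so the high halves cancel and the result is 0x20 cycles. In C itself, GCC and Clang provide __rdtsc() in <x86intrin.h> (MSVC in <intrin.h>), which returns the whole 64-bit count at once. Keep in mind that RDTSC is not a serializing instruction, so for very short sequences out-of-order execution can blur the measurement; repeating the run many times and keeping the minimum helps.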

31 Jan 2007, 21:32 

Tomasz Grysztar
You may also find this interesting.


31 Jan 2007, 21:53 


Copyright © 1999-2020, Tomasz Grysztar.
Website powered by rwasa.