flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
mattst88 26 Jan 2007, 01:34
Code: sin: fld [x] ; x fmul st0,st0 ; x^2 fld [rf9] ; 1/9! x^2 fmul st0,st1 ; x^2/9! x^2 fsub [rf7] ; (-1/7! + x^2/9!) x^2 fmul st0,st1 ; (-x^2/7! + x^4/9!) x^2 fadd [rf5] ; (1/5! - x^2/7! + x^4/9!) x^2 fmul st0,st1 ; (x^2/5! - x^4/7! + x^6/9!) x^2 fsub [rf3] ; (-1/3! + x^2/5! - x^4/7! + x^6/9!) x^2 fmulp st0,st1 ; (-x^2/3! + x^4/5! - x^6/7! + x^8/9!) x^2 fmul [x] ; (-x^3/3! + x^5/5! - x^7/7! + x^9/9!) fadd [x] ; (x - x^3/3! + x^5/5! - x^7/7! + x^9/9!) fstp [x] ; rf9 dq 2.7557319223985890651862166557528e-6 rf7 dq 0.0001984126984126984126984126984127 rf5 dq 0.0083333333333333333333333333333333 rf3 dq 0.16666666666666666666666666666667 x dq ? I wrote this after figuring out Goplat's awesome cosine function he wrote in this thread. I've never figured out how you people count the clocks experimentally so if someone would like to A) do it and tell me the results or B) show me how I'd be very appreciative. Any errors? Input is more thank welcome. Thanks guys ![]() _________________ My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions. Assembly Programmer's Journal |
|||
![]() |
|
LocoDelAssembly 26 Jan 2007, 03:07
You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but much better first link
![]() |
|||
![]() |
|
MCD 26 Jan 2007, 05:38
You may view this old Fasm thread where I have posted an interesting algorithm, it's based of combining values of a LUT with some less further Taylor refinements. Also an examples of a very fast double and single precision cosine and sine function done with SSE (maybe not what you need) is included
see about in the middle of the page http://board.flatassembler.net/topic.php?t=3841 I especially mean this Code: cos(a) = cos( a-round(a) + round(a) ) using this trigonometric equation cos(a+b) = cos(a)*cos(b) - sin(a)*sin(b) => cos(a-round(a)) * costab[round(a)] - sin(a-round(a)) * sintab[round(a)] |
|||
![]() |
|
MCD 26 Jan 2007, 18:10
Jack wrote: you should use a minimax approximation instead of taylor series, for example the following approximation is accurate to six digits in the interval -Pi/2 .. Pi/2 But you're right, those other polynoms are even more effective than the taylor ones cause you try to get the best approximation in an interval(like [0,2*pi]), and not only in 1 point like 0. But best would be to combine those minimax polynom with the algorithm I have posted above. I meane decomposing the sin/cosine into a bigger LUT part and into a smaller remainder that is passed to the minimax polynom which must then be set up to settle down very quickly in a different but smaller intervall, e.g. not [0,2*pi] but rather [0,d] with d very close to 0. The resulting polynom will then be almost same as the taylor one, but only with very little change in parameters, but these small changes are what makes the difference! Finally I got a SSE float example Code: CosSSE: movss xmm0,[esi] mulss xmm0,[Rad2LUT] cvtss2si eax,xmm0 cvtsi2ss xmm1,eax and eax,0FFh shl eax,4 subss xmm0,xmm1 mulss xmm0,[LUT2Rad] movaps xmm1,xmm0 mulss xmm0,xmm0 movlhps xmm0,xmm0 mulss xmm0,xmm1 movhps xmm1,[_1.0] mulps xmm0,[SinCosCoeff] addps xmm0,xmm1 mulps xmm0,[SinCosLUT+eax] movhlps xmm1,xmm0 subss xmm1,xmm0 movss [edi],xmm1 ret align 4 Rad2LUT dd 40.743665 ;256/(2*pi) LUT2Rad dd 0.024543692;2*pi/256 _1.0: dd 1.0,0 align 16 SinCosCoeff: dd -0.16666230,0, -0.49997921,0;~ -1/3!, ~ -1/2! SinCosLUT: rb 256*4*4;Have to be setup before first use ;The format of the LUT is as follows: ;[Single sine(0*(2pi/256))] [0] [Single cosine(0*(2pi/256))] [0] ; 0 4 8 0Ch ;[Single sine(1*(2pi/256))] [0] [Single cosine(1*(2pi/256))] [0] ; 10h 14h 18h 1Ch ; . . . . . . I know this math stuff isn't easy if you aren't into that, anyway, that task/algorithm could be clarified very well with a function-plot, but I'm too lazy for that now p.s. this code is over a year old, so it's kinda unmainted, but it workeed at the time I wrote it _________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || Last edited by MCD on 28 Jan 2007, 13:16; edited 1 time in total |
|||
![]() |
|
rugxulo 28 Jan 2007, 04:23
LocoDelAssembly wrote: You can get an idea from http://www.agner.org/optimize/testp.zip and a recent thread http://board.flatassembler.net/topic.php?t=6563 (but much better first link I would not recommend downloading that, especially because of PMCDOS.ZIP (ugh, that Agner guy just can't take a hint). |
|||
![]() |
|
mattst88 28 Jan 2007, 19:14
rugxulo wrote:
Elaborate please. _________________ My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions. Assembly Programmer's Journal |
|||
![]() |
|
rugxulo 28 Jan 2007, 21:06
mattst88, let's just say that neither Richard Stallman nor Linus Torvalds would EVER approve, m'kay? DON'T DO IT!
|
|||
![]() |
|
FrozenKnight 31 Jan 2007, 21:32
to test clocks i use the following code
Code: sub esp, 8 RDTSC mov [esp], eax mov [esp+4], edx Code: RDTSC sub eax, [esp] sbb edx, [esp+4] add esp, 8 if everything works then the result will be stored in edx:eax. |
|||
![]() |
|
Tomasz Grysztar 31 Jan 2007, 21:53
You may also find this interesting.
|
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.