flat assembler
Message board for the users of flat assembler.
SomeoneNew
from: http://www.beyond3d.com/content/articles/8/
I was wondering if someone has ported this code to FASM yet? It appears to produce faster results than 1.0/sqrt(n)! (And yes, I know the code is originally 15 years old; it is still useful for some stuff.)
_________________
I'm new, sorry if I bothered you with a stupid question
asmfan
Try to port it and compare to FPU & SSE versions if interested.
edfed
Speaking of SQR...
By shifting the exponent right, do we obtain the square root of an FP number?
DJ Mauretto
Code:
x        DD 2.0
const05  DD 0.5
i        DD ?
const15  DD 1.5
Return   DD ?

                         ; st                    st1
fld   [x]                ; x
fmul  [const05]          ; 0.5*x
mov   eax, [x]
sar   eax, 1
mov   ecx, 5f3759dfH
sub   ecx, eax
mov   [i], ecx           ; i = 0x5f3759df - (i>>1)
                         ; st                    st1
fld   [i]                ; i                     0.5*x
fmul  [i]                ; i*i                   0.5*x
fxch  st1                ; 0.5*x                 i*i
fmul  st,st1             ; 0.5*x*i*i             i*i
fsubr [const15]          ; 1.5-0.5*x*i*i         i*i
fmul  [i]                ; i*(1.5-0.5*x*i*i)     i*i
fstp  [Return]           ; i*i
fstp  st                 ; Empty

Last edited by DJ Mauretto on 31 Dec 2007, 14:07; edited 1 time in total
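For anyone who wants to compare results against the FPU listing above, here is a minimal C sketch of the same steps (magic-constant guess followed by one Newton step). The function name `rsqrt_magic` is made up for this example, and the sketch assumes a positive, finite 32-bit IEEE float; it is a reference for checking output, not a tuned implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One-step fast inverse square root, mirroring the FPU listing above.
   rsqrt_magic is a hypothetical name chosen for this sketch. */
float rsqrt_magic(float x)
{
    assert(x > 0.0f);                 /* the bit trick assumes a positive input */

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1); /* i = 0x5f3759df - (i >> 1) */

    float y;
    memcpy(&y, &bits, sizeof y);      /* initial guess (the listing's i) */

    return y * (1.5f - 0.5f * x * y * y);   /* one Newton step */
}
```

Calling `rsqrt_magic(2.0f)` should land close to 0.7071, which is the value the listing stores in `Return` for `x DD 2.0`.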
Xorpd!
Paul Hsieh's web page is a better source than the one given above for the piecewise-linear approximation. I don't know why you would bother at this point in time as rsqrtps gets you better than one part in 2048 while the piecewise linear method is only about one part in 33.
FrozenKnight
If you need accuracy, just use Newton's method; you can adjust the accuracy-to-performance ratio as needed by choosing the number of iterations.
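A minimal C sketch of that trade-off, assuming you already have some estimate `y` of 1/sqrt(x): each extra iteration costs a few multiplies and roughly doubles the number of correct bits. The name `refine_rsqrt` and its parameters are invented for this example.

```c
#include <assert.h>

/* Refine an estimate y of 1/sqrt(x) with `steps` Newton iterations.
   refine_rsqrt is a hypothetical helper, not taken from any library. */
float refine_rsqrt(float x, float y, int steps)
{
    assert(x > 0.0f && y > 0.0f && steps >= 0);

    for (int k = 0; k < steps; k++)
        y = y * (1.5f - 0.5f * x * y * y);   /* y <- y*(3 - x*y*y)/2 */

    return y;
}
```

For example, `refine_rsqrt(4.0f, 0.4f, 4)` moves the rough guess 0.4 to about 0.5. The iteration only converges when the starting guess is positive and below sqrt(3/x), so the initial estimate still matters.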
Borsuc
SomeoneNew wrote: from: http://www.beyond3d.com/content/articles/8/
Here is a different explanation of the method.
I guess fitting a 4th-order polynomial (least-squares fit to 1/sqrt) would produce better results, and it can be factored by its roots so that it is computed in 4 multiplications and 4 additions (possibly in parallel). A 5th-order polynomial would be more parallelizable and, of course, more accurate (at the cost of 5 muls and 5 adds).
I have a question though: why does the code accept only inputs between 0.5 and 1.0? Why not from 0.0 to 1.0 (I know it blows up to infinity at 0)? Do values under 0.5 work? As for values > 1.0, they can be normalized.
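On the normalization point, here is a minimal C sketch of one way to do it, under stated assumptions. It scales the input by powers of 4 into [0.5, 2.0), because 1/sqrt(4) is exactly 0.5 and the result can be corrected with an exact power of two; a narrower range such as [0.5, 1.0) would need an extra factor of sqrt(2) for odd exponents. The core routine here is a plain Newton iteration from a fixed guess, used only as a stand-in for the fitted polynomial (no coefficients are given in this thread, so none are invented). Both function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Stand-in core routine, valid only for 0.5 <= m < 2.0.
   A fitted polynomial could replace this loop. */
static float rsqrt_core(float m)
{
    float y = 1.0f;                          /* fixed starting guess */
    for (int k = 0; k < 6; k++)
        y = y * (1.5f - 0.5f * m * y * y);   /* Newton step */
    return y;
}

/* Reduce x to [0.5, 2.0) by powers of 4, then undo the scaling. */
float rsqrt_normalized(float x)
{
    assert(x > 0.0f && x <= FLT_MAX);        /* positive and finite */

    float scale = 1.0f;
    while (x >= 2.0f) { x *= 0.25f; scale *= 0.5f; }  /* 1/sqrt(4x) = 0.5/sqrt(x) */
    while (x <  0.5f) { x *= 4.0f;  scale *= 2.0f; }

    return scale * rsqrt_core(x);
}
```

With this, `rsqrt_normalized(100.0f)` reduces 100 to 1.5625 with a scale of 0.125 and returns roughly 0.1.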
mattst88
This page (http://olivermcfadden.livejournal.com/15872.html) has some links to some very interesting papers/articles regarding reciprocal square roots.
One of the linked papers even finds a better 'magic number' than the usual 0x5f3759df.
Borsuc
edfed wrote: by shifting right the exponent.. we obtain the square root of a fp number?
For a full sqrt you'll also need the square root of the mantissa, which is the complicated part to do fast.
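To illustrate the exchange above, here is a minimal C sketch of the shift-only approach, assuming a positive 32-bit IEEE float. Shifting the whole bit pattern right halves the exponent and treats the mantissa linearly, and adding 0x1fc00000 (127 << 22) restores the half of the exponent bias lost by the shift. The name `sqrt_shift` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rough square root by halving the exponent in the integer domain.
   sqrt_shift is a hypothetical name chosen for this sketch. */
float sqrt_shift(float x)
{
    assert(x > 0.0f);

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float as an integer */
    bits = (bits >> 1) + 0x1fc00000u;    /* halve exponent, restore half the bias */

    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

The result is exact for inputs such as 1.0 and 4.0, but `sqrt_shift(2.0f)` gives 1.5 instead of about 1.414, an error of roughly 6%. That gap is the missing square root of the mantissa.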
Madis731
I think...
Code:
4.4 Floating point XMM instructions
Columns: Instruction | Operands | uops fused domain | uops unfused domain (p0, p1, p01, p2, p3, p4) | Latency | Reciprocal throughput
Math
RSQRTPS  xmm,xmm    2 3 3 2
RSQRTPS  xmm,m128   4 2 2 2
revolution
IIRC the AMD optimisation manual has code to compute full-precision 1/SQRT using Newton's method after the first RSQRTPS & RSQRTPD.
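This is not the code from the AMD manual, only a minimal C sketch of the idea using SSE intrinsics: take the RSQRTPS estimate and apply one Newton step to it. The function name `rsqrt4` is invented for this example, inputs are assumed positive and finite, and the `#else` branch is a portable stand-in (magic-constant guess) so the sketch still builds where SSE is unavailable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[k] = approx. 1/sqrt(in[k]) for four positive floats:
   hardware estimate plus one Newton step. rsqrt4 is a hypothetical name. */
void rsqrt4(const float in[4], float out[4])
{
    for (int k = 0; k < 4; k++)
        assert(in[k] > 0.0f);

#if defined(__SSE__)
    __m128 x      = _mm_loadu_ps(in);
    __m128 y      = _mm_rsqrt_ps(x);                       /* RSQRTPS estimate */
    __m128 half_x = _mm_mul_ps(_mm_set1_ps(0.5f), x);
    __m128 t      = _mm_mul_ps(_mm_mul_ps(half_x, y), y);  /* 0.5*x*y*y */
    y = _mm_mul_ps(y, _mm_sub_ps(_mm_set1_ps(1.5f), t));   /* y*(1.5 - 0.5*x*y*y) */
    _mm_storeu_ps(out, y);
#else
    /* Stand-in without SSE: magic-constant guess, then the same Newton step. */
    for (int k = 0; k < 4; k++) {
        uint32_t bits;
        memcpy(&bits, &in[k], sizeof bits);
        bits = 0x5f3759dfu - (bits >> 1);
        float y;
        memcpy(&y, &bits, sizeof y);
        out[k] = y * (1.5f - 0.5f * in[k] * y * y);
    }
#endif
}
```

Since a Newton step roughly doubles the number of correct bits, one step on top of the RSQRTPS estimate gets close to single precision; the stand-in branch starts from a coarser guess and ends up less accurate.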
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.