flat assembler
Message board for the users of flat assembler.
Index
> Main > Fast inverse square root, anyone ported it to fasm yet? 
Author 

SomeoneNew
from: http://www.beyond3d.com/content/articles/8/
I was wondering if someone has ported this code into FASM yet? It appears to produce faster results than 1.0/sqr(n) ! (and yes I know the code is originally 15 years old, still, useful for some stuff). _________________ Im new, sorry if I bothered with any stupid question 

28 Dec 2007, 19:19 

asmfan
Try to port it and compare to FPU & SSE versions if interested.


28 Dec 2007, 19:24 

edfed
speaking about SQR...
by shifting right the exponent.. we obtain the square root of a fp number? 

28 Dec 2007, 19:32 

DJ Mauretto
Code: x DD 2.0 const05 DD 0.5 i DD ? const15 DD 1.5 Return DD ? ; st st1 fld [x] ; x fmul [const05] ; 0.5*x mov eax, [x] sar eax, 1 mov ecx, 5f3759dfH sub ecx, eax mov [i], ecx ; i = 0x5f3759df  (i>>1) ; st st1 fld [i] ; i 0.5*x fmul [i] ; i*i 0.5*x fxch st1 ; 0.5*x i*i fmul st,st1 ; 0.5*x*i*i i*i fsubr [const15] ; 1.50.5*x*i*i i*i fmul [i] ; i*(1.50.5*x*i*i) i*i fstp [Return] ; i*i fstp st ; Empty Last edited by DJ Mauretto on 31 Dec 2007, 14:07; edited 1 time in total 

28 Dec 2007, 20:46 

Xorpd!
Paul Hsieh's web page is a better source than the one given above for the piecewiselinear approximation. I don't know why you would bother at this point in time as rsqrtps gets you better than one part in 2048 while the piecewise linear method is only about one part in 33.


29 Dec 2007, 01:43 

FrozenKnight
if you need accuracy just use newtons method, you can adjust the accuracy to performance ratio as needed.


29 Dec 2007, 11:46 

Borsuc
SomeoneNew wrote: from: http://www.beyond3d.com/content/articles/8/ Here a different explanation on the method. I guess creating a 4th order polynomial (with leastsquares differences between it and 1/sqrt) would produce better results, and it can also be factored so it can be computed in 4 multiplications and 4 additions (possibly parallel) by factoring out the roots. Though, a 5th such polynomial would be more parallelable and, of course, more accurate (however, there will be 5 muls and 5 adds). I have a question though: why does the code accept inputs between 0.5 and 1.0? Why not from 0.0 to 1.0 (I know it blows up to infinity at 0)? Do values under 0.5 work? regarding values > 1.0, they can be normalized 

31 Dec 2007, 14:16 

mattst88
This page (http://olivermcfadden.livejournal.com/15872.html) has some links to some very interesting papers/articles regarding reciprocal square roots.
One paper linked even finds a better 'magic number' (0x5f3759df) 

01 Jan 2008, 19:44 

Borsuc
edfed wrote: by shifting right the exponent.. we obtain the square root of a fp number? for full sqrt you'll also need the square root of the mantissa, which is the complicated problem of doing it fast 

02 Jan 2008, 16:14 

Madis731
I think...
Code: 4.4 Floating point XMM instructions Instruction Operands uops fused domain uops unfused domain Latency Reciprocal throughput Math p0 p1 p01 p2 p3 p4 RSQRTPS xmm,xmm 2 3 3 2 RSQRTPS xmm,m128 4 2 2 2 

02 Jan 2008, 17:13 

revolution
IIRC the AMD optimisation manual has code to compute full precision 1/SQRT using newtons method after the first RSQRTPS & RSQRTPD


02 Jan 2008, 17:21 

< Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.