flat assembler
Message board for the users of flat assembler.

Index > Main > Fast inverse square root, anyone ported it to fasm yet?

Author
Thread Post new topic Reply to topic
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew
from: http://www.beyond3d.com/content/articles/8/
I was wondering if someone has ported this code into FASM yet?

It appears to produce faster results than 1.0/sqr(n) ! (and yes I know the code is originally 15 years old, still, useful for some stuff).

_________________
Im new, sorry if I bothered with any stupid question Smile
Post 28 Dec 2007, 19:19
View user's profile Send private message Reply with quote
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan
Try to port it and compare to FPU & SSE versions if interested.
Post 28 Dec 2007, 19:24
View user's profile Send private message Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew
Actually you must perform integer operations for this to work, please read the C code to understand whats going on.

I only ask so I don't waste time on something that was already done Smile

Thanks
Post 28 Dec 2007, 19:32
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4240
Location: 2018
edfed
speaking about SQR...

by shifting right the exponent.. we obtain the square root of a fp number?
Post 28 Dec 2007, 19:32
View user's profile Send private message Visit poster's website Reply with quote
DJ Mauretto



Joined: 14 Mar 2007
Posts: 464
Location: Rome,Italy
DJ Mauretto
Code:
x             DD 2.0
const05               DD 0.5
i             DD ?
const15         DD 1.5
Return                DD ?


                                                ; st  st1
           fld     [x]                     ;  x 
               fmul    [const05]               ; 0.5*x


             mov     eax, [x]
            sar     eax, 1
              mov     ecx, 5f3759dfH                          
            sub     ecx, eax
            mov     [i], ecx                ; i = 0x5f3759df - (i>>1)

                                         ; st                  st1
           fld     [i]                     ;  i                  0.5*x
         fmul    [i]                     ; i*i                 0.5*x
         fxch    st1                     ; 0.5*x               i*i
           fmul    st,st1                  ; 0.5*x*i*i           i*i
           fsubr   [const15]               ; 1.5-0.5*x*i*i       i*i
           fmul    [i]                     ; i*(1.5-0.5*x*i*i)   i*i
           fstp    [Return]                ; i*i
               fstp    st                      ; Empty Smile    


Last edited by DJ Mauretto on 31 Dec 2007, 14:07; edited 1 time in total
Post 28 Dec 2007, 20:46
View user's profile Send private message Reply with quote
Xorpd!



Joined: 21 Dec 2006
Posts: 161
Xorpd!
Paul Hsieh's web page is a better source than the one given above for the piecewise-linear approximation. I don't know why you would bother at this point in time as rsqrtps gets you better than one part in 2048 while the piecewise linear method is only about one part in 33.
Post 29 Dec 2007, 01:43
View user's profile Send private message Visit poster's website Reply with quote
FrozenKnight



Joined: 24 Jun 2005
Posts: 128
FrozenKnight
if you need accuracy just use newtons method, you can adjust the accuracy to performance ratio as needed.
Post 29 Dec 2007, 11:46
View user's profile Send private message Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
SomeoneNew wrote:
from: http://www.beyond3d.com/content/articles/8/
I was wondering if someone has ported this code into FASM yet?

It appears to produce faster results than 1.0/sqr(n) ! (and yes I know the code is originally 15 years old, still, useful for some stuff).
Very good code.

Here a different explanation on the method.

I guess creating a 4th order polynomial (with least-squares differences between it and 1/sqrt) would produce better results, and it can also be factored so it can be computed in 4 multiplications and 4 additions (possibly parallel) by factoring out the roots. Though, a 5th such polynomial would be more parallelable and, of course, more accurate (however, there will be 5 muls and 5 adds).

I have a question though: why does the code accept inputs between 0.5 and 1.0? Why not from 0.0 to 1.0 (I know it blows up to infinity at 0)? Do values under 0.5 work?

regarding values > 1.0, they can be normalized Wink
Post 31 Dec 2007, 14:16
View user's profile Send private message Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88
This page (http://olivermcfadden.livejournal.com/15872.html) has some links to some very interesting papers/articles regarding reciprocal square roots.

One paper linked even finds a better 'magic number' (0x5f3759df)
Post 01 Jan 2008, 19:44
View user's profile Send private message Visit poster's website Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
edfed wrote:
by shifting right the exponent.. we obtain the square root of a fp number?
you'll only get the square root of the exponent.

for full sqrt you'll also need the square root of the mantissa, which is the complicated problem of doing it fast Wink
Post 02 Jan 2008, 16:14
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
I think...
Code:
4.4 Floating point XMM instructions
Instruction     Operands        uops fused domain       uops unfused domain     Latency         Reciprocal throughput
Math                                                    p0 p1 p01 p2 p3 p4

RSQRTPS         xmm,xmm         2                          3                    3               2
RSQRTPS         xmm,m128        4                          2      2                             2
    
Post 02 Jan 2008, 17:13
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17476
Location: In your JS exploiting you and your system
revolution
IIRC the AMD optimisation manual has code to compute full precision 1/SQRT using newtons method after the first RSQRTPS & RSQRTPD
Post 02 Jan 2008, 17:21
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.