flat assembler
Message board for the users of flat assembler.

 Index > Main > Fast inverse square root, anyone ported it to fasm yet?
Author
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 28 Dec 2007, 19:19
from: http://www.beyond3d.com/content/articles/8/
I was wondering if someone has ported this code into FASM yet?

It appears to produce faster results than 1.0/sqr(n) ! (and yes I know the code is originally 15 years old, still, useful for some stuff).

_________________
Im new, sorry if I bothered with any stupid question
28 Dec 2007, 19:19
asmfan

Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan 28 Dec 2007, 19:24
Try to port it and compare to FPU & SSE versions if interested.
28 Dec 2007, 19:24
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 28 Dec 2007, 19:32
Actually you must perform integer operations for this to work, please read the C code to understand whats going on.

I only ask so I don't waste time on something that was already done

Thanks
28 Dec 2007, 19:32
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 28 Dec 2007, 19:32

by shifting right the exponent.. we obtain the square root of a fp number?
28 Dec 2007, 19:32
DJ Mauretto

Joined: 14 Mar 2007
Posts: 464
Location: Rome,Italy
DJ Mauretto 28 Dec 2007, 20:46
Code:
```x             DD 2.0
const05               DD 0.5
i             DD ?
const15         DD 1.5
Return                DD ?

; st  st1
fld     [x]                     ;  x
fmul    [const05]               ; 0.5*x

mov     eax, [x]
sar     eax, 1
mov     ecx, 5f3759dfH
sub     ecx, eax
mov     [i], ecx                ; i = 0x5f3759df - (i>>1)

; st                  st1
fld     [i]                     ;  i                  0.5*x
fmul    [i]                     ; i*i                 0.5*x
fxch    st1                     ; 0.5*x               i*i
fmul    st,st1                  ; 0.5*x*i*i           i*i
fsubr   [const15]               ; 1.5-0.5*x*i*i       i*i
fmul    [i]                     ; i*(1.5-0.5*x*i*i)   i*i
fstp    [Return]                ; i*i
fstp    st                      ; Empty     ```

Last edited by DJ Mauretto on 31 Dec 2007, 14:07; edited 1 time in total
28 Dec 2007, 20:46
Xorpd!

Joined: 21 Dec 2006
Posts: 161
Xorpd! 29 Dec 2007, 01:43
Paul Hsieh's web page is a better source than the one given above for the piecewise-linear approximation. I don't know why you would bother at this point in time as rsqrtps gets you better than one part in 2048 while the piecewise linear method is only about one part in 33.
29 Dec 2007, 01:43
FrozenKnight

Joined: 24 Jun 2005
Posts: 128
FrozenKnight 29 Dec 2007, 11:46
if you need accuracy just use newtons method, you can adjust the accuracy to performance ratio as needed.
29 Dec 2007, 11:46
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 31 Dec 2007, 14:16
SomeoneNew wrote:
from: http://www.beyond3d.com/content/articles/8/
I was wondering if someone has ported this code into FASM yet?

It appears to produce faster results than 1.0/sqr(n) ! (and yes I know the code is originally 15 years old, still, useful for some stuff).
Very good code.

Here a different explanation on the method.

I guess creating a 4th order polynomial (with least-squares differences between it and 1/sqrt) would produce better results, and it can also be factored so it can be computed in 4 multiplications and 4 additions (possibly parallel) by factoring out the roots. Though, a 5th such polynomial would be more parallelable and, of course, more accurate (however, there will be 5 muls and 5 adds).

I have a question though: why does the code accept inputs between 0.5 and 1.0? Why not from 0.0 to 1.0 (I know it blows up to infinity at 0)? Do values under 0.5 work?

regarding values > 1.0, they can be normalized
31 Dec 2007, 14:16
mattst88

Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 01 Jan 2008, 19:44

One paper linked even finds a better 'magic number' (0x5f3759df)
01 Jan 2008, 19:44
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 02 Jan 2008, 16:14
edfed wrote:
by shifting right the exponent.. we obtain the square root of a fp number?
you'll only get the square root of the exponent.

for full sqrt you'll also need the square root of the mantissa, which is the complicated problem of doing it fast
02 Jan 2008, 16:14

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
I think...
Code:
```4.4 Floating point XMM instructions
Instruction     Operands        uops fused domain       uops unfused domain     Latency         Reciprocal throughput
Math                                                    p0 p1 p01 p2 p3 p4

RSQRTPS         xmm,xmm         2                          3                    3               2
RSQRTPS         xmm,m128        4                          2      2                             2
```
02 Jan 2008, 17:13
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20212
revolution 02 Jan 2008, 17:21
IIRC the AMD optimisation manual has code to compute full precision 1/SQRT using newtons method after the first RSQRTPS & RSQRTPD
02 Jan 2008, 17:21
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum