flat assembler
Message board for the users of flat assembler.
Index
> Main > nth roots in asm 
Author 

LocoDelAssembly
Take a look to http://board.flatassembler.net/topic.php?t=10488 or maybe http://www.madwizard.org/programming/snippets?id=36 .
Remember that the Nth root of X is also X^(1/N). 

06 Mar 2010, 05:56 

Tyler
I haven't looked at the resources you provided yet, but the reason I knew of X^(1/N) and didn't
see it as the answer is because I would do exponents as a loop that repeated for the number that is the exponent, each rep, muling the number by the previous total. Like Code: ; ax^cx exponent: mov ax, bx @@: imul ax, bx dec cx cmp cx, 0 jne @b Say I want to do 16^(1/2), I can't because I can't do 1/2 repetition. 

06 Mar 2010, 06:43 

edemko
Quote:
x^y = 2^z z = log(2;x^y) = log(2;x)*y use fpu 

06 Mar 2010, 06:51 

edemko
Quote:
equals Code: loop @b 

06 Mar 2010, 06:54 

LocoDelAssembly
If you want an ALU solution, then you could make a dichotomic search. For every number you probe, multiply it by the N times and check if the result is the argument (make sure to check EDX is zero in between to ensure your are getting the real answer). Since some answers are not integer (and others not even real numbers), you'll have to make a decision on what to do with those.
Message to the rest: Since Tyler skipped my links he obviously wants this as an exercise so wait a little before posting code 

06 Mar 2010, 07:06 

Tyler
Is it okay to do algorithms in such a guesscheck type way? This may expose my inexperience
(yet again ), but I've always tried to find a definite solution with a fairly definite amount of repetitions, as opposed to that, which could repeat a large amount of times if the number was big enough. Is it possible to make a single rep equation, or do I have no choice than to use guesscheck. 

07 Mar 2010, 02:04 

edemko
what is the quess


07 Mar 2010, 02:10 

Tyler
The first number of the dichotomic search LocoDelAssembly mentioned.


07 Mar 2010, 02:19 

edemko
post #7 at
http://wasm.ru/forum/viewtopic.php?id=36161 The executive has its own to float converter which is updated but not included in the file offered. Tomasz's expressi.inc's get_fp_value at line 449 amused me more than the one i've got myself. We should give a transcription how our aliases are spelled 

07 Mar 2010, 02:28 

Tyler
Oh, I see now, that is a one rep equation like solution. Unfortunately, I have
no knowledge of how to program for the fpu(and barely any for the cpu ), so, although learning how to do floating point math in asm is on my todo list, at the present, if it uses anything that isn't stored by pushad(in exception of a few), it's above my head. serfasm wrote:
Sorry, I did that by accident because I remember Tomasz's name as the english Thomas. 

07 Mar 2010, 02:58 

LocoDelAssembly
You should take in mind however that although your code is loopless, the FPU internally has to do some sort of iterative work inside to compute the answer. The instructions FYL2X and F2XM1 take 96 and 69 cycles respectively on a Core2 Duo (family 06_0EH) according to Intel Optimization manual, and it is also noted that it could take longer. (fprem is not included in the manual, but Agner Fog says it takes 1656 cycles)
For the method I commented, if your input is 32bit numbers, it would cost you 16 probes max to compute the Nth square root (not 32 because your upper bound for searching would be 65536 to cover for the square root and could obviously be much less for cubic root, fourth root, etc) PS: But of course the FPU produce a better quality result than this ALU method. 

07 Mar 2010, 03:35 

windwakr
I know this is offtopic to his question, but this is the perfect place to ask this:
I've recently been looking into exponents(So I can implement variable exponent powers on Z for my Mandelbrot viewer, which I did, and it is awesome! See it in action: http://www.youtube.com/watch?v=xAgvqG3zEto ) and I came across this way of doing it(pieced it together from here): Code: fld [power] fld [number] fyl2x fld st0 frndint fsub st1,st0 fxch st1 f2xm1 fld1 faddp fscale fstp st1 Which is faster? This one, or the one in that madwizard link? 

07 Mar 2010, 03:50 

LocoDelAssembly
I don't know, I guess someone will have to time them both
BTW, there also exist an SSE version of pow inside msvcrt.dll which is so, so complex and large that I don't know if it is really faster on my computer. Check the link to this forum I posted. 

07 Mar 2010, 04:12 

edemko
Code: number dt 0.0 power dt 0.0 ; ... ; x^y = 2^z ; z = log(2;x)*y fld [power] ; y fld [number] ; x fabs ; engine requires x to be > 0 fyl2x ; z fld st0 ; z copy push $37f or (11b shl 10) ; truncating rounding... fldcw [esp] ; ...applied pop dword[esp] ; restore stack frndint ; z.whole fsub st1,st0 ; z.fraction fxch st1 ; z.fraction <> z.whole f2xm1 ; 2^z.fraction 1 fld1 ; 1 loaded faddp ; 2^z.fraction fscale ; 2^z.fraction * 2^z.whole = x^y fstp st1 ; restore fpu stack ; Tyler, you have got complicated TODO 45 bytes, x cycles. 

07 Mar 2010, 07:28 

< Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.