flat assembler
Message board for the users of flat assembler.
Index
> Main > nth roots in asm 
Author 

LocoDelAssembly 06 Mar 2010, 05:56
Take a look to http://board.flatassembler.net/topic.php?t=10488 or maybe http://www.madwizard.org/programming/snippets?id=36 .
Remember that the Nth root of X is also X^(1/N). 

06 Mar 2010, 05:56 

Tyler 06 Mar 2010, 06:43
I haven't looked at the resources you provided yet, but the reason I knew of X^(1/N) and didn't
see it as the answer is because I would do exponents as a loop that repeated for the number that is the exponent, each rep, muling the number by the previous total. Like Code: ; ax^cx exponent: mov ax, bx @@: imul ax, bx dec cx cmp cx, 0 jne @b Say I want to do 16^(1/2), I can't because I can't do 1/2 repetition. 

06 Mar 2010, 06:43 

edemko 06 Mar 2010, 06:51
Quote:
x^y = 2^z z = log(2;x^y) = log(2;x)*y use fpu 

06 Mar 2010, 06:51 

edemko 06 Mar 2010, 06:54
Quote:
equals Code: loop @b 

06 Mar 2010, 06:54 

LocoDelAssembly 06 Mar 2010, 07:06
If you want an ALU solution, then you could make a dichotomic search. For every number you probe, multiply it by the N times and check if the result is the argument (make sure to check EDX is zero in between to ensure your are getting the real answer). Since some answers are not integer (and others not even real numbers), you'll have to make a decision on what to do with those.
Message to the rest: Since Tyler skipped my links he obviously wants this as an exercise so wait a little before posting code 

06 Mar 2010, 07:06 

Tyler 07 Mar 2010, 02:04
Is it okay to do algorithms in such a guesscheck type way? This may expose my inexperience
(yet again ), but I've always tried to find a definite solution with a fairly definite amount of repetitions, as opposed to that, which could repeat a large amount of times if the number was big enough. Is it possible to make a single rep equation, or do I have no choice than to use guesscheck. 

07 Mar 2010, 02:04 

edemko 07 Mar 2010, 02:10
what is the quess


07 Mar 2010, 02:10 

Tyler 07 Mar 2010, 02:19
The first number of the dichotomic search LocoDelAssembly mentioned.


07 Mar 2010, 02:19 

edemko 07 Mar 2010, 02:28
post #7 at
http://wasm.ru/forum/viewtopic.php?id=36161 The executive has its own to float converter which is updated but not included in the file offered. Tomasz's expressi.inc's get_fp_value at line 449 amused me more than the one i've got myself. We should give a transcription how our aliases are spelled 

07 Mar 2010, 02:28 

Tyler 07 Mar 2010, 02:58
Oh, I see now, that is a one rep equation like solution. Unfortunately, I have
no knowledge of how to program for the fpu(and barely any for the cpu ), so, although learning how to do floating point math in asm is on my todo list, at the present, if it uses anything that isn't stored by pushad(in exception of a few), it's above my head. serfasm wrote:
Sorry, I did that by accident because I remember Tomasz's name as the english Thomas. 

07 Mar 2010, 02:58 

LocoDelAssembly 07 Mar 2010, 03:35
You should take in mind however that although your code is loopless, the FPU internally has to do some sort of iterative work inside to compute the answer. The instructions FYL2X and F2XM1 take 96 and 69 cycles respectively on a Core2 Duo (family 06_0EH) according to Intel Optimization manual, and it is also noted that it could take longer. (fprem is not included in the manual, but Agner Fog says it takes 1656 cycles)
For the method I commented, if your input is 32bit numbers, it would cost you 16 probes max to compute the Nth square root (not 32 because your upper bound for searching would be 65536 to cover for the square root and could obviously be much less for cubic root, fourth root, etc) PS: But of course the FPU produce a better quality result than this ALU method. 

07 Mar 2010, 03:35 

windwakr 07 Mar 2010, 03:50
I know this is offtopic to his question, but this is the perfect place to ask this:
I've recently been looking into exponents(So I can implement variable exponent powers on Z for my Mandelbrot viewer, which I did, and it is awesome! See it in action: http://www.youtube.com/watch?v=xAgvqG3zEto ) and I came across this way of doing it(pieced it together from here): Code: fld [power] fld [number] fyl2x fld st0 frndint fsub st1,st0 fxch st1 f2xm1 fld1 faddp fscale fstp st1 Which is faster? This one, or the one in that madwizard link? 

07 Mar 2010, 03:50 

LocoDelAssembly 07 Mar 2010, 04:12
I don't know, I guess someone will have to time them both
BTW, there also exist an SSE version of pow inside msvcrt.dll which is so, so complex and large that I don't know if it is really faster on my computer. Check the link to this forum I posted. 

07 Mar 2010, 04:12 

edemko 07 Mar 2010, 07:28
Code: number dt 0.0 power dt 0.0 ; ... ; x^y = 2^z ; z = log(2;x)*y fld [power] ; y fld [number] ; x fabs ; engine requires x to be > 0 fyl2x ; z fld st0 ; z copy push $37f or (11b shl 10) ; truncating rounding... fldcw [esp] ; ...applied pop dword[esp] ; restore stack frndint ; z.whole fsub st1,st0 ; z.fraction fxch st1 ; z.fraction <> z.whole f2xm1 ; 2^z.fraction 1 fld1 ; 1 loaded faddp ; 2^z.fraction fscale ; 2^z.fraction * 2^z.whole = x^y fstp st1 ; restore fpu stack ; Tyler, you have got complicated TODO 45 bytes, x cycles. 

07 Mar 2010, 07:28 

< Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.