flat assembler
Message board for the users of flat assembler.
Index
> Main > BIGINT Factorial |
Author |
|
Madis731 08 Jun 2009, 13:12
I'm just wondering if one can do this without shuffling so much on stack, but at least we can make it faster (~2000clk vs ~10000clk in my experiments):
Code: factorial: ; I couldn't find a "normal" way of doing the stacking ; Now the calling-address is interleaved with data on the stack ; mov rax,r11 lea rcx,[r11-1] xor ebx,ebx .inner: mul rcx test rax,rax jz @f push rax call .recur pop rax @@: sub rcx,1 jne .inner ret .recur: mov rax,[rsp-8] mov rbx,rdx mul rcx add rax,rbx ; adc qword[rsp-24],0 test rax,rax jz @f push rax call .recur pop rax @@: ; mov [rsp-8],rax ret |
|||
08 Jun 2009, 13:12 |
|
bitRAKE 08 Jun 2009, 15:22
I have no idea what is at [RSP-8] -- looks like bad mojo.
There are several different methods at: http://www.luschny.de/math/factorial/index.html ...should be easy to reduce by a factor of 4 just using the Linear Difference method - almost the same code. Just kind of brain-storming still - I haven't done any time testing, yet. Each algorithm will have a performance characteristic, as well as external requirements to consider. For small values... Code: ; factorial RCX in interval [0,20] mov rax,1 jrcxz .0 .n: mul rcx loop .n .0: Could use a dispatcher to select the routine based on size of RCX: Code: Dispatcher: virtual at rbp .fac FUNCTION end virtual @@: cmp [.fac.Limit],rcx lea rbp,[rbp+sizeof.FUNCTION] jc @B call [.fac.Routine - sizeof.FUNCTION] |
|||
08 Jun 2009, 15:22 |
|
pal 08 Jun 2009, 17:26
Hmm, interesting codes.
I know it is only an approximation method, but how would an implementation of Stirling's Approximation (or a derivative of it) compare to the standard method. (It is better for larger factorials then faily small ones like 52!). Code: http://en.wikipedia.org/wiki/Stirling%27s_approximation http://mathworld.wolfram.com/StirlingsApproximation.html |
|||
08 Jun 2009, 17:26 |
|
bitRAKE 08 Jun 2009, 18:14
If you're talking in terms of speed - it'd obviously be faster if only 64 significant bits were needed of 1000000! to use an approximation. At some point all that is possible is an approximation: if we could by some magic write a decimal digit on each atom in the observable universe, we would be limited to 10^10^80 -- which is smaller than 2^260 !.
|
|||
08 Jun 2009, 18:14 |
|
revolution 08 Jun 2009, 18:23
2^260 ~= 1.8526734e+78
|
|||
08 Jun 2009, 18:23 |
|
bitRAKE 08 Jun 2009, 18:39
|
|||
08 Jun 2009, 18:39 |
|
Madis731 09 Jun 2009, 06:02
Actually I tested WolframAlpha extensively when testing the output of my factorial program. 2^260 is also calculable there, I wonder why you switched tools when "Wolfie" was already open
|
|||
09 Jun 2009, 06:02 |
|
Borsuc 09 Jun 2009, 20:39
Why Wolfram Alpha and not the excellent offline Pari/GP calculator (and free too)?
|
|||
09 Jun 2009, 20:39 |
|
Madis731 10 Jun 2009, 06:17
^o) hmm... maybe PariI/GP and GMP were not advertised that much.
Yet by a quick look I can see that GMP is really hard to use ("pi", "pi()", "sin", "sin(30)" are not recognized). Only thing I could do was the already used "^" and simple arithmetic ("+-*/"). Then again "3.2" nor "3,2" are recgnized Many signs you have to turn to hex ("+" == "%2B") before they start to work. This link http://gmplib.org/#TRY I found, helps a bit, though. Pari/GP doesn't show me a quick way to get answers to my questions. I would have to download it and then get it to compile and I think it would take at least a day or two before I get my "2^260" answered. |
|||
10 Jun 2009, 06:17 |
|
r22 10 Jun 2009, 12:46
I was thinking SSE could help speed up the factorial, but because of the carry and the fact that SSE doesn't have an UNsigned integer multiply it probably wouldn't help much.
Unless you wanted to calculate N partial factorials in parallel then combine them as a last step. 52! = Product[1-13]x * Product[14-26]y * Product[27-39]z * Product[40-52]w In using SIMD it might even be easier to interleave the products 52! = Prod[1-13](4x - 3) * Prod[1-13](4y-2) * ... * Prod[1-13](4w) ;;For those that don't follow my notation (1*5*9*13...) * (2*6*10*14...) * (3*7*11*15...) * (4*8*12*16...) |
|||
10 Jun 2009, 12:46 |
|
Madis731 10 Jun 2009, 19:23
I figured that one out pretty quickly that if you want to know the last 64 bits of the result you can just MUL in a loop.
Btw, there is "PMULUDQ—Multiply Packed Unsigned Doubleword Integers", but like you said there's no carry and all my tries also ended in a dead end. |
|||
10 Jun 2009, 19:23 |
|
Borsuc 11 Jun 2009, 00:24
Madis731 wrote: Pari/GP doesn't show me a quick way to get answers to my questions. I would have to download it and then get it to compile and I think it would take at least a day or two before I get my "2^260" answered. Plus the true power lies in its scripts. Give it a try, and it's free. _________________ Previously known as The_Grey_Beast |
|||
11 Jun 2009, 00:24 |
|
bitRAKE 16 Jun 2009, 05:42
Madis731 wrote: I wonder why you switched tools when "Wolfie" was already open http://ex-cs.sist.ac.jp/~tkouya/try_mpfr.html ...seems to have a usable interface. Madis731 wrote: I figured that one out pretty quickly that if you want to know the last 64 bits of the result you can just MUL in a loop. _________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
16 Jun 2009, 05:42 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.