flat assembler
Message board for the users of flat assembler.

Index > Main > BIGINT Factorial

Author
Thread Post new topic Reply to topic
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 08 Jun 2009, 07:25
So, how big is 52! ? How many bits is it?

Here is a method which grows the factorial on the stack - no need to approximate size and pre-allocate memory. The same technique works in 32-bit or 16-bit.

Code:
BIGINT.factorial: ;...of R11

     push 1 1 1
  pop rcx r8

.next:
        xor edx,edx
 mov r9,rdx
  mov r10,rdx
 push rcx
    jmp .1

.rdx: add [rsp],rax
       jmp .C

.0:   add [rsp+rcx*8+8],rax
       adc r9,rdx
.1:   mov rax,[rsp+rcx*8]
 mul r8
      mov [rsp+rcx*8],r9
  mov r9,r10 ; zero
   loop .0
     pop rcx
     ; CF set when RDX non-zero
  jc .rdx
     add [rsp],rax
       jnc .2
.C:       adc rdx,r9
  ; increase number size
      push rdx
    inc ecx
.2:      inc r8
      cmp r11,r8
  jnc .next

       ; number is generated and size in qwords is known (RCX)
     ; allocate space and copy from stack

    lea rsp,[rsp+rcx*8]
 retn    
...in my case, I just used the number on the stack; but it's in reverse order - with the high qwords low in memory.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 08 Jun 2009, 07:25
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 08 Jun 2009, 13:12
I'm just wondering if one can do this without shuffling so much on stack, but at least we can make it faster (~2000clk vs ~10000clk in my experiments):
Code:
factorial:
; I couldn't find a "normal" way of doing the stacking
; Now the calling-address is interleaved with data on the stack
; Sad
        mov     rax,r11
        lea     rcx,[r11-1]
        xor     ebx,ebx
      .inner:
        mul     rcx
        test    rax,rax
        jz      @f
        push    rax
        call    .recur
        pop     rax
      @@:
        sub     rcx,1
        jne     .inner
ret
      .recur:
        mov     rax,[rsp-8]
        mov     rbx,rdx
        mul     rcx
        add     rax,rbx
;        adc     qword[rsp-24],0
        test    rax,rax
        jz      @f
        push    rax
        call    .recur
        pop     rax
      @@:
;        mov     [rsp-8],rax
ret                         
    
Post 08 Jun 2009, 13:12
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 08 Jun 2009, 15:22
I have no idea what is at [RSP-8] -- looks like bad mojo. Confused

There are several different methods at:
http://www.luschny.de/math/factorial/index.html

...should be easy to reduce by a factor of 4 just using the Linear Difference method - almost the same code. Just kind of brain-storming still - I haven't done any time testing, yet. Each algorithm will have a performance characteristic, as well as external requirements to consider. For small values...
Code:
; factorial RCX in interval [0,20]
   mov rax,1
   jrcxz .0
.n:     mul rcx
     loop .n
.0:    
...is less bytes than storing the table.

Could use a dispatcher to select the routine based on size of RCX:
Code:
Dispatcher:
  virtual at rbp
              .fac FUNCTION
       end virtual

@@:      cmp [.fac.Limit],rcx
        lea rbp,[rbp+sizeof.FUNCTION]
       jc @B
       call [.fac.Routine - sizeof.FUNCTION]    
Post 08 Jun 2009, 15:22
View user's profile Send private message Visit poster's website Reply with quote
pal



Joined: 26 Aug 2008
Posts: 227
pal 08 Jun 2009, 17:26
Hmm, interesting codes.

I know it is only an approximation method, but how would an implementation of Stirling's Approximation (or a derivative of it) compare to the standard method. (It is better for larger factorials then faily small ones like 52!).

Code:
http://en.wikipedia.org/wiki/Stirling%27s_approximation
http://mathworld.wolfram.com/StirlingsApproximation.html
    
Post 08 Jun 2009, 17:26
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 08 Jun 2009, 18:14
If you're talking in terms of speed - it'd obviously be faster if only 64 significant bits were needed of 1000000! to use an approximation. At some point all that is possible is an approximation: if we could by some magic write a decimal digit on each atom in the observable universe, we would be limited to 10^10^80 -- which is smaller than 2^260 !. Very Happy
Post 08 Jun 2009, 18:14
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20300
Location: In your JS exploiting you and your system
revolution 08 Jun 2009, 18:23
2^260 ~= 1.8526734e+78
Post 08 Jun 2009, 18:23
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 08 Jun 2009, 18:39
2^260 = 1852 67342 77970 59126 77713 57601 39006 52565 23197 54650 24902 46313 21344 12661 00742 38976 [GMP].
(Did you see the "!" there?)
Post 08 Jun 2009, 18:39
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 09 Jun 2009, 06:02
Actually I tested WolframAlpha extensively when testing the output of my factorial program. 2^260 is also calculable there, I wonder why you switched tools when "Wolfie" was already open Smile
Post 09 Jun 2009, 06:02
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 09 Jun 2009, 20:39
Why Wolfram Alpha and not the excellent offline Pari/GP calculator (and free too)?
Post 09 Jun 2009, 20:39
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Jun 2009, 06:17
^o) hmm... maybe PariI/GP and GMP were not advertised that much.

Yet by a quick look I can see that GMP is really hard to use ("pi", "pi()", "sin", "sin(30)" are not recognized). Only thing I could do was the already used "^" and simple arithmetic ("+-*/"). Then again "3.2" nor "3,2" are recgnized Sad Many signs you have to turn to hex ("+" == "%2B") before they start to work. This link http://gmplib.org/#TRY I found, helps a bit, though.

Pari/GP doesn't show me a quick way to get answers to my questions. I would have to download it and then get it to compile and I think it would take at least a day or two before I get my "2^260" answered.
Post 10 Jun 2009, 06:17
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 10 Jun 2009, 12:46
I was thinking SSE could help speed up the factorial, but because of the carry and the fact that SSE doesn't have an UNsigned integer multiply it probably wouldn't help much.

Unless you wanted to calculate N partial factorials in parallel then combine them as a last step.

52! = Product[1-13]x * Product[14-26]y * Product[27-39]z * Product[40-52]w

In using SIMD it might even be easier to interleave the products
52! = Prod[1-13](4x - 3) * Prod[1-13](4y-2) * ... * Prod[1-13](4w)
;;For those that don't follow my notation
(1*5*9*13...) * (2*6*10*14...) * (3*7*11*15...) * (4*8*12*16...)
Post 10 Jun 2009, 12:46
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Jun 2009, 19:23
Smile I figured that one out pretty quickly that if you want to know the last 64 bits of the result you can just MUL in a loop.

Btw, there is "PMULUDQ—Multiply Packed Unsigned Doubleword Integers", but like you said there's no carry and all my tries also ended in a dead end.
Post 10 Jun 2009, 19:23
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 11 Jun 2009, 00:24
Madis731 wrote:
Pari/GP doesn't show me a quick way to get answers to my questions. I would have to download it and then get it to compile and I think it would take at least a day or two before I get my "2^260" answered.
It comes in binary I didn't compile anything and once it's set up, it's portable, you can even put it on your USB stick or something for the rest of your life.

Plus the true power lies in its scripts. Give it a try, and it's free. Smile

_________________
Previously known as The_Grey_Beast
Post 11 Jun 2009, 00:24
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 16 Jun 2009, 05:42
Madis731 wrote:
I wonder why you switched tools when "Wolfie" was already open Smile
Some people don't like to use anything other than basic X/HTML in their browser, so GMP is another option.

http://ex-cs.sist.ac.jp/~tkouya/try_mpfr.html

...seems to have a usable interface.
Madis731 wrote:
I figured that one out pretty quickly that if you want to know the last 64 bits of the result you can just MUL in a loop.
They are mostly zero. Very Happy

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 16 Jun 2009, 05:42
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.