flat assembler
Message board for the users of flat assembler.

Index > Main > Working with big numbers

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 05 Dec 2007, 14:10
Code:
; ----------------------------------------------------------------------

 AMD Optimizations

 (c) 2001 - 2006 Advanced Micro Devices, Inc. All rights reserved.

 http://developer.amd.com
        /htmlhelp/optimization/wwhelp/wwhimpl/java/html/wwhelp.htm

 (below taken from "Integer Optimizations" section):

; ----------------------------------------------------------------------


    Efficient 64-Bit Integer Arithmetic in 32-Bit Mode
    Optimization

    The following section contains a collection of code snippets and
    subroutines showing the efficient implementation of 64-bit arithmetic
    in 32-bit mode. Note that these are 32-bit recommendations, in 64-bit
    mode it is important to use 64-bit integer instructions for best
    performance.

    Addition, subtraction, negation, and shifting are best handled by inline
    code. Multiplication, division, and the computation of remainders are less
    common operations and are usually implemented as subroutines. If these
    subroutines are used often, the programmer should consider inlining them.
    Except for division and remainder calculations, the following code works
    for both signed and unsigned integers. The division and remainder code
    shown works for unsigned integers, but can easily be extended to handle
    signed integers.

    64-Bit Addition

    ; Add ECX:EBX to EDX:EAX, and place sum in EDX:EAX. 
    add eax, ebx 
    adc edx, ecx 

    64-Bit Subtraction

    ; Subtract ECX:EBX from EDX:EAX and place difference in EDX:EAX. 
    sub eax, ebx 
    sbb edx, ecx 

    64-Bit Negation

    ; Negate EDX:EAX. 
    not edx 
    neg eax 
    sbb edx, -1   ; Fix: Increment high word if low word was 0. 

    64-Bit Left Shift

    ; Shift EDX:EAX left, ??shift count in ECX (count 
    ;  applied modulo 64). 
       shld edx, eax, cl   ; First apply shift count. 
       shl  eax, cl        ; ??mod 32 to EDX:EAX 
       test ecx, 32        ; Need to shift by another 32? 
       jz   lshift_done    ; No, done. 
       mov  edx, eax       ; Left shift EDX:EAX 
       xor  eax, eax       ;  by 32 bits 
    lshift_done: 

    64-Bit Right Shift

       shrd eax, edx, cl   ; First apply shift count. 
       shr  edx, cl        ; ??mod 32 to EDX:EAX 
       test ecx, 32        ; Need to shift by another 32? 
       jz   rshift_done    ; No, done. 
       mov  eax, edx       ; Left shift EDX:EAX 
       xor  edx, edx       ;  by 32 bits. 
    rshift_done: 

    64-Bit Multiplication

    ; _llmul computes the low-order half of the product of its 
    ;  arguments, two 64-bit integers. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = multiplicand 
    ;           [ESP+16]:[ESP+12] = multiplier 
    ; Out:      EDX:EAX = (multiplicand * multiplier) % 2^64 
    ; Destroys: EAX, ECX, EDX, EFlags 
    _llmul PROC 
       mov edx, [esp+8]    ; multiplicand_hi 
       mov ecx, [esp+16]   ; multiplier_hi 
       or  edx, ecx        ; One operand >= 2^32? 
       mov edx, [esp+12]   ; multiplier_lo 
       mov eax, [esp+4]    ; multiplicand_lo 
       jnz twomul          ; Yes, need two multiplies. 
       mul edx             ; multiplicand_lo * multiplier_lo 
       ret                 ; Done, return to caller. 
    twomul: 
       imul edx, [esp+8]         ; p3_lo = multiplicand_hi * multiplier_lo 
       imul ecx, eax             ; p2_lo = multiplier_hi * multiplicand_lo 
       add  ecx, edx             ; p2_lo + p3_lo 
       mul  dword ptr [esp+12]   ; p1 = multiplicand_lo * multiplier_lo 
       add  edx, ecx             ; p1 + p2_lo + p3_lo = result in EDX:EAX 
       ret                       ; Done, return to caller. 
    _llmul ENDP 

    64-Bit Unsigned Division

    ; _ulldiv divides two unsigned 64-bit integers and returns the quotient. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; Out:      EDX:EAX = quotient of division 
    ; Destroys: EAX, ECX, EDX, EFlags 
    _ulldiv PROC 
       push ebx             ; Save EBX as per calling convention. 
       mov  ecx, [esp+20]   ; divisor_hi 
       mov  ebx, [esp+16]   ; divisor_lo 
       mov  edx, [esp+12]   ; dividend_hi 
       mov  eax, [esp+8]    ; dividend_lo 
       test ecx, ecx        ; divisor > (2^32 - 1)? 
       jnz  big_divisor     ; Yes, divisor > 2^32 - 1. 
       cmp  edx, ebx        ; Only one division needed (ECX = 0)? 
       jae  two_divs        ; Need two divisions. 
       div  ebx             ; EAX = quotient_lo 
       mov  edx, ecx        ; EDX = quotient_hi = 0 (quotient in EDX:EAX) 
       pop  ebx             ; Restore EBX as per calling convention. 
       ret                  ; Done, return to caller. 
    two_divs: 
       mov  ecx, eax   ; Save dividend_lo in ECX. 
       mov  eax, edx   ; Get dividend_hi. 
       xor  edx, edx   ; Zero-extend it into EDX:EAX. 
       div  ebx        ; quotient_hi in EAX 
       xchg eax, ecx   ; ECX = quotient_hi, EAX = dividend_lo 
       div  ebx        ; EAX = quotient_lo 
       mov  edx, ecx   ; EDX = quotient_hi (quotient in EDX:EAX) 
       pop  ebx        ; Restore EBX as per calling convention. 
       ret             ; Done, return to caller. 
    big_divisor: 
       push edi                  ; Save EDI as per calling convention. 
       mov  edi, ecx             ; Save divisor_hi. 
       shr  edx, 1               ; Shift both divisor and dividend right 
       rcr  eax, 1               ;  by 1 bit. 
       ror  edi, 1 
       rcr  ebx, 1 
       bsr  ecx, ecx             ; ECX = number of remaining shifts 
       shrd ebx, edi, cl         ; Scale down divisor and dividend 
       shrd eax, edx, cl         ;  such that divisor is less than 
       shr  edx, cl              ;  2^32 (that is, it fits in EBX). 
       rol  edi, 1               ; Restore original divisor_hi. 
       div  ebx                  ; Compute quotient. 
       mov  ebx, [esp+12]        ; dividend_lo 
       mov  ecx, eax             ; Save quotient. 
       imul edi, eax             ; quotient * divisor high word (??low only) 
       mul  dword ptr [esp+20]   ; quotient * divisor low word 
       add  edx, edi             ; EDX:EAX = quotient * divisor 
       sub  ebx, eax             ; dividend_lo - (quot.*divisor)_lo 
       mov  eax, ecx             ; Get quotient. 
       mov  ecx, [esp+16]        ; dividend_hi 
       sbb  ecx, edx             ; Subtract (divisor * quot.) from dividend. 
       sbb  eax, 0               ; Adjust quotient if remainder negative. 
       xor  edx, edx             ; Clear high word of quot. (EAX<=FFFFFFFFh). 
       pop  edi                  ; Restore EDI as per calling convention. 
       pop  ebx                  ; Restore EBX as per calling convention. 
       ret                       ; Done, return to caller. 
    _ulldiv ENDP 

    64-Bit Signed Division

    ; _lldiv divides two signed 64-bit numbers and delivers the quotient  
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; Out:      EDX:EAX = quotient of division 
    ; Destroys: EAX, ECX,E DX, EFlags 
    _lldiv PROC 
       push ebx    ; Save EBX as per calling convention. 
       push esi    ; Save ESI as per calling convention. 
       push edi    ; Save EDI as per calling convention. 
       mov  ecx, [esp+28]   ; divisor_hi 
       mov  ebx, [esp+24]   ; divisor_lo 
       mov  edx, [esp+20]   ; dividend_hi 
       mov  eax, [esp+16]   ; dividend_lo 
       mov  esi, ecx        ; divisor_hi 
       xor  esi, edx        ; divisor_hi ^ dividend_hi  
       sar  esi, 31         ; (quotient < 0) ? -1 : 0 
       mov  edi, edx        ; dividend_hi 
       sar  edi, 31         ; (dividend < 0) ? -1 : 0 
       xor  eax, edi        ; If (dividend < 0), 
       xor  edx, edi        ;  compute 1's complement of dividend. 
       sub  eax, edi        ; If (dividend < 0), 
       sbb  edx, edi        ;  compute 2's complement of dividend. 
       mov  edi, ecx        ; divisor_hi 
       sar  edi, 31         ; (divisor < 0) ? -1 : 0 
       xor  ebx, edi        ; If (divisor < 0), 
       xor  ecx, edi        ;  compute 1's complement of divisor. 
       sub  ebx, edi        ; If (divisor < 0), 
       sbb  ecx, edi        ;  compute 2's complement of divisor. 
       jnz  big_divisor     ; divisor > 2^32 - 1 
       cmp  edx, ebx        ; Only one division needed (ECX = 0)? 
       jae  two_divs        ; Need two divisions. 
       div  ebx             ; EAX = quotient_lo 
       mov  edx, ecx        ; EDX = quotient_hi = 0 (quotient in EDX:EAX) 
       xor  eax, esi        ; If (quotient < 0), 
       xor  edx, esi        ;  compute 1's complement of result. 
       sub  eax, esi        ; If (quotient < 0), 
       sbb  edx, esi        ;  compute 2's complement of result. 
       pop  edi             ; Restore EDI as per calling convention. 
       pop  esi             ; Restore ESI as per calling convention. 
       pop  ebx             ; Restore EBX as per calling convention. 
       ret                  ; Done, return to caller. 
    two_divs: 
       mov  ecx, eax     ; Save dividend_lo in ECX. 
       mov  eax, edx     ; Get dividend_hi. 
       xor  edx, edx     ; Zero-extend it into EDX:EAX. 
       div  ebx          ; quotient_hi in EAX 
       xchg eax, ecx     ; ECX = quotient_hi, EAX = dividend_lo 
       div  ebx          ; EAX = quotient_lo 
       mov  edx, ecx     ; EDX = quotient_hi (quotient in EDX:EAX) 
       jmp  make_sign   ; Make quotient signed. 
    big_divisor: 
       sub  esp, 12             ; Create three local variables. 
       mov  [esp], eax          ; dividend_lo 
       mov  [esp+4], ebx        ; divisor_lo 
       mov  [esp+8], edx        ; dividend_hi 
       mov  edi, ecx            ; Save divisor_hi. 
       shr  edx, 1              ; Shift both 
       rcr  eax, 1              ;  divisor and 
       ror  edi, 1              ;  and dividend 
       rcr  ebx, 1              ;  right by 1 bit. 
       bsr  ecx, ecx            ; ECX = number of remaining shifts 
       shrd ebx, edi, cl        ; Scale down divisor and 
       shrd eax, edx, cl        ;  dividend such that divisor is 
       shr  edx, cl             ;  less than 2^32 (that is, fits in EBX). 
       rol  edi, 1              ; Restore original divisor_hi. 
       div  ebx                 ; Compute quotient. 
       mov  ebx, [esp]          ; dividend_lo 
       mov  ecx, eax            ; Save quotient. 
       imul edi, eax            ; quotient * divisor high word (??low only) 
       mul  DWORD PTR [esp+4]   ; quotient * divisor low word 
       add  edx, edi            ; EDX:EAX = quotient * divisor 
       sub  ebx, eax            ; dividend_lo - (quot.*divisor)_lo 
       mov  eax, ecx            ; Get quotient. 
       mov  ecx, [esp+8]        ; dividend_hi 
       sbb  ecx, edx            ; Subtract (divisor * quot.) from dividend 
       sbb  eax, 0              ; Adjust quotient if remainder is negative. 
       xor  edx, edx            ; Clear high word of quotient. 
       add  esp, 12             ; Remove local variables. 
    make_sign: 
       xor eax, esi   ; If (quotient < 0), 
       xor edx, esi   ;  compute 1's complement of result. 
       sub eax, esi   ; If (quotient < 0), 
       sbb edx, esi   ;  compute 2's complement of result. 
       pop edi        ; Restore EDI as per calling convention. 
       pop esi        ; Restore ESI as per calling convention. 
       pop ebx        ; Restore EBX as per calling convention. 
       ret            ; Done, return to caller. 
    _lldiv ENDP 

    64-Bit Unsigned Remainder Computation

    ; _ullrem divides two unsigned 64-bit integers and returns the remainder. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; 
    ; Out:      EDX:EAX = remainder of division 
    ; 
    ; Destroys: EAX, ECX, EDX, EFlags 
    _ullrem PROC 
       push ebx              ; Save EBX as per calling convention. 
       mov  ecx, [esp+20]    ; divisor_hi 
       mov  ebx, [esp+16]    ; divisor_lo 
       mov  edx, [esp+12]    ; dividend_hi 
       mov  eax, [esp+8]     ; dividend_lo 
       test ecx, ecx         ; divisor > 2^32 - 1? 
       jnz  r_big_divisor    ; Yes, divisor > 32^32 - 1. 
       cmp  edx, ebx         ; Only one division needed (ECX = 0)? 
       jae  r_two_divs       ; Need two divisions. 
       div  ebx              ; EAX = quotient_lo 
       mov  eax, edx         ; EAX = remainder_lo 
       mov  edx, ecx         ; EDX = remainder_hi = 0 
       pop  ebx              ; Restore EBX per calling convention. 
       ret                   ; Done, return to caller. 
    r_two_divs: 
       mov ecx, eax   ; Save dividend_lo in ECX. 
       mov eax, edx   ; Get dividend_hi. 
       xor edx, edx   ; Zero-extend it into EDX:EAX. 
       div ebx        ; EAX = quotient_hi, EDX = intermediate remainder 
       mov eax, ecx   ; EAX = dividend_lo 
       div ebx        ; EAX = quotient_lo 
       mov eax, edx   ; EAX = remainder_lo 
       xor edx, edx   ; EDX = remainder_hi = 0 
       pop ebx        ; Restore EBX as per calling convention. 
       ret            ; Done, return to caller. 
    r_big_divisor: 
       push edi                  ; Save EDI as per calling convention. 
       mov  edi, ecx             ; Save divisor_hi. 
       shr  edx, 1               ; Shift both divisor and dividend right 
       rcr  eax, 1               ;  by 1 bit. 
       ror  edi, 1 
       rcr  ebx, 1 
       bsr  ecx, ecx             ; ECX = number of remaining shifts 
       shrd ebx, edi, cl         ; Scale down divisor and dividend such 
       shrd eax, edx, cl         ;  that divisor is less than 2^32 
       shr  edx, cl              ;  (that is, it fits in EBX). 
       rol  edi, 1               ; Restore original divisor (EDI:ESI). 
       div  ebx                  ; Compute quotient. 
       mov  ebx, [esp+12]        ; dividend low word 
       mov  ecx, eax             ; Save quotient. 
       imul edi, eax             ; quotient * divisor high word (??low only) 
       mul  DWORD PTR [esp+20]   ; quotient * divisor low word 
       add  edx, edi             ; EDX:EAX = quotient * divisor 
       sub  ebx, eax             ; dividend_lo - (quot.*divisor)_lo 
       mov  ecx, [esp+16]        ; dividend_hi 
       mov  eax, [esp+20]        ; divisor_lo 
       sbb  ecx, edx             ; Subtract divisor * quot. from dividend. 
       sbb  edx, edx             ; (remainder < 0) ? 0xFFFFFFFF : 0 
       and  eax, edx             ; (remainder < 0) ? divisor_lo : 0 
       and  edx, [esp+24]        ; (remainder < 0) ? divisor_hi : 0 
       add  eax, ebx             ; remainder += (remainder < 0) ? divisor : 0 
       pop  edi                  ; Restore EDI as per calling convention. 
       pop  ebx                  ; Restore EBX as per calling convention. 
       ret                       ; Done, return to caller. 
    _ullrem ENDP 

    64-Bit Signed Remainder Computation

    ; _llrem divides two signed 64-bit numbers and returns the remainder. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; 
    ; Out:      EDX:EAX = remainder of division 
    ; 
    ; Destroys: EAX, ECX, EDX, EFlags 
       push ebx               ; Save EBX as per calling convention. 
       push esi               ; Save ESI as per calling convention. 
       push edi               ; Save EDI as per calling convention. 
       mov  ecx, [esp+28]     ; divisor-hi 
       mov  ebx, [esp+24]     ; divisor-lo 
       mov  edx, [esp+20]     ; dividend-hi 
       mov  eax, [esp+16]     ; dividend-lo 
       mov  esi, edx          ; sign(remainder) == sign(dividend) 
       sar  esi, 31           ; (remainder < 0) ? -1 : 0 
       mov  edi, edx          ; dividend-hi 
       sar  edi, 31           ; (dividend < 0) ? -1 : 0 
       xor  eax, edi          ; If (dividend < 0), 
       xor  edx, edi          ;  compute 1's complement of dividend. 
       sub  eax, edi          ; If (dividend < 0), 
       sbb  edx, edi          ;  compute 2's complement of dividend. 
       mov  edi, ecx          ; divisor-hi 
       sar  edi, 31           ; (divisor < 0) ? -1 : 0 
       xor  ebx, edi          ; If (divisor < 0), 
       xor  ecx, edi          ;  compute 1's complement of divisor. 
       sub  ebx, edi          ; If (divisor < 0), 
       sbb  ecx, edi          ;  compute 2's complement of divisor. 
       jnz  sr_big_divisor    ; divisor > 2^32 - 1 
       cmp  edx, ebx          ; Only one division needed (ECX = 0)? 
       jae  sr_two_divs       ; No, need two divisions. 
       div  ebx               ; EAX = quotient_lo 
       mov  eax, edx          ; EAX = remainder_lo 
       mov  edx, ecx          ; EDX = remainder_lo = 0 
       xor  eax, esi          ; If (remainder < 0), 
       xor  edx, esi          ;  compute 1's complement of result. 
       sub  eax, esi          ; If (remainder < 0), 
       sbb  edx, esi          ;  compute 2's complement of result. 
       pop  edi               ; Restore EDI as per calling convention. 
       pop  esi               ; Restore ESI as per calling convention. 
       pop  ebx               ; Restore EBX as per calling convention. 
       ret                    ; Done, return to caller. 
    sr_two_divs: 
       mov ecx, eax        ; Save dividend_lo in ECX. 
       mov eax, edx        ; Get dividend_hi. 
       xor edx, edx        ; Zero-extend it into EDX:EAX. 
       div ebx             ; EAX = quotient_hi, EDX = intermediate remainder 
       mov eax, ecx        ; EAX = dividend_lo 
       div  ebx            ; EAX = quotient_lo 
       mov  eax, edx       ; remainder_lo 
       xor  edx, edx       ; remainder_hi = 0 
       jmp  sr_makesign    ; Make remainder signed. 
    sr_big_divisor: 
       sub  esp, 16             ; Create three local variables. 
       mov  [esp], eax          ; dividend_lo              
       mov  [esp+4], ebx        ; divisor_lo              
       mov  [esp+8], edx        ; dividend_hi 
       mov  [esp+12], ecx       ; divisor_hi 
       mov  edi, ecx            ; Save divisor_hi. 
       shr  edx, 1              ; Shift both 
       rcr  eax, 1              ;  divisor and 
       ror  edi, 1              ;  and dividend 
       rcr  ebx, 1              ;  right by 1 bit. 
       bsr  ecx, ecx            ; ECX = number of remaining shifts 
       shrd ebx, edi, cl        ; Scale down divisor and 
       shrd eax, edx, cl        ;  dividend such that divisor is 
       shr  edx, cl             ;  less than 2^32 (that is, fits in EBX). 
       rol  edi, 1              ; Restore original divisor_hi. 
       div  ebx                 ; Compute quotient. 
       mov  ebx, [esp]          ; dividend_lo 
       mov  ecx, eax            ; Save quotient. 
       imul edi, eax            ; quotient * divisor high word (??low only) 
       mul  DWORD PTR [esp+4]   ; quotient * divisor low word 
       add  edx, edi            ; EDX:EAX = quotient * divisor 
       sub  ebx, eax            ; dividend_lo - (quot.*divisor)_lo 
       mov  ecx, [esp+8]        ; dividend_hi 
       sbb  ecx, edx            ; Subtract divisor * quot. from dividend. 
       sbb  eax, eax            ; remainder < 0 ? 0xffffffff : 0 
       mov  edx, [esp+12]       ; divisor_hi 
       and  edx, eax            ; remainder < 0 ? divisor_hi : 0 
       and  eax, [esp+4]        ; remainder < 0 ? divisor_lo : 0 
       add  eax, ebx            ; remainder_lo 
       add  edx, ecx            ; remainder_hi 
       add  esp, 16             ; Remove local variables. 
    sr_makesign: 
       xor eax, esi   ; If (remainder < 0), 
       xor edx, esi   ;  compute 1's complement of result. 
       sub eax, esi   ; If (remainder < 0), 
      sbb edx, esi   ;  compute 2's complement of result. 
       pop edi        ; Restore EDI as per calling convention. 
       pop esi        ; Restore ESI as per calling convention. 
       pop ebx        ; Restore EBX as per calling convention. 
       ret            ; Done, return to caller. 
    
Post 05 Dec 2007, 14:10
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia
vid 05 Dec 2007, 14:30
Thanks. Don't you happen to have direct link to this manual, and name of chapter/section where this is? I'd add it as reference to article.

Things in article are basically done same way, except for NEG and that was already pointed out by bitrake on x86asm board.

Those bignum divisions are really interesting, I'd have to study them someday.
Post 05 Dec 2007, 14:30
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 05 Dec 2007, 14:45
Sorry, I posted as close to a direct reference (URL) as I could.

Software Optimization Guide for AMD64 Processors
- Integer Optimizations
-- Efficient 64-Bit Integer Arithmatic in 32-Bit Mode
Post 05 Dec 2007, 14:45
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia
vid 05 Dec 2007, 14:50
Thanks a lot.

I plan to add bignum multiplication, and with your help even division someday to article.
Post 05 Dec 2007, 14:50
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 07 Jan 2008, 05:41
Post 07 Jan 2008, 05:41
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia
vid 07 Jan 2008, 11:25
MCD:
not sure about math stuff, but my O(n*log(n)) is number of MUL instructions used, probably not what math people intended. My idea is something like this:

c = a*b
(all number consist of 4 dwords,
c0 = first dword of destination, c1 = second, etc...
edx is upper 32bits of previous multiplication)
Code:
c0 = a0 * b0
c1 = a1 * b0 + edx
c2 = a2 * b0 + edx
c3 = a3 * b0 + edx

c1 += a0 * b1
c2 += a1 * b1 + edx
c3 += a2 * b1 + edx

c2 += a0 * b2
c3 += a1 * b2 + edx

c3 += a0 * b3
    

Isn't this O(n*log(n)) number of MUL instructions?
Post 07 Jan 2008, 11:25
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2138
Location: Estonia
Madis731 07 Jan 2008, 13:19
But if you want this code to scale without loops, you need to scale your code that does the operation.
If you add 128-bit numbers with 32-bit registers you need a loop ECX=4
If you mul 128-bit numbers with 64-bit registers you can do this manually
but MUL128 with R32 is tricky. You still need an algorithm, which means shifting is needed on sub-calculations.
Post 07 Jan 2008, 13:19
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia
vid 07 Jan 2008, 14:13
of course that it would be done with loops.

but number of MULs is N + (N-1) + (N-2) + (N-3) + ... + 1

isn't this O(n*log(n))? I never really knew big-O notation
Post 07 Jan 2008, 14:13
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4623
Location: Argentina
LocoDelAssembly 07 Jan 2008, 14:38
The "N + (N-1) + (N-2) + (N-3) + ... + 1" is sometimes refered as the sum of the first N natural numbers (though, you wrote it in reverse order Razz).

Doing some maths:
Code:
1 + 2 + ... + N-1 + N = ((1 + 2 + 3 + ... + N-2 + N-1 + N) + (N + N-1 + N-2 + ... + 3 + 2 + 1))/2
                    = (1+N + 2+N-1 + 3+N-2 + ... + N-2+3 + N-1+2 + N+1)/2
               = (N+1 + N+1 + N+1 + ... + N+1 + N+1 + N+1)/2
               = N(N+1)/2    


However, while N(N+1)/2 for N=100 is 5050, log(100) is 2 and log2(100) is 6,64.. Not sure if the notation refers to log2 or log10 but non of those matches your calculation of the number of MULs Razz
Post 07 Jan 2008, 14:38
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia
vid 07 Jan 2008, 14:54
okay, so it is O(N^2)... my bad Smile
I was just quessing anyway...
Post 07 Jan 2008, 14:54
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4623
Location: Argentina
LocoDelAssembly 07 Jan 2008, 14:57
Doesn't exists O(N(N+1)/2)? N^2 is ultra big on big Ns, much much bigger than N(N+1)/2
Post 07 Jan 2008, 14:57
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8526
Location: Kraków, Poland
Tomasz Grysztar 07 Jan 2008, 15:02
N(N+1)/2= 0.5 N^2 + 0.5 N, which is just O(N^2)
For O() notation it doesn't matter whether it is 10*N^2 or 0.0001*N^2, it's stil O(N^2)

Why? Because for sizes large enough A*N^2 will always be larger than B*N and smaller than C*N^3, no matter what A, B and C are.
Post 07 Jan 2008, 15:02
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4429
Location: vpcmpistri
bitRAKE 08 Jan 2008, 05:36
I'd like to suggest working in a residual number system (RNS) because it is very parrallel in nature. It is costly to transfer between number systems, but often that only needs to be done at beginning and end of problem. Which means it is not a solution at the single operation level.

http://www.math.utah.edu/~pa/MDS/residual.html

...and it is easy to understand/implement at the code/instruction level.
Post 08 Jan 2008, 05:36
View user's profile Send private message Visit poster's website Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 08 Jan 2008, 05:41
@vid: your algorithm is just a nice and clean rearangement of the usual multiplication algorithm. But in case most of your factors are only a few times bigger than the word size of 32..64bit(maybe say up to 256..4096bits), your algorithm will suretainly outperform the others.

I can't say much more about this subject right now because I'm not into it, but if someone is interested in it, I found this interesting paper about a FFT-based big number multiply:
http://citeseer.ist.psu.edu/yap00quickmul.html
Post 08 Jan 2008, 05:41
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2138
Location: Estonia
Madis731 08 Jan 2008, 08:40
I dug into FFT a while ago when I wanted to perform A*B and A^B (powers not xor Razz) with big numbers. http://board.flatassembler.net/topic.php?t=4882 There are code samples about the primitive method, but they are not very good. I couldn't get the hang of FFT so I didn't convert it to using it.
Post 08 Jan 2008, 08:40
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
ATV



Joined: 31 Aug 2004
Posts: 109
Location: Finland
ATV 08 Jan 2008, 15:14
I have written multiply algorithms same way as you calculate with pen and paper. First code was written 1985-86 with 8bit 6502 assembly.

Later I have realize, that algorithm is very slow, better use already written librarys. Like GMP or something. They are over 10x faster.

_________________
6213186413*2^605+1 divides Fermat F(600) and 121531*2^30260+1 divides Fermat F(30256)
Post 08 Jan 2008, 15:14
View user's profile Send private message Reply with quote
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 08 Jan 2008, 21:17
Madis731 wrote:
...I couldn't get the hang of FFT so I didn't convert it to using it.
I believe that most of us have trouble comprehending it!!! I know I do. It may be just sukoshi easier to understand, if one begins instead with the DFT (discrete Fourier transform), because the FFT is fast, only because of eliminating the many redundant multiplications found in the traditional DFT. The fastest FFT uses a "split radix", i.e. Radix 2 mixed with radix 4. Again, this is not necessary to study, in order to make (a little) progress on the DFT itself. The main point of Fourier's brilliant discovery is this:
Any time varying signal can be expressed as a sum of a series of sinusoids.
http://www.relisoft.com/Science/Physics/sound.html
http://www.dspguru.com/info/faqs/fftfaq2.htm
http://www.gweep.net/~shifty/portfolio/fftmulspreadsheet/index.html
http://www.dataq.com/applicat/articles/an11.htm
http://ccrma.stanford.edu/~lantz/fourier.html
http://www.eeglossary.com/fft.htm
Post 08 Jan 2008, 21:17
View user's profile Send private message Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 09 Jan 2008, 05:34
The concept around the (fast) Fourier transformation and even the discret cosinus transformation are quiet clear for me, but the thing I don't understand is how to use the FFT for things that are no functions.

Usually you apply the FFT to a function like for example f: t -> E (t being time and E being energy) to get a FFT-function of it: f_FFT: t -> E, this is what tom tobias basically said
Quote:

The main point of Fourier's brilliant discovery is this:
Any time varying signal can be expressed as a sum of a series of sinusoids.

You can generalize this to any kind of function f: x -> y where x and y are any kind of value. (This is a 1-dimensional FFT, and of course FFTs can be any higher dimensions.)
now, if you don't have those 2 values, but just 1 value, like it is the case for one big number, how do you make a FFT out of that? for me there is some value missing.

(The DCT is a special case of the FFT, where the function is discretized and thus the integrals become sums. Also the evaluation points of the function are even spaced and their number is a power of 2 for 1 period).
Post 09 Jan 2008, 05:34
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4429
Location: vpcmpistri
bitRAKE 09 Jan 2008, 08:25
Imagine the number is a time variant signal - each bit is in fact multiple particles moving through the circuitry! Either way (parallel or serial) we look at it the bits don't arrive at the same time.
Post 09 Jan 2008, 08:25
View user's profile Send private message Visit poster's website Reply with quote
Picnic



Joined: 05 May 2007
Posts: 1459
Location: Piraeus, Greece
Picnic 22 Nov 2008, 21:45
vid wrote:
hi, i wrote an article how to do basic operations with bignums (big numbers, more than 32 / 64 bits):

http://www.x86asm.net/articles/working-with-big-numbers-using-x86-instructions/

Interesting and instructive article vid.

_________________
Hobby BASIC Interpreter
Post 22 Nov 2008, 21:45
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2026, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.