flat assembler
Message board for the users of flat assembler.

Index > Main > Working with big numbers

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
hi, i wrote an article how to do basic operations with bignums (big numbers, more than 32 / 64 bits):

http://www.x86asm.net/articles/working-with-big-numbers-using-x86-instructions/
Post 20 Nov 2007, 18:19
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan
Quote:

can't be done in any straightforward way.

Mult via addition and div via subtraction)
Post 20 Nov 2007, 18:30
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
By straightforward way i meant something linear, without looping, some O(N) way. I plan to add example of routine to multiply two big numbers, it should be quite simple, O(N*log(N)) i think.
Post 20 Nov 2007, 19:02
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
realcr



Joined: 02 Apr 2007
Posts: 39
realcr
Hey vid.

I really like reading your pieces of code , so first great thanks for sharing it.
Asm people like speed , so you probably want to use an efficient algorithm for multiplication , like fast fourier transform for the discrete case.

realcr.
Post 20 Nov 2007, 20:26
View user's profile Send private message Visit poster's website MSN Messenger ICQ Number Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Unfortunately i don't know FFT yet. I thought FFT is used for exponentiation, not multiplication.

My purpose in this article was to demonstrate real basics. Like how to work with 64bit numbers on 32bit processor, etc.

If you are really after speed, then use some ready-made library, which has been developed for several years. It will be likely faster than what you would write.
Post 20 Nov 2007, 20:36
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Garthower



Joined: 21 Apr 2006
Posts: 158
Location: Ukraine
Garthower
Hi vid!
Some time back I too was engaged in a similar problem. I shall not write the theory, you and have well painted it. Here my code for various operations, can, it will be useful to someone.

Add numbers
Code:
 ; input: ESI=big a

 ;        EDI=big b

 ;        ECX=length in DWORDs

 ; output:CF

 ; action:b = b + a

 

 bn_add:       clc                     ; CF=0

 __cycle:      lodsd

               adc     eax, [edi]

               stosd

               loop    __cycle

               retn
    


[/b]Sub numbers:[/b]
Code:
 ; input: ESI=big a

 ;        EDI=big b

 ;        ECX=length in DWORDs

 ; output:CF

 ; action:b = b - a

 

 bn_sub:       clc                     ; CF=0

 __cycle:      lodsd

               sbb     [edi], eax

               lea     edi, [edi+4]

               loop    __cycle

               retn
    


Compare numbers:
Code:
 ; input: ESI=big a

 ;        EDI=big b

 ;        ECX=length in DWORDs

 ; output:CF==1 (jb) -- a < b

 ;        ZF==1 (jz) -- a = b

 ;        CF==0 (ja) -- a > b

 

 bn_cmp:

 __cycle:      mov     eax, [esi+ecx*4-4]

               cmp     eax, [edi+ecx*4-4]               
               loopz  __cycle

 __exit:       retn
    


Mul numbers:
Code:
MulNumbers: ;Num1,Num2,Res,Size
   pushad

  mov edi,[esp+2Ch]
   xor eax,eax
 mov ecx,LEN_DWORD*2
 rep stosd

       mov ebx,[esp+24h]
   mov ecx,[esp+28h]
   xor esi,esi
 align 4
  @@First_Cicle:
     xor edi,edi
 align 4
  @@Second_Cicle:
    mov eax,[ebx+edi]
   mul 4 ptr [ecx+esi]
 lea ebp,[esi+edi]
   add ebp,[esp+2Ch]
   add [ebp],eax
       adc [ebp+4],edx
     lea ebp,[ebp+8]
     jnc @@No_Add
  @@Add_Again:
  adc 4 ptr [ebp],0
   lea ebp,[ebp+4]
     jc @@Add_Again
  @@No_Add:
   add edi,4
   cmp edi,[esp+30h]
   jb @@Second_Cicle

       add esi,4
   cmp esi,[esp+30h]
   jb @@First_Cicle

        popad
       ret 10h
    


DIV numbers:
Code:
DivNumbers: ;Num1:dword,Num2:dword,Res:dword,Rem:dword
       pushad

  mov ecx,LEN_DWORD
   mov edi,[esp+2Ch]
   xor eax,eax
 push ecx
    rep stosd
   pop ecx

 mov edi,[esp+30h]
   rep stosd

       mov ebx,LEN_BITS-1
  align 4
  @@Next_Bit:
        mov esi,[esp+2Ch]
   clc
 mov ecx,LEN_DWORD
   align 4
  @@Next_Shl:
        rcl 4 ptr [esi],1
   lea esi,[esi+4]
     loop @@Next_Shl

 mov eax,[esp+24h]
   bt [eax],ebx
        mov esi,[esp+30h]
   mov ecx,LEN_DWORD
   push esi
    align 4
  @@Next_Rcl:
        rcl 4 ptr [esi],1
   lea esi,[esi+4]
     loop @@Next_Rcl
     pop edi

 mov esi,[esp+28h]
   mov ecx,LEN_DWORD
   align 4
  @@Next_Cmp:
        mov eax,[edi+ecx*4-4]
       cmp eax,[esi+ecx*4-4]
       jb @@No_Sub
 ja @@Sub
    loop @@Next_Cmp
  @@Sub:
     mov ecx,LEN_DWORD
   clc
 align 4
  @@Next_Sub:
        lodsd
       sbb [edi],eax
       lea edi,[edi+4]
     loop @@Next_Sub

 mov eax,[esp+2Ch]
   or 1 ptr [eax],1
  @@No_Sub:
 dec ebx
     jns @@Next_Bit

  popad
       ret 10h
    
Post 21 Nov 2007, 18:35
View user's profile Send private message Visit poster's website MSN Messenger ICQ Number Reply with quote
hckr83



Joined: 12 Nov 2006
Posts: 86
Location: usa
hckr83
hey, Vid, is English your first language? You talk understandably, but some stuff is kinda awkward to read...like
Quote:
To invert all bits, we have not instruction

would be at least To invert all bits, we have a not instruction...and possibly phrased a bit differently...
Post 21 Nov 2007, 23:51
View user's profile Send private message Visit poster's website AIM Address Yahoo Messenger Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
nope, i just learnt english by using. I always omit "a" and "the" , for some reason.

and i would use "the" in that case Wink
Post 21 Nov 2007, 23:55
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen
hckr83 wrote:
Quote:
To invert all bits, we have not instruction

would be at least To invert all bits, we have a not instruction...and possibly phrased a bit differently...

I'm the editor of x86asm.net. Note that "not" is set up using Curier - that should make the difference. Vid is used to use quotes to make mnemonics different, but I don't like this way.
And yes, there should be additional a...
Post 22 Nov 2007, 08:55
View user's profile Send private message Visit poster's website Reply with quote
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
Code:
; ----------------------------------------------------------------------

 AMD Optimizations

 (c) 2001 - 2006 Advanced Micro Devices, Inc. All rights reserved.

 http://developer.amd.com
        /htmlhelp/optimization/wwhelp/wwhimpl/java/html/wwhelp.htm

 (below taken from "Integer Optimizations" section):

; ----------------------------------------------------------------------


    Efficient 64-Bit Integer Arithmetic in 32-Bit Mode
    Optimization

    The following section contains a collection of code snippets and
    subroutines showing the efficient implementation of 64-bit arithmetic
    in 32-bit mode. Note that these are 32-bit recommendations, in 64-bit
    mode it is important to use 64-bit integer instructions for best
    performance.

    Addition, subtraction, negation, and shifting are best handled by inline
    code. Multiplication, division, and the computation of remainders are less
    common operations and are usually implemented as subroutines. If these
    subroutines are used often, the programmer should consider inlining them.
    Except for division and remainder calculations, the following code works
    for both signed and unsigned integers. The division and remainder code
    shown works for unsigned integers, but can easily be extended to handle
    signed integers.

    64-Bit Addition

    ; Add ECX:EBX to EDX:EAX, and place sum in EDX:EAX. 
    add eax, ebx 
    adc edx, ecx 

    64-Bit Subtraction

    ; Subtract ECX:EBX from EDX:EAX and place difference in EDX:EAX. 
    sub eax, ebx 
    sbb edx, ecx 

    64-Bit Negation

    ; Negate EDX:EAX. 
    not edx 
    neg eax 
    sbb edx, -1   ; Fix: Increment high word if low word was 0. 

    64-Bit Left Shift

    ; Shift EDX:EAX left, ??shift count in ECX (count 
    ;  applied modulo 64). 
       shld edx, eax, cl   ; First apply shift count. 
       shl  eax, cl        ; ??mod 32 to EDX:EAX 
       test ecx, 32        ; Need to shift by another 32? 
       jz   lshift_done    ; No, done. 
       mov  edx, eax       ; Left shift EDX:EAX 
       xor  eax, eax       ;  by 32 bits 
    lshift_done: 

    64-Bit Right Shift

       shrd eax, edx, cl   ; First apply shift count. 
       shr  edx, cl        ; ??mod 32 to EDX:EAX 
       test ecx, 32        ; Need to shift by another 32? 
       jz   rshift_done    ; No, done. 
       mov  eax, edx       ; Left shift EDX:EAX 
       xor  edx, edx       ;  by 32 bits. 
    rshift_done: 

    64-Bit Multiplication

    ; _llmul computes the low-order half of the product of its 
    ;  arguments, two 64-bit integers. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = multiplicand 
    ;           [ESP+16]:[ESP+12] = multiplier 
    ; Out:      EDX:EAX = (multiplicand * multiplier) % 2^64 
    ; Destroys: EAX, ECX, EDX, EFlags 
    _llmul PROC 
       mov edx, [esp+8]    ; multiplicand_hi 
       mov ecx, [esp+16]   ; multiplier_hi 
       or  edx, ecx        ; One operand >= 2^32? 
       mov edx, [esp+12]   ; multiplier_lo 
       mov eax, [esp+4]    ; multiplicand_lo 
       jnz twomul          ; Yes, need two multiplies. 
       mul edx             ; multiplicand_lo * multiplier_lo 
       ret                 ; Done, return to caller. 
    twomul: 
       imul edx, [esp+8]         ; p3_lo = multiplicand_hi * multiplier_lo 
       imul ecx, eax             ; p2_lo = multiplier_hi * multiplicand_lo 
       add  ecx, edx             ; p2_lo + p3_lo 
       mul  dword ptr [esp+12]   ; p1 = multiplicand_lo * multiplier_lo 
       add  edx, ecx             ; p1 + p2_lo + p3_lo = result in EDX:EAX 
       ret                       ; Done, return to caller. 
    _llmul ENDP 

    64-Bit Unsigned Division

    ; _ulldiv divides two unsigned 64-bit integers and returns the quotient. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; Out:      EDX:EAX = quotient of division 
    ; Destroys: EAX, ECX, EDX, EFlags 
    _ulldiv PROC 
       push ebx             ; Save EBX as per calling convention. 
       mov  ecx, [esp+20]   ; divisor_hi 
       mov  ebx, [esp+16]   ; divisor_lo 
       mov  edx, [esp+12]   ; dividend_hi 
       mov  eax, [esp+8]    ; dividend_lo 
       test ecx, ecx        ; divisor > (2^32 - 1)? 
       jnz  big_divisor     ; Yes, divisor > 2^32 - 1. 
       cmp  edx, ebx        ; Only one division needed (ECX = 0)? 
       jae  two_divs        ; Need two divisions. 
       div  ebx             ; EAX = quotient_lo 
       mov  edx, ecx        ; EDX = quotient_hi = 0 (quotient in EDX:EAX) 
       pop  ebx             ; Restore EBX as per calling convention. 
       ret                  ; Done, return to caller. 
    two_divs: 
       mov  ecx, eax   ; Save dividend_lo in ECX. 
       mov  eax, edx   ; Get dividend_hi. 
       xor  edx, edx   ; Zero-extend it into EDX:EAX. 
       div  ebx        ; quotient_hi in EAX 
       xchg eax, ecx   ; ECX = quotient_hi, EAX = dividend_lo 
       div  ebx        ; EAX = quotient_lo 
       mov  edx, ecx   ; EDX = quotient_hi (quotient in EDX:EAX) 
       pop  ebx        ; Restore EBX as per calling convention. 
       ret             ; Done, return to caller. 
    big_divisor: 
       push edi                  ; Save EDI as per calling convention. 
       mov  edi, ecx             ; Save divisor_hi. 
       shr  edx, 1               ; Shift both divisor and dividend right 
       rcr  eax, 1               ;  by 1 bit. 
       ror  edi, 1 
       rcr  ebx, 1 
       bsr  ecx, ecx             ; ECX = number of remaining shifts 
       shrd ebx, edi, cl         ; Scale down divisor and dividend 
       shrd eax, edx, cl         ;  such that divisor is less than 
       shr  edx, cl              ;  2^32 (that is, it fits in EBX). 
       rol  edi, 1               ; Restore original divisor_hi. 
       div  ebx                  ; Compute quotient. 
       mov  ebx, [esp+12]        ; dividend_lo 
       mov  ecx, eax             ; Save quotient. 
       imul edi, eax             ; quotient * divisor high word (??low only) 
       mul  dword ptr [esp+20]   ; quotient * divisor low word 
       add  edx, edi             ; EDX:EAX = quotient * divisor 
       sub  ebx, eax             ; dividend_lo - (quot.*divisor)_lo 
       mov  eax, ecx             ; Get quotient. 
       mov  ecx, [esp+16]        ; dividend_hi 
       sbb  ecx, edx             ; Subtract (divisor * quot.) from dividend. 
       sbb  eax, 0               ; Adjust quotient if remainder negative. 
       xor  edx, edx             ; Clear high word of quot. (EAX<=FFFFFFFFh). 
       pop  edi                  ; Restore EDI as per calling convention. 
       pop  ebx                  ; Restore EBX as per calling convention. 
       ret                       ; Done, return to caller. 
    _ulldiv ENDP 

    64-Bit Signed Division

    ; _lldiv divides two signed 64-bit numbers and delivers the quotient  
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; Out:      EDX:EAX = quotient of division 
    ; Destroys: EAX, ECX,E DX, EFlags 
    _lldiv PROC 
       push ebx    ; Save EBX as per calling convention. 
       push esi    ; Save ESI as per calling convention. 
       push edi    ; Save EDI as per calling convention. 
       mov  ecx, [esp+28]   ; divisor_hi 
       mov  ebx, [esp+24]   ; divisor_lo 
       mov  edx, [esp+20]   ; dividend_hi 
       mov  eax, [esp+16]   ; dividend_lo 
       mov  esi, ecx        ; divisor_hi 
       xor  esi, edx        ; divisor_hi ^ dividend_hi  
       sar  esi, 31         ; (quotient < 0) ? -1 : 0 
       mov  edi, edx        ; dividend_hi 
       sar  edi, 31         ; (dividend < 0) ? -1 : 0 
       xor  eax, edi        ; If (dividend < 0), 
       xor  edx, edi        ;  compute 1's complement of dividend. 
       sub  eax, edi        ; If (dividend < 0), 
       sbb  edx, edi        ;  compute 2's complement of dividend. 
       mov  edi, ecx        ; divisor_hi 
       sar  edi, 31         ; (divisor < 0) ? -1 : 0 
       xor  ebx, edi        ; If (divisor < 0), 
       xor  ecx, edi        ;  compute 1's complement of divisor. 
       sub  ebx, edi        ; If (divisor < 0), 
       sbb  ecx, edi        ;  compute 2's complement of divisor. 
       jnz  big_divisor     ; divisor > 2^32 - 1 
       cmp  edx, ebx        ; Only one division needed (ECX = 0)? 
       jae  two_divs        ; Need two divisions. 
       div  ebx             ; EAX = quotient_lo 
       mov  edx, ecx        ; EDX = quotient_hi = 0 (quotient in EDX:EAX) 
       xor  eax, esi        ; If (quotient < 0), 
       xor  edx, esi        ;  compute 1's complement of result. 
       sub  eax, esi        ; If (quotient < 0), 
       sbb  edx, esi        ;  compute 2's complement of result. 
       pop  edi             ; Restore EDI as per calling convention. 
       pop  esi             ; Restore ESI as per calling convention. 
       pop  ebx             ; Restore EBX as per calling convention. 
       ret                  ; Done, return to caller. 
    two_divs: 
       mov  ecx, eax     ; Save dividend_lo in ECX. 
       mov  eax, edx     ; Get dividend_hi. 
       xor  edx, edx     ; Zero-extend it into EDX:EAX. 
       div  ebx          ; quotient_hi in EAX 
       xchg eax, ecx     ; ECX = quotient_hi, EAX = dividend_lo 
       div  ebx          ; EAX = quotient_lo 
       mov  edx, ecx     ; EDX = quotient_hi (quotient in EDX:EAX) 
       jmp  make_sign   ; Make quotient signed. 
    big_divisor: 
       sub  esp, 12             ; Create three local variables. 
       mov  [esp], eax          ; dividend_lo 
       mov  [esp+4], ebx        ; divisor_lo 
       mov  [esp+8], edx        ; dividend_hi 
       mov  edi, ecx            ; Save divisor_hi. 
       shr  edx, 1              ; Shift both 
       rcr  eax, 1              ;  divisor and 
       ror  edi, 1              ;  and dividend 
       rcr  ebx, 1              ;  right by 1 bit. 
       bsr  ecx, ecx            ; ECX = number of remaining shifts 
       shrd ebx, edi, cl        ; Scale down divisor and 
       shrd eax, edx, cl        ;  dividend such that divisor is 
       shr  edx, cl             ;  less than 2^32 (that is, fits in EBX). 
       rol  edi, 1              ; Restore original divisor_hi. 
       div  ebx                 ; Compute quotient. 
       mov  ebx, [esp]          ; dividend_lo 
       mov  ecx, eax            ; Save quotient. 
       imul edi, eax            ; quotient * divisor high word (??low only) 
       mul  DWORD PTR [esp+4]   ; quotient * divisor low word 
       add  edx, edi            ; EDX:EAX = quotient * divisor 
       sub  ebx, eax            ; dividend_lo - (quot.*divisor)_lo 
       mov  eax, ecx            ; Get quotient. 
       mov  ecx, [esp+8]        ; dividend_hi 
       sbb  ecx, edx            ; Subtract (divisor * quot.) from dividend 
       sbb  eax, 0              ; Adjust quotient if remainder is negative. 
       xor  edx, edx            ; Clear high word of quotient. 
       add  esp, 12             ; Remove local variables. 
    make_sign: 
       xor eax, esi   ; If (quotient < 0), 
       xor edx, esi   ;  compute 1's complement of result. 
       sub eax, esi   ; If (quotient < 0), 
       sbb edx, esi   ;  compute 2's complement of result. 
       pop edi        ; Restore EDI as per calling convention. 
       pop esi        ; Restore ESI as per calling convention. 
       pop ebx        ; Restore EBX as per calling convention. 
       ret            ; Done, return to caller. 
    _lldiv ENDP 

    64-Bit Unsigned Remainder Computation

    ; _ullrem divides two unsigned 64-bit integers and returns the remainder. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; 
    ; Out:      EDX:EAX = remainder of division 
    ; 
    ; Destroys: EAX, ECX, EDX, EFlags 
    _ullrem PROC 
       push ebx              ; Save EBX as per calling convention. 
       mov  ecx, [esp+20]    ; divisor_hi 
       mov  ebx, [esp+16]    ; divisor_lo 
       mov  edx, [esp+12]    ; dividend_hi 
       mov  eax, [esp+8]     ; dividend_lo 
       test ecx, ecx         ; divisor > 2^32 - 1? 
       jnz  r_big_divisor    ; Yes, divisor > 32^32 - 1. 
       cmp  edx, ebx         ; Only one division needed (ECX = 0)? 
       jae  r_two_divs       ; Need two divisions. 
       div  ebx              ; EAX = quotient_lo 
       mov  eax, edx         ; EAX = remainder_lo 
       mov  edx, ecx         ; EDX = remainder_hi = 0 
       pop  ebx              ; Restore EBX per calling convention. 
       ret                   ; Done, return to caller. 
    r_two_divs: 
       mov ecx, eax   ; Save dividend_lo in ECX. 
       mov eax, edx   ; Get dividend_hi. 
       xor edx, edx   ; Zero-extend it into EDX:EAX. 
       div ebx        ; EAX = quotient_hi, EDX = intermediate remainder 
       mov eax, ecx   ; EAX = dividend_lo 
       div ebx        ; EAX = quotient_lo 
       mov eax, edx   ; EAX = remainder_lo 
       xor edx, edx   ; EDX = remainder_hi = 0 
       pop ebx        ; Restore EBX as per calling convention. 
       ret            ; Done, return to caller. 
    r_big_divisor: 
       push edi                  ; Save EDI as per calling convention. 
       mov  edi, ecx             ; Save divisor_hi. 
       shr  edx, 1               ; Shift both divisor and dividend right 
       rcr  eax, 1               ;  by 1 bit. 
       ror  edi, 1 
       rcr  ebx, 1 
       bsr  ecx, ecx             ; ECX = number of remaining shifts 
       shrd ebx, edi, cl         ; Scale down divisor and dividend such 
       shrd eax, edx, cl         ;  that divisor is less than 2^32 
       shr  edx, cl              ;  (that is, it fits in EBX). 
       rol  edi, 1               ; Restore original divisor (EDI:ESI). 
       div  ebx                  ; Compute quotient. 
       mov  ebx, [esp+12]        ; dividend low word 
       mov  ecx, eax             ; Save quotient. 
       imul edi, eax             ; quotient * divisor high word (??low only) 
       mul  DWORD PTR [esp+20]   ; quotient * divisor low word 
       add  edx, edi             ; EDX:EAX = quotient * divisor 
       sub  ebx, eax             ; dividend_lo - (quot.*divisor)_lo 
       mov  ecx, [esp+16]        ; dividend_hi 
       mov  eax, [esp+20]        ; divisor_lo 
       sbb  ecx, edx             ; Subtract divisor * quot. from dividend. 
       sbb  edx, edx             ; (remainder < 0) ? 0xFFFFFFFF : 0 
       and  eax, edx             ; (remainder < 0) ? divisor_lo : 0 
       and  edx, [esp+24]        ; (remainder < 0) ? divisor_hi : 0 
       add  eax, ebx             ; remainder += (remainder < 0) ? divisor : 0 
       pop  edi                  ; Restore EDI as per calling convention. 
       pop  ebx                  ; Restore EBX as per calling convention. 
       ret                       ; Done, return to caller. 
    _ullrem ENDP 

    64-Bit Signed Remainder Computation

    ; _llrem divides two signed 64-bit numbers and returns the remainder. 
    ; 
    ; In:       [ESP+8]:[ESP+4] = dividend 
    ;           [ESP+16]:[ESP+12] = divisor 
    ; 
    ; Out:      EDX:EAX = remainder of division 
    ; 
    ; Destroys: EAX, ECX, EDX, EFlags 
       push ebx               ; Save EBX as per calling convention. 
       push esi               ; Save ESI as per calling convention. 
       push edi               ; Save EDI as per calling convention. 
       mov  ecx, [esp+28]     ; divisor-hi 
       mov  ebx, [esp+24]     ; divisor-lo 
       mov  edx, [esp+20]     ; dividend-hi 
       mov  eax, [esp+16]     ; dividend-lo 
       mov  esi, edx          ; sign(remainder) == sign(dividend) 
       sar  esi, 31           ; (remainder < 0) ? -1 : 0 
       mov  edi, edx          ; dividend-hi 
       sar  edi, 31           ; (dividend < 0) ? -1 : 0 
       xor  eax, edi          ; If (dividend < 0), 
       xor  edx, edi          ;  compute 1's complement of dividend. 
       sub  eax, edi          ; If (dividend < 0), 
       sbb  edx, edi          ;  compute 2's complement of dividend. 
       mov  edi, ecx          ; divisor-hi 
       sar  edi, 31           ; (divisor < 0) ? -1 : 0 
       xor  ebx, edi          ; If (divisor < 0), 
       xor  ecx, edi          ;  compute 1's complement of divisor. 
       sub  ebx, edi          ; If (divisor < 0), 
       sbb  ecx, edi          ;  compute 2's complement of divisor. 
       jnz  sr_big_divisor    ; divisor > 2^32 - 1 
       cmp  edx, ebx          ; Only one division needed (ECX = 0)? 
       jae  sr_two_divs       ; No, need two divisions. 
       div  ebx               ; EAX = quotient_lo 
       mov  eax, edx          ; EAX = remainder_lo 
       mov  edx, ecx          ; EDX = remainder_lo = 0 
       xor  eax, esi          ; If (remainder < 0), 
       xor  edx, esi          ;  compute 1's complement of result. 
       sub  eax, esi          ; If (remainder < 0), 
       sbb  edx, esi          ;  compute 2's complement of result. 
       pop  edi               ; Restore EDI as per calling convention. 
       pop  esi               ; Restore ESI as per calling convention. 
       pop  ebx               ; Restore EBX as per calling convention. 
       ret                    ; Done, return to caller. 
    sr_two_divs: 
       mov ecx, eax        ; Save dividend_lo in ECX. 
       mov eax, edx        ; Get dividend_hi. 
       xor edx, edx        ; Zero-extend it into EDX:EAX. 
       div ebx             ; EAX = quotient_hi, EDX = intermediate remainder 
       mov eax, ecx        ; EAX = dividend_lo 
       div  ebx            ; EAX = quotient_lo 
       mov  eax, edx       ; remainder_lo 
       xor  edx, edx       ; remainder_hi = 0 
       jmp  sr_makesign    ; Make remainder signed. 
    sr_big_divisor: 
       sub  esp, 16             ; Create three local variables. 
       mov  [esp], eax          ; dividend_lo              
       mov  [esp+4], ebx        ; divisor_lo              
       mov  [esp+8], edx        ; dividend_hi 
       mov  [esp+12], ecx       ; divisor_hi 
       mov  edi, ecx            ; Save divisor_hi. 
       shr  edx, 1              ; Shift both 
       rcr  eax, 1              ;  divisor and 
       ror  edi, 1              ;  and dividend 
       rcr  ebx, 1              ;  right by 1 bit. 
       bsr  ecx, ecx            ; ECX = number of remaining shifts 
       shrd ebx, edi, cl        ; Scale down divisor and 
       shrd eax, edx, cl        ;  dividend such that divisor is 
       shr  edx, cl             ;  less than 2^32 (that is, fits in EBX). 
       rol  edi, 1              ; Restore original divisor_hi. 
       div  ebx                 ; Compute quotient. 
       mov  ebx, [esp]          ; dividend_lo 
       mov  ecx, eax            ; Save quotient. 
       imul edi, eax            ; quotient * divisor high word (??low only) 
       mul  DWORD PTR [esp+4]   ; quotient * divisor low word 
       add  edx, edi            ; EDX:EAX = quotient * divisor 
       sub  ebx, eax            ; dividend_lo - (quot.*divisor)_lo 
       mov  ecx, [esp+8]        ; dividend_hi 
       sbb  ecx, edx            ; Subtract divisor * quot. from dividend. 
       sbb  eax, eax            ; remainder < 0 ? 0xffffffff : 0 
       mov  edx, [esp+12]       ; divisor_hi 
       and  edx, eax            ; remainder < 0 ? divisor_hi : 0 
       and  eax, [esp+4]        ; remainder < 0 ? divisor_lo : 0 
       add  eax, ebx            ; remainder_lo 
       add  edx, ecx            ; remainder_hi 
       add  esp, 16             ; Remove local variables. 
    sr_makesign: 
       xor eax, esi   ; If (remainder < 0), 
       xor edx, esi   ;  compute 1's complement of result. 
       sub eax, esi   ; If (remainder < 0), 
      sbb edx, esi   ;  compute 2's complement of result. 
       pop edi        ; Restore EDI as per calling convention. 
       pop esi        ; Restore ESI as per calling convention. 
       pop ebx        ; Restore EBX as per calling convention. 
       ret            ; Done, return to caller. 
    
Post 05 Dec 2007, 14:10
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Thanks. Don't you happen to have direct link to this manual, and name of chapter/section where this is? I'd add it as reference to article.

Things in article are basically done same way, except for NEG and that was already pointed out by bitrake on x86asm board.

Those bignum divisions are really interesting, I'd have to study them someday.
Post 05 Dec 2007, 14:30
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
Sorry, I posted as close to a direct reference (URL) as I could.

Software Optimization Guide for AMD64 Processors
- Integer Optimizations
-- Efficient 64-Bit Integer Arithmatic in 32-Bit Mode
Post 05 Dec 2007, 14:45
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Thanks a lot.

I plan to add bignum multiplication, and with your help even division someday to article.
Post 05 Dec 2007, 14:50
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
MCD
Post 07 Jan 2008, 05:41
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
MCD:
not sure about math stuff, but my O(n*log(n)) is number of MUL instructions used, probably not what math people intended. My idea is something like this:

c = a*b
(all number consist of 4 dwords,
c0 = first dword of destination, c1 = second, etc...
edx is upper 32bits of previous multiplication)
Code:
c0 = a0 * b0
c1 = a1 * b0 + edx
c2 = a2 * b0 + edx
c3 = a3 * b0 + edx

c1 += a0 * b1
c2 += a1 * b1 + edx
c3 += a2 * b1 + edx

c2 += a0 * b2
c3 += a1 * b2 + edx

c3 += a0 * b3
    

Isn't this O(n*log(n)) number of MUL instructions?
Post 07 Jan 2008, 11:25
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
But if you want this code to scale without loops, you need to scale your code that does the operation.
If you add 128-bit numbers with 32-bit registers you need a loop ECX=4
If you mul 128-bit numbers with 64-bit registers you can do this manually
but MUL128 with R32 is tricky. You still need an algorithm, which means shifting is needed on sub-calculations.
Post 07 Jan 2008, 13:19
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
of course that it would be done with loops.

but number of MULs is N + (N-1) + (N-2) + (N-3) + ... + 1

isn't this O(n*log(n))? I never really knew big-O notation
Post 07 Jan 2008, 14:13
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
The "N + (N-1) + (N-2) + (N-3) + ... + 1" is sometimes refered as the sum of the first N natural numbers (though, you wrote it in reverse order Razz).

Doing some maths:
Code:
1 + 2 + ... + N-1 + N = ((1 + 2 + 3 + ... + N-2 + N-1 + N) + (N + N-1 + N-2 + ... + 3 + 2 + 1))/2
                    = (1+N + 2+N-1 + 3+N-2 + ... + N-2+3 + N-1+2 + N+1)/2
               = (N+1 + N+1 + N+1 + ... + N+1 + N+1 + N+1)/2
               = N(N+1)/2    


However, while N(N+1)/2 for N=100 is 5050, log(100) is 2 and log2(100) is 6,64.. Not sure if the notation refers to log2 or log10 but non of those matches your calculation of the number of MULs Razz
Post 07 Jan 2008, 14:38
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
okay, so it is O(N^2)... my bad Smile
I was just quessing anyway...
Post 07 Jan 2008, 14:54
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Doesn't exists O(N(N+1)/2)? N^2 is ultra big on big Ns, much much bigger than N(N+1)/2
Post 07 Jan 2008, 14:57
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.