flat assembler
Message board for the users of flat assembler.

Index > Main > your thoughts on division...

Author
Thread Post new topic Reply to topic
moriman



Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 12 Apr 2006, 22:20
Hi,

Assume, hypothetically, that only the 'ordinary' registers can be used for division, so that there is no FP support. I'm interested in the different methods that people would use to approach a calculation such as


0987F39A89BC542E2h / 089713A53E6845D43F3h

i.e. where the dividend and divisor are > 32bits and also that dividend is too large to fit in edx:eax without causing an overflow.


The dividend can be 'infinitely' extended with 0s if a fractional answer is required.

I think that this could lead to some interesting ideas Wink

here's hoping

mori
Post 12 Apr 2006, 22:20
View user's profile Send private message Reply with quote
shoorick



Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 13 Apr 2006, 04:33
once i wrote division sub 32bit/16bit for 8080 (8bit CPU without div operation) - align, then subtract/shift -- as well as division in column on the paper -- 32 cycles max.
Post 13 Apr 2006, 04:33
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 13 Apr 2006, 05:34
The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'.

The best way is to get an optimized BigNum lib would probably be the fastest.

I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop.
Post 13 Apr 2006, 05:34
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
moriman



Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 13 Apr 2006, 16:16
r22 wrote:
The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'.


I'd considered starting with 1/x but after the first division things end up as messy anyway Sad

r22 wrote:
The best way is to get an optimized BigNum lib would probably be the fastest.


Not sure I follow exactly here¿

r22 wrote:
I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop.


This is where I was tending to, but instead of simple sub\compare, I was thinking along the lines of starting with the midrange possible value of the quotient then half all the way up or down to the correct value something like


63/3 = 32? too large
= 16 too small
= 24 too large
= 20 too small
= 22 to large
= 21 phew! Smile
Post 13 Apr 2006, 16:16
View user's profile Send private message Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 13 Apr 2006, 16:23
That's a pretty good idea, but it can get messy with some numbers, and you'll have lots of ifs. I know there's gotta be some math magic numbers here (maybe make a small lookup table to help you, from nibble to nibble, or byte to byte, I dunno). I'll try to look from a different perspective (low-level math you name it Smile). Though I don't have much time, I'll try. Wink

Still it's good to think at it Very Happy
Post 13 Apr 2006, 16:23
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20363
Location: In your JS exploiting you and your system
revolution 13 Apr 2006, 17:54
There are three main algorithms for doing this type of integer division: the classical algorithm, Barrett's algorithm and Montgomery's algorithm. There is a myriad of information about all these methods available with your favourite search engines. r22's hint about bignum lib's was right on the mark. These operations are used for encryption/decryption and having a suitable algorithm combined with proper optimisation is a very important thing.

Depending on exactly what relative sizes of numbers and how often they change will dictate the best algorithm for your needs.

I written have both ARM and x86 optimized versions of all three algorithms but unfortunately they are copyrighted by my company so I cannot paste them here Sad
Post 13 Apr 2006, 17:54
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20363
Location: In your JS exploiting you and your system
revolution 13 Apr 2006, 18:00
Oops, I forgot to mention, the RSA macros I post here some time back use Barrett's algorithm for the modular reduction but it can easily be modified for division.
Post 13 Apr 2006, 18:00
View user's profile Send private message Visit poster's website Reply with quote
moriman



Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 13 Apr 2006, 19:48
thx rev, I'll take a look when I get a chance Wink
Post 13 Apr 2006, 19:48
View user's profile Send private message Reply with quote
shoorick



Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 14 Apr 2006, 06:46
it took a lot of my efforts to found this in my archives (essp. because it was called 2.txt)
Code:
        MVI     A,33
UDM1:   DCR     A
        JZ      UDM6
        XCHG
        DAD     H
        XCHG
        RAL
        XTHL
        DAD     H
        RAR
        JNC     UDM2
        INX     H
UDM2:   XTHL
        DAD     H
        RAL
        JNC     UDM3
        INX     H
UDM3:   ORA     A
        RAR
        PUSH    H
        PUSH    PSW
        MOV     A,L
        SUB     C
        MOV     L,A
        MOV     A,H
        SBB     B
        MOV     H,A
        JNC     UDM4
        POP     PSW
        JC      UDM5
        POP     H
        JMP     UDM1
UDM4:   POP     PSW
UDM5:   INX     SP
        INX     SP
        INX     D
        JMP     UDM1
UDM6:    

it is so complex because 8080 has poor set of instructions unlike modern cpus, but anyway as museum exponat Wink it was written to use in forth interpreter and provided division of 32-bit number to 16-bit divider and receive 32-bit result and 16-bit reminder. how does it work - shown on the picture below. can be easy converted to z80 cpu (hope, this post is not so spammy Wink )[/code]


Description:
Filesize: 16.4 KB
Viewed: 12144 Time(s)

div8080.gif



_________________
UNICODE forever!
Post 14 Apr 2006, 06:46
View user's profile Send private message Visit poster's website Reply with quote
shoorick



Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 14 Apr 2006, 07:45
just wrote x86 equivalent Wink
Code:
proc intdiv ; edx/eax -> eax - result, edx - reminder 
    push ebx
    mov ecx,33
    xor ebx,ebx
@@:
    dec ecx
    jz  @F
    add edx,edx
    adc ebx,ebx
    jz  @B
    cmp ebx,eax
    jb  @B
    sub ebx,eax
    inc edx
    jmp @B
@@:
    mov eax,edx
    mov edx,ebx
    pop ebx
    ret
endp
    

_________________
UNICODE forever!
Post 14 Apr 2006, 07:45
View user's profile Send private message Visit poster's website Reply with quote
moriman



Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 26 Apr 2006, 22:46
Thanks for the time in rewriting this shoorick Wink
Unfortunately, this isn't really the type of thing I'm after. In the OP I was saying that the numbers would be too large to fit in the 32-bit registers, so I would have to do the division in stages.

Revolution...
Quote:

There are three main algorithms for doing this type of integer division: the classical algorithm, Barrett's algorithm and Montgomery's algorithm. There is a myriad of information about all these methods available with your favourite search engines.


I have searched (and found) the myriad of info on Barret's and Montgomery's algorithms. Sad to say it appears that I only thought I was smart at math. I cannot find any 'plain english' descriptions of these algorithms and cannot seem to get my head around the info that I have read Sad

Oh well, suppose I'll just have to keep plugging away...
Post 26 Apr 2006, 22:46
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Apr 2006, 09:54
Okay, I've definately seen 64-bit/64-bit somewhere on the net and that was pretty neat. Too bad I can't remember how that was done Sad

...but at least your divider fits in 32 bits - you can use the following:
Code:
        mov     ecx,(aend-a)
        mov     edi,[b]
        xor     edx,edx
    divide:
        sub     ecx,4
        jc      .Imdone
        mov     eax,[a+ecx]
        div     edi
        mov     [result+ecx],eax
        jmp     divide
      .Imdone:

a       dd      0987F39A8h,09BC542E2h, 089713A53h, 0E6845D43h
aend = $

b       dd      00089F1C3h

result  rd      (aend-a)/4
    


What you'd have to do to have arbitrary precision division is to divide the divisor into multiple parts and then cascade the results.

Like with 75/15=5 (these are all DECimals) you can divide 75/5 and 75/3 because 3*5=15. Then you can do something like this: 75/5/3=5 where 5 and 3 both fit in the machine word.

The only problem is to find a way to find factors of the divisor that doesn't fit in the machine word :S
Post 27 Apr 2006, 09:54
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Apr 2006, 16:44
Okay, first thank you shoorick and second, take a look at what I made out of this code-fragment:

The only sad part is that I saw a rather elegant division algo but I can't write one myself. I don't understand how to do it 32-bits at a time NOT 1 bit at a time like now. The example takes 12Kclocks on a P4 and 10Kclocks on a P!!!

For comparison: 10 DIVs take ~800 clocks Wink


Description: This can divide arbitrary numbers. The memory is the only limit :D
Download
Filename: Jagamine.7z
Filesize: 1.09 KB
Downloaded: 527 Time(s)


_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 27 Apr 2006, 16:44
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
shoorick



Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 28 Apr 2006, 09:47
i just showed a way: you can extend it into any sizes and optimize so far so you need - as well as i divided 32bit/16bit on 8bit cpu (without exception), you can do 128bit/64bit similar way on 32bit cpu and 256bit/128bit on 64bit cpu, etc., etc. Wink
regards!
Post 28 Apr 2006, 09:47
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.