flat assembler
Message board for the users of flat assembler.

 Index > Main > your thoughts on division...
Author
moriman

Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 12 Apr 2006, 22:20
Hi,

Assume, hypothetically, that only the 'ordinary' registers can be used for division, so that there is no FP support. I'm interested in the different methods that people would use to approach a calculation such as

0987F39A89BC542E2h / 089713A53E6845D43F3h

i.e. where the dividend and divisor are > 32bits and also that dividend is too large to fit in edx:eax without causing an overflow.

The dividend can be 'infinitely' extended with 0s if a fractional answer is required.

I think that this could lead to some interesting ideas

here's hoping

mori
12 Apr 2006, 22:20
shoorick

Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 13 Apr 2006, 04:33
once i wrote division sub 32bit/16bit for 8080 (8bit CPU without div operation) - align, then subtract/shift -- as well as division in column on the paper -- 32 cycles max.
13 Apr 2006, 04:33
r22

Joined: 27 Dec 2004
Posts: 805
r22 13 Apr 2006, 05:34
The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'.

The best way is to get an optimized BigNum lib would probably be the fastest.

I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop.
13 Apr 2006, 05:34
moriman

Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 13 Apr 2006, 16:16
r22 wrote:
The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'.

I'd considered starting with 1/x but after the first division things end up as messy anyway

r22 wrote:
The best way is to get an optimized BigNum lib would probably be the fastest.

Not sure I follow exactly here¿

r22 wrote:
I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop.

This is where I was tending to, but instead of simple sub\compare, I was thinking along the lines of starting with the midrange possible value of the quotient then half all the way up or down to the correct value something like

63/3 = 32? too large
= 16 too small
= 24 too large
= 20 too small
= 22 to large
= 21 phew!
13 Apr 2006, 16:16
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 13 Apr 2006, 16:23
That's a pretty good idea, but it can get messy with some numbers, and you'll have lots of ifs. I know there's gotta be some math magic numbers here (maybe make a small lookup table to help you, from nibble to nibble, or byte to byte, I dunno). I'll try to look from a different perspective (low-level math you name it ). Though I don't have much time, I'll try.

Still it's good to think at it
13 Apr 2006, 16:23
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20207
revolution 13 Apr 2006, 17:54
There are three main algorithms for doing this type of integer division: the classical algorithm, Barrett's algorithm and Montgomery's algorithm. There is a myriad of information about all these methods available with your favourite search engines. r22's hint about bignum lib's was right on the mark. These operations are used for encryption/decryption and having a suitable algorithm combined with proper optimisation is a very important thing.

Depending on exactly what relative sizes of numbers and how often they change will dictate the best algorithm for your needs.

I written have both ARM and x86 optimized versions of all three algorithms but unfortunately they are copyrighted by my company so I cannot paste them here
13 Apr 2006, 17:54
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20207
revolution 13 Apr 2006, 18:00
Oops, I forgot to mention, the RSA macros I post here some time back use Barrett's algorithm for the modular reduction but it can easily be modified for division.
13 Apr 2006, 18:00
moriman

Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 13 Apr 2006, 19:48
thx rev, I'll take a look when I get a chance
13 Apr 2006, 19:48
shoorick

Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 14 Apr 2006, 06:46
it took a lot of my efforts to found this in my archives (essp. because it was called 2.txt)
Code:
```        MVI     A,33
UDM1:   DCR     A
JZ      UDM6
XCHG
XCHG
RAL
XTHL
RAR
JNC     UDM2
INX     H
UDM2:   XTHL
RAL
JNC     UDM3
INX     H
UDM3:   ORA     A
RAR
PUSH    H
PUSH    PSW
MOV     A,L
SUB     C
MOV     L,A
MOV     A,H
SBB     B
MOV     H,A
JNC     UDM4
POP     PSW
JC      UDM5
POP     H
JMP     UDM1
UDM4:   POP     PSW
UDM5:   INX     SP
INX     SP
INX     D
JMP     UDM1
UDM6:    ```

it is so complex because 8080 has poor set of instructions unlike modern cpus, but anyway as museum exponat it was written to use in forth interpreter and provided division of 32-bit number to 16-bit divider and receive 32-bit result and 16-bit reminder. how does it work - shown on the picture below. can be easy converted to z80 cpu (hope, this post is not so spammy )[/code]

 Description: Filesize: 16.4 KB Viewed: 11619 Time(s)

_________________
UNICODE forever!
14 Apr 2006, 06:46
shoorick

Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 14 Apr 2006, 07:45
just wrote x86 equivalent
Code:
```proc intdiv ; edx/eax -> eax - result, edx - reminder
push ebx
mov ecx,33
xor ebx,ebx
@@:
dec ecx
jz  @F
jz  @B
cmp ebx,eax
jb  @B
sub ebx,eax
inc edx
jmp @B
@@:
mov eax,edx
mov edx,ebx
pop ebx
ret
endp
```

_________________
UNICODE forever!
14 Apr 2006, 07:45
moriman

Joined: 01 Apr 2006
Posts: 55
Location: Northern Ireland
moriman 26 Apr 2006, 22:46
Thanks for the time in rewriting this shoorick
Unfortunately, this isn't really the type of thing I'm after. In the OP I was saying that the numbers would be too large to fit in the 32-bit registers, so I would have to do the division in stages.

Revolution...
Quote:

There are three main algorithms for doing this type of integer division: the classical algorithm, Barrett's algorithm and Montgomery's algorithm. There is a myriad of information about all these methods available with your favourite search engines.

I have searched (and found) the myriad of info on Barret's and Montgomery's algorithms. Sad to say it appears that I only thought I was smart at math. I cannot find any 'plain english' descriptions of these algorithms and cannot seem to get my head around the info that I have read

Oh well, suppose I'll just have to keep plugging away...
26 Apr 2006, 22:46

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Okay, I've definately seen 64-bit/64-bit somewhere on the net and that was pretty neat. Too bad I can't remember how that was done

...but at least your divider fits in 32 bits - you can use the following:
Code:
```        mov     ecx,(aend-a)
mov     edi,[b]
xor     edx,edx
divide:
sub     ecx,4
jc      .Imdone
mov     eax,[a+ecx]
div     edi
mov     [result+ecx],eax
jmp     divide
.Imdone:

a       dd      0987F39A8h,09BC542E2h, 089713A53h, 0E6845D43h
aend = \$

b       dd      00089F1C3h

result  rd      (aend-a)/4
```

What you'd have to do to have arbitrary precision division is to divide the divisor into multiple parts and then cascade the results.

Like with 75/15=5 (these are all DECimals) you can divide 75/5 and 75/3 because 3*5=15. Then you can do something like this: 75/5/3=5 where 5 and 3 both fit in the machine word.

The only problem is to find a way to find factors of the divisor that doesn't fit in the machine word :S
27 Apr 2006, 09:54

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Okay, first thank you shoorick and second, take a look at what I made out of this code-fragment:

The only sad part is that I saw a rather elegant division algo but I can't write one myself. I don't understand how to do it 32-bits at a time NOT 1 bit at a time like now. The example takes 12Kclocks on a P4 and 10Kclocks on a P!!!

For comparison: 10 DIVs take ~800 clocks

_________________
My updated idol http://www.agner.org/optimize/
27 Apr 2006, 16:44
shoorick

Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 28 Apr 2006, 09:47
i just showed a way: you can extend it into any sizes and optimize so far so you need - as well as i divided 32bit/16bit on 8bit cpu (without exception), you can do 128bit/64bit similar way on 32bit cpu and 256bit/128bit on 64bit cpu, etc., etc.
regards!
28 Apr 2006, 09:47
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum