flat assembler
Message board for the users of flat assembler.
Index
> Main > your thoughts on division... 
Author 

shoorick
once i wrote division sub 32bit/16bit for 8080 (8bit CPU without div operation)  align, then subtract/shift  as well as division in column on the paper  32 cycles max.


13 Apr 2006, 04:33 

r22
The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'.
The best way is to get an optimized BigNum lib would probably be the fastest. I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop. 

13 Apr 2006, 05:34 

moriman
r22 wrote: The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'. I'd considered starting with 1/x but after the first division things end up as messy anyway r22 wrote: The best way is to get an optimized BigNum lib would probably be the fastest. Not sure I follow exactly here¿ r22 wrote: I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop. This is where I was tending to, but instead of simple sub\compare, I was thinking along the lines of starting with the midrange possible value of the quotient then half all the way up or down to the correct value something like 63/3 = 32? too large = 16 too small = 24 too large = 20 too small = 22 to large = 21 phew! 

13 Apr 2006, 16:16 

Borsuc
That's a pretty good idea, but it can get messy with some numbers, and you'll have lots of ifs. I know there's gotta be some math magic numbers here (maybe make a small lookup table to help you, from nibble to nibble, or byte to byte, I dunno). I'll try to look from a different perspective (lowlevel math you name it ). Though I don't have much time, I'll try.
Still it's good to think at it 

13 Apr 2006, 16:23 

revolution
There are three main algorithms for doing this type of integer division: the classical algorithm, Barrett's algorithm and Montgomery's algorithm. There is a myriad of information about all these methods available with your favourite search engines. r22's hint about bignum lib's was right on the mark. These operations are used for encryption/decryption and having a suitable algorithm combined with proper optimisation is a very important thing.
Depending on exactly what relative sizes of numbers and how often they change will dictate the best algorithm for your needs. I written have both ARM and x86 optimized versions of all three algorithms but unfortunately they are copyrighted by my company so I cannot paste them here 

13 Apr 2006, 17:54 

revolution
Oops, I forgot to mention, the RSA macros I post here some time back use Barrett's algorithm for the modular reduction but it can easily be modified for division.


13 Apr 2006, 18:00 

moriman
thx rev, I'll take a look when I get a chance


13 Apr 2006, 19:48 

shoorick
it took a lot of my efforts to found this in my archives (essp. because it was called 2.txt)
Code: MVI A,33 UDM1: DCR A JZ UDM6 XCHG DAD H XCHG RAL XTHL DAD H RAR JNC UDM2 INX H UDM2: XTHL DAD H RAL JNC UDM3 INX H UDM3: ORA A RAR PUSH H PUSH PSW MOV A,L SUB C MOV L,A MOV A,H SBB B MOV H,A JNC UDM4 POP PSW JC UDM5 POP H JMP UDM1 UDM4: POP PSW UDM5: INX SP INX SP INX D JMP UDM1 UDM6: it is so complex because 8080 has poor set of instructions unlike modern cpus, but anyway as museum exponat it was written to use in forth interpreter and provided division of 32bit number to 16bit divider and receive 32bit result and 16bit reminder. how does it work  shown on the picture below. can be easy converted to z80 cpu (hope, this post is not so spammy )[/code]
_________________ UNICODE forever! 

14 Apr 2006, 06:46 

shoorick
just wrote x86 equivalent
Code: proc intdiv ; edx/eax > eax  result, edx  reminder push ebx mov ecx,33 xor ebx,ebx @@: dec ecx jz @F add edx,edx adc ebx,ebx jz @B cmp ebx,eax jb @B sub ebx,eax inc edx jmp @B @@: mov eax,edx mov edx,ebx pop ebx ret endp _________________ UNICODE forever! 

14 Apr 2006, 07:45 

moriman
Thanks for the time in rewriting this shoorick
Unfortunately, this isn't really the type of thing I'm after. In the OP I was saying that the numbers would be too large to fit in the 32bit registers, so I would have to do the division in stages. Revolution... Quote:
I have searched (and found) the myriad of info on Barret's and Montgomery's algorithms. Sad to say it appears that I only thought I was smart at math. I cannot find any 'plain english' descriptions of these algorithms and cannot seem to get my head around the info that I have read Oh well, suppose I'll just have to keep plugging away... 

26 Apr 2006, 22:46 

Madis731
Okay, I've definately seen 64bit/64bit somewhere on the net and that was pretty neat. Too bad I can't remember how that was done
...but at least your divider fits in 32 bits  you can use the following: Code: mov ecx,(aenda) mov edi,[b] xor edx,edx divide: sub ecx,4 jc .Imdone mov eax,[a+ecx] div edi mov [result+ecx],eax jmp divide .Imdone: a dd 0987F39A8h,09BC542E2h, 089713A53h, 0E6845D43h aend = $ b dd 00089F1C3h result rd (aenda)/4 What you'd have to do to have arbitrary precision division is to divide the divisor into multiple parts and then cascade the results. Like with 75/15=5 (these are all DECimals) you can divide 75/5 and 75/3 because 3*5=15. Then you can do something like this: 75/5/3=5 where 5 and 3 both fit in the machine word. The only problem is to find a way to find factors of the divisor that doesn't fit in the machine word :S 

27 Apr 2006, 09:54 

Madis731
Okay, first thank you shoorick and second, take a look at what I made out of this codefragment:
The only sad part is that I saw a rather elegant division algo but I can't write one myself. I don't understand how to do it 32bits at a time NOT 1 bit at a time like now. The example takes 12Kclocks on a P4 and 10Kclocks on a P!!! For comparison: 10 DIVs take ~800 clocks


27 Apr 2006, 16:44 

shoorick
i just showed a way: you can extend it into any sizes and optimize so far so you need  as well as i divided 32bit/16bit on 8bit cpu (without exception), you can do 128bit/64bit similar way on 32bit cpu and 256bit/128bit on 64bit cpu, etc., etc.
regards! 

28 Apr 2006, 09:47 

< Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.