flat assembler
Message board for the users of flat assembler.
Index
> Main > your thoughts on division... |
Author |
|
shoorick 13 Apr 2006, 04:33
once i wrote division sub 32bit/16bit for 8080 (8bit CPU without div operation) - align, then subtract/shift -- as well as division in column on the paper -- 32 cycles max.
|
|||
13 Apr 2006, 04:33 |
|
r22 13 Apr 2006, 05:34
The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'.
The best way is to get an optimized BigNum lib would probably be the fastest. I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop. |
|||
13 Apr 2006, 05:34 |
|
moriman 13 Apr 2006, 16:16
r22 wrote: The MagicNumber program optimized integer division, by finding the 'inverse' of the integer so that you could multiply it and shift the result to get the answer. But I'm pretty sure that program used the FPU to find the 'MagicNumber'. I'd considered starting with 1/x but after the first division things end up as messy anyway r22 wrote: The best way is to get an optimized BigNum lib would probably be the fastest. Not sure I follow exactly here¿ r22 wrote: I'd probably try to convert the numbers to decimal and emulate long division OR try to emulate long division using the binary. But when your working with large numbers unless you have some fancy algorithm you'll pretty much end up with a Big Number subtract loop and compare loop. This is where I was tending to, but instead of simple sub\compare, I was thinking along the lines of starting with the midrange possible value of the quotient then half all the way up or down to the correct value something like 63/3 = 32? too large = 16 too small = 24 too large = 20 too small = 22 to large = 21 phew! |
|||
13 Apr 2006, 16:16 |
|
Borsuc 13 Apr 2006, 16:23
That's a pretty good idea, but it can get messy with some numbers, and you'll have lots of ifs. I know there's gotta be some math magic numbers here (maybe make a small lookup table to help you, from nibble to nibble, or byte to byte, I dunno). I'll try to look from a different perspective (low-level math you name it ). Though I don't have much time, I'll try.
Still it's good to think at it |
|||
13 Apr 2006, 16:23 |
|
revolution 13 Apr 2006, 17:54
There are three main algorithms for doing this type of integer division: the classical algorithm, Barrett's algorithm and Montgomery's algorithm. There is a myriad of information about all these methods available with your favourite search engines. r22's hint about bignum lib's was right on the mark. These operations are used for encryption/decryption and having a suitable algorithm combined with proper optimisation is a very important thing.
Depending on exactly what relative sizes of numbers and how often they change will dictate the best algorithm for your needs. I written have both ARM and x86 optimized versions of all three algorithms but unfortunately they are copyrighted by my company so I cannot paste them here |
|||
13 Apr 2006, 17:54 |
|
revolution 13 Apr 2006, 18:00
Oops, I forgot to mention, the RSA macros I post here some time back use Barrett's algorithm for the modular reduction but it can easily be modified for division.
|
|||
13 Apr 2006, 18:00 |
|
moriman 13 Apr 2006, 19:48
thx rev, I'll take a look when I get a chance
|
|||
13 Apr 2006, 19:48 |
|
shoorick 14 Apr 2006, 06:46
it took a lot of my efforts to found this in my archives (essp. because it was called 2.txt)
Code: MVI A,33 UDM1: DCR A JZ UDM6 XCHG DAD H XCHG RAL XTHL DAD H RAR JNC UDM2 INX H UDM2: XTHL DAD H RAL JNC UDM3 INX H UDM3: ORA A RAR PUSH H PUSH PSW MOV A,L SUB C MOV L,A MOV A,H SBB B MOV H,A JNC UDM4 POP PSW JC UDM5 POP H JMP UDM1 UDM4: POP PSW UDM5: INX SP INX SP INX D JMP UDM1 UDM6: it is so complex because 8080 has poor set of instructions unlike modern cpus, but anyway as museum exponat it was written to use in forth interpreter and provided division of 32-bit number to 16-bit divider and receive 32-bit result and 16-bit reminder. how does it work - shown on the picture below. can be easy converted to z80 cpu (hope, this post is not so spammy )[/code]
_________________ UNICODE forever! |
||||||||||
14 Apr 2006, 06:46 |
|
shoorick 14 Apr 2006, 07:45
just wrote x86 equivalent
Code: proc intdiv ; edx/eax -> eax - result, edx - reminder push ebx mov ecx,33 xor ebx,ebx @@: dec ecx jz @F add edx,edx adc ebx,ebx jz @B cmp ebx,eax jb @B sub ebx,eax inc edx jmp @B @@: mov eax,edx mov edx,ebx pop ebx ret endp _________________ UNICODE forever! |
|||
14 Apr 2006, 07:45 |
|
moriman 26 Apr 2006, 22:46
Thanks for the time in rewriting this shoorick
Unfortunately, this isn't really the type of thing I'm after. In the OP I was saying that the numbers would be too large to fit in the 32-bit registers, so I would have to do the division in stages. Revolution... Quote:
I have searched (and found) the myriad of info on Barret's and Montgomery's algorithms. Sad to say it appears that I only thought I was smart at math. I cannot find any 'plain english' descriptions of these algorithms and cannot seem to get my head around the info that I have read Oh well, suppose I'll just have to keep plugging away... |
|||
26 Apr 2006, 22:46 |
|
Madis731 27 Apr 2006, 09:54
Okay, I've definately seen 64-bit/64-bit somewhere on the net and that was pretty neat. Too bad I can't remember how that was done
...but at least your divider fits in 32 bits - you can use the following: Code: mov ecx,(aend-a) mov edi,[b] xor edx,edx divide: sub ecx,4 jc .Imdone mov eax,[a+ecx] div edi mov [result+ecx],eax jmp divide .Imdone: a dd 0987F39A8h,09BC542E2h, 089713A53h, 0E6845D43h aend = $ b dd 00089F1C3h result rd (aend-a)/4 What you'd have to do to have arbitrary precision division is to divide the divisor into multiple parts and then cascade the results. Like with 75/15=5 (these are all DECimals) you can divide 75/5 and 75/3 because 3*5=15. Then you can do something like this: 75/5/3=5 where 5 and 3 both fit in the machine word. The only problem is to find a way to find factors of the divisor that doesn't fit in the machine word :S |
|||
27 Apr 2006, 09:54 |
|
Madis731 27 Apr 2006, 16:44
Okay, first thank you shoorick and second, take a look at what I made out of this code-fragment:
The only sad part is that I saw a rather elegant division algo but I can't write one myself. I don't understand how to do it 32-bits at a time NOT 1 bit at a time like now. The example takes 12Kclocks on a P4 and 10Kclocks on a P!!! For comparison: 10 DIVs take ~800 clocks
|
|||||||||||
27 Apr 2006, 16:44 |
|
shoorick 28 Apr 2006, 09:47
i just showed a way: you can extend it into any sizes and optimize so far so you need - as well as i divided 32bit/16bit on 8bit cpu (without exception), you can do 128bit/64bit similar way on 32bit cpu and 256bit/128bit on 64bit cpu, etc., etc.
regards! |
|||
28 Apr 2006, 09:47 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.