Message board for the users of flat assembler.
> Windows > Somewhat useful function
Madis731 26 Apr 2004, 17:37
I am planning to write a code, that does enormous calculations with integers a^b.
What would be the best way to go and could it be optimized for power of 2(not very neccessary)?
I was thinking of not using any MMX SSE instructions, but if it tends to be easy,
I'm willing to dig in and start learning.
I was thinking of something like this:
...or take it a step further and try to split that 5:
I think it will take much stack space, but it will be blindingly fast
Have you ever tried that kind of function and what are your suggestions.
Most robust would be 5*5*5*5*5*5*5=78125 hmm, but what about 2^1000
P.S. Please don't give me any code, that I don't understand, because I
wanna BE there, when something happens. I WANT to do it by myself. I
just need some suggestions.
|26 Apr 2004, 17:37||
Tomasz Grysztar 26 Apr 2004, 18:20
The simplest algorithm is to split power by bits, in the same way as you do multiplication by adding values of one number shifted by the counts of bits determined by the position of ones in the second number.
For example, to multiply 5=101b by 7=111b, you do such addition:
5*1=101b (bit 0 of 7 is 1)
+5*2=1010b (bit 1 of 7 is 1)
+5*4=10100b (bit 2 of 7 is 1)
(at each stage you shift the value to add one bit left)
And analogously, to calculate the power 5^7, you do the multiplications:
5^1 (bit 0 of 7)
*5^2 (bit 1 of 7)
*5^4 (bit 2 of 7)
This way you need only to calculate the powers that are themselves powers of two, so each one is calculated just as a x*x, where x is previous power (for instance 5^4=5^2*5^2).
So the algorithm is: start with x equal to the base of power and result equal to 1, and in the loop shift the value of power by one bit left, if the bit you shifted out was 1, multiply result by your current x value, otherwise keep the result as it is, then do x=x*x and repeat loop until the shifted power value is 0.
This algorithm is very simple to implement and doesn't need any large stack space or anything similar.
|26 Apr 2004, 18:20||
Madis731 26 Apr 2004, 23:22
I'm having some problems:
OK, the input should be in DEC and this shouldn't be a problem to convert, but the output should also be in DEC. Converting long strings HEX=>DEC will be rough
The other thing is that I tried making a scheme for overflow handling, but lightly said, I failed
I got stuck there where I found, that MUL doesn't always flow over with just one bit. Sometimes its even 32 bits.
Maybe I could preserve EDX in some temp register ...troubles, troubles
|26 Apr 2004, 23:22||
Tomasz Grysztar 27 Apr 2004, 05:38
What do you mean by "converting long strings HEX=>DEC"? Are we not talking about elementary binary calculations here?
|27 Apr 2004, 05:38||
Madis731 27 Apr 2004, 08:20
I was thinking of something like:
I made my program read user input and start coding on calculations as we speak:)
My primary goal would be integers only, but in the future, maybe floats too...
Oh, and for simplicity just console output for now.
|27 Apr 2004, 08:20||
Tomasz Grysztar 27 Apr 2004, 08:23
Well, to get 2^100 and such you will need to do the calculations on very large (128 bit or more) numbers, that means a bit more work to do multiplications.
|27 Apr 2004, 08:23||
Madis731 27 Apr 2004, 08:29
I mean, there are no fixed lengths because 100^100 is even bigger and I want to output them all, like a 10000 character buffer.
That is why I said LONG strings, because it is hard to convert 100byte hex numbers to strings for output
|27 Apr 2004, 08:29||
Madis731 27 Apr 2004, 08:32
Some progress stalls
edx=0 and eax=*buffer and ebx=*endOfBuf
NextNo:shl edx,1 ;/* mov ecx,edx ; Multiply by 10 shl edx,2 ; x=2Â³x+2x add edx,ecx ;*/ add dl,[eax] ;edx has "0"-"9" sub dl,"0" ;edx has 0-9 add eax,1 ;take the next number sub ebx,1 ;The end is getting near:) jnz NextNo
lea edx,[5*edx] ;that is lea edx,[edx+4*edx] shl edx,1 ;that is add edx,edx ;... here edx is 10times bigger than before ;and no need to waste ecx (speed remains the same on 3Âµop CPUs)
The program only gave junk in console mode(*.com), but from the point
where I moved to exe, it did ok.
Last edited by Madis731 on 10 Aug 2004, 15:40; edited 1 time in total
|27 Apr 2004, 08:32||
Tomasz Grysztar 27 Apr 2004, 09:26
This will work only for numbers that can fit in 32 bits. For larger numbers you need to store all dwords in memory and perform shifts and additions on the whole dword sequences (as you need calculations on numbers with no fixed length, you will need also to store the count of dword for each number somewhere). This is quite easy, it's the multiplication of such numbers that may be your problem.
For floats it would be much easier, but you would loose the precision.
|27 Apr 2004, 09:26||
Madis731 28 Apr 2004, 22:38
Heh I just realized, that I can't calculate it at all, because it would take alot (ALOT) of time to convert them to decimal. The best way to go would be to make a string array and do calculations on them. To conserve space, I would have to pack 8 decimals in 32 bits. The only problem is that I can't abuse arithmetical advantages of digital logic :'(
Isn't there a way to get over this?
If there is an easy algorithm to get that FFFFFFFFFFFFFFFF is 18446744073709551615...
Hehe, I just came to an idea of lookup-tables, but I can't possibly make one for 1000+ byte numbers, can I?
P.S. discovered a nice trick that might help, to get the length of the output, I would do this: A^B=C length(C)=int(log(A)*B+1)
Now I know how much memory should I set aside for calculations.
P.P.S, another problem arises, because I don't know how to take a log() from a number. FPU has got 4 constants, but no log() operation. I'm trying to combine the constants to make my own operation haven't had much success
To all of you wondering hot to get log(that's base 10 logarithm)
I have a solution for you:
fninit ;If you haven't had calc in FP before fldlg2 ;ST0=log10(2) fild dword [dat] ;ST0=X,ST1=log10(2) fyl2x ;ST0=log10(2)*log2(X) fist dword [dat+4] ;Y=ST0
Last edited by Madis731 on 06 Jul 2004, 11:21; edited 1 time in total
|28 Apr 2004, 22:38||
Madis731 10 Aug 2004, 12:10
Available @ http://www.itcollege.ee/~mkalme/ => click "Progemine" and then "Power" to download.
NB! in 7-zip format
...and if I'm lucky, should soon be in the examples section also.
P.S. Privalov, you were right about the limit: the code only handled 1byte long
integers meaning 255^255 was the biggest input, but I got over that and now
it can handle upto 4294967295^4294967295 *theoretically* but another limit
is 9char buffer so 99^999999 is about the largest number you can input.
Caution!!! 9542425bytes long (9.4MB) can take hours to calculate even on high
|10 Aug 2004, 12:10||
Imagist 10 Aug 2004, 19:26
Privalov, I'm not sure the algo you described is any different from the brute force algo. 5^7 can be calculated as 5*5*5*5*5*5*5 just as quickly. Your algo simply does 5*(5*5)*(5*5*5*5). Is there any difference that I'm not seeing?
|10 Aug 2004, 19:26||
Madis731 10 Aug 2004, 19:43
Yes, a big difference:
when doing 5*5*5*5*5*5*5 you have 6 multiplies, but when using some
neat tricks you can have as less as 4 multiplies!
Here we go:
5 is our input
let's square it: 5*5=25 <= the first
now let's do the first part: 5*(5*5)=25 <=the second
now we don't have to go through this again, do we?
5*5*5*5=25*25=625 <=the third
and finally, all together 625*25=...=78125 voila!
When 6/4=1.5X improvement isn't enough, try 2^32
2*2*...*2 - that is 31 multiplies comparing to 5 multiplies with that
ingenious trick :$ 31/5=6.2X improvement WOW!!!
|10 Aug 2004, 19:43||
Imagist 10 Aug 2004, 20:13
Here's the steps as I see it:
3. 5*5=25 (you already did this, but you're doing it again because your computer doesn't know you did this, and if you tell it that it will have to do two things: be told that it knows it (through a condition test) and go find it. One thing is better than two, no?)
4. 5*5=25 (same thing)
So while the human mind, while thinking of the thing it just did, doesn't need to recalculate 5*5, the computer does, or must go off and get it, which takes longer.
I see what you are trying to do, using the same ideas as binary searching and sorting, but it doesn't work here because of the nature of the operations being performed.
You may argue that this is only the case when using small numbers like 5 and 7, but this is not the case. That algorithm only ever matches the speed of the brute force algo when the exponent is a power of two: the farther away from being a power of two the exponent is, the worse the performance. 7 is only one away from 8 (a power of two), but numbers like 96, which are far away from the nearest powers of two would be horrible to calculate in cases like 5^96.
|10 Aug 2004, 20:13||
Madis731 10 Aug 2004, 20:47
;We'd like to calculate 5^96 ;OK, which one is faster? ;1) data dd 5 mov eax,1 ;Multiplying w/ 0 would be pointless mov ecx,95 Recalc: mul [data] dec ecx jnc ;The result in eax ;OR ;2) data dd 5 mov eax,[data] mul eax ;25 mul eax ;625 mul eax ;390625 mul eax ;152587890625 mul eax ;23283064365386962890625 mov ebx,eax ;-"- mul eax ;54210108624275221700372640043497\ ;0855712890625 mul ebx ;12621774483536188886587657044524\ ;57967477130296174436807632446290\ ;625 ;The result in eax ;This is a VERY bad example, because 5^96 is 7dwords ;in length and you get only the lowest dword, but ;it is just to show you that computer does NOT have ;to forget everything, computers DO have memory!
|10 Aug 2004, 20:47||
Imagist 11 Aug 2004, 00:26
The second, of course. But that is only because the parts which make the macro adaptable to any input (not just 5 and 96) have been taken out.
|11 Aug 2004, 00:26||
Madis731 11 Aug 2004, 06:10
Well, if you still don't believe me, try out my code. Change some lines to
do the multiplying the hard way. I've implemented the same technique
WIHTOUT any macros and it's working fine. 167^765 is calculated in only
a second. The interesting part is that the binary value is calculated within
under 1% of programs time, the conversion HEX=>BCD=>DEC takes longer.
|11 Aug 2004, 06:10||
Imagist 11 Aug 2004, 16:46
That's exactly my point: creating a macro that adapts to any input makes the second method slower. If you make the macro non-adaptive, or take it out of a macro entirely, then you are right, the second macro is faster. But that code would also defeat the purpose of the macro, because it can only be used for one specific situation.
|11 Aug 2004, 16:46||
< Last Thread | Next Thread >
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.