flat assembler
Message board for the users of flat assembler.

Index > Windows > Somewhat useful function

Author
Thread Post new topic Reply to topic
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 26 Apr 2004, 17:37
I am planning to write a code, that does enormous calculations with integers a^b.
What would be the best way to go and could it be optimized for power of 2(not very neccessary)?
I was thinking of not using any MMX SSE instructions, but if it tends to be easy,
I'm willing to dig in and start learning.
I was thinking of something like this:
/*
c=a^b
a=5:b=7
b=2*2+3
c=(a²)²*a³
c=25²*125=78125
*/
...or take it a step further and try to split that 5:
/*
a=2*2+1:b=2*2+2+1
c=((2*2+1)²)²*(2*2+1)²*(2*2+1)
c=(2<<2+2<<2+1²)²*...
*/
I think it will take much stack space, but it will be Shocked blindingly fast
Have you ever tried that kind of function and what are your suggestions.
Most robust would be 5*5*5*5*5*5*5=78125 hmm, but what about 2^1000 Sad

P.S. Please don't give me any code, that I don't understand, because I
wanna BE there, when something happens. I WANT to do it by myself. I
just need some suggestions.
Post 26 Apr 2004, 17:37
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 26 Apr 2004, 18:20
The simplest algorithm is to split power by bits, in the same way as you do multiplication by adding values of one number shifted by the counts of bits determined by the position of ones in the second number.

For example, to multiply 5=101b by 7=111b, you do such addition:
5*1=101b (bit 0 of 7 is 1)
+5*2=1010b (bit 1 of 7 is 1)
+5*4=10100b (bit 2 of 7 is 1)
(at each stage you shift the value to add one bit left)

And analogously, to calculate the power 5^7, you do the multiplications:
5^1 (bit 0 of 7)
*5^2 (bit 1 of 7)
*5^4 (bit 2 of 7)

This way you need only to calculate the powers that are themselves powers of two, so each one is calculated just as a x*x, where x is previous power (for instance 5^4=5^2*5^2).

So the algorithm is: start with x equal to the base of power and result equal to 1, and in the loop shift the value of power by one bit left, if the bit you shifted out was 1, multiply result by your current x value, otherwise keep the result as it is, then do x=x*x and repeat loop until the shifted power value is 0.

This algorithm is very simple to implement and doesn't need any large stack space or anything similar.
Post 26 Apr 2004, 18:20
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 26 Apr 2004, 23:22
I'm having some problems:
OK, the input should be in DEC and this shouldn't be a problem to convert, but the output should also be in DEC. Converting long strings HEX=>DEC will be rough Razz

The other thing is that I tried making a scheme for overflow handling, but lightly said, I failed Sad
I got stuck there where I found, that MUL doesn't always flow over with just one bit. Sometimes its even 32 bits.
Maybe I could preserve EDX in some temp register Confused ...troubles, troubles Sad
Post 26 Apr 2004, 23:22
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 27 Apr 2004, 05:38
What do you mean by "converting long strings HEX=>DEC"? Are we not talking about elementary binary calculations here?
Post 27 Apr 2004, 05:38
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Apr 2004, 08:20
I was thinking of something like:
2^100=1267650600228229401496703205376 Razz
I made my program read user input and start coding on calculations as we speak:)
My primary goal would be integers only, but in the future, maybe floats too...

Oh, and for simplicity just console output for now.
Post 27 Apr 2004, 08:20
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 27 Apr 2004, 08:23
Well, to get 2^100 and such you will need to do the calculations on very large (128 bit or more) numbers, that means a bit more work to do multiplications.
Post 27 Apr 2004, 08:23
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Apr 2004, 08:29
I mean, there are no fixed lengths because 100^100 is even bigger and I want to output them all, like a 10000 character buffer.
That is why I said LONG strings, because it is hard to convert 100byte hex numbers to strings for output
Post 27 Apr 2004, 08:29
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Apr 2004, 08:32
Some progress stalls Sad

edx=0 and eax=*buffer and ebx=*endOfBuf
Code:
NextNo:shl  edx,1      ;/*
       mov  ecx,edx    ;    Multiply by 10
       shl  edx,2      ;       x=2³x+2x
       add  edx,ecx    ;*/
       add  dl,[eax]   ;edx has "0"-"9"
       sub  dl,"0"     ;edx has 0-9
       add  eax,1      ;take the next number
       sub  ebx,1      ;The end is getting near:)
jnz  NextNo 
    

Update:
Code:
lea edx,[5*edx]     ;that is lea edx,[edx+4*edx]
shl edx,1           ;that is add edx,edx
;... here edx is 10times bigger than before
;and no need to waste ecx (speed remains the same on 3µop CPUs)
    


The program only gave junk in console mode(*.com), but from the point
where I moved to exe, it did ok.


Last edited by Madis731 on 10 Aug 2004, 15:40; edited 1 time in total
Post 27 Apr 2004, 08:32
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 27 Apr 2004, 09:26
This will work only for numbers that can fit in 32 bits. For larger numbers you need to store all dwords in memory and perform shifts and additions on the whole dword sequences (as you need calculations on numbers with no fixed length, you will need also to store the count of dword for each number somewhere). This is quite easy, it's the multiplication of such numbers that may be your problem.

For floats it would be much easier, but you would loose the precision.
Post 27 Apr 2004, 09:26
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 28 Apr 2004, 22:38
Heh I just realized, that I can't calculate it at all, because it would take alot (ALOT) of time to convert them to decimal. The best way to go would be to make a string array and do calculations on them. To conserve space, I would have to pack 8 decimals in 32 bits. The only problem is that I can't abuse arithmetical advantages of digital logic :'(

Isn't there a way to get over this?
If there is an easy algorithm to get that FFFFFFFFFFFFFFFF is 18446744073709551615...

Hehe, I just came to an idea of lookup-tables, but I can't possibly make one for 1000+ byte numbers, can I?

P.S. Rolling Eyes discovered a nice trick that might help, to get the length of the output, I would do this: A^B=C length(C)=int(log(A)*B+1)
Now I know how much memory should I set aside for calculations.

P.P.S, another problem arises, because I don't know how to take a log() from a number. FPU has got 4 constants, but no log() operation. I'm trying to combine the constants to make my own operation Arrow haven't had much success Crying or Very sad

Math tricks:
To all of you wondering hot to get log(that's base 10 logarithm)
I have a solution for you:

Code:
fninit  ;If you haven't had calc in FP before
fldlg2                  ;ST0=log10(2)
fild    dword [dat]     ;ST0=X,ST1=log10(2)
fyl2x                   ;ST0=log10(2)*log2(X)
fist    dword [dat+4]   ;Y=ST0    


Last edited by Madis731 on 06 Jul 2004, 11:21; edited 1 time in total
Post 28 Apr 2004, 22:38
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Aug 2004, 12:10
It's done:
Available @ http://www.itcollege.ee/~mkalme/ => click "Progemine" and then "Power" to download.
NB! in 7-zip format

...and if I'm lucky, should soon be in the examples section also.

P.S. Privalov, you were right about the limit: the code only handled 1byte long
integers meaning 255^255 was the biggest input, but I got over that and now
it can handle upto 4294967295^4294967295 *theoretically* but another limit
is 9char buffer so 99^999999 is about the largest number you can input.
Caution!!! 9542425bytes long (9.4MB) can take hours to calculate even on high
end computers Sad
Post 10 Aug 2004, 12:10
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Imagist



Joined: 13 Jun 2004
Posts: 114
Location: Pennsylvania (USA)
Imagist 10 Aug 2004, 19:26
Privalov, I'm not sure the algo you described is any different from the brute force algo. 5^7 can be calculated as 5*5*5*5*5*5*5 just as quickly. Your algo simply does 5*(5*5)*(5*5*5*5). Is there any difference that I'm not seeing?
Post 10 Aug 2004, 19:26
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Aug 2004, 19:43
Yes, a big difference:
when doing 5*5*5*5*5*5*5 you have 6 multiplies, but when using some
neat tricks you can have as less as 4 multiplies!
Here we go:
5 is our input
let's square it: 5*5=25 <= the first
now let's do the first part: 5*(5*5)=25 <=the second
now we don't have to go through this again, do we?
5*5*5*5=25*25=625 <=the third
and finally, all together 625*25=...=78125 voila!

When 6/4=1.5X improvement isn't enough, try 2^32
2*2*...*2 - that is 31 multiplies comparing to 5 multiplies with that
ingenious trick :$ 31/5=6.2X improvement WOW!!! Smile
Post 10 Aug 2004, 19:43
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Imagist



Joined: 13 Jun 2004
Posts: 114
Location: Pennsylvania (USA)
Imagist 10 Aug 2004, 20:13
Here's the steps as I see it:

First algo:
1. 5*5=25
2. 25*5=125
3. 125*5=625
4. 625*5=3125
5. 3125*5=15625
6. 15625*5=71825

Second algo:
1. 5=5

2. 5*5=25

3. 5*5=25 (you already did this, but you're doing it again because your computer doesn't know you did this, and if you tell it that it will have to do two things: be told that it knows it (through a condition test) and go find it. One thing is better than two, no?)
4. 5*5=25 (same thing)
5. 25*25=625

6. 5*25=125
7. 125*625=71825

So while the human mind, while thinking of the thing it just did, doesn't need to recalculate 5*5, the computer does, or must go off and get it, which takes longer.

I see what you are trying to do, using the same ideas as binary searching and sorting, but it doesn't work here because of the nature of the operations being performed.

You may argue that this is only the case when using small numbers like 5 and 7, but this is not the case. That algorithm only ever matches the speed of the brute force algo when the exponent is a power of two: the farther away from being a power of two the exponent is, the worse the performance. 7 is only one away from 8 (a power of two), but numbers like 96, which are far away from the nearest powers of two would be horrible to calculate in cases like 5^96.
Post 10 Aug 2004, 20:13
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Aug 2004, 20:47
Code:
;We'd like to calculate 5^96
;OK, which one is faster?
;1)
data  dd 5
mov   eax,1 ;Multiplying w/ 0 would be pointless
mov   ecx,95
Recalc:
    mul   [data]
    dec   ecx
    jnc
;The result in eax

;OR
;2)
data  dd 5
mov   eax,[data]
mul   eax        ;25
mul   eax        ;625
mul   eax        ;390625
mul   eax        ;152587890625
mul   eax        ;23283064365386962890625
mov   ebx,eax    ;-"-
mul   eax        ;54210108624275221700372640043497\
                 ;0855712890625
mul   ebx        ;12621774483536188886587657044524\
                 ;57967477130296174436807632446290\
                 ;625
;The result in eax

;This is a VERY bad example, because 5^96 is 7dwords
;in length and you get only the lowest dword, but
;it is just to show you that computer does NOT have
;to forget everything, computers DO have memory!


    
Post 10 Aug 2004, 20:47
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Imagist



Joined: 13 Jun 2004
Posts: 114
Location: Pennsylvania (USA)
Imagist 11 Aug 2004, 00:26
The second, of course. But that is only because the parts which make the macro adaptable to any input (not just 5 and 96) have been taken out.
Post 11 Aug 2004, 00:26
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 11 Aug 2004, 06:10
Well, if you still don't believe me, try out my code. Change some lines to
do the multiplying the hard way. I've implemented the same technique
WIHTOUT any macros and it's working fine. 167^765 is calculated in only
a second. The interesting part is that the binary value is calculated within
under 1% of programs time, the conversion HEX=>BCD=>DEC takes longer.
Post 11 Aug 2004, 06:10
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Imagist



Joined: 13 Jun 2004
Posts: 114
Location: Pennsylvania (USA)
Imagist 11 Aug 2004, 16:46
That's exactly my point: creating a macro that adapts to any input makes the second method slower. If you make the macro non-adaptive, or take it out of a macro entirely, then you are right, the second macro is faster. But that code would also defeat the purpose of the macro, because it can only be used for one specific situation.
Post 11 Aug 2004, 16:46
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.