flat assembler
Message board for the users of flat assembler.

Code optimization (AKA C vs. Asm)
bitRAKE 03 Jan 2010, 17:27
tthsqe wrote:
How did you get that I have two cores? Only one 64-bit core trapped in a 32-bit OS....
Anyways, the result for three runs of the program was 1.83, 2.08, 1.84 GB/s.
I was confused by the CPU-Z output.

The binary powering method only needs 20 iterations for 1024^2, whereas the add method needs over a million iterations (O(log_2 N) versus O(N)). Because the larger numbers cannot fit in cache, the goal should be to reduce the number of passes over the data. For simple single-pass operations like ADD/SUB and shifts, the speed drops to memory bus speed.

Conversely, if an algorithm could operate on cacheable representations without producing large intermediary non-cacheable numbers, then about ten times more passes could be made over the cacheable fragments. The binary powering method is almost such an algorithm, as the number of bits doubles each iteration -- only the last couple of iterations are likely uncacheable.
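For reference, here is a minimal Python sketch (mine, not code from this thread) of binary powering in its fast-doubling form; Python's built-in bignums stand in for the big-number assembly routines under discussion, so it illustrates the iteration count rather than the memory behaviour:
Code:
def fib_pair(n):
    # Return (F(n), F(n+1)) by fast doubling: one step per bit of n,
    # so only O(log2 N) iterations versus O(N) for repeated addition.
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1)      # (F(k), F(k+1)) for k = n // 2
    c = a * (2 * b - a)          # F(2k)   = F(k) * (2*F(k+1) - F(k))
    d = a * a + b * b            # F(2k+1) = F(k)^2 + F(k+1)^2
    if n & 1:
        return (d, c + d)        # odd n:  (F(2k+1), F(2k+2))
    return (c, d)                # even n: (F(2k), F(2k+1))

# F(2^20) in 20 doubling steps; the result is roughly 728,000 bits wide.
print(fib_pair(1024 ** 2)[0].bit_length())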

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
tthsqe 04 Jan 2010, 02:48
Do you think there is a better way of computing f(2^n) than the a b c thing posted before? Might Mathematica use exactly the same algorithm, with the manual code being slower because of the memory allocation/deallocation and possible type checking on a, b, and c?
Alphonso 04 Jan 2010, 10:18
bitRAKE wrote:
The binary powering method only needs 20 iterations for 1024^2, whereas the add method needs over a million iterations (O(log_2 N) versus O(N)).
Wow, you make it sound trivial. What about the iterations (is that the right word?) of all the multiplications you'll have to do? I'm really interested to see what you come up with. Smile

Well, I've got to thank you guys - I actually found out what fibo' numbers are Laughing and actually got to do some thinking. Anyway, I discovered this algo while playing with the numbers, and for now I'm going to call it the 'Alphonso algorithm', though I would be somewhat surprised if it doesn't already exist.

For calculating fib(2^n)
Code:
The 'Alphonso series', where x is an integer > 0 and alph(1) = 3:

alph(x+1) = alph(x)^2 - 2

Therefore

alph(1) = 3
alph(2) = 3^2 - 2       = 7
alph(3) = 7^2 - 2       = 47
alph(4) = 47^2 - 2      = 2207
alph(5) = 2207^2 - 2    = 4870847
alph(6) = 4870847^2 - 2 = 23725150497407
etc...


Where x is an integer > 0, and knowing fib(2) = 1, then fib(2^(x+1)) = fib(2^x) * alph(x)

so when 
x=1 then fib(2^(1 + 1)) = fib(2^1) * alph(1) then fib(4) = fib(2) * alph(1) then fib(4) = 1 * 3 = 3   
x=2 then fib(2^(2 + 1)) = fib(2^2) * alph(2) then fib(8) = fib(4) * alph(2) then fib(8) = 3 * 7 = 21
x=3 then fib(2^(3 + 1)) = fib(2^3) * alph(3) then fib(16) = fib(8) * alph(3) then fib(16) = 21 * 47 = 987
x=4 then fib(2^(4 + 1)) = fib(2^4) * alph(4) then fib(32) = fib(16) * alph(4) then fib(32) = 987 * 2207 = 2178309
etc...    
Of course it's still going to need some of those big multiplies that hopefully bitRAKE will shed light on. Very Happy Wonder if I should have used semicolons instead of all those then's, oh well Rolling Eyes
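A quick runnable rendering of the series (my sketch, in Python rather than FASM, leaning on Python's arbitrary-precision integers):
Code:
def fib_pow2(x):
    # F(2^x) for x >= 1 via the 'Alphonso series' (Lucas numbers L(2^x)):
    #   alph(1) = 3,  alph(x+1) = alph(x)^2 - 2
    #   fib(2^(x+1)) = fib(2^x) * alph(x),  starting from fib(2) = 1
    f, alph = 1, 3
    for _ in range(x - 1):
        f *= alph                  # double the index of f
        alph = alph * alph - 2     # advance the series one step
    return f

assert [fib_pow2(x) for x in range(1, 6)] == [1, 3, 21, 987, 2178309]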
bitRAKE 04 Jan 2010, 15:34
tthsqe wrote:
Do you think there is a better way of computing f(2^n) than the a b c thing posted before? Might Mathematica use exactly the same algorithm, with the manual code being slower because of the memory allocation/deallocation and possible type checking on a, b, and c?
At the very least there might be a way to ensure that only the final destination is not in cache, by computing fib(4n-3), fib(4n-2), fib(4n-1), fib(4n) - or by leveraging some such series of smaller numbers, like the one Alphonso has pointed out.

I have no intention of creating some new multiply code - look at GMP (and others) development for that. The iterations consist of three squares and three additions. I don't know: how many additions equal a square on your hardware?

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
r22 04 Jan 2010, 19:45
Squaring a power of 2 is just one shift operation. On a "big number" (depending on the implementation) it would be one mem-copy/reallocation or one update of the exponent.

Squaring a NON power of 2 efficiently is likely done with clever tricks based around the ease of squaring the closest power of 2 and polynomial expansion.
Code:
SQUARE( N ) =
X = ClosestPowerOfTwo( N )
A = N - X
RETURN X^2 + 2 * X + A    (valid when A = 1: then 2*X*A = 2*X and A^2 = A)

SQUARE( 513 )
X = 2 ^ MostSignificantBit = 512
A = 513 - 512 = 1
RETURN (X + A)^2 = (512 + 1)^2 = 512^2 + 2 * 512 + 1
    

To conclude, binary squaring takes...
1 search and lookup for the BSR and closest power of 2
1 shift/mem-move/realloc for X^2
3 adds for + X + X + A

*** EDIT *** lest I be crucified, this is of course the BEST CASE scenario, where N is a power of 2 plus 1.
For the worst case you would have to recursively call the algorithm for A, and the full cross term 2*X*A appears:

Code:
SQUARE( N ) =
X = ClosestPowerOfTwo( N )
A = N - X
IF A = 1
   RETURN X^2 + 2 * X + A
ELSE
   RETURN X^2 + 2 * X * A + SQUARE( A )    (2*X*A is still just a shift of A)
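A runnable Python version of the corrected recursion (my sketch, not r22's code); the recursion depth equals the number of set bits in N:
Code:
def square(n):
    # Split n = x + a where x = 2^msb, so that
    # (x + a)^2 = x^2 + 2*x*a + a^2: the first two terms are shifts,
    # and only the a^2 term recurses.
    if n <= 1:
        return n                     # 0^2 = 0, 1^2 = 1
    msb = n.bit_length() - 1
    a = n - (1 << msb)               # a = n - x
    x_sq = 1 << (2 * msb)            # x^2 is a pure shift
    cross = a << (msb + 1)           # 2*x*a is another shift
    return x_sq + cross + square(a)  # recurse only on a^2

assert all(square(n) == n * n for n in range(2000))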
    
tthsqe 05 Jan 2010, 09:38
Alphonso, great idea with the Lucas numbers (your 'Alphonso series')...
with them we can beat Mathematica! Laughing


The following consistently beats the built-in function by 5-10% on most inputs:
[Edited]
Code:
f[n_Integer] :=  
  Block[{f = 1, l = 1, d = IntegerDigits[n, 2]},  
    If[ 
    Length[d] > 1, (* if n has more than one bit *) 
      (* then *) 
          MapThread[ 
             If[#1 === 0,       (* #1 comes from the first list, #2 from the second *) 
               {f, l} = {f*l, l^2 - 2*#2};,                     (* #1 is a zero *) 
               {f, l} = {l*(f + l)/2 - #2, l*(5*f + l)/2 - #2}; (* else #1 is a one *) 
             ] &,  
          {d[[2 ;; -2]], (-1)^d[[1 ;; -3]]}    (* two lists of bits to thread over *) 
          ]; 
          If[d[[-1]] === 0,          
             If[n < 0,          (* if lsb is 0 return one of these *)
                -f*l,
                 f*l
             ]
          ,    
          l*(f+l)/2-(-1)^d[[-2]]  (* else lsb is 1 and return this *)  
          ] 
    , (* else *) 
        n^2     (* else return n^2 *) 
    ] 
  ];    


The performance evens out as n contains more and more ones or gets larger in general.

ex:
f(2^26): 5.1 vs Mathematica's timing of 5.6
f(1000^3): 100 vs what was below
f(1024^3): 105 " "

Note that it is still not optimal on inputs like 3*n, where the triplication formula can be used to get even better results.


Last edited by tthsqe on 05 Jan 2010, 20:02; edited 3 times in total
Alphonso 05 Jan 2010, 12:14
Thanks tthsqe, I didn't think it would be anything new, but it was fun discovering the sequence. Smile

BitRAKE, I didn't mean to suggest you come up with a new method; I was just interested in seeing a 'generic' FASM solution for huge numbers. Sorry if I led you to believe otherwise, and no problem if there isn't any FASM code to be had.
r22 05 Jan 2010, 16:18
I'm interested in the algorithm that tthsqe posted, but I find that math-program syntax fairly ambiguous. Is there a link to a syntax reference so I can properly interpret what the hell it's doing?

The liberal use of square brackets, commas, and semicolons in the syntax, and the lack of comments, is probably what's throwing me off. Perhaps this is one way math/science types ensure job security Very Happy

In any case I would like to be enlightened.
tthsqe 05 Jan 2010, 18:15
r22,
It is an iteration of the equations:

f(2n)   = f(n) * l(n)
l(2n)   = l(n)^2 - 2 * (-1)^n

f(2n+1) = l(n) * (f(n) + l(n)) / 2 - (-1)^n
l(2n+1) = l(n) * (5 * f(n) + l(n)) / 2 - (-1)^n

Basically, just scan the binary digits msb to lsb and apply the appropriate equations. Of course, at the last step we don't need to compute l.
A nice aspect is that it eliminates the need for most of those mundane while and for loops, but I agree that the bracketed syntax, though it may unify the whole language, is not always the most readable. Check out the documentation at http://reference.wolfram.com/mathematica/guide/Mathematica.html.
Note:
* f(-1), f(0) and f(1) need special cases, and f(-n) needs a factor of (-1)^(n-1)
* a[[b]] is the Part function
* MapThread really should be ScanThread, if such a function existed
Edit: previous post has been updated
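For those allergic to the bracketed syntax, here is a literal Python transcription of those four rules (my sketch, not the Mathematica original); the parity of the running index k supplies the (-1)^n terms:
Code:
def fib(n):
    # F(n) for n >= 1: scan the bits of n from msb to lsb, carrying
    # the pair (f, l) = (F(k), L(k)) for the prefix k read so far.
    f, l, k = 1, 1, 1                # F(1) = L(1) = 1; the msb gives k = 1
    for bit in bin(n)[3:]:           # remaining bits after the msb
        sign = -1 if k & 1 else 1    # (-1)^k
        if bit == '0':               # k -> 2k
            f, l, k = f * l, l * l - 2 * sign, 2 * k
        else:                        # k -> 2k+1 (both divisions are exact)
            f, l, k = (l * (f + l)) // 2 - sign, \
                      (l * (5 * f + l)) // 2 - sign, 2 * k + 1
    return f

assert [fib(n) for n in range(1, 11)] == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]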


Last edited by tthsqe on 05 Jan 2010, 18:52; edited 1 time in total
bitRAKE 05 Jan 2010, 18:17
tthsqe has been using Mathematica long enough to learn the shortcuts - there is a more verbose syntax which executes just as fast (*hint*). The inner If statement is a pure function which performs the first operation if a bit is zero, and the second if it is one. More obviously, {f, l} = {a, b} is shorthand for f=a; l=b. The #n notation within a pure function refers to its arguments. Double brackets access elements of arrays. Much of the syntax is C-like.
r22 05 Jan 2010, 19:16
It's much clearer now; I am sufficiently enlightened.
tthsqe 05 Jan 2010, 19:20
Quote:
there is a more verbose syntax which executes just as fast (*hint*).

What are you hinting at here?
tthsqe 05 Jan 2010, 19:23
Ah, got it. Laughing Laughing