flat assembler
Message board for the users of flat assembler.

Index > Main > Fastest Next Power of Two function

Author
Thread Post new topic Reply to topic
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 20 Jun 2006, 01:06
The fastest I've come up with is this:

Code:
xor ecx,ecx
dec eax ; so that the next power of two of a power of two is itself
top:
inc ecx
shr eax,1
cmp eax,0
jnz top
inc eax
shl eax,cl    


input into eax
output from eax

Let's see if anyone can come up with anything else. bsr seems to use lots more clock cycles than this function, btw.
Post 20 Jun 2006, 01:06
View user's profile Send private message Visit poster's website Reply with quote
Kain



Joined: 26 Oct 2003
Posts: 108
Kain 20 Jun 2006, 02:32
shr sets zero flag so "cmp eax,0" is unnecessary.
you might also try using add instead of inc. add is faster on many modern processors.

anyway, here is mine, don't know if it's faster.

Code:
        sub eax, 1
pow:
        mov     ecx, eax
        add     eax, 1
        and     ecx, eax
        jnz     pow
    
Post 20 Jun 2006, 02:32
View user's profile Send private message Reply with quote
Ivan2k2



Joined: 08 Sep 2004
Posts: 80
Location: Russia, Angarsk
Ivan2k2 20 Jun 2006, 03:47
2 Kain: it's much slower for big values
Post 20 Jun 2006, 03:47
View user's profile Send private message ICQ Number Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 20 Jun 2006, 04:09
Nice idea anyway, and if you replace sub/add with dec/inc you get a really small function, fits in an MMX register!! Very Happy
Post 20 Jun 2006, 04:09
View user's profile Send private message Reply with quote
Ivan2k2



Joined: 08 Sep 2004
Posts: 80
Location: Russia, Angarsk
Ivan2k2 20 Jun 2006, 04:35
and mine:
Code:
dec eax 
mov edx,eax 
shr edx,01h 
or  edx,eax 
mov eax,edx 
shr eax,02h 
or  eax,edx 
mov edx,eax 
shr edx,04h 
or  edx,eax 
mov eax,edx 
shr eax,08h 
or  eax,edx 
mov edx,eax 
shr edx,10h 
or  edx,eax
inc edx
mov eax,edx
    
Post 20 Jun 2006, 04:35
View user's profile Send private message ICQ Number Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 21 Jun 2006, 04:00
Code:
;;in eax;;out eax
nextpow2:
bsr ecx,eax
mov eax,1
jz .end
add ecx,1 ;; could replace with LUT mov eax,dword[LUT+ecx*4]
shl eax,cl ;; .data LUT dd 1,2,4,8,16,32,64,128, ... 2147483648
.end:
ret 0
    


ON modern processors BSR is pretty fast 11clocks the only time when the above would be slower is when eax contains 0, 1, 2 ,3 or 4 but for all other numbers up to ~ 2billion the above is fastest. The reason it's wouldn't be fastest for the small numbers is because the looping methods would only do 1 or 2 iterations so they would be slightly faster.
Post 21 Jun 2006, 04:00
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 21 Jun 2006, 23:20
r22 wrote:
ON modern processors BSR is pretty fast 11clocks


Where can I find information on numbers of clock cycles? The only stuff I can find is for 486 clock cycles, and I believe x86 CPUs have progressed a little since then. Wink
Post 21 Jun 2006, 23:20
View user's profile Send private message Visit poster's website Reply with quote
Ivan2k2



Joined: 08 Sep 2004
Posts: 80
Location: Russia, Angarsk
Ivan2k2 21 Jun 2006, 23:25
Post 21 Jun 2006, 23:25
View user's profile Send private message ICQ Number Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 22 Jun 2006, 20:54
Thanks, I've tried that before. I assume I'm doing something wrong though. Here's my code:

Code:
#include <stdio.h>
#include "asmlib.h"

inline unsigned int NextPow2(unsigned int value) {
        unsigned int x;
        asm("xorl %%ecx, %%ecx\n"
                "dec %0\n"
                "1:\n"
                "inc %%ecx\n"
                "shr $1, %0\n"
                "jnz 1b\n"
                "inc %0\n"
                "shl %%cl, %0\n"
                : "=r"(x)
                : "0"(value)
                : "ecx"
        );
        return x;
}

inline unsigned int NextPow2_BSR(unsigned int value) {   
        unsigned int x;
        asm("dec %1\n\t"
                "movl $2,%0\n\t"
                "bsr %1,%1\n\t"
                "shl %b1,%0\n\t" : "=r" (x) : "c" (value)
        );
        return x;
}

int main() {
        unsigned int x = 0;
        unsigned int y = 0;
        int first, last;
        for (x = 0; x < 32770; x++) {
                first = ReadClock();
                y = NextPow2_BSR(x);
                last = ReadClock();
                printf("%u -> %u - %d clocks\n", x, y, last - first);
        }
        return 0;
}
    


When calling NextPow2 it takes 116 clocks for everything but the first interation, while calling NextPow2_BSR is 128 clocks. Is this correct?
Post 22 Jun 2006, 20:54
View user's profile Send private message Visit poster's website Reply with quote
Ivan2k2



Joined: 08 Sep 2004
Posts: 80
Location: Russia, Angarsk
Ivan2k2 22 Jun 2006, 22:38
What CPU do you have ???
Post 22 Jun 2006, 22:38
View user's profile Send private message ICQ Number Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 22 Jun 2006, 22:43
Sempron 64 2800+.
Post 22 Jun 2006, 22:43
View user's profile Send private message Visit poster's website Reply with quote
Ivan2k2



Joined: 08 Sep 2004
Posts: 80
Location: Russia, Angarsk
Ivan2k2 23 Jun 2006, 05:14
i thought that you have intel, AMD have slightly different optimization technics and different clocks
Post 23 Jun 2006, 05:14
View user's profile Send private message ICQ Number Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 23 Jun 2006, 22:23
So the function utilizing bsr posted in my 3rd post is definitely faster than the other?
Post 23 Jun 2006, 22:23
View user's profile Send private message Visit poster's website Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 01 Dec 2007, 17:15
Code:
; FASM syntax
; Input - eax
; Output - eax
nextpow2:
     bsr ecx,eax
 mov eax,2
   jz .end
     shl eax,cl
  .end:
   ret    


Just change 'mov eax,2' to 'mov eax,1' if you want the previous power of two.

Can anyone do any better?

How about some SIMD versions?
Post 01 Dec 2007, 17:15
View user's profile Send private message Visit poster's website Reply with quote
levicki



Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
levicki 22 Dec 2007, 15:18
Assuming that you are working with unsigned int how does that code handle input of 0x80000001? What does it return?
Post 22 Dec 2007, 15:18
View user's profile Send private message Visit poster's website MSN Messenger Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 22 Dec 2007, 18:04
$0 with "mov eax, 2" and $80000000 with "mov eax, 1", perfectly valid considering that unsigned numbers are always positive or zero, and obey the laws of arithmetic modulo 2^n, where n is the number of bits in the type (n=32 in this case).
Post 22 Dec 2007, 18:04
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20449
Location: In your JS exploiting you and your system
revolution 22 Dec 2007, 21:13
The code with mov eax,2 generates the wrong answer with an input of 0.
Post 22 Dec 2007, 21:13
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.