flat assembler
Message board for the users of flat assembler.

 Index > Main > counting bits
Author
sloppy

Joined: 10 Nov 2007
Posts: 6
sloppy 08 Dec 2007, 19:43
Hi,

I would like to know if there is an instruction to count how many "ones" (or zeros) there are in a given number (in binary).

Example:
7 ->111 -> 3 ones
5 ->101-> 2 ones

At the moment I do it this way:
Code:
```mov eax,15 ; 15 is a given number
xor edi,edi
xor ebx,ebx
irp count,1,2,4,8,16,32,64,128
{
local nextstep
test eax, count
jz nextstep
inc ebx
nextstep:
}
; now ebx is what I want
```

The above code is only for 8 bits numbers, but if there is an instruction for me also 32bit or 128 bits numbers are ok and I know before how many bits I have. I use a pentium4 and I have no problem of portability.
If there is not anything can you give me some more ideas to solve the problem faster? I am still learning so please be patient.

Thank,
Luca
08 Dec 2007, 19:43
LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 08 Dec 2007, 20:04
There is one, popcnt, but will be available on SSE4 (not publicly available chips exist yet). There are some bit tricks that I can't remember know (and I also lost the link ), but here another method:
Code:
```; EAX = number that its bits will be counted

xor ecx, ecx
mov cl, 32

@@:
rol eax, 1
dec cl
jnz @b

; At this point CH has the number of ones
```
08 Dec 2007, 20:04
sloppy

Joined: 10 Nov 2007
Posts: 6
sloppy 08 Dec 2007, 20:18
Thanks,

I will search more.

as I am still learning... how can I usually know which of two method is faster? Is a *big* loop with "time" command the only way?
08 Dec 2007, 20:18
bitRAKE

Joined: 21 Jul 2003
Posts: 3892
Location: vpcmipstrm
bitRAKE 08 Dec 2007, 21:08
http://graphics.stanford.edu/~seander/bithacks.html

I've converted all those algorithms to x86. AMD optimization manual has a fast implementation for ALU 32-bit population count. It can be extended easily to MMX/SSE - a cycle per byte is possible, iirc.
Code:
```     xor ecx,ecx
.1:  lea edx,[eax-1]
inc ecx
and eax,edx
jne .1    ```
...only loops for number of bits set.
08 Dec 2007, 21:08
LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 08 Dec 2007, 21:59
Hey that was the link (And I think I saw another one too).

AMD has this implementation in the manual that bitRAKE says:
Code:
```unsigned int popcount(unsigned int v)
{
unsigned int retVal;
__asm {
mov eax, [v] ; v
mov edx, eax ; v
shr eax, 1 ; v >> 1
and eax, 055555555h ; (v >> 1) & 0x55555555
sub edx, eax ; w = v - ((v >> 1) & 0x55555555)
mov eax, edx ; w
shr edx, 2 ; w >> 2
and eax, 033333333h ; w & 0x33333333
and edx, 033333333h ; (w >> 2) & 0x33333333
add eax, edx ; x = (w & 0x33333333) + ((w >> 2) &
; 0x33333333)
mov edx, eax ; x
shr eax, 4 ; x >> 4
add eax, edx ; x + (x >> 4)
and eax, 00F0F0F0Fh ; y = (x + (x >> 4) & 0x0F0F0F0F)
imul eax, 001010101h ; y * 0x01010101
shr eax, 24 ; population count = (y *
; 0x01010101) >> 24
mov retVal, eax ; Store result.
}
return(retVal);
}    ```

Unless I'm missing something, it is just the ASM port of the C code at here (the last one).
08 Dec 2007, 21:59
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 19876
Location: In your JS exploiting you and your system
revolution 08 Dec 2007, 23:04
Here's a pop count I used in the past:
Code:
```population_count_eax:
;eax=value
mov     edx,eax         ;v
shr     eax,1           ;v>>1
and     eax,055555555h  ;(v>>1)&0x55555555
sub     edx,eax         ;w=v-((v>>1)&0x55555555)
mov     eax,edx         ;w
shr     edx,2           ;w>>2
and     eax,033333333h  ;w&0x33333333
and     edx,033333333h  ;(w>>2)&0x33333333
mov     edx,eax         ;x
shr     eax,4           ;x>>4
and     eax,00f0f0f0fh  ;y=(x+(x>>4)&0x0f0f0f0f)
imul    eax,001010101h  ;y*0x01010101
shr     eax,24          ;population count=(y*0x01010101)>>24
ret

align 8
c5555     dq      05555555555555555h
c3333     dq      03333333333333333h
c0f0f     dq      00f0f0f0f0f0f0f0fh

population_count_mm0:
movq    mm1,mm0         ;v
psrld   mm0,1           ;v >> 1
pand    mm0,[c5555]     ;(v >> 1) & 0x55555555
psubd   mm1,mm0         ;w = v - ((v >> 1) & 0x55555555)
movq    mm0,mm1         ;w
psrld   mm1,2           ;w >> 2
pand    mm0,[c3333]     ;w & 0x33333333
pand    mm1,[c3333]     ;(w >> 2) & 0x33333333
paddd   mm0,mm1         ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
movq    mm1,mm0         ;x
psrld   mm0,4           ;x >> 4
paddd   mm0,mm1         ;x + (x >> 4)
pand    mm0,[c0f0f]     ;y = (x + (x >> 4) & 0x0f0f0f0f)
pxor    mm1,mm1         ;0
psadbw  mm0,mm1         ;sum across all 8 bytes
movd    eax,mm0         ;result in EAX per calling
ret

macro population_count_mac result,source {
local x
x=source
x=x-((x shr 1) and 0x5555555555555555)
x=((x shr 2) and 0x3333333333333333)+(x and 0x3333333333333333)
x=(x+(x shr 4)) and 0x0f0f0f0f0f0f0f0f
x=x+(x shr 8)
x=x+(x shr 16)
result=(x+(x shr 32)) and 0x7f
}
```
Wow, looks exactly like AMD's version. But there is also an MMX version a fasm macro version.
08 Dec 2007, 23:04
sloppy

Joined: 10 Nov 2007
Posts: 6
sloppy 09 Dec 2007, 12:35
Ok ,thank you everybody, I will make some test (and try to understand why those method work )
09 Dec 2007, 12:35
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 09 Dec 2007, 12:50
revolution's code is by far the fastest since it works on bits in parallel and doesn't require any loops
09 Dec 2007, 12:50
edfed

Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 09 Dec 2007, 15:35
adopted the bit magic bitcount.
but for chaining bits set to 0, need an other algorythm. something with bit test and loop.
09 Dec 2007, 15:35
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 19876
Location: In your JS exploiting you and your system
revolution 09 Dec 2007, 16:01
edfed: For counting zeros just invert your input and count ones.
Code:
```not eax
call population_count_eax
ret    ```
09 Dec 2007, 16:01
edfed

Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 09 Dec 2007, 19:23
to count 0, just invert the instructions and the mask
make a function for sero, and a function for one, or a programmable function.
but to count chaining 0, and return the first chain of N (ecx) 0's or 1's

a pointer (ebx) in a BITmap @[seg:esi].and a size (ecx)

Code:
```zero?:
;esi=bitmap
;ebx=pointer
;ecx=count in bits
db size_of_bitmap_in_bytes
db 001,000,000,000,000,000,000,000,000,000,000,000,000,100
db 000,000,001,000,000,001,000,001,000,000,000,000,000,001
db 800,700,600,100,500,500,500,004,000,000,000,000,000,000
db 000,000,000,000,000,000,000,000,000,000,000,000,000,000
db 000,000,000,000,000,000,000,000,000,000,000,000,000,000
db 255,255,255,255,255,000,000
@@:
```

an error: ;carry=1
then the maximum following 0 in the BIT map.
09 Dec 2007, 19:23
vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 09 Dec 2007, 19:59
wouldn't it be faster with byte lookup table?
09 Dec 2007, 19:59
edfed

Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 09 Dec 2007, 20:09
vid wrote:
wouldn't it be faster with byte lookup table?

Code:
```mov ebx,@f
xlat ???
..
ret
@@:
byte_look_up_table:
db 0 ;00000000
db 1 ;00000001
db 1 ;00000010
db 2 ;00000011
db 1 ;00000100
db 2 ;00000101
db 2 ;00000110
db 3 ;00000111
...
db 7 ;11111101
db 7 ;11111110
db 8 ;11111111
```
09 Dec 2007, 20:09
MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 07 Jan 2008, 07:39
vid wrote:
wouldn't it be faster with byte lookup table?

sure, this is what you would use if you really need it done fast.

btw, the number of bits set is also called the Hamming weight.

here's another way to count the number of bits set:
Code:
```  mov     edx,eax
and     edx,01010101010101010101010101010101b
and     eax,10101010101010101010101010101010b
shr     eax,1
mov     edx,eax
and     edx,00110011001100110011001100110011b
and     eax,11001100110011001100110011001100b
shr     eax,2
mov     edx,eax
and     edx,00001111000011110000111100001111b
and     eax,11110000111100001111000011110000b
shr     eax,4
mov     edx,eax
and     edx,00000000111111110000000011111111b
and     eax,11111111000000001111111100000000b
shr     eax,8
mov     edx,eax
shr     eax,16
;result in al
```

it's not the fastest way. I just wanted to drop this here for completeness

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
07 Jan 2008, 07:39
LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 07 Jan 2008, 14:20
But it is somewhat more intuitive, perhaps revolution's code is an optimization of this one?
07 Jan 2008, 14:20

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 08 Jan 2008, 09:52
@edfed:
You wanted the algorithm for reading bit streams as I understood.
I warmly suggest: http://www.jjj.de/fxt/ look no more

Code:
```;This example uses 8 bits
;Bits are SET on 1>0 transition.
;Reading order is from lowest bit set to highest.
;IN: EAX=...10010111
mov ebx,eax
mov ecx,eax
shr ecx,1
xor ebx,ecx
and eax,ebx
;OUT: EAX=...10010100
```

You can easily modify this code to suit your needs like in what direction the transition happens (not eax at the beginning to make 0>1 transition) or the counting direction of bits (shl makes it higer-to-lower reading order).

From then on, you can use bsf/bsr & BTR combination loop to check where exactly the set bit lies.[/b]
08 Jan 2008, 09:52

Joined: 04 Oct 2010
Posts: 19
lemask 26 Nov 2010, 08:06
@revolution: your population count snippet reads very fine; every line is commented...
26 Nov 2010, 08:06
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum