flat assembler
Message board for the users of flat assembler.

Index > Main > counting bits

Author
Thread Post new topic Reply to topic
sloppy



Joined: 10 Nov 2007
Posts: 6
sloppy 08 Dec 2007, 19:43
Hi,

I would like to know if there is an instruction to count how many "ones" (or zeros) there are in a given number (in binary).

Example:
7 ->111 -> 3 ones
5 ->101-> 2 ones

At the moment I do it this way:
Code:
mov eax,15 ; 15 is a given number
xor edi,edi
xor ebx,ebx
irp count,1,2,4,8,16,32,64,128
{
local nextstep
test eax, count  
jz nextstep
inc ebx
nextstep:
}
; now ebx is what I want  
    

The above code is only for 8 bits numbers, but if there is an instruction for me also 32bit or 128 bits numbers are ok and I know before how many bits I have. I use a pentium4 and I have no problem of portability.
If there is not anything can you give me some more ideas to solve the problem faster? I am still learning so please be patient.

Thank,
Luca
Post 08 Dec 2007, 19:43
View user's profile Send private message Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 08 Dec 2007, 20:04
There is one, popcnt, but will be available on SSE4 (not publicly available chips exist yet). There are some bit tricks that I can't remember know (and I also lost the link Sad), but here another method:
Code:
; EAX = number that its bits will be counted

xor ecx, ecx
mov cl, 32

@@:
  rol eax, 1
  adc ch, 0
  dec cl
  jnz @b

; At this point CH has the number of ones
    
Post 08 Dec 2007, 20:04
View user's profile Send private message Reply with quote
sloppy



Joined: 10 Nov 2007
Posts: 6
sloppy 08 Dec 2007, 20:18
Thanks,

I will search more.

as I am still learning... how can I usually know which of two method is faster? Is a *big* loop with "time" command the only way?
Post 08 Dec 2007, 20:18
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4121
Location: vpcmpistri
bitRAKE 08 Dec 2007, 21:08
http://graphics.stanford.edu/~seander/bithacks.html

I've converted all those algorithms to x86. AMD optimization manual has a fast implementation for ALU 32-bit population count. It can be extended easily to MMX/SSE - a cycle per byte is possible, iirc.
Code:
     xor ecx,ecx
.1:  lea edx,[eax-1]
     inc ecx
     and eax,edx
 jne .1    
...only loops for number of bits set.
Post 08 Dec 2007, 21:08
View user's profile Send private message Visit poster's website Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 08 Dec 2007, 21:59
Hey that was the link Very Happy (And I think I saw another one too).

AMD has this implementation in the manual that bitRAKE says:
Code:
unsigned int popcount(unsigned int v)
{
unsigned int retVal;
__asm {
mov eax, [v] ; v
mov edx, eax ; v
shr eax, 1 ; v >> 1
and eax, 055555555h ; (v >> 1) & 0x55555555
sub edx, eax ; w = v - ((v >> 1) & 0x55555555)
mov eax, edx ; w
shr edx, 2 ; w >> 2
and eax, 033333333h ; w & 0x33333333
and edx, 033333333h ; (w >> 2) & 0x33333333
add eax, edx ; x = (w & 0x33333333) + ((w >> 2) &
; 0x33333333)
mov edx, eax ; x
shr eax, 4 ; x >> 4
add eax, edx ; x + (x >> 4)
and eax, 00F0F0F0Fh ; y = (x + (x >> 4) & 0x0F0F0F0F)
imul eax, 001010101h ; y * 0x01010101
shr eax, 24 ; population count = (y *
; 0x01010101) >> 24
mov retVal, eax ; Store result.
}
return(retVal);
}    


Unless I'm missing something, it is just the ASM port of the C code at here (the last one).
Post 08 Dec 2007, 21:59
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20486
Location: In your JS exploiting you and your system
revolution 08 Dec 2007, 23:04
Here's a pop count I used in the past:
Code:
population_count_eax:
;eax=value
        mov     edx,eax         ;v
  shr     eax,1           ;v>>1
 and     eax,055555555h  ;(v>>1)&0x55555555
        sub     edx,eax         ;w=v-((v>>1)&0x55555555)
  mov     eax,edx         ;w
  shr     edx,2           ;w>>2
 and     eax,033333333h  ;w&0x33333333
   and     edx,033333333h  ;(w>>2)&0x33333333
        add     eax,edx         ;x=(w&0x33333333)+((w>>2)&0x33333333)
 mov     edx,eax         ;x
  shr     eax,4           ;x>>4
 add     eax,edx         ;x+(x>>4)
     and     eax,00f0f0f0fh  ;y=(x+(x>>4)&0x0f0f0f0f)
  imul    eax,001010101h  ;y*0x01010101
       shr     eax,24          ;population count=(y*0x01010101)>>24
  ret

align 8
c5555     dq      05555555555555555h
c3333     dq      03333333333333333h
c0f0f     dq      00f0f0f0f0f0f0f0fh

population_count_mm0:
 movq    mm1,mm0         ;v
  psrld   mm0,1           ;v >> 1
       pand    mm0,[c5555]     ;(v >> 1) & 0x55555555
    psubd   mm1,mm0         ;w = v - ((v >> 1) & 0x55555555)
  movq    mm0,mm1         ;w
  psrld   mm1,2           ;w >> 2
       pand    mm0,[c3333]     ;w & 0x33333333
 pand    mm1,[c3333]     ;(w >> 2) & 0x33333333
    paddd   mm0,mm1         ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
       movq    mm1,mm0         ;x
  psrld   mm0,4           ;x >> 4
       paddd   mm0,mm1         ;x + (x >> 4)
 pand    mm0,[c0f0f]     ;y = (x + (x >> 4) & 0x0f0f0f0f)
  pxor    mm1,mm1         ;0
  psadbw  mm0,mm1         ;sum across all 8 bytes
     movd    eax,mm0         ;result in EAX per calling
  ret

macro population_count_mac result,source {
  local x
     x=source
    x=x-((x shr 1) and 0x5555555555555555)
      x=((x shr 2) and 0x3333333333333333)+(x and 0x3333333333333333)
     x=(x+(x shr 4)) and 0x0f0f0f0f0f0f0f0f
      x=x+(x shr 8)
       x=x+(x shr 16)
      result=(x+(x shr 32)) and 0x7f
}
    
Wow, looks exactly like AMD's version. But there is also an MMX version a fasm macro version.
Post 08 Dec 2007, 23:04
View user's profile Send private message Visit poster's website Reply with quote
sloppy



Joined: 10 Nov 2007
Posts: 6
sloppy 09 Dec 2007, 12:35
Ok ,thank you everybody, I will make some test (and try to understand why those method work Smile )
Post 09 Dec 2007, 12:35
View user's profile Send private message Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 09 Dec 2007, 12:50
revolution's code is by far the fastest since it works on bits in parallel and doesn't require any loops Smile
Post 09 Dec 2007, 12:50
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 09 Dec 2007, 15:35
adopted the bit magic bitcount.
but for chaining bits set to 0, need an other algorythm. something with bit test and loop.
Post 09 Dec 2007, 15:35
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20486
Location: In your JS exploiting you and your system
revolution 09 Dec 2007, 16:01
edfed: For counting zeros just invert your input and count ones.
Code:
not eax
call population_count_eax
ret    
Post 09 Dec 2007, 16:01
View user's profile Send private message Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 09 Dec 2007, 19:23
to count 0, just invert the instructions and the mask
make a function for sero, and a function for one, or a programmable function.
but to count chaining 0, and return the first chain of N (ecx) 0's or 1's

a pointer (ebx) in a BITmap @[seg:esi].and a size (ecx)

Code:
zero?:
;esi=bitmap
;ebx=pointer 
;ecx=count in bits
db size_of_bitmap_in_bytes
db 001,000,000,000,000,000,000,000,000,000,000,000,000,100
db 000,000,001,000,000,001,000,001,000,000,000,000,000,001
db 800,700,600,100,500,500,500,004,000,000,000,000,000,000
db 000,000,000,000,000,000,000,000,000,000,000,000,000,000
db 000,000,000,000,000,000,000,000,000,000,000,000,000,000
db 255,255,255,255,255,000,000
@@:
    

an error: ;carry=1
then the maximum following 0 in the BIT map.
Post 09 Dec 2007, 19:23
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 09 Dec 2007, 19:59
wouldn't it be faster with byte lookup table?
Post 09 Dec 2007, 19:59
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 09 Dec 2007, 20:09
vid wrote:
wouldn't it be faster with byte lookup table?


Code:
mov ebx,@f
xlat ???
..
ret
@@:
byte_look_up_table:
db 0 ;00000000
db 1 ;00000001
db 1 ;00000010
db 2 ;00000011
db 1 ;00000100
db 2 ;00000101
db 2 ;00000110
db 3 ;00000111
...
db 7 ;11111101
db 7 ;11111110
db 8 ;11111111
    
Post 09 Dec 2007, 20:09
View user's profile Send private message Visit poster's website Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 07 Jan 2008, 07:39
vid wrote:
wouldn't it be faster with byte lookup table?

sure, this is what you would use if you really need it done fast.

btw, the number of bits set is also called the Hamming weight.

here's another way to count the number of bits set:
Code:
  mov     edx,eax
     and     edx,01010101010101010101010101010101b
       and     eax,10101010101010101010101010101010b
       shr     eax,1
       add     eax,edx
     mov     edx,eax
     and     edx,00110011001100110011001100110011b
       and     eax,11001100110011001100110011001100b
       shr     eax,2
       add     eax,edx
     mov     edx,eax
     and     edx,00001111000011110000111100001111b
       and     eax,11110000111100001111000011110000b
       shr     eax,4
       add     eax,edx
     mov     edx,eax
     and     edx,00000000111111110000000011111111b
       and     eax,11111111000000001111111100000000b
       shr     eax,8
       add     eax,edx
     mov     edx,eax
     shr     eax,16
      add     eax,edx
     ;result in al
    

it's not the fastest way. I just wanted to drop this here for completeness

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 07 Jan 2008, 07:39
View user's profile Send private message Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 07 Jan 2008, 14:20
But it is somewhat more intuitive, perhaps revolution's code is an optimization of this one?
Post 07 Jan 2008, 14:20
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 08 Jan 2008, 09:52
@edfed:
You wanted the algorithm for reading bit streams as I understood.
I warmly suggest: http://www.jjj.de/fxt/ look no more Wink

Code:
;This example uses 8 bits
;Bits are SET on 1>0 transition.
;Reading order is from lowest bit set to highest.
;IN: EAX=...10010111
mov ebx,eax
mov ecx,eax
shr ecx,1
xor ebx,ecx
and eax,ebx
;OUT: EAX=...10010100
    


You can easily modify this code to suit your needs like in what direction the transition happens (not eax at the beginning to make 0>1 transition) or the counting direction of bits (shl makes it higer-to-lower reading order).

From then on, you can use bsf/bsr & BTR combination loop to check where exactly the set bit lies.[/b]
Post 08 Jan 2008, 09:52
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
lemask



Joined: 04 Oct 2010
Posts: 19
lemask 26 Nov 2010, 08:06
@revolution: your population count snippet reads very fine; every line is commented...
Post 26 Nov 2010, 08:06
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.