flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
LocoDelAssembly 08 Dec 2007, 20:04
There is one, popcnt, but will be available on SSE4 (not publicly available chips exist yet). There are some bit tricks that I can't remember know (and I also lost the link
![]() Code: ; EAX = number that its bits will be counted xor ecx, ecx mov cl, 32 @@: rol eax, 1 adc ch, 0 dec cl jnz @b ; At this point CH has the number of ones |
|||
![]() |
|
sloppy 08 Dec 2007, 20:18
Thanks,
I will search more. as I am still learning... how can I usually know which of two method is faster? Is a *big* loop with "time" command the only way? |
|||
![]() |
|
bitRAKE 08 Dec 2007, 21:08
http://graphics.stanford.edu/~seander/bithacks.html
I've converted all those algorithms to x86. AMD optimization manual has a fast implementation for ALU 32-bit population count. It can be extended easily to MMX/SSE - a cycle per byte is possible, iirc. Code: xor ecx,ecx .1: lea edx,[eax-1] inc ecx and eax,edx jne .1 |
|||
![]() |
|
LocoDelAssembly 08 Dec 2007, 21:59
Hey that was the link
![]() AMD has this implementation in the manual that bitRAKE says: Code: unsigned int popcount(unsigned int v) { unsigned int retVal; __asm { mov eax, [v] ; v mov edx, eax ; v shr eax, 1 ; v >> 1 and eax, 055555555h ; (v >> 1) & 0x55555555 sub edx, eax ; w = v - ((v >> 1) & 0x55555555) mov eax, edx ; w shr edx, 2 ; w >> 2 and eax, 033333333h ; w & 0x33333333 and edx, 033333333h ; (w >> 2) & 0x33333333 add eax, edx ; x = (w & 0x33333333) + ((w >> 2) & ; 0x33333333) mov edx, eax ; x shr eax, 4 ; x >> 4 add eax, edx ; x + (x >> 4) and eax, 00F0F0F0Fh ; y = (x + (x >> 4) & 0x0F0F0F0F) imul eax, 001010101h ; y * 0x01010101 shr eax, 24 ; population count = (y * ; 0x01010101) >> 24 mov retVal, eax ; Store result. } return(retVal); } Unless I'm missing something, it is just the ASM port of the C code at here (the last one). |
|||
![]() |
|
revolution 08 Dec 2007, 23:04
Here's a pop count I used in the past:
Code: population_count_eax: ;eax=value mov edx,eax ;v shr eax,1 ;v>>1 and eax,055555555h ;(v>>1)&0x55555555 sub edx,eax ;w=v-((v>>1)&0x55555555) mov eax,edx ;w shr edx,2 ;w>>2 and eax,033333333h ;w&0x33333333 and edx,033333333h ;(w>>2)&0x33333333 add eax,edx ;x=(w&0x33333333)+((w>>2)&0x33333333) mov edx,eax ;x shr eax,4 ;x>>4 add eax,edx ;x+(x>>4) and eax,00f0f0f0fh ;y=(x+(x>>4)&0x0f0f0f0f) imul eax,001010101h ;y*0x01010101 shr eax,24 ;population count=(y*0x01010101)>>24 ret align 8 c5555 dq 05555555555555555h c3333 dq 03333333333333333h c0f0f dq 00f0f0f0f0f0f0f0fh population_count_mm0: movq mm1,mm0 ;v psrld mm0,1 ;v >> 1 pand mm0,[c5555] ;(v >> 1) & 0x55555555 psubd mm1,mm0 ;w = v - ((v >> 1) & 0x55555555) movq mm0,mm1 ;w psrld mm1,2 ;w >> 2 pand mm0,[c3333] ;w & 0x33333333 pand mm1,[c3333] ;(w >> 2) & 0x33333333 paddd mm0,mm1 ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333) movq mm1,mm0 ;x psrld mm0,4 ;x >> 4 paddd mm0,mm1 ;x + (x >> 4) pand mm0,[c0f0f] ;y = (x + (x >> 4) & 0x0f0f0f0f) pxor mm1,mm1 ;0 psadbw mm0,mm1 ;sum across all 8 bytes movd eax,mm0 ;result in EAX per calling ret macro population_count_mac result,source { local x x=source x=x-((x shr 1) and 0x5555555555555555) x=((x shr 2) and 0x3333333333333333)+(x and 0x3333333333333333) x=(x+(x shr 4)) and 0x0f0f0f0f0f0f0f0f x=x+(x shr 8) x=x+(x shr 16) result=(x+(x shr 32)) and 0x7f } |
|||
![]() |
|
sloppy 09 Dec 2007, 12:35
Ok ,thank you everybody, I will make some test (and try to understand why those method work
![]() |
|||
![]() |
|
Borsuc 09 Dec 2007, 12:50
revolution's code is by far the fastest since it works on bits in parallel and doesn't require any loops
![]() |
|||
![]() |
|
edfed 09 Dec 2007, 15:35
adopted the bit magic bitcount.
but for chaining bits set to 0, need an other algorythm. something with bit test and loop. |
|||
![]() |
|
revolution 09 Dec 2007, 16:01
edfed: For counting zeros just invert your input and count ones.
Code: not eax call population_count_eax ret |
|||
![]() |
|
edfed 09 Dec 2007, 19:23
to count 0, just invert the instructions and the mask
make a function for sero, and a function for one, or a programmable function. but to count chaining 0, and return the first chain of N (ecx) 0's or 1's a pointer (ebx) in a BITmap @[seg:esi].and a size (ecx) Code: zero?: ;esi=bitmap ;ebx=pointer ;ecx=count in bits db size_of_bitmap_in_bytes db 001,000,000,000,000,000,000,000,000,000,000,000,000,100 db 000,000,001,000,000,001,000,001,000,000,000,000,000,001 db 800,700,600,100,500,500,500,004,000,000,000,000,000,000 db 000,000,000,000,000,000,000,000,000,000,000,000,000,000 db 000,000,000,000,000,000,000,000,000,000,000,000,000,000 db 255,255,255,255,255,000,000 @@: an error: ;carry=1 then the maximum following 0 in the BIT map. |
|||
![]() |
|
vid 09 Dec 2007, 19:59
wouldn't it be faster with byte lookup table?
|
|||
![]() |
|
edfed 09 Dec 2007, 20:09
vid wrote: wouldn't it be faster with byte lookup table? Code: mov ebx,@f xlat ??? .. ret @@: byte_look_up_table: db 0 ;00000000 db 1 ;00000001 db 1 ;00000010 db 2 ;00000011 db 1 ;00000100 db 2 ;00000101 db 2 ;00000110 db 3 ;00000111 ... db 7 ;11111101 db 7 ;11111110 db 8 ;11111111 |
|||
![]() |
|
MCD 07 Jan 2008, 07:39
vid wrote: wouldn't it be faster with byte lookup table? sure, this is what you would use if you really need it done fast. btw, the number of bits set is also called the Hamming weight. here's another way to count the number of bits set: Code: mov edx,eax and edx,01010101010101010101010101010101b and eax,10101010101010101010101010101010b shr eax,1 add eax,edx mov edx,eax and edx,00110011001100110011001100110011b and eax,11001100110011001100110011001100b shr eax,2 add eax,edx mov edx,eax and edx,00001111000011110000111100001111b and eax,11110000111100001111000011110000b shr eax,4 add eax,edx mov edx,eax and edx,00000000111111110000000011111111b and eax,11111111000000001111111100000000b shr eax,8 add eax,edx mov edx,eax shr eax,16 add eax,edx ;result in al it's not the fastest way. I just wanted to drop this here for completeness _________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || |
|||
![]() |
|
LocoDelAssembly 07 Jan 2008, 14:20
But it is somewhat more intuitive, perhaps revolution's code is an optimization of this one?
|
|||
![]() |
|
Madis731 08 Jan 2008, 09:52
@edfed:
You wanted the algorithm for reading bit streams as I understood. I warmly suggest: http://www.jjj.de/fxt/ look no more ![]() Code: ;This example uses 8 bits ;Bits are SET on 1>0 transition. ;Reading order is from lowest bit set to highest. ;IN: EAX=...10010111 mov ebx,eax mov ecx,eax shr ecx,1 xor ebx,ecx and eax,ebx ;OUT: EAX=...10010100 You can easily modify this code to suit your needs like in what direction the transition happens (not eax at the beginning to make 0>1 transition) or the counting direction of bits (shl makes it higer-to-lower reading order). From then on, you can use bsf/bsr & BTR combination loop to check where exactly the set bit lies.[/b] |
|||
![]() |
|
lemask 26 Nov 2010, 08:06
@revolution: your population count snippet reads very fine; every line is commented...
|
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.