flat assembler
Message board for the users of flat assembler.

Index > Main > What's the point of instructions like bsr, bsf, and popcnt?

Author
Thread Post new topic Reply to topic
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88
What algorithms are they used in?

I know bsr can be used for rounding up or down to a power of two. What about the others? What purpose does counting the number of set bits serve? Something about parity?

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal
Post 08 Nov 2008, 21:15
View user's profile Send private message Visit poster's website Reply with quote
DJ Mauretto



Joined: 14 Mar 2007
Posts: 464
Location: Rome,Italy
DJ Mauretto
Ciao Smile
Free your fantasy and immagination and you'll find
the right use of this instruction Wink

_________________
Nil Volentibus Arduum Razz
Post 09 Nov 2008, 09:17
View user's profile Send private message Reply with quote
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
I use BSF to convert the least significant bit set to bit index:

Code:
mov eax, 01000000b
bsf eax, eax
; eax = 6
    
Post 10 Nov 2008, 11:37
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 2913
Location: [RSP+8*5]
bitRAKE
How about at start of binary GCD algorithm?
Code:
bsf ecx,eax
mov [temp],eax
je .a0
shr eax,cl
bsf ecx,edx
je .d0
or [temp],edx
shr edx,cl
bsf ecx,[temp]

...(left as an exercise)...

shl eax,cl ; GCD(EAX,EDX)
retn    
...they are certainly useful instructions.
Post 11 Nov 2008, 08:24
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17270
Location: In your JS exploiting you and your system
revolution
And if, for some reason, you don't find some instructions useful then you always have the option of not using them. I know how compelling it feels to use ALL of the CPU without wasting bits of it, but if you can find the inner strength then you will find it is possible to write code that does not use EVERY instruction that the CPU is capable of. Wink
Post 11 Nov 2008, 08:39
View user's profile Send private message Visit poster's website Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88
MazeGen wrote:
I use BSF to convert the least significant bit set to bit index:

Code:
mov eax, 01000000b
bsf eax, eax
; eax = 6
    


Right, but what purpose does that serve? Sorry if my questions aren't clear.

revolution wrote:
And if, for some reason, you don't find some instructions useful then you always have the option of not using them. I know how compelling it feels to use ALL of the CPU without wasting bits of it, but if you can find the inner strength then you will find it is possible to write code that does not use EVERY instruction that the CPU is capable of. Wink


Of course. I'm just wanting to know in what algorithms are they used?

For instance, if you were going to document these, and as an example, state an algorithm which uses the said instruction, what algorithm would it be?

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal
Post 11 Nov 2008, 14:13
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17270
Location: In your JS exploiting you and your system
revolution
I you are looking for examples, I have used popcnt (or at least a function call since only the very latest CPUs support popcnt) to count the number of available cores returned from GetProcessAffinityMask. And related to the same code I have used BSR/BSF to find the next available core in the mask. I have also used BSR in my BigNum library to find the highest set bit and compute the next power of 2 quickly.
Post 11 Nov 2008, 14:20
View user's profile Send private message Visit poster's website Reply with quote
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
MazeGen
mattst88 wrote:
MazeGen wrote:
I use BSF to convert the least significant bit set to bit index:

Code:
mov eax, 01000000b
bsf eax, eax
; eax = 6
    


Right, but what purpose does that serve? Sorry if my questions aren't clear.


You mean real use? It depends. For example, I store hardware register codes (0-15) in my disassembler as a bit field for some internal reasons. So register rDX is stored as 0000000000000100b. Sometimes, not very often, I need to convert it to a number. And here goes BSF.
Post 11 Nov 2008, 16:53
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 2913
Location: [RSP+8*5]
bitRAKE
Here is an interesting article regarding bit-twiddling:
http://realtimecollisiondetection.net/blog/?p=78

My solution to his NextKBitNumber() function includes BSF.
Post 12 Nov 2008, 16:16
View user's profile Send private message Visit poster's website Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
From ftp://download.intel.com/technology/architecture/new-instructions-paper.pdf
Quote:
Our second application-targeted extension provides a single instruction,
POPCNT, that can be effectively used to accelerate searches
involving large data sets. It works by counting the number of set bits
in a data object. Applications that could benefit from this instruction
include those involving genome mining, handwriting recognition, digital
health workloads, and fast hamming distance/population count.


[edit] Also see this: http://forums.amd.com/devblog/blogpost.cfm?threadid=88335&catid=271

I couldn't found any references about opcode compatibility between AMD's ABM POPCNT and Intel's SSE4 POPCNT.
Post 12 Nov 2008, 17:46
View user's profile Send private message Reply with quote
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
mattst88,

This is from another thread, "Finding string length with the help of mmx instructions.":
Code:
strlen:
; expects:
;   esi == address of ASCIIZ
;
; modifies:
;   mm1
;
; returns:
;   ecx == length of ASCIIZ
;   mm0 == 0

        push    esi
        pxor    mm0, mm0

.compare_64:
        movq    mm1, qword [esi]
        add     esi, 8
        pcmpeqb mm1, mm0                ; mm1.byte[i] == -1 if byte [esi+i] == 0, 0 otherwise
        pmovmskb ecx, mm1               ; cl.bit[i] == mm1.byte[i].bit[7]
        bsf     ecx, ecx                ; ecx == index of least significant set bit, ZF == 1 if none
        jz      .compare_64             ; ZF == 1 if no zero bytes in qword [esi]
        lea     ecx, [esi-8+ecx]        ; ecx points to zero byte
        pop     esi
        sub     ecx, esi
        ret    
I know that pmovmskb is SSE, not MMX… But bsf proves useful.
Post 21 Nov 2008, 01:39
View user's profile Send private message Reply with quote
peter



Joined: 09 May 2006
Posts: 63
peter
Quote:

I couldn't found any references about opcode compatibility between AMD's ABM POPCNT and Intel's SSE4 POPCNT.

Both AMD's manual and Intel's SSE4 manual say it uses opcode F3 0F B8 /r.
Post 03 Dec 2008, 01:29
View user's profile Send private message Visit poster's website Reply with quote
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88
Post 12 Apr 2009, 01:56
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.