flat assembler
Message board for the users of flat assembler.
Index
> Main > What's the point of instructions like bsr, bsf, and popcnt? |
Author |
|
mattst88 08 Nov 2008, 21:15
What algorithms are they used in?
I know bsr can be used for rounding up or down to a power of two. What about the others? What purpose does counting the number of set bits serve? Something about parity? _________________ My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions. Assembly Programmer's Journal |
|||
08 Nov 2008, 21:15 |
|
DJ Mauretto 09 Nov 2008, 09:17
Ciao
Free your fantasy and immagination and you'll find the right use of this instruction _________________ Nil Volentibus Arduum |
|||
09 Nov 2008, 09:17 |
|
MazeGen 10 Nov 2008, 11:37
I use BSF to convert the least significant bit set to bit index:
Code: mov eax, 01000000b bsf eax, eax ; eax = 6 |
|||
10 Nov 2008, 11:37 |
|
bitRAKE 11 Nov 2008, 08:24
How about at start of binary GCD algorithm?
Code: bsf ecx,eax mov [temp],eax je .a0 shr eax,cl bsf ecx,edx je .d0 or [temp],edx shr edx,cl bsf ecx,[temp] ...(left as an exercise)... shl eax,cl ; GCD(EAX,EDX) retn |
|||
11 Nov 2008, 08:24 |
|
mattst88 11 Nov 2008, 14:13
MazeGen wrote: I use BSF to convert the least significant bit set to bit index: Right, but what purpose does that serve? Sorry if my questions aren't clear. revolution wrote: And if, for some reason, you don't find some instructions useful then you always have the option of not using them. I know how compelling it feels to use ALL of the CPU without wasting bits of it, but if you can find the inner strength then you will find it is possible to write code that does not use EVERY instruction that the CPU is capable of. Of course. I'm just wanting to know in what algorithms are they used? For instance, if you were going to document these, and as an example, state an algorithm which uses the said instruction, what algorithm would it be? _________________ My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions. Assembly Programmer's Journal |
|||
11 Nov 2008, 14:13 |
|
revolution 11 Nov 2008, 14:20
I you are looking for examples, I have used popcnt (or at least a function call since only the very latest CPUs support popcnt) to count the number of available cores returned from GetProcessAffinityMask. And related to the same code I have used BSR/BSF to find the next available core in the mask. I have also used BSR in my BigNum library to find the highest set bit and compute the next power of 2 quickly.
|
|||
11 Nov 2008, 14:20 |
|
MazeGen 11 Nov 2008, 16:53
mattst88 wrote:
You mean real use? It depends. For example, I store hardware register codes (0-15) in my disassembler as a bit field for some internal reasons. So register rDX is stored as 0000000000000100b. Sometimes, not very often, I need to convert it to a number. And here goes BSF. |
|||
11 Nov 2008, 16:53 |
|
bitRAKE 12 Nov 2008, 16:16
Here is an interesting article regarding bit-twiddling:
http://realtimecollisiondetection.net/blog/?p=78 My solution to his NextKBitNumber() function includes BSF. |
|||
12 Nov 2008, 16:16 |
|
LocoDelAssembly 12 Nov 2008, 17:46
From ftp://download.intel.com/technology/architecture/new-instructions-paper.pdf
Quote: Our second application-targeted extension provides a single instruction, [edit] Also see this: http://forums.amd.com/devblog/blogpost.cfm?threadid=88335&catid=271 I couldn't found any references about opcode compatibility between AMD's ABM POPCNT and Intel's SSE4 POPCNT. |
|||
12 Nov 2008, 17:46 |
|
baldr 21 Nov 2008, 01:39
mattst88,
This is from another thread, "Finding string length with the help of mmx instructions.": Code: strlen: ; expects: ; esi == address of ASCIIZ ; ; modifies: ; mm1 ; ; returns: ; ecx == length of ASCIIZ ; mm0 == 0 push esi pxor mm0, mm0 .compare_64: movq mm1, qword [esi] add esi, 8 pcmpeqb mm1, mm0 ; mm1.byte[i] == -1 if byte [esi+i] == 0, 0 otherwise pmovmskb ecx, mm1 ; cl.bit[i] == mm1.byte[i].bit[7] bsf ecx, ecx ; ecx == index of least significant set bit, ZF == 1 if none jz .compare_64 ; ZF == 1 if no zero bytes in qword [esi] lea ecx, [esi-8+ecx] ; ecx points to zero byte pop esi sub ecx, esi ret |
|||
21 Nov 2008, 01:39 |
|
peter 03 Dec 2008, 01:29
Quote:
Both AMD's manual and Intel's SSE4 manual say it uses opcode F3 0F B8 /r. |
|||
03 Dec 2008, 01:29 |
|
mattst88 12 Apr 2009, 01:56
I found a good use of instructions like popcnt: checksums.
http://blogs.gnome.org/markmc/2008/05/28/checksums-scatter-gather-io-and-segmentation-offload/ |
|||
12 Apr 2009, 01:56 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.