flat assembler
Message board for the users of flat assembler.
PopeInnocent 03 May 2005, 23:30
Looks like you destroy RAX and RDX, so you'd need one more bswap at the end of the code sequence.
What about:

    bsr rcx,rax
    bsr rbx,rdx
    cmp rbx,rcx
    cmovb rdx,rax
    cmovb rdi,rsi

(Untested, but ought to work.) IIRC, bsr finds the most significant set bit (bsf is the one that finds the least significant set bit). This should be the same as your sample code as long as every byte is either 1 or 0. But just looking at it, I can't say whether my code is any faster.
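For readers who want to see what the sequence above selects, here is a rough C model of it. It is only a sketch: `bsr64`, `struct candidate` and `select_higher_top_bit` are hypothetical names invented for illustration, and pairing RAX with RSI and RDX with RDI as "value plus the pointer it came from" is an assumption read off the two `cmovb` lines, not something stated in the thread.

```c
#include <stdint.h>

/* Index of the most significant set bit, which is what BSR reports.
   BSR leaves its destination undefined for a zero input; returning -1
   here is an assumption of this sketch, not hardware behaviour. */
static int bsr64(uint64_t x)
{
    int index = -1;
    while (x != 0) {
        x >>= 1;
        index++;
    }
    return index;
}

/* Hypothetical pairing of a value with the pointer it was loaded from. */
struct candidate {
    uint64_t value;      /* plays the role of RAX or RDX */
    const void *source;  /* plays the role of RSI or RDI */
};

/* Models: cmp rbx,rcx / cmovb rdx,rax / cmovb rdi,rsi.
   `kept` stands for RDX/RDI and `other` for RAX/RSI. `kept` is replaced
   only when `other` has a strictly higher top bit. */
static struct candidate select_higher_top_bit(struct candidate kept,
                                              struct candidate other)
{
    if (bsr64(kept.value) < bsr64(other.value)) {
        return other;
    }
    return kept;
}
```

Both `cmovb` instructions depend on the same flags, so the value and its pointer always move together, which the single `return other` mirrors.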
Super64 04 May 2005, 06:14
PopeInnocent wrote: Looks like you destroy RAX and RDX, so you'd need one more bswap at the end of the code sequence.

No way! BSR (like BSF) is a vectorpath instruction. Normally an AMD64 processor can dispatch and execute 3(!) simple (directpath) instructions per clock cycle. A vectorpath instruction blocks the decoding unit for 1 clock cycle, and the internal sequencer emits a lot of instructions corresponding to that particular vectorpath (complex) instruction. In the case of BSR/BSF it takes 9-12 clock cycles just to complete that one complex instruction. In 9 clock cycles you can complete up to 27(!) simple directpath instructions.

Actually, I slightly modified the task and found a blazingly fast solution. I set up a check for at least 3 right zeroes. No need to bother with counting right zeroes if we don't have enough of them...

    XORQ (RSI),RAX
    JZ VERYRARECASEJUMP
    TESTL $0xFFFFFF,EAX
    JZ RARECASEJUMP

So I've effectively skipped the bad-case behaviour. If there are at least 3 zeroes, it jumps to the counting handler. But if you have a 50% chance of jump/no jump, it won't be fast for sure: branch prediction doesn't work on random data.

Thank YOU, byebye!
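The early-out check in the post above can be modelled in C roughly as follows. This is a sketch under assumptions: `classify` and the `PATH_*` names are hypothetical, the AT&T operand order is taken to mean RAX ^= [RSI], and the mask `0xFFFFFF` is read as "the three low-order bytes of the XOR result are zero".

```c
#include <stdint.h>

/* Hypothetical labels for the three outcomes of the check. */
enum path { PATH_COMMON, PATH_RARE, PATH_VERY_RARE };

static enum path classify(uint64_t rax, const uint64_t *rsi)
{
    uint64_t diff = rax ^ *rsi;              /* XORQ (RSI),RAX */

    if (diff == 0) {
        return PATH_VERY_RARE;               /* JZ VERYRARECASEJUMP */
    }
    if (((uint32_t)diff & 0xFFFFFFu) == 0) { /* TESTL $0xFFFFFF,EAX */
        return PATH_RARE;                    /* JZ RARECASEJUMP */
    }
    return PATH_COMMON;                      /* fall through, no counting */
}
```

Only the two rare paths branch away, so on data where they really are rare the common path falls straight through and stays easy for the branch predictor.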
UCM 07 May 2005, 00:04
whoa that was rude.
_________________
This calls for... Ultra CRUNCHY Man! Ta da!! *crunch*
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.