flat assembler
Message board for the users of flat assembler.
Tricky assembler task
PopeInnocent 03 May 2005, 23:30
Looks like you destroy RAX and RDX, so you'd need one more bswap at the end of the code sequence.
What about:
    bsr rcx,rax
    bsr rbx,rdx
    cmp rbx,rcx
    cmovb rdx,rax
    cmovb rdi,rsi
(Untested, but ought to work. IIRC, bsr finds the most significant set bit; bsf is the one that finds the least significant.) This should be the same as your sample code as long as every byte is either 1 or 0. But just looking at it, I can't say whether my code is any faster.
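For anyone who wants to check what that sequence selects, here is a minimal C model of it. This is only a sketch: the function names bsf64, bsr64 and select_by_bsr and the regs_out struct are made up for illustration, the registers are modelled as plain 64-bit integers, and both inputs are assumed to be nonzero (the real instructions leave the destination undefined for a zero source).

```c
#include <stdint.h>

/* Index of the least significant set bit, as BSF reports it.
   Assumes x != 0; the loop would not terminate for zero. */
unsigned bsf64(uint64_t x) {
    unsigned i = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Index of the most significant set bit, as BSR reports it.
   Returns 0 for x == 0 here, where the instruction's result is undefined. */
unsigned bsr64(uint64_t x) {
    unsigned i = 0;
    while (x >>= 1) {
        i++;
    }
    return i;
}

/* Hypothetical container for the two registers the sequence may overwrite. */
typedef struct {
    uint64_t rdx;
    uint64_t rdi;
} regs_out;

/* Model of: bsr rcx,rax / bsr rbx,rdx / cmp rbx,rcx /
             cmovb rdx,rax / cmovb rdi,rsi */
regs_out select_by_bsr(uint64_t rax, uint64_t rdx, uint64_t rsi, uint64_t rdi) {
    unsigned rcx = bsr64(rax);
    unsigned rbx = bsr64(rdx);
    regs_out out = { rdx, rdi };
    if (rbx < rcx) {      /* cmovb: taken when rbx is below rcx (unsigned) */
        out.rdx = rax;    /* both conditional moves fire on the same flags */
        out.rdi = rsi;
    }
    return out;
}
```

In this model RDX ends up holding whichever input has the higher top set bit, and RDI the value that travelled with it. Swapping bsr64 for bsf64 gives the lowest-set-bit variant.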
Super64 04 May 2005, 06:14
PopeInnocent wrote: Looks like you destroy RAX and RDX, so you'd need one more bswap at the end of the code sequence.

NO WAY! BSR (BSF) is a vectorpath instruction. Normally an AMD64 processor can dispatch and execute 3(!) simple (directpath) instructions per clock cycle. A vectorpath instruction blocks the decoding unit for 1 clock cycle while the internal sequencer issues the whole series of operations that make up that complex instruction. In the case of BSR/BSF it takes 9-12 clock cycles just to complete that one instruction. In 9 clock cycles you can complete up to 27(!) simple directpath instructions.

Actually, I slightly modified the task and found a blazingly fast solution. I set up a check for at least 3 right zeroes (the low three bytes). No need to bother with counting right zeroes if we don't have enough of them...

    XORQ (RSI),RAX
    JZ VERYRARECASEJUMP
    TESTL $0xFFFFFF,EAX
    JZ RARECASEJUMP

So I've effectively skipped the bad-case behaviour. If there are at least 3 zeroes it jumps to the counting handler. But if you have a 50% chance of jump/no jump it won't be fast for sure: branch prediction doesn't work on random data.

Thank YOU, byebye!
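The early-out check in that post can be modelled in C roughly as follows. This is a sketch under assumptions: the enum and the name classify are invented for illustration, and it assumes the XOR compares RAX against the 8-byte word at (RSI), so that bytes that agree come out as zero.

```c
#include <stdint.h>

/* Hypothetical labels for the three outcomes of the check. */
enum path {
    COMMON_CASE,     /* fall through: fewer than 3 low zero bytes */
    RARE_CASE,       /* RARECASEJUMP: low 24 bits of the XOR are zero */
    VERY_RARE_CASE   /* VERYRARECASEJUMP: all 64 bits of the XOR are zero */
};

enum path classify(uint64_t rax, uint64_t mem) {
    uint64_t diff = rax ^ mem;        /* XORQ (RSI),RAX */
    if (diff == 0) {                  /* JZ VERYRARECASEJUMP */
        return VERY_RARE_CASE;
    }
    if ((diff & 0xFFFFFFu) == 0) {    /* TESTL $0xFFFFFF,EAX ; JZ RARECASEJUMP */
        return RARE_CASE;
    }
    return COMMON_CASE;
}
```

Only the RARE_CASE path would need to go on and count the zeroes. The common path pays for one XOR, one TEST and two branches that are almost never taken, which is the situation where branch prediction does well.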
UCM 07 May 2005, 00:04
whoa that was rude.
_________________
This calls for... Ultra CRUNCHY Man! Ta da!! *crunch*
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.