flat assembler
Message board for the users of flat assembler.
sylware 30 Dec 2022, 22:30
I was looking at the AMD Zen instruction latencies in Agner Fog's tables, and it seems that in some cases it is better to use the and/or/xor logical ops for single-bit manipulation instead of the classic btr/bts/btc bit-manipulation ops.
Did I miss something? (again...) Edit: compiler output on https://godbolt.org/noscript seems to confirm it: up to dword operands the compilers emit and/or/xor, while for qword operands they emit btc/bts/btr. That 64-bit ISA...
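For anyone who wants to repeat the experiment, here is a minimal C sketch of the kind of source that can be pasted into a compiler explorer. The helper names are made up for illustration, and what a compiler actually emits depends on the compiler, its version, and the optimization flags. One likely reason for the dword/qword split: and/or/xor take at most a 32-bit immediate that is sign-extended for 64-bit operands, so a single-bit mask for bit 32 and above cannot be encoded as an immediate, while btr/bts/btc take the bit index as an 8-bit immediate.

```c
#include <stdint.h>

/* Hypothetical helpers: single-bit set/clear/toggle written with plain
   logical operations. With a constant n, a 32-bit mask always fits in
   an immediate operand of or/and/xor. */
static inline uint32_t bit_set32(uint32_t x, unsigned n)    { return x |  (UINT32_C(1) << n); }
static inline uint32_t bit_clear32(uint32_t x, unsigned n)  { return x & ~(UINT32_C(1) << n); }
static inline uint32_t bit_toggle32(uint32_t x, unsigned n) { return x ^  (UINT32_C(1) << n); }

/* Same operations on 64-bit values. For n >= 32 the mask no longer fits
   in a sign-extended 32-bit immediate, so a compiler has to either load
   the mask into a register first or use bts/btr/btc. */
static inline uint64_t bit_set64(uint64_t x, unsigned n)    { return x |  (UINT64_C(1) << n); }
static inline uint64_t bit_clear64(uint64_t x, unsigned n)  { return x & ~(UINT64_C(1) << n); }
static inline uint64_t bit_toggle64(uint64_t x, unsigned n) { return x ^  (UINT64_C(1) << n); }
```

Calling, say, `bit_set64(x, 40)` and `bit_set32(y, 5)` from a small non-inlined function and comparing the generated code should show whether a given compiler makes the same split.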
sylware 31 Dec 2022, 11:49
I know that, but optimization tricks "usually" end up in compilers because they were already tested on real-life hardware and load profiles.
I have not decided yet whether to move my single-bit (below 32 bits) manipulation instructions to xor/or/and. Does anybody have more insight on this to help me decide?
Ali.Z 31 Dec 2022, 16:43
Most of the time, don't look at latency, as you cannot measure most instructions with it.
There are many reasons, specified by Intel themselves, but let's count a few:
- latency can change from one design/architecture to another
- some of the CPU's internal work has higher priority than executing user instructions
- is the instruction targeted by OOOE (out-of-order execution)?
- how many dispatch and execution ports/gates are used to load/store stuff? (For example, once upon a time Intel was able to load one xmm register per cycle, but later they were able to load two different xmm registers per cycle, e.g. xmm0, xmm1.)
And many others. Go for instruction throughput instead.
_________________
Asm For Wise Humans
DimonSoft 31 Dec 2022, 17:49
The first thing to think about when choosing an instruction is the task itself. What suits the task better is what matters. Say, if the code that follows can be made more efficient when some of the flags have particular values, and the bit operation done with and, or, or xor gives exactly those values, I’d prefer them over bt*, which leave 4 status flags undefined.
bitRAKE 31 Dec 2022, 20:32
My usage is similar to DimonSoft's - if I need the carry flag then BT* is the easy choice; or if the bit index is significant in the algorithm then BT* can eliminate several other instructions and a dependency chain. Otherwise, the logical instructions seem more performant - not to mention all the multi-bit tricks that are possible!
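To make the two cases concrete, here is a rough C sketch (function names are hypothetical, not from any library). The first helper is the pattern where BTS fits naturally: set a bit at a run-time index and learn its previous value, which BTS delivers in the carry flag. The rest are examples of the multi-bit tricks that only the logical instructions can do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bit n and return its previous value. The index is taken modulo 32,
   as BTS does for a 32-bit register destination. */
static inline bool test_and_set(uint32_t *word, unsigned n)
{
    uint32_t mask = UINT32_C(1) << (n & 31);
    bool old = (*word & mask) != 0;   /* what BTS would leave in CF */
    *word |= mask;
    return old;
}

/* Multi-bit tricks built from and/xor and simple arithmetic. */

/* Clear the lowest set bit (returns 0 when x is 0). */
static inline uint32_t clear_lowest_set(uint32_t x)   { return x & (x - 1); }

/* Keep only the lowest set bit (returns 0 when x is 0). */
static inline uint32_t isolate_lowest_set(uint32_t x) { return x & (0u - x); }

/* Toggle several bits at once with a single xor. */
static inline uint32_t toggle_mask(uint32_t x, uint32_t mask) { return x ^ mask; }
```

Whether a compiler turns `test_and_set` into a single BTS or into a shift plus logical ops is up to the compiler and target; the sketch only shows the semantics being discussed.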
Furs 31 Dec 2022, 22:32
sylware wrote: I was looking at the AMD zen op latencies from Agner doc, and it seems it is better in some cases to favor and/or/xor logical ops for single bit manipulation instead of the classic btr/bts/btc bit manipulation ops.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.