flat assembler
Message board for the users of flat assembler.

bts/btr/btc vs and/or/xor

sylware 30 Dec 2022, 22:30
I was looking at the AMD Zen instruction latencies in Agner Fog's instruction tables, and it seems it is better in some cases to favor the and/or/xor logical ops for single-bit manipulation instead of the classic bts/btr/btc bit-manipulation ops.

Did I miss something? (again...)

Edit: https://godbolt.org/noscript seems to confirm it: up to dword operands, compilers use and/or/xor; for qword they use btc/bts/btr. That 64-bit ISA...
DimonSoft 31 Dec 2022, 08:46
Latencies tend to change from model to model, don’t they? Besides, I remember measuring an FPU instruction against the same FPU instruction placed in a separate procedure, and the winners on Intel and AMD were different. Pipelining makes summing up latencies a useless exercise. We can only talk about the possible impact of an instruction or a set of instructions on the whole spherical hardware optimization chain in a vacuum.
sylware 31 Dec 2022, 11:49
I know that, but optimization tricks "usually" end up in compilers because they have already been tested on real-life hardware and load profiles.

I have not yet decided whether to switch my single-bit (below 32 bits) manipulation instructions to xor/or/and.

Does anybody have more insight into this to help me decide?
Ali.Z 31 Dec 2022, 16:43
most of the time, don't go by latency, as you cannot measure most instructions with it.

there are many reasons, given by intel themselves, but let's list a few:
- latency can change from one design/architecture to another
- some of the cpu's internal work has higher priority than executing user instructions
- is the instruction subject to OOOE? (out-of-order execution)
- how many dispatch and execution ports/gates are used to load/store data? (for example, once upon a time intel could load one xmm register per cycle, but later it could load two different xmm registers per cycle, e.g. xmm0 and xmm1)

and many others... go for instruction throughput instead.

_________________
Asm For Wise Humans
DimonSoft 31 Dec 2022, 17:49
The first thing to think about when choosing an instruction is the task itself. What suits the task better is what matters. Say, if your subsequent code can be more optimal when some of the flags have particular values, and the bit operation done with and, or, or xor gives exactly those values, I’d prefer them over bt*, which leave 4 status flags (OF, SF, AF, PF) undefined.
bitRAKE 31 Dec 2022, 20:32
My usage is similar to DimonSoft's: if I need the carry flag, then BT* is the easy choice; or if the bit index is significant in the algorithm, then BT* can eliminate several other instructions and shorten the dependency chain. Otherwise, the logical instructions seem more performant, not to mention all the multi-bit tricks that are possible!

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Furs 31 Dec 2022, 22:32
sylware wrote:
I was looking at the AMD zen op latencies from Agner doc, and it seems it is better in some cases to favor and/or/xor logical ops for single bit manipulation instead of the classic btc/bts/btc bit manipulation ops.

Did I miss something? (again...)

Edit: https://godbolt.org/noscript seems to confirm it: up to dword, and/or/xor, qword is btc/bts/btr. That 64 bits isa...
Keep in mind that btc/bts/btr can also be used atomically with the lock prefix (on a memory operand) if you need to update one bit and retrieve its old value at the same time.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.