flat assembler
Message board for the users of flat assembler.

bswap r16

Ali.Z 18 Dec 2025, 14:37
Code:
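use32
db 66h          ; operand-size prefix: selects 16-bit operand size in 32-bit code
bswap eax       ; together with the prefix: 66 0F C8, i.e. bswap ax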


Apparently this is a valid instruction: it clears r16, i.e. it zeroes the low 16 bits of the register (tested under a debugger).
It does not touch EFLAGS either. I didn't test it in 64-bit mode, though.
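
A minimal sketch of how this could be checked under a debugger, assuming 32-bit code. The start value 11223344h is arbitrary and chosen only for illustration; the trailing int3 is just a convenient place to stop and look at the registers.

```asm
use32
        mov     eax, 11223344h  ; arbitrary test value (an assumption for this sketch)
        db      66h             ; operand-size prefix
        bswap   eax             ; encodes as 66 0F C8
        ; per the observation above, ax should now read zero while the
        ; upper 16 bits of eax keep their value
        int3                    ; breakpoint: inspect eax and eflags here
```

Since Intel documents the result as undefined, whatever this shows on one CPU should not be assumed for others.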

Some research:
https://gynvael.coldwind.pl/?id=268

But it used to do something different on the 486, or on specific versions of it:
https://www.df.lth.se/~john_e/gems/gem000c.html

_________________
Asm For Wise Humans



macomics 18 Dec 2025, 14:47
Taken by itself, bswap for an r16 would do the same as xchg rl, rh, i.e. swap the low and high bytes of the 16-bit register. bswap was introduced so that the high word of a 32-bit register could be operated on.
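
For comparison, a sketch of the documented ways to swap the two bytes of a 16-bit register. The register ax and the value 1234h are only examples chosen for this illustration.

```asm
use32
        mov     ax, 1234h       ; example value (an assumption for this sketch)
        xchg    al, ah          ; ax = 3412h, flags are not modified
        rol     ax, 8           ; swaps the bytes back: ax = 1234h, but CF/OF are affected
```

xchg is the form Intel's BSWAP description points to for word values; rol is listed only as a common equivalent when clobbering flags is acceptable.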



Ali.Z 18 Dec 2025, 15:02
Yes, I know and understand that bswap r16 is useless. But Intel says the result is undefined, while the behavior has been stable for more than two decades and some software relies on it.

I believe they had to do something about the operand-size prefix. Thinking about it from an engineering perspective, the best option is probably to accept the prefix and do almost nothing instead of generating #UD, which is likely the decision Intel made here.




macomics 18 Dec 2025, 15:50
Make sure that 0 is not being pushed into eax by the standard #UD handler.


revolution 18 Dec 2025, 21:08
The code produces the same result in 32-bit and 64-bit modes: it just zeroes the lowest 16 bits of the r32/r64.

An advantage to using it is that it is 3 bytes to encode, instead of 4 bytes for mov r16,0.

A disadvantage to using it is that it is undocumented, and future CPUs, or CPUs made by other manufacturers, might do something else.
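
The size difference can be checked by assembling the candidates and looking at the emitted bytes. This is a sketch assuming 32-bit code; the encodings in the comments are what these lines should assemble to. xor ax, ax is not mentioned above and is listed only as a documented 3-byte alternative, with the caveat that it modifies the flags.

```asm
use32
        db      66h
        bswap   eax             ; 66 0F C8     3 bytes, result undefined per Intel, flags untouched
        mov     ax, 0           ; 66 B8 00 00  4 bytes, documented, flags untouched
        xor     ax, ax          ; 66 31 C0     3 bytes, documented, modifies flags
```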



Ali.Z 19 Dec 2025, 01:26
I doubt Intel will change this, but yes, valid point: third-party x86 vendors and/or clones may behave differently.

I've been going through some Intel manuals, specifically the topics on instruction prefetch and the predecode stage of modern architectures. A length-changing prefix such as the operand-size prefix can slow down decoding of fetched instructions (*), from the typical 1 cycle to 6 cycles. The REX prefix is an exception: it can change the instruction length too, but it doesn't have this penalty.

(*) Not always: it seems Intel only discourages using the operand-size prefix with a word immediate. So bswap r16 isn't affected, but that doesn't change the fact that one should not use bswap r16.
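
To illustrate the footnote, a sketch of where the 66h prefix actually changes the instruction length, assuming 32-bit code. The immediate 1234h is an arbitrary example value.

```asm
use32
        mov     eax, 1234h      ; B8 34 12 00 00  imm32, no prefix
        mov     ax, 1234h       ; 66 B8 34 12     66h shrinks the immediate to imm16: length-changing
        db      66h
        bswap   eax             ; 66 0F C8        no immediate, so 66h does not change the bytes after the opcode
```

Only the second form is the kind the decoding penalty applies to; the prefixed bswap has the same opcode bytes with or without 66h.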

Copyright © 1999-2025, Tomasz Grysztar.