bswap r16

Ali.Z

Joined: 08 Jan 2018
Posts: 846

Ali.Z 18 Dec 2025, 14:37

Code:

use32
db 66h
bswap eax

apparently a valid instruction; it clears r16 (tested under a debugger).
(it does not touch eflags either; I didn't test it in 64-bit mode, though.)

some research:
https://gynvael.coldwind.pl/?id=268

but it used to do something different on the 486, or on specific versions of it:
https://www.df.lth.se/~john_e/gems/gem000c.html

_________________
Asm For Wise Humans

macomics

Joined: 26 Jan 2021
Posts: 1208
Location: Russia

macomics 18 Dec 2025, 14:47

By itself, a bswap on an r16 would amount to xchg rl, rh, i.e. swapping the two bytes of the word. The bswap instruction was introduced so that it was possible to operate on the high word of 32-bit registers.
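To make the xchg rl, rh comparison concrete, here is a minimal C sketch of what a "true" 16-bit byte swap would compute. The function name swap_low_word_bytes is made up for illustration; it models the intuitive semantics only, not what CPUs are reported to do with the 66h-prefixed encoding.

```c
#include <stdint.h>

/* Hypothetical helper: models the intuitive 16-bit byte swap
   (the effect of xchg al, ah on eax). The upper 16 bits are untouched.
   This is NOT the behaviour reported for 66h + bswap in this thread. */
static uint32_t swap_low_word_bytes(uint32_t r32)
{
    uint32_t lo = r32 & 0xFFu;          /* current low byte  (al) */
    uint32_t hi = (r32 >> 8) & 0xFFu;   /* current high byte (ah) */
    return (r32 & 0xFFFF0000u) | (lo << 8) | hi;
}
```

For example, passing 0x12345678 gives 0x12347856: the low word 0x5678 becomes 0x7856 and the high word stays as it was.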

Ali.Z 18 Dec 2025, 15:02

yes, I know and understand that bswap r16 is useless, but intel says it is undefined, while the behavior has been stable for more than two decades and some software relies on it.

but I believe they had to do something about the operand-size prefix. thinking about it from an engineering perspective, the best option is probably to accept the prefix and do almost nothing instead of generating #UD, which is likely the decision intel made here.

macomics 18 Dec 2025, 15:50

Make sure that the 0 is not being written into eax by the standard #UD handler.

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20870
Location: In your JS exploiting you and your system

revolution 18 Dec 2025, 21:08

The code produces the same result in 32-bit and 64-bit modes: it just zeros the lowest 16 bits of the r32/r64.

An advantage to using it is that it is 3 bytes to encode, instead of 4 bytes for mov r16,0.

A disadvantage to using it is that it is undocumented, and future CPUs, or CPUs made by other manufacturers, might do something else.
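As a rough illustration of the two points above, the C sketch below models the reported effect and lists the two encodings being compared. The names bswap16_as_reported, enc_bswap_ax and enc_mov_ax_0 are invented for this example, the byte sequences are the 32-bit mode encodings with eax/ax as the register, and the function only mirrors what was observed in this thread, not anything Intel documents.

```c
#include <stdint.h>

/* Hypothetical model of the behaviour reported in this thread:
   the 66h-prefixed bswap clears the low 16 bits of the register
   and leaves all higher bits (of r32 or r64) unchanged. */
static uint64_t bswap16_as_reported(uint64_t reg)
{
    return reg & ~(uint64_t)0xFFFF;
}

/* Encodings in 32-bit mode, for size comparison only. */
static const uint8_t enc_bswap_ax[] = { 0x66, 0x0F, 0xC8 };        /* db 66h / bswap eax */
static const uint8_t enc_mov_ax_0[] = { 0x66, 0xB8, 0x00, 0x00 };  /* mov ax, 0 */
```

sizeof enc_bswap_ax is 3 and sizeof enc_mov_ax_0 is 4, which is the one-byte saving mentioned above.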

Ali.Z 19 Dec 2025, 01:26

I doubt intel will change this, but yes, valid point: 3rd-party x86 vendors and/or clones may behave differently.

I've been going through some intel manuals, specifically the topics on instruction prefetch and the predecode stage of modern architectures. a length-changing prefix such as the operand-size prefix can slow down decoding of the fetched instructions (*), from the typical 1 cycle to 6 cycles. the exception is REX, which doesn't have this penalty even when it changes the instruction length.

(*) not always; it seems intel only discourages using the operand-size prefix with a word immediate. so bswap r16 isn't affected, but that does not change the fact that one should not use bswap r16.
