Support MOVBE

MazeGen

Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia

MazeGen 01 Jul 2009, 07:27

Just found this new instruction in the Intel manual:

MOVBE - Move Data After Swapping Bytes

Code:

0F 38 F0 /r MOVBE r16/32/64, m16/32/64
0F 38 F1 /r MOVBE m16/32/64, r16/32/64

fasm 1.69.00 doesn't know it.
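For readers who have not met the instruction yet, here is a minimal C sketch of what the two encodings above do, for the 32-bit operand size only. The names `swap32`, `movbe_load32` and `movbe_store32` are made up for illustration; this models the documented behaviour (move with the bytes reversed) and is not how any assembler or CPU implements it.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value (what BSWAP does to a register). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Model of "MOVBE r32, m32": load 4 bytes and return them byte-swapped. */
static uint32_t movbe_load32(const void *mem)
{
    uint32_t v;
    memcpy(&v, mem, sizeof v);   /* plain load in host byte order */
    return swap32(v);
}

/* Model of "MOVBE m32, r32": store the register value byte-swapped. */
static void movbe_store32(void *mem, uint32_t reg)
{
    uint32_t v = swap32(reg);
    memcpy(mem, &v, sizeof v);   /* plain store in host byte order */
}
```

Storing a value with `movbe_store32` and reading it back with `movbe_load32` returns the original value, since the two swaps cancel out.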

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20873
Location: In your JS exploiting you and your system

revolution 01 Jul 2009, 08:01

A pity there is no support for MOVBE r16/32/64, r16/32/64 :(

MazeGen 01 Jul 2009, 08:33

Yeah, it's weird that they didn't implement it. However, it is still useful because BSWAP works on registers only.

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8504
Location: Kraków, Poland

Tomasz Grysztar 01 Jul 2009, 08:50

Yes, looks like a nice instruction. How is it classified - is it SSE4 related?

MazeGen 01 Jul 2009, 09:12

It's classified among the Miscellaneous Instructions (together with LEA, NOP, ...) in the Basic Architecture manual. Its presence is indicated by CPUID.01H:ECX.MOVBE[bit 22].

Tomasz Grysztar 01 Jul 2009, 09:35

Hmm, perhaps that's why I missed it. Its introduction is not that much different from that of the POPCNT instruction (which is also indicated by its own bit, CPUID.01H:ECX.POPCNT[bit 23]), but while POPCNT is listed as an SSE4 instruction, MOVBE is not. Quite a mess.
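Since each of the two instructions has its own feature bit, a program has to test them separately. A rough C sketch follows, assuming a GCC-compatible compiler for the part that actually executes CPUID (via `__get_cpuid` from `<cpuid.h>`). The helper names `ecx_has_movbe`, `ecx_has_popcnt` and `cpu_has_movbe` are invented here; only the bit positions come from the posts above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in ECX after CPUID leaf 01H, as quoted in the thread. */
#define CPUID1_ECX_MOVBE  (UINT32_C(1) << 22)
#define CPUID1_ECX_POPCNT (UINT32_C(1) << 23)

/* Pure helpers: decode an ECX value that was obtained elsewhere. */
static bool ecx_has_movbe(uint32_t ecx)  { return (ecx & CPUID1_ECX_MOVBE)  != 0; }
static bool ecx_has_popcnt(uint32_t ecx) { return (ecx & CPUID1_ECX_POPCNT) != 0; }

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the running CPU; returns false if leaf 01H is unavailable. */
static bool cpu_has_movbe(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return ecx_has_movbe(ecx);
}
#endif
```

Keeping the bit decoding separate from the CPUID call makes the check easy to exercise on any machine, including ones without MOVBE.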

MazeGen 02 Jul 2009, 09:34

Code:

movbe ax, dx

Quote:

flat assembler version 1.69.01 (1199842 kilobytes memory)
1 passes, 4 bytes.

(it assembles, although the form with both operands being registers is not allowed)

Tomasz Grysztar 02 Jul 2009, 09:58

Can someone test it? Maybe it's undocumented, but works? ;)

MazeGen 02 Jul 2009, 10:04

Seems like the first steppings of Core 2 don't support this instruction, so it will be hard to find someone able to test it :)

pal

Joined: 26 Aug 2008
Posts: 227

pal 02 Jul 2009, 18:53

Quote:

Maybe it's undocumented, but works?

Maybe I have misunderstood you, but it is in the Instruction Set Reference A-M (Vol. 2A 3-657), so it is documented.

LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4623
Location: Argentina

LocoDelAssembly 02 Jul 2009, 19:08

Quote:

Maybe I have misunderstood you, but it is in the Instruction Set Reference A-M (Vol. 2A 3-657), so it is documented.

Yes, you did :)

He meant that perhaps "movbe reg, reg" is supported even though the operand combination is undocumented.

I've tried to test it on my brother's computer, but it seems to be too old a Core 2, because "movbe [var], edx" crashed the application.

r22

Joined: 27 Dec 2004
Posts: 805

r22 02 Jul 2009, 19:09

With PSHUFB already added to SSE[?], MOVBE seems pointless, unless it is optimized (unlike the string instructions) to be faster than MOV/BSWAP.

Seems redundant.

LocoDelAssembly 02 Jul 2009, 19:25

r22, but PSHUFB needs either the FPU state or the SSE state restored, while MOVBE can work with GPRs.

I don't know the real intent of this instruction, but perhaps it is to make networking drivers faster by storing in network order (big-endian) and reading with a direct conversion to the native format (little-endian) in a single step?

Tomasz Grysztar 02 Jul 2009, 19:52

It seems like a very elegant instruction to generally operate on big-endian fields in data structures (including the network addresses, but also perhaps ASN.1/BER structures, smart card APDUs, etc.).

pal 02 Jul 2009, 20:42

Quote:

He meant that perhaps "movbe reg, reg" is supported even though the operand combination is undocumented.

I thought I must have, to be honest :P

That would have been too easy.

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 05 Jul 2009, 12:15

LOAD+BSWAP will always be faster because there are many ways to throw it to the uop schedulers. MOVBE is hindered by a fixed set of uops and will therefore penalize performance. Moreover, this instruction does not seem to replace BSWAP, but rather adds a [mem] capability to it.

If it is this silent and you have to test a separate bit for it (i.e. SSE4 doesn't guarantee this bit being set) then this is the perfect formula for failure.

Summary:
-macro-op instruction (at least 2 uops)
-already has an elegant replacement as MOV reg,mem+BSWAP reg
-is not guaranteed on newer CPUs
-is not made popular with advertising

:(

I don't get this instruction ... it seems to be some instruction from the times of LOOP and AAA and such...

Tomasz Grysztar 05 Jul 2009, 15:12

Madis731 wrote:

-already has an elegant replacement as MOV reg,mem+BSWAP reg

But in the opposite direction you would have to do BSWAP reg + MOV mem,reg + BSWAP reg

bitRAKE

Joined: 21 Jul 2003
Posts: 4372
Location: vpcmpistri

bitRAKE 05 Jul 2009, 19:00

...or MOV rtemp,reg + BSWAP rtemp + MOV mem,rtemp
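The two store sequences mentioned above can be sketched in C as follows. This is only a model to show that both leave the same bytes in memory and the source value intact; `swap32`, `store_swapped_in_place` and `store_swapped_via_temp` are hypothetical names, and a single MOVBE m32, r32 would replace either sequence.

```c
#include <stdint.h>
#include <string.h>

/* Byte reversal of a 32-bit value, standing in for BSWAP. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* BSWAP reg + MOV mem,reg + BSWAP reg:
   the register is swapped, stored, then swapped back to its old value. */
static void store_swapped_in_place(void *mem, uint32_t *reg)
{
    *reg = swap32(*reg);
    memcpy(mem, reg, sizeof *reg);
    *reg = swap32(*reg);
}

/* MOV rtemp,reg + BSWAP rtemp + MOV mem,rtemp:
   the original register is never modified, at the cost of a scratch register. */
static void store_swapped_via_temp(void *mem, uint32_t reg)
{
    uint32_t rtemp = reg;
    rtemp = swap32(rtemp);
    memcpy(mem, &rtemp, sizeof rtemp);
}
```

Whether a compiler turns such a swap-then-store pattern into MOVBE depends on the compiler and target options (GCC, for instance, has an `-mmovbe` switch), so treat that as something to verify in the generated code rather than assume.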

(Methinks the Intel compiler fails to produce optimal code with BSWAP?)

Madis731 06 Jul 2009, 08:13

Tomasz and bitRAKE - you might be on to something :D

- I tend to agree.

revolution 07 Jul 2009, 16:41

I think MOVBE would be most useful in text sorting. Network address computations are hardly a case for optimising your code, but sorting data can take a considerable time in some cases and a simple optimisation with MOVBE will help a lot.
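To illustrate why a byte-swapping load helps with sorting text: if 8 bytes of a string are read as a big-endian integer, a single unsigned integer comparison orders them the same way a byte-by-byte comparison would. Below is a portable C sketch of that idea, assuming fixed 8-byte keys; `load_be64` and `compare_key8` are illustrative names, and on x86-64 the load could be done with one 64-bit MOVBE instead of the loop.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes as a big-endian integer: the first byte becomes the
   most significant one, which is what a 64-bit MOVBE load produces. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Compare two 8-byte keys with one integer comparison.
   Returns a negative, zero or positive value, like memcmp. */
static int compare_key8(const unsigned char *a, const unsigned char *b)
{
    uint64_t x = load_be64(a);
    uint64_t y = load_be64(b);
    return (x > y) - (x < y);
}
```

Longer strings would need the comparison repeated per 8-byte chunk plus handling for the tail, which this sketch leaves out.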
