flat assembler
Message board for the users of flat assembler.

Support MOVBE

MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
Just found this new instruction in Intel manual:

MOVBE - Move Data After Swapping Bytes
Code:
0F 38 F0 /r MOVBE r16/32/64, m16/32/64
0F 38 F1 /r MOVBE m16/32/64, r16/32/64    

fasm 1.69.00 doesn't know it.
Post 01 Jul 2009, 07:27
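For readers who want the semantics of the two encodings spelled out, here is a minimal C sketch of what the load and store forms do for a 32-bit operand. The helper names (`swap32`, `movbe_load32`, `movbe_store32`) are made up for illustration; this models the behaviour in portable C and is not how a compiler would emit the instruction.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value (the effect BSWAP has on a register). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Model of "MOVBE r32, m32": read memory in host order, then reverse the bytes. */
static uint32_t movbe_load32(const void *mem)
{
    uint32_t v;
    memcpy(&v, mem, sizeof v);
    return swap32(v);
}

/* Model of "MOVBE m32, r32": reverse the bytes, then write to memory. */
static void movbe_store32(void *mem, uint32_t reg)
{
    uint32_t v = swap32(reg);
    memcpy(mem, &v, sizeof v);
}
```

A store followed by a load through these helpers returns the original value, since the two swaps cancel.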
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17332
Location: In your JS exploiting you and your system
A pity there is no support for MOVBE r16/32/64, r16/32/64 :(
Post 01 Jul 2009, 08:01
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
Yeah, it's weird that they didn't implement it. However, it is still useful because BSWAP works only with registers.
Post 01 Jul 2009, 08:33
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Yes, looks like a nice instruction. How is it classified - is it SSE4 related?
Post 01 Jul 2009, 08:50
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
It's classified among Miscellaneous Instructions (together with LEA, NOP, ...) in the Basic Architecture manual. Its presence is indicated by CPUID.01H:ECX.MOVBE[bit 22].
Post 01 Jul 2009, 09:12
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Hmm, perhaps that's why I missed it. Its introduction is not much different from that of the POPCNT instruction (which is also indicated by its own bit, CPUID.01H:ECX.POPCNT[bit 23]), but while POPCNT is listed as an SSE4 instruction, MOVBE is not. Quite a mess.
Post 01 Jul 2009, 09:35
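The two feature bits mentioned above can be tested as in the following C sketch. The bit positions (22 for MOVBE, 23 for POPCNT in ECX of CPUID leaf 01H) come from the posts; the macro and function names are invented for this example, and the `cpu_has_movbe` wrapper assumes a GCC or Clang toolchain on x86, where `<cpuid.h>` provides `__get_cpuid`.

```c
#include <stdint.h>

/* Feature bits in ECX returned by CPUID leaf 01H. */
#define CPUID1_ECX_MOVBE  (UINT32_C(1) << 22)
#define CPUID1_ECX_POPCNT (UINT32_C(1) << 23)

/* Pure helpers: decide from an ECX value already obtained from CPUID. */
static int ecx_has_movbe(uint32_t ecx)  { return (ecx & CPUID1_ECX_MOVBE)  != 0; }
static int ecx_has_popcnt(uint32_t ecx) { return (ecx & CPUID1_ECX_POPCNT) != 0; }

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Returns 1 if the running CPU reports MOVBE, 0 otherwise. */
static int cpu_has_movbe(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return ecx_has_movbe(ecx);
}
#endif
```

Because each instruction has its own bit, code that wants to use MOVBE would check bit 22 itself rather than infer support from SSE4 or POPCNT.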
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
Code:
movbe ax, dx    
Quote:
flat assembler version 1.69.01 (1199842 kilobytes memory)
1 passes, 4 bytes.

:P

(a form with two register operands is not allowed)
Post 02 Jul 2009, 09:34
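Whether an encoding is the register or the memory form is decided by the `mod` field of the ModRM byte (the `/r` in the opcode listing): `mod = 11b` selects a register operand, anything else a memory operand. The C sketch below only illustrates that field layout; the helper names are hypothetical, and the register numbers used in the note (AX = 0, DX = 2) are the standard x86 encodings, not taken from fasm's output.

```c
#include <stdint.h>

/* Build a ModRM byte from its fields: mod (2 bits), reg (3 bits), rm (3 bits). */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* mod == 3 means the r/m field names a register rather than a memory operand. */
static int modrm_is_register_form(uint8_t byte)
{
    return (byte >> 6) == 3u;
}
```

With these helpers, `modrm(3, 0, 2)` gives `0xC2`, the ModRM byte a register-to-register `ax, dx` pair would need, which is exactly the combination the manual does not list for MOVBE.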
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Can someone test it? Maybe it's undocumented, but works? ;)
Post 02 Jul 2009, 09:58
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
It seems the first steppings of Core 2 don't support this instruction, so it will be hard to find someone able to test it :)
Post 02 Jul 2009, 10:04
pal



Joined: 26 Aug 2008
Posts: 227
Quote:
Maybe it's undocumented, but works?


Maybe I have misunderstood you, but it is in the Instruction Set Reference A-M (Vol. 2A, 3-657), so it is documented.
Post 02 Jul 2009, 18:53
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Quote:

Maybe I have misunderstood you, but it is in the Instruction Set Reference A-M (Vol. 2A, 3-657), so it is documented.

Yes, you did :)

He meant that perhaps "movbe reg, reg" is supported even though the operand combination is undocumented.

I tried to test it on my brother's computer, but it seems to be too old a Core 2, because "movbe [var], edx" crashed the application.
Post 02 Jul 2009, 19:08
r22



Joined: 27 Dec 2004
Posts: 805
With PSHUFB already added in SSSE3, MOVBE seems pointless unless it is optimized (unlike the string instructions) to be faster than MOV/BSWAP.

Seems redundant.
Post 02 Jul 2009, 19:09
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
r22, but PSHUFB needs either the FPU state or the SSE state to be saved and restored, while MOVBE works with GPRs.

I don't know the real intent of this instruction, but perhaps it is to make networking drivers faster by storing in network order (big-endian) and reading with a direct conversion to native format (little-endian) in a single step?
Post 02 Jul 2009, 19:25
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
It seems like a very elegant instruction to generally operate on big-endian fields in data structures (including the network addresses, but also perhaps ASN.1/BER structures, smart card APDUs, etc.).
Post 02 Jul 2009, 19:52
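As an illustration of the "big-endian fields in data structures" case, the C sketch below reads a 16-bit big-endian length out of a small header. The record layout (one tag byte followed by a two-byte length) is a made-up example, not actual BER or APDU encoding, and all names are hypothetical. On hardware with MOVBE, the field read in `read_be16` is the kind of access a single 16-bit MOVBE load could perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record header: 1-byte tag, then a 16-bit big-endian length. */
struct tlv_header {
    uint8_t  tag;
    uint16_t length;
};

/* Read a 16-bit big-endian field byte by byte, independent of host byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the 3-byte header; returns 0 on success, -1 if the buffer is too short. */
static int parse_tlv_header(const uint8_t *buf, size_t len, struct tlv_header *out)
{
    if (len < 3)
        return -1;
    out->tag = buf[0];
    out->length = read_be16(buf + 1);
    return 0;
}
```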
pal



Joined: 26 Aug 2008
Posts: 227
Quote:
He meant that perhaps "movbe reg, reg" is supported even though the operand combination is undocumented.


I thought I must have, to be honest :P That would have been too easy.
Post 02 Jul 2009, 20:42
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
LOAD+BSWAP will always be faster because there are many ways to throw it to the uop-schedulers. MOVBE is hindered by its fixed uops and will therefore penalize performance. Moreover, this instruction does not seem to replace BSWAP, but rather to add [mem] capability to it.

If it is this silent and you have to test a separate bit for it (i.e. SSE4 doesn't guarantee this bit being set) then this is the perfect formula for failure.

Summary:
-macroop instruction (at least 2 uops)
-already has an elegant replacement as MOV reg,mem+BSWAP reg
-is not guaranteed on newer CPUs
-is not made popular with advertising

:(

I don't get this instruction ... it seems to be some instruction from the times of LOOP and AAA and such...
Post 05 Jul 2009, 12:15
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Madis731 wrote:
-already has an elegant replacement as MOV reg,mem+BSWAP reg

But in the opposite direction you would have to do BSWAP reg + MOV mem,reg + BSWAP reg
Post 05 Jul 2009, 15:12
bitRAKE



Joined: 21 Jul 2003
Posts: 2914
Location: [RSP+8*5]
...or MOV rtemp,reg + BSWAP rtemp + MOV mem,rtemp

(methinks the Intel compiler fails to produce optimal code with BSWAP?)
Post 05 Jul 2009, 19:00
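The two store sequences from the last two posts can be modelled in C as below. This is only a sketch of the data flow: `reg` stands in for the source register, the function names are invented, and a real compiler is free to emit something different.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value (the effect of BSWAP). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* BSWAP reg + MOV mem,reg + BSWAP reg: the source is swapped and then restored. */
static void store_swapped_in_place(void *mem, uint32_t *reg)
{
    *reg = swap32(*reg);
    memcpy(mem, reg, sizeof *reg);
    *reg = swap32(*reg);
}

/* MOV rtemp,reg + BSWAP rtemp + MOV mem,rtemp: the source is never modified. */
static void store_swapped_via_temp(void *mem, const uint32_t *reg)
{
    uint32_t tmp = *reg;
    tmp = swap32(tmp);
    memcpy(mem, &tmp, sizeof tmp);
}
```

Both variants leave the same bytes in memory and the same value in `reg`; they differ in needing either a second swap or a spare register, which is the cost a store-form MOVBE would avoid.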
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Tomasz and bitRAKE - you might be on to something :D - I tend to agree.
Post 06 Jul 2009, 08:13
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17332
Location: In your JS exploiting you and your system
I think MOVBE would be most useful in text sorting. Network address computations are hardly a case for optimising your code, but sorting data can take a considerable time in some cases and a simple optimisation with MOVBE will help a lot.
Post 07 Jul 2009, 16:41
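One way to read the text-sorting suggestion: if several bytes of each string are loaded as a big-endian integer, an unsigned integer comparison gives the same ordering as a byte-by-byte comparison, so several characters are compared at once. The C sketch below shows this for fixed 4-byte keys; the function names are hypothetical, and the big-endian load is written out byte by byte where MOVBE hardware could do it in a single instruction.

```c
#include <stdint.h>

/* Load 4 bytes as a big-endian integer: the first byte is the most significant. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Compare two 4-byte keys; the result has the same sign as memcmp(a, b, 4). */
static int compare_key4(const unsigned char *a, const unsigned char *b)
{
    uint32_t x = load_be32(a);
    uint32_t y = load_be32(b);
    return (x > y) - (x < y);
}
```

A little-endian load would make the last byte the most significant, which is why the swap is needed before the integer comparison.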
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.
