flat assembler
Message board for the users of flat assembler.
  
Support MOVBE

revolution (01 Jul 2009, 08:01):
A pity there is no support for MOVBE r16/32/64, r16/32/64.

MazeGen (01 Jul 2009, 08:33):
Yeah, it's weird that they didn't implement it. However, it is still useful because BSWAP can work with registers only.

Tomasz Grysztar (01 Jul 2009, 08:50):
Yes, looks like a nice instruction. How is it classified - is it SSE4 related?

MazeGen (01 Jul 2009, 09:12):
It's classified among Miscellaneous Instructions (together with LEA, NOP, ...) in the Basic Architecture manual. Its presence is indicated by CPUID.01H:ECX.MOVBE[bit 22].

Tomasz Grysztar (01 Jul 2009, 09:35):
Hmm, perhaps that's why I missed it. Its introduction is not that much different from that of the POPCNT instruction (which is also indicated by its own bit, CPUID.01H:ECX.POPCNT[bit 23]), but while POPCNT is listed as an SSE4 instruction, MOVBE is not. Quite a mess.

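Editor's sketch (not part of the thread): the two feature bits mentioned above can be tested independently of each other, which is the point being made about MOVBE and POPCNT each having their own flag. The helper and macro names below are hypothetical, and the code assumes the caller has already obtained the ECX value returned by CPUID leaf 01H by whatever means the toolchain offers.

```c
#include <stdint.h>

/* Bit positions as quoted in the thread:
   CPUID.01H:ECX.MOVBE[bit 22] and CPUID.01H:ECX.POPCNT[bit 23].
   The macro and function names are made up for this illustration. */
#define CPUID_1_ECX_MOVBE  (UINT32_C(1) << 22)
#define CPUID_1_ECX_POPCNT (UINT32_C(1) << 23)

/* `ecx` is assumed to be the ECX output of CPUID with EAX = 1. */
static int has_movbe(uint32_t ecx)
{
    return (ecx & CPUID_1_ECX_MOVBE) != 0;
}

static int has_popcnt(uint32_t ecx)
{
    return (ecx & CPUID_1_ECX_POPCNT) != 0;
}
```

Because the bits are separate, a program that wants to use MOVBE would check `has_movbe` directly rather than infer it from POPCNT or any SSE level.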
MazeGen (02 Jul 2009, 09:34):
Code:
    movbe ax, dx
Quote:
    flat assembler version 1.69.01 (1199842 kilobytes memory)
(both register operands are not allowed)

Tomasz Grysztar (02 Jul 2009, 09:58):
Can someone test it? Maybe it's undocumented, but works?

MazeGen (02 Jul 2009, 10:04):
Seems like the first steppings of Core 2 don't support this instruction, so it will be hard to find someone able to test it.

pal (02 Jul 2009, 18:53):
Quote: Maybe it's undocumented, but works?
Maybe I have misunderstood you, but it is in the Instruction Set Reference A-M (Vol. 2A 3-657), so it is documented.

LocoDelAssembly (02 Jul 2009, 19:08):
Yes, you did. He meant that perhaps "movbe reg, reg" is supported even though the operand combination is undocumented. I've tried to test it on my brother's computer, but it seems to be too old a Core 2, because "movbe [var], edx" crashed the application.

r22 (02 Jul 2009, 19:09):
With PSHUFB already added to SSE[?], MOVBE seems pointless, unless it is optimized (unlike the string instructions) to be faster than MOV/BSWAP.
Seems redundant.

LocoDelAssembly (02 Jul 2009, 19:25):
r22, but PSHUFB needs either the FPU state or the SSE state restored, while MOVBE can work with GPRs.
I don't know the real intent of this instruction, but perhaps it is to make networking drivers faster by storing in network order (big-endian) and reading with a direct conversion to native format (little-endian) in a single step?

Tomasz Grysztar (02 Jul 2009, 19:52):
It seems like a very elegant instruction to generally operate on big-endian fields in data structures (including the network addresses, but also perhaps ASN.1/BER structures, smart card APDUs, etc.).

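Editor's sketch (not part of the thread): reading big-endian fields out of a byte buffer, which is the job a MOVBE load would do in one instruction. The function names and the example record layout (a 16-bit field followed by a 32-bit field) are hypothetical; the code is written with shifts so that it gives the same result on any host byte order.

```c
#include <stdint.h>

/* Read a 16-bit big-endian field starting at p.
   Models the effect of a 16-bit load followed by a byte swap. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read a 32-bit big-endian field starting at p.
   Models the effect of MOV reg,mem + BSWAP reg (or a MOVBE load). */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For a buffer holding the bytes 00 50 12 34 56 78, `load_be16(buf)` yields 0x0050 and `load_be32(buf + 2)` yields 0x12345678.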
pal (02 Jul 2009, 20:42):
Quote: He meant that perhaps "movbe reg, reg" is supported despite the operand combination is undocumented.
I thought I must have, to be honest. That would have been too easy a one.

Madis731 (05 Jul 2009, 12:15):
LOAD+BSWAP will always be faster because there are many ways to throw it to the uop-schedulers. MOVBE is hindered with constant uops and therefore will penalize performance. Moreover, this instruction does not seem to replace BSWAP, but to add [mem] capability to it.
If it is this silent and you have to test a separate bit for it (i.e. SSE4 doesn't guarantee this bit being set), then this is the perfect formula for failure.
Summary:
-macroop instruction (at least 2 uops)
-already has an elegant replacement as MOV reg,mem+BSWAP reg
-is not guaranteed on newer CPUs
-is not made popular with advertising
I don't get this instruction ... it seems to be some instruction from the times of LOOP and AAA and such...

Tomasz Grysztar (05 Jul 2009, 15:12):
Madis731 wrote: -already has an elegant replacement as MOV reg,mem+BSWAP reg
But in the opposite direction you would have to do BSWAP reg + MOV mem,reg + BSWAP reg

bitRAKE (05 Jul 2009, 19:00):
...or MOV rtemp,reg + BSWAP rtemp + MOV mem,rtemp
(methinks the Intel compiler fails to produce optimal code with BSWAP?)

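Editor's sketch (not part of the thread): the store direction discussed above, written in the shape of the temporary-register variant (copy, swap the copy, store the copy), so the original value is left untouched. The function names are hypothetical. The final store is spelled out byte by byte as a little-endian store, which is what a plain MOV to memory does on x86, so the sketch behaves the same on any host.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (what BSWAP does to a register). */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & UINT32_C(0x0000FF00)) |
           ((v << 8) & UINT32_C(0x00FF0000)) |
           (v << 24);
}

/* Store `value` at dst as a big-endian 32-bit field. */
static void store_be32(unsigned char *dst, uint32_t value)
{
    uint32_t tmp = bswap32(value);          /* MOV rtemp,reg + BSWAP rtemp */

    /* MOV mem,rtemp: an x86 store writes the low byte first. */
    dst[0] = (unsigned char)(tmp & 0xFF);
    dst[1] = (unsigned char)((tmp >> 8) & 0xFF);
    dst[2] = (unsigned char)((tmp >> 16) & 0xFF);
    dst[3] = (unsigned char)((tmp >> 24) & 0xFF);
}
```

Storing 0x12345678 this way leaves the bytes 12 34 56 78 in memory, which is the result a single MOVBE store is meant to produce.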
Madis731 (06 Jul 2009, 08:13):
Tomasz and bitRAKE - you might be on to something - I tend to agree.

revolution (07 Jul 2009, 16:41):
I think MOVBE would be most useful in text sorting. Network address computations are hardly a case for optimising your code, but sorting data can take a considerable time in some cases and a simple optimisation with MOVBE will help a lot.

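Editor's sketch (not part of the thread): one possible reading of the text-sorting suggestion. If eight bytes of each key are loaded in big-endian order, a single unsigned integer comparison gives the same ordering as a byte-by-byte comparison of those eight bytes. The function names are hypothetical, and the keys are assumed to be at least 8 bytes long (padded if necessary).

```c
#include <stdint.h>

/* Load 8 bytes starting at p as a big-endian 64-bit value.
   On hardware with MOVBE this could be a single 64-bit MOVBE load. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | (uint64_t)p[i];
    return v;
}

/* Compare the first 8 bytes of two keys.
   Returns -1, 0 or 1; the sign matches a byte-wise (memcmp-style)
   comparison of the same 8 bytes. */
static int compare_key8(const unsigned char *a, const unsigned char *b)
{
    uint64_t x = load_be64(a);
    uint64_t y = load_be64(b);
    return (x > y) - (x < y);
}
```

A sort routine could use `compare_key8` on the leading bytes of each string and fall back to a full comparison only when the result is 0.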
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.