flat assembler
Message board for the users of flat assembler.

mov edx, eax bug!

lazer1



Joined: 24 Jan 2006
Posts: 185
Very strange bug in long mode:

mov edx, eax

is bugged!

Example code fragment:

Code:

use64

    mov rdx, 123h
    
    ; echo rdx => 123h     CORRECT

    shl rdx, 32

    ; echo rdx => 12300000000h  CORRECT

    mov rax, 456789abh

    mov edx, eax

    ; echo rdx =>   456789h   WRONG!

    


Doing

mov edx, eax

in long mode clears the upper 32 bits of rdx.

If instead I do:

Code:

use64

     mov rdx, 123h
     shl rdx, 32
     mov rax, 456789abh

     or rdx, rax

     ; echo rdx => 123456789abh   CORRECT!


    


Any explanation for this problem?
Post 16 Feb 2007, 01:52
LocoDelAssembly


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Quote:
; echo rdx => 456789h WRONG!

Checking with WinDBG (and generating a PE64 instead), I get 456789abh, and that's right: the AMD64 architecture clears the upper 32 bits of a 64-bit register when you write to it through its low 32-bit part.
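A rough way to picture the rule is the C model below. It is only a sketch of the register semantics, not CPU code, and the function names are made up for illustration. The first function is what "mov edx, eax" does in 64-bit mode; the second is the merge the first post expected, which no single MOV performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "mov edx, eax" in 64-bit mode: the 32-bit
   result is zero-extended, so the old RDX[63:32] is discarded. */
static uint64_t mov_edx_eax(uint64_t rdx, uint64_t rax)
{
    (void)rdx;              /* nothing of the old RDX survives */
    return (uint32_t)rax;   /* EAX, zero-extended to 64 bits */
}

/* What the first post expected: replace EDX, keep RDX[63:32]. */
static uint64_t merge_low32(uint64_t rdx, uint64_t rax)
{
    return (rdx & UINT64_C(0xFFFFFFFF00000000)) | (uint32_t)rax;
}
```

With the values from the first post (rdx = 12300000000h, rax = 456789abh), mov_edx_eax gives 456789abh, the value seen in WinDBG, while merge_low32 gives 123456789abh.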
Post 16 Feb 2007, 02:45
lazer1



Joined: 24 Jan 2006
Posts: 185
LocoDelAssembly wrote:
Quote:
; echo rdx => 456789h WRONG!

Checking with WinDBG (and generating a PE64 instead), I get 456789abh, and that's right: the AMD64 architecture clears the upper 32 bits of a 64-bit register when you write to it through its low 32-bit part.


Have you got access to a 64-bit Intel CPU to try this on?

(Both my 64-bit machines are AMD.)

I needed to do "mov edx, eax" after rdmsr, which returns
the 64-bit value in edx:eax, and doing this I found values that
didn't look right.

rdmsr returns the value in edx:eax so that it can be used in CPU modes
where you cannot use rdx.
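As a sketch (a C model, not real MSR access; the struct and function names are invented), the split that RDMSR performs in 64-bit mode looks like this. Both halves are written as 32-bit registers, so the upper halves of RAX and RDX come out zero, which is also what the experiment later in this thread found.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file holding only the registers RDMSR writes. */
struct regs {
    uint64_t rax;
    uint64_t rdx;
};

/* Model of RDMSR in 64-bit mode: EDX = MSR[63:32], EAX = MSR[31:0].
   Both are 32-bit writes, so RAX[63:32] and RDX[63:32] become zero. */
static void rdmsr_model(struct regs *r, uint64_t msr_value)
{
    r->rax = (uint32_t)msr_value;          /* EAX = low half  */
    r->rdx = (uint32_t)(msr_value >> 32);  /* EDX = high half */
}
```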


Clearing of the upper bits really should be done by movzx;
mov is supposed to preserve all bits not named in the
destination.
Post 16 Feb 2007, 14:10
LocoDelAssembly


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Yes, I see you found it strange, because writing to DL doesn't alter DH and writing to DX doesn't alter the upper 16 bits of EDX. The AMD64 architecture (and EM64T) follows those rules only up to that point: if you write to a 32-bit register, the upper 32 bits WILL BE zeroed automatically (the result is zero-extended), so there is no need for a MOVZX as in the other cases.
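The four cases can be modelled in C like this (a sketch of the architectural rule only; the names are illustrative). 8-bit and 16-bit writes merge into the old register value, while a 32-bit write replaces all 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Writing DL: bits 7:0 change, everything else is kept. */
static uint64_t write_dl(uint64_t rdx, uint8_t v)
{
    return (rdx & ~UINT64_C(0xFF)) | v;
}

/* Writing DH: bits 15:8 change, everything else is kept. */
static uint64_t write_dh(uint64_t rdx, uint8_t v)
{
    return (rdx & ~UINT64_C(0xFF00)) | ((uint64_t)v << 8);
}

/* Writing DX: bits 15:0 change, everything else is kept. */
static uint64_t write_dx(uint64_t rdx, uint16_t v)
{
    return (rdx & ~UINT64_C(0xFFFF)) | v;
}

/* Writing EDX in 64-bit mode: the value is zero-extended,
   so nothing of the old RDX survives. */
static uint64_t write_edx(uint64_t rdx, uint32_t v)
{
    (void)rdx;
    return v;
}
```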

If you are operating on 32-bit values then you have no problem, and RDMSR is returning two 32-bit values, so why is this zeroing a problem for you?

And no, I don't have any Intel CPUs anymore, sorry, but the same will happen there, because EM64T is a clone of the AMD64 architecture with VERY few differences, and the handling of 32-bit registers doesn't differ between the two.
Post 16 Feb 2007, 14:45
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
This has been posted about before, anyway.
Post 16 Feb 2007, 15:07
lazer1



Joined: 24 Jan 2006
Posts: 185
LocoDelAssembly wrote:
Yes, I see you found it strange, because writing to DL doesn't alter DH and writing to DX doesn't alter the upper 16 bits of EDX. The AMD64 architecture (and EM64T) follows those rules only up to that point: if you write to a 32-bit register, the upper 32 bits WILL BE zeroed automatically (the result is zero-extended), so there is no need for a MOVZX as in the other cases.


Where in the AMD docs is this documented?

Quote:

If you are operating on 32-bit values then you have no problem, and RDMSR is returning two 32-bit values, so why is this zeroing a problem for you?


Well, two 32-bit values aren't very useful; if something is 64-bit, I want it
in one 64-bit register.

What they should have done is make rdmsr store the
64-bit value in rax and the upper 32 bits in edx;
that way it is available BOTH as two 32-bit values and as one 64-bit value.

There isn't a problem if I use some other path, e.g.:

Code:
     mov ecx, MSR_ADDRESS
     mov rax, 0   
     rdmsr   ; we hope this only modifies edx and eax and no other bits,
     shl rdx, 32
     or rdx, rax
    


It's just that you expect, on all CPUs and not only x86, that
a mov instruction affects only the destination and nothing else.

If it affects other stuff, then the syntax should reflect this,
e.g.:
Code:
move rdx, eax
    


What it means is that AMD64 doesn't have a "mov edx, eax"
but only a "movzx rdx, eax", which it has misleadingly named
"mov edx, eax".

The MTRR MSRs do use all 64 bits even on a 32-bit system, which is
where I realised there was a bug. The problem is that for most uses
of MSRs the upper 32 bits are zero, so the problem goes
undetected.

And the zeroing of the upper 32 bits doesn't appear to be mentioned
in the AMD documentation of mov.
Post 16 Feb 2007, 15:39
LocoDelAssembly


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Read chapter "3. General-Purpose Programming" of "AMD64 Architecture Programmer's Manual Volume 1 -- Application Programming".

Especially this part:
Quote:
Zero-Extension of 32-Bit Results.
As Figure 3-3 on page 27 and Figure 3-4 on page 28 show, when
performing 32-bit operations with a GPR destination in 64-bit mode, the processor zero-extends the
32-bit result into the full 64-bit destination. 8-bit and 16-bit operations on GPRs preserve all unwritten
upper bits of the destination GPR. This is consistent with legacy 16-bit and 32-bit semantics for
partial-width results.
Software should explicitly sign-extend the results of 8-bit, 16-bit, and 32-bit operations to the full 64-
bit width before using the results in 64-bit address calculations.
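The last sentence of that quote is easy to miss. A small C sketch (hypothetical helper names; unsigned wrap-around stands in for 64-bit address arithmetic) shows why: a negative 32-bit index that is merely zero-extended becomes a huge positive offset, whereas sign-extending it first, as MOVSXD does, gives the intended address.

```c
#include <assert.h>
#include <stdint.h>

/* Index produced by a 32-bit operation and used as-is:
   the bit pattern is zero-extended, so -1 becomes +4294967295. */
static uint64_t addr_zero_extended(uint64_t base, int32_t index)
{
    return base + (uint32_t)index;
}

/* Index sign-extended first (what "movsxd" would do):
   -1 stays -1, so the address is base - 1. */
static uint64_t addr_sign_extended(uint64_t base, int32_t index)
{
    return base + (uint64_t)(int64_t)index;
}
```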


If you want the result of RDMSR in RAX, call it the following way:

Code:
     mov ecx, MSR_ADDRESS 
     rdmsr ; since it returns two 32-bit results I assumed that RAX[63:32] and RDX[63:32] are zeroed after execution of RDMSR
     shl rdx, 32 
     or rax, rdx    
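The same sequence, next to the buggy one from the first post, modelled in C (a sketch; the function names are made up). The inputs are EDX and EAX as RDMSR leaves them.

```c
#include <assert.h>
#include <stdint.h>

/* rdmsr / shl rdx, 32 / or rax, rdx */
static uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    uint64_t rdx = edx;   /* RDX[63:32] = 0 after RDMSR */
    uint64_t rax = eax;   /* RAX[63:32] = 0 after RDMSR */
    rdx <<= 32;           /* shl rdx, 32 */
    rax |= rdx;           /* or rax, rdx */
    return rax;
}

/* shl rdx, 32 / mov edx, eax: the 32-bit MOV zero-extends,
   so the high half just shifted into place is thrown away. */
static uint64_t combine_with_mov_edx_eax(uint32_t edx, uint32_t eax)
{
    uint64_t rdx = (uint64_t)edx << 32;   /* shl rdx, 32 */
    rdx = (uint32_t)eax;                  /* mov edx, eax */
    return rdx;
}
```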

As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now).

[edit]BTW, "Appendix B General-Purpose Instructions in 64-Bit Mode" of "AMD64 Architecture Programmer's Manual Volume 3 -- General-Purpose and System Instructions" has detailed information about every instruction.[/edit]
Post 16 Feb 2007, 16:31
lazer1



Joined: 24 Jan 2006
Posts: 185
LocoDelAssembly wrote:
Read chapter "3. General-Purpose Programming" from "AMD64 Architecture Programmer's Manual Volume 1 -- Application Programming".


OK, that explains it: it's one volume I haven't looked at yet.

I usually only look at the system or instruction manuals.

Quote:

As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now).


Do you have a link to this discussion?

My guess is that you usually process numbers at maximum width
but store them at smaller widths to save space; e.g. C promotes smaller integer types to int,

so you want reads of bytes, words and
dwords to be extended automatically, either sign-extended or
zero-extended.

As I said above, the terminology is bad: if it zero-extends, the
mnemonic should have been movzx, and then there would be
no problem.

That is a criticism of the opcode naming scheme and not of the architecture.

Sign extension and zero extension are NOT MOVES!

A move is a move; a sign extension or zero extension is a MODIFY.

If I move a book from here to there, I cannot tear out all the pages
beyond page 100, as that is no longer a move.
Post 16 Feb 2007, 17:06
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
lazer1 wrote:
A move is a move; a sign extension or zero extension is a MODIFY.

If I move a book from here to there, I cannot tear out all the pages
beyond page 100, as that is no longer a move.
Yeah, but the move you are talking about is just circuitry implementing a computer instruction.
Post 16 Feb 2007, 17:10
Xorpd!



Joined: 21 Dec 2006
Posts: 161
Quote:

As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now)


The good reason is that it avoids partial register stalls and a false dependence on the old value of rdx. To really copy the low 32 bits of rax into the low 32 bits of rdx, you can do something like:

xor eax, edx
xor rdx, rax

This destroys rax. If you want to preserve it, you could try

mov r10, rax
xor r10d, edx
xor rdx, r10
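Why this works can be checked with a C model (a sketch; the helpers are hypothetical). The 32-bit XOR leaves RAX = 0:(EAX^EDX), so the following 64-bit XOR flips only the low half of RDX, turning EDX into EAX, and XORs the high half with zero.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit XOR with a register destination: result zero-extended. */
static uint64_t xor32(uint64_t dst, uint64_t src)
{
    return (uint32_t)(dst ^ src);
}

/* xor eax, edx / xor rdx, rax   (RAX is destroyed) */
static uint64_t copy_low32_destroying(uint64_t *rax, uint64_t rdx)
{
    *rax = xor32(*rax, rdx);   /* RAX = 0:(EAX ^ EDX) */
    rdx ^= *rax;               /* low half becomes EAX, high half unchanged */
    return rdx;
}

/* mov r10, rax / xor r10d, edx / xor rdx, r10   (RAX preserved) */
static uint64_t copy_low32_preserving(uint64_t rax, uint64_t rdx)
{
    uint64_t r10 = rax;        /* mov r10, rax */
    r10 = xor32(r10, rdx);     /* xor r10d, edx */
    rdx ^= r10;                /* xor rdx, r10 */
    return rdx;
}
```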

The destruction of the high 32 bits of the destination register on a 32-bit operation takes some getting used to, but the exact behavior of these operations in 64-bit mode must be mastered for effective programming in this mode. After you do get used to it you will see how to use it to advantage.

Edit: cleaned up a couple of simple-minded mistakes. Haven't been doing enough programming lately!
Post 16 Feb 2007, 18:14
lazer1



Joined: 24 Jan 2006
Posts: 185
The_Grey_Beast wrote:
lazer1 wrote:
A move is a move; a sign extension or zero extension is a MODIFY.

If I move a book from here to there, I cannot tear out all the pages
beyond page 100, as that is no longer a move.
Yeah, but the move you are talking about is just circuitry implementing a computer instruction.


It's a question of layering: it's a move at one layer, the informational bit level, but not at the physical level.

In the same way, if someone transfers 100 pounds to your bank account and you then go to the bank and withdraw 100 pounds, the banknotes will not
usually have come from that person!

At the economic level, though, the 100 pounds has moved from them to you,
but not at the physical level.

If you talk to someone on the phone, you aren't actually talking to them
but to your telephone, and when you go on the internet
you aren't going anywhere; you are just sitting in your room all day staring at a plastic
rectangle.


Last edited by lazer1 on 16 Feb 2007, 20:50; edited 1 time in total
Post 16 Feb 2007, 20:15
lazer1



Joined: 24 Jan 2006
Posts: 185
Xorpd! wrote:
Quote:

As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now)



Quote:

The good reason is that it avoids partial register stalls and false dependence on rdx.


I can just about understand that; sounds like good AMD reasons.

Quote:

To really copy the 32 low bits of rax to the 32 low bits of rdx, you can do something like:

xor eax, edx
xor rdx, rax



That's a neat way around the problem.

And are we certain that "xor eax, edx" will preserve the upper bits?

Quote:

This destroys rax. If you want to preserve it, you could try

mov r10, rax
xor r10d, edx
xor rdx, r10


Motivated by your first idea, I can do it thus:

Code:
    xor  edx, edx
    xor  edx, eax
    


(I hope!)

The crazy thing is that I have often needed
"movzx rdx, eax", and if you look up movzx in
the instruction reference manual (p. 189),
there is no such instruction!

In fact there is; it is just "mov edx, eax",

so I always had to do this in software.
AMD keep throwing banana skins in my path.

OTOH there is sign extension via "movsxd rdx, eax".
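In C terms (a sketch; the names are illustrative), the missing "movzx rdx, eax" is just a zero-extending 32-bit MOV, and "movsxd rdx, eax" is its sign-extending counterpart. The same zero-extension is what makes "mov eax, eax" usable for clearing RAX[63:32].

```c
#include <assert.h>
#include <stdint.h>

/* "mov edx, eax" (what "movzx rdx, eax" would mean):
   take EAX and zero-extend it to 64 bits. */
static uint64_t zero_extend32(uint64_t rax)
{
    return (uint32_t)rax;
}

/* "movsxd rdx, eax": take EAX and copy bit 31 into bits 63:32. */
static uint64_t sign_extend32(uint64_t rax)
{
    uint64_t low = (uint32_t)rax;
    if (low & UINT64_C(0x80000000))
        low |= UINT64_C(0xFFFFFFFF00000000);
    return low;
}
```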
Post 16 Feb 2007, 20:33
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
The problem with all 80x86 instruction set architectures is that the newer the CPUs are, the more crappy and bloated they get. It's impossible to program something for a recent 64-bit x64 CPU and say "oh, that's very efficient code", since the whole processors are very inefficient from the ground up.

Some say the 80x86 ISA is the most "baroque" one in the world!
Post 16 Feb 2007, 20:47
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7781
Location: Kraków, Poland
LocoDelAssembly wrote:
As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now).


Perhaps you mean this thread: http://board.flatassembler.net/topic.php?p=48137#48137

The problem was also discussed earlier here, and here, and in many other places.

And note: it's not just the MOV instruction that this applies to. EVERY operation that has a 32-bit GPR as its destination clears the upper 32 bits of that GPR; even "xchg eax,eax" does the zero extension of EAX into RAX.
Post 17 Feb 2007, 11:33
lazer1



Joined: 24 Jan 2006
Posts: 185
Tomasz Grysztar wrote:
LocoDelAssembly wrote:
As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now).


Perhaps you mean this thread: http://board.flatassembler.net/topic.php?p=48137#48137

The problem was also earlier discussed here, and here and in many other places.

And note: it's not just the MOV instruction that this applies to, EVERY operation that has 32-bit GPR as a target clears the upper 32 bits of that GPR, even "xchg eax,eax" does the zero extension of EAX into RAX.


I just tried two experiments that verify this:

Experiment 1:

rdmsr does zero the upper 32 bits of rax and rdx.

Experiment 2:

My own suggestion via xor doesn't work, as
"xor edx, edx" and "xor edx, eax" both zero the upper
32 bits.
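Modelled in C (a sketch; the function name is invented), the sequence collapses to a plain "mov edx, eax", because the very first 32-bit XOR already zero-extends into RDX. In Xorpd!'s version the 32-bit XOR targets RAX instead, which is why that one works.

```c
#include <assert.h>
#include <stdint.h>

/* xor edx, edx / xor edx, eax
   Each 32-bit XOR zero-extends into RDX, so RDX[63:32] is lost
   on the first instruction already. */
static uint64_t xor_edx_edx_then_eax(uint64_t rdx, uint64_t rax)
{
    rdx = (uint32_t)(rdx ^ rdx);   /* xor edx, edx: RDX = 0 */
    rdx = (uint32_t)(rdx ^ rax);   /* xor edx, eax: RDX = 0:EAX */
    return rdx;
}
```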

I think I have to read through all the long mode code I've written,
searching for this problem.

This is a really BAD feature of AMD64;

e.g. it is inconsistent with:

mov dword [ xyz ], eax

where I hope it doesn't zero the next 32 bits!
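That hope is justified: the zero-extension rule quoted earlier covers 32-bit operations with a GPR destination, so a dword store writes exactly 4 bytes. A C sketch of the store (a byte array stands in for memory; the name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "mov dword [mem], eax": only 4 bytes are written;
   the bytes after them are left untouched. */
static void store_dword(uint8_t *mem, uint64_t rax)
{
    uint32_t eax = (uint32_t)rax;
    memcpy(mem, &eax, sizeof eax);
}
```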
Post 17 Feb 2007, 21:40
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
Quote:

even "xchg eax,eax" does the zero extension of EAX into RAX.
...
(recent bug of "push cr0" fixed in 1.67.21)


Yes, but who uses these, if anybody, and why?
Post 18 Feb 2007, 03:54
lazer1



Joined: 24 Jan 2006
Posts: 185
rugxulo wrote:
Quote:

even "xchg eax,eax" does the zero extension of EAX into RAX.
...
(recent bug of "push cr0" fixed in 1.67.21)


Yes, but who uses these, if anybody, and why?


"xchg eax,eax" is probably there because it is simpler to
implement the hardware with it than to disallow it:

it keeps the operands orthogonal and the design general.

It's a bit like being allowed to write yourself a cheque or
send yourself an email.

But now that it's there and it clears the upper 32 bits,
it could be used as a trick to clear the upper bits, although
"mov eax, eax" has the same effect.

You may say "xchg eax,eax" is redundant, as "mov eax,eax"
is just as good, but the latter exists for exactly the same
reasons, namely orthogonality and generality, and either
is usable as a trick to clear the upper 32 bits.

(I don't know if there is a better way to clear the upper 32 bits.)

Likewise, sending yourself an email can be used as a trick to
test whether your email setup is working:

e.g. if you get no emails for several days, you can email yourself
to see if there is a problem.


push cr0

could be used to set a CR0 flag without using a register:

Code:
    push cr0
    or qword [rsp], SOME_FLAG
    pop cr0
    


instead of having to free up a register:

Code:
  push rax
  mov rax, cr0
  or rax, SOME_FLAG
  mov cr0, rax
  pop rax
    


If nothing else, the first fragment is more readable,
though possibly it is slower.

"push cr0" is fiddly to do in software if there are no free
registers, e.g.:

Code:
macro push_cr0
    {
    push rax
    push rax
    mov rax, cr0
    mov [rsp + 8], rax ; +8, not -8: fills the first pushed slot, which is on top after "pop rax"
    pop rax
    }
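To settle the "+8 or -8" question, here is a toy C model of the macro's stack traffic (a sketch: an array of qwords stands in for the stack, the index counts down like RSP, and "cr0" is just a field holding an arbitrary value, since real CR0 access is privileged). After the macro, the slot on top of the stack holds CR0 and RAX is restored, so +8 is right.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine: mem[] holds qwords, rsp indexes the top of stack
   (in qword units), push decrements it, pop increments it. */
struct machine {
    uint64_t mem[16];
    unsigned rsp;
    uint64_t rax;
    uint64_t cr0;
};

static void push(struct machine *m, uint64_t v) { m->mem[--m->rsp] = v; }
static uint64_t pop(struct machine *m)          { return m->mem[m->rsp++]; }

/* The push_cr0 macro, step by step.
   [rsp + 8] in bytes is mem[rsp + 1] in qword units. */
static void push_cr0(struct machine *m)
{
    push(m, m->rax);               /* push rax: slot that will receive CR0 */
    push(m, m->rax);               /* push rax: saved copy of RAX */
    m->rax = m->cr0;               /* mov rax, cr0 */
    m->mem[m->rsp + 1] = m->rax;   /* mov [rsp + 8], rax */
    m->rax = pop(m);               /* pop rax: restores RAX */
}
```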
    
Post 18 Feb 2007, 20:24
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
Well, the only problem is that it isn't an actual instruction! (News to me, not that I ever imagined such a thing anyway...):

Privalov wrote:

MCD wrote:

I discovered a bug in fasm (1.67.20) while writing the "only8086.inc", namely that fasm allows instructions like "push cr0" or "pop cr5", which actually don't exist at all

As for the "push cr0" bug, it's fixed in 1.67.21.


Code:
; only works in FASM <= 1.67.20
org 100h

push cr0
    


Quote:

[ WinXP ] Sun 02/18/2007>fasm166d push_cr0.asm
flat assembler version 1.66
1 passes, 3 bytes.

[ WinXP ] Sun 02/18/2007>ndisasm push_cr0.com
00000000 0FF8C3 psubb mm0,mm3

[ WinXP ] Sun 02/18/2007>scrndump
Post 18 Feb 2007, 20:45
lazer1



Joined: 24 Jan 2006
Posts: 185
rugxulo wrote:
Well, the only problem is that it isn't an actual instruction! (News to me, not that I ever fathomed such a thing anyways ...):

Privalov wrote:

MCD wrote:

I discovered a bug in fasm (1.67.20) while writing the "only8086.inc", namely that fasm allows instructions like "push cr0" or "pop cr5", which actually don't exist at all

As for the "push cr0" bug, it's fixed in 1.67.21.


Code:
; only works in FASM <= 1.67.20
org 100h

push cr0
    


Quote:

[ WinXP ] Sun 02/18/2007>fasm166d push_cr0.asm
flat assembler version 1.66
1 passes, 3 bytes.

[ WinXP ] Sun 02/18/2007>ndisasm push_cr0.com
00000000 0FF8C3 psubb mm0,mm3

[ WinXP ] Sun 02/18/2007>scrndump


Darn!

My version of fasm accepted it, so I assumed it was OK.

Was that a trick question?

Also, the or instruction above should have been
"or dword [ rsp ], SOME_FLAG",
as the AMD64 "or" sign-extends a 32-bit immediate for
a 64-bit destination, which would cause a problem if you set
bit 31 (CR0 doesn't use the upper 32 bits).
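A C sketch of the difference (hypothetical helper names): the qword form sign-extends its 32-bit immediate before the OR, while the dword form touches only the low 32 bits of the memory operand.

```c
#include <assert.h>
#include <stdint.h>

/* "or qword [rsp], imm32": imm32 is sign-extended to 64 bits. */
static uint64_t or_qword_imm32(uint64_t dst, uint32_t imm32)
{
    uint64_t imm = imm32;
    if (imm & UINT64_C(0x80000000))
        imm |= UINT64_C(0xFFFFFFFF00000000);
    return dst | imm;
}

/* "or dword [rsp], imm32": only the low 4 bytes change. */
static uint64_t or_dword_imm32(uint64_t dst, uint32_t imm32)
{
    return (dst & UINT64_C(0xFFFFFFFF00000000)) | ((uint32_t)dst | imm32);
}
```

Setting bit 31 with the qword form therefore also sets bits 63:32, which is exactly the problem described above.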

bts is probably a better way to set an individual bit.

I've had a look at how my code sets multiple bits in a CRn
register:

Code:
    mov reg, crn
    bts reg, BIT1
    bts reg, BIT2
    mov crn, reg
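For comparison, a C model of BTS on a 64-bit register (a sketch; the real instruction reports the old bit in CF, modelled here as the return value, and the bit index is taken modulo 64). Unlike OR with a 32-bit immediate, setting bit 31 this way sets only bit 31.

```c
#include <assert.h>
#include <stdint.h>

/* "bts reg, bit": set one bit, return its previous value (CF). */
static int bts64(uint64_t *reg, unsigned bit)
{
    uint64_t mask = UINT64_C(1) << (bit & 63);
    int old = (*reg & mask) != 0;
    *reg |= mask;
    return old;
}
```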
    



I can and do make errors, and have to hope that any bugs
manifest soon so I can pin down the problem:

the "mov edx, eax" problem went undetected for months.
It was in a subroutine "msr_read" and went unnoticed because
in every case where I used it the upper 32 bits were 0 anyway!

I only realised there was a problem when I started looking at the MTRRs
and saw memory types of 06060606 where I expected all 8 bytes to be
the same (for user memory in the first 640K of memory).
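For readers unfamiliar with the fixed-range MTRRs: each one packs eight 8-bit memory-type fields into a 64-bit MSR, and type 06h means write-back. A C sketch (hypothetical helpers) of why the bug showed up as 06060606: with "mov edx, eax" the high dword is lost, so only four of the eight fields survive.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type field i (0 = lowest byte) of a fixed-range MTRR value. */
static unsigned mtrr_field(uint64_t msr, unsigned i)
{
    return (unsigned)((msr >> (8 * i)) & 0xFF);
}

/* How many of the eight fields hold the given memory type. */
static unsigned count_type(uint64_t msr, unsigned type)
{
    unsigned n = 0;
    for (unsigned i = 0; i < 8; i++)
        n += mtrr_field(msr, i) == type;
    return n;
}
```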

No matter what I did, I couldn't get all 8 the same; I thought
that maybe the bug was in "shl rdx, 32".

It's probably a good idea to echo the operands and the result
when trying out any new opcode, and a good idea anyway
to put echo statements in new code.
Post 18 Feb 2007, 22:04

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
