flat assembler
Message board for the users of flat assembler.
mov edx, eax bug!
LocoDelAssembly 16 Feb 2007, 02:45
Quote:
; echo rdx => 456789h WRONG!

Checking with WinDbg (and generating a PE64 instead) I get 456789abh, and that is right: the AMD64 architecture clears the upper 32 bits of a 64-bit register when you write to it through its low 32-bit part.
lazer1 16 Feb 2007, 14:10
LocoDelAssembly wrote:
Have you got access to a 64-bit Intel machine to try this on? (Both my 64-bit machines are AMD.)
I needed to do "mov edx, eax" for rdmsr, which returns the 64-bit value as edx:eax, and doing this I found values which didn't look right. rdmsr returns the value as edx:eax so that it can be used in CPU modes where you cannot use rdx.
Clearing of the upper bits really should be done by movzx; mov is supposed to preserve all bits not mentioned in the dest.
LocoDelAssembly 16 Feb 2007, 14:45
Yes, I see why you found it strange: writing to DL doesn't alter DH, and writing to DX doesn't alter the upper 16 bits of EDX. But the AMD64 architecture (and EM64T) follows those rules only up to that point. If you write to a 32-bit register, the result WILL BE zero-extended into the upper 32 bits automatically; no MOVZX is needed as in the other cases.
If you are operating with 32-bit values then you have no problem, and rdmsr returns two 32-bit values, so why is this zeroing a problem for you? And no, sorry, I have no Intel CPUs anymore, but the same will happen there, because EM64T is a clone of the AMD64 architecture with VERY few differences, and the handling of 32-bit registers doesn't differ between the two architectures.
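A minimal sketch of the rules described above, written for fasm in 64-bit mode. The starting constant and the immediates are arbitrary values picked for illustration; the comments show what rdx holds after each write.

```asm
use64
        mov     rdx, 1122334455667788h  ; illustrative starting value
        mov     dl, 0AAh                ; 8-bit write:  rdx = 11223344556677AAh (bits 63:8 kept)
        mov     dx, 0BBBBh              ; 16-bit write: rdx = 112233445566BBBBh (bits 63:16 kept)
        mov     edx, 0CCCCCCCCh         ; 32-bit write: rdx = 00000000CCCCCCCCh (bits 63:32 cleared)
```

Only the 32-bit write touches bits outside the named register, which is why MOVZX is needed for 8-bit and 16-bit sources but not for 32-bit ones.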
f0dder 16 Feb 2007, 15:07
This has been posted about before, anyway.
lazer1 16 Feb 2007, 15:39
LocoDelAssembly wrote:
Yes, I see why you found it strange: writing to DL doesn't alter DH, and writing to DX doesn't alter the upper 16 bits of EDX. But the AMD64 architecture (and EM64T) follows those rules only up to that point. If you write to a 32-bit register, the result WILL BE zero-extended into the upper 32 bits automatically; no MOVZX is needed as in the other cases.

Where in the AMD docs is this documented?

Quote:

Well, two 32-bit values isn't very useful. If something is 64-bit, I want it in one 64-bit register. What they should have done is make rdmsr store the 64-bit value in rax and the upper 32 bits in edx; that way it is available BOTH as two 32-bit values and as one 64-bit value.
There isn't a problem if I use some other path, e.g.
Code:
mov ecx, MSR_ADDRESS
mov rax, 0
rdmsr ; we hope this only modifies edx and eax and no other bits
shl rdx, 32
or rdx, rax
It's just that you expect, on all CPUs and not just x86, that a mov instruction only affects the dest and not other stuff. If it affects other stuff, then the syntax should reflect this, e.g.
Code:
move rdx, eax
What it means is that AMD64 doesn't have a "mov edx, eax" but only has a "movzx rdx, eax", which it has wrongly misnamed "mov edx, eax".
The MTRR MSRs do use all 64 bits even on a 32-bit system, which is where I realised there was a bug. The problem is that in most uses of MSRs the upper 32 bits are zero and the problem goes undetected, and the zeroing of the upper 32 bits doesn't appear to be mentioned in the AMD documentation of mov.
LocoDelAssembly 16 Feb 2007, 16:31
Read chapter "3. General-Purpose Programming" from "AMD64 Architecture Programmer's Manual Volume 1 -- Application Programming".
Especially this part:
Quote:
Zero-Extension of 32-Bit Results.

If you want the result of RDMSR in RAX, then call it in the following way:
Code:
mov ecx, MSR_ADDRESS
rdmsr ; since it returns two 32-bit results I assumed that RAX[63:32] and RDX[63:32] are zeroed after execution of RDMSR
shl rdx, 32
or rax, rdx
As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now).
[edit]BTW, in "Appendix B General-Purpose Instructions in 64-Bit Mode" of "AMD64 Architecture Programmer's Manual Volume 3 -- General-Purpose and System Instructions" you have detailed information about every instruction[/edit]
lazer1 16 Feb 2007, 17:06
LocoDelAssembly wrote:
Read chapter "3. General-Purpose Programming" from "AMD64 Architecture Programmer's Manual Volume 1 -- Application Programming".

OK, that explains why: it's one volume I haven't looked at yet. I usually only look at the system or instruction manuals.

Quote:

Do you have a link for this discussion? My guess is that you usually process numbers at maximum width but store them at lower widths to save space (e.g. C promotes smaller numbers to ints), so you want reads of bytes, words and dwords to be extended automatically, either sign-extended or zero-extended.
As I said above, the terminology is bad: if it zero-extends, the mnemonic should have been movzx, and then there would be no problem. That is a criticism of the opcode naming scheme and not of the architecture. Sign extension and zero extension are NOT MOVES! A move is a move; a sign extension or zero extension is a MODIFY. If I move a book from here to there, I cannot tear out all the pages beyond page 100, as that is no longer a move.
Borsuc 16 Feb 2007, 17:10
lazer1 wrote: a move is a move, a sign extension or zero extension is a MODIFY,
Xorpd! 16 Feb 2007, 18:14
Quote:
The good reason is that it avoids partial register stalls and a false dependence on rdx. To really copy the low 32 bits of rax to the low 32 bits of rdx, you can do something like:
xor eax, edx
xor rdx, rax
This destroys rax. If you want to preserve it, you could try:
mov r10, rax
xor r10d, edx
xor rdx, r10
The destruction of the high 32 bits of the destination register on a 32-bit operation takes some getting used to, but the exact behavior of these operations in 64-bit mode must be mastered for effective programming in this mode. After you do get used to it you will see how to use it to advantage.
Edit: cleaned up a couple of simple-minded mistakes. Haven't been doing enough programming lately!
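The rax-preserving sequence above can be traced with concrete numbers. This is only a sketch: the initial values of rax and rdx are made up for the example, and r10 is used as scratch exactly as in the post.

```asm
use64
        mov     rax, 1111111122222222h  ; illustrative source value
        mov     rdx, 0AAAAAAAABBBBBBBBh ; illustrative destination value
        mov     r10, rax                ; r10 = 1111111122222222h
        xor     r10d, edx               ; r10 = 0000000099999999h (22222222h xor BBBBBBBBh, upper half cleared)
        xor     rdx, r10                ; rdx = AAAAAAAA22222222h, rax is unchanged
```

The 32-bit xor leaves zeros in the upper half of r10, so the final 64-bit xor cannot disturb the upper half of rdx, while the low half becomes edx xor (eax xor edx) = eax.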
lazer1 16 Feb 2007, 20:15
The_Grey_Beast wrote:
It's a question of layering: it's a move at one level, the informational bit level, but not at the physical level. In the same way, if someone transfers 100 pounds to your bank account and you then go to the bank and withdraw 100 pounds, the banknotes will not usually have come from that person! At the economic level the 100 pounds has moved from them to you, but not at the physical level.
If you talk to someone on the phone, you aren't actually talking to them but to your telephone, and when you go on the internet, you aren't going anywhere but are just sitting in your room all day staring at a plastic rectangle.
Last edited by lazer1 on 16 Feb 2007, 20:50; edited 1 time in total
lazer1 16 Feb 2007, 20:33
Xorpd! wrote:
Quote:
I can just about understand that; sounds like good AMD reasons.
Quote:
That's a neat way around the problem. And are we certain that "xor eax, edx" will preserve the upper bits?
Quote:
Motivated by your first idea, I can do it thus:
Code:
xor edx, edx
xor edx, eax
(I hope!)
The crazy thing is that I have often needed "movzx rdx, eax", and if you look up movzx in the instruction reference manual on p189 there is no such instruction! In fact there is, it is just "mov edx, eax", so I always had to do this in software. AMD keep throwing banana skins in my path.
OTOH there is sign extension via "movsxd rdx, eax".
MCD 16 Feb 2007, 20:47
The problem with all 80x86 instruction set architectures is that the newer the CPUs are, the more crappily bloated they are. It's impossible to program something on a recent 64-bit x64 CPU and say "oh, that's very efficient code", since the whole processors are very inefficient from the ground up.
Some said the 80x86 ISA is the most "baroque" one in the world!
Tomasz Grysztar 17 Feb 2007, 11:33
LocoDelAssembly wrote:
As f0dder said, this was discussed before; try to find those posts, because there are good reasons for handling 32-bit results this way (which I don't remember just now).

Perhaps you mean this thread: http://board.flatassembler.net/topic.php?p=48137#48137
The problem was also discussed earlier here, and here, and in many other places.
And note: it's not just the MOV instruction that this applies to. EVERY operation that has a 32-bit GPR as its target clears the upper 32 bits of that GPR; even "xchg eax,eax" does the zero extension of EAX into RAX.
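A short sketch of that rule with instructions other than a plain register-to-register MOV between different registers, written for fasm in 64-bit mode. The registers and starting values are chosen only for illustration; the comments give the resulting 64-bit contents.

```asm
use64
        mov     rax, -1                 ; rax = FFFFFFFFFFFFFFFFh
        add     eax, 1                  ; eax wraps to 0 and the upper half is cleared: rax = 0
        mov     rbx, -1                 ; rbx = FFFFFFFFFFFFFFFFh
        and     ebx, ebx                ; ebx keeps its value, but rbx = 00000000FFFFFFFFh
        mov     rcx, -1                 ; rcx = FFFFFFFFFFFFFFFFh
        mov     ecx, ecx                ; rcx = 00000000FFFFFFFFh
```

In each case the arithmetic or copy is done on 32 bits and the result is then zero-extended into the full register, even when the low 32 bits end up unchanged.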
lazer1 17 Feb 2007, 21:40
Tomasz Grysztar wrote:
I just tried two experiments that verify this:
experiment 1: rdmsr does zero the upper 32 bits of rax and rdx.
experiment 2: my own suggestion via xor doesn't work, as "xor edx, edx" and "xor edx, eax" zero the upper 32 bits.
I think I have to read through all the long mode code I've written, searching for this problem. This is a really BAD feature of AMD64; e.g. it is inconsistent with:
mov dword [ xyz ], eax
where I hope it doesn't zero the next 32 bits.
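On the memory case: the zero extension applies only to register destinations, and a 32-bit store writes exactly four bytes. A minimal sketch for fasm in 64-bit mode follows; xyz here is a hypothetical 8-byte variable preset to all ones, and the value in rax is made up for the example.

```asm
use64
        mov     rax, 1111111122222222h  ; illustrative value
        mov     dword [xyz], eax        ; stores 4 bytes: 22222222h at xyz
        mov     rdx, qword [xyz]        ; rdx = FFFFFFFF22222222h, bytes xyz+4..xyz+7 untouched
        ; ... rest of the code goes here; execution must not fall through into the data ...
xyz     dq      -1                      ; hypothetical variable, preset to FFFFFFFFFFFFFFFFh
```

Reading the qword back shows the upper four bytes still holding their old contents.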
rugxulo 18 Feb 2007, 03:54
Quote:
Yes, but who uses these, if anybody, and why? |
lazer1 18 Feb 2007, 20:24
rugxulo wrote:
"xchg eax,eax" is probably there because it is simpler to implement the hardware with it than to disallow it: it keeps the operands orthogonal and the design general. It's a bit like allowing you to write yourself a cheque or send yourself an email. But now that it's there and it clears the upper 32 bits, it could be used as a trick to clear the upper bits, although "mov eax, eax" has the same effect.
You may say "xchg eax,eax" is redundant as "mov eax,eax" is just as good, but the latter is there for exactly the same reasons, namely orthogonality and generality, and either is useful as a trick to clear the upper 32 bits. (I don't know if there is a better way to clear the upper 32 bits.)
Likewise, sending yourself an email can be used as a trick to test whether your email setup is functioning; e.g. if you get no emails for several days you can email yourself to see if there is a problem.
push cr0 could be used to set a cr0 flag without using registers:
Code:
push cr0
or qword [rsp], SOME_FLAG
pop cr0
instead of freeing a register:
Code:
push rax
mov rax, cr0
or rax, SOME_FLAG
mov cr0, rax
pop rax
If nothing else, the first fragment is more readable; possibly it is slower.
"push cr0" is fiddly to do in software if there are no free registers, e.g.:
Code:
macro push_cr0
{
push rax ; reserve a stack slot for the cr0 value
push rax ; save rax
mov rax, cr0
mov [rsp + 8], rax ; I think it's +8 and not -8?
pop rax ; restore rax, leaving cr0 on top of the stack
}
rugxulo 18 Feb 2007, 20:45
Well, the only problem is that it isn't an actual instruction! (News to me, not that I ever fathomed such a thing anyways ...):
Privalov wrote:
Code:
; only works in FASM <= 1.67.20
org 100h
push cr0
Quote:
lazer1 18 Feb 2007, 22:04
rugxulo wrote:
Well, the only problem is that it isn't an actual instruction! (News to me, not that I ever fathomed such a thing anyways ...):

Darn! My version of fasm accepted it so I assumed it was OK. Was that a trick question?
Also, the or instruction above should have been "or dword [ rsp ], SOME_FLAG", as the AMD64 "or" sign-extends the immediate value for a 64-bit dest, which would cause a problem if you set bit 31 (cr0 doesn't use the upper 32 bits). bts is probably a better way to set an individual bit.
I've had a look at my code to see how I set multiple bits for a crn register:
Code:
mov reg, crn
bts reg, BIT1
bts reg, BIT2
mov crn, reg
I can and do make errors, and have to hope that any bugs manifest soon so I can determine the problem. The "mov edx, eax" problem went undetected for months: it was in a subroutine "msr_read" and was undetected because in all cases where I used it the upper 32 bits were 0 anyway!
I only realised there was a problem when I started looking at the MTRRs and saw memory types of 06060606 where I expected all 8 bytes to be the same (for user memory in the first 640K of memory). No matter what I did I couldn't get all 8 the same, and I thought that maybe the bug was with "shl rdx, 32".
It's probably a good idea to echo the operands + result when trying out any new opcodes; it's a good idea anyway to put echo statements in new code.
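The sign-extension pitfall mentioned above can be sketched as follows, for fasm in 64-bit mode. Bit 31 is used only as an example, and the immediate is written as -80000000h to make explicit what a 32-bit immediate with bit 31 set becomes once the CPU extends it to 64 bits.

```asm
use64
        xor     eax, eax                ; rax = 0
        or      rax, -80000000h         ; imm32 80000000h sign-extended: rax = FFFFFFFF80000000h
        xor     eax, eax                ; rax = 0 again
        bts     rax, 31                 ; rax = 0000000080000000h, CF = previous bit 31 (0)
```

With bts the bit is named by its index, so no immediate mask is involved and nothing above bit 31 can be set by accident.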
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.