flat assembler
Message board for the users of flat assembler.

Size operator handling inconsistency

l_inc 09 May 2012, 18:44
Hi. My question concerns the use of size operators (word, dword, qword) with immediate operands of the push instruction. I used to think there was no difference between using a suffix on the push instruction and using a size operator, but I had not considered what effect the size operator has on immediate values. After some empirical investigation my assumption is the following:
1) The suffix of the push instruction enforces the corresponding operand size (i.e. the number of bytes pushed onto the stack), which otherwise defaults to the size of the current code generation mode. Thus pushd always puts 4 bytes onto the stack.
2) Unlike the suffix, the size operator has nothing to do with the number of bytes put on the stack; it enforces the size of the immediate value encoded within the instruction.

This behaviour also seems pretty convenient, because it allows using 68 01 00 00 00 instead of 6A 01 (for whatever reason) by writing push dword 1, which would always encode the immediate value 1 in four bytes. However, this rule is not always obeyed. For example, to get the same 4-byte encoding of the immediate value in 64-bit mode I have to write push qword 1 instead of push dword 1, and the latter is not allowed at all. push byte is never allowed either.
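
The cases described above can be put side by side as a minimal fasm sketch. The byte sequences in the comments are the ones quoted in this post, not independently verified, and the default flat binary output format is assumed.

```asm
; Sketch only: encodings in comments are those quoted in the post.
use32
        push    1               ; optimized short form: 6A 01
        push    dword 1         ; full-size immediate: 68 01 00 00 00
        pushd   1               ; suffix: fixes the operand size (4 bytes pushed)

use64
        push    qword 1         ; the spelling needed for the 4-byte immediate here
;       push    dword 1         ; rejected in 64-bit mode, as noted above
;       push    byte 1          ; rejected in every mode, as noted above
```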

So my question is: what does the size operator actually mean in this case?


My second bundle of questions is not directly about the topic, but is still related. Is it intentional that the operand size always defaults to the natural operand size of the current mode for the memory operand of the multibyte nop (e.g. nop [0])? Am I right in assuming that this is because no real memory access is made? (Multibyte nops are not described in the fasm documentation.) Is this the only instruction that can be cleanly compiled without any hint about the operand size of the address expression?

Kind regards

cod3b453 09 May 2012, 19:28
It's worth noting that these are 64-bit extensions to the instruction set and so are a little different from the transition between 16 and 32 bit. Effectively, most things that were 32-bit were replaced by 64-bit operations, but with the restriction that immediates are still limited to 32 bits and sign-extended; the exception is mov r64,imm64. [The reason for this was to allow the same 32-bit code assembled in 64-bit mode to exhibit the same behavior].

The size operators then let you choose among the encodings that are available, which can be useful if you want to change the value without changing the size of the assembled code, typically for instruction alignment or debugging.
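
As a rough illustration of the immediate-size rule described above (a sketch with arbitrarily chosen values, not taken from this thread):

```asm
; Sketch only: values are arbitrary examples.
use64
        mov     rax,1122334455667788h   ; the exception: a full 64-bit immediate
        add     rax,-1                  ; immediate is at most 32 bits, sign-extended to 64
        push    -1                      ; likewise sign-extended; 8 bytes are pushed
```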

Tomasz Grysztar 09 May 2012, 19:38
First of all: the size operator in fasm always reflects the size of the data the operation is to be performed on, and not the size of any field in the code of the instruction. This is because fasm's instruction syntax is primarily focused on the operation an instruction performs and not on its encoding. I had a disagreement with vid over this; you may find some interesting discussions on the board.

Now, the side-effect of the size operator for PUSH's immediate operand is a case of the same special handling that the size operator gets for the immediate of other basic operations like AND, XOR, OR, etc.: it enforces encoding the immediate value at full size. For example "or ax,1" and "or ax,word 1" differ in that the first one gets optimized, while the latter will have the longer form where the immediate is 16-bit. It was introduced to aid in working with self-modifying code, something like:
Code:
or ax,word 1
label immediate at $-2    
I may have forgotten to document it, I'm not sure about it now. If that's the case, I shall fit this information into the documentation somewhere, as this feature has a very long history.
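
A slightly fuller sketch of how such a label might be used for patching follows. The 16-bit mode, the org value, the patched value 8000h and the surrounding instructions are assumptions made only to give the snippet a context; only the OR and the label line come from the example above.

```asm
; Sketch only: everything except the OR and the label line is an assumed context.
use16
org 100h                                ; assumption: DOS .COM-style flat binary

        mov     word [immediate],8000h  ; overwrite the immediate field of the OR below
        or      ax,word 1               ; size operator forces a 16-bit immediate field
label immediate at $-2                  ; names the last two bytes of the OR
        ret
```

Without the size operator the OR could be assembled in its optimized form, and $-2 would then no longer point at a 16-bit immediate field.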

The extended NOP instruction syntax is another thing missing from the documentation. Thank you for pointing that out, I will try to remember to add it in my spare time. Yes, NOP is probably the only exception to the rule that you have to specify the operand size when there is possible ambiguity in the size of the memory operand. I felt that there is no need for such rigidity since, as you pointed out, no real memory access is made.
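
The forms discussed here might look as follows in source. This is a sketch: the choice of eax as the address register and of dword as the explicit size is an assumption, and the resulting encodings are deliberately not listed.

```asm
; Sketch only: register and size choices are illustrative.
use32
        nop                     ; ordinary one-byte NOP
        nop     dword [eax]     ; extended NOP with an explicit operand size
        nop     [eax]           ; size omitted: accepted, no size hint required
        nop     [0]             ; the form from the question
```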

Tomasz Grysztar 09 May 2012, 19:45
OK, I found out that I did not forget to document it; in fact, it has a whole dedicated section in the manual. :)

No wonder I forgot about it, it is just as old as this manual is. Perhaps it would not be a bad idea to refresh it a little.

l_inc 09 May 2012, 20:29
Tomasz Grysztar wrote:
I found out that I did not forget to document it, in fact it has a whole dedicated section in manual

Oh. I'm sorry. My bad. So the suffix just enforces the operand size, and the size operator enforces the operand size plus full-size encoding for this operand size.
The idea of an inconsistency came from the fact that in 64-bit mode I have to write push qword to get the full-size encoding, while in reality I cannot place a value from the quadword composite range after the operator (however, it is not a value from the double word composite range either).

Thank you for pointing me to the documentation.

Btw, I have wanted to ask this for a long time but always forgot. What's the reasoning of allowing the weird notation push eax+1?

Tomasz Grysztar 09 May 2012, 20:52
l_inc wrote:
What's the reasoning of allowing the weird notation push eax+1?
That's an imperfection in the implementation of a TASM legacy feature, caused by a quirk of fasm's parser that you may already be aware of: a register name is treated as a separate unit and not as part of an expression (unless it is an address expression, like after the AT or PTR operator, or in square brackets), so "eax+1" is not the same as "+eax+1".

TASM allowed writing "push eax(+1)", but not "push eax+1", as in the latter case the "eax+1" would be treated as one expression. fasm would need "push +eax+1" to get such an interpretation.
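
Put as source lines, the legacy feature and the quirk might look like this. It is a sketch: the reading of the second line as two separate operands is inferred from the explanation above rather than checked against the generated code, and ebx is just an example register.

```asm
; Sketch only: the second comment restates the explanation above.
use32
        push    eax ebx         ; several operands in one PUSH: push eax, then push ebx
        push    eax+1           ; register is its own unit, so: push eax, then push +1
```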

l_inc 09 May 2012, 21:08
Tomasz Grysztar
OK. I could have guessed this one myself too. :) Thanks again.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
