flat assembler
Message board for the users of flat assembler.

PUSH and POP

DOS386



Joined: 08 Dec 2006
Posts: 1905
DOS386 25 Jan 2007, 10:32
How is the operand size handled by the PUSH and POP instructions?
The fasm manual does not reveal much about it :(

Is it possible at all to push only 1 byte?

I found out that

Code:
  push 0
    


always assembles into only 2 bytes :shock: - the 0 is stored as a byte,
but it nevertheless seems to push 16 bits in 16-bit mode
and 32 bits in 32-bit mode :?

While

Code:
  mov eax,0  ; or should I use "xor eax,eax"? :D
    


assembles into 5 bytes :shock:

How is the amount of pushed data controlled?

_________________
Bug Nr.: 12345

Title: Hello World program compiles to 100 KB !!!

Status: Closed: NOT a Bug
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 25 Jan 2007, 10:34
There's a short paragraph about the size settings for PUSH at the end of section 1.2.6 of the manual.
DOS386 25 Jan 2007, 10:46
Quote:
There's a short paragraph about the size settings


OK, there is one ...

Code:
Immediate value as an operand for push instruction without a size operator is
by default treated as a word value if assembler is in 16-bit mode and as a
double word value if assembler is in 32-bit mode, shorter 8-bit form of this
instruction is used if possible, word or dword size operator forces the push
instruction to be generated in longer form for specified size. pushw and pushd
mnemonics force assembler to generate 16-bit or 32-bit code without forcing it
to use the longer form of instruction.
    


It DOES relate to my issue ... but I am still NOT much the wiser :?

Quote:

force assembler to generate 16-bit or 32-bit code


Is it useful at all to generate mismatching code (16-bit when the other code is
32-bit and the CPU is in 32-bit PM, or 32-bit when the other code is 16-bit)? :?

The text does NOT mention the amount of pushed data :? ... and also
does not say anything about pushing only 1 byte :?

MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 25 Jan 2007, 11:34
NTOSKRNL_VXE wrote:

Is it useful at all to generate mismatching code (16-bit when the other code is
32-bit and the CPU is in 32-bit PM, or 32-bit when the other code is 16-bit)? :?

Well, yes, if you write things like DOS extenders (flat/unreal mode), where you switch from 16-bit to 32-bit mode, for example.

NTOSKRNL_VXE wrote:

The text does NOT mention the amount of pushed data :? ... and also
does not say anything about pushing only 1 byte :?
You are right, the fasm docs aren't fully clear on this point. I will try to give you an overview:
Code:
use16
;not allowed
push/pop      -2^31...-1-2^15 | 2^15..-1+2^31
pushw/popw    -2^31...-1-2^15 | 2^15..-1+2^31
;BUT allowed pushes/pops dword onto/from stack, 4 bytes in machine code
pushd/popd    -2^31...-1-2^15 | 2^15..-1+2^31

;pushes/pops word onto/from stack, 2 bytes in machine code
push/pop      -2^15...-1-2^7 | 2^7..-1+2^15
pushw/popw    -2^15...-1-2^7 | 2^7..-1+2^15
;BUT this pushes/pops dword onto/from stack, 4 bytes in machine code
pushd/popd    -2^15...-1-2^7 | 2^7..-1+2^15

;pushes/pops word onto/from stack, 1 byte in machine code
push/pop    -2^7..-1+2^7
;BUT this pushes/pops word onto/from stack, 2 bytes in machine code
pushw/popw    -2^7..-1+2^7
;BUT this pushes/pops dword onto/from stack, 4 bytes in machine code
pushd/popd    -2^7..-1+2^7



use32
;pushes/pops dword onto/from stack, 4 bytes in machine code
push/pop      -2^31...-1-2^15 | 2^15..-1+2^31
pushd/popd    -2^31...-1-2^15 | 2^15..-1+2^31
;BUT not allowed
pushw/popw    -2^31...-1-2^15 | 2^15..-1+2^31

;pushes/pops dword onto/from stack, 4 bytes in machine code
push/pop      -2^15...-1-2^7 | 2^7..-1+2^15
pushd/popd    -2^15...-1-2^7 | 2^7..-1+2^15
;BUT this pushes/pops word onto/from stack, 2 bytes in machine code
pushw/popw    -2^15...-1-2^7 | 2^7..-1+2^15

;pushes/pops dword onto/from stack, 1 byte in machine code
push/pop    -2^7..-1+2^7
;BUT this pushes/pops dword onto/from stack, 4 bytes in machine code
pushd/popd    -2^7..-1+2^7
;BUT this pushes/pops word onto/from stack, 2 bytes in machine code
pushw/popw    -2^7..-1+2^7
;sorry, no use64 yet
    

So you see that there is no push/pop instruction that operates on only 1 byte. You would have to do this with other instructions, such as writing to the stack directly, which is awkward to get right.

Also, I strongly advise against 1-byte pushes/pops because of stack misalignment issues and poor performance; they are also very dangerous or completely prohibited in most modern OSes.

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Tomasz Grysztar 25 Jan 2007, 11:36
NTOSKRNL_VXE wrote:
The text does NOT mention the amount of pushed data :? ... and also does not say anything about pushing only 1 byte :?

The size of the operand is also the amount of pushed data (by the very definition of the PUSH operation). So it is either a word or a double word, as stated. No byte push is possible.
The 16-bit variant of the instruction stores 16 bits on the stack, the 32-bit variant stores 32 bits. These two are the only options (not counting long mode here).
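
For illustration only, here is a small Python sketch of the size rules described above. It is not fasm's actual code: the function name `encode_push_imm` and its parameters are made up for this example. It assumes the standard x86 opcodes 6A (push with an 8-bit immediate) and 68 (push with a full-size immediate) plus the 66 operand-size prefix, and it shows that the amount of data pushed depends only on the operand size, never on how short the encoding is.

```python
def encode_push_imm(value, mode_bits, operand_bits=None, force_long=False):
    """Model of `push imm`: return (machine_code, bytes_pushed).

    mode_bits    -- 16 or 32, standing in for use16 / use32
    operand_bits -- 16 or 32; defaults to the mode (hypothetical parameter)
    force_long   -- never pick the short 8-bit immediate form
    """
    if operand_bits is None:
        operand_bits = mode_bits
    if mode_bits not in (16, 32) or operand_bits not in (16, 32):
        raise ValueError("only 16-bit and 32-bit sizes are modelled")
    if not -(1 << (operand_bits - 1)) <= value < (1 << operand_bits):
        raise ValueError("value does not fit the operand size")

    # The 66 prefix switches the operand size away from the mode default.
    prefix = b"\x66" if operand_bits != mode_bits else b""

    # Interpret the value as a signed number of the operand size.
    wrapped = value & ((1 << operand_bits) - 1)
    signed = wrapped - (1 << operand_bits) if wrapped >> (operand_bits - 1) else wrapped

    if -128 <= signed <= 127 and not force_long:
        # Short form: one immediate byte, sign-extended by the CPU.
        code = prefix + b"\x6a" + bytes([signed & 0xFF])
    else:
        # Long form: the immediate takes the full operand size.
        code = prefix + b"\x68" + wrapped.to_bytes(operand_bits // 8, "little")
    return code, operand_bits // 8


if __name__ == "__main__":
    for args in [(0, 16), (0, 32), (0x1234, 16)]:
        code, pushed = encode_push_imm(*args)
        print(args, code.hex(), "pushes", pushed, "bytes")
```

In this model `force_long=True` plays the role of the word/dword size operator from the manual paragraph, and `operand_bits` plays the role of the pushw/pushd mnemonics.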
MCD 25 Jan 2007, 12:06
Or to make it even clearer: "the shorter 8-bit form" is only usable for values from -128 to 127. It means that the generated machine code contains only a single immediate byte, but this byte is nevertheless pushed as a word/dword by sign-extending it.

For even more details, refer to the Intel/AMD docs.
Tomasz Grysztar 25 Jan 2007, 12:18
MCD wrote:
Or to make it even clearer: "the shorter 8-bit form" is only usable for values from -128 to 127 (...)

This is a bit tricky. In 16-bit mode "push 65408" will also generate the short form.

fasm follows the philosophy of assembly language being an abstraction layer over machine code, one that focuses on the functionality of the instruction. You write "push 65408" in 16-bit mode to push the 16-bit value onto the stack, and choosing the nicest encoding for this instruction is then a task for the assembler.
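
To see why "push 65408" qualifies in 16-bit mode, the check can be sketched in Python as follows. This is a hedged illustration, not fasm's source: `fits_short_form` is a hypothetical helper that asks whether the value, taken modulo the operand size, is what the CPU would produce by sign-extending a single immediate byte.

```python
def fits_short_form(value, operand_bits):
    """True if `value` (modulo 2**operand_bits) equals a sign-extended byte."""
    mask = (1 << operand_bits) - 1
    wrapped = value & mask            # the bit pattern that ends up on the stack
    low = wrapped & 0xFF              # candidate 8-bit immediate
    # Sign-extend the low byte to the operand size, as the CPU would.
    extended = (low - 0x100 if low & 0x80 else low) & mask
    return extended == wrapped


if __name__ == "__main__":
    print(fits_short_form(65408, 16))   # 65408 = 0xFF80, same bits as -128
    print(fits_short_form(65408, 32))   # 0x0000FF80 is not a sign-extended byte
    print(fits_short_form(128, 16))     # 0x0080 would come back as 0xFF80
```

The same value can therefore be short in one mode and long in another, which is why the decision is best left to the assembler.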
DOS386 25 Jan 2007, 12:21
Thanks.

Quote:
No byte push is possible.


OK. The 8080 was the last one supporting this :shock:

Quote:
Also I highly disrecommend 1 byte pushes/pops


Then MCD was wrong ... or meant "emulated" PUSH/POPing :?

Quote:
Well, if you code stuff like DOS extenders (flat/unreal mode) where you switch from 16bit to 32bit mode for example


That's what I do ;) - I do have 16-bit and 32-bit blocks in the same
executable, but I still don't see a reason for having 16-bit code in a 32-bit
block or vice versa :?

Quote:
The size of the operand is the amount of pushed data as well


OK, but this operand exists in the source only, and its size is NOT the size in
the output code :shock:

It's confusing :(

Code:
Immediate value as an operand for push instruction without a
size operator results by default in pushing 16 bits if assembler
is in 16-bit mode and 32 bits if assembler is in 32-bit mode,
shorter 8-bit form of this instruction is used if possible, word or
dword size operator forces the push instruction to be generated
in longer form for specified size and also to push this size. pushw
and pushd mnemonics force assembler to generate instructions
pushing 16 bits or 32 bits without forcing it to use the longer form
of instruction. Pushing 1 byte only is not possible.
    


Maybe the paragraph should be changed to something like this ^^^ ? :?:

Tomasz Grysztar 25 Jan 2007, 12:24
NTOSKRNL_VXE wrote:
OK, but this operand exists in the source only, and its size is NOT the size in the output code :shock:

As I said in the post above: when you write assembly, you usually focus on what the instruction does, not how it is encoded.
DOS386 25 Jan 2007, 12:37
Quote:
This is a bit tricky.


YES. ;)

Quote:
In 16-bit mode "push 65408" will also generate the short form


OK, 65408=$FF80, so push $FF80 can be treated as push -$80. Only the
low byte $80 is stored as the immediate, the instruction fits into 2 bytes,
and then the CPU sign-extends $80 back to $FF80...

and in "use32" even push $FFFFFF80 fits into 2 bytes :shock:

Quote:

you usually focus on what the instruction does, not how it is encoded.


but not always, see the XOR EAX,EAX issue :D

Tomasz Grysztar 25 Jan 2007, 12:54
NTOSKRNL_VXE wrote:
but not always, see the XOR EAX,EAX issue :D

Yes, an optimization that changes the type of operation is left to the programmer, especially when there's a real difference in operation (XOR changes the flags, MOV does not). However, for a given EXACT operation I recommend leaving it to the assembler to find the best form. That's what fasm is for. ;) (Though here too there are some controversies, like whether the assembler should optimize LEA to MOV, etc., but that's another story...)
MCD 25 Jan 2007, 17:15
Tomasz Grysztar wrote:
MCD wrote:
Or to make it even clearer: "the shorter 8-bit form" is only usable for values from -128 to 127 (...)

This is a bit tricky. In 16-bit mode "push 65408" will also generate the short form.
Yep, this is because the push imm instruction "packs" numbers as signed values, whereas a 16-bit number may be written as either unsigned or signed.

I guess there is an analogous case with "push 4294967168" in 32-bit mode
Tomasz Grysztar 25 Jan 2007, 17:28
MCD wrote:
I guess there is an analogous case with "push 4294967168" in 32-bit mode

Oh well, even with the 64-bit "push 18446744073709551488" ;)
FrozenKnight



Joined: 24 Jun 2005
Posts: 128
FrozenKnight 25 Jan 2007, 23:38
To push a 0 byte:
Code:
mov   BYTE [esp-1], 0 ; write below esp first to avoid an address interlock
dec   esp             ; DEC takes a single operand    


To zero a register and push a byte (this code may be slow because it has an address interlock after esp is adjusted, so it will take a minimum of 2 cycles):
Code:
sub   al, al
dec   esp             ; DEC takes a single operand
mov   BYTE [esp], al    


I haven't speed-tested this method, but it should be faster than the previous code (on some processors it may take as little as one cycle):
Code:
mov   BYTE [esp-1], 0
dec   esp
sub   al, al    


To pop a byte:
Code:
mov   al, BYTE [esp]
inc   esp    



While push/pop don't do the job, you can emulate the behavior.
Note that you should use these after any preceding push/pops.
Also note that, because of the stack adjustment, if you do this just before a call or ret you may lose two cycles due to address misalignment and an address interlock.
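
As a rough illustration of what the emulated byte push and pop above do to the stack pointer, here is a small Python model. Everything in it is an assumption made for the sketch: the class name `ByteStack`, the 64-byte memory, and the 4-byte alignment check. It only mirrors the order of the two instructions in each snippet and makes no claim about timing.

```python
class ByteStack:
    """Toy model of a downward-growing stack addressed through esp."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size               # empty stack: esp points past the end

    def push_byte(self, value):
        self.mem[self.esp - 1] = value & 0xFF   # mov BYTE [esp-1], value
        self.esp -= 1                            # dec esp

    def pop_byte(self):
        value = self.mem[self.esp]               # mov al, BYTE [esp]
        self.esp += 1                            # inc esp
        return value

    def is_aligned(self, boundary=4):
        return self.esp % boundary == 0


if __name__ == "__main__":
    stack = ByteStack()
    stack.push_byte(0)
    print("esp after one byte push:", stack.esp, "aligned:", stack.is_aligned())
    stack.pop_byte()
    print("esp after the matching pop:", stack.esp, "aligned:", stack.is_aligned())
```

A single byte push leaves esp off the 4-byte boundary until the matching pop, which is the misalignment discussed in this thread.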

Also, to force a 16-bit push/pop use pushw/popw, and for 32-bit use pushd/popd.
MCD 26 Jan 2007, 05:13
@FrozenKnight: that's exactly what I meant earlier by push/pop of 1 byte :) It may work on your own OS or in DOS, but very probably not in most modern OSes (stack misalignment fault or something), even if you put two 1-byte pushes immediately one after another. Performance considerations aside.
DOS386 26 Jan 2007, 11:27
Thanks. PUSHing clarified.

Quote:
lose two cycles due to address misalignment and an address interlock.


But what is this "address interlock"? You mentioned this serious (?)
problem in your "Mersenne" thread :?

FrozenKnight 30 Jan 2007, 10:14
MCD, I've tested code similar to that in Windows XP (by pushing entire strings onto the stack without aligning). The only time you run into problems is if you don't keep track of how many bytes you pushed and don't make sure to pop them off correctly. However, you can lose performance by using such methods: because of the buffers on modern processors, if you misalign the stack, the processor has to waste an extra cycle just to cache the rest of any misaligned addresses (so it's not always a good idea to do this). It also misaligned the data in OllyDbg, which made debugging much harder for me.

As for the question about address interlocks:
An address interlock is where the processor has to wait a cycle because you have just modified data in a register that you are about to use. So basically
Code:
mov   eax, [edx]
add   eax, 4
inc   ecx    
is slower than
Code:
mov   eax, [edx]
inc   ecx
add   eax, 4    
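
A toy way to compare the two orderings above, sketched in Python. This is only a dependency counter under a made-up model (an instruction waits if it reads a register written by the instruction directly before it); it does not reproduce real processor timings, and the names `count_adjacent_dependencies`, `slow` and `fast` are hypothetical.

```python
def count_adjacent_dependencies(program):
    """Count instructions that read a register written by their predecessor.

    `program` is a list of (text, written_registers, read_registers) tuples.
    """
    waits = 0
    for (_, written, _), (_, _, read) in zip(program, program[1:]):
        if set(written) & set(read):
            waits += 1
    return waits


# The register sets below are filled in by hand for this illustration.
load = ("mov eax, [edx]", {"eax"}, {"edx"})
add = ("add eax, 4", {"eax"}, {"eax"})
inc = ("inc ecx", {"ecx"}, {"ecx"})

slow = [load, add, inc]   # add reads eax right after the load writes it
fast = [load, inc, add]   # inc ecx is independent and fills the gap

if __name__ == "__main__":
    print("first ordering:", count_adjacent_dependencies(slow))
    print("second ordering:", count_adjacent_dependencies(fast))
```

Under this simplified rule the first ordering has one waiting instruction and the second has none, which matches the reordering advice given here.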


Also, another optimization note: it's usually faster to use arithmetic operations than bit operations. So
Code:
sub   eax, eax    
is usually better than
Code:
xor   eax, eax    
This is because of the way the pipes work on Pentium and later processors, where an arithmetic operation can sometimes be executed in a second pipe at the same time as other instructions.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 31 Jan 2007, 13:48
http://board.flatassembler.net/topic.php?t=4485&start=40

revolution wrote:

Also XOR is preferred over SUB because of the special hardware support within the cpu to break dependency chains, SUB reg,reg will wait for any previous results to be written before doing the SUB but XOR reg,reg is smart and won't wait.
FrozenKnight 31 Jan 2007, 18:34
Yes, but xor is a bit-manipulation instruction, so it can only run in the main pipe. sub can (if placed correctly) execute for free. So I guess both have advantages and disadvantages.
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan 31 Jan 2007, 19:27
[Intel® 64 and IA-32 Architectures Optimization Reference Manual]-248966.pdf
Quote:
The Pentium 4 processor provides special support for XOR, SUB, and PXOR operations
when executed within the same register. This recognizes that clearing a register
does not depend on the old value of the register. The XORPS and XORPD instructions
do not have this special support. They cannot be used to break dependence chains.
In Intel Core Solo and Intel Core Duo processors, the XOR, SUB, XORPS, or PXOR
instructions can be used to clear execution dependencies on the zero evaluation of
the destination register.

_________________
Any offers?
Copyright © 1999-2025, Tomasz Grysztar.