flat assembler
Message board for the users of flat assembler.

64-bit call stack question

dv4-fa



Joined: 08 Feb 2013
Posts: 14
I'm doing some research on 64-bit fasm apps. I've read an article about calling conventions: rcx = 1st parameter, rdx = 2nd parameter ... r9 = 4th parameter. But what if you need to call a function that requires 12 parameters? Even counting the 8 new registers, that would only get you up to 10 parameters.

Sorry for my bad English.
Post 18 Feb 2013, 02:13
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1181
Location: Unknown
Stupid post removed.


Last edited by HaHaAnonymous on 28 Feb 2015, 21:25; edited 1 time in total
Post 18 Feb 2013, 03:38
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16782
Location: In your JS exploiting you and your system
dv4-fa: Windows uses the fastcall convention. It is a little more complex than it might first appear. The first four parameters go in registers and the remaining parameters go on the stack, but there is a complication: the caller must also reserve a shadow space on the stack, big enough to hold the first four parameters. Anyhow, once you know it is called fastcall, Google can help you with all the details.
Post 18 Feb 2013, 03:48
baldr



Joined: 19 Mar 2008
Posts: 1651
dv4-fa,

Agner Fog maintains a quite thorough manual about various calling conventions (navigate to "Software optimization resources").
Post 18 Feb 2013, 06:10
Feryno



Joined: 23 Mar 2005
Posts: 447
Location: Czech republic, Slovak republic
1st param rcx (or ecx if it is dword, or cl if it is boolean etc)
2nd param rdx
3rd param r8
4th param r9
5th param [rsp+8*4] (put there a pointer, qword, dword, byte - depends on param size)
6th param [rsp+8*5]
...
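
A minimal sketch of that layout for a call with more than four parameters, in fasm syntax. Sum6 is a hypothetical callee made up for illustration. The fragment assumes it sits in the code section of a PE64 program and that Caller was itself reached by a CALL, so rsp is 8 bytes past a 16-byte boundary on entry.

```asm
; Sketch only: Caller and Sum6 are hypothetical names, not part of any API.
Caller:
        sub     rsp, 8*7            ; 4 shadow slots + 2 stack params + 1 pad slot (keeps rsp 16-aligned)
        mov     ecx, 1              ; 1st param
        mov     edx, 2              ; 2nd param
        mov     r8d, 3              ; 3rd param
        mov     r9d, 4              ; 4th param
        mov     qword [rsp+8*4], 5  ; 5th param, right above the shadow space
        mov     qword [rsp+8*5], 6  ; 6th param
        call    Sum6                ; rax = 21 on return
        add     rsp, 8*7
        ret

Sum6:                               ; leaf routine: returns the sum of its six arguments in rax
        lea     rax, [rcx+rdx]
        add     rax, r8
        add     rax, r9
        add     rax, [rsp+8*5]      ; 5th param: return address (1 slot) + shadow space (4 slots) lie below it
        add     rax, [rsp+8*6]      ; 6th param
        ret
```

Note the offsets differ by one slot between the two sides: the caller writes the 5th parameter at [rsp+8*4], and the callee sees it at [rsp+8*5] because CALL has pushed the return address. For 12 parameters the same caller would reserve 13 slots (12 plus one pad slot for alignment) and store parameters 5 to 12 at [rsp+8*4] through [rsp+8*11].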
Post 18 Feb 2013, 08:02
dv4-fa



Joined: 08 Feb 2013
Posts: 14
Thanks, I'll check it out.
Post 18 Feb 2013, 11:09
Spool



Joined: 08 Jan 2013
Posts: 154
[]


Last edited by Spool on 17 Mar 2013, 10:36; edited 1 time in total
Post 18 Feb 2013, 12:23
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16782
Location: In your JS exploiting you and your system
Spool wrote:
Is the Windows 64 calling convention much faster than the Windows 32 one?
For what? It depends upon what you are doing.
Post 18 Feb 2013, 12:35
Spool



Joined: 08 Jan 2013
Posts: 154
[]


Last edited by Spool on 17 Mar 2013, 10:36; edited 1 time in total
Post 18 Feb 2013, 13:11
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1181
Location: Unknown
Stupid post removed.


Last edited by HaHaAnonymous on 28 Feb 2015, 21:25; edited 1 time in total
Post 18 Feb 2013, 17:03
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16782
Location: In your JS exploiting you and your system
Spool wrote:
I meant register-based versus stack-based parameters?
Maybe. And maybe not. It depends upon what you are doing and how you test it. You might find one faster, or the other faster, or no difference.
Post 18 Feb 2013, 17:58
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
64-bit Windows uses the stack space and registers. The shadow space needs to be reserved even if unused. Interfacing the API is rarely a bottleneck to performance -- the meat of a program is done elsewhere. Register contention requires temp storage often enough in general. Leaf routines can use whatever convention favors performance.
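
A small sketch of why the shadow space has to be there even for a one-parameter call: the callee is entitled to store its register parameters into those slots. Caller and Twice are hypothetical names, and the fragment assumes Caller was entered through a CALL (rsp 8 bytes past a 16-byte boundary).

```asm
; Sketch only: both routines are made up for illustration.
Caller:
        sub     rsp, 8*5            ; 32 bytes of shadow space + 8 bytes to realign rsp to 16
        mov     ecx, 7              ; the only parameter
        call    Twice               ; rax = 14 on return
        add     rsp, 8*5
        ret

Twice:
        mov     [rsp+8], rcx        ; spill rcx into its home slot, just above the return address
        mov     rax, [rsp+8]        ; ... and read it back later as if it were a stack parameter
        add     rax, rax
        ret
```

Without the reservation in Caller, the store in Twice would overwrite whatever Caller keeps just above its own return address.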
Post 26 Feb 2013, 08:11
AsmGuru62



Joined: 28 Jan 2004
Posts: 1394
Location: Toronto, Canada
I'm starting the 64-bit code generator and some clarification would be nice.
At the moment of a call to Win64 API the following must be true:

1. the stack is aligned to 16 bytes
2. shadow space of 32 bytes is provided
3. first four parameters go into RCX, RDX, R8, R9
4. the rest of the parameters go into slots on the stack following the shadow space:

[rsp+20h], [rsp+28h] and so on

If that is all correct, then any of my private functions would look like this:
Code:
align 16
ClassA_Method1:
        ;
        ; Each time any CALL happens - the stack is misaligned by 8,
        ; because only the RET address is pushed and CPU jumps to
        ; function entry point, so pushing one register will align it back to 16.
        ; I am assuming here, that at the moment of the call - stack is at 16.
        ; RBP is pushed, because it may hold the caller's local variables.
        ;
        push    rbp
        ;
        ; Optionally here the local variables area will be set as
        ; needed. Local vars room will be aligned on 16 and [RBP+<OFS>] will be used
        ; to access the variables.
        ;
        sub     rsp, <ROOM4LOCALS>
        mov     rbp, rsp
        ;
        ; Provide stack space for 8 additional parameters for APIs
        ;
        sub     rsp, 40h
        ;
        ; Set some register (RDI in this case) to additional parameters area, because
        ; RSP may be used to manipulate stack further in code
        ;
        mov     rdi, rsp
        ;
        ; Provide shadow space
        ;
        sub     rsp, 20h
...
        ;
        ; From this point: any # of API calls can be done setting RCX, RDX, R8, R9 and
        ; MOV-ing extra parameters into address at RDI (in this case)
        ;
...
        ;
        ; Cleanup and return
        ;
        add     rsp, 60h + <ROOM4LOCALS>
        pop     rbp
        ret
    

For my leaf functions I do not use stdcall -- since x64 has a few extra registers -- I pass
all parameters in the registers and RBX as 'this' pointer.

Now, please correct this code sample (or example) if anything is wrong.
What about the callbacks for Win64? How would the WNDPROC look?
Which registers must be preserved? The same as in x32: RBX, RSI, RDI, RBP?
Or are there any changes for x64?

Would that be a proper WNDPROC?
Code:
align 16
TWindow_WndProc:
        push    rbp             ; stack is at 16
        mov     rbp, rsp
        push    rbx rsi rdi rdi ; stack is again at 16 (RDI pushed twice)
...
        ;
        ; Now the following is true:
        ;
        ; [rbp + 8*2] --> HWND
        ; [rbp + 8*3] --> Message
        ; [rbp + 8*4] --> WPARAM
        ; [rbp + 8*5] --> LPARAM
        ;
        ; Shadow room and additional parameters
        ;
        sub     rsp, 60h
        lea     rdi, [rsp + 20h]
...
        ;
        ; Call APIs as needed ...
        ;
...
        ;
        ; Cleanup and return
        ;
        add     rsp, 60h
        pop     rdi rdi rsi rbx rbp
        ret     8*4
    
Post 26 Apr 2013, 13:50
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
Your understanding is good...except small changes below:
Code:
align 16
TWindow_WndProc: 
        push    rbp             ; stack is at 16 
        mov     rbp, rsp 
        push    rbx rsi rdi rdi ; stack is again at 16 (RDI pushed twice) 
... 
        ; 
        ; Now the following is true: 
        ; 
        ; rcx --> HWND
        ; rdx --> Message
        ; r8  --> WPARAM
        ; r9  --> LPARAM
        ; 
        ; Shadow room and additional parameters 
        ; 
        sub     rsp, 60h 
        lea     rdi, [rsp + 20h] 
... 
        ; 
        ; Call APIs as needed ... 
        ; 
... 
        ; 
        ; Cleanup and return 
        ; 
        add     rsp, 60h 
        pop     rdi rdi rsi rbx rbp 
        retn    
...Windows follows the same rules, so on entry to WndProc the parameters are in registers, and Windows has set up the shadow space for you (32 bytes after the return address, followed by any remaining parameters, which is rare).
Code:
WndProc:
  virtual at rsp
    .return_address rq 1
    .shadow_space rq 4
    .Parm4 rq 1 ; maybe? (zero-based index 4, i.e. a fifth parameter)
    ; ...
  end virtual    
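
A hedged sketch of how that shadow space can be used from inside the callback: once WndProc itself stores the four registers into their home slots, the [rbp + 8*2] ... [rbp + 8*5] offsets from the question become valid. This is a fragment, not a full program; it assumes DefWindowProcW is declared in the import section elsewhere, and it reloads the registers only to show that the slots really hold the values.

```asm
; Sketch only: assumes an import entry for DefWindowProcW exists elsewhere.
WndProc:
        push    rbp                 ; entry rsp is 8 mod 16, so this realigns to 16
        mov     rbp, rsp
        mov     [rbp+8*2], rcx      ; HWND    -> 1st home slot (above saved rbp and return address)
        mov     [rbp+8*3], rdx      ; Message -> 2nd home slot
        mov     [rbp+8*4], r8       ; WPARAM  -> 3rd home slot
        mov     [rbp+8*5], r9       ; LPARAM  -> 4th home slot
        sub     rsp, 20h            ; shadow space for the calls made from here
        ; ... message handling would go here, free to clobber rcx, rdx, r8, r9 ...
        mov     rcx, [rbp+8*2]      ; reload the original parameters
        mov     rdx, [rbp+8*3]
        mov     r8, [rbp+8*4]
        mov     r9, [rbp+8*5]
        call    [DefWindowProcW]    ; result stays in rax
        leave                       ; mov rsp, rbp / pop rbp
        ret                         ; plain ret: the caller cleans up, so no "ret 8*4"
```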

_________________
¯\(°_o)/¯ unlicense.org
Post 26 Apr 2013, 14:05
AsmGuru62



Joined: 28 Jan 2004
Posts: 1394
Location: Toronto, Canada
Oh.. I see... good info.
Thanks.
Post 26 Apr 2013, 16:54
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
A question that still remains on my mind: can Windows change the value of parameters on the stack? The shadow space is clearly used freely to hold anything needed, but I have yet to see a case where values above RSP+40 are changed, nor has it been clearly documented. Of course, I assume they are safe and that the caller owns those values.

So, if I wish to create a bunch of windows with the same parameters, it is a small loop updating the needed changes.
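
A rough sketch of such a loop, using CreateWindowExW because it happens to take 12 parameters. It rests on the unconfirmed assumption discussed here, namely that the callee leaves stack parameters 5 to 12 untouched. ClassName, WindowTitle and hInstance are hypothetical data labels, the sizes are arbitrary, and the import entry for CreateWindowExW is assumed to exist elsewhere.

```asm
; Sketch only: data labels are hypothetical; window class registration is not shown.
CreateMany:
        push    rbx                          ; nonvolatile register, used as loop counter
        sub     rsp, 8*12                    ; 4 shadow slots + 8 stack parameters (rsp stays 16-aligned)
        ; parameters 5..12, written once (ASSUMES the callee does not modify them)
        mov     dword [rsp+8*4], 80000000h   ; X = CW_USEDEFAULT
        mov     dword [rsp+8*5], 80000000h   ; Y = CW_USEDEFAULT
        mov     dword [rsp+8*6], 320         ; nWidth
        mov     dword [rsp+8*7], 200         ; nHeight
        mov     qword [rsp+8*8], 0           ; hWndParent
        mov     qword [rsp+8*9], 0           ; hMenu
        mov     rax, [hInstance]
        mov     [rsp+8*10], rax              ; hInstance
        mov     qword [rsp+8*11], 0          ; lpParam
        mov     ebx, 3                       ; number of windows to create
.next:
        xor     ecx, ecx                     ; dwExStyle (volatile, so reloaded every iteration)
        lea     rdx, [ClassName]             ; lpClassName
        lea     r8, [WindowTitle]            ; lpWindowName
        mov     r9d, 10CF0000h               ; WS_OVERLAPPEDWINDOW or WS_VISIBLE
        call    [CreateWindowExW]
        dec     ebx
        jnz     .next
        add     rsp, 8*12
        pop     rbx
        ret
```

If it turns out that the callee may overwrite those slots, the safe variant is to move the eight stores inside the loop, at the cost of a few extra instructions per call.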

MSDN Reference, just for completeness:
http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx

Post 26 Apr 2013, 18:19
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1181
Location: Unknown
Stupid post removed.


Last edited by HaHaAnonymous on 28 Feb 2015, 20:56; edited 1 time in total
Post 26 Apr 2013, 19:31
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
Maybe that should have been worded differently. Within the Windows 64-bit ABI, can Windows change parameters on the stack above the shadow space? Without knowing the correct convention, one must fall back on the historical context of similar conventions, hence my sense that this space belongs to the caller. That gives a strange dichotomy: a parameter space which doesn't have the parameters in it (lol), and a conventional parameter space. An odd thing indeed.

Post 26 Apr 2013, 20:45
MazeGen



Joined: 06 Oct 2003
Posts: 953
Location: Czechoslovakia
IMO, as long as it is not explicitly documented otherwise, I assume that the fastcall convention doesn't preserve the parameters above the spill space, having in mind the fact the stack close to rSP is likely to be cached and therefore it is a great place to store other variables, rearranged by the optimizer.
Post 27 Apr 2013, 07:48
sinsi



Joined: 10 Aug 2007
Posts: 688
Location: Adelaide
I'm sure that I saw (through WinDbg) an API call that went something like
Code:
xchg rbx,[rsp+x]
;work on param rbx
mov rbx,[rsp+x]
    

I am pretty sure that x was above the spill space.
Post 27 Apr 2013, 08:11
Copyright © 1999-2019, Tomasz Grysztar.