flat assembler
Message board for the users of flat assembler.
PROC64.INC
Tomasz Grysztar 01 Oct 2005, 14:39
The attachment contains the very first version of PROC64.INC for Win64 development. It allows you to call and define procedures using the Win64 fastcall convention. Because using the fastcall convention efficiently is complicated, I would recommend going back to the "raw" style of assembly language, as it lets you optimize the calls much better and may actually give you clearer code. If, however, you prefer to use "invoke" and "proc", read the following information very carefully.
The "invoke" defined by this package uses the "fastcall" macro to call procedures. The "fastcall" macro simply moves the first four parameters into registers and stores the remaining parameters into the stack frame using "mov" instructions. It is the programmer's responsibility to make a stack frame of sufficient size available to "fastcall". However, if you define a procedure using the "proc" macro, it automatically detects how large a stack frame the calls inside it require at most, and allocates this space at startup. Look at this example:
Code:
proc sample a,b,c
fastcall foo,1,2,3,4,5
ret
endp
It generates the following code:
Code:
sample:
push rbp
mov rbp,rsp
sub rsp,28h
mov rcx,1
mov rdx,2
mov r8,3
mov r9,4
mov [rsp+20h],5
call foo
leave
retn
You can see that the frame for all the parameters of the call is reserved with the "sub rsp,28h" instruction (fastcall requires you to reserve space on the stack even for those parameters that are passed in registers, in case the called procedure wants to spill them). This means, however, that if you do any pushes on the stack, you need to pop those values off the stack before any "fastcall" (or "invoke"), otherwise they may get destroyed. The other solution is to manually allocate the stack frame for fastcall, like:
Code:
sub rsp,28h
fastcall foo,1,2,3,4,5
add rsp,28h
I think I may add some option for "fastcall" to do this itself for every call. But doing a single stack allocation for all the calls produces much smaller code, which is why I've chosen this way.
One more example to see how this mechanism cooperates with other "proc" features:
Code:
proc sample uses rbx, a,b,c
local d:BYTE
fastcall foo,1,2,3,4,5
fastcall bar,[d]
ret
endp
It generates this code:
Code:
sample:
push rbp
mov rbp,rsp
sub rsp,8 ; allocate local variables
push rbx
sub rsp,28h ; allocate frame for calls
mov rcx,1
mov rdx,2
mov r8,3
mov r9,4
mov [rsp+20h],5
call foo
mov cl,[rbp-08]
call bar
add rsp,28h ; go back to where we have stored the used registers
pop rbx
leave
retn
Note that "fastcall" tries to automatically detect the size of operands to choose the optimal "mov" instruction form to pass the parameter; you can also use a size override to force the size of "mov" you want. If the parameter is in memory and needs to be passed on the stack, two "mov" instructions are generated, with the accumulator register as an intermediary in the transfer.
The names of parameters specified in the "proc" definition declare labels on the stack, even for those parameters that are passed in registers, since the fastcall convention requires the caller to reserve space on the stack for those parameters, too. You may use this to store any of the first four parameters into the stack space reserved for it, and then access it there, like:
Code:
proc sample a,b,c,d,e
mov [a],rcx ; store the parameters into their slots
mov [b],rdx
mov [c],r8
mov [d],r9
; ...
ret
endp
There is no support for passing floating point parameters in the SSE registers yet; some special prefix for the parameter might need to be introduced for this.
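For anyone taking the "raw" route recommended above, here is a minimal sketch of the same five-argument call written by hand. It is only an illustration under stated assumptions: "caller" and "foo" are placeholder labels, "caller" is assumed to be entered through a normal "call", and the file is assembled as a flat 64-bit binary just to check that it assembles.

```asm
use64

caller:
        sub     rsp,28h                 ; 20h for the four home slots + 8 for the fifth argument
        mov     ecx,1                   ; writing ecx zero-extends into rcx
        mov     edx,2
        mov     r8d,3
        mov     r9d,4
        mov     qword [rsp+20h],5       ; fifth argument sits just above the home slots
        call    foo
        add     rsp,28h
        ret

foo:                                    ; placeholder callee
        mov     [rsp+8],rcx             ; spill the first argument into its home slot
        mov     rax,[rsp+28h]           ; fifth argument, seen past the return address
        ret
```

Since a "call" leaves rsp 8 bytes off a 16-byte boundary at the entry of "caller", subtracting 28h here also happens to bring rsp back to 16-byte alignment before "call foo".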
Tomasz Grysztar 01 Oct 2005, 14:58
Note that I haven't yet been able to test them at run time, because I don't have a 64-bit machine.
MazeGen 12 Oct 2005, 07:54
Your post is very interesting, because it clearly shows how FASTCALL works under win64. Thanks!
revolution 13 Oct 2005, 07:29
Quote: if you do any pushes on the stack, you need to pop those values off the stack before any "fastcall" (or "invoke"), otherwise they may get destroyed
Tomasz Grysztar 13 Oct 2005, 18:10
I think you're right - anyone who wants true optimization should write the code without macros anyway, and the macros should be as universal as possible and not too confusing. Here's the new version of PROC64 that allocates stack space for each call separately and thus is more similar to PROC32 in both usage and implementation.
Also this new file will appear with IMPORT32.INC in the next fasmw update.
Tomasz Grysztar 12 Jul 2009, 11:12
I've made some new additions to PROC64.INC to help optimize stack frames a little. It introduces a new "frame" macro, which can be used like this:
Code:
frame
invoke func1,[arg1],[arg2]
invoke func2,[arg1],[arg2],[arg3]
endf
It causes the stack space to be allocated only once for the whole frame (of course, if you don't use the "frame" macro, everything works as usual, with stack allocation for each call separately). You should not modify the stack frame anywhere within the "frame" block. Well, you can modify it, as long as you restore it before the next fastcall/invoke or the endf macro.
The macros are now also able to merge consecutive "sub rsp,x" and "add rsp,x" instructions generated by them. So if you put the "frame" macro right at the beginning of a procedure, the allocation of stack space for function calls may get merged with the allocation of space for local variables. This also means that even when you don't use the "frame" macro, your code may get a little bit optimized with this new PROC64.INC.
Please test it and let me know what you think about it.
Attachment deleted - the new version is posted below.
Last edited by Tomasz Grysztar on 16 Jul 2009, 19:45; edited 1 time in total
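As a rough picture of what a single allocation for the whole frame means, the frame/endf example could be written by hand along these lines. This is a sketch, not the actual output of the macros: func1, func2 and arg1..arg3 are placeholder qword variables added so the fragment assembles on its own, and rsp is assumed to be 16-byte aligned already at this point.

```asm
use64

; hand-written counterpart of the frame/endf example (not real macro output)
        sub     rsp,20h                 ; one allocation: home slots for up to four arguments
        mov     rcx,[arg1]
        mov     rdx,[arg2]
        call    qword [func1]
        mov     rcx,[arg1]              ; rcx and rdx are volatile, so reload after the call
        mov     rdx,[arg2]
        mov     r8,[arg3]
        call    qword [func2]
        add     rsp,20h                 ; one release at the end of the frame
        ret

; placeholder data (assumptions, not part of PROC64.INC)
arg1    dq 0
arg2    dq 0
arg3    dq 0
func1   dq 0
func2   dq 0
```

Any push placed between the two calls would have to be popped again before the second call, otherwise the home slots the callee may write to would overlap the pushed value.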
Japheth 16 Jul 2009, 19:24
Tomasz Grysztar wrote:
Perhaps I made a very stupid mistake, but after I replaced PROC64.INC with the new version and assembled TEMPLATE.ASM with FASM v1.68, the disassembly looks like this:
Code:
0000000000401000: 48 83 EC 08 sub rsp,8
0000000000401004: 83 EC 10 sub esp,10h
0000000000401007: 48 C7 C1 00 00 00 00 mov rcx,0
000000000040100E: FF 15 62 20 00 00 call qword ptr ds:[00403076h]
0000000000401014: 83 C4 10 add esp,10h
0000000000401017: 48 89 05 2B 10 00 00 mov qword ptr ds:[00402049h],rax
000000000040101E: 83 EC 10 sub esp,10h
0000000000401021: 48 C7 C1 00 00 00 00 mov rcx,0
0000000000401028: 48 C7 C2 00 7F 00 00 mov rdx,7F00h
000000000040102F: FF 15 F3 20 00 00 call qword ptr ds:[00403128h]
0000000000401035: 83 C4 10 add esp,10h
What strikes me is that ESP is used now, while previously it was RSP. Is this intentional, to make the code shorter?
Another thing which I found confusing is the "sub esp,10h" before the function call. In this reference http://msdn.microsoft.com/en-gb/library/ms235286(VS.100).aspx I found this sentence
Quote:
which would mean that rsp/esp has to be decreased/increased by 20h at least.
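To make the two observations concrete, here is a small sketch of what a one-argument call would be expected to look like with the full 64-bit register and the 20h minimum. "target" is a placeholder pointer, not a real import, and the byte sequences in the comments are the standard encodings of these instructions, shown for comparison with the disassembly above.

```asm
use64

        sub     rsp,20h                 ; 48 83 EC 20 : four home slots, 64-bit operand
        xor     ecx,ecx                 ; first argument = 0
        call    qword [target]
        add     rsp,20h                 ; 48 83 C4 20
        ret

target  dq 0                            ; placeholder for an import table entry
```

The 32-bit form "sub esp,10h" (83 EC 10) is one byte shorter because it has no REX.W prefix, but writing a 32-bit register clears the upper half of the 64-bit one, so it would truncate rsp whenever the stack lies above 4 GB.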
Tomasz Grysztar 16 Jul 2009, 19:44
Yes, these are bugs in macros - thanks for testing.
Please try the fixed version.
r22 16 Jul 2009, 20:34
Great compromise/solution to the stack optimization annoyance of Win64.
I think you should just make a new macro ~ProcX that will have Frame built in and possibly other optimizations like a minimalist header (arguments referenced with rsp instead of rbp), 'align 16' (possibly on all labels, not just the procedure itself), etc.
Appreciate your ongoing efforts, Tomasz.
Japheth 17 Jul 2009, 15:19
Tomasz Grysztar wrote: Yes, these are bugs in macros - thanks for testing.
Yes, the "peculiarities" which were mentioned are gone! Thanks for the fast "fix"!
Tomasz Grysztar 17 Jul 2009, 16:32
r22 wrote: I think you should just make a new macro ~ProcX that will have Frame built in and possibly other optimizations like minimalist header (arguments referenced with rsp instead of rbp), 'align 16' (possibly on all label not just the procedure itself), etc.
Yes, I'm also thinking about such an RSP-only variant of "proc". If I find some time to play with this idea, I may make such a set of macros - but I'm not making any promises.
ramguru 17 Jul 2009, 17:00
IMO P64DEMO is the only good example on how stack should be allocated in 64bit mode. poasm has special directive PARMAREA & that's how it should be done... So we need 'proc' macro that only does 'sub rsp, X' where [X = size of local variables + size of the largest parameter area passed to any inner procedure] & 'add rsp, X'. Also we need 'invoke' macro that only passes variables (that doesn't allocate stack). So far I'm not very good at macros .. but I can't expect that someone will write it for me :}, so moving slowly towards my goal.
Tomasz Grysztar 17 Jul 2009, 17:36
Just use the "frame" macro with "proc" and you get the same result. The generic/standard "proc" should not do it by default, however, since doing any stack operation would break it. By choosing to use "frame" you acknowledge that you're aware you cannot modify the stack between the calls.
ramguru 17 Jul 2009, 18:01
thank you :}
I didn't know frame was so powerful & that it affects even 'invoke'. Was about to rewrite these macros, .. so much possible knowledge gain about asm parser went wasted :}
ramguru 17 Jul 2009, 18:44
One more thing :}
Everything looks beautiful etc., but how do I get rid of these three lines ? : Code:
push rbp
mov rbp,rsp
leave
'cuz we only need
Code:
proc ..
sub rsp,..
..
add rsp,..
ret
endp
Tomasz Grysztar 17 Jul 2009, 19:29
That's what we discussed above, as a separate "proc" variant that uses RSP only. It does have certain advantages (that's why I'm considering creating it as an alternative), but don't dismiss the RBP-addressed variant too soon - you should remember that addressing with RSP as the base takes more space in the instruction code. Thus when you reference your arguments and locals many times, the RBP frame may pay off.
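The size difference can be checked by assembling two otherwise identical loads and comparing the bytes. The offsets here are arbitrary; the encodings in the comments follow from the x86-64 rule that a memory operand with RSP as its base needs an extra SIB byte.

```asm
use64

        mov     rax,[rbp+10h]           ; 48 8B 45 10     (4 bytes)
        mov     rax,[rsp+10h]           ; 48 8B 44 24 10  (5 bytes, 24h is the SIB byte)
```

So each RSP-based reference costs one byte more than the RBP-based one, against the few bytes saved once per procedure by dropping "push rbp", "mov rbp,rsp" and "leave".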
ramguru 18 Jul 2009, 13:15
One more thing :}
Everything looks beautiful etc., [MY PROLOGUE :>] ..but it would be veeeryyy useful to have addr64 support somehow, because often we need isolated data (from threads, especially for custom controls - thunking) & the stack suits that well. As I can see fasm ideology is ALWAYS use global data :S. I tried the macro inside macro method but it seems addr cannot be implemented that way :/ So there is another option, 'addr64.something', which could be a flag to load a structure with lea instead of mov'ing.. Also when having 5 & more parameters I must somehow figure out the rsp (or rbp) value, which makes things really difficult... With the new proc coming and such addr64.something there would be nothing that stops 64bit programming :}
Tomasz Grysztar 18 Jul 2009, 13:28
What do you mean by addr64?
ramguru wrote: As I can see fasm ideology is ALWAYS use global data :S.
In one of my recent private projects I used it to define RBX-based local data (with the help of VirtualAlloc, of course) - the possibilities are almost endless.
ramguru 18 Jul 2009, 13:37
By addr64 I mean
Code:
proc WinMain..
local wc:WNDCLASSEX
...
invoke RegisterClassEx, addr64.wc ; or addr64 wc or addr wc
; now I have to use :
lea rcx, [wc]
invoke RegisterClassEx, rcx
endp
What if it would be 5-th parameter.. I would have to mess even with rsp :S
Last edited by ramguru on 18 Jul 2009, 13:38; edited 1 time in total
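For the fifth-parameter case, a hand-written version without the macros might look like the sketch below. Everything in it is an assumption for illustration: "sample" and "target" are placeholder labels, the 50h-byte local merely stands in for a structure such as wc, and the first four arguments are dummies.

```asm
use64

sample:
        push    rbp
        mov     rbp,rsp
        sub     rsp,50h+30h             ; 50h-byte local + call frame (28h rounded up to 30h)
        xor     ecx,ecx                 ; arguments 1-4: dummy values
        xor     edx,edx
        xor     r8d,r8d
        xor     r9d,r9d
        lea     rax,[rbp-50h]           ; address of the local, like lea rcx,[wc] above
        mov     [rsp+20h],rax           ; fifth argument slot, right after the home slots
        call    target
        leave
        ret

target:                                 ; placeholder callee taking five arguments
        ret
```

The only difference from the register case is that the address goes through a scratch register into [rsp+20h] instead of staying in rcx, so rsp itself does not need to be adjusted beyond the one allocation in the prologue.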
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.