flat assembler
Message board for the users of flat assembler.

Index > Windows > PROC64.INC

Tomasz Grysztar 01 Oct 2005, 14:39
The attachment contains the very first version of PROC64.INC for Win64 development. It lets you call and define procedures using the Win64 fastcall convention. Because using the fastcall convention efficiently is complex, I would recommend going back to the "raw" style of assembly language, as it lets you optimize the calls much better and may actually give you clearer code. If, however, you prefer to use "invoke" and "proc", read the following information very carefully.

The "invoke" defined by this package uses the "fastcall" macro to call procedures. "fastcall" simply moves the first four parameters into registers and stores the remaining parameters into the stack frame using "mov" instructions; it is the programmer's responsibility to make a stack frame of sufficient size available to "fastcall".

However, if you define a procedure using the "proc" macro, it automatically detects how large a stack frame the calls inside it require at most and allocates this space at startup. Look at this example:
Code:
proc sample a,b,c

        fastcall foo,1,2,3,4,5

        ret

endp    

It generates the following code:
Code:
sample:
  push      rbp
  mov       rbp,rsp
  sub       rsp,28h
  mov       rcx,1
  mov       rdx,2
  mov       r8,3
  mov       r9,4
  mov       [rsp+20h],5
  call      foo
  leave
  retn    

You can see that the frame for all the parameters of the call is reserved with the "sub rsp,28h" instruction (fastcall requires you to reserve space on the stack even for those parameters that are passed in registers, in case the called procedure wants to spill them). This means, however, that if you do any pushes on the stack, you need to pop those values off the stack before any "fastcall" (or "invoke"), otherwise they may get destroyed. The other solution is to manually allocate the stack frame for fastcall, like:
Code:
 sub rsp,28h
 fastcall foo,1,2,3,4,5
 add rsp,28h    

I think I may add an option for "fastcall" to do this itself for every call. But doing a single stack allocation for all the calls produces much smaller code, which is why I've chosen this way.
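
To illustrate the push hazard described above, here is a minimal sketch, assuming the first-version macros and a hypothetical procedure "foo". The callee is allowed to spill its register parameters into the 20h bytes just above its return address, which is exactly where a freshly pushed value would sit, so a separate frame is placed below the pushed value:
Code:
        push    rbx                     ; saved value now sits at [rsp]
        sub     rsp,28h                 ; fresh frame for the call, below the pushed value
        fastcall foo,1,2,3,4,5
        add     rsp,28h                 ; release the frame again
        pop     rbx                     ; the pushed value is still intact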

One more example to see how this mechanism cooperates with other "proc" features:
Code:
proc sample uses rbx, a,b,c

        local d:BYTE

        fastcall foo,1,2,3,4,5
        fastcall bar,[d]

        ret

endp    

It generates this code:
Code:
sample:
  push      rbp
  mov       rbp,rsp
  sub       rsp,8 ; allocate local variables
  push      rbx
  sub       rsp,28h ; allocate frame for calls
  mov       rcx,1
  mov       rdx,2
  mov       r8,3
  mov       r9,4
  mov       [rsp+20h],5
  call      foo
  mov       cl,[rbp-08]
  call      bar
  add       rsp,28h ; go back to where we have stored the used registers
  pop       rbx
  leave
  retn    
If there were no "uses" list in the "proc" declaration, a single common "sub rsp,x" would be used instead of the two.

Note that "fastcall" tries to automatically detect the size of operands to choose the optimal form of the "mov" instruction for passing each parameter; you can also use a size override to force the size of "mov" you want. If the parameter is in memory and needs to be passed on the stack, two "mov" instructions are generated, with the accumulator register as an intermediary in the transfer.
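
As a rough illustration of the memory-to-stack case (a hand-written sketch of the kind of code described, not verified macro output; "value" is a hypothetical qword variable and "foo" a hypothetical procedure, and the order of the register loads is an assumption):
Code:
        ; fastcall foo,1,2,3,4,[value]
        mov     rcx,1
        mov     rdx,2
        mov     r8,3
        mov     r9,4
        mov     rax,[value]             ; memory operand goes through the accumulator...
        mov     [rsp+20h],rax           ; ...into the fifth parameter slot
        call    foo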

The names for parameters specified in the "proc" definition declare labels on the stack, even for those parameters that are passed in registers, since the fastcall convention requires the caller to reserve space on the stack for those parameters, too. You may use this to store any of the first four parameters into the stack space reserved for it, and then access it there, like:
Code:
proc sample a,b,c,d,e

        mov     [a],rcx   ; store the parameters into their slots
        mov     [b],rdx
        mov     [c],r8
        mov     [d],r9
        ; ...

        ret

endp    


There is no support yet for passing floating-point parameters in the SSE registers; some special prefix for the parameter might need to be introduced for this.
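
Until such support exists, a floating-point parameter has to be loaded by hand. A minimal sketch, assuming a hypothetical procedure "foo" that takes a single double and a hypothetical qword variable "value" (in the Win64 convention the first floating-point parameter is passed in XMM0):
Code:
        sub     rsp,20h                 ; space for the four register parameters
        movq    xmm0,[value]            ; first parameter is a double, so it goes in XMM0
        call    foo
        add     rsp,20h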


Attachment: PROC64.INC (9.92 KB). Procedure declaration/calling macros for Win64.

Tomasz Grysztar 01 Oct 2005, 14:58
Note that I haven't yet been able to test them at run time, because I don't have a 64-bit machine.
MazeGen 12 Oct 2005, 07:54
Your post is very interesting, because it clearly shows how FASTCALL works under win64. Thanks!
Eoin 12 Oct 2005, 14:01
Well, this small modification I made to PE64DEMO seems to work:
Code:
start:

pushq _message
pushq _caption
call testProc

proc testProc,arg1,arg2 
        invoke MessageBox,0,[arg1],[arg2],0     
        
        invoke ExitProcess,eax
ret
endp    


If there is anything more complicated you'd like me to test just ask.
revolution 13 Oct 2005, 07:29
Quote:
if you do any pushes on the stack, you need to pop those values off the stack before any "fastcall" (or "invoke"), otherwise they may get destroyed
That may make converting code to 64bit troublesome. I would prefer to do the stack allocation and restoration at each OS call. The main reason being to avoid headaches with debugging. I think any drop in performance would be quite minimal unless the software is making heavy use of the OS calls. I find for most processor intensive applications it is not the OS call overhead that uses the CPU time, but the data processing that uses the majority of CPU time.
Tomasz Grysztar 13 Oct 2005, 18:10
I think you're right: someone who wants true optimization should write the code without macros anyway, and the macros should be as universal as possible and not too confusing. Here's the new version of PROC64, which allocates stack space for each call separately and is thus more similar to PROC32 in both usage and implementation.

Also this new file will appear with IMPORT32.INC in the next fasmw update.


Attachment: PROC64.INC (9.23 KB). The new procedure declaration/calling macros for Win64.

Tomasz Grysztar 12 Jul 2009, 11:12
I've made some new additions to PROC64.INC to help optimize stack frames a little. They introduce a new "frame" macro, which can be used like this:
Code:
frame
       invoke func1,[arg1],[arg2]
       invoke func2,[arg1],[arg2],[arg3]
endf    

It causes the stack space to be allocated only once for the whole frame (of course, if you don't use the "frame" macro, everything works as usual, with stack allocation for each call separately).
You should not modify the stack frame anywhere within the "frame" block. Well, you can modify it, as long as you restore it before the next fastcall/invoke or the endf macro.

The macros are now also able to merge the consecutive "sub rsp,x" and "add rsp,x" instructions they generate. So if you put the "frame" macro right at the beginning of a procedure, the allocation of stack space for function calls may get merged with the allocation of space for local variables.
This also means that even when you don't use the "frame" macro, your code may get a little bit optimized with this new PROC64.INC.
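
Roughly speaking, the "frame" example above behaves like the following (a hand-written sketch of the intended effect, not actual macro output; the 20h size assumes that no call inside the frame needs more than four parameters):
Code:
        sub     rsp,20h                 ; one allocation for the whole frame
        mov     rcx,[arg1]
        mov     rdx,[arg2]
        call    [func1]
        mov     rcx,[arg1]
        mov     rdx,[arg2]
        mov     r8,[arg3]
        call    [func2]
        add     rsp,20h                 ; one release at endf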

Please test it and let me know what you think about it.

Attachment deleted - the new version is posted below.


Last edited by Tomasz Grysztar on 16 Jul 2009, 19:45; edited 1 time in total
Japheth 16 Jul 2009, 19:24
Tomasz Grysztar wrote:

Please test it and let me know what you think about it.


Perhaps I made a very stupid mistake, but after I replaced PROC64.INC with the new version and assembled TEMPLATE.ASM with FASM v1.68, the disassembly looks like this:

Code:
  0000000000401000: 48 83 EC 08                                  sub         rsp,8
  0000000000401004: 83 EC 10                                     sub         esp,10h
  0000000000401007: 48 C7 C1 00 00 00 00                         mov         rcx,0
  000000000040100E: FF 15 62 20 00 00                            call        qword ptr ds:[00403076h]
  0000000000401014: 83 C4 10                                     add         esp,10h
  0000000000401017: 48 89 05 2B 10 00 00                         mov         qword ptr ds:[00402049h],rax
  000000000040101E: 83 EC 10                                     sub         esp,10h
  0000000000401021: 48 C7 C1 00 00 00 00                         mov         rcx,0
  0000000000401028: 48 C7 C2 00 7F 00 00                         mov         rdx,7F00h
  000000000040102F: FF 15 F3 20 00 00                            call        qword ptr ds:[00403128h]
  0000000000401035: 83 C4 10                                     add         esp,10h
    


What strikes me is that ESP is used now, while previously it was RSP. Is this intentional, to make the code shorter?

Another thing which I found confusing is the "sub esp,10h" before the function call. In this reference


http://msdn.microsoft.com/en-gb/library/ms235286(VS.100).aspx


I found this sentence

Quote:

The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters.


which would mean that rsp/esp has to be decreased/increased by 20h at least.
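
For the one-parameter call in the listing, that rule would give something like this (a hand-written sketch; "SomeFunction" is a placeholder name for the import that the listing calls through [00403076h]):
Code:
        sub     rsp,20h                 ; room for all four register parameters, even though only one is used
        mov     rcx,0
        call    [SomeFunction]
        add     rsp,20h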
Tomasz Grysztar 16 Jul 2009, 19:44
Yes, these are bugs in macros - thanks for testing.

Please try the fixed version.


Attachment: PROC64.INC (11.17 KB). The improved procedure declaration/calling macros for Win64.

r22 16 Jul 2009, 20:34
Great compromise/solution to the stack optimization annoyance of Win64.

I think you should just make a new macro ~ProcX that will have Frame built in and possibly other optimizations like a minimalist header (arguments referenced with rsp instead of rbp), 'align 16' (possibly on all labels, not just the procedure itself), etc.

Appreciate your ongoing efforts Tomasz
Japheth 17 Jul 2009, 15:19
Tomasz Grysztar wrote:
Yes, these are bugs in macros - thanks for testing.

Please try the fixed version.


Yes, the "peculiarities" which were mentioned are gone! Thanks for the fast "fix"!
Tomasz Grysztar 17 Jul 2009, 16:32
r22 wrote:
I think you should just make a new macro ~ProcX that will have Frame built in and possibly other optimizations like a minimalist header (arguments referenced with rsp instead of rbp), 'align 16' (possibly on all labels, not just the procedure itself), etc.

Yes, I'm also thinking about such an RSP-only variant of "proc". If I find some time to play with this idea, I may make such a set of macros, but I don't make any promises.
ramguru 17 Jul 2009, 17:00
IMO PE64DEMO is the only good example of how the stack should be allocated in 64bit mode. poasm has a special directive PARMAREA & that's how it should be done... So we need a 'proc' macro that only does 'sub rsp, X' where [X = size of local variables + size of the largest parameter area passed to any inner procedure] & 'add rsp, X'. Also we need an 'invoke' macro that only passes variables (that doesn't allocate stack). So far I'm not very good at macros .. but I can't expect that someone will write it for me :}, so moving slowly towards my goal.
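
The size X described here can be computed at assembly time. A minimal sketch in fasm, where "locals_size", "max_params", "param_area" and "sample" are hypothetical names, and the trailing 8 assumes there is no "push rbp" prologue, so that RSP ends up 16-byte aligned after the subtraction:
Code:
locals_size = 18h                       ; hypothetical: bytes of local variables
max_params  = 5                         ; hypothetical: most parameters passed to any inner procedure

if max_params < 4
  param_area = 4*8                      ; always at least the four register slots
else
  param_area = max_params*8
end if

X = (locals_size + param_area + 15)/16*16 + 8   ; round up to 16, then compensate for the return address

sample:
        sub     rsp,X
        ; ... locals start at [rsp+param_area], calls only fill in parameters ...
        add     rsp,X
        ret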
Tomasz Grysztar 17 Jul 2009, 17:36
Just use the "frame" macro with "proc" and you get the same result. The generic/standard "proc" should not do it by default, however, since doing any stack operation would break it. By choosing to use "frame" you acknowledge that you're aware that you cannot modify the stack between the calls.
ramguru 17 Jul 2009, 18:01
thank you :}
I didn't know frame was so powerful & that it affects even 'invoke'.
Was about to rewrite these macros, .. so much possible knowledge gain about asm parser went wasted :}
ramguru 17 Jul 2009, 18:44
One more thing :}
Everything looks beautiful etc., but how do I get rid of these three lines?
Code:
  push      rbp
  mov       rbp,rsp
  leave
    

'cuz we only need
Code:
proc ..
  sub rsp,..
  ..
  add rsp,..
  ret
endp
    
Tomasz Grysztar 17 Jul 2009, 19:29
That's what we discussed above, as a separate "proc" variant that uses RSP only. It does have certain advantages (that's why I'm considering creating it as an alternative), but don't dismiss the RBP-addressed variant too soon: you should remember that addressing with RSP as the base takes more space in the instruction encoding. Thus when you reference your arguments and locals many times, the RBP frame may pay off.
ramguru 18 Jul 2009, 13:15
One more thing :}
Everything looks beautiful etc., [MY PROLOGUE :>]
..but it would be veeeryyy useful to have addr64 support somehow, because often we need isolated data (from threads, especially for custom controls - thunking) & stack suits well. As I can see fasm ideology is ALWAYS use global data :S. I tried macro inside macro method but it seems addr cannot be implemented that way :/ So there is another option 'addr64.something' which could be a flag to load a structure with lea instead of mov'ing.. Also when having 5 & more parameters I must somehow figure out rsp (or rbp) value, which makes things really difficult... With new proc coming and such addr64.something there would be nothing that stops 64bit programming :}
Tomasz Grysztar 18 Jul 2009, 13:28
What do you mean by addr64?

ramguru wrote:
As I can see fasm ideology is ALWAYS use global data :S.
No, it's not. :) And "virtual" is all you need, really.
In one of my recent private projects I used it to define RBX-based local data (with help of VirtualAlloc, of course) - the possibilities are almost endless. ;)
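
For readers unfamiliar with this use of "virtual", a minimal sketch (the field names are hypothetical, and RBX is assumed to already hold a pointer to a block obtained from VirtualAlloc or similar):
Code:
virtual at rbx
  counter dq ?
  buffer  rb 64
end virtual

        mov     [counter],0             ; assembles as mov qword [rbx],0
        lea     rcx,[buffer]            ; assembles as lea rcx,[rbx+8]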
ramguru 18 Jul 2009, 13:37
By addr64 I mean
Code:
proc WinMain..
    local wc:WNDCLASSEX
    ...
    invoke RegisterClassEx, addr64.wc ; or addr64 wc or addr wc
    ; now I have to use :
    lea    rcx, [wc]
    invoke RegisterClassEx, rcx
endp
    

What if it were the 5th parameter.. I would have to mess even with rsp :S
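
Done by hand, the fifth-parameter case would look something like this sketch (assuming the parameter area for the call has already been reserved, e.g. by "frame", and using "wc" from the example above):
Code:
        lea     rax,[wc]                ; address of the local structure
        mov     [rsp+20h],rax           ; fifth parameter slot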


Last edited by ramguru on 18 Jul 2009, 13:38; edited 1 time in total
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
