flat assembler
Message board for the users of flat assembler.
"Red zone" in Windows?
tripledot 20 Apr 2012, 16:35
Or even this:
Code:
    foo:
        movq mm7, rsp
        and rsp, -32
        lea rbp, [rsp-0x80]
        movq rsp, mm7
        ret

Now we can access 256 bytes of local storage using rbp (saving a byte), we avoid writes to memory, and we can still use rsp as a GPR. Is this safe, or am I being an idiot?
tripledot 20 Apr 2012, 17:18
Never mind, I just read that the OS in fact CAN and WILL crap all over anything beneath rsp, if provoked. Haven't encountered it in practice, but I don't want to risk it.
tripledot 20 Apr 2012, 20:44
Evidently I'm just thinking out loud in here. But in case anybody notices, I think I've finished being an idiot now. Here's a new prologue/epilogue scheme for functions that don't have any stack-passed arguments.
You can address the full 256-byte local space relative to rbp (+/- 128 bytes). Alternatively, if you're running out of registers, you're free to use rbp as a GPR - the epilogue will still restore it from the stack. Too lazy to macro-ify this. Here's the tabloid version:
Code:
    foo:
        push rsp
        push rbp
        push qword [rsp+0x08]
        and esp, 1110'0000b
        lea rbp, [rsp-0x80]
        sub rsp, 0x120
        ; Code goes here!
        mov rsp, qword [rsp+0x130]
        mov rbp, qword [rsp-0x10]
        ret

And the broadsheet:
Code:
    foo:
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;
    ; Prologue/Epilogue for x64 pass by register functions
    ;
    ; - Ensures stack is 32-byte aligned;
    ; - Allocates 32 bytes of shadow space for x64 ABI calls;
    ; - Can address 128 bytes above rsp using 1-byte immediate displacement;
    ; - rbp is free for general use, or it can be used to address 128 bytes
    ;   either side of it using 1-byte immediate displacement.
    ;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ; If 32-byte-aligned on entry:      ; If 16-byte-aligned on entry:
    ;                                   ;
    ;        0x180                      ;        0x170
    ; rsp -> 0x178 : rip                ; rsp -> 0x168 : rip
    ;                                   ;
    push rsp                            ;
    ;                                   ;
    ;        0x180                      ;        0x170
    ;        0x178 : rip                ;        0x168 : rip
    ; rsp -> 0x170 : 0x178              ; rsp -> 0x160 : 0x168
    ;                                   ;
    push rbp                            ;
    ;                                   ;
    ;        0x180                      ;        0x170
    ;        0x178 : rip                ;        0x168 : rip
    ;        0x170 : 0x178              ;        0x160 : 0x168
    ; rsp -> 0x168 : rbp                ; rsp -> 0x158 : rbp
    ;                                   ;
    push qword [rsp+0x08]               ;
    ;                                   ;
    ;        0x180                      ;        0x170
    ;        0x178 : rip                ;        0x168 : rip
    ;        0x170 : 0x178              ;        0x160 : 0x168
    ;        0x168 : rbp                ;        0x158 : rbp
    ; rsp -> 0x160 : 0x178              ; rsp -> 0x150 : 0x168
    ;                                   ;
    and esp, 1110'0000b                 ;
    ;                                   ;
    ;        0x180                      ;        0x170
    ;        0x178 : rip                ;        0x168 : rip
    ;        0x170 : 0x178              ;        0x160 : 0x168
    ;        0x168 : rbp                ;        0x158 : rbp
    ; rsp -> 0x160 : 0x178              ;        0x150 : 0x168
    ;                                   ;        0x148
    ;                                   ; rsp -> 0x140
    lea rbp, [rsp-0x80]                 ; Optional. Comment out if rbp-relative addressing is not required.
    ;                                   ;
    ; rbp = 0x0e0                       ; rbp = 0x0c0
    ;                                   ;
    sub rsp, 0x120                      ; 256 bytes for locals + 32 bytes for x64 ABI calls.
    ;                                   ;
    ; rsp = 0x040                       ; rsp = 0x020
    ;                                   ;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    virtual at rsp
        ; Locals go here!
    end virtual
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ; Code goes here!
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;                                   ;
    mov rsp, qword [rsp+0x130]          ; 256 bytes for locals + 32 bytes for x64 ABI calls + 16 bytes for potential misalignment
    ;                                   ;
    ; rsp = [0x170] = 0x178             ; rsp = [0x150] = 0x168
    mov rbp, qword [rsp-0x10]
    ret

I'm finding good use for this approach in audio DSP - having a nice chunk of 32-byte-aligned stack space is very handy for AVX processing.
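The two-column walkthrough above can be checked mechanically. What follows is a minimal Python sketch, not part of the original post: it models only the stack-pointer arithmetic of the prologue and epilogue. `run_frame`, the sparse `mem` dictionary and the placeholder rbp value are hypothetical names chosen for illustration, and the model assumes the alignment step acts on the full 64-bit rsp (as `and rsp, -32` would).

```python
MASK64 = (1 << 64) - 1


def run_frame(entry_rsp, entry_rbp=0xB0B0):
    """Model the prologue/epilogue arithmetic.

    Returns (rsp_before_ret, rbp_before_ret, rsp_inside_body).
    """
    mem = {}  # sparse model of stack memory: address -> stored qword
    rsp, rbp = entry_rsp, entry_rbp

    def push(value):
        nonlocal rsp
        rsp -= 8
        mem[rsp] = value

    push(rsp)            # push rsp: stores the value rsp had before the push
    push(rbp)            # push rbp
    push(mem[rsp + 8])   # push qword [rsp+0x08]: second copy of the saved rsp
    rsp &= -32 & MASK64  # ASSUMPTION: 64-bit alignment, i.e. and rsp, -32
    rbp = rsp - 0x80     # lea rbp, [rsp-0x80]
    rsp -= 0x120         # sub rsp, 0x120
    body_rsp = rsp

    rbp = 0xDEAD         # the body may use rbp as a general-purpose register

    rsp = mem[rsp + 0x130]  # mov rsp, qword [rsp+0x130]
    rbp = mem[rsp - 0x10]   # mov rbp, qword [rsp-0x10]
    return rsp, rbp, body_rsp


for entry in (0x178, 0x168):
    rsp, rbp, body_rsp = run_frame(entry)
    print(hex(entry), "->", hex(body_rsp), "->", hex(rsp), hex(rbp))
```

In this model both entry cases from the diagram come back to the entry rsp with rbp restored, because one of the two saved copies of rsp always sits exactly 0x130 bytes above the aligned frame.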
Feryno 23 Apr 2012, 12:46
You have very interesting ideas. Keep your colorful thinking, never be frustrated when others don't react. Creativity is the biggest treasure which usually slowly and silently disappears as everybody gets older.
I use this boring method. I stole it while studying executables coming with the OS and never developed anything better. But it is fast, because only 1 instruction with RSP is performed. RSP must be properly aligned before calling procedures. RSP is always aligned at OEP, and then it is the programmer's task to keep it aligned.
Code:
    push rbx
    a = 1                       ; number of pushed qwords
    b = 4                       ; number of qwords reserved for API
    c = sizeof.LV_ITEM64        ; stack frame in bytes
    d = (c+7)/8                 ; stack frame in qwords
    e = (a+b+d+1) and 1         ; align stack 16
    sub rsp,8*(b+d+e)
    virtual at rsp+8*b
        lvit LV_ITEM64
    end virtual
    xor ebx,ebx                 ; counter
    mov [lvit.mask],LVIF_TEXT

Another sample:
Code:
    a = 0                       ; number of pushed qwords
    b = 4                       ; number of qwords reserved for API
    c = 6                       ; number of qwords for API params (CreateProcess uses 10 input params)
    d = (sizeof.PROCESS_INFORMATION+7)/8 ; stack frame in qwords
    e = (sizeof.STARTUPINFO+7)/8
    z = (a+b+c+d+e+1) and 1     ; align stack 10h
    sub rsp,8*(b+c+d+e+z)
    virtual at rsp+(b+c)*8
        ProcessInfo PROCESS_INFORMATION
    end virtual
    virtual at rsp+(b+c+d)*8
        StartupInfo STARTUPINFO
    end virtual
    lea rcx,[StartupInfo]
    mov [rcx + STARTUPINFO.cb],sizeof.STARTUPINFO
    call [GetStartupInfoA]
    lea rax,[ProcessInfo]
    lea rcx,[StartupInfo]
    mov [rsp+8*(4+5)],rax       ; lpProcessInformation 10th argument
    mov [rsp+8*(4+4)],rcx       ; lpStartupInfo 9th argument
    xor r9,r9                   ; lpThreadAttributes 4th arg
    xor ecx,ecx                 ; lpApplicationName 1st arg
    mov [rsp+8*(4+3)],r9        ; lpCurrentDirectory 8th arg
    mov [rsp+8*(4+2)],r9        ; lpEnvironment 7th arg
    mov [rsp+8*(4+1)],ecx       ; dwCreationFlags 6th arg
    mov [rsp+8*(4+0)],cl        ; bInheritHandles 5th arg
    xor r8,r8                   ; lpProcessAttributes 3rd arg
    lea rdx,[process_name]      ; lpCommandLine 2nd arg
    call [CreateProcessA]
    or eax,eax
    jz exit
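The padding term in this scheme (`e` in the first sample, `z` in the second) can be illustrated with a small calculation. A minimal Python sketch, assuming rsp is 8 modulo 16 on entry (a return address was just pushed onto a 16-byte-aligned stack); `frame_layout` and `rsp_after_prologue` are hypothetical helper names, and the 84-byte local size is an arbitrary example value, not the real size of any Windows structure.

```python
def frame_layout(pushed, shadow, local_bytes):
    """Return (local_qwords, pad_qwords, sub_rsp_bytes) for a static-RSP frame.

    pushed      -- number of qwords pushed before the sub (a)
    shadow      -- qwords reserved for callee parameters (b)
    local_bytes -- size of the locals in bytes (c)
    """
    local_qwords = (local_bytes + 7) // 8               # d = (c+7)/8
    pad = (pushed + shadow + local_qwords + 1) & 1      # e = (a+b+d+1) and 1
    return local_qwords, pad, 8 * (shadow + local_qwords + pad)


def rsp_after_prologue(entry_rsp, pushed, sub_bytes):
    """rsp once the pushes and the single `sub rsp, N` have executed."""
    return entry_rsp - 8 * pushed - sub_bytes


# ASSUMPTION: 0x1008 stands in for an entry rsp that is 8 modulo 16.
d, e, sub_bytes = frame_layout(pushed=1, shadow=4, local_bytes=84)
print(d, e, sub_bytes)                                   # 11 1 128
print(rsp_after_prologue(0x1008, 1, sub_bytes) % 16)     # 0
```

The pad is one qword exactly when the pushes, shadow space and locals add up to an even number of qwords, so the return address plus everything below it always totals an even number of qwords.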
tripledot 23 Apr 2012, 20:31
Hey, thanks for the encouragement!
Plenty to think about there, that's for sure.
Tomasz Grysztar 23 Apr 2012, 20:37
Feryno: the method you used is also available as one of the sets of prologue/epilogue proc macros in standard fasm's includes, the "static RSP". See documentation on customizing procedures for details.
Feryno 25 Apr 2012, 06:10
Hi Tomasz, that's super.
I missed the progress in FASM macros, and it seems that I stayed living in the years when betas and release candidates of Windows Server 2003 x64 were free to download directly from Microsoft... I've been calculating these prologues manually until today, and have to switch to macros immediately.
tripledot 25 Apr 2012, 11:55
Just out of curiosity, why does "and esp, -32" work, but "and esp, 11100000b" fail? It's as if the binary representation doesn't get sign-extended correctly... Is this a bug in fasm?
revolution says: Oops, sorry, I pressed the wrong button and accidentally edited instead of quoting. But anyhow, it is not a bug. The generated code is different. Have a look with a debugger to see what is generated. Hint: The 32-bit values for -32 and 0xe0 are not the same.
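The hint can be worked through numerically. A minimal Python sketch, assuming the standard x86-64 rule that writing a 32-bit register zero-extends the result into the full 64-bit register; `imm32`, `and_esp` and `and_rsp` are hypothetical helpers that model the instructions, and the sample addresses are made up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def imm32(value):
    """Bit pattern of an immediate as a 32-bit operand."""
    return value & MASK32


def and_esp(rsp, imm):
    """Model `and esp, imm`: 32-bit AND, result zero-extended into rsp."""
    return (rsp & MASK32) & imm32(imm)


def and_rsp(rsp, imm):
    """Model `and rsp, imm`: the immediate is sign-extended to 64 bits."""
    return rsp & (imm & MASK64)


print(hex(imm32(-32)))                       # 0xffffffe0
print(hex(imm32(0b11100000)))                # 0xe0
print(hex(and_esp(0x12F950, -32)))           # 0x12f940
print(hex(and_esp(0x12F950, 0b11100000)))    # 0x40
print(hex(and_rsp(0xA10012F950, -32)))       # 0xa10012f940
```

On this model, `and esp, 11100000b` keeps only bits 5 to 7 of the stack pointer, while `and esp, -32` clears just the low five bits of the low dword. The 32-bit form still drops the upper dword, which would matter for a stack located above 4 GB; `and rsp, -32` avoids that.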
revolution 25 Apr 2012, 12:12
tripledot: My apologies for accidentally editing and erasing some of your post. Please see my response above.
tripledot 25 Apr 2012, 12:40
Gotcha. Thanks rev!
[edit] :headsmack: :headsmack: :headsmack: :headsmack: [/edit]
SergeyN 22 Sep 2015, 12:21
Quote:
Hi. Sorry to resurrect an old thread, but where did you read about this exactly? Thanks.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.