flat assembler
Message board for the users of flat assembler.

"Red zone" in Windows?

tripledot



Joined: 06 Jan 2009
Posts: 49
I know we're not "supposed" to use it, but is there any real danger in using the area below the stack pointer in Win64? I'm talking about user-mode apps here... it's not like an interrupt is going to come and piss all over my thread's stack, so where's the harm?

Take this prologue for example:

Code:
myFunc:
    mov     rax, rsp        ; align rsp to 32 bytes,
    and     rax, 31         ; keeping a backup of
    and     rsp, -32        ; the misalignment in [rsp]
    mov     [rsp], rax

; insert code here

    add     rsp, [rsp]
    ret
    


We can use rbp as a general purpose register, and we have 128 bytes of local storage accessible from rsp with a 1-byte displacement. Plus the stack is 32-byte aligned for AVX fun.

Obviously this is only useful for leaf functions, but still... where's the catch? Can I safely use this for my inner-most DSP functions?
Post 20 Apr 2012, 15:40

tripledot
Or even this:

Code:
foo:
        movq    mm7, rsp
        and     rsp, -32
        lea     rbp, [rsp-0x80]



        movq    rsp, mm7
        ret
    


Now we can access 256 bytes of local storage using rbp (saving a byte), we avoid writes to memory, and we can still use rsp as a GPR. Is this safe, or am I being an idiot?
Post 20 Apr 2012, 16:35

tripledot
Never mind, I just read that the OS in fact CAN and WILL crap all over anything beneath rsp, if provoked. Haven't encountered it in practice, but I don't want to risk it.
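
For reference, a minimal sketch of the first prologue reworked so that nothing is stored below rsp. The 128-byte local size and the slot layout are assumptions for illustration, and rax is used as scratch on the assumption that it holds nothing live on entry.

```asm
myFunc:
        mov     rax, rsp                ; entry rsp (points at the return address)
        sub     rsp, 128+8              ; 128 bytes of locals + 8 bytes for the saved rsp
        and     rsp, -32                ; round down to a 32-byte boundary
        mov     [rsp+128], rax          ; saved rsp sits just above the locals

        ; locals occupy [rsp] .. [rsp+127], all at or above rsp

        mov     rsp, [rsp+128]          ; back to the entry rsp
        ret
```

The saved-rsp slot ends at or below the return address, so nothing the caller owns is overwritten. Note that a function that modifies rsp is no longer a leaf function in Win64 terms, so exception unwinding through it would need unwind data.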
Post 20 Apr 2012, 17:18

tripledot
Evidently I'm just thinking out loud in here :D But in case anybody notices, I think I've finished being an idiot now. Here's a new prologue/epilogue scheme for functions that don't have any stack-passed arguments.

You can address the full 256-byte local space relative to rbp (+/- 128 bytes). Alternatively, if you're running out of registers, you're free to use rbp as a GPR; the epilogue will still restore it from the stack.

Too lazy to macro-ify this. Here's the tabloid version:
Code:
foo:
        push    rsp
        push    rbp
        push    qword [rsp+0x08]
        and     rsp, -32                ; 32-byte align; the mask must sign-extend to the full 64 bits
        lea     rbp, [rsp-0x80]
        sub     rsp, 0x120

        ; Code goes here!

        mov     rsp, qword [rsp+0x130]
        mov     rbp, qword [rsp-0x10]
        ret
    


And the broadsheet:
Code:
foo:
        ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
        ;                                                                                                                              ;
        ; Prologue/Epilogue for x64 pass by register functions                                                                         ;
        ;                                                                                                                              ;
        ; - Ensures stack is 32-byte aligned;                                                                                          ;
        ; - Allocates 32 bytes of shadow space for x64 ABI calls;                                                                      ;
        ; - Can address 128 bytes above rsp using 1-byte immediate displacement;                                                       ;
        ; - rbp is free for general use, or it can be used to address 128 bytes either side of it using 1-byte immediate displacement. ;
        ;                                                                                                                              ;
        ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        ; If 32-byte-aligned on entry:          ; If 16-byte-aligned on entry:
        ;                                       ;
        ;        0x180                          ;        0x170
        ; rsp -> 0x178 : rip                    ; rsp -> 0x168 : rip
        ;                                       ;
        push    rsp                             ;
        ;                                       ;
        ;        0x180                          ;        0x170
        ;        0x178 : rip                    ;        0x168 : rip
        ; rsp -> 0x170 : 0x178                  ; rsp -> 0x160 : 0x168
        ;                                       ;
        push    rbp                             ;
        ;                                       ;
        ;        0x180                          ;        0x170
        ;        0x178 : rip                    ;        0x168 : rip
        ;        0x170 : 0x178                  ;        0x160 : 0x168
        ; rsp -> 0x168 : rbp                    ; rsp -> 0x158 : rbp
        ;                                       ;
        push    qword [rsp+0x08]                ;
        ;                                       ;
        ;        0x180                          ;        0x170
        ;        0x178 : rip                    ;        0x168 : rip
        ;        0x170 : 0x178                  ;        0x160 : 0x168
        ;        0x168 : rbp                    ;        0x158 : rbp
        ; rsp -> 0x160 : 0x178                  ; rsp -> 0x150 : 0x168
        ;                                       ;
        and     rsp, -32                        ;
        ;                                       ;
        ;        0x180                          ;        0x170
        ;        0x178 : rip                    ;        0x168 : rip
        ;        0x170 : 0x178                  ;        0x160 : 0x168
        ;        0x168 : rbp                    ;        0x158 : rbp
        ; rsp -> 0x160 : 0x178                  ;        0x150 : 0x168
        ;                                       ;        0x148
        ;                                       ; rsp -> 0x140
        ;
        lea     rbp, [rsp-0x80]         ; Optional. Comment out if rbp-relative addressing is not required.
        ;                                       ;
        ; rbp = 0x0e0                           ; rbp = 0xc0
        ;                                       ;
        sub     rsp, 0x120              ; 256 bytes for locals + 32 bytes for x64 ABI calls.
        ;                                       ;
        ; rsp = 0x040                           ; rsp = 0x020
        ;                                       ;
        ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        virtual at rsp

                ; Locals go here!

        end virtual

        ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;



        ; Code goes here!



        ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
        ;                                       ;
        mov     rsp, qword [rsp+0x130]  ; 256 bytes for locals + 32 bytes for x64 ABI calls + 16 bytes for potential misalignment
        ;                                       ;
        ; rsp = [0x170] = 0x178                 ; rsp = [0x150] = 0x168

        mov     rbp, qword [rsp-0x10]

        ret
    


I'm finding good use for this approach in audio DSP - having a nice chunk of 32-byte-aligned stack space is very handy for AVX processing.
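
One caveat with the epilogue above: once rsp has been reloaded, the saved rbp at [rsp-0x10] is read from below rsp, which is the region the earlier post decided not to trust. A possible variant, assuming r11 (volatile in the Win64 convention) holds nothing live at that point, restores rbp before releasing the frame:

```asm
foo:
        push    rsp
        push    rbp
        push    qword [rsp+0x08]
        and     rsp, -32
        lea     rbp, [rsp-0x80]
        sub     rsp, 0x120

        ; Code goes here!

        mov     r11, qword [rsp+0x130]  ; r11 = rsp as it was on entry
        mov     rbp, qword [r11-0x10]   ; saved rbp, read while it is still above rsp
        mov     rsp, r11                ; now release the frame
        ret
```

The offsets are the same as in the original scheme, so both the 32-byte-aligned and the 16-byte-aligned entry cases still resolve to the right slots.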
Post 20 Apr 2012, 20:44

Feryno
Joined: 23 Mar 2005
Posts: 454
Location: Czech republic, Slovak republic
You have very interesting ideas. Keep up your colorful thinking, and never be frustrated when others don't react. Creativity is the biggest treasure, and it usually disappears slowly and silently as everybody gets older.

I use this boring method. I stole it while studying the executables that come with the OS and never developed anything better. But it is fast, because only one instruction operating on RSP is executed. RSP must be properly aligned before calling procedures. At the OEP, RSP is always 8 bytes past a 16-byte boundary (that is the +1 in the formulas below), and from then on it is the programmer's job to keep it aligned.

Code:
        push    rbx
a       =       1                       ; number of pushed qwords
b       =       4                       ; number of qwords reserved for API
c       =       sizeof.LV_ITEM64        ; stack frame in bytes
d       =       (c+7)/8                 ; stack frame in qwords
e       =       (a+b+d+1) and 1         ; align stack 16
        sub     rsp,8*(b+d+e)

virtual at rsp+8*b
lvit LV_ITEM64
end virtual

        xor     ebx,ebx                 ; counter
        mov     [lvit.mask],LVIF_TEXT


another sample:
Code:
a       =       0                       ; number of pushed qwords
b       =       4                       ; number of qwords reserved for API
c       =       6                       ; number of qwords for stack-passed API params 5..10 (CreateProcess takes 10 params)
d       =       (sizeof.PROCESS_INFORMATION+7)/8        ; stack frame in qwords
e       =       (sizeof.STARTUPINFO+7)/8
z       =       (a+b+c+d+e+1) and 1     ; align stack 10h
        sub     rsp,8*(b+c+d+e+z)

virtual at rsp+(b+c)*8
ProcessInfo     PROCESS_INFORMATION
end virtual

virtual at rsp+(b+c+d)*8
StartupInfo     STARTUPINFO
end virtual

        lea     rcx,[StartupInfo]
        mov     [rcx + STARTUPINFO.cb],sizeof.STARTUPINFO
        call    [GetStartupInfoA]

        lea     rax,[ProcessInfo]
        lea     rcx,[StartupInfo]
        mov     [rsp+8*(4+5)],rax       ; lpProcessInformation 10th arg
        mov     [rsp+8*(4+4)],rcx       ; lpStartupInfo 9th arg
        xor     r9,r9                   ; lpThreadAttributes 4th arg
        xor     ecx,ecx                 ; lpApplicationName 1st arg
        mov     [rsp+8*(4+3)],r9        ; lpCurrentDirectory 8th arg
        mov     [rsp+8*(4+2)],r9        ; lpEnvironment 7th arg
        mov     [rsp+8*(4+1)],ecx       ; dwCreationFlags 6th arg
        mov     [rsp+8*(4+0)],ecx       ; bInheritHandles 5th arg (BOOL is 32 bits, so store the whole dword)
        xor     r8,r8                   ; lpProcessAttributes 3rd arg
        lea     rdx,[process_name]      ; lpCommandLine 2nd arg
        call    [CreateProcessA]
        or      eax,eax
        jz      exit
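
To see how the padding term behaves, here is the first sample's arithmetic with a made-up frame size of 72 bytes standing in for sizeof.LV_ITEM64 (an assumption purely for illustration; the real size comes from the struct definition):

```asm
use64
        push    rbx                     ; entry rsp is 8 mod 16; after the push it is 0 mod 16
a       =       1                       ; number of pushed qwords
b       =       4                       ; number of qwords reserved for API
c       =       72                      ; hypothetical frame size in bytes
d       =       (c+7)/8                 ; 79/8 = 9 qwords
e       =       (a+b+d+1) and 1         ; 15 and 1 = 1, so one padding qword is added
        sub     rsp,8*(b+d+e)           ; 8*14 = 112 bytes, a multiple of 16

        ; ... body, free to call APIs with rsp 16-byte aligned ...

        add     rsp,8*(b+d+e)
        pop     rbx
        ret
```

With c = 64 instead, d = 8 and a+b+d+1 = 14 is even, so e = 0 and the frame is 96 bytes, again a multiple of 16.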
Post 23 Apr 2012, 12:46

tripledot
Hey, thanks for the encouragement! :)

Plenty to think about there, that's for sure.
Post 23 Apr 2012, 20:31

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 7718
Location: Kraków, Poland
Feryno: the method you used is also available as one of the sets of prologue/epilogue proc macros in standard fasm's includes, the "static RSP". See documentation on customizing procedures for details.
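
A minimal sketch of what enabling that set looks like, going by the "Customizing procedures" section of the Windows headers documentation. MyProc and buffer are made-up names, and the three symbol names should be checked against proc64.inc in your copy of the includes:

```asm
include 'win64a.inc'

; switch the proc macros to the "static RSP" prologue/epilogue set
prologue@proc equ static_rsp_prologue
epilogue@proc equ static_rsp_epilogue
close@proc equ static_rsp_close

proc MyProc
        local   buffer[64]:BYTE         ; hypothetical local, addressed relative to rsp
        mov     byte [buffer], 0        ; rbp is not used as a frame pointer here
        ret
endp
```

The assumption behind this mode is that rsp is not modified between the prologue and the epilogue, which matches the manual scheme shown earlier in the thread.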
Post 23 Apr 2012, 20:37

Feryno
Hi Tomasz, that's super.
I missed the progress in the fasm macros; it seems I am still living in the years when betas and release candidates of Windows Server 2003 x64 were free to download directly from Microsoft... I have been calculating these prologues by hand until today and have to switch to macros immediately.
Post 25 Apr 2012, 06:10

tripledot
Just out of curiosity, why does "and esp, -32" work, but "and esp, 1110'0000b" fail? It's as if the binary representation doesn't get sign-extended correctly... Is this a bug in fasm?

revolution says: Oops, sorry, I pressed the wrong button and accidentally edited instead of quoting. Anyhow, it is not a bug: the generated code is different. Have a look with a debugger to see what is generated.

Hint: The 32-bit values for -32 and 0xe0 are not the same.
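
To make the hint concrete, a small sketch using an arbitrary made-up value in place of a real stack pointer. As a 32-bit value, -32 is 0xFFFFFFE0, while 1110'0000b is 0x000000E0, and any write to a 32-bit register also zeroes the upper half of the 64-bit register:

```asm
use64
        mov     rax, 0x000000123456789F ; arbitrary example value
        mov     rcx, rax
        mov     rdx, rax

        and     rcx, -32                ; mask 0xFFFFFFFFFFFFFFE0: rcx = 0x0000001234567880
        and     edx, -32                ; mask 0xFFFFFFE0, upper half zeroed: rdx = 0x0000000034567880
        and     eax, 1110'0000b         ; mask 0x000000E0: rax = 0x0000000000000080
```

So "and esp, -32" only appears to work while the stack happens to sit below 4 GB, and "and rsp, -32" is the form that keeps the whole pointer.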
Post 25 Apr 2012, 11:55

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17267
Location: In your JS exploiting you and your system
tripledot: My apologies for accidentally editing and erasing some of your post. Please see my response above.
Post 25 Apr 2012, 12:12

tripledot
Gotcha. Thanks rev!
[edit]:headsmack: :headsmack: :headsmack: :headsmack:[/edit]
Post 25 Apr 2012, 12:40

SergeyN
Joined: 22 Sep 2015
Posts: 1
Quote:

Never mind, I just read that the OS in fact CAN and WILL crap all over anything beneath rsp, if provoked. Haven't encountered it in practice, but I don't want to risk it.


Hi. Sorry to resurrect an old thread, but where exactly did you read about this?

Thanks.
Post 22 Sep 2015, 12:21
Copyright © 1999-2020, Tomasz Grysztar.
