flat assembler
Message board for the users of flat assembler.

requesting help with a stack touching macro



tthsqe 03 May 2016, 04:59
Consider the following function, which takes rax as input.

Code:
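;        align   16
__chkstk_ms:
        push    rcx                     ; preserve caller's rcx
        push    rax                     ; preserve the requested size
        cmp     rax, PAGE_SIZE
        lea     rcx, [rsp + 24]         ; rcx = caller's rsp before the call (lea leaves flags intact)
        jb      ._LessThanAPage
._MoreThanAPage:
        sub     rcx, PAGE_SIZE          ; step down one page
        or      byte[rcx], 0            ; touch it
        sub     rax, PAGE_SIZE
        cmp     rax, PAGE_SIZE
        ja      ._MoreThanAPage
._LessThanAPage:
        sub     rcx, rax                ; final probe at the far end of the frame
        or      byte[rcx], 0
        pop     rax
        pop     rcx
        ret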


If this is only ever called with compile-time constant values of rax, it can be unrolled into explicit probes:
Code:
or byte[rsp-?], 0
or byte[rsp-??], 0    


How can I write a macro to do this? It should start like this:
Code:
macro _chkstk_ms stackpointer, size {
...
}    


revolution 03 May 2016, 05:13
'repeat' should be just fine for the job.
Code:
        repeat (size + STACK_PAGE_SIZE - 1) / STACK_PAGE_SIZE
                cmp     byte[esp - % * STACK_PAGE_SIZE],al
        end repeat
        if size < STACK_PAGE_SIZE
                cmp     byte[esp - size],al
        end if    
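
A hedged variant of the same idea, for illustration only: the macro name probe_stack is made up here, and the sketch assumes 4 KiB pages and that [stackptr] itself is already committed. It takes the stack pointer as a parameter, as the original question asked, and adds a final probe at the far end of the frame, mirroring the last `sub rcx, rax` / `or byte[rcx], 0` step of __chkstk_ms.

```asm
use64
STACK_PAGE_SIZE = 4096          ; assumed page size

; probe_stack is a hypothetical name, not from the thread
macro probe_stack stackptr, size {
        ; one probe per whole page, walking down from stackptr
        repeat (size) / STACK_PAGE_SIZE
                or      byte [stackptr - % * STACK_PAGE_SIZE], 0
        end repeat
        ; last probe at the far end of the frame, unless a page probe already hit it
        if (size) mod STACK_PAGE_SIZE <> 0
                or      byte [stackptr - (size)], 0
        end if
}

        probe_stack rsp, 10000  ; probes rsp-4096, rsp-8192 and rsp-10000
```

With size = 10000 the repeat count is 10000 / 4096 = 2, so two page-stride probes are emitted, followed by one at rsp-10000.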


revolution 03 May 2016, 05:19
For the record, I just want to add that stack touching at procedure entry is a terrible idea. It is much better to expand the stack once at startup and not waste time touching a stack that already exists every time a function is subsequently called. Anyhow, I'm not saying this applies to the code above; it might well be that this is in your startup code, in which case, good job. :)



tthsqe 07 May 2016, 20:12
Thanks! I didn't realize it was this simple - I wanted to emulate the x86 instructions inside the macro. Anyway, there are some portions of code that you just want to work and other portions that you want to work fast. This is sufficient for the former.

Code:
; use this macro if you are too lazy to touch beforehand the required amount of stack
;  for functions that need more than 4K of stack space
; here we assume that the current stack pointer is in the committed range
; if size > 4096, [rsp-size] might be past the guard page
;  so touch the pages up to it
STACK_PAGE_SIZE = 4096
macro _chkstk_ms stackptr, size {
        repeat (size+8) / STACK_PAGE_SIZE
                cmp   al, byte[stackptr - % * STACK_PAGE_SIZE]
        end repeat
}
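
For illustration, a prologue using the macro above might look like the following sketch. The function name and LOCALS_SIZE are made up for the example, and the macro is repeated so the snippet assembles on its own. As in the comments above, it assumes [rsp] is in the committed range on entry.

```asm
use64
STACK_PAGE_SIZE = 4096
macro _chkstk_ms stackptr, size {
        repeat (size+8) / STACK_PAGE_SIZE
                cmp   al, byte[stackptr - % * STACK_PAGE_SIZE]
        end repeat
}

LOCALS_SIZE = 3 * STACK_PAGE_SIZE       ; hypothetical frame size (12288 bytes)

big_frame_func:                         ; hypothetical function
        _chkstk_ms rsp, LOCALS_SIZE     ; probes rsp-4096, rsp-8192, rsp-12288
        sub     rsp, LOCALS_SIZE        ; every page of the frame has now been touched
        mov     byte [rsp], 0           ; e.g. write to the lowest local
        add     rsp, LOCALS_SIZE
        ret
```

Here (12288 + 8) / 4096 = 3, so three probes are emitted, each one page below the previous.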
    


[Edited] commited->RESERVED
[Edited] RESERVED->committed


Last edited by tthsqe on 09 May 2016, 04:12; edited 2 times in total


revolution 08 May 2016, 12:33
tthsqe wrote:
Code:
... we assume that the current stack pointer is in the commited range    
I assume you actually mean the reserved range. ;)



l_inc 08 May 2016, 17:32
tthsqe, revolution
revolution wrote:
For the record, I just want to add that stack touching at procedure entry is a terrible idea. It is much better to expand the stack once at startup and not waste time touching a stack that already exists every time a function is subsequently called.

tthsqe wrote:
use this macro if you are too lazy to touch beforehand the required amount of stack

I actually expected tthsqe to disagree, because revolution's suggestion makes absolutely no sense (to me). Stack probing must be done at procedure entry only, because in most cases the stack frame address at function execution time is not known beforehand. Moreover, in most cases it's not even known whether that specific stack-greedy function is gonna be called at all, and therefore the suggestion contradicts revolution's own point here. And even for stack locations known for sure to be used at runtime (guaranteed minimal stack consumption), preliminary stack probing makes no sense either, because in that case one just needs to specify SizeOfStackCommit correctly and omit stack probing completely.

revolution wrote:
I assume you actually mean the reserved range

No, he actually and correctly means the committed range, because only in that case is the subsequent stack probing guaranteed not to miss the guard page.


tthsqe
As for your original function, it has a large optimization potential. The sub instruction directly followed by the cmp instruction looks especially disturbing. There are many other ways in which the function is suboptimal, so I'm just gonna summarize them by rewriting the function (stack probing and stack allocation combined):
Code:
__allocstk:
        sub rsp,rax
        assert bsr PAGE_SIZE = bsf PAGE_SIZE
        and rax,-PAGE_SIZE
        jz .return
.next_page:
        cmp [rsp+rax],edx     ;3 bytes only, avoids any kinds of partial register access stall
        sub rax,PAGE_SIZE
        jae  .next_page
.return:
        ret    

Note that you don't need a single stack access in case the frame needed is smaller than a page. The function is OK in case you really don't know the stack frame size at compile time, which is quite uncommon. To differentiate between the known and unknown stack frame size cases, you might want to use the relativeto operator:
Code:
macro m_allocstk stackptr, size
{
    if size relativeto 0
    ;statically sized stack frame
        sub rsp,size
        repeat size / STACK_PAGE_SIZE
            mov byte[rsp + size / STACK_PAGE_SIZE - % * STACK_PAGE_SIZE]
        end repeat
    else
    ;dynamically sized stack frame
        if ~ size eq rax
            mov rax,size
        end if
        call __allocstk
    else
}    




l_inc 08 May 2016, 22:27
tthsqe
l_inc wrote:
There are many other ways in which the function is suboptimal

That was so nice of me to replace the suboptimal implementation with a totally wrong one. ^_^ In my defense, I was in a hurry. Here's a little fix:
Code:
__allocstk:
        pop r10
        sub rsp,rax
        assert bsr PAGE_SIZE = bsf PAGE_SIZE
        and rax,-PAGE_SIZE
        jz .return
.next_page:
        cmp [rsp+rax],edx     ;3 bytes only, avoids any kinds of partial register access stall
        sub rax, PAGE_SIZE
        ja .next_page
.return:
        push r10
        ret    

The macro m_allocstk fell victim to my reduced attention even more: a senseless mov and else, and an unneeded argument. Additionally, I forgot to note that a loop might still be preferable, as the number of 7-byte probe instructions could become undesirably large. I wasn't going to take that into account, but since I screwed up and need to fix it anyway, here's the full version, which tries to do a good job of optimizing for size:
Code:
macro m_allocstk size*
{
    local sz,off,..next_page
    if 0 relativeto size
        ;statically sized stack frame
        sz = ((size)+7) and (-8)
        assert bsr PAGE_SIZE = bsf PAGE_SIZE
        off = sz and (-PAGE_SIZE)
        if off < PAGE_SIZE*3 & ~sz = PAGE_SIZE*2
            sub rsp,sz
            times off/PAGE_SIZE+1 : cmp [rsp + off - (%-1)*PAGE_SIZE],edx
        else
            mov rax,off
            if sz = off
                sub rsp,rax
            else
                sub rsp,sz
            end if
            ..next_page:
                cmp [rsp+rax],edx
                sub rax,PAGE_SIZE
            jae ..next_page
        end if
    else
        ;dynamically sized stack frame
        if ~ rax eq size
            mov rax,size
        end if
        call __allocstk
    end if
}    



revolution 08 May 2016, 23:10
l_inc wrote:
I actually expected tthsqe to disagree, because revolution's suggestion makes absolutely no sense (to me). Stack probing must be done at procedure entry only, because in most cases the stack frame address at function execution time is not known beforehand. Moreover, in most cases it's not even known whether that specific stack-greedy function is gonna be called at all, and therefore the suggestion contradicts revolution's own point here. And even for stack locations known for sure to be used at runtime (guaranteed minimal stack consumption), preliminary stack probing makes no sense either, because in that case one just needs to specify SizeOfStackCommit correctly and omit stack probing completely.
Part of software engineering is to know the program's limits. If you can't predict how much stack it will use then you can't guarantee that the touching will not fall outside the reserved range. I don't think your link is not relevant, because that is not stack touching in any sense; it is there to overcome a limitation of fasm's design, which greedily grabs as much memory as it can.
l_inc wrote:
revolution wrote:
I assume you actually mean the reserved range

No, he actually and correctly means the committed range, because only in that case is the subsequent stack probing guaranteed not to miss the guard page.
If it is already committed then there is no need to touch it. And if it is reserved then you'll be fine committing new pages with touching as long as you don't fall outside the reserved range. Perhaps I misunderstood tthsqe's statement, but if you assume it doesn't fall outside the committed range the touching is not required since it is already committed.



l_inc 08 May 2016, 23:33
revolution
Quote:
If you can't predict how much stack it will use then you can't guarantee that the touching will not fall outside the reserved range.

For most real-world programs it's not possible to predict exactly how much stack they need, simply because the exact amount depends on runtime conditions. What you can predict is how much stack the program needs at least (that's SizeOfStackCommit) and how much it needs at most (that's SizeOfStackReserve). The latter gives the guarantee of not falling outside the reserved range.
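
In fasm's PE output these two header fields can be set with the stack directive, which takes the reserve size and, optionally, the commit size. A minimal sketch, with the sizes and the label name picked arbitrarily for illustration:

```asm
format PE64 console
entry start
stack 0x100000, 0x10000         ; reserve 1 MiB, commit 64 KiB up front

section '.text' code readable executable
start:
        xor     eax, eax        ; return value 0
        ret                     ; return from the entry point
```

If the commit value covers the guaranteed minimum stack use, no probing is needed for that part of the stack; probing only matters for growth beyond it, up to the reserve size.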

Quote:
I don't think your link is not relevant

I assume you mean you don't think it's relevant. Well, it is, for the exact same reason I mentioned above: you cannot know beforehand how much memory fasm will actually use during compilation. For that reason you don't allocate the whole requested amount of memory at once, just as most programs don't allocate their whole stack up front, but instead do it on demand by touching the guard page.

Quote:
If it is already committed then there is no need to touch it

If you look more attentively at his code, you'll notice that he doesn't touch it. He starts touching it one page before.

Quote:
And if it is reserved then you'll be fine committing new pages with touching as long as you don't fall outside the reserved range.

No, you won't. And that's the exact reason why stack probing exists and is inserted by compilers as a regular part of function prologues. You'll only be fine if you don't miss the guard page. Otherwise the program will crash.
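
The difference can be sketched as follows, assuming 4 KiB pages and the usual Windows stack layout (committed pages, one guard page directly below them, reserved-only pages further down). The labels are made up for the example.

```asm
use64
PAGE_SIZE = 4096                ; assumed page size

; Risky: the first access lands three pages below the entry rsp.
; If the guard page is the page right below the committed range,
; this access skips it, hits reserved-only memory and faults.
skip_guard:                     ; hypothetical label
        sub     rsp, 3 * PAGE_SIZE
        mov     byte [rsp], 0
        add     rsp, 3 * PAGE_SIZE
        ret

; Safer: each probe is exactly one page below the previous one,
; so the guard page is always hit before anything beyond it.
probe_in_order:                 ; hypothetical label
        repeat 3
                or      byte [rsp - % * PAGE_SIZE], 0
        end repeat
        sub     rsp, 3 * PAGE_SIZE
        mov     byte [rsp], 0
        add     rsp, 3 * PAGE_SIZE
        ret
```

Both routines end up writing to the same address; only the order in which the pages are first touched differs.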




tthsqe 09 May 2016, 00:27
revolution wrote:
tthsqe wrote:
Code:
... we assume that the current stack pointer is in the commited range    
I assume you actually mean the reserved range. ;)


Yes - the names are confusing for me. I mean that accessing [rsp] will not cause a page fault (and is not the guard page)



l_inc 09 May 2016, 00:32
tthsqe
Quote:
Yes - the names are confusing for me. I mean that accessing [rsp] will not cause a page fault (and is not the guard page)

Oh my god. That means "No"! Because the reserved range will cause a page fault. The committed range won't. You were perfectly correct the first time.




tthsqe 09 May 2016, 04:13
:O

Copyright © 1999-2024, Tomasz Grysztar.