flat assembler
Message board for the users of flat assembler.

Zeroing the stack (locals)

AE



Joined: 07 Apr 2022
Posts: 72
AE 14 Jan 2023, 22:37
Local variables declared in a procedure sometimes need to be set to zero. This applies to structures in particular.

In the simplest case (a single variable or a small structure), you can simply write:
Code:
local INFO     dq 0    

But is there a convenient way to zero out large structures?
One option comes to mind:
Code:
lea     rcx, [FirstLocalVar]                    ; placeholder: address of the first local
invoke  RtlZeroMemory, rcx, sizeof.AllVarsSize  ; placeholder: total size of all locals
    

If there is no better way, is there at least a way to get the total stack size allocated for all local variables automatically, so that I don't have to write
Code:
sizeof.BIGSTRUCT1+sizeof.STRUCT2+etc    
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20513
Location: In your JS exploiting you and your system
revolution 15 Jan 2023, 01:08
You can compute the size of all local variables with this:
Code:
include "win64ax.inc"

.code

proc foo
        locals
                bar rb 128
        endl
        ; rcx = size of the locals area; the rbp terms cancel at assembly time
        lea     rcx, [qword parmbase@proc - (localbase@proc) - 16]
        ret
endp

.end foo    
AE 15 Jan 2023, 06:01
Thanks!
revolution 15 Jan 2023, 06:32
To eliminate the lea and use mov:
Code:
include "win64ax.inc"

.code

proc foo uses rdi
        locals
                bar rb 128
        endl
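        lea     rdi, [localbase@proc]           ; rdi = lowest address of the locals area
        virtual at parmbase@proc - (localbase@proc) - 16
                locals_size = $                 ; size of the locals area in bytes
        end virtual
        mov     ecx, locals_size shr 3          ; number of qwords to store
        xor     eax, eax                        ; value to store: zero
        rep     stosq                           ; zero the locals, lowest address first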
        ret
endp

.end foo    
AE 15 Jan 2023, 06:57
revolution wrote:
To eliminate the lea and use mov

Quite a good solution for standard cases. Thank you!
I will add that if a large stack size is used (for example, modern browsers on Windows use 8 MB), then RtlZeroMemory will be a little bit faster to nullify large allocations because it uses xmmwords.
revolution 15 Jan 2023, 07:01
AE wrote:
then the RtlZeroMemory will be a little bit faster to nullify large allocations because it uses xmmwords.
Maybe. You'd need to test it to confirm. Just the extra call overhead alone might eat up any gains it might have.
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 15 Jan 2023, 17:34
Plus the additional item(s?) in the import table.
revolution 15 Jan 2023, 17:45
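If your locals take up more than 4 KB of stack you might encounter an exception. Windows commits stack pages on demand through a guard page, so a fill that starts at the lowest address can touch memory below the guard page before the pages above it have been committed. To fix that, fill the stack downwards.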
Code:
include "win64ax.inc"

.code

proc foo uses rdi
        locals
                bar rb 8192
        endl
        virtual at parmbase@proc - (localbase@proc) - 16
                locals_size = $
        end virtual
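        lea     rdi, [localbase@proc + locals_size - 8] ; rdi = highest qword of the locals area
        mov     ecx, locals_size shr 3                  ; number of qwords to store
        xor     eax,eax                                 ; value to store: zero
        std                                             ; direction flag set: rdi decreases
        rep     stosq                                   ; zero the locals, highest address first
        cld                                             ; restore the direction flag the ABI expects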
        ret
endp

.end foo    
revolution 15 Jan 2023, 17:48
Can RtlZeroMemory fill downwards?
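One way to keep an upward fill such as RtlZeroMemory usable might be to touch every page of the locals area first, highest page first, which is the same idea compiler-generated stack probes rely on. A minimal sketch, assuming 4 KB pages, a locals_size that is a multiple of 4096, and that the RtlZeroMemory import resolves through win64ax.inc; the .probe label is only illustrative and the code is untested:

```asm
include "win64ax.inc"

.code

proc foo
        locals
                bar rb 8192
        endl
        virtual at parmbase@proc - (localbase@proc) - 16
                locals_size = $
        end virtual
        ; Read one byte per 4 KB page, highest page first, so that each
        ; access lands on the current guard page and the stack grows in order.
        lea     rax, [localbase@proc + locals_size]
        mov     ecx, locals_size shr 12         ; number of pages (assumes a multiple of 4096)
.probe:
        sub     rax, 4096
        test    byte [rax], 0                   ; a read is enough to trigger the guard page
        dec     ecx
        jnz     .probe
        ; Every page is committed now, so an upward fill is safe.
        lea     rcx, [localbase@proc]
        invoke  RtlZeroMemory, rcx, locals_size
        ret
endp

.end foo
```

The probe loop costs one memory read per page, so for locals that fit in a single page it can simply be left out.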
DimonSoft 15 Jan 2023, 20:37
I’d say the better questions are:
1) Is a large data structure on the stack a good idea in the first place? (Depends on the task.)
2) Is initializing local data on the stack in some generic way a good idea? (I’d say even small arrays are better not initialized inside the locals/endl block, because there is almost always a better way than what the proc macros already do.)
3) How often do we really need large zero-initialized blocks of memory, or do we usually just fill those large blocks with some non-zero data? If so, do we really need to spend time zeroing out what is going to become non-zero at the very next step? The only reason to zero out that I can see is “just in case some C-style string doesn’t get terminated by mistake”. Defensive programming might be good. C-strings aren’t.
Furs



Joined: 04 Mar 2016
Posts: 2595
Furs 15 Jan 2023, 21:45
revolution wrote:
AE wrote:
then the RtlZeroMemory will be a little bit faster to nullify large allocations because it uses xmmwords.
Maybe. You'd need to test it to confirm. Just the extra call overhead alone might eat up any gains it might have.
In this case you can write a trivial memset with SSE, or even AVX2 or AVX-512 (ymm or zmm registers). Especially if you're zeroing pages and aligned blocks, a trivial loop already performs well (unaligned stores tend to be slower because of cache line splits). Then there is no call overhead and no import table entry. ;)
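A minimal sketch of such a loop with SSE2, reusing the locals_size trick from earlier in the thread. It assumes the caller keeps the stack 16-byte aligned as the Win64 ABI requires, so that localbase@proc is suitably aligned for movdqa, and that the locals area is a multiple of 16 bytes (true for the 128-byte buffer here); the .zero16 label is only illustrative and the code is untested:

```asm
include "win64ax.inc"

.code

proc foo
        locals
                bar rb 128
        endl
        virtual at parmbase@proc - (localbase@proc) - 16
                locals_size = $
        end virtual
        lea     rax, [localbase@proc]           ; lowest address of the locals area
        mov     ecx, locals_size shr 4          ; number of 16-byte blocks
        pxor    xmm0, xmm0                      ; xmm0 = 0
.zero16:
        movdqa  [rax], xmm0                     ; aligned 16-byte store
        add     rax, 16
        dec     ecx
        jnz     .zero16
        ret
endp

.end foo
```

This fills upwards, so for locals larger than one page the guard-page caveat mentioned above still applies.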
AE 15 Jan 2023, 23:46
In general, I use the standard stack size; I added that remark purely as a theoretical point.
Quote:
additional item(s?) in the import table

If the function is not already used elsewhere for other purposes, then yes.
By the way, RtlZeroMemory is in fact just a wrapper for memset:
Code:
RtlZeroMemory proc near
    mov     r8, rdx         ; Size
    xor     edx, edx        ; Val
    jmp     memset
RtlZeroMemory endp    
Quote:
How often do we really need large zero-initialized blocks of memory

Not very often. In my case, I needed to zero the structures passed to NT functions, and they are quite small.
Quote:
To fix that fill the stack downwards
Can RtlZeroMemory fill downwards?

Well spotted! As an experiment, I tried doubling the stack size and got exactly that kind of exception.
Thanks.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
