flat assembler
Message board for the users of flat assembler.

RB locals - different place, different size

system error



Joined: 01 Sep 2013
Posts: 671
I was surprised to see that my code size was slashed down to some 200 bytes when I relocated some of my data (rb) within the locals block. Here is one pseudo-example:

Code:
format elf64 executable
include 'PROC64.INC'

call PSEUDO
call exit

proc PSEUDO
      locals
        ;data2 rb 300 ;==> 182 bytes
        data3 dq 0.0
        data4 dw 0
        ;data2 rb 300 ;==> 191 bytes
        data5 db 0
        data1 rb 34
        ;data2 rb 300 ;==> 194 bytes
      endl
      ret
endp    


Try uncommenting one of the data2 lines and see how it affects the code size at compile time. I am on fasm 1.71.24.

Is this expected? Thanks.
Post 20 Nov 2014, 22:39
l_inc



Joined: 23 Oct 2009
Posts: 881
system error
Quote:
Is this expected? Thanks.

Sure. The instruction encoding of the x86/x64 processors has historically been very thrifty with respect to code size. In particular, the length of an instruction depends on how large the immediate value encoded directly into it is. E.g. sub esp,$70 is encoded in 3 bytes, but sub esp,$80 already takes 6 bytes. A difference of 3 bytes is quite common in such situations, because an immediate can normally be encoded either in a truncated form as 1 sign-extended byte (if the value fits into the range -128..127) or in the full form as 4 bytes.
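
A minimal sketch of the two encodings, assembled as a plain 32-bit binary with fasm. The label names are only illustrative, and the byte values in the comments are the standard encodings of these two instructions:
Code:
use32
short_form:
        sub esp, $70            ; 83 EC 70            -> imm8 form, 3 bytes
long_form:
        sub esp, $80            ; 81 EC 80 00 00 00   -> imm32 form, 6 bytes
forms_end:

; print both lengths at assembly time (works for single-digit lengths only)
display 'sub esp,$70: ', '0' + (long_form - short_form), ' bytes', 13, 10
display 'sub esp,$80: ', '0' + (forms_end - long_form), ' bytes', 13, 10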

If you now look at your example, you'll see that the increase in code size is always divisible by 3:
1) Uncommenting the first data2 affects the size of the stack frame, so that the immediate in the stack allocation instruction no longer fits into the truncated form. Hence +3 bytes.
2) With data2 in the second position instead, the in-stack positions of data3 and data4 shift as well, so that their in-stack offsets no longer fit into the truncated form. Additionally, the qword-sized data3 has to be initialized with 2 instructions. Hence a further (2+1)*3 = +9 bytes.
3) With data2 in the third position, the in-stack position of data5 is affected the same way, so that its initialization requires 3 more bytes.
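
To illustrate the offsets in points 2 and 3, here is a small sketch of a byte store relative to rbp. The offsets -45 and -345 are made-up values standing for a local close to the frame base and one pushed away by a 300-byte buffer; the instructions the procedure macros actually emit may differ:
Code:
use64
        mov byte [rbp-45], 0    ; C6 45 D3 00            -> 8-bit displacement, 4 bytes
        mov byte [rbp-345], 0   ; C6 85 A7 FE FF FF 00   -> 32-bit displacement, 7 bytes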

For a more visual demonstration of what happens, I'd suggest using the macros from this post: just put ilen_ iset and _ilen around the blocks of code whose encoding you want to see. These macros make use of some other macros, so you'd need the latter as well (the display macro must be renamed to xdisplay). Additionally, I'd suggest using the console version of fasm with a third-party IDE (fasmw's output window is too small), so that if you do everything correctly, the compilation output looks like this:
[Image: capture.png]

_________________
Faith is a superposition of knowledge and fallacy
Post 21 Nov 2014, 03:06
system error



Joined: 01 Sep 2013
Posts: 671
Thanks for the very detailed explanation, l_inc.

But I am not quite sure about the "thrifty" policy especially when we consider that maybe, just maybe the compiler could just be trying to prevent jagged access to the stack (internal stack alignment perhaps?). I think this is more related to stacking policy rather than encoding because this doesn't happen to DB/DW/DQ without any RB. For example, quite often I see "sub rsp,11" be compiled as "sub rsp,16" under debugger. Could this be the reason?

One more question: are DB and RB treated differently by the compiler/assembler/preprocessor?
Post 21 Nov 2014, 10:35
l_inc



Joined: 23 Oct 2009
Posts: 881
system error
Quote:
just maybe the compiler could just be trying to prevent jagged access to the stack

It does not. The procedure macros put stack variables into the stack frame without any internal alignment in the same order you declare them in the locals block.
Quote:
I think this is more related to stacking policy rather than encoding

Well, it's not. Not more, and not at all. Just look at the encodings and compare.
Quote:
because this doesn't happen to DB/DW/DQ without any RB

As I already said, the rb's in your example significantly change the in-stack positions of initialized variables. Therefore instructions accessing them require longer encodings. No rb's -> no large stack offsets -> no long encodings.
Quote:
For example, quite often I see "sub rsp,11" be compiled as "sub rsp,16" under debugger. Could this be the reason?

No. It's an alignment of the top of stack (only). It's required according to most of the calling conventions. It does not affect instructions accessing local variables. And it's not related to the example you provided.
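
The rounding itself can be sketched with assembly-time arithmetic. This assumes the frame size is rounded up to a multiple of 16, as the sub rsp,11 example suggests; local_size and frame_size are names chosen for this illustration, not symbols of the macro package:
Code:
local_size = 11
frame_size = (local_size + 15) and (not 15)    ; round up to a multiple of 16
assert frame_size = 16                         ; so sub rsp,11 becomes sub rsp,16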
Quote:
are DB and RB treated differently by the compiler/assembler/preprocessor?

These directives have different syntax. The former takes an arbitrary number of initialization values as arguments, the latter takes the number of bytes to reserve without any initialization. Therefore they are surely treated differently. But they may have the same effect in some cases. E.g., db ?,?,? has the same effect as rb 3.
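
A quick way to check the last claim is to compare the space both forms reserve inside virtual blocks. This is only a sketch; the buf_* and size_* names are made up, and it needs a fasm version that supports the assert directive:
Code:
virtual at 0
        buf_a   db ?,?,?        ; three uninitialized bytes
        size_a = $
end virtual

virtual at 0
        buf_b   rb 3            ; three reserved bytes
        size_b = $
end virtual

assert size_a = size_b          ; both forms reserve 3 bytes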

Post 21 Nov 2014, 12:07
system error



Joined: 01 Sep 2013
Posts: 671
l_inc wrote:

As I already said, the rb's in your example significantly change the in-stack positions of initialized variables. Therefore instructions accessing them require longer encodings.


This pretty much sums it up.

Technically, you are suggesting that I should relocate my RBs to the 'best' position, where the code compiles to the smallest size. Am I making a correct assumption here?

Can u give me some more tips on the best data layout technique especially in locals block? Thanks
Post 21 Nov 2014, 12:50
l_inc



Joined: 23 Oct 2009
Posts: 881
system error
Quote:
Am I making a correct assumption here?

Yes. Relocating less often accessed stack variables closer to the top of stack (and hence farther from the stack frame base) will result in smaller code size.
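
Applied to the example from the first post, that could look like the sketch below. It assumes, in line with the sizes reported there, that the first entry of the locals block is the one that lands farthest from the frame base:
Code:
proc PSEUDO
      locals
        data2 rb 300    ; large, rarely accessed buffer goes first
        data1 rb 34
        data3 dq 0.0    ; initialized variables stay close to the frame base,
        data4 dw 0      ; so their offsets can still fit into one byte
        data5 db 0
      endl
      ret
endp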
Quote:
Can u give me some more tips on the best data layout technique especially in locals block?

Well, maybe do not allocate that much space on the stack. :) Put variables of the same size adjacent to each other, so that larger variables can be manually aligned on their natural boundaries with less spatial overhead (fasm's procedure macros do not align them automatically, and the align directive doesn't work correctly inside the locals blocks). Aside from that there isn't much more to say, because it always depends.

Post 21 Nov 2014, 14:26
system error



Joined: 01 Sep 2013
Posts: 671
l_inc wrote:
system error
Quote:
Am I making a correct assumption here?

Yes. Relocating less often accessed stack variables closer to the top of stack (and hence farther from the stack frame base) will result in smaller code size.
Quote:
Can u give me some more tips on the best data layout technique especially in locals block?

Well, maybe do not allocate that much space on the stack. :) Put variables of the same size adjacent to each other, so that larger variables can be manually aligned on their natural boundaries with less spatial overhead (fasm's procedure macros do not align them automatically, and the align directive doesn't work correctly inside the locals blocks). Aside from that there isn't much more to say, because it always depends.


Thanks :D

Plus, RBs should be the first entry in locals...endl and be grouped together. I discovered a significant reduction in code size this way.
Post 22 Nov 2014, 06:41
ejamesr



Joined: 04 Feb 2011
Posts: 52
Location: Provo, Utah, USA
The thing that impacts the code size is whether the locals are initialized or not.

RB only reserves space and never initializes it, so inside a locals/endl block it is functionally the same as 'DB ?' and the variable is not set to 0. When RBs are used in their own section outside a locals block, or at the very end of the data section, they take up no space in the executable file, but are expanded and initialized to 0 when the OS loader loads the code.
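
As a rough illustration of the second case, here is a small Linux ELF64 sketch with a reserved buffer at the very end of the writeable segment. The names start and buffer are arbitrary, and the exit sequence is only there to make the program complete:
Code:
format ELF64 executable 3
entry start

segment readable executable
start:
        mov     eax, 60         ; sys_exit
        xor     edi, edi        ; exit status 0
        syscall

segment readable writeable
buffer  rb 4096                 ; reserved only: not stored in the file, zero-filled at load time

Changing the rb count should leave the size of the output file unchanged, since nothing initialized follows the buffer.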

You can shrink the code 23 bytes by initializing the vars yourself. Try changing your code as follows:
Code:
proc PSEUDO
      locals
        ;data2 rb 300 ;==> 182 bytes
        data3 dq ?     ; change to uninitialized
        data4 dw ?     ; change to uninitialized
        ;data2 rb 300 ;==> 191 bytes
        data5 db ?     ; change to uninitialized
        data1 rb 34
        ;data2 rb 300 ;==> 194 bytes
      endl
; And manually initialize the data yourself if size is important to you (and this may run
; slightly faster if this function is called a zillion times because the code is much smaller).
;
; NOTE: FASM can't tell which reg is safe to use to initialize your vars, but
; you can.  So use whatever reg you want in place of eax/rax.
      xor eax, eax        ; 2 bytes (also clears upper dword of rax)
      mov [data3], rax    ; 4 bytes
      mov [data4], ax     ; 4 bytes
      mov [data5], al     ; 3 bytes
                         ; total initialization is just 13 bytes instead of 36 bytes
      ret
endp
    


ejamesr
Post 25 Nov 2014, 19:19
Copyright © 1999-2020, Tomasz Grysztar.