flat assembler
Message board for the users of flat assembler.
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

Entry point alignment for ELF64 executable

Seems to me that FASM doesn't make any attempt to align the entry point to any particular boundary, if the code segment is preceded by a data / writeable segment.


Code:
format ELF64 executable 3
entry main

segment readable writeable
msg db 'ello World',0ah,0

segment readable executable
;align 16 ;manual alignment
main:
        mov     rax,msg
        mov     rbx,main
        call    dumpreg
        call    exitx



This produces the following:

Code:
RAX|00000000004000B0 RBX|00000000004010BC RCX|0000000000000000 
RDX|0000000000000000 RSI|0000000000000000 RDI|0000000000000000 
R8 |0000000000000000 R9 |0000000000000000 R10|0000000000000000 
R11|0000000000000000 R12|0000000000000000 R13|0000000000000000 
R14|0000000000000000 R15|0000000000000000 RBP|0000000000000000 
RSP|00007FFF9A9D4520 RIP|00000000004010CA [UHEX]



It is not much of a problem because one can always manually align the entry point, but this would go unnoticed by coders expecting FASM's internal linker to do it for them (as is the case with external linkers).
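
As a quick check of the dump above, here is a minimal Python sketch that tests an address against a boundary. The helper name `misalignment` is hypothetical; the address is the value of RBX (the address of `main`) taken from the dump.

```python
def misalignment(addr: int, boundary: int = 16) -> int:
    """Return how many bytes addr lies past the previous multiple of boundary."""
    return addr % boundary

entry = 0x4010BC  # RBX from the dump above (address of main)

print(misalignment(entry))      # 12, so main is not 16-byte aligned
print(misalignment(entry, 4))   # 0, it only happens to be 4-byte aligned
```

A result of 0 means the address sits exactly on the boundary; anything else is the distance past it.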
Post 14 Dec 2017, 12:09
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15383
Location: Monstropolis

Code doesn't need to be aligned.
Post 14 Dec 2017, 12:10
fasmnewbie



Joined: 01 Mar 2011
Posts: 463


revolution wrote:
Code doesn't need to be aligned.

Any reason?
Post 14 Dec 2017, 12:12
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15383
Location: Monstropolis

The instruction lengths are variable. x86 has no requirement for code alignment.
Post 14 Dec 2017, 12:41
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

@revolution

Of course code alignment is not a requirement, but it does help with branch prediction, instruction fetch, front-end stalls and similar performance matters. In fact, FASM's PE executable format does align the entry point to an 8- or 16-byte boundary (I can't remember which one). If you switch the position of the two segments shown above, the entry point is in fact aligned to a 16-byte boundary.

It gives me the impression that FASM aligns only the first segment it finds in the source, which also suggests that the entry point should be in the first (code) segment in the source. This is quite limiting IMHO. I don't know, perhaps it has something to do with how the kernel handles the entry point.
Post 14 Dec 2017, 16:31
Furs



Joined: 04 Mar 2016
Posts: 959

Usually, the entry point gets executed only once. GCC even marks it as "cold" by default (which means no alignment either, as far as I know). The entry point doesn't have to be at the beginning either.

It makes no sense to align the entry point when a "hot path" function (one called during an inner loop) would benefit from alignment or cache locality instead (stick hot functions that call each other close together, and align them of course). Just place the entry point somewhere in the darkest pits of anti-performance because it's not important for speed. ;)
Post 14 Dec 2017, 21:09
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

@Furs
GCC aligns the main entry point to 8/16-byte boundaries. Obviously there is branch prediction involved when GCC calls main and therefore it is crucial for performance for __main to be aligned. So do ld and GoLink. Even FASM's PE/PE64 console format does it all the time, every time. The only odd one out here is the ELF64 executable format.
Post 15 Dec 2017, 06:02
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

Here is an example (static) with 64-bit GCC (Windows). Observe the entry point's address in RAX.


Code:
;-------------------------------------------
; fasm this.asm
; gcc -m64 this.obj cpu64.dll -s -o this.exe
;-------------------------------------------
format MS64 coff
public main

extrn dumpreg

section '.data' writeable
msg db 'hello world',0ah,0

section '.text' executable
main:
        sub     rsp,40
        mov     rax,main
        mov     rbx,msg
        call    dumpreg
        add     rsp,40
        ret



Observe RAX

Code:
RAX|00000000004015B0 RBX|0000000000403010 RCX|0000000000000001
RDX|00000000001E1360 RSI|0000000000000012 RDI|00000000001E1330
R8 |00000000001E42A0 R9 |00000000001E1360 R10|0000000000000000
R11|0000000000000246 R12|0000000000000001 R13|0000000000000008
R14|0000000000000000 R15|0000000000000000 RBP|00000000001E1360
RSP|000000000060FE30 RIP|00000000004015C8 [UHEX]

Post 15 Dec 2017, 06:23
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

Another example, this time using GoLink with entry point "start". The entry point is aligned to 16 regardless of the positions of the sections.


Code:
;-------------------------------------------
; fasm this.asm
; golink /console this.obj cpu64.dll
;-------------------------------------------
format MS64 coff
public start

extrn dumpreg
extrn exitx

section '.data' writeable
msg db 'hello world',0ah,0

section '.text' executable
start:
        sub     rsp,40
        mov     rax,start
        mov     rbx,msg
        call    dumpreg
        add     rsp,40
        call    exitx



The address of the "start" entry point is shown in RAX, aligned to a page boundary.


Code:
RAX|0000000000401000 RBX|0000000000402000 RCX|00000000002CE000
RDX|0000000000401000 RSI|0000000000000000 RDI|0000000000000000
R8 |00000000002CE000 R9 |0000000000401000 R10|0000000000000000
R11|0000000000000000 R12|0000000000000000 R13|0000000000000000
R14|0000000000000000 R15|0000000000000000 RBP|0000000000000000
RSP|000000000014FF30 RIP|0000000000401018 [UHEX]

Post 15 Dec 2017, 06:29
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

Now the equivalent example in the Windows PE64 console format, for comparison with the Linux ELF64 executable 3 format.


Code:
format PE64 console
include 'win64axp.inc'
entry start

section '.data' data readable writeable
msg db 'hello world',0ah,0

section '.text' code readable executable
start:
        sub     rsp,40
        mov     rax,start
        mov     rbx,msg
        call    [dumpreg]
        call    [exitp]

section '.idata' import data readable
library cpu64f,'cpu64f.dll'
 import cpu64f,dumpreg,'dumpreg',exitp,'exitp'



Observe entry point in RAX


Code:
RAX|0000000000402000 RBX|0000000000401000 RCX|0000000000243000
RDX|0000000000402000 RSI|0000000000000000 RDI|0000000000000000
R8 |0000000000243000 R9 |0000000000402000 R10|0000000000000000
R11|0000000000000000 R12|0000000000000000 R13|0000000000000000
R14|0000000000000000 R15|0000000000000000 RBP|0000000000000000
RSP|000000000008FF30 RIP|0000000000402013 [UHEX]

Post 15 Dec 2017, 06:39
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15383
Location: Monstropolis


fasmnewbie wrote:
GCC aligns the main entry point to 8/16-byte boundaries. Obviously there is branch prediction involved when GCC calls main and therefore it is crucial for performance for __main to be aligned. So do ld and GoLink.

Just because something is aligned in no way implies it is "crucial for performance". A single call/jmp, even if it is unpredicted and unaligned and uncached, would not even register on any performance monitoring tools.

fasmnewbie wrote:
Even FASM's PE/PE64 console format does it all the time, every time. The only odd one out here is the ELF64 executable format.

PE format is an entirely different thing from ELF. They are not comparable, they load differently and they are formatted differently.
Post 15 Dec 2017, 07:22
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6717
Location: Kraków, Poland

Neither of the formats aligns the entry point; this would be pointless. Usually the sections are aligned (this is needed to set up the section attributes, for example), so if you put your entry point right at the start of a section, like in these PE samples, it is going to be aligned just because it has been given the same address as the entire section. But the address given to the ENTRY directive can be any address:

Code:
format PE
entry start

section '.text' code readable executable

          nop
start:                  ; definitely unaligned
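
The same point can be put as plain arithmetic. This is a minimal Python sketch, not fasm's actual logic: the section base 0x401000 is an assumption (a typical first-section address for a PE image based at 0x400000), and `entry_address` is a hypothetical helper.

```python
SECTION_BASE = 0x401000  # assumed page-aligned start of the '.text' section
NOP_LENGTH = 1           # the single nop in the sample occupies one byte

def entry_address(section_base: int, offset_in_section: int) -> int:
    """The entry point is just the section base plus the label's offset."""
    return section_base + offset_in_section

at_start = entry_address(SECTION_BASE, 0)
after_nop = entry_address(SECTION_BASE, NOP_LENGTH)

print(hex(at_start), at_start % 16)    # 0x401000 0 -> aligned only because the section is
print(hex(after_nop), after_nop % 16)  # 0x401001 1 -> unaligned after a single nop
```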

Post 15 Dec 2017, 08:13
fasmnewbie



Joined: 01 Mar 2011
Posts: 463


Tomasz Grysztar wrote:
Neither of the formats aligns the entry point; this would be pointless. Usually the sections are aligned (this is needed to set up the section attributes, for example), so if you put your entry point right at the start of a section, like in these PE samples, it is going to be aligned just because it has been given the same address as the entire section. But the address given to the ENTRY directive can be any address:

Code:
format PE
entry start

section '.text' code readable executable

          nop
start:                  ; definitely unaligned




Ok. Fair enough. That means coders wishing to use the ELF64 executable format should manually align their entry points if their code segment is preceded by another segment. Just to confirm my suspicion.

One more thing: I am wondering if there's any correlation between the size of the data in the data segment and the relative position of the entry point, because every time I extend or shorten the string "hello world" from the first post, the entry point position shifts. From what I can see, the two segments are far apart from each other (by pages) and should not interfere with one another?

Example, after increasing the string

Code:
format ELF64 executable 3
entry main

segment readable writeable
msg db 'hello Worldssss',0ah,0

segment readable executable
main:
        mov     rbx,main
        mov     rax,msg
        call    dumpreg

        call    exitx



Yields (observe the changes in RBX)


Code:
RAX|00000000004000B0 RBX|00000000004010C1 RCX|0000000000000000 
RDX|0000000000000000 RSI|0000000000000000 RDI|0000000000000000 
R8 |0000000000000000 R9 |0000000000000000 R10|0000000000000000 
R11|0000000000000000 R12|0000000000000000 R13|0000000000000000 
R14|0000000000000000 R15|0000000000000000 RBP|0000000000000000 
RSP|00007FFFB3B12350 RIP|00000000004010CF [UHEX]



Thanks.
Post 15 Dec 2017, 08:44
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15383
Location: Monstropolis


fasmnewbie wrote:
Ok. Fair enough. That means coders wishing to use ELF64 executable format should manually align their entry points ...

It is not needed and it achieves nothing.

fasmnewbie wrote:
One more thing: I am wondering if there's any correlation between the size of the data in the data segment and the relative position of the entry point, because every time I extend or shorten the string "hello world" from the first post, the entry point position shifts. From what I can see, the two segments are far apart from each other (by pages) and should not interfere with one another?

The ELF format is different from PE, the segments are not extended to fill the slack space between. So all data and code is joined together with just the section headers between. You can't change that, it is the ELF format. When the loader allocates the pages in memory you will find "gaps" before and after each section where the code and data has been skipped to match the file arrangement.
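
To make the "joined together" layout concrete, here is a minimal Python sketch of the file offsets. It assumes the ELF64 header (64 bytes) and one 56-byte program header per segment come first and that the segments follow back to back with no padding; that layout is inferred from the register dumps in this thread, not taken from fasm's source. The 32-byte code size is a placeholder, and `segment_file_offsets` is a hypothetical helper.

```python
ELF64_HEADER_SIZE = 64  # sizeof(Elf64_Ehdr)
ELF64_PHDR_SIZE = 56    # sizeof(Elf64_Phdr), one per segment

def segment_file_offsets(segment_sizes):
    """File offset of each segment when headers and segments are packed tightly."""
    offset = ELF64_HEADER_SIZE + ELF64_PHDR_SIZE * len(segment_sizes)
    offsets = []
    for size in segment_sizes:
        offsets.append(offset)
        offset += size  # no slack space: the next segment starts right here
    return offsets

CODE_SIZE = 32  # placeholder, the real code size is not shown in the thread

short_msg = len(b"ello World") + 2       # string plus 0ah and 0 -> 12 bytes
long_msg = len(b"hello Worldssss") + 2   # 17 bytes

print([hex(o) for o in segment_file_offsets([short_msg, CODE_SIZE])])  # ['0xb0', '0xbc']
print([hex(o) for o in segment_file_offsets([long_msg, CODE_SIZE])])   # ['0xb0', '0xc1']
```

Under these assumptions the low three hex digits match the dumps: `msg` at ...0B0 in both runs, `main` at ...0BC with the short string and ...0C1 with the long one.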
Post 15 Dec 2017, 08:49
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

@revo, I thought the CPU recognizes no sections / format in the final binary and the segment selectors should do their jobs in separating code and data.
Post 15 Dec 2017, 08:58
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15383
Location: Monstropolis


fasmnewbie wrote:
@revo, I thought the CPU recognizes no sections / format in the final binary and the segment selectors should do their jobs in separating code and data.

The sections have to reside somewhere. And the paging mechanism is only granular to 4kB*, so the loader has no other option if it wants to properly separate the access rights.

BTW: No modern OS uses the segment registers for rights separation of memory regions. It is all about paging.

* Usually. There are other size options on some CPUs but they are rarely used.
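
A small Python sketch of what 4 kB granularity means for the addresses in the first post. `page_of` is a hypothetical helper; the two addresses are `msg` and `main` from the first dump.

```python
PAGE_SIZE = 0x1000  # 4 kB, the usual x86-64 page size

def page_of(addr: int) -> int:
    """Page number containing addr; access rights apply to whole pages."""
    return addr // PAGE_SIZE

msg_addr = 0x4000B0   # RAX in the first dump (writeable segment)
main_addr = 0x4010BC  # RBX in the first dump (executable segment)

print(hex(page_of(msg_addr)))                # 0x400
print(hex(page_of(main_addr)))               # 0x401
print(page_of(msg_addr) != page_of(main_addr))  # True: separate pages, separate rights
```

Because the two segments land on different pages, the loader can mark one writeable and the other executable even though their contents are adjacent in the file.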
Post 15 Dec 2017, 09:01
fasmnewbie



Joined: 01 Mar 2011
Posts: 463


revolution wrote:

fasmnewbie wrote:
@revo, I thought the CPU recognizes no sections / format in the final binary and the segment selectors should do their jobs in separating code and data.

The sections have to reside somewhere. And the paging mechanism is only granular to 4kB, so the loader has no other option if it wants to properly separate the access rights.

BTW: No modern OS uses the segment registers for rights separation of memory regions. It is all about paging.



@revo. Ok. But it still doesn't make sense to me how a change in the size of the data residing in one segment can affect the position of items in a faraway page.

Thanks.
Post 15 Dec 2017, 09:29
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6717
Location: Kraków, Poland

See the ELF format specification:

ELF specification wrote:
Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size.

This means that the offset that the code has in the file affects the address where it is going to be loaded in memory. Since the page size is 1000h, if the entry point is at offset 0C1h in the file, in memory it must have an address of the form xxxxx0C1h. Therefore anything that causes things in the file to move around is going to affect this address.
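
The congruence rule can be checked with a few lines of Python. This is only a sketch: `satisfies_elf_rule` and `load_address` are hypothetical helpers, and the page base 0x401000 is taken from the dumps earlier in the thread.

```python
PAGE_SIZE = 0x1000  # 1000h, as stated above

def satisfies_elf_rule(p_vaddr: int, p_offset: int, page_size: int = PAGE_SIZE) -> bool:
    """True when p_vaddr and p_offset are congruent modulo the page size."""
    return p_vaddr % page_size == p_offset % page_size

def load_address(page_base: int, p_offset: int, page_size: int = PAGE_SIZE) -> int:
    """Lowest address at or above a page-aligned base that satisfies the rule."""
    return page_base + p_offset % page_size

print(hex(load_address(0x401000, 0xBC)))   # 0x4010bc, main with the short string
print(hex(load_address(0x401000, 0xC1)))   # 0x4010c1, main with the longer string
print(satisfies_elf_rule(0x4010C1, 0xC1))  # True
print(satisfies_elf_rule(0x401000, 0xC1))  # False: a page-aligned address would break the rule
```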
Post 15 Dec 2017, 09:36
fasmnewbie



Joined: 01 Mar 2011
Posts: 463

Thanks for the explanation and link Tomasz. It answers some of my doubts.

From what I understand, in Linux, the alignment state of a segment is dependent upon how things are laid out in other, probably unrelated, segment(s), which may or may not have the same segment flags as the segment in question.

E.g., if my executable segment appears first in my source, followed by a writeable segment, there's a chance that the data (2nd) segment would start at an uneven address/page. Am I seeing this correctly?
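
One way to explore this question is a rough Python model of the layout. Everything here is an assumption inferred from the dumps in this thread rather than from fasm's source: a 0B0h-byte header area, tightly packed segments, an image base of 0x400000, and each new segment starting on the next page at the same in-page offset it has in the file. The 40-byte code size is a placeholder and `segment_addresses` is a hypothetical helper.

```python
PAGE_SIZE = 0x1000
HEADERS_SIZE = 64 + 2 * 56  # ELF64 header plus two program headers = 0xB0

def segment_addresses(sizes, image_base=0x400000):
    """Model the virtual start address of each segment, in source order."""
    addresses = []
    file_offset = HEADERS_SIZE
    next_page = image_base
    for size in sizes:
        # Keep the address congruent to the file offset modulo the page size.
        vaddr = next_page + file_offset % PAGE_SIZE
        addresses.append(vaddr)
        file_offset += size
        # The following segment gets fresh page(s) so it can have its own rights.
        next_page = (vaddr + size + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE
    return addresses

CODE_SIZE = 40  # placeholder, the real code size is not shown in the thread
DATA_SIZE = 12  # 'ello World',0ah,0

print([hex(a) for a in segment_addresses([DATA_SIZE, CODE_SIZE])])  # ['0x4000b0', '0x4010bc']
print([hex(a) for a in segment_addresses([CODE_SIZE, DATA_SIZE])])  # ['0x4000b0', '0x4010d8']
```

In this model the first segment always starts at ...0B0 (16-byte aligned), and the second starts wherever the first one's size pushes it, so with the code first the data would begin mid-page at an address that depends on the code length.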
Post 15 Dec 2017, 10:23
Furs



Joined: 04 Mar 2016
Posts: 959


fasmnewbie wrote:
@Furs
GCC aligns the main entry point to 8/16-byte boundaries. Obviously there is branch prediction involved when GCC calls main and therefore it is crucial for performance for __main to be aligned. So do ld and GoLink. Even FASM's PE/PE64 console format does it all the time, every time. The only odd one out here is the ELF64 executable format.

GCC aligns hot functions to 16 bytes; if it was 8-byte aligned, that only shows it just happened to be that way and wasn't exactly intended.

But I mean, you need some hot functions in there (__attribute__((hot))), otherwise there will be no point. If the entry point is the only function then obviously it's going to be aligned since the section itself is. This is just happenstance though. Hot functions get placed before cold functions and thus the entry point is placed last as well in this situation.
Post 15 Dec 2017, 12:50
Copyright © 2004-2017, Tomasz Grysztar.