Learning binary file formats (work in progress)
This is an early stage of my upcoming book-like tutorial on the executable/object formats, written with heavy assistance from fasmg. I start with PE and I plan to follow up with ELF (including object varieties), then perhaps Mach-O and possibly some others.
You may follow me on Twitter for the little sidenotes and outtakes that I post while writing the main text.
Getting started with fasmg
PE (Portable Executable)
1.1 Building a simple program
1.2 Adding relocations
1.3 Making a library
1.4 Embedding resources
1.5 Moving to 64 bits with PE+
1.6 Experimenting further
ELF (Executable and Linkable Format)
2.1 A minimal executable file
To be continued...
Last edited by Tomasz Grysztar on 10 Mar 2019, 18:34; edited 2 times in total
15 Jul 2018, 11:48
Getting started with fasmg
To learn the inner workings of binary files we are going to construct them manually, with the help of fasmg. This is a command line tool that takes a source text, which works like a script defining how to assemble the binary file out of its components (down to bytes or even individual bits), and saves the resulting file under a given name:
fasmg source.asm output.bin
The source contains a series of commands, each on its own line of text. One of the basic instructions is DB, which defines data as a series of bytes:
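A minimal sketch of such a source, reconstructed from the description that follows (the exact listing is an assumption):

```asm
db 49,50,51     ; three bytes: the ASCII codes of '1', '2' and '3'
```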
If the source text looks like the above, fasmg is going to produce a file that contains three bytes with the given values (which happen to be the ASCII codes of the digits from 1 to 3).
Data definitions can use units larger than a byte. Among the other available instructions there are DW to define 16-bit (2-byte) "words", DD for 32-bit (4-byte) "double words" and DQ for 64-bit (8-byte) "quad words". They all store values as little-endian (there are easy methods to define big-endian data too, but we are not going to need them here).
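To illustrate the little-endian layout, here is a short sketch. The values are arbitrary, chosen only so that the byte order is easy to see in a hex dump of the output:

```asm
dw 0x1234               ; stored as bytes 34 12
dd 0x12345678           ; stored as bytes 78 56 34 12
dq 0x0123456789ABCDEF   ; stored as bytes EF CD AB 89 67 45 23 01
```

In each case the least significant byte comes first in the file.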
Data can also be defined as a string of bytes copied as-is from the source text. Such a sequence of characters needs to be enclosed in either single or double quotes:
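A sketch of such definitions, with the contents picked only for illustration:

```asm
db '123'        ; the same three bytes as db 49,50,51
db "Hello!"     ; double quotes work the same way
```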
The DUP operator makes it possible to define several duplicates of the same value:
db 3 dup '!'
Every data definition is assigned an address, starting from zero. Data can be given a name by writing it before DB or another similar instruction, as a so-called label:
digits db 49,50,51,52,53,54,55,56,57
This name can then be used in expressions, and its value is the address of the first byte of the data that it labels. Passing it to DD, for example, produces a 32-bit value equal to the address of "digits" (most likely zero).
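A minimal sketch of a label used as a value, assuming that the "digits" data comes first in the source so that its address is zero:

```asm
digits db 49,50,51,52,53,54,55,56,57
dd digits       ; 32-bit value equal to the address of "digits", which is 0 here
```

If any other data were defined before "digits", the stored value would grow by the size of that data.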
What makes the assembler especially useful is that we can define various things out of order and fasmg is going to compute and put the right values in the right places, like:
dd null - digits
digits db 49,50,51,52,53,54,55,56,57
null db 0
The file generated from the above sample is going to start with a 32-bit value equal to the difference between the "null" and "digits" addresses, that is, the length of the string of digits.
A label can also be created without a data definition on the same line; in such a case the name needs to be followed by a colon:
dd eof
db 'Hello!'
eof:
A name can also be assigned any computed value with the = or := operator. The := defines a constant, like a label, while = defines a variable whose value may be changed by another similar assignment later.
dd length
digits db 49,50,51,52,53,54,55,56,57
null db 0
length = null - digits
The $ is a special name that always equals the current address:
dd length
digits db 49,50,51,52,53,54,55,56,57
length = $ - digits
db 0
Assigning the value of $ to a name has the same effect as defining a label with that name.
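To make the difference between the two assignment operators more concrete, here is a small sketch. The names "count", "limit" and "here" are made up for this illustration:

```asm
count = 1               ; "=" defines a variable...
count = count + 1       ; ...so it can be assigned again, and is now 2
limit := 16             ; ":=" defines a constant, fixed like a label
here = $                ; current address, 0 as no data has been defined yet
db count, limit, here   ; three bytes: 2, 16, 0
```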
Various portions of executable files may end up loaded at different addresses in memory. The ORG instruction changes the assumed address for the data definitions that follow, without altering the position in the file. This changes the value of $ and the values of all labels defined after this point. Since this decouples $ from the position within the generated file, there is another special name, $%, that always equals the position in the file regardless of the assumed address.
org 0x100
start:
offset = $%
dd start ; equals 0x100
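The split between $ and $% is easier to see when some data comes before ORG. In the sketch below, the two bytes placed at the start and the name "position" are assumptions added for illustration:

```asm
db 'AB'                 ; two bytes at file offsets 0 and 1
org 0x100               ; whatever follows is assumed to be loaded at 0x100
start:                  ; start = $ = 0x100
position = $%           ; position = 2, the offset within the file
dd start, position      ; stores 0x100 and then 2
```

The label follows the assumed address, while $% keeps counting the bytes actually written to the file.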