
   This document is still far from being completed, please be patient.
   Last modified: 09-04-2003

------------------------------------------------------------------------------


	     The official guide to flat assembler 1.47 internals


Chapter 1  Introduction
-----------------------

This guide is intended mainly to help people who want to modify the flat
assembler source, port it to another operating system, or add some features.
It may also be helpful for people who want to write their own assembler or
compiler and want to see how it was done in this case. The reader is expected
to know assembly language well enough to read a small piece of assembly code
and understand what it does.
  Although there are almost no comments inside the source of flat assembler,
the detailed labels are usually enough to navigate through it, because in
most cases they explain exactly what the code that follows them does. These
names will be our navigational points in the source, so it's better not to
change them if you want to use this guide.


1.1  Source structure

In the SOURCE directory you can find many files with the .INC extension and a
few subdirectories. The files in the root of the SOURCE directory form the
"core" of FASM and are OS-independent, so they are common to all versions of
the program. The core is written in a way that allows it to be compiled in
either 16-bit or 32-bit mode, but it always uses 32-bit addressing and
therefore needs to be provided with a flat memory space.
  Each subdirectory contains the files dedicated to one operating system.
This is the "interface" of FASM. It mediates between the core and the OS,
prepares the memory and files for the core and displays the messages during
and after the assembly.


1.2  Memory organization

FASM uses two contiguous blocks of memory for all its operations; we will call
them the main and the additional memory block. The first version of FASM was
designed for DOS and allocated the free extended memory as the main memory
block and the free conventional memory as the additional memory block.
  Other versions allocate one big memory block and divide it into two parts,
with the main memory block seven times bigger than the additional memory
block. This proportion could be changed, but the additional block doesn't need
to be as large as the main block.
  The core uses both of those blocks from the bottom and from the top. When
the highest used address from the bottom meets the lowest used address from
the top in the same block, the core calls the error handler in the interface
with the "out of memory" message. The data structures contained inside those
blocks will be described in later chapters.
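  The idea of one block consumed from both ends can be modelled in a few
lines of C. This is only a sketch of the principle, not code taken from FASM:
the block_t type and the function names are hypothetical, and the real core
keeps its pointers in its own variables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory block that is consumed from both ends. */
typedef struct {
    uint8_t *bottom;  /* first free byte above the data growing upward   */
    uint8_t *top;     /* lowest byte used by the data growing downward   */
} block_t;

/* Prepare an empty block covering the range [start, end). */
static void block_init(block_t *b, uint8_t *start, uint8_t *end) {
    b->bottom = start;
    b->top = end;
}

/* Reserve size bytes at the bottom. Returns NULL when the two ends would
   meet, which is the point where the core reports "out of memory". */
static void *alloc_bottom(block_t *b, size_t size) {
    if ((size_t)(b->top - b->bottom) < size)
        return NULL;
    void *p = b->bottom;
    b->bottom += size;
    return p;
}

/* Reserve size bytes at the top, with the same collision check. */
static void *alloc_top(block_t *b, size_t size) {
    if ((size_t)(b->top - b->bottom) < size)
        return NULL;
    b->top -= size;
    return b->top;
}
```

Both ends share the same free space, so neither kind of data needs a fixed
limit; only their sum is bounded by the size of the block.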


1.3  Core modules

The core of FASM consists of four modules: preprocessor, parser, assembler,
and formatter. The interface should call them in exactly this order, after
the memory blocks are allocated and the names of the source and output files
are prepared. When an error occurs during the execution of one of these
modules, the OS-dependent error handler in the interface is called to display
the appropriate message and terminate the program. If everything is
successful, the interface displays the summary message and exits.
  The first module in order is the preprocessor. It loads all the source files
into the main memory block, interprets its own set of directives and processes
the macroinstructions and symbolic constants. It also separates the words and
special characters in the lines and removes the spaces that were between them.
When it has finished with all the loaded files, the source is ready to be
processed by the parser.
  The parser module converts the source into FASM's internal code: all
assembly language keywords are replaced with short codes, and each label is
replaced with its unique identifier.
  The assembler module interprets the code generated by the parser to
generate the output code. It may take many passes to correctly resolve all the
values, but these passes are quite fast, because the most time-consuming
tasks, those related to string comparisons, have already been done by the
parser. When the assembler finishes its job, the formatter creates the output
file in the selected format and fills it with the generated code.


Chapter 2  Interface
--------------------

This chapter describes all the functionality that the OS-dependent interface
needs to provide, and should be most helpful to those wanting to port FASM to
an operating system other than those for which interfaces already exist.
The following information applies to each version of the interface, unless
stated otherwise.


2.1  Interface files

Each interface subdirectory contains the FASM.ASM file, which is the main file
that has to be assembled into the final executable. All other source files
are referenced by this one using the INCLUDE directive.
  The main file should contain all the structures needed to create a valid
executable for the appropriate operating system. The main routine of the
program should allocate the needed memory blocks, interpret the program
arguments and then call the core modules. After finishing the assembly process
it usually also displays some message about the generated code before exiting.
The files containing the core are included right after this code.
  The main file should also contain declarations for all the uninitialized
data variables that the core is using. These declarations should be copied
exactly in the form in which they occur in the main file of each existing
interface. They should be placed in the executable structure in such a way
that they won't take any space in the executable file.
  Other files in the interface subdirectories (usually it's the SYSTEM.INC
file) contain the OS-specific procedures that can be used by both the
interface and the core. The procedures that have to be called by the core
should strictly follow the rules that will be described later, because the
core will expect them to behave in the same way no matter what OS it is
running on. Other procedures can be specific to the interface, as they will be
called only by files designed for the chosen operating system.


2.2  Memory allocation

As was said in the previous chapter, FASM needs two contiguous blocks of
memory. The interface usually contains the "init_memory" routine, which is
called just after the start of the program. This routine allocates the needed
memory and fills the [memory_start] and [memory_end] variables with pointers
to the first byte of the main memory block and to the first byte after the
main memory block. In the same way it fills the [additional_memory] and
[additional_memory_end] variables with the pointers to the start and end of
the additional memory block.
  If the program needs to free the allocated memory before exiting, it should
do it in the "exit_program" routine, which is called by the interface after
displaying the final messages. This routine restores all the system resources
that need to be restored and exits the program. It has one argument, provided
in the AL register, which is the exit code that should be passed to the OS. If
the OS accepts exit codes larger than a byte, the value of AL should be
zero-extended to fit the needed size. When this routine is called after a
successful assembly, the exit code is set to 0.
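  A C model of what such an "init_memory" routine may do when the OS gives
one big block is sketched here. The four variables carry the names used by
the core, but everything else is an assumption made for illustration: the use
of malloc, the size argument, the return value, and the choice to place the
additional block directly after the main one.

```c
#include <stdint.h>
#include <stdlib.h>

/* C stand-ins for the core variables named in the text. */
static uint8_t *memory_start, *memory_end;
static uint8_t *additional_memory, *additional_memory_end;

/* Hypothetical init_memory: one allocation divided in the 7:1 proportion.
   Returns 0 on success and -1 when the allocation fails. */
static int init_memory(size_t total) {
    uint8_t *base = malloc(total);
    if (base == NULL)
        return -1;
    size_t additional_size = total / 8;   /* main block gets the other 7/8 */
    memory_start = base;
    memory_end = base + (total - additional_size); /* first byte after main */
    additional_memory = memory_end;       /* assumed layout: right after main */
    additional_memory_end = base + total;
    return 0;
}
```

Note that both "end" variables point to the first byte after their block, not
to the last byte inside it.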


2.3  Program parameters

At the beginning the interface also calls its own routine called "get_params".
This procedure converts the command line arguments into separate zero-ended
strings. If the OS already does this for the program, no such routine is
needed. When the parameters are ready, the interface checks whether there are
exactly two valid parameters and fills the [input_file] and [output_file]
variables with the pointers to the first and second parameter. If the count of
parameters is any other number, it jumps to the "information" routine, which
displays the information about program usage and exits by jumping to the
"exit_program" routine with the exit code set to 1.
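  The splitting step can be sketched in C as follows. This is a simplified
model under stated assumptions, not the routine from any FASM interface: it
treats any whitespace as a separator, does not handle quoted names, and
reports the result through a return value instead of jumping to another
routine.

```c
#include <ctype.h>
#include <stddef.h>

/* C stand-ins for the core variables named in the text. */
static char *input_file, *output_file;

/* Hypothetical get_params: cuts the command line in place into zero-ended
   strings. Returns 0 when exactly two parameters were found, -1 otherwise
   (where the interface would show the usage text and exit with code 1). */
static int get_params(char *cmdline) {
    int count = 0;
    char *p = cmdline;
    input_file = output_file = NULL;
    while (*p != '\0') {
        while (isspace((unsigned char)*p))      /* skip separators */
            p++;
        if (*p == '\0')
            break;
        count++;
        if (count == 1)
            input_file = p;
        else if (count == 2)
            output_file = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                                /* walk over the parameter */
        if (*p != '\0')
            *p++ = '\0';                        /* terminate it in place */
    }
    return count == 2 ? 0 : -1;
}
```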


2.4  File operations

The most important group of routines used by the core are the file
operations. The "open" and "create" routines need a pointer to a zero-ended
file name to be provided in EDX; the first one opens the file for reading, the
second one creates the file for writing. Both clear CF and return the file
handle in EBX when the operation is successful; otherwise they set CF. These
functions should not modify the ESI, EDI and EBP registers.

The "read" routine reads the number of bytes given in ECX from the file handle
given in EBX into the buffer pointed to by EDX. The "write" routine writes the
number of bytes given in ECX from the memory pointed to by EDX into the handle
given in EBX. Both these functions clear CF if the operation is successful and
set it otherwise. They should leave EBX, ECX, EDX, ESI, EDI and EBP
unmodified.

The "close" routine closes the handle given in EBX. It should keep ESI, EDI
and EBP unchanged.

The "lseek" routine moves the current position in the file whose handle is
given in EBX. EDX contains the number of bytes by which the position should be
moved and AL contains the identifier of the origin for the move (0 means the
beginning of the file, 1 means the current position, 2 means the end of the
file). It should return the new position, counted from the beginning of the
file, in EAX and leave EBX, ESI, EDI and EBP unchanged.
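
When porting, the origin identifiers passed in AL have to be translated into
whatever the target OS expects. The following C sketch models the "lseek"
contract on top of the standard stdio functions; the function name and the
use of -1 as a failure value are assumptions made for this example, and the
register conventions obviously have no counterpart in C.

```c
#include <stdio.h>

/* Hypothetical model of the "lseek" contract: origin is the value the core
   passes in AL, offset the value from EDX. Returns the new position counted
   from the beginning of the file, or -1 on failure. */
static long lseek_model(FILE *handle, long offset, unsigned char origin) {
    /* 0 = beginning of file, 1 = current position, 2 = end of file */
    static const int whence[3] = { SEEK_SET, SEEK_CUR, SEEK_END };
    if (origin > 2)
        return -1;
    if (fseek(handle, offset, whence[origin]) != 0)
        return -1;
    return ftell(handle);
}
```

Because the new position is returned, seeking by 0 bytes from the end of the
file (origin 2) yields the length of the file.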


2.5  Timestamp

The "make_timestamp" routine should return a valid timestamp in EAX. This
value should contain the current system time converted to the number of
seconds since 1970-01-01 00:00:00 (some operating systems already use this
timestamp format for the system time, so no conversion is needed there).
This procedure is used by the formatter.
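  On a system that reports the time as separate date and time fields, the
conversion may look like the C sketch below. It is not taken from any FASM
interface: the function names are hypothetical, the day count uses a
well-known "days from civil date" calculation for the Gregorian calendar, and
the input is assumed to be already expressed in the time zone the timestamp
should refer to.

```c
#include <stdint.h>

/* Number of days between 1970-01-01 and the given Gregorian date
   (m is 1..12, d is 1..31). */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;                      /* count the year from March 1st */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned)(y - era * 400);           /* 0..399 */
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (int64_t)doe - 719468;
}

/* Hypothetical helper: the value a make_timestamp routine would return in
   EAX for the given date and time. */
static uint32_t timestamp_from_fields(int year, unsigned month, unsigned day,
                                      unsigned hour, unsigned minute,
                                      unsigned second) {
    int64_t days = days_from_civil(year, month, day);
    return (uint32_t)(days * 86400 + hour * 3600 + minute * 60 + second);
}
```

For example, timestamp_from_fields(2000, 1, 1, 0, 0, 0) gives 946684800.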


2.6  Error handling

There are two procedures in the interface that are called in case of an error
during the compilation. They should display the appropriate message and then
exit by calling "exit_program" with the appropriate exit code. The pointer to
the zero-ended string describing the error that occurred is stored on the
stack (pointed to by ESP), because each such string directly follows the CALL
instruction that invokes the error handler in the interface, so the return
address pushed by CALL is the address of the string. The address stored on the
stack will therefore be a word value if the executable uses 16-bit code or a
double word value if the executable uses 32-bit code.
  The "fatal_error" routine is called when an error occurs that is not
related to any particular line of source. This handler should just display
the message whose address is stored on the stack and then call "exit_program"
with the exit code 255.
  The "assembler_error" routine is called when the error that occurred is
related to the line of source that was being processed. This handler not only
displays the message from the address on the stack, but also displays detailed
information about the line in which the error occurred. To understand how it
works, knowledge of the preprocessed source format is needed, and that will be
described in the next chapter. But usually it should be enough to copy the
standard handler, which is used in the same form by the Win32 and Linux
versions and should be OS-independent provided that the executable uses 32-bit
code and all the needed interface routines are available. It uses some of the
routines for file operations that were already described, and also some
display routines that are described below. After displaying all messages this
handler jumps to the "exit_program" routine with the exit code 2.
  The interface also defines the "dm" macro, which is then used to define all
error messages. For the standard error handlers it defines the message as a
zero-ended string.


2.7  Displaying messages

The interface contains a few routines for displaying messages, but only one is
needed and used by the core: the "display_block" procedure, which is called by
the assembler to display the user messages from the source. The address of the
data that should be displayed is provided in ESI, and ECX contains the number
of bytes to display. The core doesn't need any register to be preserved by
this routine, but because it is also used by the standard error handler, it
should leave the EBX register unmodified for it.
  All other display routines are used by the standard error handler and should
preserve the EBX register. "display_string" displays the zero-ended string
pointed to by ESI, "display_character" displays the single character from DL,
and "display_number" displays the value of EAX as a decimal number.

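The conversion that a routine like "display_number" has to perform can be
sketched in C as shown here. The function name and the output buffer are
assumptions made for this example; a real interface routine would pass the
digits to the OS instead of storing them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes value as decimal digits followed by a zero
   byte. buf must have room for at least 11 bytes (10 digits and the
   terminator). Returns the number of digits written. */
static size_t format_decimal(uint32_t value, char *buf) {
    char tmp[10];
    size_t n = 0;
    do {                              /* digits come out lowest first */
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    for (size_t i = 0; i < n; i++)    /* reverse them into the output */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

The do-while form guarantees that the value 0 is displayed as a single digit
rather than as an empty string.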

Chapter 3  Preprocessor
-----------------------

This chapter describes the first of the core modules, which is called just
after the interface has completed all its initial tasks. The purpose of this
module is to load the source into memory, separate and enumerate all source
lines and process the special set of directives which we will call the
preprocessor directives. Some of these directives define the symbolic
constants and macroinstructions, and the task of replacing them with the
appropriate values in the source is also performed by the preprocessor. Only
one pass is applied to the whole source at this stage: each line is scanned
for directives, symbolic constants and macroinstructions right after it is
separated from the source file, and all tasks are done on it before the next
line is read.


3.1  Initial state of preprocessor

When the interface calls the preprocessor module, it is expected to have
already allocated the needed memory blocks: the [memory_start] and
[memory_end] variables should contain the addresses of the beginning and the
end of the main memory block, and the [additional_memory] and
[additional_memory_end] variables should contain the corresponding addresses
for the additional memory block. The [input_file] variable should contain the
pointer to the zero-ended name of the source file. The [output_file] variable
doesn't need to be set to a valid value at this stage; it is used only by the
formatter and can be set directly before calling that module.


[...]