
	     The official guide to flat assembler internals


Table of contents
-----------------


Chapter 1  Introduction

	1.1  Source structure
	1.2  Memory organization
	1.3  Core modules

Chapter 2  Interface

	2.1  Interface files
	2.2  Memory allocation
	2.3  Program parameters
	2.4  File operations
	2.5  Environment variables
	2.6  Timestamp
	2.7  Error handling
	2.8  Displaying messages

Chapter 3  Preprocessor

	3.1  Initial state of preprocessor
	3.2  Main preprocessor routine
	3.3  Preprocessing file
	3.4  Tables used by preprocessor
	3.5  Preprocessing line
	3.6  Preprocessing directive handlers



Chapter 1  Introduction
-----------------------

This guide is intended mainly to help people who want to modify the flat
assembler source, port it to another operating system, or add some features.
It may also be helpful for people who want to write their own assembler or
compiler and want to see how it was done in this case. The reader is expected
to know assembly language well enough to read a small piece of assembly code
and understand what it does.
  Although there are almost no comments inside the source of flat assembler,
the detailed labels are usually enough to navigate through it, because in
most cases they explain exactly what the code that follows them does. These
names will be our navigational points in the source, so it's better not to
change them if you want to use this guide.


1.1  Source structure

In the SOURCE directory you can find many files with the .INC extension and a
few subdirectories. Files in the root of the SOURCE directory form the "core"
of FASM and are OS-independent, so they are common to all versions of the
program. The core is written in a way that allows it to be compiled in either
16-bit or 32-bit mode, but it always uses 32-bit addressing and therefore
needs a flat memory space to be provided.
  Each subdirectory contains the files dedicated to one operating system.
This is the "interface" of FASM. It mediates between the core and the OS,
prepares the memory and files for the core and displays the messages during
and after the assembly.


1.2  Memory organization

FASM uses two contiguous blocks of memory for all its operations; we will call
them the main and the additional memory block. The first version of FASM was
designed for DOS and allocated the free extended memory as the main memory
block and the free conventional memory as the additional memory block.
  Other versions allocate one big memory block and divide it into two parts,
with the main memory block seven times bigger than the additional memory
block. This proportion could be changed, but the additional block doesn't need
to be as large as the main block.
  The core uses both blocks from the bottom and from the top. When the
highest used address from the bottom meets the lowest used address from the
top in the same block, the core calls the error handler in the interface with
the "out of memory" message. The data structures that are contained inside
those blocks will be described in later chapters.
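  As a rough illustration of the scheme above, the following Python sketch
(not FASM code) divides one allocation in the 7:1 proportion and models a
block that is consumed from both ends. The names "split_block", "Block",
"alloc_bottom" and "alloc_top" are invented for this example, and placing the
main block before the additional one is an assumption.

```python
class OutOfMemory(Exception):
    """Stands in for the interface's "out of memory" error handler."""


def split_block(start, size):
    """Divide one allocation into (main, additional) address ranges, 7:1.

    Assumption: the main block comes first in memory.
    """
    additional_size = size // 8
    main_end = start + size - additional_size
    return (start, main_end), (main_end, start + size)


class Block:
    """A block used from the bottom (growing up) and the top (growing down)."""

    def __init__(self, start, end):
        self.bottom = start   # first free byte above the bottom data
        self.top = end        # first byte of the top data

    def alloc_bottom(self, n):
        if self.bottom + n > self.top:
            raise OutOfMemory("out of memory")
        addr = self.bottom
        self.bottom += n
        return addr

    def alloc_top(self, n):
        if self.top - n < self.bottom:
            raise OutOfMemory("out of memory")
        self.top -= n
        return self.top
```

Allocation fails exactly when the two used regions of a block would overlap,
which is the condition the text describes.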


1.3  Core modules

The core of FASM consists of four modules: preprocessor, parser, assembler,
and formatter. They should be called by the interface in exactly this order
after the memory blocks are allocated and the names of source and output
files are prepared. When an error occurs during the execution of one of these
modules, the OS-dependent error handler in the interface is called to display
the appropriate message and terminate the program. If everything is
successful, the interface displays the summary message and exits.
  The first module in order is the preprocessor. It loads all the source files
into the main memory block, interprets its own set of directives and processes
the macroinstructions and symbolic constants. It also separates the words and
special characters in the lines and removes the spaces that were between them.
When it has finished with all the loaded files, the source is ready to be
processed by the parser.
  The parser module converts the source into FASM's internal code: all
assembly language keywords are replaced with short codes, and each label is
replaced with its unique identifier.
  The assembler module interprets the code generated by the parser to generate
the output code. It may take many passes to correctly resolve all the values,
but these passes are quite fast, because the most time-consuming tasks,
those related to string comparisons, have already been done by the parser.
When the assembler finishes its job, the formatter creates the output file in
the selected format and fills it with the generated code.


Chapter 2  Interface
--------------------

This chapter describes all the functionality that the OS-dependent interface
needs to provide, and should be the most helpful for those wanting to port
FASM to an operating system other than those for which interfaces already
exist. The following information applies to each version of the interface,
unless stated otherwise.


2.1  Interface files

Each interface subdirectory contains the FASM.ASM file, which is the main file
that has to be assembled into the final executable. All other source files
are referenced by this one using the INCLUDE directive.
  The main file should contain all the structures needed to create a valid
executable for the appropriate operating system. The main routine of the
program should allocate the needed memory blocks, interpret the program
arguments and then call the core modules. After finishing the assembly process
it usually also displays some message about the generated code before exiting.
The files containing the core are included right after this code.
  The main file should also contain declarations for all the uninitialized
data variables that the core is using. These declarations should be copied
exactly in the form in which they occur in the main file of each interface.
They should be put into the executable structure in such a way that they
won't take any space in the executable file.
  Other files in the interface subdirectories (usually it's the SYSTEM.INC
file) contain the OS-specific procedures that can be used by both the
interface and the core. The procedures that have to be called by the core
should strictly follow the rules that will be described later, because the
core will expect them to behave in the same way no matter what OS it is
running on. Other procedures can be specific to the interface, as they will be
called only by files designed for the chosen operating system.


2.2  Memory allocation

As was said in the previous chapter, FASM needs two contiguous blocks of
memory. The interface usually contains the "init_memory" routine, which is
called just after the start of the program. This routine allocates the needed
memory and fills the [memory_start] and [memory_end] variables with pointers
to the first byte of the main memory block and to the first byte after the
main memory block. In the same way it fills the [additional_memory] and
[additional_memory_end] variables with the pointers to the start and end of
the additional memory block.
  If the program needs to free the allocated memory before exiting, it should
do it in the "exit_program" routine, which is called by the interface after
displaying the final messages. This routine restores all the system resources
that need to be restored and exits the program. It has one argument, provided
in the AL register, which is the exit code that should be passed to the OS. If
the OS accepts exit codes larger than a byte, the value of AL should be
zero-extended to fit the needed size. When this routine is called after a
successful assembly, the exit code is set to 0.


2.3  Program parameters

At the beginning the interface also calls its own routine called "get_params".
This procedure converts the command line arguments into separate zero-ended
strings. If the OS already does it for the program, no such routine is needed.
When the parameters are ready, the interface checks whether they are valid
and fills the [input_file] and [output_file] variables with the pointers to
the first and second parameter. If no second parameter is specified, the
[output_file] variable should be set to zero. If there are no parameters or
there are too many of them, the interface jumps to the "information" routine,
which displays the information about program usage and exits by jumping to the
"exit_program" routine with the exit code set to 1.


2.4  File operations

The most important group of routines used by the core are the file
operations. The "open" and "create" routines need a pointer to a zero-ended
file name to be provided in EDX (both slashes and backslashes should be
interpreted as path separators; if the system does not support one of them,
the path must be converted first). The first one opens the file for reading,
the second one creates the file for writing. Both clear CF and return the
file handle in EBX when the operation is successful; otherwise CF is set.
These functions should not modify the ESI, EDI and EBP registers.
  The "read" routine reads the number of bytes given in ECX from the file
handle given in EBX into the buffer pointed to by EDX. "write" writes the
number of bytes given in ECX from the memory pointed to by EDX into the handle
given in EBX. Both these functions clear CF if the operation is successful or
set it otherwise. These functions should leave EBX, ECX, EDX, ESI, EDI and EBP
unmodified.
  "close" closes the handle given in EBX; it should keep ESI, EDI and EBP
unchanged.
  "lseek" moves the current position in the file whose handle is given in EBX.
EDX contains the number of bytes by which the position should be moved and AL
contains the identifier of the origin for the move (0 means the beginning of
the file, 1 means the current position, 2 means the end of the file). It
should return the new position from the beginning of the file in EAX and leave
EBX, ESI, EDI and EBP unchanged.
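  The origin identifiers of "lseek" happen to match the "whence" values used
by Python's file objects, so the expected behaviour can be modelled with a
short sketch. This is only an illustration in Python: the function below is a
hypothetical stand-in, not the interface routine, and it ignores the register
conventions.

```python
import io


def lseek(handle, offset, origin):
    """Model of "lseek": move the position, return it from the file start.

    origin: 0 = beginning of file, 1 = current position, 2 = end of file.
    """
    if origin not in (0, 1, 2):
        raise ValueError("unknown origin")
    return handle.seek(offset, origin)


f = io.BytesIO(b"0123456789")
size = lseek(f, 0, 2)    # moving 0 bytes from the end yields the file length
start = lseek(f, 0, 0)   # back to the beginning
```

Because the routine returns the new absolute position, moving by zero bytes
from the end of file is one way for a caller to learn the file size.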


2.5  Environment variables

The "get_environment_variable" function should search the system environment
for the variable whose zero-ended name is pointed to by ESI and store its
value in the buffer pointed to by EDI. The buffer is limited only by the top
of the main memory block, that is by the [memory_end] variable. The function
should return EDI pointing to the first byte after the last one filled with
the value of the environment variable. If no variable of the given name was
found, the empty value should be used, that is EDI should be left unchanged.
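  The contract can be pictured with the following Python sketch, in which a
bytearray plays the role of the main memory block and an integer index plays
the role of EDI. The dictionary used as the environment and the exception
raised on overflow are assumptions of this example; the text above does not
say what happens when the value does not fit.

```python
def get_environment_variable(env, name, buffer, edi):
    """Copy the value of `name` into `buffer` at index `edi`.

    Returns the new value of `edi`. `env` is a plain dict standing in for
    the system environment.
    """
    value = env.get(name)
    if value is None:
        return edi                       # not found: EDI is left unchanged
    data = value.encode("ascii")
    if edi + len(data) > len(buffer):    # assumed handling of overflow
        raise MemoryError("value would run past [memory_end]")
    buffer[edi:edi + len(data)] = data
    return edi + len(data)
```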


2.6  Timestamp

The "make_timestamp" routine should return a valid timestamp in EAX.
This value should contain the current system time converted to the number of
seconds since 1970-01-01 00:00:00 (some operating systems already use this
timestamp format for the system time, so no conversion is needed there).
This procedure is used by the formatter.
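  On a system whose clock already counts seconds from 1970, the routine
reduces to reading the clock and fitting the result into 32 bits, as in this
Python sketch. It assumes that the epoch is meant in UTC, which the text above
does not state explicitly.

```python
import time


def make_timestamp():
    """Seconds since 1970-01-01 00:00:00, truncated to fit a 32-bit EAX."""
    return int(time.time()) & 0xFFFFFFFF
```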


2.7  Error handling

There are two procedures in the interface that are called in case of an error
during the compilation. They should display the appropriate message and then
exit by calling "exit_program" with the appropriate exit code. The pointer to
the zero-ended string describing the error that occurred is stored on the
stack (pointed to by ESP), because each such string follows the CALL
instruction which executes the error handler in the interface. So the address
stored on the stack will be a word value if the executable uses 16-bit code or
a double word value if the executable uses 32-bit code.
  The "fatal_error" routine is called when some error occurs that is not
related to any particular line of source. This handler should just display
the message whose address is stored on the stack and then call "exit_program"
with the exit code 255. All the error messages are provided as zero-ended
byte strings.
  The "assembler_error" routine is called when the error that occurred is
related to the line of source that was being processed. This handler not only
displays the message from the address on the stack, but also displays detailed
information about the line in which the error occurred. To understand how it
works, knowledge of the preprocessed source format is needed, and it will be
described in the next chapter. But usually it should be enough to copy the
standard handler, which is used in the same form by the Win32 and Linux
versions and should be OS-independent as long as the executable uses 32-bit
code and all needed interface routines are provided. It uses some of the
routines for file operations that were already described, and also some
displaying routines that are described below. After displaying all messages
this handler jumps to the "exit_program" routine with the exit code 2.


2.8  Displaying messages

The interface contains a few routines for displaying messages, but only one is
needed and used by the core: the "display_block" procedure, which is called by
the assembler to display the user messages from source. The address of the
data that should be displayed is provided in ESI, and ECX contains the number
of bytes to display. The core doesn't need any register to be preserved by
this routine, but because it is also used by the standard error handler, it
should leave the EBX register unmodified for it.
  All other display routines are used by the standard error handler and should
preserve the EBX register. "display_string" displays the zero-ended string
pointed to by ESI, "display_character" displays the single character from DL,
"display_number" displays the value of EAX as a decimal number.


Chapter 3  Preprocessor
-----------------------

This chapter describes the first of the core modules, which is called just
after the interface has completed all its initial tasks. The purpose of this
module is to load the source into memory, separate and enumerate all source
lines and process the special set of directives which we will call the
preprocessor directives. Some of these directives define symbolic constants
and macroinstructions, and the task of replacing them with the appropriate
values in the source is also performed by the preprocessor. Only one pass is
applied to the whole source at this stage: each line is scanned for
directives, symbolic constants and macroinstructions right after it is
separated from the source file, and all tasks are done on it before reading
the next line.


3.1  Initial state of preprocessor

When the interface calls the preprocessor module, it is expected to have
already allocated the needed memory blocks: the [memory_start] and
[memory_end] variables should contain the addresses of the beginning and the
end of the main memory block, and the [additional_memory] and
[additional_memory_end] variables should contain such addresses for the
additional memory block. The [input_file] variable should contain the pointer
to the zero-ended name of the source file.


3.2  Main preprocessor routine

All the preprocessing is done by the routine called "preprocessor".
At startup, it first generates a table of characters that will be used for
both case conversion and detecting special characters in source. The table
consists of 256 bytes, and each corresponds to the character whose value is
the same as the index of the byte in the table. A zero byte in the table marks
that the corresponding character is one of the special characters (also called
symbol characters); for the other characters the bytes in the table are their
values after conversion to lower case - this will be used for case-insensitive
comparisons.
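  A table of this kind could be built as in the Python sketch below. The
list of special characters used here is a hypothetical stand-in for the one in
the FASM source, and only the ASCII letters A to Z are converted; both are
assumptions of this example.

```python
# Hypothetical stand-in for the list of special characters; the real list
# lives in the FASM source and may differ.
SYMBOL_CHARACTERS = b"\t\n\r\x1a +-*/=<>()[]{}:,|&~#`;\\"


def make_characters_table(specials=SYMBOL_CHARACTERS):
    """Return 256 bytes: 0 for special characters, lower case otherwise."""
    table = bytearray(256)
    for code in range(256):
        if code in specials:
            table[code] = 0              # special (symbol) character
        elif ord("A") <= code <= ord("Z"):
            table[code] = code + 32      # upper case maps to lower case
        else:
            table[code] = code           # everything else maps to itself
    return bytes(table)
```

One lookup then answers both questions at once: a zero result means that the
character is special, and any other result is the character already folded
to lower case. In this sketch the character of value 0 maps to itself and so
also reads as special.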
  Then the value of the INCLUDE environment variable is received through the
interface and stored at the beginning of the main memory block, and the
[memory_start] variable is updated to point to the first free byte after
this value. This new starting point of the main memory block is then used as
the origin of the buffer where the preprocessed code will be stored. During
the preprocessing, the pointer to the place in this buffer where the data of
the preprocessed line will be stored is kept in the EDI register; initially it
is equal to the [memory_start] variable. To do the actual preprocessing, the
"preprocess_file" routine is called. It needs EDI to point to the current
position in the main memory block where the lines have to be stored, ESI to
point to the name of the source file and EBX to contain the handle of this
file, already opened for reading. It returns control to the main routine after
preprocessing all lines from that file, with the EDI register updated to point
to the first free byte after the last preprocessed line. This value is later
used by the parser module as a pointer to the place where the parsed source
will be stored; for this purpose it is stored in the [source_start] variable.


3.3  Preprocessing file

The "preprocess_file" routine uses the file handle stored in the EBX register
to load the contents of the file at the end of the main memory block and then
closes the handle. The [memory_end] variable is updated to point to the first
byte of the loaded source, because during the preprocessing it should always
be a valid pointer to the end of free memory - after the preprocessing of the
file is done, [memory_end] is restored to its previous value. The data loaded
from the file is followed by one additional byte, which is the ASCII end of
file marker (value 1Ah). This byte is recognized by the preprocessor as the
place where it should stop preprocessing the file; if such a byte is
encountered before the actual end of file, the rest of the file will not be
preprocessed at all.
  After reading the source comes the "preprocess_source" loop. While in this
loop, the ESI register points to the source that has yet to be preprocessed,
EDI points to the place where the next preprocessed line will be stored,
ECX contains the line counter, EBX points to the beginning of the loaded file
that is currently being preprocessed (when entering the loop, it is the same
as ESI) and EDX points to the name of this file. The values of EBX, ECX and
EDX are needed only for the purpose of building the header of each
preprocessed line. This header is 16 bytes long and consists of four double
word values; table 3.1 shows its layout.


   Table 3.1  Header of preprocessed line loaded from source
  /-------------------------------------------------------------------------\
  | Offset | Value							    |
  |========|================================================================|
  |   +0   | pointer to the name of file, from which the line was loaded    |
  |--------|----------------------------------------------------------------|
  |   +4   | line number in bits 0-30, the highest bit zeroed		    |
  |--------|----------------------------------------------------------------|
  |   +8   | offset of line inside the file				    |
  |--------|----------------------------------------------------------------|
  |  +12   | reserved, set to zero					    |
  \-------------------------------------------------------------------------/
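
  The layout from table 3.1 can be reproduced with Python's "struct" module,
as in the sketch below. The function name and its parameters are invented for
this illustration; the little-endian byte order follows from the x86
architecture.

```python
import struct


def make_line_header(file_name_ptr, line_number, line_offset):
    """Pack the 16-byte header: four little-endian double words."""
    if not 0 <= line_number < 1 << 31:
        raise ValueError("line number must fit in bits 0-30")
    return struct.pack("<4I", file_name_ptr, line_number, line_offset, 0)


header = make_line_header(0x00401000, 12, 0x80)
```

Here "header" holds the bytes 00 10 40 00, 0C 00 00 00, 80 00 00 00 and
00 00 00 00, in this order.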


  After making the header for the line, the "convert_line" routine is called,
which processes one line from source (all bytes from where ESI points up
to the line break or EOF character) and converts it into data portions that
follow the line header. They are line elements of different types, followed
by a single zero byte, which marks the end of the preprocessed line (the
header of the next line will follow this byte immediately).
  Any chain of characters that have no special meaning, separated from other
similar chains by spaces or some other special characters, is converted
into a symbol element. The first byte of this element has the value 1Ah, the
second byte is the count of characters, and it is followed by that many bytes,
which make up the symbol.
  Some characters have a special meaning and cannot occur inside a symbol;
they split the symbols and are converted into separate line elements. These
characters are defined by the "symbol_characters" list; the first byte of this
data structure is the count of characters that follow. Some of those
characters do not become a line element, but have some special meaning for
the preprocessor: spaces and tabs are stripped, line breaks are converted into
the zero byte that ends the chain of line elements, a semicolon is stripped
together with all the bytes that follow it up to the end of line, and a
backslash causes the following line break to be ignored. All other characters
from that list are converted into symbol characters, which are line elements
consisting of only one byte each, with the same value as the original
character.
For example, if the source contains this line of text:

    mov ax,4

preprocessor will convert it into the chain of bytes, shown here with their
hexadecimal values (characters corresponding to some of those values are
placed below the hexadecimal codes):

    1A 03 6D 6F 76 1A 02 61 78 2C 1A 01 34 00
	  m  o	v	 a  x  ,	4

  The last type of element that can be found in a preprocessed line is the
quoted text. This element is created from a chain of any bytes other than
line breaks that are placed between single or double quotes in the
original text. The first byte of such an element is always 22h; it is followed
by a double word which specifies the number of bytes that follow, and the
value of the quoted text comes next. For example, this line from source:

    mov eax,'ABCD'

will be converted into (the notation used is the same as in previous sample):

    1A 03 6D 6F 76 1A 03 65 61 78 2C 22 04 00 00 00 41 42 43 44 00
	  m  o	v	 e  a  x  ,		    A  B  C  D

This data defines two symbols followed by a symbol character, a quoted text
and the zero byte that marks the end of line.
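  The conversion can be imitated with a few lines of Python. The sketch below
is a simplified model, not a port of "convert_line": its set of symbol
characters is a hypothetical subset, and it handles neither line
concatenation with backslash nor quote characters inside a quoted text.

```python
# Hypothetical subset of the "symbol_characters" list, for illustration only.
SYMBOL_CHARACTERS = set(b"+-*/=<>()[]{}:,|&~#`")
DELIMITERS = SYMBOL_CHARACTERS | set(b" \t;'\"")


def convert_line(text):
    """Convert one source line (bytes, no line break) into line elements."""
    out = bytearray()
    i = 0
    while i < len(text):
        c = text[i]
        if c in b" \t":                    # spaces and tabs are stripped
            i += 1
        elif c == ord(";"):                # comment: drop the rest of line
            break
        elif c in b"'\"":                  # quoted text: 22h, dword length
            end = text.index(c, i + 1)     # ValueError if quote is unclosed
            value = text[i + 1:end]
            out += b"\x22" + len(value).to_bytes(4, "little") + value
            i = end + 1
        elif c in SYMBOL_CHARACTERS:       # one-byte symbol character
            out.append(c)
            i += 1
        else:                              # ordinary symbol: 1Ah, length
            start = i
            while i < len(text) and text[i] not in DELIMITERS:
                i += 1
            name = text[start:i]
            out += bytes([0x1A, len(name)]) + name
    out.append(0)                          # zero byte ends the line
    return bytes(out)


first = convert_line(b"mov ax,4")
second = convert_line(b"mov eax,'ABCD'")
```

The two results, "first" and "second", are the byte chains of the two samples
shown in this section.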
  There is also a special case of a symbol whose first byte has the value 3Bh
instead of 1Ah. Such a symbol means that all the line elements that follow,
including this one, have already been interpreted by the preprocessor and
should be ignored by later modules. Such a symbol cannot occur in the data
created by the "convert_line" routine, but after "convert_line" has done its
job inside the "preprocess_source" loop, the "preprocess_line" routine is
called. The purpose of this routine is to recognize some special directives
and macroinstructions, and - if they occur in the line - interpret them and
update the line data accordingly.
  The "convert_line" routine can modify the ECX register used by the
"preprocess_source" loop - this happens in the case of lines concatenated with
the backslash character: "convert_line" makes one line out of them, but
updates the ECX register accordingly.


3.4  Tables used by preprocessor

There are a few data structures that are used by the "preprocess_line"
procedure when scanning the already converted line for directives or some
user-defined symbols like symbolic constants or macros. The first one is a
static table: the "preprocessor_directives" label is followed by the list of
symbols that are recognized as directives. Each entry in this list begins with
a byte containing the length of the symbol in characters, then that many bytes
that define the symbol, and then a 16-bit value, which defines an offset
relative to the "preprocessor" label, which points to the routine that handles
this directive.
  The second structure is built during the preprocessing and contains the list
of all macroinstructions defined in source. This structure is placed at the
bottom of the additional memory block and ends at the address stored in the
[free_additional_memory] variable, which points to the first free byte after
the table of macroinstructions. Each entry in this table is 8 bytes long: the
first double word contains the hash of the macroinstruction name, the second
double word contains the address of the first character of the name of the
macroinstruction. The hash value is made by the "hash_macro" procedure, which
needs ESI to point to the name, CL to contain the length of the name and CH to
be zero, except for the lowest bit, which is the flag set in the case of a
structure macroinstruction (one defined with the "struc" directive). The
"hash_macro" procedure returns in EAX the value that is then stored as the
first double word of the entry in the table of macroinstructions - it contains
the length of the name in bits 0 to 7, the structure flag in bit 8 and the
22-bit hash created with the FNV-1a algorithm in bits 10 to 31 (bit 9 is
reserved, possibly for another flag). This information is enough to recognize
and apply the macroinstruction, because the name pointed to by the second
double word of the entry in the table is the one inside the line which defined
that macroinstruction, so it is followed by the next elements of the line, in
particular by the whole definition (possibly spanning multiple lines).
  The third structure is stored at the top of the additional memory block and
is the table of symbolic constants. Each entry in this table is 16 bytes long:
the first double word contains the hash of the constant name, the second the
address of the constant name, the third the length of the value of the
constant in bytes, and the fourth is the pointer to the value of the constant
(as this value is inside the already preprocessed line, it is composed of line
elements). The hash value is made by the "hash_constant" routine, called with
EBX pointing to the name of the constant, CL containing the length of the name
and CH zeroed except for the lowest bit, which is the flag set in the case of
a prioritized constant (one defined with the "fix" directive). It returns in
EAX the value to be stored as the first double word of the entry in the table.
The structure of this value is very similar to the one used for
macroinstructions - it contains the length of the name in the lowest 8 bits,
the FNV-1a hash in the highest 22 bits and the priority flag in bit 8.
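  The packing of length, flag and hash into one double word can be sketched
in Python as follows. The 32-bit FNV-1a function uses the standard constants,
but the way the result is reduced to 22 bits (xor-folding) and the hashing of
the raw name bytes without case conversion are assumptions of this example;
the text above describes only the final layout.

```python
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(data):
    """Standard 32-bit FNV-1a hash of a bytes object."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFF
    return h


def make_hash_value(name, flag=False):
    """Build the first double word of a table entry for `name` (bytes)."""
    if not 0 < len(name) < 256:
        raise ValueError("name length must fit in one byte")
    h = fnv1a_32(name)
    h22 = (h >> 22) ^ (h & 0x3FFFFF)     # assumed xor-fold down to 22 bits
    return (h22 << 10) | (int(flag) << 8) | len(name)


def split_hash_value(value):
    """Return (length, flag, hash22) from a packed double word."""
    return value & 0xFF, (value >> 8) & 1, value >> 10
```

Since the length and the flag are part of the stored value, two names of
different length, or a standard and a structure macroinstruction of the same
name, can never produce equal double words.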
  The table of symbolic constants is pointed to by the [labels_list] variable,
and this address is moved down as new entries are added to the table.
To summarize: the table of macroinstructions starts at the beginning of the
additional memory block (the address stored in the [additional_memory]
variable) and ends at the [free_additional_memory] address; free memory
follows and ends at the [labels_list] address; the table of symbolic constants
follows and ends at the [additional_memory_end] address.


3.5  Preprocessing line

The "preprocess_line" procedure is called after the source line has already
been completely converted into the format described earlier. It has to
preserve the ECX and ESI registers for the "preprocess_source" loop, and it
should keep the EDI register pointing to the first byte after the data of the
preprocessed line. When entering "preprocess_line", the EDI register already
points to the first byte after the end of the line data created by
"convert_line", and if no more preprocessing is done on this line, EDI should
be left unchanged. However, "preprocess_line" might alter some of the data
portions inside the line and the length of the data might change - in such a
case EDI needs to be updated.
  The pointer to the line that needs to be preprocessed is stored in the
[current_line] variable. "preprocess_line" checks whether this line contains
some data that needs to be further preprocessed, and in such a case performs
the appropriate tasks and updates the line with the changed data. These tasks
include interpreting preprocessor directives and expanding macroinstructions
and symbolic constants. If some directive is detected, the preprocessor jumps
to the handler of that directive. Each directive handler ends with a jump to
the "line_preprocessed" label, and EDI should contain a valid pointer to the
first byte after the preprocessed line when performing this jump.
  The first check that is done by "preprocess_line" is the check for a
definition of a prioritized constant - to detect such a definition the
preprocessor checks whether the first two data portions are symbols, and
whether the second symbol is the word "fix". If such a situation is detected,
the preprocessor jumps to the "define_fix_constant" handler. Otherwise it
calls the "process_fix_constants" routine, which checks the line for
occurrences of prioritized symbolic constants and, if any are found, replaces
them with their values.
  When the line being preprocessed was not converted directly from source, but
was generated by a macroinstruction, the prioritized constants and their
definitions are not detected and processed in such a line (since they were
already processed in the original source line that generated this new one).
Instead, the preprocessor calls the "process_macro_operators" routine, which
does some processing specific to lines generated by a macroinstruction - it
will be explained in detail later.
  After this initial processing is finished, the main check for directives and
macroinstructions comes. If the first element of the line is a symbol, the
preprocessor checks whether it is a directive with the "get_symbol" routine.
This procedure needs ESI to point to the data of the symbol, ECX to contain
the length of the symbol and EDI to point to the table of symbols that have to
be recognized. Each entry in that table begins with one byte containing the
length of the symbol, then that many bytes containing the symbol itself and
then one 16-bit word containing the identifier of the symbol. In the case of
directives the identifier is the address of the directive handler relative to
the beginning of the preprocessor module. If the symbol is found in the table,
"get_symbol" clears the carry flag and returns the found identifier in AX. It
also leaves ESI pointing to the first byte after the symbol data (in this
case, to the next element of the line). Otherwise it sets the carry flag and
leaves ESI and ECX unchanged.
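  The table format accepted by "get_symbol" can be walked as in the Python
sketch below. It is a simplified model: the zero byte that ends the table, the
plain linear search and the exact byte comparison are assumptions of this
example, and a return value of None stands for the carry flag being set.

```python
def get_symbol(table, symbol):
    """Search `table` (bytes) for `symbol` (bytes).

    Returns the 16-bit identifier, or None when the symbol is not found.
    """
    pos = 0
    while pos < len(table) and table[pos] != 0:   # assumed zero terminator
        length = table[pos]
        name = table[pos + 1:pos + 1 + length]
        ident_bytes = table[pos + 1 + length:pos + 3 + length]
        if name == symbol:
            return int.from_bytes(ident_bytes, "little")
        pos += length + 3        # length byte + name + 16-bit identifier
    return None


# A tiny table with two entries; the identifiers are made-up values.
table = (bytes([5]) + b"macro" + (0x0120).to_bytes(2, "little")
         + bytes([7]) + b"include" + (0x0044).to_bytes(2, "little")
         + b"\x00")
```

With this table, looking up b"include" gives 0x44, while a symbol that is not
listed, such as b"mov", gives None.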
  When the symbol is recognized as a directive, its first byte is replaced
with the value 3Bh, which causes the rest of the given line to be ignored by
the parser. Then the preprocessor jumps to the directive handler with ESI
pointing to the first line element after the directive symbol.
  When the symbol is not a directive, further checking is done - this time to
detect whether this symbol has been defined as a macroinstruction. This is
performed by calling the "get_macro" procedure. It needs CL to contain the
length of the symbol, ESI to point to the symbol data and CH to contain the
structure flag - when CH is set to 1, it searches for the structure macro of
the given name, otherwise it searches for the standard macro of that name. In
this case CH is zeroed, since the first symbol in a line can only be a
standard macroinstruction. "get_macro" sets the carry flag when no
macroinstruction of the given name exists, otherwise it clears the carry flag
and returns in EBX the pointer to the found entry in the table of
macroinstructions. If the symbol is recognized as a macroinstruction, the
preprocessor replaces its first byte with the value 3Bh and jumps to the
"use_macro" handler.
  If the given symbol is not a macroinstruction, the next element of the line
is checked. If it is the colon character, the symbol is treated as a label
(which will be processed later by the parser module). In such a case both the
label symbol and the colon character are skipped and the checking described
above is done again, this time starting from the line element following the
colon. The preprocessor treats it as a new starting point of the line and will
not return to the skipped part. Therefore, when skipping the label symbol, it
checks whether the symbol is a symbolic constant, and in such a case replaces
the label symbol with the value of the constant, shifting the rest of the line
(and the new starting point for line preprocessing) if necessary - the whole
task of skipping the label symbol is done by the subroutine starting at the
"preprocess_label" label.
  If the first element of the line is a symbol, but is not recognized by any
of the checks mentioned above, the preprocessor goes to the second element.
When it is also a symbol, it is first checked for being the symbolic constant
definition directive. In such a case the preprocessor jumps to the
"define_equ_constant" handler, otherwise it uses the "get_macro" procedure
again, this time with CH set to 1 in order to find out whether a structure
macroinstruction of the given name exists. If none is found, it jumps to the
"not_preprocessor_symbol" label - which means that the line contains
instructions that will go through the parsing process - and finally calls the
"process_equ_constants" procedure. This routine needs ESI to point to the
first element of the line which has to be processed and EDI to the first byte
after the end of the line; it replaces all the symbolic constants in the line
with their values and updates EDI accordingly if the length of the line has
changed because of this operation.
  When a structure macroinstruction is encountered, the preprocessor
changes the line data to make the first element of the line be parsed as a
label and the rest of the line be ignored by the parser. When the first
element is not a symbolic constant, it's enough to put the colon character in
the place of the first byte of the second symbol, followed by the 3Bh byte and
the length of the second symbol shortened by one. The symbol containing the
name of the structure macroinstruction is destroyed this way, but it doesn't
matter, since it has already been recognized by the preprocessor, and the
parser will do nothing more than skip it, which will work correctly even in
the case of a zero-length symbol (which can happen when the macroinstruction
name was one character long).
When the first symbol is a symbolic constant, there is no need to
modify the second symbol, since the rest of the line data has to be shifted
anyway. So the preprocessor shifts all the line elements starting from the
second symbol by the appropriate amount, so that the value of the symbolic
constant, the colon character and the zero-length symbol starting with the 3Bh
byte all fit just before the rest of the line data. When the line is ready,
the preprocessor jumps to the "use_macro" handler, with the [struc_name]
variable containing the pointer to the structure label (the first element of
the line).
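  The in-place rewriting for the simpler case (the first element is not a
symbolic constant) can be followed on actual bytes with this Python sketch.
The function name and the sample line "pt point 1" are invented for the
illustration, and the function only models the byte changes, not the
register usage.

```python
def mark_struc_line(line):
    """Rewrite `label macroname ...` so that the parser sees `label:` only.

    `line` is a bytearray that starts with two symbol elements.
    """
    if line[0] != 0x1A:
        raise ValueError("first element is not a symbol")
    second = 2 + line[1]                 # offset of the second element
    if line[second] != 0x1A:
        raise ValueError("second element is not a symbol")
    length = line[second + 1]
    line[second] = ord(":")              # the label is now followed by colon
    line[second + 1] = 0x3B              # marker of already interpreted data
    line[second + 2] = length - 1        # overwrites first character of name
    return line


# "pt point 1" after conversion: symbols "pt", "point" and "1", then zero.
line = bytearray(b"\x1A\x02pt\x1A\x05point\x1A\x011\x00")
mark_struc_line(line)
```

The element "1A 05 p o i n t" becomes "3A 3B 04 o i n t": the total length of
the line does not change, which is why no shifting of data is needed in this
case.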


3.6  Preprocessing directive handlers

As was stated in the previous section, when the preprocessor recognizes the
first element of a line to be a directive, it marks that symbol by replacing
its initial byte with the value 3Bh and then jumps to the handler of the
recognized directive. ESI points to the first element of the line following
the directive symbol, and EDI points to the first byte after the line data,
which is also the first byte after the already prepared source. The directive
handler may even add whole new lines there and update EDI. After performing
all its tasks, the handler should jump to the "line_preprocessed" label, with
the correct pointer in the EDI register.
  The "include_file" handler performs the inclusion of additional source
files. After finding the file specified by the directive parameters, it
calls the "preprocess_file" routine (since the handler itself is executed by
a subroutine called from the "preprocess_file" procedure, this is a recursive
process, limited in depth only by the available stack space) to make the
source lines out of it and place them in the area pointed to by the EDI
register.

[...]