Understanding flat assembler
This text is a guide for advanced users that summarizes some of the rules of flat assembler's language and teaches some advanced techniques for combining its various features. It also aims to explain behaviors that may be confusing or contrary to expectations unless one understands exactly how things work and cooperate across the various layers of the language used by flat assembler, which is in some respects unique among assemblers.
This guide cannot, however, replace the manual. It assumes you already have some basic knowledge of flat assembler's language and now want to understand it more deeply.
Assembler as a compiler and assembler as an interpreter
Implementations of programming languages are divided into two main classes: compilers and interpreters. An interpreter is a program that takes a program written in some language and executes it. A compiler, on the other hand, is simply a program that translates a program written in one language into another; most commonly the target language is machine language, so the result can then be executed by the machine.
From this point of view, the assembler appears to be a kind of compiler: it takes a program written in assembly language (the source code) and translates it into machine language. However, there are some differences. When a compiler translates from one language to another, it is expected to produce a program in the target language that will run (when executed by some interpreter or processor) the same way and give the same results. But in the exact details, like choosing among the many possible language constructs that would do the same thing, the compiler has freedom of choice. While it is expected to make the best possible choice, different compilers translating between the same two languages may give quite different results, even though the programs will behave the same. The assembler is a bit different in this respect, as there is a kind of exact correspondence between the instructions of assembly language and the machine language instructions they are translated to. In most cases you know exactly which bytes will be generated by a given assembly language construct. This is what makes the assembler behave a bit like an interpreter. It is most obvious in the case of directives like:
db 90h
which tells the assembler to put one byte with the value 90h at the current position in the output code. This is as if the assembler were an interpreter, and the machine language it generates were simply the output of the interpreted program. Even the instructions, which represent the machine language instructions they are translated to, can be considered directives that tell the assembler to generate the opcode of the given instruction and place it at the current position in the output.
One can also put no instruction mnemonics at all into the source code and use only DB directives to create, for instance, just some text. In such a case the output is not a program at all, as it doesn't contain any machine language instructions. This makes the assembler appear to be really an interpreter.
But when someone writes a program in assembly language, he usually thinks about the program he is writing in machine language. The assembler just makes the task of creating a program in machine language easier: it provides easy-to-memorize names for instructions (called mnemonics for that very reason), allows various places in memory and other values to be labeled with names, and then calculates the appropriate addresses and displacements automatically. When writing a simple sequence of instructions in assembly language, like:
mov ax,4C00h
int 21h
one usually doesn't think of them as interpreted directives that generate machine language instructions. One thinks of them as if they actually were the instructions they generate; one thinks in terms of the program written with the assembly language, not the program written in assembly language. But both of those programs are actually merged together: two different layers of thinking in one source code. This gives the assembler a new quality: it is a tool that appears to be both a compiler and an interpreter at the same time.
Run-time layer and interpreted layer
Let's look at two simple pieces of assembly language that both add EAX to itself, repeating this operation five times. The first one uses ECX to count the repetitions:
mov ecx,5
square: add eax,eax
loop square
This generates three machine language instructions, and the operation that adds EAX to itself is performed five times when the processor executes the machine code generated from the above source. The LOOP instruction decrements ECX by one and jumps back to the ADD instruction if ECX is still not zero. The second sample looks simpler:
repeat 5
add eax,eax
end repeat
This time an assembler directive is used to repeat the instruction five times, and no jumping is done. What the assembler generates when it meets the above construct is exactly the same as what would be generated if we wrote it this way:
add eax,eax
add eax,eax
add eax,eax
add eax,eax
add eax,eax
The assembler generates five copies of the same machine instruction. While the LOOP instruction is used to create run-time loops (the looping that happens when the processor executes the machine code), the REPEAT directive makes an assembly-time loop: it repeatedly interprets the block of directives that follows it, up to the corresponding END directive. In this case the block contains an instruction, but, as was said earlier, an instruction is in fact a directive that causes some machine language instruction to be created. Thus the assembler interprets the ADD directive five times, each time generating the code of an instruction that adds EAX to itself. This is a perfect example of the interpreted layer of assembly language. Nonetheless, there is still a run-time layer here as well: what we actually achieve is five copies of the same machine language instruction, executed one after another. The next sample is a purer example of an interpreted language:
A = 1
repeat 5
A = A + A
end repeat
This piece of flat assembler's dialect of assembly language defines an assembly-time variable called A and then doubles its value five times. Everything that happens here is interpretation; no machine code or any other output is generated at all. The only influence it could have on the run-time layer would be if the value of A were later used to affect some instruction.
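As a rough analogy only (this is not how fasm is implemented), the two assembly-time loops above can be mirrored in Python. The hypothetical helper assemble_repeat re-interprets a block a given number of times and so emits as many copies of its lines; doubled mirrors the A = A + A loop, which changes only a variable and emits nothing.

```python
def assemble_repeat(count, block):
    """Toy REPEAT: interpret the block `count` times in a row at assembly time.

    Each interpretation emits the block's lines again, so the output simply
    contains `count` copies; nothing jumps back at run time."""
    emitted = []
    for _ in range(count):
        emitted.extend(block)
    return emitted


def doubled(start, times):
    """Mirror of: A = start / repeat times / A = A + A / end repeat.

    Only the assembly-time variable changes; no output is produced."""
    a = start
    for _ in range(times):
        a = a + a
    return a


print(assemble_repeat(5, ["add eax,eax"]))  # five identical lines
print(doubled(1, 5))                        # 32
```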
Using flat assembler as pure interpreter
As already noted with the example of the DB directive, the output of the assembler may not be a program at all. In such a case the view of the assembler as a compiler no longer applies, and the result of assembly is simply the output of an interpreted language. To see an example, copy the source code below into a file named interp.asm and assemble it:
file 'interp.asm'
repeat $
load A byte from %-1
if A>='a' & A<='z'
A = A-'a'+'A'
end if
store byte A at %-1
end repeat
This is a program written entirely in flat assembler's interpreted language, using some of its advanced features. It first places the entire contents of the interp.asm file (its own source) at the current position (which is always 0 at the start of assembly). Then, for each byte of that file (the $ is the value of the current position, so after putting the whole file at position 0, $ becomes equal to the size of this file), it repeats the following process: it takes the byte, converts it to upper case with the help of a simple condition check, and writes the modified byte back. The % symbol is the repeat counter, which starts from 1, so %-1 addresses the bytes starting from offset 0. This way the program converts the whole contents of the file to upper case, and this is what finally lands in the output file. So this is an example of a simple text-conversion program written in the interpreted language, and since the output is text and doesn't contain any machine language, the aspect of the assembler being a compiler is completely absent in this case.
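For comparison, here is a sketch of the same byte-by-byte conversion written in Python; the function name upcase_bytes is hypothetical, and each line is commented with the fasm directive it corresponds to.

```python
def upcase_bytes(data: bytes) -> bytes:
    """Python counterpart of interp.asm: upper-case the ASCII letters a..z."""
    out = bytearray(data)              # file 'interp.asm' placed at offset 0
    for i in range(len(out)):          # repeat $ (fasm's % counts from 1, so %-1 == i)
        a = out[i]                     # load A byte from %-1
        if ord("a") <= a <= ord("z"):  # if A>='a' & A<='z'
            a = a - ord("a") + ord("A")
        out[i] = a                     # store byte A at %-1
    return bytes(out)


print(upcase_bytes(b"load A byte from %-1"))  # b'LOAD A BYTE FROM %-1'
```

Feeding it the contents of interp.asm read in binary mode should give the same upper-cased text that the assembler writes to its output file.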
Code resolving
All the previous samples showed that describing the assembler as an interpreter was enough to fully explain what it does. But now that we have agreed (I hope) that the assembler can be read as an interpreter, it's time to show where this interpretation fails, because the assembler truly is both a compiler and an interpreter at the same time, and neither of those terms alone can correctly explain what it does.
One of the crucial features of an assembler is the ability to label different places in machine code or data with freely chosen names, and to use those names instead of raw addresses inside instructions. This applies in particular to jump instructions, so it's possible to write instructions jumping to other places in the program using only the names labeling those places, and the programmer doesn't need to worry about what the exact addresses are. However, a programmer may need to jump forward as well as backward. When jumping backward, the assembler has an easy task: it already knows what the address was when it met that name and interpreted it as a label. When jumping forward, however, an assembler that was just an interpreter would fail. There is no way it could know the address of a label it hasn't yet encountered.
But the assembler still has to do it, and in this respect it again behaves like a compiler rather than an interpreter: given instructions and labels written in assembly language, it needs to find the equivalent instructions and addresses in machine language. There are a few possible methods for achieving this. One of them is to interpret the whole source once, leaving empty places in the created machine code for values that are not known at the time they are needed, and then, after this is finished and all the values of labels are known, fill those empty places with the right values. However, another problem appears when the assembler wants to generate machine code that is as optimal as possible (with respect to size, which also affects speed), and the machine language of the processor architecture it targets (like x86, the architecture flat assembler works for) allows instruction encodings of different lengths depending on the range the target address fits in. For example, on x86 there are short forms of jump instructions that can reach only addresses no further than about 128 bytes forward or backward, and longer forms of the analogous jumps that can reach further. In such cases the assembler may try to generate the short forms when possible, but if it used the short form where the value of the target address is not yet known, it might later realize that this address is too far away and that the longer form would have to be used in this place.
Thus, to make more effective optimization possible, flat assembler uses a different approach. It first interprets the whole source and chooses the smallest possible form of each instruction; when the value of the target address is not yet known, it also generates the shortest form, assuming the best possible optimization will be achievable. But after it finishes and knows all the values of labels, it does not go back to patch the right values into what it generated. Instead, it interprets the whole source once again, this time using the values of labels gathered in the previous pass to predict the correct target addresses in the places where they were unknown earlier. If the best possible optimization can indeed be done, everything is assembled the same way this time, only now filled with the right values everywhere. But it may also happen that some of the instructions have a different length this time than they had previously. In that case all the labels defined after such an instruction get their addresses shifted, and the predictions made with the addresses gathered previously are no longer exactly right. This does not discourage the assembler. When it realizes that some of the addresses have changed during the second pass, it decides to interpret the source once again, taking the more up-to-date values of labels and trying to predict the right values better this time. This process may be repeated many times, but hopefully the predicted values finally match their definitions, and only then does the assembler decide it has finished its job and write out the output generated during its last pass through the source.
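The pass-by-pass prediction described above can be illustrated with a toy resolver in Python. It is a simplified sketch, not fasm's real algorithm. It assumes 32-bit code, where a short JMP (EB rel8) takes 2 bytes and reaches -128..+127 bytes from its end, and a near JMP (E9 rel32) takes 5 bytes. A forward target that is still unknown is assumed to fit the short form. The program representation and the names assemble, "label", "code" and "jmp" are hypothetical.

```python
SHORT_JMP = 2  # EB rel8: reaches -128..+127 bytes from the end of the jump
NEAR_JMP = 5   # E9 rel32 in 32-bit code


def assemble(program, max_passes=20):
    """Toy multi-pass resolver (not fasm's real algorithm).

    program is a list of:
      ("label", name)  define name at the current address
      ("code", n)      n bytes of some other code
      ("jmp", name)    unconditional jump, short form whenever it fits
    Returns (label values, number of passes needed)."""
    known = {}  # label values gathered during the previous pass
    for passes in range(1, max_passes + 1):
        current = {}
        address = 0
        for kind, arg in program:
            if kind == "label":
                current[arg] = address
            elif kind == "code":
                address += arg
            elif kind == "jmp":
                # Backward targets are already defined in this pass, forward
                # ones are predicted from the previous pass.
                target = current.get(arg, known.get(arg))
                if target is None:
                    address += SHORT_JMP  # unknown yet: assume the best case
                else:
                    disp = target - (address + SHORT_JMP)
                    address += SHORT_JMP if -128 <= disp <= 127 else NEAR_JMP
        if current == known:  # predictions matched the definitions
            return current, passes
        known = current
    raise RuntimeError("label values did not settle within max_passes")


print(assemble([("jmp", "far"), ("code", 200), ("label", "far")]))    # ({'far': 205}, 3)
print(assemble([("jmp", "near"), ("code", 100), ("label", "near")]))  # ({'near': 102}, 2)
```

In the first call, pass 1 guesses a short jump, pass 2 finds the target out of range and switches to the long form (shifting the label by 3 bytes), and pass 3 confirms that the values no longer change.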
This whole process of repeated passes is called resolving of the code. This approach not only allows optimal code to be generated from a given sequence of machine instructions and labels, but also allows the interpreted layer of the assembler to be combined with the compiled one in interesting ways. Look at this sample:
alpha:
repeat gamma-beta
inc eax
end repeat
beta:
jmp alpha
gamma:
The assembler needs to resolve the values of the beta and gamma labels, since they are used before they are defined (they are said to be forward-referenced). Those values affect how many times the INC instruction gets repeated. The interpreted layer is not really visible here, since everything needs to be resolved and the assembler has to find the right solution on its own. A solution may not even exist at all, for example if we subtracted alpha instead of beta from gamma. In that case the difference would obviously be larger than the number of repeats (it would cover the repeated instructions plus the JMP), so the source would be self-contradictory and the assembler would never reach a solution.
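Both outcomes for the alpha/beta/gamma sample can be reproduced with a toy Python model that uses the same pass-repeating idea. It assumes 32-bit code, where INC EAX is a single byte and the backward JMP takes 2 bytes in short form or 5 in near form. It also assumes forward references start from a guess of 0. That guess is a simplification, not fasm's actual initial prediction, and the function name resolve is hypothetical.

```python
INC_SIZE = 1                # "inc eax" is one byte (40h) in 32-bit code
SHORT_JMP, NEAR_JMP = 2, 5  # jmp rel8 / jmp rel32


def resolve(subtract="beta", max_passes=50):
    """Toy resolver for the alpha/beta/gamma sample, repeating
    gamma-<subtract> copies of INC. Forward references start from a
    hypothetical guess of 0. Returns (repeat count, passes)."""
    guess = {"alpha": 0, "beta": 0, "gamma": 0}
    for passes in range(1, max_passes + 1):
        count = guess["gamma"] - guess[subtract]
        if count < 0:
            raise ValueError("negative repeat count")
        alpha = 0
        beta = alpha + count * INC_SIZE
        disp = alpha - (beta + SHORT_JMP)       # backward jump to alpha
        gamma = beta + (SHORT_JMP if disp >= -128 else NEAR_JMP)
        found = {"alpha": alpha, "beta": beta, "gamma": gamma}
        if found == guess:
            return count, passes
        guess = found
    raise RuntimeError("no consistent values found within max_passes")


print(resolve("beta"))   # (2, 3)
```

With beta, the repeat count settles at 2, the size of the short JMP. With alpha, every pass makes gamma-alpha grow again, so resolve("alpha") ends with RuntimeError instead of a solution.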
In the sample with alpha, beta and gamma, the interpreted layer fades out behind the resolving. The next one shows a case where the two clearly coexist:
mov eax,gamma
A = 1
repeat 5
A = A*%
end repeat
label gamma at $+A-7
A is an assembly-time variable, and its final value is clearly calculated with the help of the interpreted language. But then this value is used to define the label gamma, which is forward-referenced and needs to be resolved. In fact we could even use = here to define the value of gamma, since the assembler treats such a definition (when the definition of the gamma name is encountered only once while interpreting the source) as the definition of a global constant rather than an assembly-time variable, so it could also be forward-referenced. This leads us to another interesting example of code resolving:
dd x,y
x = (y-2)*(y+1)/2-2*y
y = x+1
Here = is used to define numerical constants that are forward-referenced, and the assembler needs to resolve their values. Moreover, their values depend on each other, and it's hard to tell immediately whether any solution exists where all the values match. However, if we try to assemble it with flat assembler, it manages to find a solution in a few passes: x=6 and y=7. It's actually rare for code resolving algorithms, designed primarily for optimizing machine code, to be able to solve such a set of equations, but the sample gives us a pure example of what code resolving has to do.
By moving the DD directive to the last line in the above sample, we would leave only y as a forward-referenced symbol, while x would become just an auxiliary variable that helps calculate the self-dependent value of y. We could even split those calculations into many separate operations on x:
x = y-2
x = x*(y+1)
x = x/2-2*y
y = x+1
dd x,y
to emphasize the interpreted layer a bit more. Note, however, that we could not define the self-dependence of y without such an intermediate variable, as flat assembler doesn't allow forward-referencing a symbol inside its own definition; definitions involving a symbol's own previous value are reserved for assembly-time variables (like x in this sample).
Resolving code becomes an even more complex issue when we consider the IF directive, with all the complex dependencies it can create. You can find some examples of such problems in the section about multiple passes in flat assembler's manual.
Of course, for a given set of dependencies more than one correct solution may exist. In the case of simple dependencies between instruction forms and label addresses, flat assembler tries to find the one with the smallest possible instruction encodings, but this doesn't mean it always finds the smallest existing solution in general, and in some cases it may not be able to find a solution even though one exists. This is because the prediction algorithms it uses were designed with a focus on generating good machine code, not on solving arbitrary arithmetic problems that can be encoded in its language. But if the assembler finishes its job without signaling any error, you can at least be sure that all the dependencies are fulfilled and the output is correct with respect to them.
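The x and y sample from earlier in this section shows that several solutions can exist. Substituting x = y-1 into the first definition gives 2(y-1) = (y-2)(y+1) - 4y, which simplifies to y² - 7y = 0. So y = 0 or y = 7. A brute-force check in Python (an illustration only, unrelated to how fasm searches) confirms both pairs. The product (y-2)*(y+1) is always even, because its factors differ by 3, so the division is exact and rounding conventions do not matter. The text above reports that flat assembler settles on x=6 and y=7. Which solution a resolver reaches depends on its predictions.

```python
def satisfies(x, y):
    """True if (x, y) fulfils both definitions from the sample:
    x = (y-2)*(y+1)/2-2*y  and  y = x+1.
    (y-2)*(y+1) is always even, so the division here is exact."""
    return x == (y - 2) * (y + 1) // 2 - 2 * y and y == x + 1


# y = x+1 fixes x once y is chosen, so it is enough to scan y.
solutions = [(y - 1, y) for y in range(-1000, 1001) if satisfies(y - 1, y)]
print(solutions)  # [(-1, 0), (6, 7)]
```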
Preprocessor
Still, flat assembler has one more layer, which makes the whole thing even more complex. It's the preprocessor.
The main point of the preprocessor is that it operates on the whole source code before it reaches the assembler. What it does is mainly text processing, which allows simple statements to produce much more complex sets of assembly instructions. There is a set of special directives, called preprocessor directives, which are interpreted only by the preprocessor and removed from the source before it is passed to the assembler. Everything else the preprocessor finds in the source is passed on to the assembler to process.
For example, let's say we have this source:
mov ax,bx
include 'nop.inc'
mov cx,dx
and the contents of the NOP.INC file is just:
nop
What does the preprocessor do with it? It interprets the things it recognizes, like the INCLUDE directive, which tells it to put the whole contents of the NOP.INC file in its place. The things it doesn't recognize it leaves intact. So what is finally passed to the assembler is:
mov ax,bx
nop
mov cx,dx
Note that for the assembler itself there is no such thing as an INCLUDE directive here; the preprocessor has already prepared a simple sequence of instructions for it. If an IF directive were placed just before the INCLUDE, and the matching END IF inside the included file, neither layer would complain. For the preprocessor, IF and END IF have no meaning, so it just leaves them for the assembler. The assembler would see no discontinuity, since after preprocessing they would form a correct sequence of assembly directives.
When the preprocessor puts new lines into the source, as when it replaces the INCLUDE directive with all the lines from the given file, it also preprocesses all those new lines before it goes further. Thus the included file can also contain preprocessor directives that get recognized and processed appropriately.
The lines that do not contain preprocessor directives may also get altered during preprocessing. This is because of the text replacement features the preprocessor provides. Such replacements are done with so-called symbolic constants. You define a symbolic constant with the EQU directive, like:
A equ +
With such a definition, the name A is replaced with the + symbol everywhere the preprocessor finds it after that definition. Note that because the preprocessor goes through the source just once (and so acts much like a pure interpreter), the replacement is applied only after A gets defined. For example:
mov eax,A
A equ ebx
mov eax,A
will after preprocessing become:
mov eax,A
mov eax,ebx
Another important thing about symbolic constants is that they are not really constants: they can act as preprocessing-time variables, analogous to the assembly-time variables defined with the = directive. So they can be redefined, and perhaps we should call them symbolic variables instead, even though the manual calls them constants (for historical reasons). Just like assembly-time variables, preprocessing-time variables can be redefined using their previous value to make the new one. For example:
A equ 2
A equ A+A
defines a symbolic variable with the value 2+2. This works because the replacement of symbolic variables with their assigned values is also done in the line containing the EQU directive, though only after the EQU. Thus this sequence, too:
A equ 2
B equ +
A equ A B A
defines a symbolic variable with the value 2+2. For the preprocessor, whitespace is important only where it is needed to separate names that would otherwise merge into one longer name. Any other whitespace is ignored and stripped out: for the preprocessor, only the sequence of symbols counts.
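The single-pass EQU behavior can be modeled with a toy preprocessor in Python. It is a sketch under simplifying assumptions and makes no claim to match fasm's internals. The tokenizer knows only symbol characters and name tokens (no quoted strings, comments or backslashes). A line whose second token is EQU defines a symbolic variable, after substituting the values known so far into the part after EQU. Every other line is substituted and passed on. All function names are hypothetical.

```python
import re

SYMBOL_CHARS = set("+-/*=<>()[]{}:,|&~#`")
# Toy tokenizer: each symbol character is a token, any other run of
# non-whitespace characters is a name token.
TOKEN = re.compile(r"[+\-/*=<>()\[\]{}:,|&~#`]|[^\s+\-/*=<>()\[\]{}:,|&~#`]+")


def join(tokens):
    """Rebuild text, inserting a space only between two adjacent name tokens."""
    text, prev = "", None
    for tok in tokens:
        if prev is not None and prev not in SYMBOL_CHARS and tok not in SYMBOL_CHARS:
            text += " "
        text += tok
        prev = tok
    return text


def substitute(tokens, symbols):
    """Replace every name that is a symbolic variable by its stored tokens."""
    result = []
    for tok in tokens:
        result.extend(symbols.get(tok, [tok]))
    return result


def preprocess(lines):
    """One pass over the lines: returns the lines handed to the assembler
    and the final table of symbolic variables (as text)."""
    symbols = {}
    output = []
    for line in lines:
        toks = TOKEN.findall(line)
        if len(toks) >= 2 and toks[1].lower() == "equ":
            # Replacement is applied only after EQU, using the values known
            # so far; then the name is (re)defined.
            symbols[toks[0]] = substitute(toks[2:], symbols)
        else:
            output.append(join(substitute(toks, symbols)))
    return output, {name: join(value) for name, value in symbols.items()}


print(preprocess(["mov eax,A", "A equ ebx", "mov eax,A"])[0])
# ['mov eax,A', 'mov eax,ebx']
print(preprocess(["A equ 2", "B equ +", "A equ A B A"])[1]["A"])  # 2+2
```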
Let's now summarize the differences between preprocessing-time variables and assembly-time variables. The obvious one is that assembly-time variables are purely numerical and always hold just a number or an address value, while symbolic variables can have just about anything as a value (they can even have an empty value, when there's nothing other than whitespace and comments after the EQU directive). The fact that symbolic variables do just a kind of text substitution can be demonstrated with the following example:
nA = 2+2
mov eax,nA*2
sA equ 2+2
mov eax,sA*2
The first line defines nA to have the numerical value 4, and the next one calculates the value to put into EAX as nA multiplied by 2, so the instruction that gets assembled is MOV EAX,8. In contrast, the third line defines sA to be equivalent to the text 2+2, and thus the preprocessor changes the instruction in the last line into MOV EAX,2+2*2, which is then assembled into MOV EAX,6. Thus you should be careful with symbolic variables and always think about the effect they may have on the source when they are replaced with the text you assigned to them.
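The same arithmetic can be replayed in Python as an illustration. The numeric variable is evaluated once, while the symbolic one is pasted in as text before the expression is evaluated. The last lines assume that putting parentheses into the EQU value (sA equ (2+2)) keeps the intended grouping, since the pasted text then becomes (2+2)*2. The variable names are illustrative only.

```python
# nA = 2+2 : the assembler evaluates the expression once and stores a number.
n_a = 2 + 2
numeric_operand = n_a * 2                 # mov eax,nA*2  -> 8

# sA equ 2+2 : the preprocessor stores the text "2+2" and pastes it in.
s_a = "2+2"
textual_text = f"{s_a}*2"                 # mov eax,sA*2 becomes mov eax,2+2*2
textual_operand = eval(textual_text)      # precedence makes it 2+(2*2) -> 6

# With parentheses in the symbolic value the grouping survives the pasting.
s_a_grouped = "(2+2)"
grouped_operand = eval(f"{s_a_grouped}*2")  # (2+2)*2 -> 8

print(numeric_operand, textual_text, textual_operand, grouped_operand)  # 8 2+2*2 6 8
```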
Another subtle point is that the EQU and = directives are processed by different layers. All the replacements that EQU causes are done before the whole source is passed to the assembler. Let's look at the effects of this in this sample:
A = 0
X equ A = A+
X 4
dd A
After this source has been processed by the preprocessor, this is the result that is fed into the assembler:
A = 0
A = A+4
dd A
Thus what you finally get is a 32-bit data field filled with the value 4. This example shows how you can make the different layers cooperate; however, such a task requires that you realize exactly what belongs to which layer. We will talk more about mixing layers later.
Macroinstructions
Macroinstructions (often called macros for short) are another feature of the preprocessor. A macro is a recipe for the preprocessor: when you use a macro with some set of parameters, the preprocessor applies this recipe to create new source lines and puts them in place of the line that invoked the macro.
The preprocessor treats the definition of a macro as one large directive (as it may span multiple lines), so the whole recipe is itself unaffected by any preprocessing and is not passed to the assembler. However, when you invoke the macro and the preprocessor uses the recipe to produce new lines, it also preprocesses all those new lines before it goes further, just as with the INCLUDE directive. Let's consider this simple macro:
macro Def name
{
name equ 1
}
Since everything between the braces is the contents of the macro, the preprocessor doesn't notice the EQU directive here. All these lines are parts of the recipe, and all the preprocessor does with them is note what the recipe for the Def macro is and go further, leaving nothing for the assembler from those lines. Now what happens when we use that macro? Let's say we do it this way:
Def A
The preprocessor will replace this line with the lines generated from the recipe for the Def macro; in this case it is just one line:
A equ 1
This new line is then interpreted in the standard way: the preprocessor recognizes the EQU directive here and applies it to define the A constant.
This also means that a line generated by a macro can contain an invocation of another macro. Note, however, that a macro cannot generate an invocation of itself. While the lines it generated are being processed, that macro is disabled and its previous meaning is applied. This is similar to using the PURGE directive, except that the macro is enabled again when processing of its invocation is finished. This makes it harder to create recursive macros (though it is still possible, as we will discuss later). The benefit of such behavior, however, is that you can stack macro definitions, as shown in the manual.
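The disabling rule can be sketched with a toy expander in Python, as an illustration of the described behavior rather than fasm's code. Each name maps to a stack of recipe functions, oldest first, and a recipe turns the argument text into new lines. While a macro's output is being processed, its definition is hidden, so the same name falls back to the previous definition, or to no macro at all, in which case the line is left for the assembler. When the invocation finishes, the definition is visible again. The names expand, macros and visible are hypothetical.

```python
def expand(lines, macros, visible=None):
    """Toy macro expander.

    macros:  name -> list of recipes (oldest first); a recipe maps the
             argument text to a list of new lines.
    visible: name -> how many of that name's definitions may be used here."""
    visible = visible or {}
    out = []
    for line in lines:
        name, _, args = line.partition(" ")
        definitions = macros.get(name, [])
        usable = visible.get(name, len(definitions))
        if usable == 0:
            out.append(line)  # no macro meaning left: passed to the assembler
            continue
        recipe = definitions[usable - 1]
        # While the generated lines are processed, this definition is disabled,
        # so the same name falls back to the previous definition (if any).
        inner = dict(visible)
        inner[name] = usable - 1
        out.extend(expand(recipe(args), macros, inner))
    return out


stacked = {"m": [lambda a: ["inner " + a], lambda a: ["outer " + a, "m " + a]]}
print(expand(["m x", "m y"], stacked))
# ['outer x', 'inner x', 'outer y', 'inner y']
print(expand(["r 1"], {"r": [lambda a: ["nop", "r " + a]]}))
# ['nop', 'r 1']  (no recursion: the inner "r 1" is left for the assembler)
```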
There are some special operators and directives that can be used inside macro definitions; they govern the way in which the preprocessor applies the recipe to generate new lines. Since those operations are performed in order to generate the new lines, they are obviously done before the preprocessing that happens on those lines once they are complete. For instance:
macro Inc f
{
include `f#'.inc'
}
Inc win32a
Here the recipe tells the preprocessor to convert the first symbol of the f parameter into a quoted string and then attach the '.inc' string to the result. If the parameter consists of exactly one symbol, the result of those operations will be a single quoted string, and the invocation of the Inc macro in the above sample should generate this line:
include 'win32a.inc'
This line is then preprocessed as usual, so it includes the win32a.inc file.
Backslash escaping is useful when you need to put into the lines generated by a macro some symbols that would otherwise be interpreted as macro recipe operations. For example:
macro Defx x
{
\x db x
}
Here x is the parameter of the macro, so when applying the recipe, the preprocessor puts the value of that parameter in its place everywhere. However, this macro is intended to define a byte variable labeled x, with the value given by the macro's parameter. The name conflict could be resolved by giving the parameter some distinct name, but here you can see how escaping can also be used to get the desired result. When generating a line from a macro, the preprocessor removes the first backslash from any escaped symbol, and that is all it does with them.
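The rule just stated can be expressed as a tiny Python function working on already tokenized recipe lines. It is a toy sketch, and the function name apply_recipe is hypothetical. An unescaped parameter name is replaced by its value. A token starting with a backslash loses exactly one backslash and is otherwise left alone. Because only one backslash is removed per unrolling, a token escaped twice survives as a singly escaped token, which is what nested macro definitions rely on.

```python
def apply_recipe(recipe_tokens, params):
    """Toy model of unrolling one macro recipe line.

    Unescaped parameter names are replaced by their values; a token that
    starts with a backslash loses exactly one backslash and nothing else."""
    out = []
    for tok in recipe_tokens:
        if tok.startswith("\\"):
            out.append(tok[1:])
        else:
            out.append(params.get(tok, tok))
    return out


# Defx 5 with the recipe "\x db x"
print(apply_recipe(["\\x", "db", "x"], {"x": "5"}))  # ['x', 'db', '5']
```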
The most frequent use of escaping, however, is related to defining macros with macros. Since the lines generated by a macro are preprocessed just like any others, they may themselves contain definitions of macros. But you need to escape any operators of such a child macro to prevent the parent macro from interpreting them while it is being unrolled. Let's go for a somewhat more complicated example this time:
macro Parent [name]
{
common
macro Child [\name]
\{
\common
forward
name dd ?
common
\forward
\name dd ?
\}
}
and see what happens when we invoke such a macro this way:
Parent x,y
The macro directives and parameter names that are not escaped are interpreted by the Parent macro, while the escaped ones will go into the definition of Child. The result is:
macro Child [name]
{
common
x dd ?
y dd ?
forward
name dd ?
}
It is worth pausing on this example until you understand exactly what is happening here.
It is clear why we have to distinguish the various levels with escaping, as in the above sample. A question that is sometimes asked is why we need to escape the enclosing braces, too. Such escaping obviously helps keep track of which escaping should be used where, especially when there may be many levels of macro definitions: the symbols that we want to be recognized by a given macro have to be escaped with as many backslashes as its braces. The manual explains that escaping the closing brace is needed because the first closing brace the preprocessor meets is always interpreted as the end of the macro definition, so if we did not backslash-escape the closing brace of the Child macro, it would be interpreted as the end of the recipe for the Parent macro. But why can't the preprocessor just count all the braces to determine which one closes which block? The answer is already hidden in what was said here. A macro may begin the definition of another macro without ending it, as in the manual's example of creating an alternative syntax for macro definitions. After such a macro is processed, the preprocessor holds the started definition of another macro. It then gathers the following lines into this definition until it finds the closing brace. Note that this means no preprocessing is done on the lines following that macro, so the only way to close such a definition is to provide the closing brace directly. The FIX directive may also provide such a closing brace by producing it from another symbol, as shown in the manual.
There might also be other out-of-order effects on the blocks generated by a macro, for example opening braces that do not open any block at all, or braces affected by repeating blocks. For all those reasons combined, the preprocessor always treats the first closing brace that it meets as the end of the macro definition, and thus you have to backslash-escape all the braces that you want to be generated by the macro. You can omit backslash-escaping the opening braces, but it is recommended to do it anyway, to help keep track of things.
Instantaneous macroinstructions
There are some directives that define macros that are not given any names, but instead are invoked right where they are defined. Those directives are REPT, IRP, IRPS and also MATCH (which deserves a separate section), and we may call them instantaneous macroinstructions to emphasize the fact that they are applied just once, immediately after being defined.
They may also differ from standard macros in the way their parameters are provided, but their definition blocks are the same kind of recipes and generate new lines in the same way as regular macros. This also means that when you put an instantaneous macro inside some other macro, you must do the appropriate backslash-escaping for it to work properly.
To demonstrate how the instantaneous macros are related to regular macros, here are the four equivalent constructions:
rept 4 i
{
; ...
}
irp i, 1,2,3,4
{
; ...
}
irps i, 1 2 3 4
{
; ...
}
macro Inst [i]
{
; ...
}
Inst 1,2,3,4
If you put the identical recipes into each of those four blocks, each of them will do exactly the same. In particular, the FORWARD, REVERSE and COMMON directives will have indistinguishable effects in all the cases.
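What makes the four blocks equivalent is that each one supplies the same list of values for i. A toy Python sketch of how each directive obtains those values is shown below. It follows the descriptions in this section, with REPT counters starting from 1 unless a base is given. It deliberately leaves out the < > grouping and the * marker of IRP, and all function names are hypothetical.

```python
import re

# Symbol characters become single tokens; other runs of characters are names.
TOKEN = re.compile(r"[+\-/*=<>()\[\]{}:,|&~#`]|[^\s+\-/*=<>()\[\]{}:,|&~#`]+")


def rept_values(count, base=1):
    """rept count i:base -> counter values base, base+1, ..."""
    return [str(base + k) for k in range(count)]


def irp_values(text):
    """irp i, v1,v2,... -> comma-separated values (no <...> grouping here)."""
    return [v.strip() for v in text.split(",")]


def irps_values(text):
    """irps i, tokens -> every token of the text is a separate value."""
    return TOKEN.findall(text)


def macro_group_values(text):
    """Inst 1,2,3,4 for a macro with a [i] group parameter: split like IRP."""
    return irp_values(text)


print(rept_values(4), irp_values("1,2,3,4"),
      irps_values("1 2 3 4"), macro_group_values("1,2,3,4"))
```

The difference shows up only with other inputs. For example, irps_values("a+b") yields three values, while irp_values("a+b") yields a single one.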
Of the three types of instantaneous macroinstructions demonstrated above, IRP is the only one where the values of parameters are given in basically the same way as for named macroinstructions. They are separated with commas, and thus may happen to be empty, unless you tell the preprocessor that the value of a parameter cannot be empty by putting the * character after the parameter name. You can also enclose the value of a parameter in < and > characters if you need to provide a value that contains a comma itself.
The REPT directive generates all the values of its counter parameters on its own; you can only adjust the base value of each counter, or not use counters at all. Still, those counters behave in the same way as macro parameters given lists of possible values.
As for the exact explanation of the IRPS directive, we first need to know a few more details about how the preprocessor perceives the source text.
Tokenization of source and line makers
The preprocessor does not in fact work on the source text in the exact form in which it is stored in the file. It extracts the actual contents of each line, ignoring redundant whitespace and comments. What it actually does is split each line of the source into a sequence of simple tokens, and from then on all the processing is performed on those tokens (in the manual they are called symbols, for historical reasons; here we will use both terms interchangeably).
The first class of tokens are the symbol characters. The manual states that all of them are:
+-/*=<>()[]{}:,|&~#`
Wherever one of these characters occurs in the source text, it is an independent entity and becomes a separate token. The other special characters are: the whitespace characters, space and tab, which do not form tokens themselves (though they may separate entities that become individual tokens); the line break characters, whose role is obvious; the semicolon, which marks the beginning of a comment, so that the rest of the line is ignored entirely; and the quotes and the backslash, which are described later.
Any sequence of non-special characters, such as a continuous run of letters and digits, becomes a name token. Such a sequence is split into separate tokens by either whitespace or a symbol character. For example, this line:
mov ax,2+1
contains six tokens: the MOV name token, the AX name token (separated from MOV by whitespace), the comma symbol character, the name token 2, the plus symbol character and the name token 1. Adding more whitespace to this line would not change in any way how the preprocessor sees it. However, removing the whitespace between MOV and AX would merge them into a single name token.
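The splitting rules for symbol characters and name tokens can be sketched as a small tokenizer. This is a simplified Python model, not fasm's code: `tokenize` is a hypothetical name, and quoted strings and the backslash are deliberately left out, so here they are treated as ordinary name characters.

```python
# Symbol characters listed in the manual; each one is always a token of its own.
SYMBOL_CHARS = set("+-/*=<>()[]{}:,|&~#`")


def tokenize(line):
    """Split one source line into symbol-character and name tokens.

    Simplified model: quoted strings and the backslash are not handled.
    """
    tokens = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch in " \t":
            i += 1                  # whitespace only separates tokens
        elif ch == ";":
            break                   # a comment: the rest of the line is ignored
        elif ch in SYMBOL_CHARS:
            tokens.append(ch)       # a one-character token
            i += 1
        else:
            start = i               # a name token: run of non-special characters
            while i < n and line[i] not in SYMBOL_CHARS and line[i] not in " \t;":
                i += 1
            tokens.append(line[start:i])
    return tokens


if __name__ == "__main__":
    print(tokenize("mov ax,2+1"))
    print(tokenize("mov   ax , 2 + 1   ; extra whitespace and a comment"))
    print(tokenize("movax,2+1"))
```

The first two calls produce the same six tokens, while the third produces only five, because MOV and AX have merged into one name token.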
There is one more type of token: the quoted string. When the first character of a token is either a single or double quote, the token is interpreted as a quoted string, and all following characters other than a line break are fed into this single token until the closing quote is met (but, following the convention of many assemblers, two quotes in a row do not close the string and are included in it as a single character). So this line:
db '2+2'
contains two tokens: a name token followed by a quoted string. However, if a quote occurs not as the first character of a token but somewhere inside it, it has no special meaning (quotes are not special characters by themselves). Thus:
Jule's:
is just a regular name token followed by the colon symbol character.
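The quote rules can be added to the same kind of model. In the Python sketch below, which is an illustration rather than fasm's implementation, `read_quoted` and `tokenize` are hypothetical names, and quoted strings are returned as `("str", contents)` tuples only so that they can be told apart from name tokens. A quote opens a string only at the start of a token; inside a name it is an ordinary character.

```python
SYMBOL_CHARS = set("+-/*=<>()[]{}:,|&~#`")
QUOTES = "'\""
STOP = SYMBOL_CHARS | set(" \t;")   # characters that end a name token


def read_quoted(line, pos):
    """Read the quoted string that starts at line[pos]; return (contents, next_pos).

    Two quotes in a row stand for one quote character and do not close the
    string. Reaching the end of the line means the closing quote is missing.
    """
    quote = line[pos]
    chars = []
    i = pos + 1
    while True:
        if i >= len(line):
            raise ValueError("missing end quote")
        ch = line[i]
        if ch == quote:
            if i + 1 < len(line) and line[i + 1] == quote:
                chars.append(quote)     # doubled quote: one literal quote
                i += 2
                continue
            return "".join(chars), i + 1
        chars.append(ch)
        i += 1


def tokenize(line):
    """Tokenize a line; quoted strings become ("str", contents) tuples."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            i += 1
        elif ch == ";":
            break
        elif ch in QUOTES:              # a quote starting a token opens a string
            text, i = read_quoted(line, i)
            tokens.append(("str", text))
        elif ch in SYMBOL_CHARS:
            tokens.append(ch)
            i += 1
        else:                           # a name token; quotes inside are ordinary
            start = i
            while i < len(line) and line[i] not in STOP:
                i += 1
            tokens.append(line[start:i])
    return tokens


if __name__ == "__main__":
    print(tokenize("db '2+2'"))
    print(tokenize("Jule's:"))
    print(tokenize("db 'it''s; not a comment'"))
```

Note that the semicolon inside the last string does not start a comment, because it is consumed as part of the quoted string token.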
The backslash is a special character with two different meanings depending on its position. If a backslash is followed by some token, it is merged with that token into a single one. This may happen recursively: if a backslash is followed by another backslash followed by some other token, they all end up as one token. This feature exists solely for escaping symbols in macroinstructions.
If a backslash is not followed by any token, it is interpreted as a line continuation character, and the tokens from the next line of the source are attached to the tokens of the current line. This way, many lines of source text may form a single tokenized line in the preprocessor's sense.
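Both meanings of the backslash can be sketched in the same style. This Python model is only an illustration under stated assumptions: `read_backslash`, `tokenize` and `logical_lines` are hypothetical names, the lines are assumed to contain no quoted strings, and a backslash followed by whitespace and then more text is simply rejected here, since the text above does not cover that case.

```python
SYMBOL_CHARS = set("+-/*=<>()[]{}:,|&~#`")
STOP = SYMBOL_CHARS | set(" \t;\\")   # characters that end a name token


def read_backslash(line, pos):
    """Handle a backslash at line[pos].

    Returns (token, next_pos) when the backslashes are followed by a token,
    or (None, len(line)) when only whitespace or a comment follows, which
    makes the backslash a line continuation.
    """
    i = pos
    while i < len(line) and line[i] == "\\":
        i += 1                          # a run of backslashes merges recursively
    if i < len(line) and line[i] in SYMBOL_CHARS:
        return line[pos:i + 1], i + 1   # e.g. \} becomes one token
    if i < len(line) and line[i] not in STOP:
        j = i
        while j < len(line) and line[j] not in STOP:
            j += 1
        return line[pos:j], j           # e.g. \local becomes one token
    if line[i:].split(";", 1)[0].strip():
        raise ValueError("case not covered by this model")
    return None, len(line)


def tokenize(line):
    """Return (tokens, continued) for one physical line (no quoted strings)."""
    tokens, i = [], 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            i += 1
        elif ch == ";":
            break
        elif ch == "\\":
            tok, i = read_backslash(line, i)
            if tok is None:
                return tokens, True     # the next line will be attached
            tokens.append(tok)
        elif ch in SYMBOL_CHARS:
            tokens.append(ch)
            i += 1
        else:
            start = i
            while i < len(line) and line[i] not in STOP:
                i += 1
            tokens.append(line[start:i])
    return tokens, False


def logical_lines(lines):
    """Group physical lines into the tokenized lines the preprocessor sees."""
    result, current = [], []
    for line in lines:
        tokens, continued = tokenize(line)
        current.extend(tokens)
        if not continued:
            result.append(current)
            current = []
    if current:
        result.append(current)
    return result


if __name__ == "__main__":
    print(tokenize("\\local x"))
    print(logical_lines(["db 1,\\  ; continued", "   2, 3", "nop"]))
```

The continued source lines yield a single tokenized line, while `nop` stays a line of its own.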
So now we understand that the lines being preprocessed are actually sequences of tokens, not the raw text of the source. Such sequences can arise in two different ways: either created directly from the source text, or generated by a macroinstruction. Thus we have two different "line makers": the "source reader" and the "macroinstruction processor".
As we have seen, there are a number of special commands, like the concatenation operator or the LOCAL directive, that may be issued while a macroinstruction generates new lines. Similarly, there are special commands that are understood and obeyed only by the source reader. The backslash character described above is one example, and the FIX directive is another.
The FIX directive provides a kind of textual replacement (actually a token replacement, as it defines the replacement of a single name token with a specified sequence of tokens), like EQU or DEFINE; however, these definitions and replacements are done by the source reader, while it prepares the lines that are then preprocessed. This way you can have some replacements done before any other preprocessing happens.
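The role of FIX in the source reader can be pictured with a toy model. In this Python sketch, `source_reader` is a hypothetical name; for brevity tokens are just whitespace-separated words, replacements are applied in a single pass without re-scanning, FIX lines themselves are not subject to replacement, and a definition is assumed to affect only the lines read after it. The point it illustrates is that the preprocessor only ever receives lines in which the replacement has already happened.

```python
def source_reader(lines):
    """Yield tokenized lines with FIX replacements already applied.

    Toy model: tokens are whitespace-separated words, and each name token
    defined with FIX is replaced once by its token sequence.
    """
    fixes = {}
    for line in lines:
        tokens = line.split()
        # Directive keywords are case-insensitive, so compare in lowercase.
        if len(tokens) >= 2 and tokens[1].lower() == "fix":
            fixes[tokens[0]] = tokens[2:]   # record the definition; emit no line
            continue
        out = []
        for tok in tokens:
            out.extend(fixes.get(tok, [tok]))
        yield out


if __name__ == "__main__":
    source = ["endm fix }", "macro foo", "{", "nop", "endm"]
    for tokens in source_reader(source):
        print(tokens)
```

Here the hypothetical `endm` name reaches the preprocessor already turned into a closing brace, so the preprocessor sees an ordinary macro definition block.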
This text is still being worked on; you may find more information here in the future, as new sections are added.
Copyright © 2004-2010, Tomasz Grysztar.