Macros for parsing and data retrieval by Massimiliano Tomassoli 2009 (v. 1.0) ------------------- Regular Expressions ------------------- A language is just a set of sentences. This set can be defined in many ways. The obvious way is to list all its elements. Another way to use a regular expression. Note that regular expressions can represent only a limited family of languages, but we will soon get rid of that limitation. A sentence of a language is a sequence of symbols chosen from a given alphabet. Due to Fasm's fixed tokenization, our alphabet will be composed of tokens. But for now let's stick to the usual theory. Let's see a few regular languages over the alphabet {a,b,c,...,z} and their respective regular expressions. 1) L1 = {a, aa, ab} re1 = a|a~a|a~b | is the "or" operator and ~ the "and" operator. Since ~ has the highest precedence, no brackets are needed and we may also write re1 = a|aa|ab What it means should be obvious: re1 accepts sentences that a) are equal to "a" or b) are equal to "aa" or c) are equal to "ab" Regular expressions are not unique. Here's another one equivalent to the first: re1b = a(blank|a|b) Let's define "blank" as the neutral element of the operator ~, i.e. ~blank = and blank~ = (as in 5*1 = 1*5 = 5). re1b accepts sentences that starts with "a" followed (only) by a) nothing or b) "a" or c) "b". Of course, the brackets are special characters which don't belong to our alphabet so we can use them safely. 2) L2 = {a, aa, aaa, aaaa, aaaaa, ...} Yes: L2 is an infinite language. We cannot describe it just with | and ~. We need the operator * defined this way: _{* = blank|_{|_{_{|_{_{_{|...
It might seem like a joke, but when sub is a complex subexpression things get interesting.
re2 = a~(a*) = aa*
Another option is
re2b = a+
where + is defined as follows:
_{+ = _{_{*
The operators seen so far listed in increasing order of precedence are:
|
~
*,+ (same precedence)

3) Let's add " " (space) to our alphabet.
L3 = {push eax, push ebx, push ecx, push edx,
push eax eax, push eax ebx, ... push edx edx,
push eax eax eax, push eax eax ebx, ... push edx edx edx}
re3 = push( e(a|b|c|d|)x)+
Note that "push (" and "push(" are two different things: spaces are honored in our alphabet. I admit that the definition of L3 is a little bit ambiguous for what concerns spaces, but these are just minor details. Also note that re3 is a formal definition of (my idea of) L3 while the first one is not because "..." is ambiguous.

---------------
Parsing in Fasm
---------------

There are three important "variables":
curExpr it's equal to the expression to be parsed;
readSuccess it's 1 if the last parsing was successful, 0 otherwise;
outData it's equal to the output of the parsing.
Parsing commands "consume" curExpr and write their output to outData. I will also say that a command "returns" something.
If readSuccess is 1 when a parsing command CMD is invoked, CMD starts to parse curExpr.
If CMD fails, it sets readSuccess to 0 and outData to blank (with "outData equ"), otherwise it consumes curExpr and update outData.
If readSuccess is 0 when CMD is invoked, CMD modifies NEITHER readSuccess NOR curExpr. outData might or might not be modified, therefore do NOT make particular assumptions about it.

No parsing command will ever set readSuccess to 1 if it was 0 when the command was issued.

There are many commands (they're macros):
readCCurly
readOpenAngle
readClosedAngle
readAtom name
readEmpty
readEnd
readSymbol
readQuoted
readVar varName
assertEqual command, [names]
assertDiff command, [names]
readUntilPossible command
readOneThenUntilPossible command
andRead [commands]
orRead [commands]
alsoRead [commands]
thruRead [commands]
optRead command
saveOutData dest, command
appendOutData dest, command
expandDef defName

Please note that any kind of parsing carried out by these macros cannot override in any way Fasm behaviour.
For instance, "ebx" will be parsed as a single symbol, no matter what you do. On the other hand, "eax,3" will always be parsed as three symbols.

Let's start with something easy.

readSuccess equ 1
curExpr equ push eax
readSymbol

As I told before, if readSuccess is 0, readAtom do nothing, so the first line is needed.
The expression to be parsed is "push eax". The instruction readAtom reads a symbol, extracts it from curExpr and returns it in outData.
In the example above, after readSymbol,
curExpr = eax
outData = push
readSuccess = 1
readSymbol fails only if called when curExpr is blank.

The piece of code

readSuccess equ 1
curExpr equ push eax
readSymbol
readSymbol

does exactly what you think: it reads two symbol, extract them from curExpr and put the last one (the first is overwritten) in outData. readSuccess won't be modified by readSymbol.
In other words, after that piece of code,
curExpr = ; blank
outData = eax
readSuccess = 1

Now consider a similar piece of code:

readSuccess equ 1
curExpr equ push eax
andRead readSymbol, readSymbol

Macro andRead invokes the commands passed as arguments one by one but only while readSuccess is 1.
For instance, if the first ReadSymbol fails, the second one is not invoked.

To apreciate the difference, let's consider the following piece of code:

readSuccess equ 1
curExpr equ eax ebx
readAtom eax
readAtom ecx

From now on I'll say that a command "accepts" an expression if it successfully parses it.
readAtom only accepts the expression . In particular, "readAtom eax" only accepts "eax".

This is what happens:

readSuccess equ 1
curExpr equ eax ebx

readAtom eax
{
curExpr equ ebx
outData equ eax
}
readAtom ecx
{
outData equ
readSuccess equ 0
}

Now consider this different piece of code:

readSuccess equ 1
curExpr equ eax ebx
readAtom eax
readAtom ebx

What happens is this:

readSuccess equ 1
curExpr equ eax ebx

readAtom eax
{
curExpr equ ebx
outData equ eax
}
readAtom ebx
{
curExpr equ
outData equ ebx
}

The main problem is that now outData is equal to "ebx" and not "eax ebx" as one would expect. The right solution is to use andRead:

readSuccess equ 1
curExpr equ eax ebx
andRead readAtom eax, readAtom ebx

This gives us the expected result:
curExpr equ
outData equ eax ebx

andRead does another very important thing: if it fails it doesn't modify curExpr at all. That is, andRead becomes an atomic command and, as such, complies with the simple rules of parsing commands.
Let's be pedantic:
"readAtom eax" is a command,
"readAtom eax
readAtom ebx" are two commands,
"andRead readAtom eax, readAtom ebx" is a SINGLE (compound) command.

-------------------------------
Regular Expression with symbols
-------------------------------

Since FASM does tokenization for us, we have to use an infinite alphabet made of symbols! But there are things we can't do. For instance, we can't accepts all and only the languages made of words which starts with the letter "a".
The word "abacus" is read as a single symbol so there is no way we can recognize it as a word that starts with the letter "a". All we know is that "abacus" is different from "chopper" and equal to "abacus" (obviously).
But we have no problems with special symbols like ( ) [ ] { } < > , + * ~, etc... To tell you the truth I did have some problem with } > and commas...

-----------------------------------
Commands with a (brief) description
-----------------------------------

All commands, if successful, remove the subexpression accepted from curExpr and returns some output in outData (they usually returns what they extracted from curExpr).
If they fail, they set outData to the empty string (with "outData equ").

readCCurly
Accepts and returns the closed curly brace '}'.

readOpenAngle
Accepts and returns the open angle bracket '<'.

readClosedAngle
Accepts and returns the closed angle bracket '>'.

readAtom name
Accepts and returns the symbol name.
For instance
readAtom apple
accepts and returns "apple"

readEmpty
Does nothing (accepts the empty string).

readEnd
Like the other commands, if readSuccess is 0 it does nothing. Otherwise, it sets outData to the empty string and, if curExpr is NOT blank, also sets readSuccess to 0.

readSymbol
Accepts and returns any (non empty) single symbol.

readQuoted
Accepts and returns quoted strings (e.g. "this is a quoted string").

readVar varName
This is an exception.
It doesn't modify curExpr. If the variable var@varName is undefined, it does nothing; otherwise, it set outData to the value of var@varName.

assertEqual command, [names]
Invokes command and, if the value of outData is NOT in (the list of) names, restores curExpr, blanks outData and sets readSuccess to 0.

assertDiff command, [names]
Invokes command and, if the value of outData is in (the list of) names, restores curExpr, blanks outData and sets readSuccess to 0.

readUntilPossible command
Invoke command until it fails, then set readSuccess to 1 and returns the concatenation of the strings returned by all the successful invocation of command.

readOneThenUntilPossible command
It's equivalent to
andRead command, readUntilPossible command

andRead [commands]
Invokes the commands one by one and succeeds only if every single command succeeds. When a command fails, subsequent commands are NOT invoked (short-circuit evaluation).
If it succeeds it returns the concatenation of the strings returned by single commands. If it fails, it restore curExpr, blanks outData and sets readSuccess to 0.

orRead [commands]
Invokes the commands one by one until there are no more commands or one command succeeds. orRead succeeds only if one command succeeds (subsequent commands are ignored).
If it succeeds it returns the string returned by the successful command. If it fails, it restore curExpr, blanks outData and sets readSuccess to 0.

alsoRead [commands]
Invokes all the commands whether readSuccess is 1 or not. It does the same even if readSuccess is initially 0.

thruRead [commands]
It's like andRead but, if successful, returns the string returned by the last command.

optRead command
If readSuccess is 1, invokes command and set readSuccess to 1.

saveOutData dest, command
Invokes command and, if successful, saves the string returned by it in the variable var@dest.

appendOutData dest, command
It's like saveOutData but the string, enclosed within < >, is appended to the content of var@dest.

expandDef defName
This command expands a rule so that recursion is possible during parsing. I'll talk about this later.

-------------
Some examples
-------------

Let's consider the languages L1, L2 and L3.
1) L1 = {a, aa, ab}
re1 = a|a~a|a~b
cmd1 = orRead readAtom a, readAtom aa, readAtom ab
Note that cmd1 doesn't fail if curExpr isn't a word in L1. It only fails if curExpr doesn't start with a word in L1. We should also check that curExpr is been totally consumed:
cmd1 = andRead , readEnd

As I've already said, Fasm parses "aa" as a single symbol, then there is no explicit AND between a and a: "aa" is just an atom.

But if we wanted to emulate the classic behaviour, we could separate alphabet letters with spaces. For instance,
curExpr equ a b a c d e

2) L2 = {a, aa, aaa, aaaa, aaaaa, ...}
re2 = a~(a*) = aa*
re2b = a+

We can't do this with Fasm, so let's consider a variation of L2:
L2b = {a, a a, a a a, a a a a, a a a a a, ...}
cmd2 = andRead readAtom a, readUntilPossible readAtom a
cmd2b = readOneThenUntilPossible readAtom a
Note that in cmd2 andRead takes two arguments.

3) Let's add " " (space) to our alphabet.
L3 = {push eax, push ebx, push ecx, push edx,
push eax eax, push eax ebx, ... push edx edx,
push eax eax eax, push eax eax ebx, ... push edx edx edx}
re3 = push( e(a|b|c|d|)x)+
Here we can't parse registers character-wise but that's not much of a problem:
cmd3 = andRead \
readAtom push, \
>

Here's a technical note for you.
The commands andRead and orRead accept a list of comma-separated commands. When a command has an argument that contains some commas, we have to enclose that argument within angle brackets (otherwise it will be interpreted as a list of arguments). If that command -- let's call it CMD -- is an argument to another command we have to enclose CMD within angle brackets as well.
For instance,

andRead readAtom eax, readAtom ebx

is OK, but

readUntilPossible andRead readAtom eax, readAtom ebx

is not. Even if readUntilPossible took a list of arguments, it would be incorrect anyway, because "andRead readAtom eax, readAtom ebx" is a SINGLE command.
Therefore we must write

readUntilPossible

If this last expression is an argument to, say, orRead we need another pair of angle brackets:

andRead readAtom eax, \
>

because commands which takes a list of arguments doesn't like angular brackets in the middle of single arguments either.

------------------
Algebraic commands
------------------

It's clear that things get awkward very quickly as expressions become longer and longer. The solution is to use algebraic commands.
Compare:

1) L1 = {a, aa, ab}
re1 = a|a~a|a~b
cmd1 = orRead readAtom a, readAtom aa, readAtom ab

alg1 = a|aa|ab

2) L2b = {a, a a, a a a, a a a a, a a a a a, ...}
cmd2 = andRead readAtom a, readUntilPossible readAtom a

alg2 = a a*

cmd2b = readOneThenUntilPossible readAtom a

alg2b = a+

3) Let's add " " (space) to our alphabet.
L3 = {push eax, push ebx, push ecx, push edx,
push eax eax, push eax ebx, ... push edx edx,
push eax eax eax, push eax eax ebx, ... push edx edx edx}
re3 = push( e(a|b|c|d|)x)+
cmd3 = andRead \
readAtom push, \
>

alg3 = push(eax|ebx|ecx|edx)+

Here are all the binary operators listed in increasing order of precedence together with their definition:

operator =
expr1 = expr2 = ... = exprn
means
thruRead , , ...,

operator ,
expr1, expr2, ..., exprn
means
alsoRead , , ...,

operator |
expr1 | expr2 | ... | exprn
means
orRead , , ...,

operator ~
expr1 ~ expr2 ~ ... ~ exprn
or
expr1 expr2 ... exprn
mean
andRead , , ...,

Unary operators are applied starting from the left.
For instance,
a op1 op2 op3 op4
corresponds to
op4(op3(op2(op1(a))))

Unary operators always take precedence over binary operators.

----------
Important!
----------

Please note that some unary operators are quite special, in fact they take two operands! I call binary only those operators which can take two arbitrary sub-expressions. Special unary operators take only one subexpression and an auxiliary parameter that cannot be a sub-expression.

Here are the unary operators and their definition:

operator :
expr:varName
means
saveOutData varName,

operator >
expr:varName
means
appendOutData varName,

operator _in_
expr _in_ {name1, name2, ..., namen}
means
assertEqual , , , ...,

operator _notin_
expr _notin_ {name1, name2, ..., namen}
means
assertDiff , , , ...,

operator *
expr*
means
readUntilPossible

operator +
expr+
means
readOneThenUntilPossible

operator &
&varName
means
readVar varName

There are also some special unary or ternary operators (it depends on your point of view):

operator ( )
(expr)
means

operator [ ]
[expr]
means
optRead

operator { }
{nameDef}
means
expandDef nameDef

operator < >

means
cmd

There are some details I omitted to simplify the discussion but it's time to talk about them.

----------------------------
Special symbols and escaping
----------------------------

What happens if we want to parse the string
mov [eax]
?
We can't just write the algebraic parsing expression
mov [eax]
because [ ] is a special algebraic operator. The correct expression is
mov /[eax/]
where / is the escape character.
To parse the symbol
/
itself, use
//

What about
list {eax, ebx}
?
You can't write
list /{eax/,ebx/}
because that closed curly brace interferes with my macros. The problem is not in the algebraic expression but in the commands which are generated: they won't work. For instance,
readAtom }
will never work.
Do you remember the command readCCurly?
You'll need to use the algebraic expression
list /{eax/,ebx _CC_
where _CC_ is a special symbol for readCCurly (CC means Closed Curly).
Of course, you can escape
_CC_
by writing
/_CC_

Here are all the available special symbols:

_E_ becomes readEmpty
_END_ becomes readEnd
_S_ becomes readSymbol
_Q_ becomes readQuoted
_CC_ becomes readCCurly
/< becomes readOpenAngle
/> becomes readClosedAngle

The last two are just escaped symbols, but they corresponds to special commands (readAtom won't work for them).

Please note that all this does NOT apply to non-expression arguments. The commands assertEqual and assertDiff expect a list of names as second argument. This list cannot contain the three symbols
}
<
>
You must use, respectively,
_CC_
_OA_
_CA_
Also note that you can't use complex names such as
_CC_ something else
Those special symbols are only recognized when used alone.
In the future I might eliminate those restrictions. For now, I don't think they are much of a problem.

------------------------------------
Translation of algebraic expressions
------------------------------------

Algebraic expressions need to be translated into plain commands like the one we used before. This is done by the macro translateRules.
Here is the format:
translateRules ruleName1, ruleName2, ..., ruleNamen
translateRules will create n variables containing the translations.
Instead of giving you more details, we'll look at an example and I'll discuss it.

@rule1 equ push~eax
@rule2 equ push~ebx
translateRules rule1, rule2

In this case translateRules does the following:
cmd@rule1 equ andRead readAtom push, readAtom eax
cmd@rule2 equ andRead readAtom push, readAtom ebx

Now you can use the two commands as follows:
readSuccess equ 1
curExpr equ push eax
cmd@rule1

Note that if a rule is called , we must put its algebraic definition into the symbol
@
and its translation will be put into the symbol
cmd@

---------
Variables
---------

The commands saveOutData and appendOutData can be used to save return values to variables.
Real variable names always begin with the prefix "var@", but are referred to without it. For instance, if you call a variable "addr", its real name will be "var@addr".

This is what we said before:

saveOutData dest, command
Invokes command and, if successful, saves the string returned by it in the variable dest.

appendOutData dest, command
It's like saveOutData but the string is not simply saved but appended to the content of dest.

Now I'll give you the missing details.

If you use the two commands above (or the two corresponding : and > algebraic operators), you need to set
toSaveVarNames
to the list of variables you want to use.

Why is all this needed?

Let's assume we want to parse a complex expression like
mov eax, [eax+ebx*4 + 65h]
As the expression is being parsed, many returned values are saved to our variables. What happens if the parsing fails?
We have to restore the variables to their previous value!
Here's an easy example:

var@reg =
curExpr equ mov eax, eddx
@rule1 equ mov (eax|edx)>reg /, (eax|edx)>reg

This is the first time we've seen > (i.e. appendOutData) in a real situation.
Here's cmd@rule1 after the translation:

andRead readAtom mov,
>,
>,
>

This is what happens if we invoke cmd@rule1:

; Note that var@reg =
andRead
{
globalOutData =
readAtom mov --> OK
curExpr = eax, eddx
outData = mov
globalOutData = mov

saveOutData reg,
{
orRead
{
readAtom eax --> OK
curExpr = , eddx
outData = eax
readAtom edx --> SKIPPED
}

var@reg = ,
}
globalOutData = mov eax

readAtom <,> --> OK
curExpr = eddx
outData = ,
globalOutData = mov eax,

saveOutData reg,
{
orRead
{
readAtom eax --> failed
outData =
readAtom edx --> failed
outData =
}
readSuccess = 0
}

globalOutData =
curExpr = mov eax, eddx
var@reg =
}

If andRead didn't restore var@reg to its previous value, it would now be equal to
,
and that's not what we want.
For instance consider the following pseudo-code:

orRead ,
,

Let's suppose that:
1) andRead1 modifies var@reg and then fails.
2) andRead2 modifies var@reg and then fails.
3) andRead3 modifies var@reg and succeeds.
Without proper saving and restoration, the value in var@reg would be WRONG, because it would also contain partial results written by the first two andRead.

In a nutshell, you must put the name of the variables you're going to use in
toSaveVarNames
otherwise your variables won't be restored when needed.

Luckily, matchExpr already does this for us.
Its prototype is
macro matchExpr mainRuleName, outVarNames
where mainRuleName is the startingRule (more about this later) and outVarNames is the list of the names of the variables you want to use.
Here are two examples:

1)
@rule1 equ push~eax
translateRules rule1

curExpr equ push eax

matchExpr rule1,
match =1 , readSuccess
{
display "match1!",13,10
}

2)
@rule1 equ (add|sub):instr _S_:destOp /, _S_:srcOp
translateRules rule1

curExpr equ add eax, 654h

matchExpr rule1,

If you want, you may call clearVars to "undefine" variables.
For instance,

clearVars var1, var2, var3

will undefine the variables

var@var1
var@var2
var@var3

i.e. it will use "restore" until these three last names become undefined.

-------------------
Rules and recursion
-------------------

As we have already said, if a rule is called, say, MyRule it must be defined as follows:

@MyRule equ

where is an arbitrary algebraic expression.
DO NOT FORGET THE CHARACTER @! IT'S REALLY IMPORTANT!
You refer to that rule by its name
MyRule
but the algebraic expression is
@MyRule
and the command expression generated by translateRules is
cmd@MyRule
Even when you don't use translateRules or matchExpr you should stick to those directions otherwise your code might not work, especially if you use recursion.

The command expandDef has the following prototype:

expandDef defName

where defName is the name of a definition, i.e. a rule (I should have changed some names for consistency, but I got used to them).
What expandDef does is quite simple: it executes the command defName refers to.
For instance,
expandDef MyRule
will executes the command
cmd@MyRule

In algebraic expressions you may use
{MyRule}
This is translated into
expandDef MyRule

expandDef is quite simple but it makes our parsing macros enormously powerful. Now we can define grammars and go well beyond what simple regular expressions can offer.

-------------
User commands
-------------

When you write plain commands manually it's easy to insert your own macros.
For instance, you might write:

macro cmd@myCommand
{
...
}

cmd@rule1 equ andRead readAtom a, cmd@myCommand

When you use algebraic expressions, you may use the operator < > as follows:

macro cmd@myCommand
{
...
}

@rule1 equ a

Note that we omit cmd@ when we refer to the command.

----------------------------------
Putting it all together: examples!
----------------------------------

Instead of describing every single command in every possible detail, we'll see a long list of examples and I'll point out what I think is important.
Brace yourself because the last examples are quite complex.

---------
Example 1
---------

@rule1 equ push~eax
translateRules rule1

curExpr equ push eax

matchExpr rule1,
match =1 , readSuccess
{
display "match1!",13,10
}

This is very easy, isn't it?
Note that
@rule1 equ push eax
would work as well.

---------
Example 2
---------

@rule1 equ (mov|push)(eax|ebx)
translateRules rule1
irp expr, push eax, push ebx, mov eax, mov ebx
{
curExpr equ expr

matchExpr rule1,
\match =1 , readSuccess
\{
display "match2!",13,10
\}
}

In this example we (successfully) parse the four expressions
push eax
push ebx
mov eax
mov ebx
one by one.

---------
Example 3
---------

@rule1 equ (mov eax /, _S_) | (push~ebx)
translateRules rule1

curExpr equ mov eax, 654h

matchExpr rule1,
match =1 , readSuccess
{
display "match3!",13,10
}

Here we use _S_ to read a generic symbol and we escape the operator comma (,) with the escape operator /. The operator ~, as always, can be omitted.

---------
Example 4
---------

; Escaping.
@rule1 equ mov /~ // /~ eax
translateRules rule1

curExpr equ mov ~ /~eax, 654h

matchExpr rule1,
match =1 , readSuccess
{
display "match4!",13,10
}

This is another example of escaping. Note that ~ is never interpreted as an operator in @rule1.

---------
Example 5
---------

; Data retrieval.
@rule1 equ (add|sub):instr ~ _S_:destOp ~ /, ~ _S_:srcOp
translateRules rule1

curExpr equ add eax, 654h

matchExpr rule1,
match =1 , readSuccess
{
display "match5: "
\match a|b|c , var@instr|var@destOp|var@srcOp
\{
display \`a," ",\`b,",",\`c,13,10
\}
}
clearVars instr, destOp, srcOp

In this example we parse the string "add eax, 654h" like before, but this time we also save "add" in var@instr, "eax" in var@destOp and "654h" in var@srcOp.
Note that matchExpr need to be passed the three variable names as a single arguments.
Like we said before, clearVars clears the three variables, i.e. var@instr, var@destOp and var@srcOp are undefined symbols after that line.
I've already said this briefly in the description of the commands: while saveOutData (:) overwrites variables with the exact output, appendOutData (>) append the output enclosed within < > and includes a comma if needed.
For example, if var@myVar is
firstValue
and the next output is secondValue, saveOutData will set var@myVar to
secondValue
while appendOutData will set var@myVar to
firstValue,
If one variable is written to only by appendOutData, it will have the following form:
, , , ...,

---------
Example 6
---------

; Extracts the symbols "tree", "oak", "pine", "/" and "," from the
; expression preserving the order in which they appear.
@rule1 equ ((_S_ _notin_ {tree,oak,pine,/,<,>})* _S_>wordList)*
translateRules rule1

curExpr equ armchair tree relax sun ,. | &/8 \
563 sunday pine dog,oak,,,fridge...

matchExpr rule1, wordList
match =1 wList, readSuccess var@wordList
{
display "match6: "
\irp w, wList
\{
display \`w," "
\}
display 13,10
}
clearVars wordList

rule1 does, more or less, the following:
while possible
while next symbol is different from "tree", "oak", ..., or ","
skip it
skip the next symbol and append it (within < >) to var@wordList

---------
Example 7
---------

; Extracts the symbols "tree", "oak", "pine", "/" and "," from the
; expression preserving the order in which they appear (second version).
@rule1 equ ((tree|oak|pine|/ /|/,)>wordList* _S_)*
translateRules rule1

curExpr equ armchair tree relax sun ,. | &/8 \
563 sunday pine dog,oak,,,fridge...

matchExpr rule1, wordList
match =1 wList, readSuccess var@wordList
{
display "match7: "
\irp w, wList
\{
display \`w," "
\}
display 13,10
}
clearVars wordList

This example is identical to previous one but for @rule1, which is equivalent but syntactically different. Note that this rule is more general because the current implementation of assertDiff (_not_in_) doesn't support general subexpressions.

rule1 does, more or less, the following:
while possible
while the next symbol is "tree", "oak", ..., or ","
skip it and append it (within < >) to var@wordList
skip the next symbol

----------
Example 7b
----------

; Extracts the symbols "tree", "pine", "/", "," and any word that begins with
; "sub-" from the expression preserving the order in which they appear
; (second version).
@rule1 equ ((tree|sub/-_S_|pine|/ /|/,)>wordList* _S_)*
translateRules rule1

curExpr equ armchair tree sub-aqua relax sun ,sub-zero. \
| &/8 563 sunday pine dog,oak,,,fridge...

matchExpr rule1, wordList
match =1 wList, readSuccess var@wordList
{
display "match17: "
\irp w, wList
\{
printStr w
display " "
\}
display 13,10
}
clearVars wordList

It's clear that assertDiff (_not_in_) doesn't support the command readSymbol (_S_).

---------
Example 8
---------

@someInstr equ mov|add|sub|xor|test|cmp
@someRegs16 equ ax|bx|cx|dx
@someRegs32 equ eax|ebx|ecx|edx
@someRegs equ {someRegs16}|{someRegs32}
@rule1 equ {someInstr}>__instr {someRegs}>__dest/, {someRegs}>__src
translateRules someInstr, someRegs16, someRegs32, someRegs, rule1

curExpr equ sub eax, ecx

matchExpr rule1, <__instr, __dest, __src>
display "match8: "
match =1 , readSuccess
{
display "OK!",13,10
stripAngBrackets var@__instr, var@__dest, var@__src
var@__instr var@__dest, var@__src
}
match =0 , readSuccess
{
display "ERROR!",13,10
}
clearVars __instr, __dest, __src

Here we use expandDef ({ }) to refer to other rules. As you can see, the use of expandDef makes the code much more readable.
Why do I use > instead of : ?
That's a long story, but to make it short let's just say that there was no ':' where I wrote this example.
Operator ':' makes more sense here because every variable is written to only once.
Moreover, since > encloses values within angle brackets (< >), we must use something like stripAngBrackets, which does exactly what it says.
The third line of code in the first match tell Fasm to assemble "sub eax, ecx".

---------
Example 9
---------

; Let's do some simple recursion.
@rule1 equ NOTOK [{rule1}] ; [x] means that x is optional
translateRules rule1

curExpr equ NOTOK NOTOK NOTOK NOTOK NOTOK OK!

matchExpr rule1,
match =1 unmExpr, readSuccess curExpr
{
display "match9: ", `unmExpr,13,10
}

This is our first example of recursive definition or rule.
We know that cmd@rule is
andRead readAtom NOTOK, optRead expandDef rule1
What happens is that
expandDef rule1
when executed, executes the command
andRead readAtom NOTOK, optRead expandDef rule1
so @rule1 is equivalent to
@rule1 equ NOTOK+

Note that optRead is needed otherwise matchExpr will always fail. Indeed there is no finite string which contains infinitely many occurrences of "NOTOK"!

Now the fun begins.

----------
Example 10
----------

; Let's do some simple recursion! (PHASE 1)
@atom equ a|b|c
@binOpSum equ {atom}((/+|-){atom})*
@mainRule equ {binOpSum}
translateRules atom, binOpSum, mainRule

curExpr equ a-b+c-a+b OK!

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match10: ", `unmExpr,13,10
}

The rules are simple to read:
an atom is "a", "b" or "c";
an "additive expression" is an atom or a sequence of more atoms separated by + or -;
the main expression is an "additive expression".
Here we translate more rules, but start from one rule (ovbiously). Note that when the parsing ends cursExpr is equal to "OK!". That is exactly what that code will display if everything's OK.

Now note that something like
@algExpr equ /({algExpr}/) | {algExpr}/+{algExpr} | x|y|z
won't work because if
/({algExpr}/)
fails,
{algExpr}/+{algExpr}
is processed and therefore its first part ({algExpr}) is expanded. The result is
@algExpr equ /({algExpr}/) | {algExpr}/+{algExpr} | x|y|z
Now
/({algExpr}/)
fails again and here we are again... infinite loop!

----------
Example 11
----------

; Let's do some simple recursion! (PHASE 2)
@atom equ a|b|c|d|e|f
@binOpSum equ {binOpProd}((/+|-){binOpProd})*
@binOpProd equ {atom}((/*|//){atom})* ; */ (C++ editor workaround!)
@mainRule equ {binOpSum}
translateRules atom, binOpSum, binOpProd, mainRule

curExpr equ a*b*c + c*d + e*f OK!

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match11: ", `unmExpr,13,10
}

These rules tell us that:
an atom is "a", "b", "c", "d", "e" or "f";
an "additive expression" is a "multiplicative expression" or a sequence of more "multiplicative expressions" separated by + or -;
a "multiplicative expression" is an atom or a sequence of more atoms separated by * or /;
the main expression is an "additive expression".

Why are "additive expressions" defined as above? Because that way multiplication (and division) has precedence over addition (and subtraction). If you're not sure, spend some time convincing yourself of this!

----------
Example 12
----------

; Let's do some simple recursion! (PHASE 3)
@atom equ a|b|c|d|e|f
@operand equ /({binOpSum}/) | {atom} ; handles sub-expressions!
@binOpSum equ {binOpProd}((/+|-){binOpProd})*
@binOpProd equ {operand}((/*|//){operand})*
@mainRule equ {binOpSum}
translateRules atom, operand, binOpSum, binOpProd, mainRule

;curExpr equ (a*b*c + d +) c*(d+e)*f OK! ; <-- wrong expression!
curExpr equ (a*b*c + d) + c*(d+e)*f OK! ; <-- correct: this works!

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match12: ", `unmExpr,13,10
}

Here we simply added the concept of sub-expression by inserting another "indirection": now * and / operators work on operands rather than directly on atoms. Operands are expressions enclosed within round brackets or atoms.
I admit that I might have chosen better names for my rules...
Note that the commented curExpr is wrong; indeed the second closed bracket is misplaced. As expected, if you use that string, the parsing will fail.

----------
Example 13
----------

; Let's do some simple recursion! (PHASE 4)
@atom equ a|b|c|d|e|f
@operand equ /({binOpSum}/) | {atom} ; handles sub-expressions!
@unaryOps equ {operand} !*
@binOpSum equ {binOpProd}((/+|-){binOpProd})*
@binOpProd equ {unaryOps}((/*|//){unaryOps})*
@mainRule equ {binOpSum}
;translateRules atom, operand, unaryOps, binOpSum, binOpProd, mainRule

;curExpr equ (a*b*c + d +) c*(d+e)*f OK! ; <-- wrong expression!
curExpr equ (a*b*c + d ! ! !) + c*(d+e)*f OK! ; <-- correct: this works!

; IMPORTANT:
; ! is not a special symbol to FASM, so a! will be recognized as a single
; symbol.
; However, (a !) will work just fine because ) is a special symbol and
; thus !) are always read as separate symbols.

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match13: ", `unmExpr,13,10
}

We added another level of "indirection": * and / operators work not on operands but on "unary operator expressions".
A "unary operator expression" is an operand followed by zero, one or more unary operators.
In this case, we just added the unary operator factorial (!).
Make sure to read the important note in the code.

----------
Example 14
----------

; Let's do some simple recursion! (PHASE 5)
@atom equ a|b|c|d|e|f
@operand equ /({binOpSum}/) | {atom} ; handles sub-expressions!
@unaryOps equ {operand} !*
@binOpSum equ {binOpProd}((/+|-){binOpProd})*
@binOpProd equ {defaultOp}((/*|//){defaultOp})*
@defaultOp equ {unaryOps}({unaryOps})*
@mainRule equ {binOpSum}
translateRules atom, operand, unaryOps, binOpSum, binOpProd, \
defaultOp, mainRule

;curExpr equ (a*b*c + d +) c*(d+e)*f OK! ; <-- wrong expression!
curExpr equ (a*b*c + d e + d ! ! !) + c(d+e)f OK! ; <-- correct!

; IMPORTANT:
; ! is not a special symbol to FASM, so a! will be recognized as a single
; symbol.
; However, (a !) will work just fine because ) is a special symbol and
; thus !) are always read separate symbols.

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match14: ", `unmExpr,13,10
}

What we did is quite simple: we made * the default operator, that is, * can be omitted (like ~ in our algebraic expressions).
In example 13 binOpProd is defined as an expression composed of "unary operator expressions" connected by * or /. Now binOpProd is defined as an expression of "default operator expressions" connected by * or /.
A "default operator expression" (@defaultOp) is an expression composed of "unary operator expressions" juxtaposed i.e. connected by an "invisible" or, better, understood operator, i.e. our default operator *.

----------
Example 15
----------

macro cmd@initLocals
{
var@curData equ
var@curOp equ
var@opd2 equ
}

macro cmd@saveLocals
{
savedCurData equ var@curData
savedCurOp equ var@curOp
savedOpd2 equ var@opd2
}

macro cmd@updCurData
{
var@curData equ var@curOp var@curData var@opd2
}

macro cmd@restoreLocals
{
var@curData equ savedCurData
var@curOp equ savedCurOp
var@opd2 equ savedOpd2
restore savedCurData
restore savedCurOp
restore savedOpd2
}

; Let's do some simple recursion! (PHASE 6 (PHASE 1 REVISITED))
@atom equ a|b|c
@binOpSum equ , \
({atom}:curData \
( (/+|-):curOp {atom}:opd2 )* \
= &curData), \

@mainRule equ , {binOpSum}
translateRules atom, binOpSum, mainRule

curExpr equ a+c-b-a OK!

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match15: outData = "
printStrN outData
}

We want to take an algebraic expression and rewrite it in Polish notation. Polish notation is very simple:
a becomes a
a bin_op b becomes bin_op a b
a unary_op becomes unary_op a
So, for instance,
5+6
becomes
+5 6
while
5+6*7
becomes
+5*6 7
Think of it as a tree:
+
5 *
6 7

and now walk the tree in pre-order, i.e. root-left-right:
+5*6 7
as expected.

The expression
(1+2)3! + 4
can be represented by the following tree:
+
* 4
+ !
1 2 3

Therefore it can be written as
+*+1 2!3 4

Now remember that @binOpSum was
@binOpSum equ {atom}((/+|-){atom})*
in phase 1.
Here we want to save what we read, so we decide to use three variables:
curData it contains the partial output
curOp it contains the last seen operator
opd2 it contains the last seen right operand
We end up with
@binOpSum equ {atom}:curData((/+|-):curOp {atom}:opd2)*

There's a problem: we know that binOpSum is to be called recursively in the complete solution. What happens to our three variables?
In the final solution, that two {atom} will be substituted for two {binOpProd} which will probably invoke binOpSum at some level of the recursion. When this happen, our three variables are overwritten.
What would happen if a recursive function used only global variables instead of local ones? Here the same thing would happen.
One solution is to save the values of the "local variables" and then restore them. We decide that whoever modifies the values must save and restore the old values.

The operator alsoRead (,) executes a sequence of commands one after another. It doesn't do anything special. We use < > to call our own commands:
@binOpSum equ , \
{atom}:curData((/+|-):curOp {atom}:opd2)*, \

What we do is this:
1) we save the "local variables"
2) we use the "local variables"
3) we restore the "local variables"
This way we don't interfere with the "caller" which is probably just using the very same "local variables".
Note that cmd@restoreLocals uses the directive "restore" on the global variables we use to save our fake local variables.
Why don't we use "restore" on the local variables itself?
The problem is that we don't know how many times our local variables are written to. If the expression is
operand1 * operand2
we modify curData twice, but if the expression is
operand1 * operand2 * ... * operandn
we modify it n times!
But we know we modify the global variables
savedCurData
savedCurOp
savedOpd2
only once, i.e. when we call saveLocals.

Now there's another problem. Our @binOpSum returns what its second line returns because our macros saveLocals and restoreLocals don't modify outData. But aren't we supposed to read
operand1 op operand2
and return
op operand1 operand2
?
That is how we (recursively) defined the Polish notation, after all.
We could proceed in many ways but what I chose is this:
curData = left_operand
while we come across an operator and a right operand (coupled)
curData = operator curData right_operand

This means that
a+b+c
becomes
++a b c
and, in general,
a+b+c+...+z
becomes
++...+a b c ... z
This is perfectly in accordance with our definition of Polish notation.

How do we do that?
That
{atom}:curData
writes the left operand to curData. That's exactly what we wanted.
Now we modify
((/+|-):curOp {atom}:opd2)*
in this way:
((/+|-):curOp {atom}:opd2 )*
What we did was just add a call to updCurData, an our own macro.
Here's the macro:
macro cmd@updCurData
{
var@curData equ var@curOp var@curData var@opd2
display "updCurData",13,10
}

What it does should be perfectly obvious.
It modifies curData exactly like we said in the pseudo code above:
"curData = operator curData right_operand".

Like we said, our @binOpSum returns what its second line returns: it returns what it read! But we want to return curData!
One solution is to use thruRead. Its brief description says:
thruRead [commands]
It's like andRead but, if successful, returns the string returned by the last command.
This means that if we have an expression MAGIC_EXPR that returns the value of curData and we write
((/+|-):curOp {atom}:opd2 )* = MAGIC_EXPR
we are done.
Lucky for us,
readVariable curData
is that expression.
In algebraic form it becomes
&curData

This is our final rule:
@binOpSum equ , \
({atom}:curData \
( (/+|-):curOp {atom}:opd2 )* \
= &curData), \

Note that we add two round brackets because ',' (i.e. alsoRead) has precedence over '=' (i.e. thruRead).

------------------------------------
Example 16 (putting it all together)
------------------------------------

macro cmd@initLocals
{
var@curData equ
var@curOp equ
var@opd2 equ
}

macro cmd@saveLocals
{
savedCurData equ var@curData
savedCurOp equ var@curOp
savedOpd2 equ var@opd2
}

macro cmd@updCurData
{
var@curData equ var@curOp var@curData var@opd2
}

macro cmd@updCurDataProd
{
var@curData equ * var@curData var@opd2
}

macro cmd@updCurDataFact
{
var@curData equ ! var@curData
}

macro cmd@restoreLocals
{
var@curData equ savedCurData
var@curOp equ savedCurOp
var@opd2 equ savedOpd2
restore savedCurData
restore savedCurOp
restore savedOpd2
}

; Let's do some simple recursion! (PHASE 7)
@atom equ a|b|c|d|e|f
@subExpr equ , \
(/({binOpSum}:curData/) = &curData), \

@operand equ {subExpr} | {atom}
@unaryOps equ , \
({operand}:curData (! )* = &curData), \

@binOpSum equ , \
({binOpProd}:curData \
( (/+|-):curOp {binOpProd}:opd2 )* \
= &curData), \

@binOpProd equ , \
({defaultOp}:curData \
( (/*|//):curOp {defaultOp}:opd2 )* \
= &curData), \

@defaultOp equ , \
({unaryOps}:curData \
( {unaryOps}:opd2 )* \
= &curData), \

@mainRule equ {binOpSum}
translateRules atom, subExpr, operand, unaryOps, binOpSum, \
binOpProd, defaultOp, mainRule
;printStrN cmd@defaultOp
;curExpr equ (a*b*c + d +) c*(d+e)*f OK! ; <-- wrong expression!
curExpr equ (a*b*c + d e + d ! ! !) + c(d+e)f OK! ; <-- correct!

; IMPORTANT:
; ! is not a special symbol to FASM, so a! will be recognized as a single
; symbol.
; However, (a !) will work just fine because ) is a special symbol and
; thus !) are always read separate symbols.

matchExpr mainRule,
match =1 unmExpr, readSuccess curExpr
{
display "match16:",13,10
display " outData = "
printStrN outData
display " curExpr = "
printStrN curExpr
}

If you *did* understand the last example, this one should be straightforward. The only thing to note is that we don't need to save all the three variables every time. For instance, we only need to save var@curData in @unaryOps.
I use saveLocals and restoreLocals for simplicity.

That's all for now!}}}}}}}}}}