This is the new version of this tutorial (the older ones are obsolete, since fasm's internals have changed a bit since they were written), and the first one I'm posting on this messageboard. I hope you'll find it useful, as I haven't managed to write the full internals documentation (that doesn't mean I won't finish it, but it will probably take a long time). Maybe I should also post a tutorial on porting fasm to other environments?
Example 1 - simple approach
Suppose we want to have a "bignop" instruction, without any arguments, which generates 7 bytes of value 90h (the NOP opcode). First step: add this name to the instruction tables, so fasm will recognize the instruction. The name is 6 bytes long, so you should find the instructions_6 table (it's in the "x86.inc" file, which contains the standard set of instructions) and add two new lines there (remember to keep the alphabetical order!):
db 'bignop',0
dw bignop_instruction-assembler
This defines our new instruction, with bignop_instruction as the handler procedure and zero as the additional parameter (the byte following the name). Now, when fasm encounters this instruction in the preprocessed and parsed source, it will jump to your handler (the bignop_instruction label) with the additional parameter in the AL register. So the only thing left to do is to write the instruction handler. You may add it to "x86.inc", but the best solution is to create a new "custom.inc" file and put an "include 'custom.inc'" line somewhere in the "x86.inc" file (outside of any other procedure; the end of the file is a safe place). If your editor can't handle text files larger than 64KB, just type the following command at the DOS prompt, which appends that line to the end of the file:
echo include 'custom.inc' >> x86.inc
Now create the "custom.inc" file, and put the bignop_instruction handler there:
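bignop_instruction:
mov al,90h                ; byte to store: the NOP opcode
mov ecx,7                 ; how many bytes to generate
rep stos byte [edi]       ; write them to the output at EDI
jmp instruction_assembled ; give control back to the assembler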
This handler just generates 7 bytes of code, without reading any arguments, and then passes control back to the assembler. Every instruction handler should end with this jump.
Now recompile fasm and try the new instruction!
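To see exactly what the handler does to the output, here is a small C model of it. This is a sketch, not fasm code. It assumes, hypothetically, that the output pointer EDI is represented by an Output struct holding a write position and an end pointer. The names rep_stosb and bignop_model are illustrative, and the overflow assert is an addition of this model that the handler above does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the output pointer: edi is the next free byte,
   end is one past the last usable byte (an assumption of this model). */
typedef struct {
    uint8_t *edi;
    uint8_t *end;
} Output;

/* Same effect as "rep stos byte [edi]": store ECX copies of AL at EDI and
   advance EDI by ECX. The assert is a safety check added by this model. */
static void rep_stosb(Output *out, uint8_t al, uint32_t ecx)
{
    assert((size_t)(out->end - out->edi) >= ecx);
    while (ecx--)
        *out->edi++ = al;
}

/* Model of bignop_instruction: reads no operands, emits seven 90h bytes. */
static void bignop_model(Output *out)
{
    rep_stosb(out, 0x90, 7);
    /* the real handler continues with "jmp instruction_assembled" */
}
```

Each call advances the write position by 7, just as each "bignop" line in a source adds 7 bytes to the output.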
Example 2 - argument processing
Now we will add a "varnop" instruction, which expects a numeric argument specifying the length of the instruction in bytes. The instruction handler is:
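varnop_instruction:
lods byte [esi]           ; load the first byte of the argument
cmp al,'('                ; is it a numeric expression?
jne invalid_argument
cmp byte [esi],'.'        ; is it a floating-point value?
je invalid_value
call get_dword_value      ; evaluate the expression, result in EAX
mov ecx,eax               ; use it as the byte count
mov al,90h
rep stos byte [edi]
jmp instruction_assembled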
This handler expects the argument at ESI, so it loads a byte and checks whether it is a numeric expression (marked by a "(" byte). If it isn't, we jump to an error handler (look at "errors.inc" to see which errors can be handled; you can also add your own, it's simple). Also, if the byte after "(" is ".", this is a floating-point number, which we don't accept. Otherwise we call the get_dword_value procedure with ESI pointing to the first byte after the "(" character; it processes the whole expression and returns the resulting number in the EAX register. Finally, the handler generates that many NOPs and exits.
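The same sequence of checks can be modelled in C. This sketch rests on a hypothetical, simplified argument format: "(" followed either by "." (a floating-point value) or by ASCII decimal digits ending with ")". Real fasm stores a full parsed expression there, and get_dword_value evaluates it. The names get_dword_value_stub, varnop_model and the Result values are illustrative and only mirror the labels used in the handler.

```c
#include <stdint.h>

/* Outcomes standing in for the jumps in the handler. */
enum Result { ASSEMBLED, INVALID_ARGUMENT, INVALID_VALUE };

/* HYPOTHETICAL stand-in for get_dword_value: reads decimal digits and a
   closing ")" instead of evaluating a real fasm expression. */
static uint32_t get_dword_value_stub(const uint8_t **esi)
{
    uint32_t v = 0;
    while (**esi >= '0' && **esi <= '9')
        v = v * 10 + (uint32_t)(*(*esi)++ - '0');
    if (**esi == ')')
        (*esi)++;
    return v;
}

/* Model of varnop_instruction; like the handler, it does not check
   how much output space is left. */
static enum Result varnop_model(const uint8_t **esi, uint8_t **edi)
{
    uint8_t al = *(*esi)++;             /* lods byte [esi] */
    if (al != '(')
        return INVALID_ARGUMENT;        /* jne invalid_argument */
    if (**esi == '.')
        return INVALID_VALUE;           /* je invalid_value */
    uint32_t ecx = get_dword_value_stub(esi);
    while (ecx--)                       /* mov al,90h / rep stos byte [edi] */
        *(*edi)++ = 0x90;
    return ASSEMBLED;                   /* jmp instruction_assembled */
}
```

A register operand (first byte 10h) is rejected before anything is written, so the output pointer only moves on success.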
There are many procedures that will process arguments for you; here is a list of the most useful ones:
1. When the argument is a number, call one of the following procedures with ESI pointing to the first byte after "(":
get_byte_value - returns number in AL
get_word_value - returns number in AX
get_dword_value - returns number in EAX
get_pword_value - returns number in DX:EAX
get_qword_value - returns number in EDX:EAX
get_value - converts a number of any type and returns it in EDX:EAX
2. When the argument is a register, the first byte at ESI is 10h; load the second byte into AL and call one of:
convert_register - accepts only general purpose registers; sets AH to the size of the register (1, 2 or 4) and AL to the register code number.
convert_fpu_register - accepts only FPU registers; sets AH to 10 (the size of a single FPU register, in bytes) and AL to the register code number.
convert_mmx_register - accepts only MMX and SSE registers; sets AH to the register size (8 for MMX, 16 for XMM) and AL to the register code number.
These procedures also set the [argument_size] variable to the same value as the AH register. If [argument_size] is already set to a nonzero value and the sizes don't match, the error handler is called.
You can also process the second byte manually; the possible second byte values are listed in the "symbols" table in the "x86.inc" file.
3. To process size overrides, call the get_size_operator procedure after loading the first byte of an argument into AL. If there is a size override, it sets [argument_size] to the proper value and loads the first byte of the next symbol into AL; otherwise it does nothing.
4. When the argument is a memory operand (the first byte is "["), call the get_address procedure with ESI pointing to the first byte after "[". It returns the address value in EDX, the base register code in BH, the index register code in BL, the index scale in CL, the address size override in CH, and the segment register code in the [segment_register] variable. You can pass the unchanged BX, CX and EDX registers to the store_instruction procedure, with [base_code] set to the instruction opcode and [postbyte_register] set to the register code or the instruction extension; the whole opcode will then be generated and stored at EDI. If [base_code] is set to 0Fh, [extended_code] should contain the value of the second opcode byte.
To make the 16-bit version of an instruction (regardless of the "use16" or "use32" setting), call the operand_16bit_prefix procedure before generating the opcode. To make the 32-bit version, call operand_32bit_prefix.
Please look at the various instruction handlers in "x86.inc" for more complex examples.
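For a concrete picture of what these settings turn into, here is a C sketch of how a "reg,[base+disp]" instruction is encoded under the standard x86 rules. The byte sequence is:

- an optional 67h address-size prefix;
- the 66h operand-size prefix, which the CPU requires whenever the operand size differs from the use16/use32 default (so this is what operand_16bit_prefix in use32 code, or operand_32bit_prefix in use16 code, has to produce);
- the opcode (the role of [base_code]);
- the ModR/M byte, whose reg field plays the role of [postbyte_register];
- any SIB and displacement bytes.

This is not fasm's store_instruction. The name encode_reg_mem is hypothetical, and the sketch handles only one-byte opcodes and 32-bit addressing with a base register and no index.

```c
#include <stddef.h>
#include <stdint.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };  /* 32-bit register codes */

/* Hypothetical sketch: encodes "opcode reg,[base+disp]" into out and returns
   the byte count. code_bits is the use16/use32 mode, operand_bits the
   operand size the handler asked for. */
static size_t encode_reg_mem(uint8_t *out, int code_bits, int operand_bits,
                             uint8_t opcode, int reg, int base, int32_t disp)
{
    size_t n = 0;
    int mod;

    if (code_bits == 16)
        out[n++] = 0x67;               /* 32-bit addressing inside use16 code */
    if (operand_bits != code_bits)
        out[n++] = 0x66;               /* operand-size prefix */
    out[n++] = opcode;

    if (disp == 0 && base != EBP)
        mod = 0;                       /* [base] */
    else if (disp >= -128 && disp <= 127)
        mod = 1;                       /* [base+disp8]; EBP needs it even for 0 */
    else
        mod = 2;                       /* [base+disp32] */
    out[n++] = (uint8_t)(mod << 6 | (reg & 7) << 3 | (base & 7));

    if (base == ESP)
        out[n++] = 0x24;               /* SIB byte: no index, base ESP */
    if (mod == 1)
        out[n++] = (uint8_t)disp;
    else if (mod == 2)
        for (int i = 0; i < 4; i++)    /* little-endian 32-bit displacement */
            out[n++] = (uint8_t)((uint32_t)disp >> (8 * i));
    return n;
}
```

For example, encode_reg_mem(b, 32, 16, 0x8B, EAX, EBX, 8) produces 66 8B 43 08, which is "mov ax,[ebx+8]" in use32 code. The 66h byte there is the prefix a handler requests for the 16-bit version of the instruction.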
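The [argument_size] rules from items 2 and 3 can be summarised with a tiny C model. It is a sketch, not fasm's code. The global argument_size mirrors the variable of the same name. The names size_operator_model, convert_register_model and the SIZES_DO_NOT_MATCH result are hypothetical stand-ins for get_size_operator, convert_register and the error handler.

```c
#include <stdint.h>

enum Result { SIZE_OK, SIZES_DO_NOT_MATCH };

/* Mirrors [argument_size]: 0 means "no size known yet". */
static uint8_t argument_size = 0;

/* Like get_size_operator when it finds an override such as "word". */
static void size_operator_model(uint8_t size)
{
    argument_size = size;
}

/* Like the size check in convert_register; ah is the register's size. */
static enum Result convert_register_model(uint8_t ah)
{
    if (argument_size != 0 && argument_size != ah)
        return SIZES_DO_NOT_MATCH;     /* fasm would call the error handler */
    argument_size = ah;                /* same value as AH */
    return SIZE_OK;
}
```

With this model, a word override followed by a 4-byte register such as EAX is rejected, while the same override followed by AX is accepted. With no override at all, the register simply defines the size.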
Example 3 - common handler
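We can make a common handler for both of the above instructions by using the additional parameter field:
in the instructions_6 table (the same one as in Example 1; depending on the fasm version it lives in "x86.inc" or "tables.inc"):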
db 'bignop',7
dw bignop_instruction-assembler
and
db 'varnop',0
dw bignop_instruction-assembler
in "custom.inc":
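bignop_instruction:
xor ecx,ecx               ; ECX = additional parameter (AL zero-extended)
or cl,al
jnz .store                ; nonzero: the parameter is the count (bignop)
lods byte [esi]           ; zero: read the count argument (varnop)
cmp al,'('
jne invalid_argument
cmp byte [esi],'.'
je invalid_value
call get_dword_value
mov ecx,eax
.store:
mov al,90h
rep stos byte [edi]
jmp instruction_assembled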
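If the additional parameter is 0, the handler reads the count argument; otherwise it uses the value of AL as the count. This handler replaces the ones from the previous examples, so remove the old bignop_instruction handler (otherwise the label would be defined twice) and the old table entries.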
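Putting the table entries and the common handler together, the following C sketch models how one handler serves both names through the additional parameter. It makes several assumptions:

- Handlers are plain function pointers rather than the handler-assembler word offsets.
- The operand arrives already evaluated, standing in for the get_dword_value path.
- find_6 uses a binary search only to illustrate one way a lookup can rely on the alphabetical order. It is not claimed to be fasm's own search code.

Entry, nop_model and find_6 are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Handler signature of this model: al is the additional parameter,
   operand the already-evaluated argument, edi the output buffer. */
typedef int (*Handler)(uint8_t al, uint32_t operand, uint8_t *edi);

/* One instructions_6 entry: 6 name bytes with no terminator (like
   db 'bignop'), the additional parameter byte, then the handler. */
typedef struct {
    char name[6];
    uint8_t param;
    Handler handler;
} Entry;

/* Common handler: nonzero parameter = fixed count (bignop), zero = take
   the count from the operand (varnop). Returns the number of bytes stored. */
static int nop_model(uint8_t al, uint32_t operand, uint8_t *edi)
{
    uint32_t ecx = al;                 /* xor ecx,ecx / or cl,al */
    if (ecx == 0)
        ecx = operand;                 /* the get_dword_value path */
    for (uint32_t i = 0; i < ecx; i++)
        edi[i] = 0x90;                 /* .store: rep stos byte [edi] */
    return (int)ecx;
}

/* Kept in alphabetical order, as the tutorial requires. */
static const Entry instructions_6[] = {
    { {'b','i','g','n','o','p'}, 7, nop_model },
    { {'v','a','r','n','o','p'}, 0, nop_model },
};

/* Binary search over the sorted table; name must have at least 6 bytes. */
static const Entry *find_6(const char *name)
{
    size_t lo = 0, hi = sizeof instructions_6 / sizeof instructions_6[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = memcmp(name, instructions_6[mid].name, 6);
        if (c == 0)
            return &instructions_6[mid];
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

Looking up "bignop" and calling its handler stores 7 bytes regardless of the operand. Looking up "varnop" stores as many bytes as the operand says, because its parameter is 0.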
Happy customizing!