flat assembler
Message board for the users of flat assembler.
Programming Language Design > [fasmg.x86] the long road ahead
revolution 29 Apr 2017, 13:32
But x86 NOP is more than just "db 0x90": it can also take an operand to produce longer no-ops. Plus, it is processor-specific. Other CPUs also have a nop mnemonic, but with entirely different binary encodings.
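To make the point about longer no-ops concrete, here is a small Python sketch (not part of fasmg) built on the multi-byte NOP sequences recommended in Intel's Software Developer's Manual. The byte values follow that table; the names `RECOMMENDED_NOPS` and `nop_padding` are hypothetical. Note that the SDM documents the 0F 1F form only for P6-family and newer processors, which is one reason NOP handling is processor-specific.

```python
# Multi-byte NOP sequences as recommended in the Intel SDM.
# Lengths 3..9 use the NOP r/m32 form (0F 1F /0) with a dummy memory operand;
# lengths 2, 6 and 9 add a 66h operand-size prefix.
RECOMMENDED_NOPS = {
    1: bytes([0x90]),                                            # nop
    2: bytes([0x66, 0x90]),                                      # 66h + nop
    3: bytes([0x0F, 0x1F, 0x00]),                                # nop dword [eax]
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),                          # nop dword [eax+disp8]
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),                    # nop dword [eax+eax*1+disp8]
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),              # 66h + 5-byte form
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),        # nop dword [eax+disp32]
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),  # nop dword [eax+eax*1+disp32]
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),  # 66h + 8-byte form
}


def nop_padding(length: int) -> bytes:
    """Return `length` bytes of no-ops, greedily using the longest sequences."""
    if length < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while length > 0:
        chunk = min(length, max(RECOMMENDED_NOPS))  # longest available is 9 bytes
        out += RECOMMENDED_NOPS[chunk]
        length -= chunk
    return bytes(out)
```

Greedy selection keeps the instruction count low; a real encoder might weigh other factors, which this sketch ignores.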
Mike Gonta 29 Apr 2017, 14:11
revolution wrote: But x86 nop is more than just "db 0x90", it can also have arguments to give longer no-ops. Plus it is processor
Tomasz Grysztar 29 Apr 2017, 14:20
You do not need to preserve all the registers; in fact, the only register that matters when jumping to "instruction_assembled" is ESI.
You may find some info about the instruction handler interface in DIRECTIVES.INC around line 290.
Mike Gonta 29 Apr 2017, 15:01
Tomasz Grysztar wrote: You do not need to preserve all the registers, in fact the only register that matters when jumping to "instruction_assembled" is ESI.
I'll start digging for the rest of the API.
http://mikegonta.com/fasmg.x86
Tomasz Grysztar 02 May 2018, 09:34
This is probably the best place to write down some notes that could become a guide to creating an assembler on top of the fasmg engine. So let me expand on the example you made.
In the instruction tables used by fasm 1 there was a place for an additional parameter passed to the handler, so one common handler could be used for multiple instructions that share the same rules for operands and encoding. In the case of fasmg we also have a field that can be used for this purpose:
Code:
    db 3,'nop',SYMCLASS_INSTRUCTION,VALTYPE_NATIVE_COMMAND,VAL_INTERNAL,90h
    dd simple_instruction
    db 4,'int3',SYMCLASS_INSTRUCTION,VALTYPE_NATIVE_COMMAND,VAL_INTERNAL,0CCh
    dd simple_instruction
Code:
    simple_instruction:
        mov al,[edx+ValueDefinition.attribute]  ; opcode byte taken from the table entry
        push eax
        mov ecx,1
        call initialize_output
        pop eax
        stosb
        jmp instruction_assembled
You should pay attention to the definitions of all the interfaces. The fact that EDX holds a pointer to ValueDefinition is stated in the definition of the "instruction handler" interface (I already mentioned it in a post above):
Code:
    ; instruction handler
    ; in:
    ;  esi = pointer into preprocessed line
    ;  ecx = number of whitespace tokens between previous symbol and current position
    ;  edx - ValueDefinition of instruction
    ;  ebx - SymbolTree_Leaf of instruction
    ;  edi - SymbolTree_Root of instruction
    ;  when [SymbolTree_Leaf.class] = SYMCLASS_STRUCTURE:
    ;   [label_leaf] - SymbolTree_Leaf of structure label
    ;   [label_branch] - SymbolTree_Foliage of structure label
    ; out:
    ;  when done, handler should jump to instruction_assembled with esi containing a pointer moved past the processed part of line,
    ;  or jump directly to assembly_line when the rest of line should be ignored
    ; note:
    ;  when esi is equal to [line_end], pointer is at the end of line and there is no data available at this address
We may proceed to a slightly more interesting example, again analogous to one of the classic instruction handlers of fasm 1:
Code:
    int_instruction:
        call get_constant_value
        test al,al
        jz missing_argument
        cmp al,2Eh
        je invalid_argument             ; error on floating point argument
        call keep_value
        mov ecx,2
        call initialize_output
        mov al,0CDh                     ; INT imm8 opcode
        stosb
        call get_kept_value
        mov ecx,1
        call fit_value
        jc value_out_of_range           ; error when number does not fit in a byte
        jmp instruction_assembled
Code:
    invalid_argument:
        mov edx,_invalid_argument
        call register_error
        jmp assembly_line
Technically it is not necessary to call "keep_value" here, because "initialize_output" is not among the parsing or expression-evaluating functions, so simply preserving the EDX pointer would be enough in this case. But I believe it makes the example clearer, and it might be a good habit to use this function anyway.
And another similar handler, this time for the ENTER instruction, to show how to process comma-separated arguments:
Code:
    enter_instruction:
        mov ecx,1+2+1                   ; opcode, 16-bit frame size, 8-bit nesting level
        call initialize_output
        mov al,0C8h                     ; ENTER opcode
        stosb
        mov [output_pointer],edi
        call get_constant_value
        test al,al
        jz missing_argument
        cmp al,2Eh
        je invalid_argument
        mov edi,[output_pointer]
        mov ecx,2
        call fit_value
        jc value_out_of_range
        call get_constituent_value
        jc missing_argument
        cmp al,','
        jne invalid_argument            ; arguments must be separated by a comma
        call get_constant_value
        test al,al
        jz missing_argument
        cmp al,2Eh
        je invalid_argument
        mov edi,[output_pointer]
        add edi,2                       ; skip the 16-bit field written above
        mov ecx,1
        call fit_value
        jc value_out_of_range
        jmp instruction_assembled
Handling of special symbols like registers (or expressions containing registers) requires a bit more preparation. Basically, we have to define some built-in ELEMENT-type symbols. I'm going to write more on this on another occasion.
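Outside of fasmg, the idea of one shared handler parameterized by a table attribute, and the byte layouts produced by the INT and ENTER handlers above, can be sketched in Python. This is only a model: `fit_value` here assumes a value fits in n bytes when it lies in -2**(8n-1) .. 2**(8n)-1 (the usual fasm convention for data), and all names (`AssemblyError`, `INSTRUCTIONS`, `assemble`) are hypothetical stand-ins, not fasmg APIs. Operands are passed as already-evaluated numbers, with a Python float standing in for the 2Eh (floating point) result type.

```python
class AssemblyError(Exception):
    """Stands in for missing_argument / invalid_argument / value_out_of_range."""


def fit_value(value, size):
    # Assumption: accept signed or unsigned values that fit in `size` bytes,
    # and store them little-endian in two's complement.
    bits = 8 * size
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise AssemblyError("value out of range")
    return (value & ((1 << bits) - 1)).to_bytes(size, "little")


def simple_instruction(attribute, operands):
    # One shared handler: the opcode byte comes from the table entry's attribute.
    if operands:
        raise AssemblyError("unexpected operands")
    return bytes([attribute])


def int_instruction(attribute, operands):
    if len(operands) != 1:
        raise AssemblyError("expected one argument")
    (vector,) = operands
    if isinstance(vector, float):
        raise AssemblyError("invalid argument")  # mirrors the 2Eh check
    return bytes([0xCD]) + fit_value(vector, 1)  # CD ib


def enter_instruction(attribute, operands):
    if len(operands) != 2:
        raise AssemblyError("expected two comma-separated arguments")
    frame_size, nesting = operands
    if isinstance(frame_size, float) or isinstance(nesting, float):
        raise AssemblyError("invalid argument")
    # 1 + 2 + 1 bytes: opcode C8h, 16-bit frame size, 8-bit nesting level
    return bytes([0xC8]) + fit_value(frame_size, 2) + fit_value(nesting, 1)


# Analogous to the TABLES.INC entries: mnemonic -> (handler, attribute)
INSTRUCTIONS = {
    "nop": (simple_instruction, 0x90),
    "int3": (simple_instruction, 0xCC),
    "int": (int_instruction, None),
    "enter": (enter_instruction, None),
}


def assemble(mnemonic, *operands):
    handler, attribute = INSTRUCTIONS[mnemonic]
    return handler(attribute, operands)
```

For example, `assemble("enter", 0x1234, 1)` yields the four bytes C8 34 12 01, while `assemble("int", 256)` raises because 256 does not fit in a byte.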
Tomasz Grysztar 03 May 2018, 17:37
In TABLES.INC we can define not only instructions, but any other types of symbols that we need to have built in. Let's add some entries that define a couple of registers as ELEMENT-type symbols:
Code:
    db 2,'ax',SYMCLASS_EXPRESSION,VALTYPE_ELEMENT,VAL_INTERNAL,'R'
    dd 0200h
    db 2,'cx',SYMCLASS_EXPRESSION,VALTYPE_ELEMENT,VAL_INTERNAL,'R'
    dd 0201h
    ; ...
    db 3,'eax',SYMCLASS_EXPRESSION,VALTYPE_ELEMENT,VAL_INTERNAL,'R'
    dd 0400h
    db 3,'ecx',SYMCLASS_EXPRESSION,VALTYPE_ELEMENT,VAL_INTERNAL,'R'
    dd 0401h
A simple way to then handle a register defined like this as an operand to an instruction is to use "get_constituent_value" and then do multiple checks to find out what kind of symbol was found:
Code:
    bswap_instruction:
        call get_constituent_value
        jc missing_argument
        cmp al,1Ah
        jne invalid_argument            ; no symbol identifier
        test edx,edx
        jz invalid_argument             ; undefined symbol
        cmp [edx+ValueDefinition.type],VALTYPE_ELEMENT
        jne invalid_argument
        cmp [edx+ValueDefinition.attribute],'R'
        jne invalid_argument
        mov eax,[edx+ValueDefinition.value]
        cmp ah,4                        ; AH holds the register size
        jne invalid_argument
        push eax
        mov ecx,2
        call initialize_output
        mov al,0Fh
        stosb
        pop eax
        add al,0C8h                     ; AL holds the register number: 0F C8+r
        stosb
        jmp instruction_assembled
The above would work exactly the same if we used some other symbol type instead of VALTYPE_ELEMENT. In fact, we could define a custom type just for registers, with a constant like VALTYPE_REGISTER to identify it. However, the ELEMENT type allows a register defined this way to be used in arithmetic expressions, which is crucial for things like x86 addressing. We may modify the example BSWAP handler so that instead of looking for a plain symbol it processes a complete expression and checks if the result is a simple register.
This approach allows syntax like:
Code:
    bswap (eax)
    bswap ecx+0
Now the handler that looks for the register in the expression result looks like this:
Code:
    bswap_instruction:
        call get_expression_value
        jc missing_argument
        cmp byte [edi+ExpressionTerm.attributes],EXPR_NUMBER
        jne invalid_argument            ; linear polynomials always have all terms marked as EXPR_NUMBER
        call get_term_value
        cmp dword [edx],0
        jne invalid_argument            ; constant term has to be zero
        add edi,sizeof.ExpressionTerm
        cmp [edi+ExpressionTerm.attributes],0
        je invalid_argument             ; variable term needs to be present
        call get_term_value
        mov ecx,1
        call fit_value                  ; this overwrites first byte of ExpressionTerm.attributes, but we do not need it anymore
        jc invalid_argument
        cmp byte [edi],1                ; the variable term must not be multiplied by any value other than 1
        jne invalid_argument
        mov ebx,[edi+ExpressionTerm.metadata]   ; this points to SymbolTree_Leaf of symbol used as a variable
        add edi,sizeof.ExpressionTerm
        cmp [edi+ExpressionTerm.attributes],0
        jne invalid_argument            ; expect no more variable terms
        mov edx,[ebx+SymbolTree_Leaf.definition]
        test edx,edx
        jz invalid_argument
        cmp [edx+ValueDefinition.attribute],'R'
        jne invalid_argument
        mov eax,[edx+ValueDefinition.value]
        cmp ah,4
        jne invalid_argument
        push eax
        mov ecx,2
        call initialize_output
        mov al,0Fh
        stosb
        pop eax
        add al,0C8h
        stosb
        jmp instruction_assembled
When handling addresses in square brackets, the expression would need to be processed in a similar way, iterating through the terms and checking whether they correspond to valid addressing registers. In the case of addresses, an important aspect is that they may contain some non-register variable terms, like a base symbol for a relocatable section. (One of my ideas for the x86 encoder was that if, after removing the processed register terms, the expression result still contained some variable ones, it would put this remaining polynomial into a temporary variable and schedule a call to a macro like DWORD to process it.
This would then allow relocatable formats to be implemented with macros, actually the same macros that already exist for fasmg.)
As for variants like a segment prefix inside the square brackets, again the approach may vary. If we decided to allow the segment register to be specified through an expression, the implementation would need two "get_expression_value" calls with a "get_constituent_value" in between to handle the ":" separator. Otherwise, "peek_at_constituent_value" could be used to find out whether the first symbol of an expression is a segment register, proceeding straight to "get_expression_value" if it is not.
PS. One more thing: the built-in ELEMENT symbols defined like this have no visible metadata that could be extracted with the METADATAOF operator. This is because when extracting metadata, fasmg looks at the "value_length" field, which is zero for all the symbols generated from TABLES.INC. This is what allowed us to put customized data into the "value" field; otherwise it would need to be a valid pointer to a value of said length, in the same format as values of VALTYPE_NUMERIC symbols. If we wanted these symbols to expose some metadata, we would need to either alter how the tables are processed in the "assembly_init" routine, or (perhaps better) add another routine to create internal symbols immediately after the call to "assembly_init" and before the actual assembly. If you'd like, I can prepare some examples of how to add any fully customized symbol into the symbol tree during the initialization.
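The checks that the expression-based BSWAP handler performs on the result (zero constant term, exactly one variable term, coefficient 1, symbol carrying the 'R' attribute, register size 4) can be modeled in Python to make the logic easier to follow. This is a sketch, not fasmg code: the expression result is simplified to a constant plus a list of (coefficient, symbol name) pairs, loosely mirroring the ExpressionTerm list, and the names `LinearPolynomial`, `ELEMENTS`, `single_register` and `encode_bswap` are hypothetical. The packed register values copy the TABLES.INC entries above (size in the high byte, register number in the low byte).

```python
from dataclasses import dataclass, field


class AssemblyError(Exception):
    """Stands in for the invalid_argument error path."""


@dataclass
class LinearPolynomial:
    # Constant term plus variable terms as (coefficient, symbol name) pairs.
    constant: int = 0
    terms: list = field(default_factory=list)


# Built-in ELEMENT symbols: attribute 'R', value = size << 8 | register number
ELEMENTS = {
    "ax": ("R", 0x0200),
    "cx": ("R", 0x0201),
    "eax": ("R", 0x0400),
    "ecx": ("R", 0x0401),
}


def single_register(expr: LinearPolynomial) -> int:
    """Return the packed register value if expr is exactly 1*register + 0."""
    if expr.constant != 0:
        raise AssemblyError("constant term has to be zero")
    if len(expr.terms) != 1:
        raise AssemblyError("expected exactly one variable term")
    coefficient, name = expr.terms[0]
    if coefficient != 1:
        raise AssemblyError("register must not be multiplied")
    attribute, value = ELEMENTS.get(name, (None, None))
    if attribute != "R":
        raise AssemblyError("not a register")
    return value


def encode_bswap(expr: LinearPolynomial) -> bytes:
    value = single_register(expr)
    size, number = value >> 8, value & 0xFF
    if size != 4:
        raise AssemblyError("BSWAP needs a 32-bit register")
    return bytes([0x0F, 0xC8 + number])  # 0F C8+r
```

Both `bswap (eax)` and `bswap ecx+0` would reduce to a polynomial with a zero constant and a single term of coefficient 1, so they pass; `bswap eax*2`, `bswap eax+4` or `bswap ax` would be rejected.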
Copyright © 1999-2024, Tomasz Grysztar.