flat assembler
Message board for the users of flat assembler.
Programming Language Design > On my new assembler
Tomasz Grysztar 23 Mar 2015, 12:18
I am preparing to finally release the initial version of my new project, developed under the working title "fasm g". Before I do so, I think I should take some time to describe in more detail what it is and how it is related to fasm, in addition to my previous statements.
First of all: this is not fasm 2, and it is not even an x86 assembler. But it does not implement an instruction set for any other specific architecture, either. It is just a bare assembly engine that only has instructions like DB or DD to generate and assemble various data. It is therefore something like a third-generation successor of my old MDAT tool (if one took fasm 1.x and removed all the x86 instructions and formats, leaving just the directives like DB/IF/REPEAT and the preprocessor, it could perhaps be called a second-generation MDAT). This means that it probably could be extended to contain an implementation of x86 instructions and become fasm 2, but I'm not going to attempt it unless there is real demand. The new architecture is quite different, and while it has its advantages, it also has some weaknesses compared to fasm 1.

Back in 2009, at our fasmcon meeting, I gave a short talk about the features I planned for fasm 2. I think that most of what I said then still holds today. In fact, some of the fragments of then-hypothetical code I presented can now be successfully assembled not only with fasm g, but even with fasm 1.71, because during the years of my struggle with ideas for fasm 2 I actually managed to implement some of them (notably the labeled virtual blocks) in the fasm 1.x line. One of the important things I said then was that fasm 2 would be slower than fasm 1, because it would no longer have a separate preprocessor and parser stage, but would instead do multiple passes on the source text. This was one of my fundamental assumptions for the new architecture: I wanted to get rid of fasm's separate layers and the separate languages for each layer, at the cost of performance.
Now that I have fasm g working, I am beginning to think that it may actually be possible to design an architecture that would be "the best of both worlds" and use some of the tricks of fasm 1 for better performance while implementing a unified language like fasm g. But at the moment this is just a general vision, like fasm 2 was back in 2009. And fasm g is in fact exactly what I envisioned back then: it has no separate preprocessor and does multiple passes on pre-tokenized source text, and this is noticeably slower than the multiple passes on internally precompiled source that fasm 1 does.

Another consequence of the language unification is that macroinstructions are almost completely incompatible between fasm 1 and fasm g. On the other hand, the language of the assembler module has been kept almost unchanged, and it is possible to have sources that are assembled with the same result by both assemblers. Of course such sources have to be limited to directives like DB, VIRTUAL, REPEAT, LOAD, etc. But even though this appears to be a very small common subset of the languages, one can find many small snippets for fasm 1 that can be assembled with fasm g, and I have been using selected examples taken from this board to test the new assembler.

But probably the most important point remains: how can an assembler that does not implement any actual instruction set be useful at all? And can it even be called an assembler then*? To the second question my answer is that I use the term "assembler" in a specific meaning, the one I introduced in the Understanding fasm text when I wrote: "Because assembler truly is both a compiler and interpreter at the same time, and none of those terms alone is able to explain correctly what the assembler does."

Back to the most important question: what can fasm g be used for? The answer may already be obvious to those who used fasm 1.x for purposes other than assembly for the x86 architecture.
Different instruction sets can be implemented in the form of macroinstructions, and some examples of this type already exist for fasm 1. Of course, when these macroinstructions need to be complex, the assembly is going to be much slower than it would be if the instruction set were implemented natively in the assembler. This adds to the fact that fasm g in general is slower than fasm 1, so you should certainly not expect amazing performance. On the other hand, assembly is often used for small snippets, or for programs for machines like microcontrollers that have very little memory. In these cases the sources are small enough that the performance drawbacks can be forgiven, while the instruction sets implemented in the form of macroinstructions give great flexibility and can be really fun to play with.

And while I do not know whether this new assembler is going to really interest anyone else (especially since it replaces some of the confusing features of fasm 1 with new ones that may still be confusing, but in a completely new way), I already have a lot of fun with it. I created the macroinstruction packages that implement the instruction sets for 8086/80186 (including MZ output with relocations), 8051 (with Intel HEX output) and JVM (with a .class file generator converted directly from my example for fasm 1), and I plan to include at least these in the initial release.

___
* In a private conversation, vid suggested that I could use the following slogan to advertise my new project: "everything you want from an assembler, without an assembler". I think it is a good one.
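As a rough illustration of what an "Intel HEX output" involves, here is a Python sketch of the standard Intel HEX record layout: a colon, the byte count, a 16-bit address, the record type, the data, and a checksum chosen so that all record bytes sum to zero modulo 256. The function names are hypothetical, and this is not taken from the fasmg 8051 package, which would produce the same kind of text through DB-style declarations.

```python
def hex_record(address: int, record_type: int, data: bytes) -> str:
    """Build one Intel HEX record: ':' count address type data checksum."""
    if len(data) > 255 or not 0 <= address <= 0xFFFF:
        raise ValueError("record does not fit the format")
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + data
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(image: bytes, origin: int = 0, chunk: int = 16) -> list[str]:
    """Split a binary image into type-00 data records plus the type-01 EOF record."""
    records = [
        hex_record(origin + offset, 0x00, image[offset:offset + chunk])
        for offset in range(0, len(image), chunk)
    ]
    records.append(hex_record(0, 0x01, b""))  # end-of-file record
    return records
```

For example, `to_intel_hex(bytes([0x02, 0x33, 0x7A]), origin=0x30)` returns `[':0300300002337A1E', ':00000001FF']`. Only 16-bit addresses are handled in this sketch; larger images would also need extended address records.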
Tomasz Grysztar 23 Mar 2015, 21:39
JohnFound wrote: In fact, I want to read the user manual, even more than to download the program.
Tomasz Grysztar 25 Mar 2015, 22:47
Tomasz Grysztar wrote: But I also want to write another guide specifically for the people that already know and use the languages of fasm 1 - such "transition guide" I plan to post here, I hope soon.
Tomasz Grysztar 26 Mar 2015, 13:43
This is a brief comparison of fasmg features to their analogs in fasm. It does demonstrate some of the basic commands of fasmg, but it is not a description of this language.
fasmg is partially compatible with fasm, but this applies to just a small subset of fasm's features. fasmg does not implement the x86 architecture instructions or the output formats; any output it generates comes from declarations like DB. It also does not have a separate preprocessor like fasm, and thus the language of fasm's preprocessor is also not present. In fasm the preprocessor used a different set of syntax constructions than the assembler, to make them distinguishable from each other. In fasmg the language is unified, and features that in fasm were implemented in the preprocessor (like macros) now use the syntax constructions of fasm's assembler module. So the only instructions that remain relatively unchanged are the ones that were directives of the assembly module in fasm.

Still, it is possible to find some sources that were originally written for fasm but can be assembled with fasmg without any changes. This snippet from Understanding fasm is one such example:

Code:
file 'interp.asm'
repeat $
    load A byte from %-1
    if A>='a' & A<='z'
        A = A-'a'+'A'
    end if
    store byte A at %-1
end repeat

While the features of the assembly module may work more or less the same, there are a couple of things that were processed by the parser module in fasm and are either not present or noticeably modified in fasmg. The EQ and IN conditions are not implemented in fasmg, as they were inherently not safe to use in a general context (in fasm they were partially superseded by MATCH, and in fasmg there are also other possible replacements). EQTYPE is present in fasmg, but works very differently (fasm's EQTYPE was in part specific to x86 symbols), though it can still be used to detect strings. Anonymous labels are not implemented by fasmg, but they can easily be implemented through macroinstructions, including variations that could be hard to implement in fasm.
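The loop walks over every byte of the included file: with the output starting at address 0, `$` is simply the file size, and `%-1` is the zero-based offset of the current repetition. Its effect can be mirrored by this small Python function. It is only an equivalent of the transformation, not a model of how fasm or fasmg actually execute it, and `upcase_ascii` is a hypothetical name.

```python
def upcase_ascii(data: bytes) -> bytes:
    """Mirror the REPEAT/LOAD/STORE loop: upper-case ASCII letters in place."""
    out = bytearray(data)                 # the bytes included by FILE
    for index in range(len(out)):         # REPEAT $, with %-1 as the offset
        a = out[index]                    # LOAD A BYTE FROM %-1
        if ord("a") <= a <= ord("z"):     # IF A>='a' & A<='z'
            a = a - ord("a") + ord("A")   # A = A-'a'+'A'
        out[index] = a                    # STORE BYTE A AT %-1
    return bytes(out)
```

For instance, `upcase_ascii(b"mov eax, 1")` gives `b"MOV EAX, 1"`; bytes outside `a` to `z` are left untouched, exactly as the IF condition requires.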
Symbols starting with a dot may appear to behave the same way as in fasm, but they follow a different set of rules in fasmg (this is related to the fact that fasmg relies heavily on namespaces, and it treats the dot as a special character).

The basic definitions of assembly symbols are the same as in fasm. Labels are defined with ":" or LABEL and can be forward-referenced; variables are defined with "=" and can be forward-referenced only when they have exactly one value. Forward-referenced values are resolved by fasmg in the same way as fasm did it, so this traditional snippet assembles as it used to:

Code:
dd x,y
x = (y-2)*(y+1)/2-2*y
y = x+1

The "=:" operator defines a new value while preserving the previous one, which RESTORE can then bring back:

Code:
a =: 1
a =: 2
restore a   ; brings back a = 1

There are still a couple of features of fasm's assembler module that are not present in fasmg. There is no ALIGN or TIMES; I decided it is better to have them as macros only (ALIGN benefits from customization through macros, while TIMES was in fasm only for semi-compatibility with NASM anyway). Also, syntax like "a = byte 1" is not allowed in fasmg. It probably could be added, but the reason for omitting it is similar to why STORE has a new recommended syntax: fasmg allows sizes to be specified by numerical expressions, and symbols like "byte" simply have numerical values, so placing two numerical expressions next to each other can lead to ill-defined boundaries between them.

EQU is present in fasmg, even though it was a preprocessor directive in fasm. As in fasm, it defines a symbolic value, with an effect similar to simple substitution of text:

Code:
nA = 2+2
dd nA*2     ; 8
sA equ 2+2
dd sA*2     ; 6

Since EQU is now a part of the unified language, it is affected by constructions like IF or REPEAT. A symbolic variable can even be forward-referenced, as long as it has a single definition. There is also an additional variant, REEQU, which is to EQU what "=" is to "=:" (it is like RESTORE and EQU combined into one operation).
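Two of the behaviours described above can be illustrated with a small Python model. `resolve_forward_refs` mimics resolving the traditional `dd x,y` snippet by repeated passes, under the assumptions that forward-referenced values are predicted as 0 in the first pass, that each later pass uses the values from the previous one, and that division truncates toward zero. `numeric_then_use` and `textual_then_use` contrast "=" (evaluate, then use) with EQU (substitute text, then evaluate). All names are hypothetical; this is a sketch, not fasmg's implementation.

```python
import re


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (an assumption of this model)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def resolve_forward_refs(max_passes: int = 100):
    """Resolve 'dd x,y / x = (y-2)*(y+1)/2-2*y / y = x+1' by repeated passes."""
    predicted = {"x": 0, "y": 0}          # first-pass guesses for unknown values
    for pass_no in range(1, max_passes + 1):
        emitted = (predicted["x"], predicted["y"])   # DD sees only predictions
        y_guess = predicted["y"]                     # y is defined after x
        x = trunc_div((y_guess - 2) * (y_guess + 1), 2) - 2 * y_guess
        y = x + 1
        current = {"x": x, "y": y}
        if current == predicted:          # every prediction confirmed: done
            return pass_no, emitted
        predicted = current               # retry with the new values
    raise RuntimeError("no stable solution found")


def numeric_then_use(definition: str, use: str, name: str) -> int:
    """'=' semantics: evaluate the definition first, then use its value."""
    value = eval(definition, {"__builtins__": {}})
    return eval(use, {"__builtins__": {}}, {name: value})


def textual_then_use(definition: str, use: str, name: str) -> int:
    """EQU semantics: paste the definition's text into the use, then evaluate."""
    expanded = re.sub(rf"\b{re.escape(name)}\b", lambda m: definition, use)
    return eval(expanded, {"__builtins__": {}})
```

Under these assumptions the model stops in the second pass with x = -1 and y = 0. The equations also have the solution x = 6, y = 7, so which one a multipass resolver lands on depends on its initial predictions. The EQU case gives 6 because `sA*2` becomes the text `2+2*2`, while `nA*2` is `4*2`.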
REPEAT in fasmg combines the functions of fasm's REPEAT and REPT:

Code:
repeat 3, counter:0
    byte#counter db counter
end repeat

Also, "`" now works very differently, though it often gives results similar to fasm:

Code:
repeat 3, counter:0
    byte#counter db `counter
end repeat

Macroinstructions in fasmg use the syntax of the assembler:

Code:
macro nop
    db 90h
end macro

Code:
macro definitions arg&
    irp <name,value>, arg
        name = value
    end irp
end macro

definitions a,1, b,2, c,3

Code:
irp str, 'alpha','beta','gamma'
    repeat %%
        dd offset#%
    end repeat
    repeat %%
        indx %
        offset#% db str
    end repeat
    break
end irp

Macros are now symbols of the assembler, and therefore they are handled like other kinds of such symbols; for example, a macro can be forward-referenced when it has a single definition. Such a macro can also call itself to create a recursion. But if a macro is redefined and becomes a variable, then when it calls itself it actually calls the previous value, just like in fasm.

Like IRP and MACRO, STRUC, MATCH and POSTPONE are also implemented in fasmg with the END syntax instead of braces. The traditional escaping that fasm required in nested preprocessing structures is not needed with fasmg constructions: the assembler controls the nesting levels in the same way as fasm always did with IF/WHILE/REPEAT; the "#" character is never removed from the text, so it is not affected by nesting; and the "`" character is only processed when used with the right parameter name, so it is enough to use non-conflicting names for the parameters at different levels of nesting. For the rare cases when a macro needs to open another macro without closing it, there is an ESC command.
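To make the REPEAT and IRP examples more concrete, here is a toy Python model of the lines they produce. It assumes that the REPEAT counter is replaced by its decimal value, that "`" turns the parameter into a quoted string, and that "#" merely joins neighbouring tokens. For the IRP example it assumes that `%%` is the number of IRP values, `%` is the index of the innermost repetition, INDX switches `str` to the value with the given index, and BREAK leaves IRP after its first iteration. Under that reading the two REPEAT loops run one after another inside a single IRP iteration, giving a table of pointers followed by the strings. The function names are hypothetical and only naive text replacement is done.

```python
import re


def expand_repeat(count: int, name: str, base: int, line: str) -> list[str]:
    """Toy expansion of 'repeat count, name:base' over a single body line."""
    out = []
    for value in range(base, base + count):
        # `name becomes a quoted string, e.g. '0'
        text = re.sub(rf"`{re.escape(name)}\b", lambda m: f"'{value}'", line)
        # a bare name becomes the decimal number
        text = re.sub(rf"\b{re.escape(name)}\b", lambda m: str(value), text)
        out.append(text.replace("#", ""))  # toy: '#' just joins tokens
    return out


def expand_irp_table(values: list[str]) -> list[str]:
    """Toy expansion of the IRP/REPEAT/INDX/BREAK example."""
    out = []
    total = len(values)                        # %% inside IRP
    for current in values:                     # IRP str, ...
        for rep in range(1, total + 1):        # first REPEAT %%
            out.append(f"dd offset{rep}")
        for rep in range(1, total + 1):        # second REPEAT %%
            current = values[rep - 1]          # INDX %: str takes its rep-th value
            out.append(f"offset{rep} db {current}")
        break                                  # BREAK: leave IRP after one iteration
    return out
```

With `expand_repeat(3, "counter", 0, "byte#counter db `counter")` the model yields `byte0 db '0'`, `byte1 db '1'` and `byte2 db '2'`, and `expand_irp_table(["'alpha'", "'beta'", "'gamma'"])` yields three `dd offsetN` lines followed by `offset1 db 'alpha'`, `offset2 db 'beta'` and `offset3 db 'gamma'`.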
Tomasz Grysztar 26 Mar 2015, 15:37
A few notes about the implementation:
I am preparing the package with the manual, sources and a few examples - I am going to post it here soon.
JohnFound 26 Mar 2015, 17:23
I like it! It is great to get rid of curly braces, backslashes and the differences between the preprocessor and assembly stages. On the other hand, removing "common", "forward" and "reverse" seems to make the syntax a bit more complex and less readable (for example, I didn't understand the example with irp, repeat and indx: are there nested loops, only one loop, or two sequential loops?).

I also have several questions:
I understand that speed is not important just now, but still, what is the estimated speed of FASMG relative to FASM?
Isn't implementing the instruction set through macros too slow, especially for such complex CPUs as x86 and x86-64? Or too complex?
How about size optimization of the instructions?
Are you planning to implement built-in instruction handling in addition to FASMG? Maybe in some modular way that allows different instruction sets to be switched in easily?
What binary formats are supported?
Are FAS files generated as in FASM? Or maybe in a different format?

Sorry for so many questions, but I am really interested, because the syntax seems to be really good, addressing most of FASM's flaws.

BTW: Once the preprocessor and the assembler are joined, why not join the assembly symbols and the preprocessor symbols? Why keep two different and incompatible entities instead of one? I understand that it would need to handle two different types of data (numbers and symbols) in one variable, but type conversion is not impossible and may even become a very flexible feature. In addition, some directives would become redundant and the syntax cleaner. IMHO.
Tomasz Grysztar 26 Mar 2015, 17:50
JohnFound wrote: I understand that the speed is not important just now, but still, what is the estimated speed of FASMG related to FASM?
JohnFound wrote: Isn't implementation of the instruction set through macros too slow, especially for such complex CPUs as x86 and x86-64? Or too complex?
JohnFound wrote: How about the size optimizations of the instructions?
JohnFound wrote: Are you planning to implement built-in instruction handling in addition to FASMG? Maybe in some modular way that to allow different instruction sets to be switched in easy?
JohnFound wrote: What binary formats are supported?
JohnFound wrote: Are FAS files generated as in FASM? Or maybe in different format?
Tomasz Grysztar 26 Mar 2015, 17:58
Here comes the package I prepared. I am a bit tired of this project at the moment, so I did the final packaging a bit hastily, but I hope this is enough for a first preview.
UPDATE: I have made this package available on the official download page; you can get the latest version there whenever new updates are made. Last edited by Tomasz Grysztar on 13 Jun 2015, 18:16; edited 1 time in total
revolution 26 Mar 2015, 23:35
Can macros access a previous decision from a previous pass and declare that a further pass is required?
l_inc 27 Mar 2015, 01:19
Tomasz Grysztar wrote:
A nice feature I'm missing in fasm 1. There are some inaccuracies in the manual... Some proofreading might be desirable (I'd like to volunteer).
revolution
Quote: Can macros access a previous decision from a previous pass and declare that a further pass is required?
This is possible with fasm 1. It's possible to take values from any specific previous pass and use them in the following passes, or to do something only if a condition is true at some specific pass, etc. So I guess the mechanism for doing the same wouldn't change.
_________________
Faith is a superposition of knowledge and fallacy
revolution 27 Mar 2015, 01:48
l_inc wrote:
l_inc 27 Mar 2015, 23:05
revolution
I can't unambiguously decipher your clarification, but it seems like you want to declare a need for another pass by checking whether some constraint is satisfied. This is a rather convoluted way of thinking, because multipass processing is there precisely to satisfy such constraints. So instead of attempting some explicit pass-specific checking and pass-count control, you just need to specify the constraint, and fasm then applies as many passes as needed to make it true.
revolution 28 Mar 2015, 05:20
What I mean is something like this ARM code:
Code:
thumb
cmp r0,0x102-y
y:

Code:
F1B00FFE 7M ---> cmp r0,0xFE

Note that the alternative narrow encoding is this:

Code:
28FE V4T ---> cmp r0,0xFE
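The snippet can be checked by hand for a self-consistent choice of size. The sketch below assumes the instruction starts at address 0 (the snippet has no ORG), that the 16-bit form (like `28FE`) only accepts an unsigned 8-bit immediate, that the 32-bit form (like `F1B00FFE`) can encode either value, and that the assembler picks the short form whenever the immediate fits. The helper names are hypothetical.

```python
NARROW_SIZE = 2   # 16-bit Thumb CMP r0,#imm8 (e.g. 28FE)
WIDE_SIZE = 4     # 32-bit Thumb-2 CMP encoding (e.g. F1B00FFE)


def fits_narrow(imm: int) -> bool:
    """The 16-bit CMP form only takes an unsigned 8-bit immediate."""
    return 0 <= imm <= 0xFF


def is_consistent(size: int, start: int = 0) -> bool:
    """Would a 'prefer the short form' rule pick this same size again?"""
    y = start + size                  # label y follows the instruction
    imm = 0x102 - y                   # operand of cmp r0,0x102-y
    chosen = NARROW_SIZE if fits_narrow(imm) else WIDE_SIZE
    return chosen == size
```

Both `is_consistent(NARROW_SIZE)` and `is_consistent(WIDE_SIZE)` return False: with the 2-byte form, y = 2 and the immediate is 0x100, which does not fit; with the 4-byte form, y = 4 and the immediate is 0xFE, which would fit the short form. Passes that re-decide the size each time can therefore only alternate. Moving the instruction (for example `start=2`) makes the narrow form consistent.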
Tomasz Grysztar 28 Mar 2015, 08:59
revolution's question is about the "oscillator problem", when some conditional assembly (like trying to optimize an instruction to its shortest form) causes the initial condition to be changed to its opposite, and this alternates between consecutive passes. The stream of passes with alternating conditions then never ends and the solution cannot be found.

[Note: the samples I provide below assemble exactly the same with either fasm 1.71 or the fasmg pre-release]

Let's have a look at this simplified example:

Code:
if v and 1 = 0
    db v shr 1      ; short opcode
else
    db 80h,v        ; long opcode
end if
v:

So, to make a source that is not self-contradictory, just like in the example of "if ~defined", we need to modify the conditions a bit. This one gets correctly resolved to the long form of the instruction:

Code:
if v and 1 = 0 & ~broken
    db v shr 1      ; short opcode
    broken = 0
else
    db 80h,v        ; long opcode
    broken = 1
end if
v:

You can look at it from the point of view of tracing the changes of values during consecutive passes (though that requires some knowledge of fasm's implementation): the symbol "broken" then simply records that the optimization to the short form failed at least once, and it prevents it from being chosen again (thus preventing the oscillation). But I prefer to describe problems like the above in terms of logical contradictions and removing them, because then this is less dependent on the particular implementation and is more a general feature of the language. Of course it is possible that the current implementation of fasm or fasmg may not be able to find a solution even when the source is not self-contradictory, and then knowledge of the implementation details may help to find the right set of modifications to help the assembler deal with the problem; for this reason, in the case of fasm I documented some of the details of how the predictions are made. But I still prefer to start with a general formulation for an abstract language, and leave the description of fasm's current implementation for an "appendix" (that is more or less what I did in the fasm 1 manual, though in a very compressed form).

revolution asked whether fasmg allows macros to trace the stored decisions across the passes. As l_inc rightly pointed out, this was never a real problem even in fasm 1, because macros are able to define as many local symbols as they need, and they can use these symbols to trace the predictions. The actual problem that applied to fasmarm was with the native instruction handlers, which in the fasm 1 architecture had a much harder time defining any internal symbols or other kind of data storage for this purpose. Macros never had such a problem.
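The two short/long opcode samples can be traced pass by pass with a small Python model. It assumes the instruction sits at address 0, so `v` is 1 after the short form and 2 after the long one, that unknown values (including `broken` in the first pass) are predicted as 0, and that assembly finishes when a pass reproduces all the values it started with. This is a simplification of how fasm and fasmg make predictions, and `assemble` is a hypothetical name.

```python
def assemble(use_broken_guard: bool, max_passes: int = 20):
    """Pass-by-pass model of the short/long opcode samples.

    Each pass uses the values from the previous pass as predictions;
    the model stops when a pass confirms every prediction.
    """
    predicted = {"v": 0, "broken": 0}   # first-pass guesses for unknown values
    history = []
    for pass_no in range(1, max_passes + 1):
        v, broken = predicted["v"], predicted["broken"]
        # 'if v and 1 = 0' (optionally '& ~broken')
        short = (v & 1) == 0 and not (use_broken_guard and broken)
        history.append("short" if short else "long")
        result = {"v": 1 if short else 2, "broken": 0 if short else 1}
        if result == predicted:
            return pass_no, history
        predicted = result
    return None, history   # gave up: the passes kept alternating
```

Without the guard the model never settles (short, long, short, long, ...). With it, the sequence is short, long, long and it stops in the third pass with the long form, which matches the result described for the second sample.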
revolution 28 Mar 2015, 11:09
Thanks for the explanation.
Also, how does it treat floating point numbers? Specifically:
What do SHR, SHL, BSR and BSF do with FP inputs?
Are variables and constants tagged in any way as floats?
Can we use +, -, *, / etc. to do arithmetic in float space?
Can we convert from float to integer and/or integer to float?
Tomasz Grysztar 28 Mar 2015, 13:09
A small follow-up on the topic of the problematic "oscillating" optimizations: in my design notes for fasm2 I wrote down an idea for an additional feature that would allow attempts to solve this kind of problem by measuring something like the "stability" of input values. I did not test whether this is viable, but I hoped that the fasmg engine would make it easier to test various ideas of this kind.
revolution wrote: Also, how does it treat floating point numbers? Specifically: What do SHR, SHL, BSR and BSF do with FP inputs? Are variables and constants tagged in any way as floats? Can we use +, -, *, / etc. to do arithmetic in float space? Can we convert from float to integer and/or integer to float?
revolution 28 Mar 2015, 14:31
Most useful would be the ability to define the float format. For example, ARM uses various formats: 1, 2, 4 and 8 bytes. The 4- and 8-byte formats are the usual IEEE 754 ones. The 2-byte format has 2 variants, and the 1-byte format is used by some coprocessor modules.
Further to the 2-byte format: one of the variants does not encode any infinity value, so the exponent can be all 1s and still encode a normal float value.
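The two 2-byte variants are not named here; presumably they are IEEE 754 half precision and ARM's alternative half-precision format, which uses the same bit layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) but treats the all-ones exponent as an ordinary exponent instead of infinities and NaNs. Assuming that, the difference can be shown with a small Python decoder. This is not a fasmg feature, only an illustration of what a user-definable float format would have to describe, and `decode_half` is a hypothetical name.

```python
import math


def decode_half(bits: int, alternative: bool = False) -> float:
    """Decode a 16-bit float: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.

    With alternative=True the all-ones exponent is treated as a normal
    exponent (no infinities or NaNs), extending the range.
    """
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0:                                  # zero or subnormal
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F and not alternative:           # IEEE 754 special values
        return sign * math.inf if fraction == 0 else math.nan
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)
```

The largest finite IEEE half value, `0x7BFF`, decodes to 65504, and `0x7C00` is infinity; in the alternative interpretation `0x7C00` becomes 65536 and `0x7FFF` reaches 131008.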
Tomasz Grysztar 28 Mar 2015, 14:52
revolution wrote: Most useful would be the ability to define the float format. For example for ARM there are various formats used: 1, 2, 4 and 8 bytes. The 4 and 8 byte formats are the usual IEEE754. The 2 byte format has 2 variants. And the 1 byte format for some coprocessor modules.
PS. I have updated the preview package above; I fixed a few small bugs today.
codestar 28 Mar 2015, 17:19
Good work. Say goodbye to \{\} forever.
What about the "save block/file" feature?
As for "`", the option to use `UPPERCASE would be nice to auto-create multiple names and texts from one generic "name": NAME, 'name', ID_NAME, TYPE_NAME, name_fp, name_import, 'NAME.BMP', etc. How about ``name or `+name?
Is this redefine for both define/equ?

Quote:
redefine seq cdr

What are these .type/.size applied to macro parameters? Does "." (ns.size) have a different meaning?

Code:
if a.size = 0
    err 'Size not specified'
end if
if a.type = 'mem' | a.type = 'reg'
    ; ...
end if

macro parse_operand ns,op
    match =byte? value, op
        ns.size = 1
        parse_operand_value ns,value
    ; ...

???
Is "esc" a return from a macro?
Any %line, %date, %file/%name tags?
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.