On my new assembler

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8434
Location: Kraków, Poland

Tomasz Grysztar 26 Mar 2015, 15:37

A few notes about the implementation:

I have written it as a 32-bit program, using a very basic 386+ instruction set and avoiding any instructions that have no counterparts in long mode. My idea was that a program using just a basic instruction set and a few registers could be more easily converted to other architectures, though I only really had x86-64 in mind. Such a simple implementation also means that there are no sophisticated optimizations, but processing speed was obviously not my priority here.
The EBP register is left unused by this implementation. I reserved it so that it could be used to address all of the assembler's variables, as something like that would be needed to create a thread-safe library.
The engine is designed so that it should be relatively easy to turn it into a library giving access to an internal API, which would allow extending the language with additional symbols (like instruction sets) even at run time. The assemblers for specific architectures (like x86) could perhaps be created in such a modular way.
Currently there are console interfaces for Windows and Linux. The Linux one links to libc, because fasmg requires a well-performing malloc/realloc API; under Windows it uses HeapAlloc/HeapReAlloc.
Floating-point numbers are not supported yet, though the engine has reserved places for their implementation. They would be another type of expression value (recognizable by EQTYPE) in addition to strings and numeric (linear polynomial) values. As I felt I was already losing too much time on this project, I decided to leave out floating-point values for now, since I did not need them for any of the initial examples.
A known bug: the counters created by REPEAT are a special kind of token, equivalent to a plain decimal number, but when such a counter is compared with the same decimal number using MATCH, the two are not considered identical. I initially left this case out of the implementation of MATCH and was too lazy to get back to it.
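The "numeric (linear polynomial) values" mentioned in these notes can be pictured with a small model. The Python sketch below is not fasmg code; the class name `LinearValue` and the term name `"base"` are made up, and a term simply stands for an unknown such as a relocatable base address. It shows the property that matters: addition, subtraction and scaling by a constant keep a value linear, terms cancel when two values share the same base, and a product of two non-constant values is rejected.

```python
class LinearValue:
    """Hypothetical model: a numeric value as constant + sum(coefficient * term)."""

    def __init__(self, constant=0, terms=None):
        self.constant = constant
        # Drop zero coefficients so that terms which cancel out disappear.
        self.terms = {t: c for t, c in (terms or {}).items() if c != 0}

    @classmethod
    def term(cls, name):
        """A value consisting of a single symbolic term with coefficient 1."""
        return cls(0, {name: 1})

    def is_constant(self):
        return not self.terms

    def __add__(self, other):
        other = _lift(other)
        terms = dict(self.terms)
        for t, c in other.terms.items():
            terms[t] = terms.get(t, 0) + c
        return LinearValue(self.constant + other.constant, terms)

    __radd__ = __add__

    def __neg__(self):
        return LinearValue(-self.constant, {t: -c for t, c in self.terms.items()})

    def __sub__(self, other):
        return self + (-_lift(other))

    def __mul__(self, other):
        other = _lift(other)
        if not other.is_constant():
            if not self.is_constant():
                # The product of two non-constant values is no longer linear.
                raise ValueError("result would not be a linear polynomial")
            return other * self
        k = other.constant
        return LinearValue(self.constant * k, {t: c * k for t, c in self.terms.items()})

    __rmul__ = __mul__

    def __eq__(self, other):
        other = _lift(other)
        return self.constant == other.constant and self.terms == other.terms

    def __repr__(self):
        parts = [f"{c}*{t}" for t, c in self.terms.items()]
        return " + ".join(parts + [str(self.constant)])


def _lift(x):
    """Treat plain integers as constant values."""
    return x if isinstance(x, LinearValue) else LinearValue(x)
```

For instance, with `base = LinearValue.term("base")`, the difference `(base + 0x30) - (base + 0x10)` reduces to the plain number 0x20, while multiplying two such base-relative values raises an error.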

I am preparing the package with the manual, sources and a few examples - I am going to post it here soon.

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 26 Mar 2015, 17:23

I like it! It is great to get rid of curly braces, backslashes and the differences between the preprocessing and assembly stages.

On the other hand, removing "common", "forward" and "reverse" seems to make the syntax a bit more complex and less readable. (For example, I didn't understand the example with irp, repeat and indx: are there nested loops, only one loop, or two sequential loops?)

I also have several questions:
I understand that speed is not important just now, but still, what is the estimated speed of FASMG relative to FASM?

Isn't implementation of the instruction set through macros too slow, especially for such complex CPUs as x86 and x86-64? Or too complex? How about the size optimizations of the instructions?

Are you planning to implement built-in instruction handling in addition to FASMG? Maybe in some modular way that would allow different instruction sets to be switched easily?

What binary formats are supported?
Are FAS files generated as in FASM? Or maybe in different format?

Sorry for so many questions, but I am really interested, because the syntax seems to be really good, addressing most of FASM's flaws.

BTW: If the preprocessor and the assembler are now joined, why not also join the assembly symbols and the preprocessor symbols? Why keep two different and incompatible entities instead of one? I understand that it would need to handle two different types of data (numbers and symbols) in one variable, but type conversion is not impossible and may even become a very flexible feature. In addition, some directives would become redundant and the syntax cleaner. IMHO.

Tomasz Grysztar 26 Mar 2015, 17:50

JohnFound wrote:

I understand that speed is not important just now, but still, what is the estimated speed of FASMG relative to FASM?

It is hard to tell, since there are very few constructs that can be directly compared. On sources that can be assembled with both assemblers, fasmg is usually a bit slower, though in some cases it was actually faster.

JohnFound wrote:

Isn't implementation of the instruction set through macros too slow, especially for such complex CPUs as x86 and x86-64? Or too complex?

Yes, of course the implementation of x86 as macros is incredibly complex and slow. This is what I wanted to emphasize when I wrote that the slowness of fasmg is further aggravated by the fact that the only examples of "real" assembly must be done with such complex macros, and this may give this engine "bad publicity". In my tests, processing any syntax through macros was about one order of magnitude slower than the native implementation. On the other hand, it was really satisfying to get it all working.

JohnFound wrote:

How about the size optimizations of the instructions?

The IF directive combined with fasm's multi-pass resolving gives the same results as the native implementations of these optimizations. Actually, it could be done the same way with fasm 1, as examples like the DCPU-16 macros demonstrated.
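The idea of letting IF pick an encoding and relying on repeated passes can be modelled outside the assembler. The Python sketch below is only an illustration under simplifying assumptions: a made-up relative jump with a 2-byte short form for displacements in -128..127 and a 5-byte long form, and labels predicted as 0 before they have been seen. It is not how fasm or fasmg works internally; it just shows passes being repeated until every label value agrees with the prediction it was computed from.

```python
SHORT_SIZE, LONG_SIZE = 2, 5  # hypothetical encodings of a relative jump


def assemble(program, max_passes=100):
    """Repeat passes until the label values stop changing.

    program: list of ("label", name), ("jmp", name) or ("data", size) items.
    Returns (number_of_passes, labels, total_size).
    """
    predicted = {}  # label values from the previous pass; unknown labels count as 0
    for pass_no in range(1, max_passes + 1):
        address, labels = 0, {}
        for kind, arg in program:
            if kind == "label":
                labels[arg] = address
            elif kind == "jmp":
                # The IF-like decision: take the short form when the predicted
                # displacement (from the end of the short form) fits in a byte.
                disp = predicted.get(arg, 0) - (address + SHORT_SIZE)
                address += SHORT_SIZE if -128 <= disp <= 127 else LONG_SIZE
            elif kind == "data":
                address += arg
        if labels == predicted:
            return pass_no, labels, address
        predicted = labels
    raise RuntimeError("no stable solution found")
```

A jump over 10 bytes settles on the short form in the second pass, while a jump over 200 bytes needs a third pass to settle on the long form. With a different condition such a loop may never settle, which is the "oscillator problem" that comes up later in this thread.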

JohnFound wrote:

Are you planning to implement built-in instruction handling in addition to FASMG? Maybe in some modular way that would allow different instruction sets to be switched easily?

I considered fasmg to be a prototype and perhaps a base engine for fasm2, but whether I ever try to make such a fasm2 depends on whether there is any interest in such a project. For my personal needs fasm1+fasmg is actually enough for now, and a project like fasm2 would be a huge undertaking for me.

JohnFound wrote:

What binary formats are supported?

There are no built-in formats, the examples generate everything (including relocations) by themselves.

JohnFound wrote:

Are FAS files generated as in FASM? Or maybe in different format?

There are no symbol-dumping or assembly-dumping facilities in this engine at the moment. It's another piece of work I decided to set aside, like floating-point support.

Tomasz Grysztar 26 Mar 2015, 17:58

Here comes the package I prepared. I am a bit tired of this project at the moment, so I did the final packaging somewhat hastily, but I hope it is enough for a first preview.

UPDATE: I have made this package available on the official download page, you can get the latest version there when any new updates are made.

Last edited by Tomasz Grysztar on 13 Jun 2015, 18:16; edited 1 time in total

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20689
Location: In your JS exploiting you and your system

revolution 26 Mar 2015, 23:35

Can macros access a previous decision from a previous pass and declare that a further pass is required?

l_inc

Joined: 23 Oct 2009
Posts: 881

l_inc 27 Mar 2015, 01:19

Tomasz Grysztar wrote:

Code:

irp <name,value>, arg

A nice feature I miss in fasm 1. There are some inaccuracies in the manual, so some proofreading might be desirable (I'd like to volunteer).

revolution

Quote:

Can macros access a previous decision from a previous pass and declare that a further pass is required?

This is possible with fasm 1. It's possible to take values from any specific previous pass and use them in the following passes, or to do something only if a condition is true at some specific pass, etc. So I guess the mechanism for doing the same wouldn't change.

_________________
Faith is a superposition of knowledge and fallacy

revolution 27 Mar 2015, 01:48

l_inc wrote:

Quote:
Can macros access a previous decision from a previous pass and declare that a further pass is required?

This is possible with fasm 1. It's possible to take values from any specific previous pass and use them in the following passes, or to do something only if a condition is true at some specific pass, etc. So I guess the mechanism for doing the same wouldn't change.

I mean only for the current instruction. Not from something further back.

l_inc 27 Mar 2015, 23:05

revolution
I can't unambiguously decipher your clarification, but it seems like you want to declare a need for another pass by checking whether some constraint is satisfied. This is a somewhat convoluted way of thinking, because multi-pass processing is there precisely to satisfy such constraints. So instead of attempting some explicit pass-specific checking and pass-count control, you just need to specify the constraint, and fasm then applies as many passes as needed to make it true.

revolution 28 Mar 2015, 05:20

What I mean is something like this ARM code:

Code:

thumb
cmp r0,0x102-y
y:

If we don't take into account previous passes and the encodings that were tried, then we end up with the message "error: code cannot be generated". We need some way to tell whether previous passes tried but failed to find a solution. The only way to solve this is to encode an "inefficient" wide instruction with a small constant value, i.e.:

Code:

F1B00FFE 7M ---> cmp r0,0xFE

Without knowledge of previous passes we get stuck in a loop, alternately encoding narrow and wide instructions over and over.

Note that the alternative narrow encoding is this:

Code:

28FE V4T ---> cmp r0,0xFE

But we can't use it, because it causes a new pass in which "y" has a new value, and then "cmp r0,0x100" has to be encoded, which only fits in a wide encoding.

Tomasz Grysztar 28 Mar 2015, 08:59

The question from revolution is about the "oscillator problem": some conditional assembly (like trying to optimize an instruction to its shortest form) causes the initial condition to change to its opposite, and this alternates between consecutive passes. The stream of passes with alternating conditions then never ends, and no solution can be found.

[Note: the samples I provide below assemble exactly the same with either fasm 1.71 or fasmg pre-release]

Let's have a look at this simplified example:

Code:

if v and 1 = 0
  db v shr 1  ; short opcode
else
  db 80h,v    ; long opcode
end if
v:

This would be an instruction in some hypothetical architecture which has a short form for even addresses but requires a long form when the address is odd. Formulated as above, it is in fact a plainly self-contradictory source, so it is not surprising that the assembler is unable to find a solution. It is just like the classic "if not defined, then define it here" antinomy in fasm.

So, to make a source that is not self-contradictory, just as in the "if ~defined" example, we need to modify the conditions a bit. This one gets correctly resolved to the long form of the instruction:

Code:

if v and 1 = 0 & ~broken
  db v shr 1  ; short opcode
  broken = 0
else
  db 80h,v    ; long opcode
  broken = 1
end if
v:

(Also, if an additional byte were put just before "v", it would resolve correctly as well, this time to the short form.)

You can look at it from the point of view of tracing how values change during consecutive passes (though that requires some knowledge of fasm's implementation): the symbol "broken" then simply records that the optimization to the short form failed at least once, and prevents that form from being chosen again (thus preventing the oscillation).
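The pass-by-pass behavior of both samples can be replayed with a short Python model. It is a sketch under stated assumptions, not a description of fasm's actual prediction machinery: the code is assumed to start at address 0, so `v` equals the size of everything before it, and both `v` and `broken` are assumed to be predicted as 0 in the first pass. Each pass uses the values from the previous pass, and the process stops when the values it produces match the ones it assumed.

```python
def simulate(use_broken, extra_byte=False, max_passes=50):
    """Replay the hypothetical short/long instruction from the fasm samples.

    use_broken: whether the '& ~broken' guard is part of the IF condition
                (without it, the tracked 'broken' value never affects a decision).
    extra_byte: whether one more byte is placed just before the label v.
    Returns (pass_count, emitted_bytes), or None if the passes never settle.
    """
    # Assumption: before any pass has defined them, v and broken are predicted as 0.
    predicted = {"v": 0, "broken": 0}
    for pass_no in range(1, max_passes + 1):
        v = predicted["v"]
        if v & 1 == 0 and not (use_broken and predicted["broken"]):
            out, broken = [v >> 1], 0          # short opcode: db v shr 1
        else:
            out, broken = [0x80, v & 0xFF], 1  # long opcode: db 80h,v
        if extra_byte:
            out.append(0)
        # Code starts at address 0, so label v equals the number of bytes emitted.
        result = {"v": len(out), "broken": broken}
        if result == predicted:
            return pass_no, out
        predicted = result
    return None
```

Under these assumptions the unguarded source alternates between the 1-byte and 2-byte forms forever, the guarded one settles on the long form in the third pass, and with an extra byte before `v` it settles on the short form in the second pass, matching the behavior described above.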
But I prefer to describe problems like the one above in terms of logical contradictions and removing them, because then the description is less dependent on a particular implementation and is more a general feature of the language. Of course, the current implementation of fasm or fasmg may be unable to find a solution even when the source is not self-contradictory, and then knowledge of the implementation details may help to find the right set of modifications to help the assembler deal with the problem. For this reason, in the case of fasm I documented some of the details of how the predictions are made. But I still prefer to start with a general formulation for an abstract language, and leave the description of fasm's current implementation for an "appendix" (that is more or less what I did in the fasm 1 manual, though in a very compressed form).

revolution asked whether fasmg allows macros to trace the stored decisions across passes. As l_inc rightly pointed out, this was never a real problem, even with fasm 1, because macros are able to define as many local symbols as they need and can use these symbols to trace the predictions. The actual problem that affected fasmarm was with the native instruction handlers, which in the fasm 1 architecture had a much harder time defining internal symbols or any other kind of data storage for this purpose. Macros never had this problem.

revolution 28 Mar 2015, 11:09

Thanks for the explanation.

Also, how does it treat floating point numbers? Specifically: What do SHR, SHL, BSR and BSF do with FP inputs? Are variables and constants tagged in any way as floats? Can we use +, -, *, / etc. to do arithmetic in float space? Can we convert from float to integer and/or integer to float?

Tomasz Grysztar 28 Mar 2015, 13:09

A small follow-up on the topic of the problematic "oscillating" optimizations: in my design notes for fasm2 I wrote down an idea for an additional feature that would allow attempts to solve this kind of problem by measuring something like the "stability" of input values. I did not test whether this is viable, but I hoped that the fasmg engine would make it easier to test various ideas of this kind.

revolution wrote:

Also, how does it treat floating point numbers? Specifically: What do SHR, SHL, BSR and BSF do with FP inputs? Are variables and constants tagged in any way as floats? Can we use +, -, *, / etc. to do arithmetic in float space? Can we convert from float to integer and/or integer to float?

As I wrote a few posts earlier, I did not implement floats in this first version. I decided that this is one of the features that could be left out initially, to reduce the amount of work required to get the first version out. But I designed the data structures with floats in mind: when I add them, they are going to be another type of expression value, like the string and numeric ones. Every operator is aware of the types of its input values and can act accordingly. Currently this usually means converting a string to a numeric value when a numeric operator is applied (for example (+'a') is a numeric value, while plain ('a') is still a string), and the "string" operator allows the conversion in the opposite direction (like (string 'Hello'+20h)). A "float" operator would convert a standard numeric value to a float, but the exact behavior of all the other operators would still need to be decided. Probably some of them would simply disallow input values of float type (and "float" could also disallow input values of string type, since the conversion could still be forced with "float +"), but at least standard arithmetic on floats would be nice to have.
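The string/numeric conversions described above can be sketched in Python, assuming (as in fasm) that the first character of a string ends up in the lowest byte of the number, and reading `string 'Hello'+20h` as applying `string` to `'Hello'+20h`. The function names are hypothetical, and the output length chosen in `number_to_string` is an assumption of this sketch rather than documented fasmg behavior.

```python
def string_to_number(text: str) -> int:
    """Read the bytes of a string as a little-endian number, e.g. 'a' -> 0x61."""
    return int.from_bytes(text.encode("latin-1"), "little")


def number_to_string(value: int) -> str:
    """Inverse conversion; assumption: use as many bytes as the value needs."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "little").decode("latin-1")


def unary_plus(value):
    """A numeric operator converts a string operand first, like (+'a')."""
    return string_to_number(value) if isinstance(value, str) else value
```

With this model, converting `'Hello'+20h` back to a string gives 'hello', because adding 20h only changes the lowest byte, turning 'H' (48h) into 'h' (68h).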

revolution 28 Mar 2015, 14:31

Most useful would be the ability to define the float format. For example, ARM uses various formats: 1, 2, 4 and 8 bytes. The 4- and 8-byte formats are the usual IEEE 754 ones. The 2-byte format has two variants, and the 1-byte format is used by some coprocessor modules.

Further, one of the 2-byte variants does not encode any infinity value, so the exponent can be all 1s and still encode a normal float value.

Tomasz Grysztar 28 Mar 2015, 14:52

revolution wrote:

Most useful would be the ability to define the float format. For example, ARM uses various formats: 1, 2, 4 and 8 bytes. The 4- and 8-byte formats are the usual IEEE 754 ones. The 2-byte format has two variants, and the 1-byte format is used by some coprocessor modules.

Further, one of the 2-byte variants does not encode any infinity value, so the exponent can be all 1s and still encode a normal float value.

What I planned is what fasm 1 does: use its own high-precision format (at least quad precision) for handling floating-point values until a value is finally used with DD, DQ or another such directive. Only then would the final conversion into the selected format occur. And if I added some operators for extracting the fp fields, one could also create custom formats with macros, in addition to the IEEE ones created with the standard data directives.
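The "keep the value exact, convert only at the data directive" approach can be sketched with a single encoder parameterized by field widths. This Python sketch uses `Fraction` as a stand-in for a high-precision internal format; the function name and its parameters are hypothetical and not part of fasm or fasmg. The `has_infinity=False` option models a half-precision variant without infinity, like the one revolution mentions (ARM's alternative half-precision format), where the all-ones exponent encodes ordinary numbers.

```python
from fractions import Fraction


def encode_float(value, exp_bits, man_bits, has_infinity=True):
    """Pack a number into a binary float format with the given field widths.

    The value is kept exact (as a Fraction) until this point and rounded once,
    to nearest with ties to even. Zero and normal numbers are supported;
    subnormals, infinities and NaNs are deliberately left out of this sketch.
    """
    value = Fraction(value)
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    if magnitude == 0:
        return sign << (exp_bits + man_bits)
    # Find e such that 2**e <= magnitude < 2**(e + 1).
    e = magnitude.numerator.bit_length() - magnitude.denominator.bit_length()
    if Fraction(2) ** e > magnitude:
        e -= 1
    # Significand with man_bits fraction bits; round() on a Fraction ties to even.
    significand = round(magnitude / Fraction(2) ** e * 2 ** man_bits)
    if significand == 2 ** (man_bits + 1):  # rounding carried into a new bit
        significand //= 2
        e += 1
    biased = e + 2 ** (exp_bits - 1) - 1
    # If infinity is encodable, the all-ones exponent is reserved for it (and NaN).
    max_biased = 2 ** exp_bits - (2 if has_infinity else 1)
    if biased > max_biased:
        raise OverflowError("value too large for this format")
    if biased < 1:
        raise ValueError("subnormal results are not handled in this sketch")
    return (sign << (exp_bits + man_bits)) | (biased << man_bits) | (significand - 2 ** man_bits)
```

For example, `encode_float(1.0, 5, 10)` gives the IEEE half-precision pattern 0x3C00 and `encode_float(1.0, 8, 23)` gives the single-precision 0x3F800000, while `encode_float(131008, 5, 10, has_infinity=False)` gives 0x7FFF, a value the IEEE half format cannot hold; with `has_infinity=True` the same call raises OverflowError.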

PS. I have updated the preview package above, I fixed a few small bugs today.

codestar

Joined: 25 Dec 2014
Posts: 254

codestar 28 Mar 2015, 17:19

Good work. Say goodbye to \{\} forever.

What about the "save block/file" feature?

As for "`", the option to use `UPPERCASE would be nice to auto-create multiple names and texts from one generic "name": NAME, 'name', ID_NAME, TYPE_NAME, name_fp, name_import, 'NAME.BMP', etc. How about ``name or `+name?

Is this redefine for both define/equ?

Quote:

redefine seq cdr

Do if/define/equ work the same together? Have you considered a type of "if symbolic" (ifs)?

What's this .type/.size applied to macro parameters? Does . (ns.size) have a different meaning?

Code:

if a.size = 0
  err 'Size not specified'
end if
if a.type = 'mem' | a.type = 'reg'
  ; ...
end if

macro parse_operand ns,op
        match =byte? value, op
                ns.size = 1
                parse_operand_value ns,value
; ... ???

I thought the "else match" syntax might be impossible, since there is no way to tell from "else" alone that it corresponds to the match.

Is the "esc" a return from macro?

Any %line, %date, %file/%name tags?

Tomasz Grysztar 28 Mar 2015, 18:22

codestar wrote:

What about the "save block/file" feature?

It was not in the initial plan, but it may be added later.

codestar wrote:

As for "`", the option to use `UPPERCASE would be nice to auto-create multiple names and texts from one generic "name": NAME, 'name', ID_NAME, TYPE_NAME, name_fp, name_import, 'NAME.BMP', etc. How about ``name or `+name?

Any advanced source-text processing of this kind should be achievable with EVAL, which allows much more than just that.

codestar wrote:

Is this redefine for both define/equ?

REDEFINE is like a two-liner RESTORE+DEFINE; REEQU is like RESTORE+EQU.
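The stack-like behavior behind REDEFINE can be modelled in a few lines of Python. The class and method names are hypothetical; the sketch only stores raw text (as DEFINE does) and does not model how EQU replaces symbolic variables in its argument before storing it. Its point is that REDEFINE replaces the topmost value, so a loop that keeps redefining `seq` does not leave a growing stack of old values behind, unlike repeated DEFINE.

```python
class SymbolicVariable:
    """Hypothetical model of a symbolic variable whose values form a stack."""

    def __init__(self):
        self._values = []

    def define(self, text):
        self._values.append(text)  # DEFINE pushes a new value

    def restore(self):
        if self._values:  # this sketch silently ignores RESTORE on an empty stack
            self._values.pop()

    def redefine(self, text):
        self.restore()  # REDEFINE acts as RESTORE followed by DEFINE
        self.define(text)

    @property
    def value(self):
        return self._values[-1] if self._values else None
```

After `define("a b c")`, `define("b c")` and `redefine("c")`, the current value is "c", and a single RESTORE brings back "a b c".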

codestar wrote:

Do if/define/equ work the same together?

EQU is now affected by IF, but if a symbolic variable is used in the arguments to IF, it is still evaluated as the text of its value.

codestar wrote:

Have you considered a type of "if symbolic" (ifs)?

I considered something like IFDEF/IFNDEF that would work with symbolic variables, because (for the reason mentioned above) the "IF DEFINED" syntax cannot be used with symbols defined by EQU/DEFINE. But this is one of the many potential features that I have not yet found really necessary.

codestar wrote:

What's this .type/.size applied to macro parameters? Does . (ns.size) have a different meaning?

The "ns" is a parameter and it is replaced with the value of that parameter, which is the name of the parent namespace for the "size" symbol. Note that "." is now a special character - so when "ns" has a value of "@src", "ns.size" becomes simply "@src.size".
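The substitution step described here can be imitated with a regular expression. This Python sketch is a simplification under assumptions: the set of characters treated as part of a name is a guess, and fasmg's real tokenizer and namespace resolution are not modelled. It only shows why treating "." as a separate character lets the parameter `ns` inside `ns.size` be replaced, while a longer name such as `nsx` is left alone.

```python
import re

# Assumption for this sketch: a name is a run of these characters, and "." is not
# one of them, so "ns.size" is seen as the pieces "ns", ".", "size".
NAME = re.compile(r"[A-Za-z0-9_@?$%]+")


def substitute_parameters(line, arguments):
    """Replace each whole name that is a macro parameter with its argument text."""
    return NAME.sub(lambda m: arguments.get(m.group(0), m.group(0)), line)
```

With `ns` bound to `@src`, the line `ns.size = 1` becomes `@src.size = 1`, so the assignment lands on the `size` symbol in the `@src` namespace.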

codestar wrote:

Is the "esc" a return from macro?

No, it is just for adding a line to a macro body without disturbing the nesting constraints.

codestar wrote:

Any %line, %date, %file/%name tags?

Could be added quite easily. Note that in fasmg I use the rule that built-in symbols with names starting with "$" are the expression-class symbols, while the names starting with "%" always refer to parameters. You can check out the "%t" definition in TABLES.INC for an example.

Tomasz Grysztar 28 Mar 2015, 18:41

I have updated the package once again - I decided to switch to version number based on the timestamp.

JohnFound 28 Mar 2015, 19:41

Tomasz Grysztar wrote:

I have updated the package once again - I decided to switch to version number based on the timestamp.

Tomasz, do you use some version control system? If so, have you thought about publishing the repository? The chance to follow the creation of something good is really interesting and useful.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9

Tomasz Grysztar 28 Mar 2015, 19:49

JohnFound wrote:

Tomasz, do you use some version control system? If so, have you thought about publishing the repository? The chance to follow the creation of something good is really interesting and useful.

I used the Fossil SCM - I initially wanted to use git, because I already had some experience with it, but after you recommended Fossil I decided to give it a try and I found out that it suits my needs much better. I am undecided about publishing the repository, though.

JohnFound 28 Mar 2015, 21:02

Tomasz Grysztar wrote:

... but after you recommended Fossil I decided to give it a try...

That is why I asked.

It is great, because if a repository exists, regardless of your decision at this very moment, there is always hope it will be published later.
