flat assembler
Message board for the users of flat assembler.

Is FASM a good Backend language for creating a compiler?

Mino 26 Jan 2018, 18:08
Good morning, everybody,
Before going further with assembler language, and having already practiced a bit with FASM, I wanted to know if FASM was a good assembler language as a target language for a compiler. In my opinion, yes. But I'd like to hear yours!
Here is what I expect from this language as a compiler target:

    - Must have virtually complete control over the hardware,
    - Must be portable (Windows/Unix),
    - Must be fast,
    - Must be powerful,
    - Must be easy to debug.

With which other languages is FASM compatible?
And a final question: what are the strengths and weaknesses of FASM as a compiler backend?
Thank you in advance, and good day!

_________________
The best way to predict the future is to invent it.

alexfru 27 Jan 2018, 06:40
FASM is not an "assembler language". FASM is an assembler (or compiler for assembly language).

FASM is fast. Dunno about how it deals with debugging info (never needed it from FASM). The rest is largely irrelevant for the assembler and is mostly relevant for the compiler itself.

One little problem that I don't like about FASM is that if your compiler generates code like the following:

code
data
more code
more data
...

you'll need to take additional care of combining those fragments of code into a single .text section and those fragments of data into a single .data section (or two: .data and .bss). FASM doesn't do that automatically for you. NASM and YASM do this combining for you (btw, NASM is slow).

Unless your compiler generates a single assembly file (or one that includes all others) for the entire program, you'll also need a linker. This isn't a FASM-specific problem. Many assemblers just assemble one file (unless it includes more files) and produce just one file, which is raw binary (just your code and data, no extra metadata), object (suitable for combining with other such files) or some executable format (more complex than raw binary) ready for execution by the OS.

My C compiler uses NASM by default but can use FASM as well. However, there's an additional tool invoked to convert from NASM syntax (which is the syntax used by my compiler) to FASM syntax and to combine section fragments (explained above). And it all works well for 32-bit code (not tried 64-bit as my compiler doesn't support that (yet?); 16-bit code is a bit problematic, but you probably don't care about it anyway). My compiler's linker does not produce any debugging info other than a map file, so, like I said, I don't know if FASM generates it or if it's any good.
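
The fragment-combining step mentioned above can be sketched in a few lines of Python. This is only an illustration, not alexfru's actual tool: the function name `combine_fragments` and the simplified NASM-style `section NAME` input format are assumptions made for the example.

```python
def combine_fragments(lines):
    """Merge interleaved section fragments so each section appears once.

    Assumes simplified NASM-style input in which a line of the form
    `section NAME` switches the current section (a hypothetical format
    chosen for this sketch).
    """
    sections = {}   # section name -> body lines, in first-seen order
    current = None
    for line in lines:
        words = line.split()
        if len(words) >= 2 and words[0].lower() == "section":
            current = words[1]
            sections.setdefault(current, [])
        elif current is None:
            # Blank lines before the first directive are ignored.
            if line.strip():
                raise ValueError("content before the first section directive")
        else:
            sections[current].append(line)

    combined = []
    for name, body in sections.items():
        combined.append(f"section {name}")
        combined.extend(body)
    return combined


# Code and data interleaved the way a simple compiler might emit them.
fragmented = [
    "section .text",
    "f:",
    "    mov eax, [x]",
    "section .data",
    "x:  dd 1",
    "section .text",
    "    ret",
    "section .data",
    "y:  dd 2",
]
combined = combine_fragments(fragmented)
print("\n".join(combined))
```

With this input, the two `.text` fragments end up back to back, so `ret` directly follows the instruction before it, with no section boundary in between.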

revolution 27 Jan 2018, 06:45
alexfru wrote:
One little problem that I don't like about FASM is that if your compiler generates code like the following:

code
data
more code
more data
...

you'll need to take additional care of combining those fragments of code into a single .text section and those fragments of data into a single .data section (or two: .data and .bss). FASM doesn't do that automatically for you. NASM and YASM do this combining for you (btw, NASM is slow).
NASM and YASM don't generate exe files directly, they use linkers for the last step. This is the same for fasm, you use a linker for the last step if you want to combine all similar sections into one.

It is only fasm that has an additional feature to directly generate the exe file, and for that you need to do the section combining yourself.

alexfru 27 Jan 2018, 06:55
revolution wrote:
alexfru wrote:
One little problem that I don't like about FASM is that if your compiler generates code like the following:

code
data
more code
more data
...

you'll need to take additional care of combining those fragments of code into a single .text section and those fragments of data into a single .data section (or two: .data and .bss). FASM doesn't do that automatically for you. NASM and YASM do this combining for you (btw, NASM is slow).
NASM and YASM don't generate exe files directly, they use linkers for the last step. This is the same for fasm, you use a linker for the last step if you want to combine all similar sections into one.

It is only fasm that has an additional feature to directly generate the exe file, and for that you need to do the section combining yourself.


It's not about linking here. A single C source file (from which you typically make a single assembly/object file) can contain multiple chunks of code and data, not all neatly separated into all variables at the beginning of the file and all functions at the end. The same would be true of many other languages.

revolution 27 Jan 2018, 08:50
alexfru wrote:
It's not about linking here. A single C source file (from which you typically make a single assembly/object file) can contain multiple chunks of code and data, not all neatly separated into all variables at the beginning of the file and all functions at the end. The same would be true of many other languages.
Yes. The HLL compiler produces a file that NASM, YASM, or fasm assembles into an object file. The last step is the linker, which combines all the sections as needed. The only special thing is that fasm has an extra mode that directly generates an exe if you use "format PE", or "format elf executable", or similar. NASM and YASM don't support that; they always need the linker. So you don't need to make the compiler produce special output that combines all the sections; you "should" instead use the assembler to make an object file and follow it with a linker. It is the HLL way.

Tomasz Grysztar 27 Jan 2018, 11:06
My opinion was always what revolution already said here, that it is the linker's job to combine sections, even ones coming from a single object file (it needs to do it anyway when linking multiple objects).

However there exist methods to move and combine code and data at the assembly time with fasm, and the feature of "continuable" VIRTUAL blocks that was recently back-ported from fasmg makes them even easier. So if your approach is not to use a linker and emit the executable directly, you can use the tricks with VIRTUAL to move all data definitions into a single place anywhere.

Mino 27 Jan 2018, 11:15
Thank you for your explanations!
For the different sections (data, code, ...), I know what I'm going to do so that I don't mix everything up and so that the generated code stays as simple and legible as possible, without going through a multitude of layers.
To help me generate the FASM code, I plan to create a virtual machine with its own (virtual) assembler, which would serve as an intermediate language between the high-level language and FASM.
This language will be designed to optimize the generated FASM code and make the translation easier.

alexfru 27 Jan 2018, 13:22
revolution wrote:
alexfru wrote:
It's not about linking here. A single C source file (from which you typically make a single assembly/object file) can contain multiple chunks of code and data, not all neatly separated into all variables at the beginning of the file and all functions at the end. The same would be true of many other languages.
Yes. The HLL compiler produces a file that NASM, YASM, or fasm assembles into an object file. The last step is the linker that combines all the sections as needed.

Except, it can't be done that way with FASM + linker. I've tried and the result was unsatisfactory. If you've forgotten the relevant thread, here's the gist of it...

If you get multiple .text sections in the same object file, each of those comes with its own alignment that the linker should honor. This means the linker will insert something in the middle of the code (zeroes or NOPs). This may not only make the code larger and slower but it may also break it (not only because zeroes may be inappropriate but because code/data may shift away from the label that's supposed to be right before it). Similar issues can happen with data sections.

If, OTOH, I force alignment to 1 byte to avoid extraneous code/data insertion, then I can't align things.
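
The size of the problem is easy to estimate with a short Python sketch. It is a simplified model, not any particular linker's algorithm: it assumes the fragments are placed in order and each one is padded up to its own alignment. The fragment sizes are invented for the example.

```python
def layout(fragments):
    """Place (size, alignment) fragments in order.

    Returns (offsets, total_padding), where offsets[i] is the start of
    fragment i and total_padding counts the filler bytes inserted
    between fragments to honor each fragment's alignment.
    """
    offset = 0
    total_padding = 0
    offsets = []
    for size, alignment in fragments:
        pad = (-offset) % alignment   # bytes needed to reach the next boundary
        total_padding += pad
        offset += pad
        offsets.append(offset)
        offset += size
    return offsets, total_padding


# Three pieces of one function, 3 + 7 + 5 = 15 bytes of code in total.
aligned16 = layout([(3, 16), (7, 16), (5, 16)])
packed = layout([(3, 1), (7, 1), (5, 1)])
print(aligned16)
print(packed)
```

In this model, 16-byte alignment puts 22 filler bytes inside what was meant to be 15 contiguous bytes of code. With alignment 1 nothing is inserted, but then nothing inside the section can be aligned through the section declaration either.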

You should probably try it yourself in order to see the problem and stop saying that it can be solved with a linker. It can't. For things to work either the input should not be fragmented or some other tricks are needed.

Or FASM should learn to combine section fragments and not pretend they're separate sections, which happen to share the same name. But I'm not hoping or expecting it to happen. I already have a tool that combines the fragments.

alexfru 27 Jan 2018, 13:29
Tomasz Grysztar wrote:
My opinion was always what revolution already said here, that it is the linker's job to combine sections, even ones coming from a single object file (it needs to do it anyway when linking multiple objects).


You assume that section fragments do not contain parts of the same subroutine or parts of the same variable. You assume that every section fragment contains a subroutine or a variable in its entirety (or multiple whole subroutines/variables). And that's the problem that can't be fixed with a linker.

revolution 27 Jan 2018, 13:37
alexfru wrote:
You should probably try it yourself in order to see the problem and stop saying that it can be solved with a linker. It can't. For things to work either the input should not be fragmented or some other tricks are needed.
I gave up on HLLs a long time ago, and I wouldn't know how to drive one now. So you might be right. However, if fasm creates bad code then it should be reported as a bug.


Last edited by revolution on 27 Jan 2018, 14:00; edited 1 time in total

alexfru 27 Jan 2018, 13:54
revolution wrote:
alexfru wrote:
You should probably try it yourself in order to see the problem and stop saying that it can be solved with a linker. It can't. For things to work either the input should not be fragmented or some other tricks are needed.
I gave up on HLLs a long time ago, and I wouldn't know how to drive one now. So you might be right. However, if fasm creates bad code then it should be reported as a bug.

One man's bug is another man's feature. :) If you don't see it as a problem, well, then it's not a problem. For you it's not. And so you won't fix it as there's nothing to fix. Right? :)
Btw, is this at least documented somewhere? Can't find it. Perhaps, it should be? It's a feature to brag about! :)

revolution 27 Jan 2018, 13:59
alexfru wrote:
And so you won't fix it as there's nothing to fix. Right?
It's not my code to fix though. But seriously, if there is a bug please report it. Preferably with some simple example code if possible.

Mino 27 Jan 2018, 15:17
revolution wrote:
alexfru wrote:
And so you won't fix it as there's nothing to fix. Right?
It's not my code to fix though. But seriously, if there is a bug please report it. Preferably with some simple example code if possible.


And what bug exactly? I didn't follow or understand everything.


Tomasz Grysztar 27 Jan 2018, 18:33
alexfru wrote:
You assume that section fragments do not contain parts of the same subroutine or parts of the same variable. You assume that every section fragment contains a subroutine or a variable in its entirety (or multiple whole subroutines/variables). And that's the problem that can't be fixed with a linker.
Yes, the semantics of fasm's directives like SECTION generally do assume that you put complete entities (like functions or data structures) within such declared blocks of source, because fasm preserves the exact order and specification of sections as defined in source text. A linker may or may not then combine the sections in an expected way, but this may be implementation-dependent. Note that in many cases SECTION directive has a default alignment greater than 1 and then even if linker is kind enough to combine the sections in the right order, it still needs to insert some additional bytes between the fragments to align them appropriately.

The problem you have is related to an expectation of a different semantic meaning: a need for a set of directives that could switch between "output streams", placing instructions or data definitions in separate streams, each defining the content of a different section, etc. This way you could freely mix statements belonging to different areas.
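
Seen from the compiler's side, the "output streams" idea might look like the following Python sketch. The class and method names are hypothetical, and the instructions are just strings borrowed from this thread; nothing here is fasm-specific or taken from an existing tool.

```python
class StreamEmitter:
    """Collects emitted lines into named streams that can be switched freely."""

    def __init__(self, names):
        # One growing list of lines per stream, kept in declaration order.
        self.streams = {name: [] for name in names}
        self.current = names[0]

    def switch(self, name):
        """Make `name` the stream that receives subsequent lines."""
        if name not in self.streams:
            raise KeyError(f"unknown stream: {name}")
        self.current = name

    def emit(self, line):
        """Append one line to the currently selected stream."""
        self.streams[self.current].append(line)

    def render(self):
        """Write every stream out once, each as one contiguous block."""
        out = []
        for name, lines in self.streams.items():
            out.append(f"; --- {name} ---")
            out.extend(lines)
        return "\n".join(out)


out = StreamEmitter(["code", "data"])
out.switch("code")
out.emit("start:")
out.emit("        fild    [number]")
out.switch("data")
out.emit("number dd 18769")
out.switch("code")
out.emit("        fsqrt")
rendered = out.render()
print(rendered)
```

The statements are emitted in mixed order, yet each stream comes out contiguous, so the generator never has to care where a variable is declared relative to the code that uses it.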

In the case of fasm, such semantics can be (and are) introduced with macros, like globals that are used in various forms across the board. This has been made even easier and more flexible with fasmg, where macros can be forward-referenced. On the other hand, fasmg also introduced more options for handling growing streams of data in VIRTUAL blocks, and in its case this is a much better method of introducing such semantics:
Code:
; This is just a demonstration of a method, not a complete framework:

format PE64 NX GUI 5.0
entry start

include 'win64a.inc'

; First, lay out the general structure of executable and put the appropriate content in the right places:

section '.data' data readable writeable

        virtual
                @@data::
        end virtual

        load content:@@data.size from @@data:$
        db content

section '.text' code readable executable

        virtual
                @@code::
        end virtual

        load content:@@code.size from @@code:$
        db content

section '.idata' import data readable writeable

     library kernel32,'KERNEL32.DLL',\
             user32,'USER32.DLL',\
             msvcrt,'MSVCRT.DLL'

        include 'api/kernel32.inc'
        include 'api/user32.inc'

     import msvcrt,sprintf,'sprintf'

macro @code
        end virtual
        virtual @@code
end macro

macro @data
        end virtual
        virtual @@data
end macro

postpone
        end virtual
        virtual @@data
                @@data.size = $-$$
        end virtual
        virtual @@code
                @@code.size = $-$$
        end virtual
end postpone

virtual @@code

; Now the actual code and data definitions:

@code

  start:
        sub     rsp,18h

@data

  number dd 18769

@code

        fild    [number]
        fsqrt

@data

  tmp64 dq ?

@code
        fst     [tmp64]

@data

  message db '%f',0
  buffer db 100h dup ?

@code

        mov     r8,[tmp64]
        lea     rdx,[message]
        lea     rcx,[buffer]
        call    [sprintf]

        invoke  MessageBox,HWND_DESKTOP,buffer,"test",MB_OK
        invoke  ExitProcess,0    

Some of the new features of VIRTUAL blocks have been back-ported to fasm 1, but the above sample cannot be converted to fasm 1 without large alterations, because in fasm 1 LOAD is much more limited: it cannot forward-reference data and can load only up to 8 bytes at a time.

Also in case of fasm 1 there is a potential problem when moving code around in a relocatable output, since relocation entries are not generated for instructions inside VIRTUAL blocks. It seemed like a good idea at the time but I may actually reconsider this (with some new experience brought by fasmg).

Tomasz Grysztar 27 Jan 2018, 18:47
alexfru wrote:
If, OTOH, I force alignment to 1 byte to avoid extraneous code/data insertion, then I can't align things.
You could align only the first section (to the largest alignment boundary that would be needed) and define an "origin" label at the beginning of that section. Then all the remaining sections could be declared with "align 1", and any alignment inside them would have to be done with an "align" macro based on the "$-origin" offset.
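
The arithmetic behind this suggestion can be checked with a short Python sketch. It assumes the linker places the fragments contiguously after the origin; the addresses and the offset are invented for the example, and the function names are hypothetical.

```python
def origin_relative_padding(offset, boundary):
    """Bytes to insert so that `offset`, measured from the origin label,
    reaches the next multiple of `boundary`."""
    return (-offset) % boundary


def is_aligned(origin_address, offset, boundary):
    """True if the final address origin_address + offset is on the boundary."""
    return (origin_address + offset) % boundary == 0


offset = 23                                   # current "$-origin" value
pad = origin_relative_padding(offset, 8)      # padding for an 8-byte boundary
aligned_offset = offset + pad

# Origin placed on a 16-byte boundary: padding relative to the origin works.
ok = is_aligned(0x401000, aligned_offset, 8)
# Origin only 4-byte aligned: the same padding no longer gives 8-byte alignment.
broken = is_aligned(0x401004, aligned_offset, 8)
print(pad, ok, broken)
```

This is why the first section has to be aligned to the largest boundary that will ever be requested: padding computed from the "$-origin" offset is only as good as the alignment of the origin itself.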

If the linker is kind enough, this may work. But still, this would be working around the fact that fasm's SECTION is being used with different semantics than it was designed for.

yeohhs 27 Jan 2018, 23:43
Mino wrote:
And what bug exactly, I didn't follow/understand everything?


You'll need to understand linkers and loaders, i.e. background information.
See http://www.iecc.com/linker/

Also from here:
http://norfs.sourceforge.net/linkers_and_loaders.pdf

rugxulo 07 Feb 2018, 20:37
Wikipedia wrote:

PureBasic supports inline assembly, allowing the developer to include FASM assembler commands within PureBasic source code, while using the variables declared in PureBasic source code, enabling experienced programmers to improve the speed of speed-critical sections of code.


There's also Context (freeware with sources).

Context 2.2 for Windows and Linux wrote:

To produce executable files assembler FASM (www.flatassembler.net) are used. Compiler was tested with FASM 1.49. Previous versions uses Borland TASM32 and TLINK32. The main reason of this change is using of freeware product.

pabloreda 08 Feb 2018, 02:01
I use FASM to make standalone executables for my language.
For me, FASM is the best!

fabbel 09 Feb 2018, 15:15
Hello Tomasz,
Your code / tricks using VIRTUAL blocks to build code sections dynamically look very interesting. However, I am worried by your remark.
You said:
"Also in case of fasm 1 there is a potential problem when moving code around in a relocatable output, since relocation entries are not generated for instructions inside VIRTUAL blocks. It seemed like a good idea at the time but I may actually reconsider this (with some new experience brought by fasmg)."

=> Do you have any plans to change this in the (near) future?

I am trying to use fasm to develop a DLL (hence a relocatable module by definition).
=> I am worried this would prevent me from fully leveraging what you presented here...

Thanks very much.
And keep up the nice work with fasm / fasmg

6a05 01 Apr 2018, 20:22
Mino wrote:
I wanted to know if FASM was a good assembler language as a target language for a compiler.

The choice of assembler (i.e. the program that translates mnemonics into machine code) is rather negligible here. You can really use any assembler you want and like, because the assembler doesn't affect your code in any way: what you put on the assembler's input is what you get in the binary output.

However, it's nice to choose a project that:

    - is still maintained, so you're up to date with the latest instruction sets,
    - has ports to all the OSes you want to support,
    - is fast (this point is also negligible as long as you don't use your toolchain in production).


Also, syntax differences between various assemblers are not so big, so if you write a code generator for one, it should be easy to switch to another if needed.
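
As a rough illustration of keeping the assembler-specific parts small, the Python sketch below isolates them in one table. The table and function names are hypothetical; the fasm entries assume 32-bit ELF object output, and other output formats use different section flags.

```python
# Per-assembler spellings; everything else in the generator is shared.
DIALECTS = {
    "nasm": {
        "prologue": [],                       # output format is chosen on the command line
        "code": "section .text",
        "data": "section .data",
    },
    "fasm": {
        "prologue": ["format ELF"],           # assumption: ELF object output
        "code": "section '.text' executable",
        "data": "section '.data' writeable",
    },
}


def emit_program(dialect):
    """Emit a tiny function and one variable for the chosen assembler."""
    d = DIALECTS[dialect]
    return d["prologue"] + [
        d["code"],
        "f:",
        "        mov     eax, [x]",
        "        ret",
        d["data"],
        "x:      dd 1",
    ]


nasm_lines = emit_program("nasm")
fasm_lines = emit_program("fasm")
print("\n".join(fasm_lines))
```

For a fragment this small only the directive lines differ. A real generator would also meet differences in operand-size syntax, macros and the like, which is the kind of conversion alexfru's additional tool handles.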

So my final advice is: don't think about which assembler to choose at the beginning; just create a working compiler first.

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
