Why two layers of preprocessing in fasm1?

Index > Compiler Internals > Why two layers of preprocessing in fasm1?

Author

Thread

l4m2

Joined: 15 Jan 2015
Posts: 670

l4m2 27 Dec 2016, 10:52

It makes such code in fasmg can't be written in fasm1. Or it's just designing mistake?

Code:

if lab2-lab1>8
  macro foo
    do something
  end macro
end if
lab1:
some code
lab2:

27 Dec 2016, 10:52

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 28 Dec 2016, 13:22

The main reason for this separation was the performance. After the initial preprocessing (which was originally a very simple operation, fasm's preprocessor became more sophisticated over the years) the source text is converted into an internal "byte code" which is much faster to process during the multiple assembly passes. This internal code has no longer anything to do with original text, therefore any text processing (like macro processing) is no longer possible after this conversion.

For example "mov eax,offset+1" line becomes a sequence of codes that look like:
[code: instruction; pointer to MOV handler]
[code: register; size/type: 4, number: 0]
[code: comma]
[code: numeric expression; micro-operations: push value from a label structure at given pointer, push number 1, add two numbers]

There is no longer a need to recognize any names, symbols like instruction names and labels are replaced with pointers to appropriate handlers or data structures. This allows fasm 1 to do multiple passes over such source quite quickly.

The idea of fasm 2 (now implemented in fasmg) was to get rid of these layers, but it required abandoning the internal code, and even when I first talked about fasm 2 at fasmcon 2009 I warned that for this reason fasm 2 would be much slower - because it would need to perform all the assembly passes on the source text. In fasmg the text is pre-tokenized (in a manner similar to how preprocessor was doing it in fasm 1), but the textual content of the tokens needs to be preserved. The macros and other features like namespace switching may cause the same name to refer to many different entities depending on context, so any name needs to be re-interpreted every time it is encountered.

On the other hand, the modern machines are fast enough that even assembly with an instruction set implemented in a form of macros is bearable on small projects, as fasmg's self-assembly demonstrates. In fact, I am so satisfied with these macros and how flexible they are that I no longer feel any pressure to create fasm 2 with internal x86 encoder, I just want to write more and more macros. Wink

28 Dec 2016, 13:22

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 28 Dec 2016, 14:50

Tomasz Grysztar wrote:

In fact, I am so satisfied with these macros and how flexible they are that I no longer feel any pressure to create fasm 2 with internal x86 encoder, I just want to write more and more macros.

But what is the asymptotic behavior of fasmg on big sources? This can be much more important than the speed of compilation on particular (newer or older) processors.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9

28 Dec 2016, 14:50

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 28 Dec 2016, 15:36

JohnFound wrote:

But what is the asymptotic behavior of fasmg on big sources? This can be much more important than the speed of compilation on particular (newer or older) processors.

Asymptotic as a function of what variable? Any increase in the number of required passes is going to have a strong effect on the overall assembly time, but this is not a simple function of the size of source - you can have small source requiring many passes, or a large one requiring only a few.

In cases where the number of passes stays constant, the assembly with fasmg should be slower by a constant factor compared to fasm 1. You could very roughly approximate it by imagining how many lines from nested macros fasmg needs to process where fasm 1 processed just one line with internally handled instruction.

28 Dec 2016, 15:36

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 28 Dec 2016, 19:38

No, I am talking about the compilation time as a function of the source length (or lines of code, or number of the labels, which is approximately the same).

fasm1 since v1.51 seems to have O(n), while v1.50 looks like O(n^2) or even worse:

28 Dec 2016, 19:38

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 28 Dec 2016, 20:47

OK, if you are interested in some pure engine comparisons, I did some tests with very simple but huge sources generated with a fasmg script that looks like:

Code:

N = 4000000

db 'test0 = 0',13,10
rept N, i:0,j:1
        db 'test',`j,' = (test',`i,'*33797+1) and 0FFFFFFFFh',13,10
        if %=%%
                db 'dd test',`j
        end if
end rept

The result on my machine are:

Code:

N               fasm 1.71.58    fasm g.hll54

100000          0.3 s           0.2 s
200000          0.6 s           0.3 s
300000          1.1 s           0.6 s
400000          1.5 s           0.8 s
600000          2.3 s           1.3 s
800000          3.1 s           1.8 s
1000000         4.1 s           2.1 s
2000000         8.7 s           4.1 s
4000000         out of memory   8.1 s

If we wanted to test more constructions, like an actual instructions etc. we would need to use macro implementations for fasmg and it would get slower by at least an order of magnitude, since an usual x86 macro requires processing a few dozen lines. But this should be a constant factor, so it should not matter for asymptotic behavior (but keep in mind the potential catastrophe caused by growing number of passes). Also, every macro call eats a bit of a memory and on a huge sources where every line causes several nested macro calls, this escalates quickly.

In the past I also performed similar tests with semi-randomized label names and deeper cross-referencing between values, but the results were similar.

28 Dec 2016, 20:47

l_inc

Joined: 23 Oct 2009
Posts: 881

l_inc 30 Dec 2016, 01:13

It's important to note that the comparison is made for the initial pass only. To make the comparison fair it's important to compare subsequent passes as well:

Code:

N = 4000000
P = 2

rept 1, v:P
        db 'P = ',`v,13,10
end rept
db 'if P > 2',13,10
db '    PASS = PASS + (1-PASS/(P-2))',13,10
db 'end if',13,10
rept N, i:0,j:1
        db 'test',`j,' = (test',`i,'*33797+1) and 0FFFFFFFFh',13,10
end rept
db 'test0 = 0',13,10
rept 1, v:N
        db 'dd test',`v,13,10
end rept

The following tests are made with fixed N = 4000000 and varying P ∊ {2..8}:

Code:

      fasm 1.71.57               fasm g.hnq73
2 passes, 10.9 seconds     2 passes, 11.3 seconds
3 passes, 11.2 seconds     3 passes, 16.7 seconds
4 passes, 11.6 seconds     4 passes, 21.7 seconds
5 passes, 11.8 seconds     5 passes, 27.1 seconds
6 passes, 12.0 seconds     6 passes, 32.0 seconds
7 passes, 12.3 seconds     7 passes, 37.6 seconds
8 passes, 12.4 seconds     8 passes, 42.3 seconds

Results for the original fasmg script on my machine:

Code:

      fasm 1.71.57               fasm g.hnq73
1 passes, 10.7 seconds     1 pass, 5.9 seconds

P.S. To summarize, the first pass is about 2 times more expensive for fasm 1 (1.8 for my machine), but each subsequent pass is about 20 times cheaper (21.4 for my machine).

_________________
Faith is a superposition of knowledge and fallacy

30 Dec 2016, 01:13

l4m2

Joined: 15 Jan 2015
Posts: 670

l4m2 30 Dec 2016, 02:42

What about the kinds of loop(1rept,1repeat,1hardloop,grept,ghardloop)

30 Dec 2016, 02:42

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 30 Dec 2016, 09:51

l_inc wrote:

It's important to note that the comparison is made for the initial pass only. To make the comparison fair it's important to compare subsequent passes as well (...)

As I mentioned repeatedly, because the passes made by fasmg operate on the source text, every consecutive pass should take approximately the same time as the first one, so the total time of the assembly is roughly equal to the time of single pass multiplied by the number of passes. So when a single pass takes a considerable time, any increase in the number of passes will cause a substantial lengthening of the total time - I called it a potential "catastrophe" above. But since there is no simple correspondence between the length of source text and the number of passes, I saw it as an independent problem from the asymptotic data that JohnFound asked for. But at the same time I noted that this should be kept in mind, because it is going to matter in practical applications.

In case of fasm 1 the expensive preprocessing and parsing are done only once and the passes are performed quickly over the internal bytecode (as explained above), so adding another pass has little effect on the overall time. If fasm displayed the durations of preprocessing and parsing stages, you could notice that all the assembly passes also take approximately the same time each - but the overhead of parsing increases the total length.

l4m2 wrote:

What about the kinds of loop(1rept,1repeat,1hardloop,grept,ghardloop)

Yes, there is another place where the same effect shows up - the assembly loops (like REPEAT or WHILE in fasm 1). Again, fasm 1 preprocesses and parses the text of such loop only once (that's why you cannot have labels local to single repetition in fasm 1) and then repeatedly processes the internal bytecode, while fasmg operates on the source text every time. So you should find REPEAT and WHILE much slower than in fasm 1, too. On the other hand, REPT should be comparable.

PS. l_inc, nice to see you show up again to give a sober comment. And I'm glad that you still find some time to play with fasm (1 or g).

30 Dec 2016, 09:51

l_inc

Joined: 23 Oct 2009
Posts: 881

l_inc 30 Dec 2016, 14:59

Tomasz Grysztar

Quote:

As I mentioned repeatedly...

I'm not saying you did not warn about differences in source processing behaviour and corresponding timing consequences. I know you did, many times since the very appearance of fasmg. I'm just saying the actual numbers are incomplete. I see no reason to limit oneself to comparison of textual source code processing if fasm 1 uses intermediate byte-code interpretation as well.

Quote:

But since there is no simple correspondence between the length of source text and the number of passes, I saw it as an independent problem from the asymptotic data that JohnFound asked for.

The asymptotic timing behaviour can be expressed by a multivariate function.

Quote:

adding another pass has little effect on the overall time

Sure. It's just nice to have this "little" being expressed in numbers. I.e., 20 times less than fasmg's textual processing. Especially because, as you note, this "little" is not limited to subsequent passes, but also to subsequent assembly-time loop iterations, both of which are common causes of long compilation times.

Quote:

nice to see you show up again to give a sober comment. And I'm glad that you still find some time to play with fasm (1 or g)

Unfortunately I don't. I hope I will — maybe, in a few months — as I still have community-supportive tasks on my todo-list. But thank you for the motivation.

_________________
Faith is a superposition of knowledge and fallacy

30 Dec 2016, 14:59

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 30 Dec 2016, 16:09

l_inc wrote:

The asymptotic timing behaviour can be expressed by a multivariate function.

Hence my initial request for clarification of the variables that question concerned. The assembly time grows linearly both with respect to the number of symbols and the number of passes, but these variables are not completely independent (though sometimes they might be, like in your synthetic example). In cases when there is a strong correlation between the length of source text and the number of passes required to assemble it, these two linear growths combine to something quadratic.

l_inc wrote:

Unfortunately I don't. I hope I will — maybe, in a few months — as I still have community-supportive tasks on my todo-list. But thank you for the motivation.

And I would like to thank you for your continued support as well. Your valuable comments in the past years kept me believing that there are still interesting things to do (and improve) with fasm.

30 Dec 2016, 16:09

ACP

Joined: 23 Sep 2006
Posts: 204

ACP 06 Jan 2017, 14:53

Sorry for interrupting such interesting thread and making an off-topic post but I am in the processes of rewriting very specific assembler, and that gives me a chance to (re)design it as well. Since this was a very simple multi pass assembler (actually 2 passes only as the syntax is so simple) I am thinking about extending syntax and making it truly a multi pass assembler. This gives an opportunity to either work internally on source code representation after preprocessing it or using fasmg approach.

I'd like to hear Tomasz opinion on which method you think is the future of assemblers. I don't have any speed/memory constrains to consider as this is a cross assembler.

06 Jan 2017, 14:53

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 06 Jan 2017, 15:24

I think that fasmg approach is generally more in line with the expectations of users. There are many cases in fasm 1 where the separation of text-processing phase from the actual assembly is a source of unintuitive limitations. So when performance and resource usage is not your concern, I think it is better to use a text-based processing which is more predictable for a user.

06 Jan 2017, 15:24

ACP

Joined: 23 Sep 2006
Posts: 204

ACP 06 Jan 2017, 18:13

Tomasz Grysztar wrote:

I think that fasmg approach is generally more in line with the expectations of users.

Thank you for your feedback Tomasz. Again I don't want to hijack this thread so maybe we can move this discussion to another one, but I do wonder how you handle internally macros like "include" or inclusion of binary data from external files. I'm asking since I think it is easier to implement in old fasm approach.

06 Jan 2017, 18:13

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8543
Location: Kraków, Poland

Tomasz Grysztar 06 Jan 2017, 18:42

ACP wrote:

Thank you for your feedback Tomasz. Again I don't want to hijack this thread so maybe we can move this discussion to another one, but I do wonder how you handle internally macros like "include" or inclusion of binary data from external files. I'm asking since I think it is easier to implement in old fasm approach.

In fasmg there are caches in memory for both the source text files and for the binary files. The binary cache stores ranges of sectors, not the entire files, to avoid caching a huge file when you include only a small portion. The source cache stores a tokenized source text, so the next time a cached file is accessed through INCLUDE the assembler retrieves an already tokenized copy.

06 Jan 2017, 18:42

ACP

Joined: 23 Sep 2006
Posts: 204

ACP 06 Jan 2017, 20:14

Wow, sounds pretty complex for a simple task but I can understand where are you coming from. That is probably a great overhead that would introduced not needed complexity in case of my project. Overall I really like your elegant design of this functionality. Thanks for saving me trying to get all those implementation and design bits from the source code.

06 Jan 2017, 20:14

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum