flat assembler
Message board for the users of flat assembler.
Index
> Assembly > Pitfalls of optimistic multi-pass assembly |
Author |
|
Tomasz Grysztar 21 Apr 2019, 19:13
Previously, I have written about two types of multi-pass assembly, pessimistic and optimistic one. I am now going to talk more about the latter, which (in various incarnations) is more likely to be encountered in modern tools. All the assemblers I have ever written have been using this approach and this made me quite aware of both the fascinating possibilities and dangerous pitfalls that it brings to the table.
When choosing from multiple possible codes of an instruction, a pessimistic assembler selects one that is certain to be a good one for all circumstances (often not the shortest one), and only in later passes it optimizes the codes that can be safely shortened. An optimistic assembler selects the shortest possible code that might possibly be correct and if later it turns out that the assumption failed and generated code is wrong (for example value of offset is too large to fit in the short code), it fixes it by restarting the whole process and this time making longer variants of the problematic instructions. There is another way to characterize these two types. As should be obvious from the description of pessimistic assembler, it always generates code that should be correct. Therefore after any pass it can decide that what it has is good enough and it may stop optimizing, keeping a valid code. It only needs to fill some of the fields with the values that were forward-referenced - but this is always going to be possible to do correctly thanks to the pessimistic approach. This feature is the main advantage of pessimistic assembly, and can also be seen as its main defining characteristic. On the other hand, an optimistic assembler does not have a valid code until it finds a correct solution, perhaps after many passes, or perhaps never. In practice, as my previous article showed, it often finds a better optimized solution than a pessimistic one - but this is at the price of having to do as many passes as necessary to find a resolved state, with a possibility of never finding one. With this alternate characterization in mind, we can analyze some aspects of assembly language that are more interesting than a simple choice of a short or long instruction code. Consider the following: Code: if next < 1000h mov al,[next] end if ; ... next db ? But for an optimistic assembler this is just a matter of making a prediction and only worrying whether it was correct later (at the end of the pass). If it turns out that the assumption made about "next" was correct, great. If not, it can try doing another pass and this time making a better prediction, thanks to knowledge about possible value of "next" gathered during the previous pass. Where a pessimistic assembler would immediately say "nah, this is not going to work", an optimistic assembler is going to make an attempt. Therefore an optimistic assembler offers not only an possibility of a better optimization, but can also be much more flexible in what kinds of relations it may allow to define in the source code. But, again, this is at the price mentioned earlier. Consider a slight alteration of the above snippet: Code: org 0FFFh if next < 1000h mov al,[next] end if ; ... next db ? Probably nobody expects an assembler to be able to find a solution to self-contradicting instructions. But it may not always be so obvious that a given source text has no solution - or, even worse, that it may allow for a solution that was not intended. We are going to take a look at an example that demonstrates both these possible problems. Consider that we want to write a macro for a jump instruction, and we would like it to be able to not generate a redundant jump instruction when the target of the jump is immediately after it, so it can be reached without jumping: Code: jmp next
next: Code: macro jmp target { local after if target <> after jmp target end if after: } Code: previous:
jmp previous This shows how careful one must be when defining logical relations in the source text when using a much-allowing optimistic assembler. The intention of this kind of optimization cannot be simply defined in terms of addresses being equal or not to each other. The semantics of "was the label defined before the instruction or after it" need to be taken into consideration. Before we look at the solution to this problem, let's take a look at another one, with the same macro still: Code: jmp $+2 Code: if $+2 <> after jmp $+2 end if after: But again, this is just what the source asked for. A danger of an optimistic multi-pass is that it is often not obvious that the way the constraints are defined may lead to something not obvious. In the case of this specific macro, both problems may be fixed by checking whether the target of the jump is something that is defined later in the source or not (note that this requires quite recent fasm versions to assemble): Code: macro jmp target { local after if target <> after | definite target jmp target end if after: } An oscillation problem may not always be so easily soluble, though. And even the corrected version of our jump macro can be tricked into oscillating: Code: previous: jmp ADDRESS ADDRESS = previous + 2 It is up to a user of the assembler to decide then, whether true intent behind the source text differs from the rules that were written down. Perhaps there are some additional cases when a jump to a point immediately after it should be allowed to be generated. Or perhaps not, and the inability to find a resolved state should be treated as a problem of the consistency of the source. This is one of the considerations that led me to design fasmg, an optimistic assembler that has everything (including encoders for various instruction sets) defined in form of macros that may be modified or overridden by user. A method of dealing with oscillation problems that seems to work well in many cases is to use additional "sticky" flag, like: Code: macro jmp target { local after,sticky if target <> after | definite target | sticky = 1 jmp target sticky = 1 else sticky = 0 end if after: } In practice, the "sticky" flag usually makes it impossible for an assembler to remove the jump when it has at least once been generated during the intermediate passes. After "sticky" has been defined as 1, in the consecutive pass an assembler is almost certainly going to make a prediction that "sticky" is going to be 1 (unless it has some randomness intentionally seeded into its guesses) and it then cannot get out of this self-reinforcing loop. Only when the jump is consistently removed in each and every pass, it is allowed to stay this way. Note that for a snippet like: Code: jmp next
next: Last edited by Tomasz Grysztar on 08 Sep 2020, 16:58; edited 1 time in total |
|||
21 Apr 2019, 19:13 |
|
revolution 25 Apr 2019, 11:52
How will you solve it? Or will you let the user figure it out? It could be a very tricky thing to find if the only error is that the maximum passes was encountered. I found this with ARM code. It was frustrating and tedious to try and manually find these places. And fixing one can cause another.
I wonder if it could be an advantage to show a warning message to tell the user that a non-optimal instruction encoding has been used. I guess most people wouldn't care so perhaps such warnings could be enabled by a command line switch. |
|||
25 Apr 2019, 11:52 |
|
Tomasz Grysztar 25 Apr 2019, 12:39
revolution wrote: How will you solve it? Or will you let the user figure it out? Code: rb a xor 1 a: The samples are usually contrived, though - and as noted in my previous post, an oscillation with AVX-512 instructions is also not going to happen normally. A small DOS program is all that comes to mind where this could occur for something practical, and I don't think this would justify making optimization a little worse everywhere else. On the other hand, with fasmg there's always an option of tuning the encoders even individually for a specific project. And some its capabilities could be used to debug the cause of running out of passes. An interesting proof of concept I've just made: Code: include 'cpu/x64.inc' include 'cpu/ext/avx512.inc' macro ? line& line local addr if $ <> addr repeat 1, n:__LINE__ display 'oscillation detected at ',__FILE__,'[',`n,']',13,10 end repeat end if addr: end macro org 100h nop vaddss xmm0{k1},xmm1,[array+eax] array dd 16 dup ? |
|||
25 Apr 2019, 12:39 |
|
revolution 25 Apr 2019, 12:53
I think the AVX512 example is "solvable" by simply encoding the long version of the instruction. That is why I suggested to have a warning (if enabled) to tell the user such a thing is happening. But triggering that behaviour could be a bit tricky to get right without too many false positives. Perhaps to avoid it happening too many times only have it make a long encoding if the pass counter is almost exhausted, or something similar.
But ultimately I think producing a working executable is more important than failing to make anything due to not having100% optimal encodings everywhere. So a small amount of suboptimal encoding is okay IMO if it avoids the alternative of "pass limit exceeded". |
|||
25 Apr 2019, 12:53 |
|
Tomasz Grysztar 25 Apr 2019, 13:14
revolution wrote: But ultimately I think producing a working executable is more important than failing to make anything due to not having100% optimal encodings everywhere. So a small amount of suboptimal encoding is okay IMO if it avoids the alternative of "pass limit exceeded". It is not possible in general to avoid the risk of running endless passes (I believe this is related to the halting problem), but as fasm demonstrates in practice, it is possible to have that happen rarely enough to be an acceptable trade-off. There are myriads of tools out there that can produce a working code, I'm determined to highlight what makes fasm different from most of them, while accepting the drawbacks. Another interesting thing to note is that the language of an "optimistic" assembler becomes at least partly declarative instead of imperative. There is perhaps still more to write on this topic. |
|||
25 Apr 2019, 13:14 |
|
Tomasz Grysztar 25 Apr 2019, 13:40
One of the unwritten rules of fasm is that the number of the pass should not be affecting the resolving process (this is the reason why this number is never directly exposed). It was not always this way, early versions of fasm were doing some things differently in the initial pass and it took me a while to figure out current design.
Imagine that you take an entire self-contained module of code and put it inside an IF block. Let's say that we add a module to our program only if the size of the other code passes some threshold: Code: if $ > 1000h include 'module.inc' end if |
|||
25 Apr 2019, 13:40 |
|
pelaillo 26 Apr 2019, 13:18
revolution wrote: I think the AVX512 example is "solvable" by simply encoding the long version of the instruction. It would be possible to follow revolution's suggestion and always use the long version to produce a working executable. Then use a secondary "shortener" tool to work the binary trying to replace long for short version for the opcodes one at a time whenever possible? The tool could operate like this: 1. Binary analyzer similar to ollydbg's to identify innermost instructions 2. Shorten opcodes starting from the innermost ones 3. Do a verification pass 4. Repeat from 2 shortening opcodes to the outside p.s. This is one of these threads when I start learning something I didn't imagine even existed. Last edited by pelaillo on 26 Apr 2019, 13:51; edited 1 time in total |
|||
26 Apr 2019, 13:18 |
|
Tomasz Grysztar 26 Apr 2019, 13:38
pelaillo wrote: It would be possible to follow revolution's suggestion and always use the long version to produce a working executable. Then use a secondary "shortener" tool to work the binary trying to replace long for short version for the opcodes one at a time whenever possible? Note that this is an entire paradigm that influences what an assembler might be able to do, as shown in my second article (the one here). A "pessimistic" assembler needs to limit itself to allowing only expressions that have an outcome predictable enough to ensure that the code generated in the initial pass can be made correct. |
|||
26 Apr 2019, 13:38 |
|
bitRAKE 29 Apr 2019, 12:48
Sounds like you are selecting a desired option to partition n-dimensional space, and starting over if the option is invalidated. One of the best methods known for solving these kinds of problems, and a completely different paradigm than what the "pessimistic" type is doing.
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
29 Apr 2019, 12:48 |
|
Tomasz Grysztar 29 Apr 2019, 13:28
It is worth noting that while traditional approach had been to use pessimistic optimization, several contemporary assemblers use variants of optimistic approach.
NASM performs optimistic passes, though at the same time it also severely limits what kind of dependencies can be defined by source, by allowing the values of addresses/offsets to affect the assembly only under specific circumstances (more like a traditional, pessimistic assembler). This makes it much harder to fabricate a source that would not get resolved, nonetheless it is still possible: Code: add si,second+3-first first: align 128 second: Code: ..\test.asm: error: Can't find valid values for all labels after 1004 passes, giving up. ..\test.asm: error: Possible causes: recursive EQUs, macro abuse. YASM, on the other hand, is able to resolve the same snippet fine. I believe it uses a variant of optimistic optimization where in the consecutive passes instructions are only allowed to grow, which is the same as the "sticky" flag method I described earlier. When the size of instructions can only increase, they are going to reach a maximum after a finite number of passes, so a solution is always going to be found. But, as I mentioned when explaining the "sticky" flag approach, this comes at a price of missing some optimization opportunities. Consider this artificial sample: Code: add si,131-second+first first: add si,125+second-first second: As my examples demonstrate, there are pros and cons of each approach. In practice, the differences are rarely noticed, though (at least on x86). |
|||
29 Apr 2019, 13:28 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.