flat assembler
Message board for the users of flat assembler.

Pitfalls of optimistic multi-pass assembly
Tomasz Grysztar 21 Apr 2019, 19:13
Previously, I have written about two types of multi-pass assembly, pessimistic and optimistic. I am now going to talk more about the latter, which (in various incarnations) is the one more likely to be encountered in modern tools. All the assemblers I have ever written use this approach, and this has made me quite aware of both the fascinating possibilities and the dangerous pitfalls that it brings to the table.

When choosing from multiple possible codes for an instruction, a pessimistic assembler selects one that is certain to be correct in all circumstances (often not the shortest one), and only in later passes does it optimize the codes that can be safely shortened. An optimistic assembler selects the shortest code that might possibly be correct, and if it later turns out that the assumption failed and the generated code is wrong (for example, the value of an offset is too large to fit in the short form), it fixes this by restarting the whole process, this time generating longer variants of the problematic instructions.
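
For illustration, consider a forward jump (a minimal sketch in fasm syntax; the label name and the RB size are arbitrary). An optimistic assembler emits the 2-byte short form on the first pass and falls back to a longer encoding only if the offset turns out not to fit:
Code:
        jmp     done    ; guessed short (2 bytes) in the initial pass
        rb      200     ; more than 127 bytes ahead - a later pass must
                        ; re-encode the jump with its longer form
done: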

There is another way to characterize these two types. As should be obvious from the description, a pessimistic assembler always generates code that is correct. Therefore, after any pass it can decide that what it has is good enough and stop optimizing, keeping valid code. It only needs to fill in some fields with values that were forward-referenced - and thanks to the pessimistic approach, this is always possible to do correctly. This is the main advantage of pessimistic assembly, and can also be seen as its main defining characteristic.

On the other hand, an optimistic assembler does not have valid code until it finds a correct solution - perhaps after many passes, or perhaps never. In practice, as my previous article showed, it often finds a better-optimized solution than a pessimistic one - but at the price of having to perform as many passes as necessary to reach a resolved state, with the possibility of never finding one.

With this alternate characterization in mind, we can analyze some aspects of assembly language that are more interesting than a simple choice of a short or long instruction code. Consider the following:
Code:
if next < 1000h
        mov     al,[next]
end if
; ...
next db ?    
A pessimistic assembler cannot allow such an IF construction. Because "next" is a value that may depend on things yet to be assembled during the current pass (including this very IF block), the assembler is unable to make a choice that is guaranteed to produce consistent code.

But for an optimistic assembler this is just a matter of making a prediction and only worrying later (at the end of the pass) whether it was correct. If the assumption made about "next" turns out to be correct, great. If not, it can try another pass, this time making a better prediction thanks to the knowledge about the possible value of "next" gathered during the previous one.
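
In terms of passes, the successful case could look like this (a hypothetical trace, assuming the 3-byte 16-bit encoding of this MOV and that the omitted code keeps "next" below 1000h):
Code:
; pass 1: predict next < 1000h -> MOV is emitted (3 bytes)
; end of pass: next indeed resolves below 1000h -> prediction confirmed, done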

Where a pessimistic assembler would immediately say "nah, this is not going to work", an optimistic assembler is going to make an attempt. Therefore an optimistic assembler offers not only the possibility of better optimization, but can also be much more flexible in what kinds of relations it allows the source code to define. But, again, this comes at the price mentioned earlier. Consider a slight alteration of the above snippet:
Code:
org 0FFFh
if next < 1000h
        mov     al,[next]
end if
; ...
next db ?    
The MOV instruction is obviously going to take some space, so if it is generated, the value of "next" is going to be above 1000h, in turn requiring that no MOV be assembled. This source is self-contradictory; it has no solution. An optimistic assembler may go on performing as many passes as its counter allows, but it is never going to find a resolved state.
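
A pass-by-pass trace makes the contradiction explicit (assuming 16-bit code, where this MOV encodes to 3 bytes, and nothing else in the omitted part):
Code:
; MOV emitted: next = 0FFFh + 3 = 1002h -> condition false, MOV must go
; MOV omitted: next = 0FFFh < 1000h     -> condition true, MOV must return
; the two states alternate forever - no consistent solution exists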

Probably nobody expects an assembler to find a solution to self-contradicting instructions. But it may not always be so obvious that a given source text has no solution - or, even worse, that it allows a solution that was not intended. We are going to look at an example that demonstrates both of these problems.

Consider that we want to write a macro for a jump instruction, and we would like it not to generate a redundant jump instruction when the target of the jump lies immediately after it, so that it can be reached without jumping:
Code:
        jmp     next
next:    
A plain and simple implementation that first comes to mind would look like this (in fasm 1 syntax):
Code:
macro jmp target
{
        local after
        if target <> after
                jmp     target
        end if
        after:
}    
And indeed, for the previous snippet it would work as expected, omitting the redundant jump. But we should consider what it would do in another case:
Code:
previous:
        jmp     previous     
It turns out that here the macro might also omit the jump - obviously incorrectly. But it would do so simply by following orders. When the jump instruction is omitted, the address of "after" is identical to the address of "previous", because there is nothing between them, and the condition required to generate the jump instruction is then false. There is no contradiction - it is a correct solution that satisfies all the constraints defined by the source (including the text of the macro). But it obviously was not the intended one.

This shows how careful one must be when defining logical relations in the source text for such a permissive optimistic assembler. The intent of this kind of optimization cannot be expressed simply in terms of addresses being equal or not. The semantics of "was the label defined before the instruction or after it" need to be taken into consideration.

Before we look at the solution to this problem, let's take a look at another one, still with the same macro:
Code:
        jmp     $+2    
This time the evaluation of the macro leads to a constraint defined like:
Code:
        if $+2 <> after
                jmp     $+2
        end if
        after:    
The "$" is the value of current address, and the short jump on x86 architecture is 2 bytes long. When this jump is generated, the value of "after" becomes equal to what "$+2" is during the evaluation of IF. But then the instruction needs to be omitted, and the address of "after" moves 2 bytes down. This is again a self-contradictory source, in a manner very similar to the earlier example with MOV. I have been sometimes calling it "an oscillation problem", because when assembler tries to resolve it, it keeps going back an forth between two possible outcomes of IF, never arriving at a consistent solution.

But again, this is just what the source asked for. A danger of optimistic multi-pass assembly is that the way the constraints are defined may have consequences that are far from obvious.

In the case of this specific macro, both problems can be fixed by checking whether the target of the jump is something defined later in the source or not (note that this requires a fairly recent version of fasm to assemble):
Code:
macro jmp target
{
        local after
        if target <> after | definite target
                jmp     target
        end if
        after:
}    
Now a jump to a label defined earlier is never removed. And because "$+2" is not a forward-referenced symbol, it cannot cause the jump to be omitted either.

An oscillation problem may not always be so easily soluble, though. And even the corrected version of our jump macro can be tricked into oscillating:
Code:
previous:
        jmp     ADDRESS
ADDRESS = previous + 2    
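Tracing both states (a short JMP being 2 bytes, and ADDRESS being forward-referenced, so DEFINITE is false at the point of evaluation):
Code:
; jump emitted: after = previous + 2 = ADDRESS -> condition false -> omit it
; jump omitted: after = previous <> ADDRESS    -> condition true  -> emit it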
And again, as the trace shows, this is a set of constraints that has no solution. While it is also possible that an optimistic assembler fails to find a resolved state even though a solution exists, more often when assembly fails to complete it is because of a hidden self-contradiction like the one in this example.

It is then up to the user of the assembler to decide whether the true intent behind the source text differs from the rules that were written down. Perhaps there are some additional cases when a jump to a point immediately after it should be allowed to be generated. Or perhaps not, and the inability to find a resolved state should be treated as a consistency problem of the source. This is one of the considerations that led me to design fasmg, an optimistic assembler that has everything (including the encoders for various instruction sets) defined in the form of macros that may be modified or overridden by the user.

A method of dealing with oscillation problems that seems to work well in many cases is to use an additional "sticky" flag, like:
Code:
macro jmp target
{
        local after,sticky
        if target <> after | definite target | sticky = 1
                jmp     target
                sticky = 1
        else
                sticky = 0
        end if
        after:
}    
This extra flag removes the self-contradiction by loosening the condition that allows the jump to be generated - this is apparent even when looking purely at the logical relations defined by these constructions. A consistent state that the assembler can reach certainly always exists here.

In practice, the "sticky" flag usually makes it impossible for the assembler to remove the jump once it has been generated at least once during the intermediate passes. After "sticky" has been defined as 1, in the subsequent pass the assembler is almost certainly going to predict that "sticky" is going to be 1 again (unless it has some randomness intentionally seeded into its guesses), and it then cannot get out of this self-reinforcing loop. Only when the jump is consistently removed in each and every pass is it allowed to stay that way.

Note that for a snippet like:
Code:
        jmp     next
next:    
the macro with the "sticky" flag has two different correct solutions, and which one is found is up to the prediction algorithms utilized by a specific implementation of an optimistic assembler. In the case of fasm the solution where the jump is removed is found (which likely means that the condition is predicted false in the initial pass), but in general this is not certain - both solutions are logically valid and both would be accepted as a resolved state when found by an optimistic assembler. Sometimes a programmer may find that allowing a less optimal state to be reached too easily can backfire, as it makes it harder or even impossible for the assembler to find the other solution. In a complex source text where initially many choices may be completely in flux, a "sticky" flag may prevent the assembler from even considering the second solution during the later passes, while it could turn out to be a perfectly valid choice in the new context provided by resolving other things around it. This means that trying to prevent any oscillation altogether may not always be the preferable choice.
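
Traced through the macro's condition, both states are indeed self-consistent ("after" coincides with "next" either way, since the labels are adjacent, and DEFINITE is false because "next" is defined after the jump):
Code:
; sticky predicted 0: target = after, not definite -> jump omitted, sticky = 0
; sticky predicted 1: condition true               -> jump emitted, sticky = 1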


Last edited by Tomasz Grysztar on 08 Sep 2020, 16:58; edited 1 time in total
Tomasz Grysztar 25 Apr 2019, 11:19
It is worth noting that while under normal circumstances the assembly of x86 instructions does not exhibit oscillation problems, with the EVEX-based encodings introduced by AVX-512 it is now a bit easier to construct one.

The displacement compression scheme (used only for EVEX-encoded instructions) allows the length of an instruction that uses an address of the form [register+displacement] to be reduced, as long as the displacement is a multiple of the data unit size for the given operation. If we construct an example where this value depends on the size of the generated instructions (as in a DOS program having data in the same segment as code), it may happen that the displacement is aligned correctly for the optimization as long as the instruction is generated long, but becomes misaligned as soon as the assembler tries to make the instruction shorter, forcing the assembler to grow the instruction back again and causing an oscillation:
Code:
org 100h

        nop
        vaddss  xmm0{k1},xmm1,[array+eax]

array dd 16 dup ?    
The value of "array" needs to be relatively small for this to cause an oscillation; when it becomes larger than 7Fh*4, displacement compression is no longer possible no matter the alignment. So this is still a contrived sample, not very likely to occur in practice - unless someone is writing a small DOS program with the use of AVX-512.
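
A sketch of the arithmetic behind the oscillation (assuming a 16-bit target, where the 32-bit addressing adds a 67h prefix, so the instruction takes 11 bytes with a full disp32 and 8 bytes with a compressed disp8):
Code:
; long form:  array = 100h + 1 + 11 = 10Ch, a multiple of 4
;             -> disp8*4 compression applies, shrink the instruction
; short form: array = 100h + 1 + 8 = 109h, not a multiple of 4
;             -> compression invalid, grow the instruction back
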
revolution 25 Apr 2019, 11:52
How will you solve it? Or will you let the user figure it out? It could be a very tricky thing to find if the only error is that the maximum number of passes was exceeded. I found this with ARM code. It was frustrating and tedious to try to find these places manually. And fixing one can cause another.

I wonder if it could be an advantage to show a warning message telling the user that a non-optimal instruction encoding has been used. I guess most people wouldn't care, so perhaps such warnings could be enabled by a command line switch.
Tomasz Grysztar 25 Apr 2019, 12:39
revolution wrote:
How will you solve it? Or will you let the user figure it out?
One of the points of my article was that this is not a solvable problem in general. It has always been possible to construct sources that cause unending oscillation in fasm, even with something as simple as:
Code:
rb a xor 1
a:    
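Assuming the output starts at address 0, neither prediction about "a" can survive the pass:
Code:
; a predicted 0: rb 0 xor 1 = rb 1 -> a = 1, contradicting the prediction
; a predicted 1: rb 1 xor 1 = rb 0 -> a = 0, contradicting it again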
And even if the assembler detected some patterns of oscillation and altered its behavior for them, there would always be some other patterns (including ones that would exploit the very algorithms the assembler used to try to avoid oscillation).

The samples are usually contrived, though - and as noted in my previous post, an oscillation with AVX-512 instructions is also not going to happen normally. A small DOS program is the only setting that comes to mind where this could occur in practice, and I don't think it would justify making the optimization a little worse everywhere else.

On the other hand, with fasmg there is always the option of tuning the encoders, even individually for a specific project. And some of its capabilities can be used to debug the cause of running out of passes. An interesting proof of concept I've just made:
Code:
include 'cpu/x64.inc'
include 'cpu/ext/avx512.inc'

macro ? line&                   ; "?" intercepts every line of the source
        line                    ; assemble the line as it was written
        local addr
        if $ <> addr            ; compare the address after this line with
                                ; its predicted value from the previous pass
                repeat 1, n:__LINE__
                        display 'oscillation detected at ',__FILE__,'[',`n,']',13,10
                end repeat
        end if
        addr:                   ; in a fully resolved state $ = addr everywhere
end macro


org 100h

        nop
        vaddss  xmm0{k1},xmm1,[array+eax]

array dd 16 dup ?    
revolution 25 Apr 2019, 12:53
I think the AVX512 example is "solvable" by simply encoding the long version of the instruction. That is why I suggested having a warning (if enabled) to tell the user such a thing is happening. But triggering that behaviour could be a bit tricky to get right without too many false positives. Perhaps, to avoid it happening too many times, only make a long encoding if the pass counter is almost exhausted, or something similar.

But ultimately I think producing a working executable is more important than failing to make anything due to not having 100% optimal encodings everywhere. So a small amount of suboptimal encoding is okay IMO if it avoids the alternative of "pass limit exceeded".
Tomasz Grysztar 25 Apr 2019, 13:14
revolution wrote:
But ultimately I think producing a working executable is more important than failing to make anything due to not having 100% optimal encodings everywhere. So a small amount of suboptimal encoding is okay IMO if it avoids the alternative of "pass limit exceeded".
The main reason why I'm writing these articles is to show in detail how fasm, with its "optimistic" resolving algorithms, differs from a traditional and reliable "pessimistic" assembler - what the advantages are and what their price is.

It is not possible in general to avoid the risk of running endless passes (I believe this is related to the halting problem), but as fasm demonstrates in practice, it is possible to have that happen rarely enough to be an acceptable trade-off.

There are myriads of tools out there that can produce working code; I'm determined to highlight what makes fasm different from most of them, while accepting the drawbacks.

Another interesting thing to note is that the language of an "optimistic" assembler becomes at least partly declarative instead of imperative. There is perhaps still more to write on this topic.
Tomasz Grysztar 25 Apr 2019, 13:40
One of the unwritten rules of fasm is that the number of the pass should not affect the resolving process (this is the reason why this number is never directly exposed). It was not always this way: early versions of fasm did some things differently in the initial pass, and it took me a while to arrive at the current design.

Imagine that you take an entire self-contained module of code and put it inside an IF block. Let's say that we add a module to our program only if the size of the other code passes some threshold:
Code:
if $ > 1000h
    include 'module.inc'
end if    
The resolving of the other code may take several passes, and it may turn out that it grows beyond the threshold no sooner than in the last one. Then the assembler starts including the module and resolving it from the beginning. This may be pass number N, but for the module it is the first one. Interactions of this kind convinced me to start applying this rule.
pelaillo 26 Apr 2019, 13:18
revolution wrote:
I think the AVX512 example is "solvable" by simply encoding the long version of the instruction.

Would it be possible to follow revolution's suggestion and always use the long version to produce a working executable, and then use a secondary "shortener" tool that works over the binary, trying to replace long versions of opcodes with short ones, one at a time, whenever possible?

The tool could operate like this:
1. A binary analyzer, similar to OllyDbg's, to identify the innermost instructions
2. Shorten opcodes, starting from the innermost ones
3. Do a verification pass
4. Repeat from 2, shortening opcodes towards the outside

P.S. This is one of those threads where I start learning something I didn't imagine even existed.


Last edited by pelaillo on 26 Apr 2019, 13:51; edited 1 time in total
Tomasz Grysztar 26 Apr 2019, 13:38
pelaillo wrote:
Would it be possible to follow revolution's suggestion and always use the long version to produce a working executable, and then use a secondary "shortener" tool that works over the binary, trying to replace long versions of opcodes with short ones, one at a time, whenever possible?
This is more or less what a "pessimistic" type of assembler does. Please read my earlier article for details.

Note that this is an entire paradigm that influences what an assembler is able to do, as shown in my second article (the one here). A "pessimistic" assembler needs to limit itself to allowing only expressions with an outcome predictable enough to ensure that the code generated in the initial pass can be made correct.
bitRAKE 29 Apr 2019, 12:48
It sounds like you are selecting a desired option to partition an n-dimensional space, and starting over if the option is invalidated. That is one of the best methods known for solving these kinds of problems, and a completely different paradigm from what the "pessimistic" type is doing.

Tomasz Grysztar 29 Apr 2019, 13:28
It is worth noting that while the traditional approach has been to use pessimistic optimization, several contemporary assemblers use variants of the optimistic approach.

NASM performs optimistic passes, though at the same time it severely limits what kinds of dependencies the source can define, by allowing the values of addresses/offsets to affect the assembly only under specific circumstances (more like a traditional, pessimistic assembler). This makes it much harder to fabricate a source that would not get resolved; nonetheless, it is still possible:
Code:
        add     si,second+3-first
first:
        align   128
second:    
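To see why this cannot resolve (assuming the snippet starts at offset 0, with ADD SI,imm taking 3 bytes with a sign-extended imm8 and 4 bytes with imm16):
Code:
; ADD short (3 bytes): first = 3, align -> second = 128, imm = 128+3-3 = 128 -> needs long
; ADD long  (4 bytes): first = 4, align -> second = 128, imm = 128+3-4 = 127 -> short suffices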
Feeding the above sample to NASM, I get this kind of response:
Code:
..\test.asm: error: Can't find valid values for all labels after 1004 passes, giving up.
..\test.asm: error: Possible causes: recursive EQUs, macro abuse.    

YASM, on the other hand, is able to resolve the same snippet just fine. I believe it uses a variant of optimistic optimization where in consecutive passes instructions are only allowed to grow, which is the same as the "sticky" flag method I described earlier. When the sizes of instructions can only increase, they are going to reach a maximum after a finite number of passes, so a solution is always going to be found. But, as I mentioned when explaining the "sticky" flag approach, this comes at the price of missing some optimization opportunities. Consider this artificial sample:
Code:
        add     si,131-second+first
first:
        add     si,125+second-first
second:    
The first ADD instruction can be short, but this only becomes apparent after the second one has been made long. While both NASM and fasm assemble this snippet to 7 bytes (one short and one long instruction), YASM generates 8 bytes (two long instructions) because of its "sticky" method.
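
The arithmetic, under the same encoding assumptions as before (3 bytes with imm8, 4 bytes with imm16):
Code:
; second ADD: imm = 125 + (its own size); the short form gives 125 + 3 = 128,
;             too big for a signed byte -> it must be long (imm = 129)
; first ADD:  imm = 131 - (second - first) = 131 - 4 = 127 -> short form fits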

As my examples demonstrate, there are pros and cons to each approach. In practice, though, the differences are rarely noticed (at least on x86).