flat assembler
Message board for the users of flat assembler.

Index > Compiler Internals > jumps (conditional or not) are not fully optimised

Author
Thread Post new topic Reply to topic
CandyMan



Joined: 04 Sep 2009
Posts: 319
Location: film "CandyMan" directed through Bernard Rose OR Candy Shop
CandyMan
compile "fasmd.asm" and compare size "fasmd.exe" it with original
see changes in "update.inc"

_________________
smaller is better


Last edited by CandyMan on 17 Nov 2014, 21:58; edited 1 time in total
Post 16 Nov 2014, 17:20
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7796
Location: Kraków, Poland
Tomasz Grysztar
fasm does not generate a 66h-prefixed jumps unless explicitly told so (with form like "jmp near word"). If you look at the sources of very early versions (0.9) of fasm you may notice that it did initially optimize conditional jumps this way, but I later changed it - after I learned that prefixes may actually hurt performance. Nowadays I consider this to be one of the controversial optimizations that I decided not to include as a default behavior, I think they should be chosen consciously by the programmer.
Post 16 Nov 2014, 22:21
View user's profile Send private message Visit poster's website Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
It's also important to remember that $66 makes a relative jump not completely base independent (base range dependent), which means a functional difference for some cases. I.e. 66 0F 84 80 00 may jump to a location different from where 0F 84 80 00 00 00 jumps.

_________________
Faith is a superposition of knowledge and fallacy
Post 16 Nov 2014, 23:53
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17664
Location: In your JS exploiting you and your system
revolution
l_inc wrote:
It's also important to remember that $66 makes a relative jump not completely base independent (base range dependent), which means a functional difference for some cases. I.e. 66 0F 84 80 00 may jump to a location different from where 0F 84 80 00 00 00 jumps.
That alone is all that is needed to disqualify it as a potential optimisation. Any "optimisation" with a functional difference is not a real optimisation, but instead it is a timebomb waiting to go off.
Post 17 Nov 2014, 05:03
View user's profile Send private message Visit poster's website Reply with quote
CandyMan



Joined: 04 Sep 2009
Posts: 319
Location: film "CandyMan" directed through Bernard Rose OR Candy Shop
CandyMan
This optimization (jmp word 0x?) is turned off. (see jmp_address=1) I use it in unreal mode.

Try compile "fasmd.asm" attached and compare files.
Code:
file "fasmd.exe" generated by fasm

0xB5FA: 0F847D000000
    


Code:
file "fasmd.exe" generated by modified fasm

0xB5FA: 747D
    

_________________
smaller is better
Post 17 Nov 2014, 11:08
View user's profile Send private message Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
CandyMan
Quote:
Try compile "fasmd.asm" attached and compare files

Well, I got two identical files. And the standard fasm compiles the following:
Code:
org 0xB5FA
je 0xB679    

into a 2-bytes opcode.
revolution
Quote:
That alone is all that is needed to disqualify it as a potential optimisation

I agree with you, but fasm in fact does so called "aggressive" optimizations that may result in functional difference, which I'm not happy about. The point is however that there are assumptions that can legally be made explicit, and the optimization is then allowed to rely on these. E.g., you assume a flat model, or you assume the base dependency. But the assumptions should be made very carefully to not violate the general programmer's expectations such as that relative jumps are base independent.

_________________
Faith is a superposition of knowledge and fallacy
Post 17 Nov 2014, 13:25
View user's profile Send private message Reply with quote
CandyMan



Joined: 04 Sep 2009
Posts: 319
Location: film "CandyMan" directed through Bernard Rose OR Candy Shop
CandyMan
I am insisting in addition that, the jump ahead is badly calculated as near instead of short.
Code:
 
org 0xB5FA ;\ He will always be this way correctly
je 0xB679    ;/
    

_________________
smaller is better
Post 17 Nov 2014, 13:55
View user's profile Send private message Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
CandyMan
Please, provide a minimal complete source allowing to reproduce the problem.

_________________
Faith is a superposition of knowledge and fallacy
Post 17 Nov 2014, 14:02
View user's profile Send private message Reply with quote
CandyMan



Joined: 04 Sep 2009
Posts: 319
Location: film "CandyMan" directed through Bernard Rose OR Candy Shop
CandyMan
it will be sufficient to add these lines to the code:
Code:
check_for_short_jump:
;Start
cmp [current_pass],0
jnz check_to_short_jump
or [next_pass_needed],-1
jmp short_jump ;in pass 0 always create short jumps
check_to_short_jump:
;Finit
    

_________________
smaller is better
Post 17 Nov 2014, 14:44
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7796
Location: Kraków, Poland
Tomasz Grysztar
l_inc wrote:
CandyMan
Please, provide a minimal complete source allowing to reproduce the problem.
I think that what he means is actually related to the question you asked recently. When an expression contains some values that fasm sees for the first time, the complete value of expression is assumed to be zero - and the same applies to relative offsets used by jumps. This way in the first pass the instructions that may get optimized (not only jumps, but also instructions like "add eax,forward_referenced_constant") have zero predicted for their immediates when no better prediction is possible. Generating short forms in the initial pass helps to find a more compact solution.

The suggestion made by CandyMan is that in the very first pass short forms of instructions should be enforced even when the values are already known (like in the case of a jump backwards). And yes, it possible that this could help fasm to find a bit better solution, because these value can still change, and there is some probability that such value may in the next pass change in such a way, that the short form of instruction will stay.

I've just tried such change in the general mechanism - right at the beginning of "calculate_expression", where it sets up [value_undefined] as 0, I added a code that sets it to 1 in the very first pass. This way in the first pass the value of every expression is perceived to be zero. And yes, it does assemble fasmd.exe a four bytes shorter then. But at the same time it is dangerous and may cause some other things to stop working. On the other hand, using a less general mechanism, like proposed tweaking of the jump handler, is probably safe.

Any changes to the prediction methods are in fact a double-edged sword, I mentioned it one or two times already - any such tweak may make some types of sources to assemble into a smaller code, while breaking some other ones in such a way that they get assembled into a larger output or refuse to assemble at all. In the case of "fasm 1" I obviously choose the methods that focus on assembling the standard x86 code well (and they are not always working well with other things) - the zeroing of undefined expressions was created with this in mind. So perhaps this also could be a good modification - to make all such optimized instructions (not only jumps) have short form in the very first pass. But note that this still would be an additional mechanism, not a replacement of existing one - because zeroing of expressions has an effect not only in the first pass, but in the later ones too, if you have some dependencies that cause symbols to get defined only in later passes, etc.

It is all a bit of a heuristic, that should be chosen based on the results obtained for "the most typical" source codes. I recall that I tested some similar modification long time ago and at the time it did not make any difference, so I scrapped the idea. Now I have an example of a four-byte difference. Is it worth it then? Possibly.
Post 17 Nov 2014, 16:31
View user's profile Send private message Visit poster's website Reply with quote
CandyMan



Joined: 04 Sep 2009
Posts: 319
Location: film "CandyMan" directed through Bernard Rose OR Candy Shop
CandyMan
Tomasz, thank you for the exhaustive reply.

_________________
smaller is better
Post 17 Nov 2014, 22:02
View user's profile Send private message Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
Tomasz Grysztar
Are you sure, it's about predicting zeroes? As long as CandyMan failed to respond to my previous post, I'm doing it myself now:
Code:
use32
mov [fwdref],0
je @F
times 125 nop
@@:
fwdref db ?    

A long form of the jump is compiled here because the increase in the jump size at the second pass is critical itself for the choice of the jump encoding. The critical difference is initiated by mispredicting fwdref to be a dword instead of a byte at the first pass and this way shifting the address of the jump at the second pass. Thus maybe it makes more sense to also predict shorter forms for instructions with encodings having nothing to do with predicting zeroes. Intuitively this would be a less of black magic solution.

_________________
Faith is a superposition of knowledge and fallacy
Post 18 Nov 2014, 01:31
View user's profile Send private message Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
Tomasz Grysztar
I just realized that I actually repeated your suggestion. I was a bit confused by your example of "not only jumps" being add eax,forward_referenced_constant, and I thought that "all such optimized instruction" still referred solely to the instructions optimized by initially predicting a zero.

_________________
Faith is a superposition of knowledge and fallacy
Post 18 Nov 2014, 02:18
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7796
Location: Kraków, Poland
Tomasz Grysztar
Your example is quite important, though, because it's something I overlooked and that should be corrected right away. When a size of label is predicted to be 0 (which registers an "operand size not specified" recoverable error) the immediate should better be assumed to be 8-bit instead of 32-bit (and this, like the zeroing of expressions, is something that is independent from the pass number). I did it in some places (like the "basic_mem_imm" handler), but apparently I overlooked it in some other (like the "mov_mem_imm" you pointed out) - so it may be considered a kind of bug, actually.
Post 18 Nov 2014, 06:55
View user's profile Send private message Visit poster's website Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
Tomasz Grysztar
Well, then you may want to fix another problem that may go unnoticed because of this fix. With the version 1.71.26 even enforcing a short jump didn't work:
Code:
use32 
mov [fwdref],0
je short @F
times 125 nop 
@@: 
fwdref db ?
    

It doesn't seem reasonable to fallback to the larger offset in case the short keyword is specified. It is interesting however that defining a label at the jump location magically solved both problems making the mov be predicted in the short form from the very beginning:
Code:
use32 
mov [fwdref],0
x: je @F
times 125 nop 
@@: 
fwdref db ?    

This compiles into an optimal form with 1.71.26. This might reveal a solution for a bit wider range of optimization problems.

_________________
Faith is a superposition of knowledge and fallacy
Post 18 Nov 2014, 12:38
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7796
Location: Kraków, Poland
Tomasz Grysztar
l_inc wrote:
It is interesting however that defining a label at the jump location magically solved both problems making the mov be predicted in the short form from the very beginning: (...)
This is where the "label adjustment" mechanism comes into play. By defining a label you make assembler notice what is the "shift" between it value from previous pass and current one, and it takes this offset into consideration when predicting the value of next label.
Post 18 Nov 2014, 12:59
View user's profile Send private message Visit poster's website Reply with quote
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc
Tomasz Grysztar
That's why a said, there might be a solution here. Maybe by defining something similar to a hidden label for every jump.

_________________
Faith is a superposition of knowledge and fallacy
Post 18 Nov 2014, 13:05
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7796
Location: Kraków, Poland
Tomasz Grysztar
l_inc wrote:
That's why a said, there might be a solution here. Maybe by defining something similar to a hidden label for every jump.
Well, it's easy to do with a macro. But for a standard process it'd be a bit of overkill, considering how barely noticeable the benefits are.

But, just for fun, I devised a simple modification that does an extreme version of this "overkill" and tracks additional hidden label for every line in source. This goes into PARSER.INC:
Code:
      parse_line:
        call    allocate_label  ; +
        mov     byte [edi],2    ; +
        inc     edi             ; +
        stos    dword [edi]     ; +
        xor     al,al           ; +
        stos    byte [edi]      ; +
        mov     [formatter_symbols_allowed],0
        cmp     byte [esi],1Ah
        jne     empty_instruction
        ...    
If you'd like to play with a modified version and determine how useful this can be, I encourage you to do so.
Post 18 Nov 2014, 14:37
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.