flat assembler
Message board for the users of flat assembler.
Compiler Internals > jumps (conditional or not) are not fully optimised
Tomasz Grysztar 16 Nov 2014, 22:21
fasm does not generate 66h-prefixed jumps unless explicitly told to (with a form like "jmp near word"). If you look at the sources of very early versions (0.9) of fasm you may notice that it did initially optimize conditional jumps this way, but I later changed it - after I learned that prefixes may actually hurt performance. Nowadays I consider this to be one of the controversial optimizations that I decided not to include as default behavior; I think they should be chosen consciously by the programmer.
l_inc 16 Nov 2014, 23:53
It's also important to remember that $66 makes a relative jump not completely base independent (base range dependent), which means a functional difference for some cases. I.e. 66 0F 84 80 00 may jump to a location different from where 0F 84 80 00 00 00 jumps.
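To make the base-range dependence concrete, here is a small arithmetic sketch in Python (not fasm code; the function name and the example addresses are made up for illustration). It assumes 32-bit protected mode, where a near jump executed with a 16-bit operand size has the upper 16 bits of EIP cleared after the displacement is added.

```python
def near_jump_target(insn_addr, insn_len, rel, operand_size):
    """Return the address a taken relative near jump lands on (32-bit code)."""
    target = (insn_addr + insn_len + rel) & 0xFFFFFFFF
    if operand_size == 16:
        # 66h prefix: the resulting EIP is truncated to 16 bits.
        target &= 0xFFFF
    return target

# Both encodings are aimed at the same label, 0x85 bytes after the jump start.
for base in (0x00001000, 0x00401000):
    long_form = near_jump_target(base, 6, 0x7F, 32)   # 0F 84 7F 00 00 00
    prefixed = near_jump_target(base, 5, 0x80, 16)    # 66 0F 84 80 00
    print(hex(base), hex(long_form), hex(prefixed))
```

With the code based below 64 KiB both forms reach the same address; with the code based at 0x00401000 the prefixed form lands in the low 64 KiB instead of on the intended label.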
_________________ Faith is a superposition of knowledge and fallacy |
revolution 17 Nov 2014, 05:03
l_inc wrote: It's also important to remember that $66 makes a relative jump not completely base independent (base range dependent), which means a functional difference for some cases. I.e. 66 0F 84 80 00 may jump to a location different from where 0F 84 80 00 00 00 jumps.
CandyMan 17 Nov 2014, 11:08
This optimization (jmp word 0x?) is turned off (see jmp_address=1). I use it in unreal mode.
Try compiling the attached "fasmd.asm" and compare the files.
Code:
    file "fasmd.exe" generated by fasm
    0xB5FA: 0F847D000000
Code:
    file "fasmd.exe" generated by modified fasm
    0xB5FA: 747D
_________________ smaller is better
l_inc 17 Nov 2014, 13:25
CandyMan
Quote: Try compile "fasmd.asm" attached and compare files
Well, I got two identical files. And the standard fasm compiles the following:
Code:
    org 0xB5FA
    je 0xB679
into a 2-byte opcode.
revolution
Quote: That alone is all that is needed to disqualify it as a potential optimisation
I agree with you, but fasm in fact does so-called "aggressive" optimizations that may result in a functional difference, which I'm not happy about. The point, however, is that there are assumptions that can legally be made explicit, and the optimization is then allowed to rely on these. E.g., you assume a flat model, or you assume the base dependency. But the assumptions should be made very carefully so as not to violate the programmer's general expectations, such as that relative jumps are base independent.
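The byte counts quoted in this exchange can be reproduced with a little arithmetic. The sketch below is Python and is not fasm's actual logic: encode_je is a hypothetical helper that, assuming 32-bit code, picks the short form (74 rel8) whenever the displacement fits in a signed byte and the near form (0F 84 rel32) otherwise.

```python
import struct

def encode_je(addr, target):
    """Encode `je target` located at `addr`, preferring the 2-byte short form."""
    rel8 = target - (addr + 2)            # displacement is relative to the next instruction
    if -128 <= rel8 <= 127:
        return bytes([0x74]) + struct.pack("<b", rel8)
    rel32 = target - (addr + 6)
    return bytes([0x0F, 0x84]) + struct.pack("<i", rel32)

print(encode_je(0xB5FA, 0xB679).hex())    # target as in the short listing
print(encode_je(0xB5FA, 0xB67D).hex())    # target implied by the long listing
```

The two listings at 0xB5FA point at different absolute addresses (0xB679 versus 0xB67D). Presumably the target label sits four bytes lower once the jump itself is four bytes shorter, while from the longer layout the same label is 129 bytes away and no longer fits rel8, which is why the choice depends on what the previous pass predicted.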
CandyMan 17 Nov 2014, 13:55
In addition, I insist that the forward jump is wrongly encoded as near instead of short.
Code:
    org 0xB5FA  ;\ this one will always be assembled correctly
    je 0xB679   ;/
l_inc 17 Nov 2014, 14:02
CandyMan
Please provide a minimal complete source that reproduces the problem.
CandyMan 17 Nov 2014, 14:44
It will be sufficient to add these lines to the code:
Code:
    check_for_short_jump:               ;Start
            cmp     [current_pass],0
            jnz     check_to_short_jump
            or      [next_pass_needed],-1
            jmp     short_jump          ;in pass 0 always create short jumps
    check_to_short_jump:                ;Finit
Tomasz Grysztar 17 Nov 2014, 16:31
l_inc wrote: CandyMan
The suggestion made by CandyMan is that in the very first pass the short forms of instructions should be enforced even when the values are already known (as in the case of a backward jump). And yes, it is possible that this could help fasm find a slightly better solution, because these values can still change, and there is some probability that such a value may change in the next pass in such a way that the short form of the instruction will stay.
I've just tried such a change in the general mechanism: right at the beginning of "calculate_expression", where it sets [value_undefined] to 0, I added code that sets it to 1 in the very first pass. This way, in the first pass the value of every expression is perceived to be zero. And yes, it then assembles fasmd.exe four bytes shorter. But at the same time it is dangerous and may cause some other things to stop working. On the other hand, using a less general mechanism, like the proposed tweak of the jump handler, is probably safe.
Any change to the prediction methods is in fact a double-edged sword, as I have mentioned once or twice already: any such tweak may make some types of sources assemble into smaller code, while breaking others in such a way that they get assembled into larger output or refuse to assemble at all. In the case of "fasm 1" I obviously choose the methods that focus on assembling standard x86 code well (and they do not always work well with other things); the zeroing of undefined expressions was created with this in mind. So perhaps this could also be a good modification: to make all such optimized instructions (not only jumps) have the short form in the very first pass. But note that this would still be an additional mechanism, not a replacement for the existing one, because zeroing of expressions has an effect not only in the first pass but in later ones too, if you have some dependencies that cause symbols to get defined only in later passes, etc.
It is all a bit of a heuristic that should be chosen based on the results obtained for "the most typical" source codes. I recall that I tested a similar modification a long time ago and at the time it did not make any difference, so I scrapped the idea. Now I have an example of a four-byte difference. Is it worth it then? Possibly.
CandyMan 17 Nov 2014, 22:02
Tomasz, thank you for the exhaustive reply.
l_inc 18 Nov 2014, 01:31
Tomasz Grysztar
Are you sure it's about predicting zeroes? Since CandyMan failed to respond to my previous post, I'm doing it myself now:
Code:
    use32
    mov [fwdref],0
    je @F
    times 125 nop
    @@:
    fwdref db ?
A long form of the jump is compiled here because the increase in the jump size at the second pass is itself critical for the choice of the jump encoding. The critical difference is initiated by mispredicting fwdref to be a dword instead of a byte at the first pass, which shifts the address of the jump at the second pass. Thus maybe it makes more sense to also predict shorter forms for instructions whose encodings have nothing to do with predicting zeroes. Intuitively this would be less of a black-magic solution.
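The pass-by-pass behaviour described here can be traced with a toy model. The following Python sketch is a simplification written for illustration and is not how fasm is implemented: it assumes that an undefined label is predicted as 0, that each pass uses the label address recorded by the previous pass, that mov with an imm32 takes 10 bytes (C7 05 disp32 imm32) and with an imm8 takes 7 bytes (C6 05 disp32 imm8), and that assembly stops once a pass changes nothing. The function and parameter names are hypothetical.

```python
def assemble(first_pass_mov_size):
    """Toy multi-pass model of the snippet; returns (je size, passes used)."""
    MOV_BYTE = 7            # size of the mov once fwdref is known to be a byte
    NOPS = 125
    label = None            # address of @@ recorded by the previous pass
    size_known = False      # whether fwdref's size is known from an earlier pass
    passes = 0
    while True:
        passes += 1
        mov_size = MOV_BYTE if size_known else first_pass_mov_size
        je_addr = mov_size
        target = 0 if label is None else label   # undefined label predicted as 0
        rel8 = target - (je_addr + 2)
        je_size = 2 if -128 <= rel8 <= 127 else 6
        new_label = je_addr + je_size + NOPS
        stable = size_known and new_label == label
        label, size_known = new_label, True
        if stable:
            return je_size, passes

print(assemble(10))   # first pass guesses a dword immediate
print(assemble(7))    # first pass guesses a byte immediate
```

In this model the dword guess puts @@ at 137 after the first pass; in the second pass the jump sits at 7, so the distance is 128 and just misses rel8, and the long form then keeps itself alive. Starting from the byte guess the label is at 134 and the short form stays.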
l_inc 18 Nov 2014, 02:18
Tomasz Grysztar
I just realized that I actually repeated your suggestion. I was a bit confused by your example of "not only jumps" being add eax,forward_referenced_constant, and I thought that "all such optimized instructions" still referred solely to the instructions optimized by initially predicting a zero.
Tomasz Grysztar 18 Nov 2014, 06:55
Your example is quite important, though, because it's something I overlooked and that should be corrected right away. When the size of a label is predicted to be 0 (which registers an "operand size not specified" recoverable error), the immediate should be assumed to be 8-bit instead of 32-bit (and this, like the zeroing of expressions, is independent of the pass number). I did it in some places (like the "basic_mem_imm" handler), but apparently I overlooked it in some others (like the "mov_mem_imm" you pointed out) - so it may actually be considered a kind of bug.
l_inc 18 Nov 2014, 12:38
Tomasz Grysztar
Well, then you may want to fix another problem that may go unnoticed because of this fix. With version 1.71.26 even enforcing a short jump didn't work:
Code:
    use32
    mov [fwdref],0
    je short @F
    times 125 nop
    @@:
    fwdref db ?
It doesn't seem reasonable to fall back to the larger offset when the short keyword is specified. It is interesting, however, that defining a label at the jump location magically solved both problems, making the mov be predicted in the short form from the very beginning:
Code:
    use32
    mov [fwdref],0
    x: je @F
    times 125 nop
    @@:
    fwdref db ?
This compiles into an optimal form with 1.71.26. This might reveal a solution for a somewhat wider range of optimization problems.
Tomasz Grysztar 18 Nov 2014, 12:59
l_inc wrote: It is interesting however that defining a label at the jump location magically solved both problems making the mov be predicted in the short form from the very beginning: (...)
l_inc 18 Nov 2014, 13:05
Tomasz Grysztar
That's why I said there might be a solution here. Maybe by defining something similar to a hidden label for every jump.
Tomasz Grysztar 18 Nov 2014, 14:37
l_inc wrote: That's why I said there might be a solution here. Maybe by defining something similar to a hidden label for every jump.
But, just for fun, I devised a simple modification that does an extreme version of this "overkill" and tracks an additional hidden label for every line in the source. This goes into PARSER.INC:
Code:
    parse_line:
            call    allocate_label          ; +
            mov     byte [edi],2            ; +
            inc     edi                     ; +
            stos    dword [edi]             ; +
            xor     al,al                   ; +
            stos    byte [edi]              ; +
            mov     [formatter_symbols_allowed],0
            cmp     byte [esi],1Ah
            jne     empty_instruction
            ...
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.