flat assembler
Message board for the users of flat assembler.
Tomasz Grysztar 03 Jul 2013, 15:41
l_inc wrote: 1. I know, that in some cases semantics of the size modifiers is to produce a longer instruction encoding. And sometimes is to choose the corresponding addressing mode. Here are some examples of that behaviour being seemingly inconsistent:
In long mode it is choosing addressing mode, the same. Please compare the results of these two instructions in long mode:
Code:
lea rax,[dword -1] ; RAX := 0x00000000FFFFFFFF
lea rax,[qword -1] ; RAX := 0xFFFFFFFFFFFFFFFF
l_inc wrote:
There is however one relic in fasm's syntax that breaks this rule - when there is a size prefix before the immediate value, it enforces the long encoding of that immediate (which otherwise would get optimized, so possibly shortened). Keeping this feature does not cause many problems, primarily because the size operator, if needed, is usually applied to the main operand, not the immediate, so the case when there is a size prefix before the immediate can be utilized for such a special purpose without causing any harm (as long as you are aware that such a feature exists). However I'm not happy with this feature, as it breaks the general rule of abstraction that I chose to follow strictly in other places. I planned to have a very different way of dealing with this in fasm 2, but that's a different story...
So you can enforce a long immediate encoding by putting a (most probably superfluous) size prefix next to that value, but it applies only to this one specific case. This feature was there to help with some self-modifying code where you'd like to make sure that the immediate value that you want to modify is full-size, to hold any possible value you'd like to put there. It does not apply to instructions like "inc dword eax", as there is no immediate there.
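As a rough illustration of the rule described in this post, here is a minimal sketch in fasm 1 syntax for 32-bit code. The labels `short_form`, `long_form`, `patch_me` and `patch_imm` are made up for this example, the byte values in the comments are the standard x86 encodings one would expect rather than output quoted from the thread, and the last instruction assumes the code lives in writable memory.

```asm
use32

short_form:
        push    1               ; immediate fits in a signed byte: expected 6A 01
long_form:
        push    dword 1         ; size operator before the immediate: expected 68 01 00 00 00

; Self-modifying code use case: reserve a full 32-bit immediate for later patching.
patch_me:
        add     eax,dword 0     ; long immediate requested although 0 would fit in a byte
label patch_imm dword at $-4    ; hypothetical label over the last 4 bytes of the instruction

        mov     [patch_imm],12345678h   ; overwrite the immediate of patch_me at run time
```

Because the immediate is the last field of the instruction, `$-4` addresses it no matter which opcode form the assembler picks for the long encoding.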
Tomasz Grysztar 03 Jul 2013, 15:45
l_inc wrote: 2. In some cases fasm fails to compile correct instructions because of a failed optimization attempt.
This is most probably related to this discussion: http://board.flatassembler.net/topic.php?t=5313
I haven't yet analyzed the specific cases you listed, though. Perhaps there are some additional bugs hidden there.
Tomasz Grysztar 03 Jul 2013, 15:51
l_inc wrote: 5. I don't know if it's a documentation lack, but the behaviour of dotted numeric constants is unclear. I actually didn't expect the following one to fail: |
Tomasz Grysztar 03 Jul 2013, 15:55
l_inc wrote: 3. This one is rather a documentation lack. The order of preprocessing is undefined when symbols with identical names are used for different purposes. The most critical case seems to be this one: |
l_inc 03 Jul 2013, 16:49
Tomasz Grysztar
Quote: In long mode it is choosing addressing mode, the same. Please compare the results of these two instructions in long mode
Seems like my bad. Sorry. You are right.
Quote: fasm's assembly language focuses primarily on what the instruction does, not how it is encoded.
I know your position about controlling the instruction encoding. But you really can't claim, that code pieces of different sizes are functionally equivalent. I always thought, that this was the reason you allowed to control at least the instruction length (like with immediates). What if an instruction is on a page boundary and the next page would be not executable? What about all those multibyte nop's specifically introduced by Intel for code alignment? Disregarding their sizes those are all functionally equivalent. Would you want to optimize those out? Or why wouldn't you optimize xchg rbx,rbx into a nop? I mean code size optimization is a good thing, but for an assembler compiler it absolutely has to be controllable.
Quote: However I'm not happy with this feature, as it breaks the general rule of abstraction that I chose to follow strictly in other places.
This statement of yours reminds me of a discussion I had some time ago. The short summary of the discussion: "%t breaks the SSSO principle violently". I don't mean, it's a bug. %t is a good feature. But this observation applies to a discussion we had here.
Code: There is however one relic in fasm's syntax that breaks this rule
I just can't disregard this statement. This "relic" is actually of very high importance.
Quote: This is most probably related to this discussion
I don't think so. Compilation problems described there are related to forward referencing labels. In my examples, fasm can't compile just normal instructions.
Quote: This was later changed due to users' request, so that the numeric variable would not create a new locals prefix
Fair enough. I fully support this decision. It just doesn't seem to be documented.
Quote: Details like this are documented in the section 2.3.7 of official manual
Yes. I know. "Like this". But I didn't find "exactly this" one.
_________________ Faith is a superposition of knowledge and fallacy
Tomasz Grysztar 03 Jul 2013, 17:21
l_inc wrote: What if an instruction is on a page boundary and the next page would be not executable? What about all those multibyte nop's specifically introduced by Intel for code alignment? Disregarding their sizes those are all functionally equivalent. Would you want to optimize those out?
l_inc wrote: Or why wouldn't you optimize xchg rbx,rbx into a nop?
I myself changed my mind a few times about it.
l_inc wrote: Compilation problems described there are related to forward referencing labels. In my examples, fasm can't compile just normal instructions.
l_inc 03 Jul 2013, 18:20
Tomasz Grysztar
Quote: you provide the guidelines for assembler and it tries to find the best solution
Please, don't overabstract. Finding best solutions is for high level language compilers. Assembler's main purpose is to provide the highest level of control over the output, while giving as much as possible of coding simplification features. Control is superior, convenience is important, but inferior. For situations, where the convenience is of higher importance, there are plenty of high level languages and highly sophisticated optimizers for them.
Tomasz Grysztar 03 Jul 2013, 21:46
l_inc wrote: Assembler's main purpose is to provide the highest level of control over the output, while giving as much as possible of coding simplification features.
If there is something worth having control over, usually there is some abstract reason for it, which can be extracted and formulated in a more general way. It is much better to say "I want to have that instruction aligned to paragraph boundary" than to say "I want this instruction to be five bytes long" - the first one shows the true reason behind the request, while the second one may obscure the actual purpose.
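The first kind of request mentioned in this post maps onto fasm's `align` directive. A minimal sketch, assuming 32-bit code and a made-up label `hot_loop`:

```asm
use32

        xor     eax,eax
        mov     ecx,100
        align   16              ; pad so the next instruction starts on a 16-byte (paragraph) boundary
hot_loop:
        add     eax,ecx         ; sum 100 + 99 + ... + 1 into eax
        dec     ecx
        jnz     hot_loop
        ret
```

The source states the intent (alignment) and leaves the amount of padding to the assembler, instead of hand-tuning the length of the instructions that precede the loop.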
l_inc 04 Jul 2013, 13:12
Tomasz Grysztar
Quote: That may be the definition that you chose, but for me the assembly was primarily the abstraction that I mentioned.
OK. But would you then clarify your understanding of the difference between an assembler and high level languages? Because abstracting from what some code looks like to what some code does is applicable to any high level language: you don't need to know the actual instructions if you just somehow specify (e.g., draw a picture), what you want those to do.
Quote: The code resolving idea [...] was at the core of fasm since the beginning.
Yes. This is the convenience part. But it does not and must not prevent one from being able to specify an exact (and maybe suboptimal from your point of view) solution in case one needs that exact solution for whatever unexpressible reasons. And this is the control part. Size operators are an example for that.
Quote: It is much better to say "I want to have that instruction aligned to paragraph boundary" than to say "I want this instruction to be five bytes long" - the first one shows the true reason behind the request
Yes. For that reason people came up with high level languages. And the abstraction process is definitely not at its final point and not even near to it. Cause it's much better to literally say "I want the computer to understand my language" than to type all those mysterious processor instructions which make the computer understand human speech.
randall 04 Jul 2013, 14:31
Quote:
The difference is that you use native machine *commands*. Commands, not encodings. I agree with Tomasz, programmer should express the intent using native machine commands (mnemonics) and assembler should choose the most optimal encodings. |
l_inc 04 Jul 2013, 14:49
randall
Quote: The difference is that you use native machine *commands*. Commands, not encodings.
Machines natively work with instruction encodings. What you talk about are mnemonics, which are already a step towards higher level concepts: same mnemonic can have different functionality in different contexts; different mnemonics can have same functionality. Besides, definition of the word "mnemonic" is flexible. Are push and push dword the same mnemonic?
Quote: programmer should express the intent with native machine commands (mnemonics) and assembler should choose the most optimal encodings
Would you accept an assembler to compile a nop instead of xchg rbx,rbx? If not, why do you accept push dword 1 to be compiled into a short form push 1 without giving you any possibility to specify the longer form?
Tomasz Grysztar 04 Jul 2013, 17:31
l_inc wrote: Besides, definition of the word "mnemonic" is flexible. Are push and push dword the same mnemonic?
l_inc wrote: Would you accept an assembler to compile a nop instead of xchg rbx,rbx?
l_inc wrote: If not, why do you accept push dword 1 to be compiled into a short form push 1 without giving you any possibility to specify the longer form?
With that being said, fasm still allows to use this syntax variant to enforce the long immediate (and that's why I also had to add mnemonics like PUSHD). I added it solely for the purposes of SMC. But I'm not really satisfied with this solution, and for fasm 2 I planned to separate the assembly instruction syntax from the additional hints for the encoder, which would be specified as annotations beside the main instruction. This would make both worlds happy, I hope.
l_inc 04 Jul 2013, 20:56
Tomasz Grysztar
Quote: I don't know any other definition of mnemonic than the one that states that it is that first word of assembly command
It actually does not matter, if you consider it as a single or multiple tokens. Those become an encoded part of the instruction anyway. E.g. the processor documentation does not specify different mnemonics for (differently behaving) instructions CC and CD 03. Therefore you had to extend the conventional syntax making the '3' a part of the first token. The point is that if you claim to be willing to abstract from the encoding, you actually could do it the opposite way: int3 for CD 03 and int 3 for CC. Therefore it does not make sense to say like "this first token is a mnemonic and everything else is not": you can always make it a part of the mnemonic by either changing the syntax or extending the definition of the word "mnemonic".
Quote: BUT, I decided to generally not implement optimizations that would cause the instruction in disassembly to look completely different from the one in source
That's a very weak argumentation, because there are different syntaxes and every developer of yet another disassembler invents something new. Even disregarding AT&T syntax disassembly, you could again consider the int3 example or, if it's not "completely different" enough for you, consider the opposite case: same instruction and different mnemonics. Like if I write xchg eax,eax I will see something completely different in a disassembler, right?
Quote: I added it solely for the purposes of SMC.
Yes, you said that before. But SMC is one use case, that kinda justifies the size modifiers. You experienced such a need for SMC and you added this. Don't you assume, there could be other use cases that require control for size? One example would be if I want to recompile some program by disassembling it into fasm syntax, change something and then without any additional overhead compile it again with fasm, so that the program works. If you don't allow for size control, then some instructions of the original program could become shorter and the whole program will change, which in turn may make it unusable.
Quote: I planned to separate the assembly instruction syntax from the additional hints for the encoder, which would be specified as annotations beside the main instruction.
Sounds really nice. I'm a little confused about the word "hint"? Does it mean, that I could provide a hint and the encoder would still ignore it for the sake of optimization?
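For readers following the int3 and xchg examples in this post, here is a small sketch of how the instruction pairs look in fasm 1 syntax for 32-bit code. The byte values in the comments are the standard x86 encodings one would expect, stated as an assumption rather than as assembler output quoted from the thread.

```asm
use32

        int3                    ; one-byte breakpoint instruction: expected CC
        int     3               ; general INT with vector 3: expected CD 03

        nop                     ; expected 90
        xchg    eax,eax         ; in 32-bit code the short form is the same byte, 90,
                                ; so a disassembler will typically show it as nop
```

The first pair shows two spellings that select different encodings, the second pair two spellings that can end up as the same byte.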
Tomasz Grysztar 04 Jul 2013, 21:10
l_inc wrote: E.g. the processor documentation does not specify different mnemonics for (differently behaving) instructions CC and CD 03. Therefore you had to extend the conventional syntax making the '3' a part of the first token. The point is that if you claim to be willing to abstract from the encoding, you actually could do it the opposite way: int3 for CD 03 and int 3 for CC.
l_inc wrote: Sounds really nice. I'm a little confused about the word "hint"? Does it mean, that I could provide a hint and the encoder would still ignore it for the sake of optimization?
l_inc 04 Jul 2013, 21:15
Tomasz Grysztar
Quote: I created two different mnemonics, because they are actually two different instructions - they differ a bit in what they do, this is not simply a matter of choice between longer or shorter encoding.
Well... the exact purpose of the remark "(differently behaving)" in my previous post was to avoid occurrence of this explanation.
Quote: I only thought about the case when you specified a hint "encode immediate as 32-bit value", but instruction had no immediate at all
It seems like you have some cases in mind, where the presence of an immediate is not obvious for the programmer. Otherwise this situation is rather a subject to fail compilation and to report an error.
Tomasz Grysztar 05 Jul 2013, 08:33
l_inc wrote: It seems like you have some cases in mind, where the presence of an immediate is not obvious for the programmer. Otherwise this situation is rather a subject to fail compilation and to report an error. |
l_inc 05 Jul 2013, 10:29
Tomasz Grysztar
Some side remarks on the idea: Maybe it would make sense to introduce strong and weak annotations (maybe weak for block annotations and strong for single instruction annotations). At least I never liked a compiler to silently ignore what I write. Additionally block syntax always has larger coding overhead (when applied to a single instruction/line) than single instruction or single line effect syntax.
l_inc 07 Jul 2013, 22:26
Here's one more bug (2., 4. and 7. seem to be clearly compiler bugs).
8. Stores from a different addressing space into reserved data must not be thrown away from the output:
Code:
rb 1
space::
org 0
store byte 'A' at space:$$
Tomasz Grysztar 09 Jul 2013, 09:38
l_inc wrote:
What could be improved here is the error message - perhaps fasm should tell that it cannot generate a RIP-relative address and hint that the instruction may be compilable by enforcing absolute addressing.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.