flat assembler
Message board for the users of flat assembler.
"short" TEST instruction
Tomasz Grysztar 23 Mar 2004, 12:54
It has nothing to do with your example. This optimization is more like what FASM has always been doing behind your back: choosing the optimal opcode for the same operation (for example, see section 1.2.6 in the documentation).
comrade 23 Mar 2004, 12:58
I support Frank's opinion
comrade 23 Mar 2004, 13:23
I am kind of confused, actually. It seems like a harmless optimization, but then you get the confusion: "why did it put byte when i was testing dword?". Perhaps we should run a poll on this question?
Maybe, in some rare case, you are writing a disassembler and want to test it by assembling test files with FASM, and you cannot get the desired instructions (just an example).
Tomasz Grysztar 23 Mar 2004, 13:26
Yes, I'm also a bit confused, because, being consistent as usual when implementing any feature, I would end up with the "test dword [0],100h"->"test byte [1],1" optimization, which, though generally harmless in the same way, does look much more controversial.
There is no problem with generating the instructions you want, though: in the same way that "or eax,dword 1" generates longer code than "or eax,1", even with the TEST optimization enabled "test eax,dword 1" still generates the opcode with full double word data.
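For illustration only, the "test dword [0],100h" -> "test byte [1],1" narrowing can be sketched in Python. This is not FASM's code: the function name narrow_test is made up, and the sketch assumes little-endian memory, so bits 8..15 of a dword at address N live in the byte at N+1. It only guarantees the same ZF; SF and PF can come out differently after narrowing, which is part of why this form looks more controversial.

```python
def narrow_test(offset, mask):
    """Hypothetical helper: narrow 'test dword [offset], mask' to a byte test.

    Returns (byte_offset, byte_mask) when every set bit of the 32-bit mask
    falls inside a single byte, otherwise None.
    """
    if not 0 < mask <= 0xFFFFFFFF:
        return None
    for i in range(4):
        # Reject this byte position if any bit lies outside byte i.
        if mask & ~(0xFF << (8 * i)) == 0:
            # Little-endian: byte i of the dword is stored at offset + i.
            return offset + i, mask >> (8 * i)
    return None


print(narrow_test(0, 0x100))   # the case from the post above
print(narrow_test(0, 0x101))   # bits in two bytes: no narrowing
```

A mask that spans more than one byte is left alone, which matches the idea that only the operand size changes, never the set of bits being tested.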
Tomasz Grysztar 23 Mar 2004, 20:37
As it leads to too much confusion, my proposal is: let's keep it away from the assembler itself, but make a good macro for this kind of optimization and make it available here.
S.T.A.S. 24 Mar 2004, 07:11
Hmm.. I'm also confused..
I'm not so good with opcodes, but I do know that a simple MOV can be assembled in several different ways (with opcodes of different sizes). And everyone thinks that's OK. And when we compare FASM output with, for example, MASM's, we will probably find some differences there.. Yes, that's OK again.. But when there is a possibility to substitute a "TEST operation between Accumulator & Immediate Operand" with.. also a "TEST operation between Accumulator & Immediate Operand", there is a lot of criticism.. Why?

Of course, there are lots of other replacements that should be done through a macro: LEA / TEST AH,1 (because of PF) / MOV EAX,0 / etc., because they are sometimes (even) internally different operations.. And I have to agree with Privalov here: As it leads to too much confusion... Yes, confusion.. Where are the objective opinions, not subjective ones?

(I must say that, personally, I think the TEST instruction is simply unnecessary. There are ways to use SHR/SHL/etc. instead of it. But when we're using some API that is coded in C, how often do we look inside HUGE.INC to see WHAT some constant is?)

Intel, MS, Borland abandoned assembler.. Only Privalov is going the right way, providing us with an excellent tool. Let's not fight against him, but cooperate.

PS
Code:
8300 07 add dword [ds:eax], 7
3E 818420 00000000 07000000 add dword [ds:eax], 7
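The two encodings in the PS differ in more than the immediate (the long one also carries a 3E segment prefix, a SIB byte and a 32-bit displacement). As a rough sketch of just the immediate-size choice, here is a small Python function; its name and the force_dword parameter are made up for illustration and stand in for writing "dword 7" in the source. It covers only "add dword [eax], imm" with the plain [eax] addressing form.

```python
def encode_add_dword_at_eax(imm, force_dword=False):
    """Hypothetical sketch: bytes for 'add dword [eax], imm'."""
    modrm = 0x00  # mod=00, reg=/0 (ADD), rm=000 ([eax])
    if not force_dword and -128 <= imm <= 127:
        # 83 /0 ib: the imm8 is sign-extended to 32 bits by the CPU.
        return bytes([0x83, modrm]) + imm.to_bytes(1, "little", signed=True)
    # 81 /0 id: full 32-bit immediate.
    return bytes([0x81, modrm]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_add_dword_at_eax(7).hex())
print(encode_add_dword_at_eax(7, force_dword=True).hex())
```

Both byte sequences perform the same addition; only the length differs.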
Frank 24 Mar 2004, 19:40
S.T.A.S. wrote: There is a lot of criticism.. Why?

S.T.A.S., you compare the proposed TEST optimisation to other, existing optimisations. But they are not in the same league. FASM's optimizations for OR, XOR, SUB etc. preserve the source code instruction. If I write "sub eax, 1" in the source, a disassembly of the binary will confirm that the instruction "sub eax, 1" has been assembled. Here I get exactly what I asked for.

The TEST optimization would have gone much further, replacing source code instructions ("test eax, 1") by different ones ("test al, 1"). That's too much of a good thing, at least for me: if I write "eax", then I mean "eax", and not "al". There is nothing ambiguous or unclear about the source code instruction.

S.T.A.S. wrote: And I have to agree with Privalov here: As it leads to too much confusion...

"Confusion" was just a short-hand expression. In more objective terms, one could say that the TEST optimization, if built into the assembler, would have
- made the assembler output less predictable,
- reduced the match between source and disassembly,
- required the FASM programmer to keep an additional "special case" in mind.
I guess that sounds more objective, or rational, than the word "confusion".

That said, your contribution has shown a great opportunity for size optimization, one that is often overlooked (certainly by me). I will review the library parts of my code to see where I can make good use of your idea, and I am grateful that you shared it. I don't think that this optimization should be hard-coded into an assembler, but if provided as a macro, programmers will get the greatest benefit from it.

Regards,
Frank
BiDark 25 Mar 2004, 02:23
I fully agree with Frank
JohnFound 25 Mar 2004, 05:41
I fully agree with STAS.
If we are talking about writing small routines and short demos, then yes, it is very easy to optimize the "test" instruction (and any other instruction) manually. But we are talking about creating huge programs for some big OS like Windows, where we use hundreds or thousands of "test" instructions with different symbolic constants, defined in huge .inc files. It's easy to say "I want to optimize my instructions manually". Yes, but often it is impossible. So why not let the assembler (which knows every constant better than we do) make these optimizations?

Of course, it would be good to have some directive to switch this feature ON/OFF. In that case, if the option is ON, all possible size optimizations should be made: test byte/dword, lea/mov, etc. (of course, only if the instructions are equal, not only in the result but in the flags too).

My 2 cents.
Regards.
S.T.A.S. 25 Mar 2004, 07:02
Thanks, Frank & BiDark.
Now I see your point here; indeed, it's reasonable. But we are on x86 IA-32.. Let us go deeper into the opcodes.

Here is an example of FASM output:
Code:
C1F0 02 sal eax, 2
C1E0 02 shl eax, 2

The output has the SAME mnemonics as the source does. OK, let's take a look at the IA-32 manual, Volume №2 (245471-012) (or the older version, 24547108.pdf, could be used):

(page B-15) SAL – Shift Arithmetic Left -- same instruction as SHL
(page B-16) SHL – Shift Left.. register by immediate count -- 1100 000w : 11 100 reg : imm8 data

OR (page 3-703 in the book / 3-681 in the PDF):

C1 /4 ib SAL r/m32,imm8 Multiply r/m32 by 2, imm8 times
C1 /4 ib SHL r/m32,imm8 Multiply r/m32 by 2, imm8 times

So, the opcode for BOTH instructions should be:
C1h 11b (Mod), 100b (Reg/Opcode=/4), 000b (R/M=EAX) 02h (imm8) ==>> C1 E0 02

Now another "why?": what is C1 F0 02?? (page A-13 in the book & PDF):
Code:
             Encoding of Bits 5,4,3 of the ModR/M Byte
Opcode       Mod  000  001  010  011  100      101  110  111
C1 reg, imm  11B  ROL  ROR  RCL  RCR  SHL/SAL  SHR  --   SAR

Are BOTH FASM & OllyDbg wrong? No, of course not. Really, there is something hidden in the official docs. Well, we know these 2 instructions with different mnemonics are equivalent in result. Also, there is an undocumented instruction with ModR/M byte = F0 (for this concrete case). So let's say: C1 F0 02 == C1 E0 02. But let's assign one opcode to the SAL mnemonic and the other to SHL. And everyone is happy..

There are lots of such "weird" things in IA-32, and we are accustomed to them. They are HIDDEN from us in most cases. Why not add another one?

Quote: replacing source code instructions ("test eax, 1") by different ones ("test al, 1").

What are EAX & AL? Isn't it just the same register here? (though it isn't clearly described in the docs)

I agree, this could be done with a macro.. Let's imagine we have a superbly optimized macro for TEST. Well, if one wants a plain TEST, what should he do? Probably use a DIFFERENT name for the macro. And so on..
Using TEST EAX, DWORD 1 is more practical in this case, IMHO. Yes, the border is very blurred here. But I still hope we will not cross the line by implementing this..
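The ModR/M arithmetic in this post can be checked with a few lines of Python. This is only a sketch of the bit layout (Mod in bits 7-6, Reg/Opcode in bits 5-4-3, R/M in bits 2-0); the helper name modrm and the variable names are made up for illustration.

```python
def modrm(mod, reg, rm):
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


EAX = 0b000  # register number of EAX in the R/M field

# C1 /4 ib: the documented SHL/SAL slot (Reg/Opcode = 100b).
shl_eax_2 = bytes([0xC1, modrm(0b11, 0b100, EAX), 0x02])

# C1 /6 ib: the slot shown as "--" in the table (Reg/Opcode = 110b).
slot6_eax_2 = bytes([0xC1, modrm(0b11, 0b110, EAX), 0x02])

print(shl_eax_2.hex(" "))
print(slot6_eax_2.hex(" "))
```

The only difference between the two byte sequences is the Reg/Opcode field, which is exactly the E0 versus F0 difference seen in the FASM listing.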
f0dder 25 Mar 2004, 09:47
I would be careful about introducing too many "optimizations" in an assembler, though. As long as the code is functionally & speedwise identical, it's okay... but things like mov/lea substitution are a bad idea IMO.
Are there any timing differences between the short and long forms of TEST? I think almost everybody agrees that a thing like JMP optimization is just fine, because it doesn't have any functional difference and produces shorter code. So if there are no timing penalties with the short form of TEST, the optimization should be okay, objectively - it's just the subjective feeling of "eek! I am writin teh asm0r I want teh full control!" that bugs a lot of people, I guess.
S.T.A.S. 25 Mar 2004, 11:33
f0dder wrote: Are there any timing differences between the short and long forms of TEST?
And yes, I didn't even think about "mov/lea" when creating this thread.. At the moment I can't see any other safe variants except TEST..
Ralph 06 May 2004, 20:39
While we're at this, I don't know if this has been brought up before, but why not replace 'cd 03' with 'cc'?
Tomasz Grysztar 06 May 2004, 20:56
int3 mnemonic does this (see docs).
decard 06 May 2004, 21:13
I'm just curious: why did you make "int 3" generate CD03 and "int3" generate CC, instead of automatically changing "int 3" into CC? Wouldn't that be simpler?
Tomasz Grysztar 06 May 2004, 21:38
Because you would still want to get CD03 code in some cases. And this was borrowed from TASM, if I recall correctly.
LocoDelAssembly 09 Dec 2005, 22:03
According to Intel's manuals, there is another difference between opcode CC and opcode CD.
IA-32 Intel® Architecture Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M wrote: The INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the
So if you want to place a trap instead of calling interrupt 3, always use int3.

I have another instruction to discuss: what about "retn 0"? At this time FASM assembles C2 0000 (retn n) instead of C3 (retn). I think it could be optimized by using C3, and if the user wants the largest form, they can force it by writing "retn word 0". Of course, this optimization must be applied to retf and ret too.

Regards,
LocoDelAssembly
vid 12 Dec 2005, 10:06
And when will you allow forcing a smaller operand this way?!?
("cmp eax, byte My_Value") so you can be sure (and it is clear from the code) what opcode is generated?
LocoDelAssembly 12 Dec 2005, 14:59
I'm proposing the use of a size operator to force the largest opcode because FASM optimizes instructions with immediates, except for ret/retn/retf. I think that if FASM optimizes something like [eax+0] to [eax], then the same optimization of the immediate operand should be applied to ret.
I don't understand what you are saying about "cmp eax, byte My_Value". Note two things: the first and obvious one is that fasm doesn't support the size operator "byte", and the second is that when My_Value can be encoded as an imm8 the shorter form is chosen, whereas when you write "cmp eax, dword My_Value" the largest form is always chosen (and each one has its own opcode).
My proposal is:
retn = C3
retn 0 = C3
retn word 0 = C2 0000
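The proposal can be written down as a small Python sketch. It only models the rule suggested in this post, not what FASM actually does; the function name encode_retn and the force_word parameter (standing in for writing "word" before the operand) are made up for illustration.

```python
def encode_retn(imm=None, force_word=False):
    """Hypothetical sketch of the proposed encoding rule for retn."""
    if force_word or (imm is not None and imm != 0):
        # C2 iw: near return that also pops imm bytes from the stack.
        return bytes([0xC2]) + (imm or 0).to_bytes(2, "little")
    # C3: plain near return, used for "retn" and, under the proposal, "retn 0".
    return bytes([0xC3])


print(encode_retn().hex())                    # retn
print(encode_retn(0).hex())                   # retn 0
print(encode_retn(0, force_word=True).hex())  # retn word 0
```

With this rule the one-byte form becomes the default for a zero operand, and the three-byte form stays reachable through the explicit size operator, mirroring how "dword" forces the long form for other immediates.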
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.