flat assembler
Message board for the users of flat assembler.

flat assembler > Compiler Internals > "short" TEST instruction

Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6981
Location: Kraków, Poland
As it leads to too much confusion, my proposal is: let's keep it away from the assembler itself, but make a good macro for this kind of optimization and make it available here.
Post 23 Mar 2004, 20:37
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
Hmm.. I'm also confused..
I'm not so good with opcodes, but I do know that a simple MOV can be assembled in several different ways (with opcodes of different sizes).
And everyone thinks that's OK.
And when we compare FASM output with, for example, MASM's, we will probably find some differences there..
Yes, it's again OK..

But when there is a possibility to substitute a "TEST operation between Accumulator & Immediate Operand" with.. also a "TEST operation between Accumulator & Immediate Operand", there is a lot of criticism..

Why?

Of course, there are lots of other replacements that should be done through macros: LEA / TEST AH,1 (because of PF) / MOV EAX,0 / etc., because they are sometimes (even) internally different operations..


And I have to agree with Privalov here: As it leads to too much confusion...
Yes confusion.. Where are objective opinions, not subjective?

(I must say, personally, I think the TEST instruction is simply unnecessary. There are ways to use SHR/SHL/etc. instead of it. But when we're using some API that is coded in C, do we often look inside HUGE.INC to see WHAT some constant is?)


Intel, MS, Borland abandoned assembler.. Only Privalov is going the right way, providing us with an excellent tool. Let's not fight against him, but cooperate :)

PS
Code:
8300 07                        add     dword [ds:eax], 7
3E 818420 00000000 07000000    add     dword [ds:eax], 7    
Post 24 Mar 2004, 07:11
Frank



Joined: 17 Jun 2003
Posts: 100
S.T.A.S. wrote:
There are lots of criticism.. Why?


S.T.A.S., you compare the proposed TEST optimisation to other, existing optimisations. But they are not in the same league. FASM's optimizations for OR, XOR, SUB etc. preserve the source code instruction. If I write "sub eax, 1" in the source, a disassembly of the binary will confirm that the instruction "sub eax, 1" has been assembled. Here I get exactly what I asked for. The TEST optimization would have gone much further, replacing source code instructions ("test eax, 1") by different ones ("test al, 1"). That's too much of a good thing, at least for me: if I write "eax", then I mean "eax", and not "al". There is nothing ambiguous or unclear about the source code instruction.
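To make the two forms under discussion concrete, here is a minimal Python sketch of their encodings and of the flags each would produce. The function names are made up for illustration, and the 7Fh limit in `short_form_is_safe` is an assumption about what a flag-exact substitution would need, not something fasm implements.

```python
import struct

def encode_test_eax_imm32(imm):
    """TEST EAX, imm32: opcode A9 followed by a little-endian dword (5 bytes)."""
    return bytes([0xA9]) + struct.pack("<I", imm & 0xFFFFFFFF)

def encode_test_al_imm8(imm):
    """TEST AL, imm8: opcode A8 followed by a single byte (2 bytes)."""
    return bytes([0xA8, imm & 0xFF])

def flags_after_test(value, imm, width):
    """Return (ZF, SF, PF) as TEST sets them for the given operand width."""
    mask = (1 << width) - 1
    result = value & imm & mask
    zf = result == 0
    sf = bool(result >> (width - 1))              # sign bit of the result
    pf = bin(result & 0xFF).count("1") % 2 == 0   # PF only looks at the low byte
    return zf, sf, pf

def short_form_is_safe(imm):
    """Hypothetical rule: identical flags are guaranteed only if imm fits in 7 bits."""
    return 0 <= imm <= 0x7F
```

With an immediate of 80h the byte form can set SF while the dword form always clears it, so a macro that wants identical flags would presumably have to stop at 7Fh or accept that difference.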

S.T.A.S. wrote:
And I have to agree with Privalov here: As it leads to too much confusion...
Yes confusion.. Where are objective opinions, not subjective?


"Confusion" was just a short-hand expression. In more objective terms, one could say that the TEST optimization, if built into the assembler, would have
- made the assembler output less predictable,
- reduced the match between source and disassembly,
- required the FASM programmer to keep an additional "special case" in mind.
I guess that sounds more objective, or rational, than the word "confusion".

That said, your contribution has shown a great opportunity for size optimization, one that is often overlooked (certainly by me). I will review the library parts of my code to see where I can make good use of your idea, and I am grateful that you shared it. I don't think that this optimization should be hard-coded into an assembler, but if provided as a macro, programmers will get the greatest benefit from it.

Regards,

Frank
Post 24 Mar 2004, 19:40
BiDark



Joined: 22 Jun 2003
Posts: 110
Location: .th
I fully agree with Frank
Post 25 Mar 2004, 02:23
JohnFound



Joined: 16 Jun 2003
Posts: 3475
Location: Bulgaria
I fully agree with STAS.

If we are talking about writing small routines and short demos, yes, it is very easy to optimize the "test" instruction (and any other instruction) manually. But we are talking about creating huge programs for some big OS like Windows, where we use hundreds and thousands of "test" instructions with different symbolic constants, defined in huge .inc files. It's easy to say "I want to optimize my instructions manually". Yes, but often it is impossible. So why not let the assembler (which knows every constant better than we do) make these optimizations?
Of course it would be good to have some directive to switch this feature ON/OFF. In that case, if the option is ON, all possible size optimizations should be made - test byte/dword, lea/mov, etc. (of course only if the instructions are equivalent - not only in the result but in the flags too.)

My 2 cents.

Regards.
Post 25 Mar 2004, 05:41
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
Thanks, Frank & BiDark.
Now I see your point here, indeed, it's reasonable.

But we are dealing with x86 IA-32.. Let us go deeper into opcodes ;)
Here is an example of FASM output:
Code:
C1F0 02       sal     eax, 2
C1E0 02       shl     eax, 2    
Very clean at first glance, isn't it?
Output has the SAME mnemonics as the source does.

Ok, let's take a look into IA-32.. Manual Volume №2 (245471-012)
(or older version - 24547108.pdf - could be used):

(page B-15) SAL – Shift Arithmetic Left -- same instruction as SHL
(page B-16) SHL – Shift Left.. register by immediate count -- 1100 000w : 11 100 reg : imm8 data

OR
(page 3-703 in the book / 3-681 in PDF):
C1 /4 ib SAL r/m32,imm8 Multiply r/m32 by 2, imm8 times
C1 /4 ib SHL r/m32,imm8 Multiply r/m32 by 2, imm8 times

So, the opcode for BOTH instructions should be:
C1h 11b (Mod), 100b (Reg/Opcode=/4), 000b(R/M=EAX) 02h (imm8) ==>> C1 E0 02

Now another "why?"

What is C1 F0 02??
(page A-13 in the book & PDF):
Code:
                          Encoding of Bits 5,4,3 of the ModR/M Byte
Opcode         Mod    000  001  010  011  100      101  110  111
C1 reg, imm    11B    ROL  ROR  RCL  RCR  SHL/SAL  SHR  --   SAR
Well, as we can see, there is NO instruction with opcode C1 F0 02 in the Intel docs.

Are BOTH FASM & OllyDbg wrong?
No, of course not. Really, there is something hidden in the official docs ;)

Well, we know these 2 instructions with different mnemonics are equivalent in result.
Also, there is an undocumented instruction with ModR/M byte = F0 (for this concrete case).

So let's say: C1 F0 02 == C1 E0 02. But let's equate one opcode to the SAL mnemonic, other - to SHL.
And everyone is happy..
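The ModR/M arithmetic above can be checked with a few lines of Python. This is only a sketch of the bit layout quoted from the manual; the helper names are invented for the example.

```python
def modrm(mod, reg, rm):
    """Pack the three ModR/M fields: mod (2 bits), reg/opcode (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def shift_eax_imm8(ext, count):
    """Encode opcode C1 /ext ib with EAX (r/m = 000) as the register operand."""
    return bytes([0xC1, modrm(0b11, ext, 0b000), count & 0xFF])

shl = shift_eax_imm8(4, 2)    # /4: the documented SHL/SAL slot
alias = shift_eax_imm8(6, 2)  # /6: the slot shown as "--" in the table
print(shl.hex(" "), "|", alias.hex(" "))  # c1 e0 02 | c1 f0 02
```

The only difference between the two encodings is the reg/opcode field (100 versus 110), which is what lets the assembler give SAL and SHL distinct byte sequences.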


There are lots of such "weird" things in IA-32, and we are accustomed to them.
They are HIDDEN from us in most cases. Why not add another one?

Quote:
replacing source code instructions ("test eax, 1") by different ones ("test al, 1").
After my example above I hope it is clear why I think it is a "subjective opinion".
What is EAX & AL? Isn't it just the same register here? (though, it isn't clearly described in docs)


I agree, this could be done with a macro..
Let's imagine we have a superbly optimized macro for TEST.
Well, if one wants a plain TEST, what should he do? Probably use a DIFFERENT name for the macro.
And so on..
The way to use TEST EAX, DWORD 1 - is more practical in this case, IMHO.
Yes, the border is very blurred here. But I still hope we will not pass the edge implementing this..
Post 25 Mar 2004, 07:02
f0dder



Joined: 19 Feb 2004
Posts: 3172
Location: Denmark
I would be careful about introducing too many "optimizations" in an assembler, though. As long as the code is functionally & speedwise identical, it's okay... but things like e.g. mov/lea substitution are a bad idea IMO.

Are there any timing differences between the short and long forms of TEST? I think almost everybody agrees that a thing like JMP optimization is just fine, because it doesn't have any functional difference and produces shorter code. So if there are no timing penalties with the short form of TEST, the optimization should be okay, objectively - it's just the subjective feeling of "eek! I am writin teh asm0r I want teh full control!" that bugs a lot of people, I guess ;)
Post 25 Mar 2004, 09:47
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
f0dder wrote:
Are there any timing differences between the short and long forms of TEST?
A timing difference is possible if the first operand is located in memory and is not DWORD aligned. Then the "long" variant should be slower..

And, yes, I didn't even think about "mov/lea" when creating this thread..
At the moment I can't see any other safe variants except TEST..
Post 25 Mar 2004, 11:33
Ralph



Joined: 04 Oct 2003
Posts: 86
While we're at this, I don't know if this has been brought up before, but why not replace 'cd 03' with 'cc'?
Post 06 May 2004, 20:39
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6981
Location: Kraków, Poland
int3 mnemonic does this (see docs).
Post 06 May 2004, 20:56
decard



Joined: 11 Sep 2003
Posts: 1095
Location: Poland
I'm just curious, why did you make "int 3" generate CD03, and "int3" - CC, instead of automatically changing "int 3" into CC? Wouldn't it be simpler then?
Post 06 May 2004, 21:13
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6981
Location: Kraków, Poland
Because you would still want to get CD03 code in some cases. And this was borrowed from TASM, if I recall correctly.
Post 06 May 2004, 21:38
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4634
Location: Argentina
According to Intel's manuals there is another difference between opcode CC and opcode CD

IA-32 Intel® Architecture Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M wrote:
The INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the
debug exception handler. (This one byte form is valuable because it can be used to replace the
first byte of any instruction with a breakpoint, including other one byte instructions, without
over-writing other code). To further support its function as a debug breakpoint, the interrupt
generated with the CC opcode also differs from the regular software interrupts as follows:
• Interrupt redirection does not happen when in VME mode; the interrupt is handled by a
protected-mode handler.
• The virtual-8086 mode IOPL checks do not occur. The interrupt is taken without faulting at
any IOPL level.
Note that the “normal” 2-byte opcode for INT 3 (CD03) does not have these special features.
Intel and Microsoft assemblers will not generate the CD03 opcode from any mnemonic, but this
opcode can be created by direct numeric code definition or by self-modifying code.


So if you want to place a trap instead of calling interrupt 3, always use int3.

I have another instruction to discuss, what about "retn 0"? At this time FASM assembles C2 0000 (retn n) instead of C3 (retn). I think it could be optimized by using C3 and if the user wants to get the largest form then just force it by writing "retn word 0".

Of course this optimization must be applied to retf and ret too.

Regards,
LocoDelAssembly
Post 09 Dec 2005, 22:03
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7109
Location: Slovakia
and when will you allow forcing a smaller operand this way?!?
("cmp eax, byte My_Value") so you can be sure (and it is clear from the code) what opcode is generated?
Post 12 Dec 2005, 10:06
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4634
Location: Argentina
I'm proposing the use of a size operator to force the largest opcode because FASM optimizes instructions with immediates except for ret/retn/retf. I think if FASM optimizes something like [eax+0] to [eax], then the same optimization of the immediate operand must be applied to ret.

I don't understand what you are saying about "cmp eax, byte My_Value". Note two things: the first and obvious one is that fasm doesn't support the size operator "byte", and the second is that when My_Value can be encoded as an imm8 the shorter form is chosen, and when you write "cmp eax, dword My_Value" the largest form is always chosen (and each one has its own opcode).

My proposal is:

retn = C3
retn 0 = C3
retn word 0 = C2 0000
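The proposed rule can be written down as a small Python sketch. It models the proposal only, not current fasm behaviour, and the `force_word` flag is a made-up stand-in for writing "retn word 0".

```python
import struct

def encode_retn(imm=None, force_word=False):
    """Near return under the proposed rule.

    imm=None    models a plain "retn"
    imm=0       models "retn 0" (collapsed to C3 under the proposal)
    force_word  models "retn word 0", which keeps the C2 iw form
    """
    if imm is None or (imm == 0 and not force_word):
        return bytes([0xC3])
    return bytes([0xC2]) + struct.pack("<H", imm & 0xFFFF)

print(encode_retn().hex(" "))                    # c3
print(encode_retn(0).hex(" "))                   # c3
print(encode_retn(0, force_word=True).hex(" "))  # c2 00 00
```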
Post 12 Dec 2005, 14:59
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7109
Location: Slovakia
locodelassembly wrote:
I don't understand what are you saying about "cmp eax, byte My_Value"
it can be encoded in two ways, with sign-extended byte or with full dword (or word in 16 bits)

--------------------

Quote:
the first and obvious is fasm doesn't support the size operator "byte"
it does

--------------------

What I want is that when someone reads my code, he sees:
Code:
cmp eax, byte Some_Initial_Value
Value_To_Compare label byte at $-1    
The current equivalent (for the sake of future bug-proofing and readability) is:
Code:
cmp eax, Some_Initial_Value  ;should generate cmp eax,imm8
Value_To_Compare label byte at $-1
if Some_Initial_Value < -128 | Some_Initial_Value > 127
  display 'some error'
end if    
which is uglier, unclear, still harder to read etc. etc.
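The choice between the two encodings, together with the range check from the listing above, can be sketched in Python as follows. The `force` parameter is hypothetical: "byte" models the operator behaviour being requested here, and using opcode 3D for the forced dword form is an assumption about which long encoding an assembler would pick.

```python
import struct

def encode_cmp_eax(imm, force=None):
    """Encode CMP EAX, imm for 32-bit code.

    force=None    pick the shortest form
    force="byte"  require the sign-extended imm8 form, error if it cannot fit
    force="dword" always emit the imm32 form
    """
    fits_imm8 = -128 <= imm <= 127
    if force == "byte" and not fits_imm8:
        raise ValueError("value does not fit in a sign-extended byte")
    if force == "dword" or not fits_imm8:
        # 3D id: CMP EAX, imm32
        return bytes([0x3D]) + struct.pack("<I", imm & 0xFFFFFFFF)
    # 83 /7 ib with ModR/M F8: CMP EAX, sign-extended imm8
    return bytes([0x83, 0xF8]) + struct.pack("<b", imm)
```

With an explicit "byte" the out-of-range case becomes an assembly-time error by itself, which is what the hand-written `if ... display` check is emulating.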

--------------------

Quote:
My proposal is:
retn = C3
retn 0 = C3
retn word 0 = C2 0000
"retn 0" itself says there has to be a "0" somewhere in the code; optimization is only for cases when nothing hints at which option to choose, so the compiler is free to choose any option.
Post 12 Dec 2005, 15:13
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4634
Location: Argentina
About the size operator "byte": sorry, you are right. "byte" is not supported when you write something like "[eax+byte 0]", and I thought it couldn't be used anywhere. However, note that when you write "cmp eax, byte 0" FASM says "Error: operand sizes do not match".

If "retn 0" itself says there has to be a "0" somewhere in the code, why doesn't "lea eax, [eax+0]" say the same? I don't understand why the decision to optimize an immediate by deleting it is applied only to addressing and not to immediate operands.
Post 12 Dec 2005, 15:49
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6981
Location: Kraków, Poland
vid: it was working this way for some instructions in the early versions of fasm (with the "imul" instruction actually for a long time, since at first I forgot to update it with the others - you can check out that 1.56 still accepted "imul eax,byte 0"), but I changed it for the current one for the reasons I explained in the other thread.
Post 12 Dec 2005, 15:52
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7109
Location: Slovakia
tomasz: i believe you mean this one

I still disagree with your arguments; if you are going to do some more low-level things, you can end up with code like the one you see above.

And I think I don't get the design principles then. What I find to be the most straightforward syntax is this:
<mnemonics> <args>

- if the args don't have a size specified, use the best one (but still keep the mnemonic, e.g. don't change MOV to LEA, although they can be functionally equivalent)

- if some of the arguments have a specified size, then try to encode using an instruction with the specified argument size, and if no such instruction exists, then throw an error.

You say that after deep thinking you found this design bad; what's wrong with it then? If there is an instruction which compares a 16-bit register with an 8-bit constant, then I expect "cmp r16, byte imm8" to generate it.

This is IMO "clearer" than tricking the assembler into doing what I want. That's why I like FASM most: you don't have to trick it to get the result you want, like you had to with TASM.

Please rethink it and let me know what's wrong with this.
Post 15 Dec 2005, 00:01
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16097
Location: Transylbonia
Quote:
but still keep the mnemonics, eg. don't change MOV to LEA, altough they can be functionally equivalent
Personally I see nothing wrong with encoding "lea eax,[ebx]" as "mov eax,ebx". What is the problem with it? Why don't you like it?

When using equates and/or structures sometimes a constant value can be zero and it is difficult to recognise this when coding an instruction like this:
Code:
lea eax,[ebx+STRUCTURE.member]    
How can we know if the constant is zero? Answer: we cannot be sure, especially if the structure definition is in a different file or written by a different person. We are best to let the assembler decide the best encoding at assembly time.

I can see no advantage to leaving the LEA in the target file. It just uses up extra space in the file. The MOV is a better alternative. I can't think of any instance where I would deliberately want to have something like "lea eax,[ebx]"; perhaps someone can explain why they prefer this inefficient coding. If your reason is using a debugger, then you will be disappointed to discover that OLLY will happily display all of the following as the same:
Code:
db 08dh,040h,000h
db 08dh,044h,020h,000h
db 03eh,08dh,044h,020h,000h
db 08dh,080h,000h,000h,000h,000h
db 08dh,004h,005h,000h,000h,000h,000h
db 03eh,08dh,004h,005h,000h,000h,000h,000h    
All display as "LEA EAX,[EAX]", so even disassemblers will not distinguish all the instructions properly.
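Why a disassembler cannot tell these apart can be seen by decoding the addressing bytes. The following is a minimal Python sketch of a 32-bit ModR/M and SIB decoder restricted to LEA; it is an illustration written for this listing, not how OllyDbg works internally.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
SEGMENT_PREFIXES = (0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65)

def decode_lea(code):
    """Decode a 32-bit LEA (opcode 8D) into (dest, base, index, scale, disp)."""
    code = bytes(code)
    pos = 0
    while code[pos] in SEGMENT_PREFIXES:  # a segment override has no effect on LEA
        pos += 1
    if code[pos] != 0x8D:
        raise ValueError("not an LEA instruction")
    byte = code[pos + 1]
    pos += 2
    mod, reg, rm = byte >> 6, (byte >> 3) & 7, byte & 7
    if mod == 3:
        raise ValueError("LEA needs a memory operand")
    base, index, scale = REGS[rm], None, 1
    disp32_only = False
    if rm == 4:  # a SIB byte follows
        sib = code[pos]
        pos += 1
        scale = 1 << (sib >> 6)
        idx, b = (sib >> 3) & 7, sib & 7
        index = None if idx == 4 else REGS[idx]  # index 100 means "no index"
        if b == 5 and mod == 0:                  # no base, disp32 follows
            base, disp32_only = None, True
        else:
            base = REGS[b]
    elif rm == 5 and mod == 0:                   # plain [disp32]
        base, disp32_only = None, True
    size = 4 if (mod == 2 or disp32_only) else (1 if mod == 1 else 0)
    disp = int.from_bytes(code[pos:pos + size], "little", signed=True) if size else 0
    # Normalise: an unscaled index without a base is the same address as a base.
    if base is None and index is not None and scale == 1:
        base, index = index, None
    if index is None:
        scale = 1
    return REGS[reg], base, index, scale, disp

print(decode_lea([0x8D, 0x40, 0x00]))  # ('eax', 'eax', None, 1, 0)
```

All six byte sequences from the listing decode to the same tuple, which is why a disassembler prints them identically even though their lengths range from 3 to 8 bytes.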

The same goes for all optimisations where functionality is the same, TEST, LEA, OR, AND, XOR. I much prefer the assembler to make good compact code than have it bloat the target with unnecessary bytes using up precious space in the caches.
Post 15 Dec 2005, 02:20

Copyright © 1999-2018, Tomasz Grysztar.
