"mov eax,0" or "xor eax,eax"

vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia

vid 06 Dec 2005, 00:54

Which is faster? XOR is smaller and simpler, but I've heard it can somehow cause slowdowns through dependency chains or whatever... Does somebody have deeper knowledge on this?

LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4623
Location: Argentina

LocoDelAssembly 06 Dec 2005, 02:33

See chapter "15.10 Breaking dependences" in http://www.agner.org/assem/pentopt.pdf
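The idea behind "breaking dependences" can be illustrated with a toy model. This is a sketch, not a simulation of any real CPU: it assumes unlimited execution units and register renaming, and each instruction starts as soon as the registers it reads are ready. A zeroing idiom either is or is not recognized as independent of the old register value. The latency of 40 for the first instruction is a hypothetical stand-in for any slow operation that writes EAX, and `ready_cycles` is a made-up helper.

```python
# Toy dependency model: unlimited execution units, register renaming,
# one result per instruction. Latencies are illustrative, not measured.

def ready_cycles(program, zero_idiom_recognized):
    """Return {register: cycle at which its final value is available}.

    program is a list of (mnemonic, dest, sources, latency) tuples.
    """
    ready = {}
    for mnemonic, dest, sources, latency in program:
        # XOR reg,reg always yields 0; if the CPU recognizes this idiom,
        # the result no longer waits for the old value of reg.
        if zero_idiom_recognized and mnemonic == "xor" and set(sources) == {dest}:
            sources = []
        start = max((ready.get(reg, 0) for reg in sources), default=0)
        ready[dest] = start + latency
    return ready


slow_write = ("div", "eax", ["eax", "ecx"], 40)   # hypothetical slow op writing EAX
use_eax = ("add", "eax", ["eax"], 1)              # consumer of the cleared register

with_xor = [slow_write, ("xor", "eax", ["eax", "eax"], 1), use_eax]
with_mov = [slow_write, ("mov", "eax", [], 1), use_eax]

for label, prog in (("xor eax,eax", with_xor), ("mov eax,0", with_mov)):
    for recognized in (False, True):
        print(f"{label:12} idiom recognized={recognized!s:5} -> EAX ready at cycle "
              f"{ready_cycles(prog, recognized)['eax']}")
```

In this model MOV EAX,0 never waits for the old value. XOR EAX,EAX only matches it when the processor treats the instruction as a dependency-breaking idiom, and which processors do so is what the referenced chapter discusses.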

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 06 Dec 2005, 08:19

There has been a debate on this before on these boards, and we came to the conclusion that the best choice speed-wise and size-wise is AND EAX,00h, because XOR EAX,EAX is more complicated and MOV EAX,00000000h is too long.

When you read Agner's manual, note that SUB EAX,EAX is bad because it is not bit-independent: in the worst case, a carry out of one bit can propagate through all 31 other bits. That is not the case with XOR.
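The bit-independence argument can be checked with a small bit-serial model. This is only a software model, since real adders use carry-lookahead logic rather than a 32-step ripple, and the function names are made up for the example. It shows three things. A borrow can, in general, ripple through all 32 bits (as in 0 - 1). XOR computes each bit on its own. When both operands are the same register, as in SUB EAX,EAX, no borrow is generated at all.

```python
def ripple_sub(a, b, width=32):
    """Bit-serial a - b. Returns (result, number of bit positions that produced a borrow)."""
    result = 0
    borrow = 0
    borrows = 0
    for i in range(width):
        diff = ((a >> i) & 1) - ((b >> i) & 1) - borrow
        borrow = 1 if diff < 0 else 0       # the borrow feeds into the next bit
        result |= (diff & 1) << i
        borrows += borrow
    return result, borrows


def bitwise_xor(a, b, width=32):
    """Bit-by-bit XOR: each result bit depends only on the two input bits at that position."""
    result = 0
    for i in range(width):
        result |= (((a >> i) & 1) ^ ((b >> i) & 1)) << i
    return result


print(ripple_sub(0, 1))                          # (4294967295, 32): borrow ripples through every bit
print(ripple_sub(0xDEADBEEF, 0xDEADBEEF))        # (0, 0): same operands, no borrows
print(hex(bitwise_xor(0xDEADBEEF, 0xDEADBEEF)))  # 0x0
```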

EDIT:
NB! There are special cases, though, where AND EAX,00h crosses a DWORD fetch boundary while XOR EAX,EAX doesn't. So the conclusion is not final :P

AND is 3 bytes, whereas XOR is 2 bytes.

Sorry for the typo.
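The boundary case in the edit comes down to simple arithmetic. An instruction crosses an aligned fetch boundary when its first and last bytes fall into different aligned blocks. The following sketch assumes the DWORD (4-byte) granularity mentioned in the post, and `crosses_boundary` is a made-up helper.

```python
def crosses_boundary(offset, length, block=4):
    """True if bytes offset .. offset+length-1 span two aligned blocks."""
    return offset // block != (offset + length - 1) // block


XOR_LEN = 2   # 33 C0     xor eax,eax
AND_LEN = 3   # 83 E0 00  and eax,0

for offset in range(4):
    print(offset, "xor crosses:", crosses_boundary(offset, XOR_LEN),
          "and crosses:", crosses_boundary(offset, AND_LEN))
```

In this model, offset 2 within a DWORD is the one case where AND EAX,0 crosses and XOR EAX,EAX does not. At offset 3, both cross.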

Last edited by Madis731 on 06 Dec 2005, 12:39; edited 1 time in total

MazeGen

Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia

MazeGen 06 Dec 2005, 09:45

What about this?

MazeGen 06 Dec 2005, 10:05

Madis, I wonder how XOR r,r can be more complicated than AND r,imm?

decard

Joined: 11 Sep 2003
Posts: 1092
Location: Poland

decard 06 Dec 2005, 12:15

MazeGen: so, if I understood it correctly, xor is better for P4 and above?

Madis731 06 Dec 2005, 12:44

http://www.play-hookey.com/digital/images/xorn-01.gif
Four NOT-AND (NAND) gates are used, while AND needs only 1 or 2.

That is because AND can be built from ordinary switches, two or more in series, whereas XOR logic is counterintuitive to the natural human brain. You could think of it as a carry-less adder, and adders are a lot more complicated than AND.
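The gate counts can be checked with the textbook NAND-only constructions. AND is a NAND followed by a second NAND used as an inverter (2 gates), and XOR takes 4 NANDs. The sketch below is only about logic gates. It says nothing directly about instruction timing, because AND and XOR on registers are both simple single-uop ALU operations on the processors discussed here.

```python
def nand(a, b):
    return 1 - (a & b)


def and_gate(a, b):
    n = nand(a, b)
    return nand(n, n)              # 2 NAND gates


def xor_gate(a, b):
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    return nand(n2, n3)            # 4 NAND gates


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "XOR:", xor_gate(a, b))
```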

...and MazeGen, at the very end of the thread you posted I put some test results from a CLI / 4G-iteration loop / STI test case.

Code:

times_4294967295:
XOR EAX,EAX ;111124clk 100.0% xor's got a very tricky logic
SUB EAX,EAX ;109257clk  98.3% carries make a long dependency
MOV EAX,0   ;105052clk  94.5% too much memory overhead
AND EAX,0   ; 85680clk  77.1% and's got the sweetest logic

I really hoped the XOR would be at least as fast as SUB, but wow...
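The percentage column is each count divided by the XOR count. The quick sketch below reproduces it. Later in the thread the author notes that these counts were normalized values, so only the ratios are meaningful.

```python
# Clock counts from the table above (normalized values, per the author).
clocks = {"XOR": 111124, "SUB": 109257, "MOV": 105052, "AND": 85680}


def relative_percent(counts, baseline):
    """Each count as a percentage of the baseline count, rounded to 0.1%."""
    base = counts[baseline]
    return {name: round(100 * c / base, 1) for name, c in counts.items()}


for name, pct in relative_percent(clocks, "XOR").items():
    print(f"{name:3} {clocks[name]:6d}clk {pct:5.1f}%")
```

A lower percentage means fewer clocks, so in this table MOV EAX,0 took about 5.5% fewer clocks than XOR EAX,EAX, not more.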

vid 06 Dec 2005, 13:31

Madis: could you post the entire code you tested it on?

tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa

tom tobias 06 Dec 2005, 16:21

Madis731 wrote:

...

Code:

times_4294967295:
XOR EAX,EAX ;111124clk 100.0% xor's got a very tricky logic
SUB EAX,EAX ;109257clk  98.3% carries make a long dependency
MOV EAX,0   ;105052clk  94.5% too much memory overhead
AND EAX,0   ; 85680clk  77.1% and's got the sweetest logic

...

I suppose that ENTER, LEAVE, and PUSHA modify EBP. However, if one is not using those instructions, then an alternative, to eliminate the heavy burden of "memory overhead", associated with MOV EAX,0, yet retain the spirit of writing PROGRAMS, instead of CODE, would be to assign EBP the value of 0, (initialization), and thereafter, use EBP as a CONSTANT, thus:
MOV EAX,EBP ; remember, EBP is always equal to zero
I guess that operation would then be just as fast as xor eax,eax, though, for me personally, the penalty of obscurity and lack of readability with mov eax,ebp, renders this solution useless. I prefer to pay the penalty, SLOWER, but easier to read:
MOV EAX, ZERO.
Pity that the Intel architecture has such a paucity of registers; however, I am surprised to learn that there is such a severe penalty (5.5% slower) for constants sitting in cache. Thank you, Madis, for your excellent travail!
:)

Madis731 06 Dec 2005, 17:32

I downloaded an IOPL module somewhere, but I no longer remember where from. I've also lost the code I tested with, because I didn't think it was necessary :S. I tried writing another one and ended up arguing with myself. The platform difference shouldn't matter (the laptop I'm using now is a 700 MHz Pentium III Coppermine-T, while last time I used a 2.66 GHz Pentium 4 Northwood).
And damn, those clock counts were normalized values, so I can't even calculate the real cycle counts :@. Hmm, let's just discard them...

Code:

;::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
; Calculating cycles -
; by Edgar Barbosa, a.k.a Opcode
;::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
include "win32ax.inc"

start:

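int 0edh        ; serviced by the loaded IOPL driver; presumably raises IOPL so CLI/STI are allowed
nop
cli             ; no interrupts during the measurement
rdtsc           ; EDX:EAX = time-stamp counter
mov [dat], eax  ; save the low dword of the start time
nop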
;;;;;;;;;;;;;;[code]
mov ecx,100000000
.Calc8:
    mov eax,0FEDCBA98h
    xor eax,eax
    mov eax,0FFFFFFFFh
    xor eax,eax
    mov eax,055555555h
    xor eax,eax
    mov eax,076543210h
    xor eax,eax
    ;XOR = 600000003 or 600000007 clocks
    ;SUB = 600000003 or 600000007 clocks
    ;AND = 500000004 or 500000008 clocks
    ;MOV = 500000010 or 500000014 clocks
sub ecx,1
jnz .Calc8
;;;;;;;;;;;;;;[/code]
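rdtsc                   ; read the counter again
sub eax, [dat]          ; elapsed ticks (low dword only)
sub eax,189             ; fixed correction, presumably the overhead of the timing code itself
sti                     ; re-enable interrupts
nop

    cinvoke wsprintf, dat, "%020d clock cycles", eax   ; format the result into the dat buffer
    invoke MessageBox, NULL, dat, "Opcode IOPL hack", MB_OK  ; show the buffer (ECX is not preserved across wsprintf)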

    invoke ExitProcess, 0

data import
  library kernel,'KERNEL32.DLL',user,'USER32.DLL'
  import kernel,ExitProcess,'ExitProcess'
  import user,wsprintf,'wsprintfA',MessageBox,'MessageBoxA'
  dat    dd 0,0,0,0    ,   0,0,0,0,0   ; 36 bytes: start time, then the formatted output string
end data

Another type of inner loop:

Code:

mov ecx,100000000
.Calc8:
    mov eax,0
    mov eax,0
    mov eax,0
    mov eax,0
    ;XOR = 466666616 or 466666620 clocks
    ;SUB = 466666616 or 466666620 clocks
    ;AND = 444680876 or 444680880 clocks
    ;MOV = 300000005 or 300000009 clocks
sub ecx,1
jnz .Calc8
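Both loops run 100,000,000 iterations, so dividing the reported totals by that count gives the approximate cost of one iteration, loop overhead included. The sketch below uses the numbers from the comments above and takes the first value of each "or" pair.

```python
ITERATIONS = 100_000_000

# First value of each "X or Y clocks" pair from the two inner loops above.
loop_with_reload = {"XOR": 600000003, "SUB": 600000003, "AND": 500000004, "MOV": 500000010}
loop_repeated = {"XOR": 466666616, "SUB": 466666616, "AND": 444680876, "MOV": 300000005}


def per_iteration(totals, iterations=ITERATIONS):
    """Approximate clocks per loop iteration, rounded to 0.01."""
    return {name: round(total / iterations, 2) for name, total in totals.items()}


print(per_iteration(loop_with_reload))
print(per_iteration(loop_repeated))
```

In the first loop, each iteration also contains the four MOV EAX,imm32 loads and the SUB ECX,1/JNZ pair. These figures are therefore per-iteration costs, not per-instruction costs.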

P.S. tom tobias: I love your sarcasm, but for this 'travail' I had to open a dictionary. Why couldn't you just say 'hard work' to us non-native English speakers here? :(

paucity = smallness, fewness ;)

decard 06 Dec 2005, 17:41

That's better. If he uses more uncommon words, you will have to look them up in a dictionary, and that way you will learn a new phrase. And I'm sure you will remember it. (Sorry for getting off-topic.) ;)

Madis731 06 Dec 2005, 18:09

...and here's another inner loop. This is how I got results that XOR is better than MOV:

Code:

;;;;;;;;;;;;;;[code]
rept 1000 {mov eax,0}
;13497,13423,11106,12109,13325 | 5 consecutive tests
;rept 1000 {and eax,0}
;7015,6777,6842,7008,7687
;rept 1000 {sub eax,eax}
;5098,4629,4685,4937,4650
;rept 1000 {xor eax,eax}
;4461,4419,4873,4755,4629
;;;;;;;;;;;;;;[/code]

As you can see, XOR takes first place on a PIII, but AND was the best on a PIV, so it all depends on the pipeline, cache, etc.
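With five runs per variant, the median is a convenient way to compare them, because a single outlier such as the 11106 in the MOV row cannot pull it around. The sketch below applies Python's `statistics` module to the numbers above. `rank_by_median` is a made-up helper.

```python
from statistics import mean, median

# Five consecutive runs of "rept 1000 {...}" from the post (Pentium III).
runs = {
    "mov eax,0": [13497, 13423, 11106, 12109, 13325],
    "and eax,0": [7015, 6777, 6842, 7008, 7687],
    "sub eax,eax": [5098, 4629, 4685, 4937, 4650],
    "xor eax,eax": [4461, 4419, 4873, 4755, 4629],
}


def rank_by_median(results):
    """Return (variant, median, mean) tuples, fastest (lowest median) first."""
    rows = [(name, median(vals), mean(vals)) for name, vals in results.items()]
    return sorted(rows, key=lambda row: row[1])


for name, med, avg in rank_by_median(runs):
    print(f"{name:12} median {med:6} mean {avg:8.1f}")
```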

tom tobias 06 Dec 2005, 18:38

Madis731 wrote:

... I love your sarcasm but for this 'travail' I had to open up a dictionary. Why couldn't you just say 'hard work' to us non-native English speaking guys here
paucity = smallness, fewness

Lots here:
1. Your effort was and IS very informative, detailed, thorough, readable, and INTERESTING. To me, this effort GOES WAY BEYOND mere "hard work", and in English, when we wish to laud someone's effort, we move away from our Germanic roots and substitute the Latin equivalent (i.e. FRENCH), as indicative of TRULY HIGH ACCOMPLISHMENT.
Since 1066, French, not English, has been the language of choice for signifying praiseworthiness among native English speakers (even if we can neither speak it nor write it!!!). I think this is also true in MOST of Eastern Europe, especially Poland and Russia, countries with the bulk of the membership of the FASM forum.
2. OK, there was a tiny bit of tongue in cheek, but really, sincerely, I DO ENJOY reading your contributions, and felt that simply labelling your submissions to this thread "hard work" demeaned your labor. You have given us some actual data. Terrific!
3. What about MOV eax, ebp? Is it as fast, or faster or slower than xor eax,eax?
Sincerely, without ANY sarcasm.
;)

Madis731 06 Dec 2005, 18:58

Hi,
First, I'm sorry that I misunderstood you.
Second, I did test with EBP, but the "without interrupts" part is very dangerous. Tests with more than 1000 iterations ended in a BSOD and restart. Tests with mov ebp,0 \ mov eax,ebp also ended in a BSOD :(

Code:

push ebp
xor ebp,ebp
rept 1000 {mov eax,ebp}
;4179,4886,5061,4620,4606
;Note! These are not comparable with previous ones because of the overhead
pop ebp

Don't program interruptless code unless you know what you are doing :P

Maybe my CPU (Pentium III) has some kind of mechanism to detect infinite loops and stops them by rebooting or halting :|

Maybe...
Repeating the same instruction 1000 times is called optimization, but repeating the same sequence 10000 times is definitely not logical code :D

vid 06 Dec 2005, 19:37

What's that int 0EDh? Maybe you could write an article on it, if it's something hacky-cracky interesting.

Madis731 07 Dec 2005, 12:34

I think you should talk to Edgar Barbosa about this. I think the loaded .sys file defines an interrupt handler at this vector. I haven't seen the source, I'm just using the binaries ;)

EDIT: Did some diggin' on the net and voila:
http://win.asmcommunity.net/board/index.php?topic=18859.0

MazeGen 07 Dec 2005, 16:05

decard wrote:

MazeGen: so, if I understood it correctly, xor is better for P4 and above?

According to those numbers, MOV is A BIT faster, but in the context of dependencies, XOR may be faster.

El Tangas

Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire

El Tangas 11 Dec 2005, 21:18

Either instruction can be faster; the important thing is not to cross cache boundaries. If you test a long run of the same instruction repeated over and over, this will favour the shorter instructions (XOR and SUB), because the longer ones cross more cache boundaries. If you test a loop, the results may be different.
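The effect of instruction length on a long unrolled run, like the `rept 1000` tests earlier in the thread, can be counted directly. The sketch below assumes 16-byte aligned fetch blocks and a run that starts on a block boundary. Both are assumptions for illustration, and `fetch_stats` is a made-up helper.

```python
def fetch_stats(instr_len, count, block=16):
    """Bytes, aligned fetch blocks touched, and instructions straddling a block boundary
    for `count` back-to-back copies of an instruction starting at offset 0."""
    total = instr_len * count
    blocks = -(-total // block)                      # ceiling division
    straddling = sum(
        1 for i in range(count)
        if (i * instr_len) // block != (i * instr_len + instr_len - 1) // block
    )
    return total, blocks, straddling


for name, length in (("xor eax,eax", 2), ("and eax,0", 3), ("mov eax,0", 5)):
    print(name, fetch_stats(length, 1000))
```

Under these assumptions, the 2-byte form never straddles a block. The 3-byte and 5-byte forms both touch more blocks and split some instructions across two of them.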

Now, what about the worst way to clear a register?
I propose shr reg,32.

Just a note: in the Athlon series, MOV reg,imm instructions are "executed" by the decoder, so they act like NOPs and don't take execution resources.

MazeGen 11 Dec 2005, 21:26

Heh, you can't do SHL/R with 32.

The worst way? What about IMUL reg,reg,0?
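SHL/SHR by 32 does not clear a 32-bit register because x86 processors from the 80286 onward mask the shift count to its low 5 bits for 32-bit operands, so a count of 32 becomes 0. The sketch below emulates that rule, together with the low-32-bit result of a three-operand IMUL by 0. The function names are made up for the sketch.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, count):
    count &= 0x1F                     # only the low 5 bits of the count are used
    return (value << count) & MASK32


def shr32(value, count):
    count &= 0x1F
    return (value & MASK32) >> count


def imul32(value, imm):
    # The three-operand IMUL keeps the low 32 bits of the product;
    # those bits are the same whether the operands are treated as signed or unsigned.
    return (value * imm) & MASK32


x = 0x12345678
print(hex(shl32(x, 32)), hex(shr32(x, 32)))   # count 32 -> 0: value unchanged
print(hex(shr32(0xFFFFFFFF, 255)))             # 255 & 31 = 31 -> top bit survives
print(hex(imul32(x, 0)))
```

So a shift with an imm8 count such as 255 assembles, but it behaves like a shift by 31. IMUL reg,reg,0, by contrast, always produces zero.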

Madis731 11 Dec 2005, 22:09

There are endless possibilities like:

Code:

test_zero:
sub eax,1
cmp eax,0
jne test_zero

;-OR-

lea eax,[0] ; That is 6 bytes: 8D 05 00 00 00 00

P.S. And yes, you can write:
shl eax,32
shr eax,32 ; but I don't remember which one of them was optimized
The only requirement is that the count fits in an imm8, so you can even write SHR EBP,255

EDIT:
Hmm, okay, here's a summary:

Code:

33C0             XOR   EAX, EAX  ;Variants for 2-byte resets
2BC0             SUB   EAX, EAX
83E0 00          AND   EAX, 0    ;Many possibilities for 3-byte resets
C1E0 32          SHL   EAX, 32
C1E8 32          SHR   EAX, 32
6BC0 00          IMUL  EAX, EAX, 0
B8 00000000      MOV   EAX, 0    ;There are no 4-byte ones but there is one 5-byte reset and
8D05 00000000    LEA   EAX, [0]  ; a 6-byte one.
; An exception here:
6B05 00000000 00 IMUL  EAX, [0], 0 ; A 7-byte one but you must have a read-accessible
                                   ; memory-address here
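The byte counts in the listing can be double-checked by counting the hex bytes of each encoding. The sketch below groups the listed encodings by length. It only counts bytes and does not check what each instruction does to EAX.

```python
# Encodings copied from the listing above (hex bytes as shown there).
encodings = {
    "XOR EAX, EAX": "33C0",
    "SUB EAX, EAX": "2BC0",
    "AND EAX, 0": "83E000",
    "SHL EAX, 32": "C1E032",
    "SHR EAX, 32": "C1E832",
    "IMUL EAX, EAX, 0": "6BC000",
    "MOV EAX, 0": "B800000000",
    "LEA EAX, [0]": "8D0500000000",
    "IMUL EAX, [0], 0": "6B050000000000",
}


def group_by_length(table):
    """Map byte length -> list of mnemonics with that encoded length."""
    groups = {}
    for mnemonic, hex_bytes in table.items():
        groups.setdefault(len(bytes.fromhex(hex_bytes)), []).append(mnemonic)
    return groups


for length, mnemonics in sorted(group_by_length(encodings).items()):
    print(length, "bytes:", ", ".join(mnemonics))
```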

It's getting interesting :D
