flat assembler
Message board for the users of flat assembler.

"mov eax,0" or "xor eax,eax"

vid
Which is faster? xor is smaller and simpler, but I've heard it can somehow cause slowdowns because of dependency chains or something like that... Does anybody have deeper knowledge of this?
Post 06 Dec 2005, 00:54
LocoDelAssembly
See chapter "15.10 Breaking dependences" in http://www.agner.org/assem/pentopt.pdf
Post 06 Dec 2005, 02:33
Madis731



There has been a debate on this before on these boards, and we came to the conclusion that the best choice in terms of both speed and size is AND EAX,00h, because XOR EAX,EAX is more complicated and MOV EAX,00000000h is too long.

When you read Agner's manual, note that SUB EAX,EAX is bad because it is not bit-independent: in the worst case an overflow from one bit carries through all 31 other bits. That is not the case with XOR.

EDIT:
NB! There are special cases, though, where AND EAX,00h crosses a DWORD fetch boundary while XOR EAX,EAX doesn't, so the conclusion is not final :P
AND is 3 bytes, whereas XOR is 2 bytes.

Sorry for the typo.


Last edited by Madis731 on 06 Dec 2005, 12:39; edited 1 time in total
Post 06 Dec 2005, 08:19
MazeGen



What about this?
Post 06 Dec 2005, 09:45
MazeGen



Madis, I wonder how XOR r, r can be more complicated than AND r, imm?
Post 06 Dec 2005, 10:05
decard



MazeGen: so, if I understood it correctly, xor is better for P4 and above?
Post 06 Dec 2005, 12:15
Madis731



http://www.play-hookey.com/digital/images/xorn-01.gif
An XOR gate built this way uses four NOT-AND (NAND) gates, while AND uses 1 or 2.

That is because AND can be built from ordinary switches, two or more in a row, but XOR logic is quite counterintuitive to the human brain. You could think of it as a carryless adder. And when you compare XOR and an adder, adders are a lot more complicated than AND.
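
As a quick sanity check of the gate count above, here is a minimal Python sketch (Python is used only for illustration; the helper names `nand`, `xor_from_nands` and `half_adder` are made up for this example). It builds XOR out of four NAND gates, compares it with the built-in `^`, and shows that XOR is the sum bit of a one-bit adder with the carry thrown away. It says nothing about how a real ALU is laid out.

```python
def nand(a, b):
    """One-bit NAND gate (inputs and output are 0 or 1)."""
    return 1 - (a & b)

def xor_from_nands(a, b):
    """XOR built from exactly four NAND gates."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    return nand(n2, n3)

def half_adder(a, b):
    """Return (sum_bit, carry_bit) of a one-bit addition."""
    total = a + b
    return total & 1, total >> 1

for a in (0, 1):
    for b in (0, 1):
        sum_bit, carry_bit = half_adder(a, b)
        # The NAND construction, the built-in XOR and the adder's sum bit agree.
        print(a, b, xor_from_nands(a, b), a ^ b, sum_bit, carry_bit)
```

The carry bit is the part XOR ignores, which is why it can be seen as a "carryless adder".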

...and MazeGen - the thread you posted - at the very end I put some test results from a CLI/4Giga loops/STI test case.
Code:
times_4294967295:
XOR EAX,EAX ;111124clk 100.0% xor's got a very tricky logic
SUB EAX,EAX ;109257clk  98.3% carries make a long dependency
MOV EAX,0   ;105052clk  94.5% too much memory overhead
AND EAX,0   ; 85680clk  77.1% and's got the sweetest logic 
    

I really hoped the XOR would be at least as fast as SUB, but wow...
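
For readers who want to check the percentage column, the following Python sketch recomputes it from the clock counts quoted above, assuming (as the 100.0% entry suggests) that XOR is the baseline. The variable names are invented for this illustration.

```python
# Clock counts copied from the listing above.
clocks = {
    "XOR EAX,EAX": 111124,
    "SUB EAX,EAX": 109257,
    "MOV EAX,0": 105052,
    "AND EAX,0": 85680,
}

baseline = clocks["XOR EAX,EAX"]  # assumption: percentages are relative to XOR

# Relative cost of each instruction in percent, rounded to one decimal.
relative = {insn: round(100.0 * c / baseline, 1) for insn, c in clocks.items()}

for insn, pct in relative.items():
    print(f"{insn:12s} {clocks[insn]:7d} clk {pct:5.1f}%")
```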
Post 06 Dec 2005, 12:44
vid
madis: could you post the entire code you tested it on?
Post 06 Dec 2005, 13:31
tom tobias



Madis731 wrote:
...
Code:
times_4294967295:
XOR EAX,EAX ;111124clk 100.0% xor's got a very tricky logic
SUB EAX,EAX ;109257clk  98.3% carries make a long dependency
MOV EAX,0   ;105052clk  94.5% too much memory overhead
AND EAX,0   ; 85680clk  77.1% and's got the sweetest logic 
    
...

I suppose that ENTER, LEAVE, and POPA modify EBP. However, if one is not using those instructions, then an alternative that eliminates the heavy burden of "memory overhead" associated with MOV EAX,0, yet retains the spirit of writing PROGRAMS instead of CODE, would be to assign EBP the value 0 (initialization) and thereafter use EBP as a CONSTANT, thus:
MOV EAX,EBP ; remember, EBP is always equal to zero
I guess that operation would then be just as fast as xor eax,eax, though for me personally the penalty of obscurity and lack of readability with mov eax,ebp renders this solution useless. I prefer to pay the penalty of something SLOWER but easier to read:
MOV EAX, ZERO.
Pity that the Intel architecture has such a paucity of registers. However, I am surprised to learn that there is such a severe penalty (5.5% slower) for constants sitting in cache. Thank you, Madis, for your excellent travail!
:)
Post 06 Dec 2005, 16:21
Madis731



I downloaded an IOPL module from somewhere, but I don't remember where from anymore. I've also lost the code I tested with, because I didn't think it would be necessary :S. I tried writing another test and managed to contradict my own earlier results. The platform difference shouldn't matter (the laptop I'm using now is a 700MHz Pentium III Coppermine-T, while last time I used a 2.66GHz Pentium 4 Northwood).
And damn, those clocks were normalized values, so I can't even calculate the real clock counts :@. Hmm, let's just discard them...
Code:
;::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
; Calculating cycles -
; by Edgar Barbosa, a.k.a Opcode
;::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
include "win32ax.inc"

start:

int 0edh
nop
cli
rdtsc
mov [dat], eax
nop
;;;;;;;;;;;;;;[code]
mov ecx,100000000
.Calc8:
    mov eax,0FEDCBA98h
    xor eax,eax
    mov eax,0FFFFFFFFh
    xor eax,eax
    mov eax,055555555h
    xor eax,eax
    mov eax,076543210h
    xor eax,eax
    ;XOR = 600000003 or 600000007 clocks
    ;SUB = 600000003 or 600000007 clocks
    ;AND = 500000004 or 500000008 clocks
    ;MOV = 500000010 or 500000014 clocks
sub ecx,1
jnz .Calc8
;;;;;;;;;;;;;;[/code]
rdtsc
sub eax, [dat]
sub eax,189
sti
nop

    cinvoke wsprintf, dat, "%020d clock cycles", eax
    ; show the buffer that wsprintf just filled (dat); ecx does not point to a string here
    invoke MessageBox, NULL, dat, "Opcode IOPL hack", MB_OK

    invoke ExitProcess, 0

data import
  library kernel,'KERNEL32.DLL',user,'USER32.DLL'
  import kernel,ExitProcess,'ExitProcess'
  import user,wsprintf,'wsprintfA',MessageBox,'MessageBoxA'
  dat    dd 0,0,0,0    ,   0,0,0,0,0
end data
    

Another type of inner loop:
Code:
mov ecx,100000000
.Calc8:
    mov eax,0
    mov eax,0
    mov eax,0
    mov eax,0
    ;XOR = 466666616 or 466666620 clocks
    ;SUB = 466666616 or 466666620 clocks
    ;AND = 444680876 or 444680880 clocks
    ;MOV = 300000005 or 300000009 clocks
sub ecx,1
jnz .Calc8
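
To put the totals from the two inner loops into perspective, here is a small Python sketch that divides each reported clock count by the 100000000 iterations of the loop. The dictionary and function names are made up for this example, and only the first of each pair of reported values is used.

```python
ITERATIONS = 100_000_000  # value loaded into ecx in both loops

# Total clocks as reported in the comments of the two listings.
first_loop = {"XOR": 600000003, "SUB": 600000003, "AND": 500000004, "MOV": 500000010}
second_loop = {"XOR": 466666616, "SUB": 466666616, "AND": 444680876, "MOV": 300000005}

def cycles_per_iteration(total_clocks, iterations=ITERATIONS):
    """Average number of clocks spent in one pass through the loop body."""
    return total_clocks / iterations

for name, results in (("first loop", first_loop), ("second loop", second_loop)):
    for insn, total in results.items():
        print(f"{name}: {insn} ~ {cycles_per_iteration(total):.2f} clocks/iteration")
```

On these numbers the first loop costs about 6 clocks per iteration with XOR or SUB and about 5 with AND or MOV, while the second loop drops to about 3 clocks per iteration for MOV.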
    


P.S. tom tobias: I love your sarcasm, but for this 'travail' I had to open up a dictionary. Why couldn't you just say 'hard work' to us non-native English speaking guys here :(
paucity = smallness, fewness ;)
Post 06 Dec 2005, 17:32
decard



That's better. If he uses uncommon words, it makes you look them up in a dictionary, and that way you learn a new word. And I'm sure you will remember it. (Sorry for getting off-topic) ;)
Post 06 Dec 2005, 17:41
Madis731



...and here's another inner loop. This is how I got results that XOR is better than MOV:
Code:
;;;;;;;;;;;;;;[code]
rept 1000 {mov eax,0}
;13497,13423,11106,12109,13325 | 5 consecutive tests
;rept 1000 {and eax,0}
;7015,6777,6842,7008,7687
;rept 1000 {sub eax,eax}
;5098,4629,4685,4937,4650
;rept 1000 {xor eax,eax}
;4461,4419,4873,4755,4629
;;;;;;;;;;;;;;[/code]
    

As you can see the XOR takes the first place on a PIII, but AND was the best on a PIV so it all depends on the pipeline and cache etc.
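
The five consecutive runs per instruction can be summarized with a few lines of Python. This is only a sketch over the numbers posted above (the names `runs`, `averages` and `ranking` are invented here); it averages each series and sorts the instructions from fastest to slowest.

```python
# Clock counts of the five consecutive runs, copied from the listing above.
runs = {
    "mov eax,0": [13497, 13423, 11106, 12109, 13325],
    "and eax,0": [7015, 6777, 6842, 7008, 7687],
    "sub eax,eax": [5098, 4629, 4685, 4937, 4650],
    "xor eax,eax": [4461, 4419, 4873, 4755, 4629],
}

# Arithmetic mean of each series.
averages = {insn: sum(values) / len(values) for insn, values in runs.items()}

# Instructions ordered from lowest to highest average clock count.
ranking = sorted(averages, key=averages.get)

for insn in ranking:
    print(f"{insn:12s} {averages[insn]:9.1f} clocks on average")
```

With only five samples per instruction and visible run-to-run jitter, the gap between XOR and SUB is small compared with the gap to AND and MOV.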
Post 06 Dec 2005, 18:09
tom tobias



Madis731 wrote:
... I love your sarcasm but for this 'travail' I had to open up a dictionary. Why couldn't you just say 'hard work' to us non-native English speaking guys here :(
paucity = smallness, fewness ;)

:oops:
Lots here:
1. Your effort was and IS very informative, detailed, thorough, readable, and INTERESTING. To me, this effort GOES WAY BEYOND mere "hard working", and in English, when we wish to laud someone's effort, we move away from our Germanic roots, and substitute the Latin equivalent (i.e. FRENCH), as indicative of TRULY HIGH ACCOMPLISHMENT.
Since 1066, French, not English, is the language of choice for signifying praiseworthiness among native English speakers (meme si nous ne peux capable ni de parler, ni d'ecrire!!!). I think this is also true in MOST of Eastern Europe, especially Poland and Russia, countries with the bulk of the membership of the FASM forum.
2. OK, there was a tiny bit of tongue in cheek, but really, sincerely, I DO ENJOY reading your contributions, and felt that simply labelling your submissions to this thread "hard work" demeaned your labor. You have given us some actual data. Terrific!
3. What about MOV eax, ebp? Is it as fast, or faster or slower than xor eax,eax?
Sincerely, without ANY sarcasm.
;)
Post 06 Dec 2005, 18:38
Madis731



Hi,
First, I'm sorry that I misunderstood you.
Second, I did test with EBP, but the "without interrupts" part is very dangerous. Tests with more than 1000 iterations ended in a BSOD and a restart. Tests with mov ebp,0 \ mov eax,ebp also ended with a BSOD :(
Code:
push ebp
xor ebp,ebp
rept 1000 {mov eax,ebp}
;4179,4886,5061,4620,4606
;Note! These are not comparable with previous ones because of the overhead
pop ebp
    

Don't write code that runs with interrupts disabled unless you know what you are doing :P
Maybe my CPU (Pentium III) has some kind of mechanism that detects infinite loops and stops them with a reboot or halt :|. Maybe...
Repeating the same instruction 1000 times can be called optimization, but repeating the same sequence 10000 times is definitely not logical code :D
Post 06 Dec 2005, 18:58
vid
What's that int ED? Maybe you could write an article on it, if it's something hacky-cracky and interesting.
Post 06 Dec 2005, 19:37
Madis731



I think you should talk with Edgar Barbosa about this. I think the loaded .sys file defines an interrupt at this vector number. I haven't seen the source, I'm just using the binaries ;)

EDIT: Did some diggin' on the net and voila:
http://win.asmcommunity.net/board/index.php?topic=18859.0
Post 07 Dec 2005, 12:34
MazeGen



decard wrote:
MazeGen: so, if I understood it correctly, xor is better for P4 and above?

According to those numbers, MOV is A BIT faster, but in the context of dependencies, XOR may be faster.
Post 07 Dec 2005, 16:05
El Tangas



Either instruction can be faster; the important thing is not to cross cache boundaries. If you test an endless repeat of the same instruction, this will favour the shorter instructions (xor and sub), because the longer instructions cross more cache boundaries. If you test a loop, results may be different.
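
To illustrate the boundary argument, here is a rough Python sketch that counts how many instructions in a `rept 1000` style stream straddle a block boundary. The 16-byte block size and the aligned start address are assumptions made for this example, not measured properties of any particular CPU, and the function name is hypothetical. The lengths 2, 3 and 5 are the byte sizes of xor/sub, and, and mov discussed in this thread.

```python
def count_boundary_crossings(insn_len, count=1000, block=16, start=0):
    """Count instructions whose first and last byte fall into different blocks.

    insn_len: instruction length in bytes
    count:    number of back-to-back copies of the instruction
    block:    assumed fetch/cache block size in bytes
    start:    assumed address of the first instruction
    """
    crossings = 0
    addr = start
    for _ in range(count):
        first_block = addr // block
        last_block = (addr + insn_len - 1) // block
        if first_block != last_block:
            crossings += 1
        addr += insn_len
    return crossings

for length in (2, 3, 5):
    print(length, "byte instruction:", count_boundary_crossings(length), "crossings")
```

Under these assumptions the 2-byte forms never straddle a boundary, while the 3-byte and 5-byte forms do so regularly. Changing `start` or `block` changes the counts, which fits the remark that a loop may behave differently.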

Now, what about the worst way to clear a register?
I propose shr reg,32.

Just a note: in the Athlon series, mov reg, imm instructions are "executed" by the decoder, so they behave like NOPs and don't take execution resources.
Post 11 Dec 2005, 21:18
MazeGen



Heh, you can't do SHL/R with 32.

The worst way? What about IMUL reg,reg,0?
Post 11 Dec 2005, 21:26
Madis731



There are endless possibilities like:
Code:
test_zero:
sub eax,1
cmp eax,0
jne test_zero

;-OR-

lea eax,[0] ; That is 6 bytes: 8D 05 00 00 00 00
    


P.S. and yes, you can do:
shl eax,32
shr eax,32 ; but I don't remember which one of them was optimized
The requirement is imm8 so you can even do SHR EBP,255
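
A note on the shift variants: the encoding accepts any imm8, but according to the Intel manuals, processors from the 286 onward mask the count of a 32-bit shift to its low 5 bits, so a count of 32 acts as 0 and 255 acts as 31. The following Python sketch models that behaviour; the helper names `shl32` and `shr32` are hypothetical and exist only in this example.

```python
MASK32 = 0xFFFFFFFF

def shl32(value, count):
    """Model of SHL r32, imm8: the count is masked to 5 bits before shifting."""
    count &= 31
    return (value << count) & MASK32

def shr32(value, count):
    """Model of SHR r32, imm8: the count is masked to 5 bits before shifting."""
    count &= 31
    return (value & MASK32) >> count

eax = 0x12345678
print(hex(shl32(eax, 32)))        # count 32 is masked to 0, value unchanged
print(hex(shr32(eax, 32)))        # same for the right shift
print(hex(shr32(0x80000000, 255)))  # count 255 is masked to 31
```

So in this model a single shift by 32 leaves the register as it was, which supports the objection that it cannot be used to clear a register.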

EDIT:
Hmm, okey here's a summary:
Code:
33C0             XOR   EAX, EAX  ;Variants for 2-byte resets
2BC0             SUB   EAX, EAX
83E0 00          AND   EAX, 0    ;Many possibilities for 3-byte resets
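C1E0 20          SHL   EAX, 32   ;imm8 is 20h; the CPU masks the count to 5 bits, so this
C1E8 20          SHR   EAX, 32   ; pair shifts by 0 and does not actually reset EAX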
6BC0 00          IMUL  EAX, EAX, 0
B8 00000000      MOV   EAX, 0    ;There are no 4-byte ones but there is one 5-byte reset and
8D05 00000000    LEA   EAX, [0]  ; a 6-byte one.
; An exception here:
6B05 00000000 00 IMUL  EAX, [0], 0 ; A 7-byte one but you must have a read-accessible
                                   ; memory-address here
    


It's getting interesting :D
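
The byte counts in the summary can be double-checked mechanically. This Python sketch (the dictionary names are invented for the example) takes the 32-bit encodings listed above, leaves out the two shift forms, and measures the length of each.

```python
# Encodings of the register-clearing forms from the summary (32-bit mode).
encodings = {
    "XOR EAX, EAX": "33 C0",
    "SUB EAX, EAX": "2B C0",
    "AND EAX, 0": "83 E0 00",
    "IMUL EAX, EAX, 0": "6B C0 00",
    "MOV EAX, 0": "B8 00 00 00 00",
    "LEA EAX, [0]": "8D 05 00 00 00 00",
    "IMUL EAX, [0], 0": "6B 05 00 00 00 00 00",
}

# Length in bytes of each encoding.
sizes = {insn: len(bytes.fromhex(hex_bytes)) for insn, hex_bytes in encodings.items()}

for insn, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{size} bytes: {insn}")
```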
Post 11 Dec 2005, 22:09
Copyright © 1999-2020, Tomasz Grysztar.
