flat assembler
Message board for the users of flat assembler.

Topic: Optimizing Education...Please?
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 30 May 2005, 11:50
beppe85 wrote:
....

And "xor r32, r32" is clear for me... You got it? "clear" hehehehe

Yes, thanks, always appropriate to invoke humor!! Well, and actually, this is PRECISELY the point, isn't it? The English word "clear", which in this context means to RESET a register with all bits assigned a value of zero, also implies TRANSPARENT, or EASILY UNDERSTOOD. My point then is this: THERE IS NO NEED to invoke a Boolean operator, like xor, to accomplish a reset. Yes, an incidental CONSEQUENCE of executing this instruction in the peculiar, and unnatural, situation where BOTH operands represent the SAME register, is to assign a value of zero to all bits, JUST AS IF that had been our intention from the outset. However, good programming practice does not include writing CODE, that is ILLOGICAL, and based upon accidental, implementation dependent sequelae. It is not only NOT NORMAL, it is NOT LOGICAL, to employ a BOOLEAN OPERATOR to clear a register. If the architects had instead decided, (in that extremely rare, and unusual circumstance where a BOOLEAN operator is used on itself, i.e. XOR EAX, EAX, ) to set all bits to a value of 1, instead of zero, would you then use ANOTHER Boolean operator, NOT, in order to RESET the consequence of XOR EAX, EAX, so that EAX could assume the value of zero? My point then, in brief, is this: If, as a programmer, you seek to RESET the contents of a register to zero, DO SO: MOV EAX, zero
Simple, readily understood, i.e. TRANSPARENT, not confounded with BOOLEAN logic, and, as beppe85 noted: CLEAR.
r22

Joined: 27 Dec 2004
Posts: 805
r22 31 May 2005, 02:09
If you have no reason to optimize your code for speed then there is no "logical" point to using assembly, as you could develop much faster using C/C++.

XOR reg32,reg32 is an optimization; if you can't think of it as a synonym for MOV reg32,0, then that's a personal fault of yours, not of the person using it. You can equate this to an educated person with a large vocabulary: when speaking to a child they would dumb their vocabulary down so that they could be understood (MOV reg32,0), but when speaking with a similarly educated person they would use more complex words (XOR reg32,reg32) to get their point across in the most EFFICIENT way.

Ambiguous ASM code should have comments to more clearly represent the purpose of the instructions. But using slower instructions in your code because of clarity is illogical (if you want clear code use a HLA with syntax that people can understand).

Expecting machine code to be written so EVERYONE can more easily understand its purpose undermines the purpose of writing machine code in the first place (speed).
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 31 May 2005, 12:00
thank you r22 for your thoughtful ideas, much appreciated. I have some comments in response, but these thoughts are directed to the forum as a whole, not to you as criticisms.
First, your very provocative, BUT WIDELY ACCEPTED as valid, statement, that one can successfully complete a programming assignment with C/C++ (hll) more efficiently than with Assembly: sorry, I simply do not share this opinion, but, I do acknowledge that 95%+ of the world's programmers agree with you.
ASM is much easier to read and understand, than hll, in my opinion, when accessing any hardware component.
Second: I do not share your opinion that one is "dumbing down" to employ mov eax,zero, rather than xor eax,eax. However, if someone feels more productive, or more professional, or less amateurish to write xor eax, eax, instead of mov eax,zero, then, fine, no problem. Were such a person employed by me, I would then inquire: WHY does such an individual feel that this conduct represents a sign of superior skill? I would then inquire whether there may be OTHER instructions which perhaps ALSO assign a value of zero to a register. Does use of one of those instructions also convey a sense of superiority, or professionalism? Does such a person acknowledge the FACT that (mis)used frequently enough, to simply clear the contents of a register, xor will be overlooked BY OTHERS, reading this person's code, upon encountering the instruction in its intended setting: xor eax,ecx.
Third: machine code is not synonymous with Assembly, nor should the two be confounded in a discussion of the merits of Assembly.
Fourth: "But using slower instructions in your code because of clarity is illogical...". Hmm.
How do you know that mov eax,zero is SLOWER than xor eax,eax? Can you offer a benchmark study (which requires assignment of successively DIFFERENT, random values from memory, i.e. not data sitting in cache, prior to clearing the register by the two instructions)? Consider two identical programs, one written in asm in 1965 for the IBM 360, and the other written in hll (i.e. inefficient) in 2005, for one of the desktop cpu's widely available. Which program would execute faster??? Hardware improvements during the past four decades have fundamentally rendered moot the question of speed of execution. Rarely does one reject a program today because it executes too slowly. Therefore, instead of focusing on instruction speed, one ought to attend to the EASE of WRITING and READING a good program, defining good by virtue of facility for modification by another author.
r22

Joined: 27 Dec 2004
Posts: 805
r22 31 May 2005, 19:13
EAX is set to a random 32 bit value and then cleared using Xor, this is repeated 0xFFFFFFFF times.
EAX is set to a random 32 bit value and then cleared using Mov, this is repeated 0xFFFFFFFF times.

This simple benchmark shows Xor clearing to be faster than Mov on a P4 3.2 GHz machine with 1 GB of RAM.
I reversed the order of execution (testing the Mov loop first and the Xor loop second) just to make sure; the results remain the same: Xor is faster.

After multiple tests, the results of Xor-then-Mov testing AND Mov-then-Xor testing are as follows.
(Smaller numbers mean faster)
OpCode: range of results in milliseconds
Xor: 54894 - 55175
Mov: 55835 - 56203
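
To put the two ranges in perspective, here is a small Python sketch of the arithmetic (it is not part of the original benchmark). It assumes the P4 ran at its nominal 3.2 GHz and that each loop did exactly 0xFFFFFFFF iterations; the function names are made up for illustration.

```python
XOR_MS = (54894, 55175)   # posted range for the xor loop, in ms
MOV_MS = (55835, 56203)   # posted range for the mov loop, in ms
ITERATIONS = 0xFFFFFFFF   # loop count used by both loops
CLOCK_HZ = 3.2e9          # assumption: nominal clock of the "P4 3.2ghz"


def gap_ms(fast, slow):
    """Smallest and largest gap between two (low, high) ranges, in ms."""
    return slow[0] - fast[1], slow[1] - fast[0]


def gap_percent(fast, slow):
    """The same gap as a percentage of the midpoint of the slower range."""
    mid = (slow[0] + slow[1]) / 2
    return tuple(100 * g / mid for g in gap_ms(fast, slow))


def gap_cycles_per_iteration(fast, slow):
    """The same gap expressed as clock cycles per loop iteration."""
    return tuple(g / 1000 * CLOCK_HZ / ITERATIONS for g in gap_ms(fast, slow))


if __name__ == "__main__":
    print(gap_ms(XOR_MS, MOV_MS))
    print(gap_percent(XOR_MS, MOV_MS))
    print(gap_cycles_per_iteration(XOR_MS, MOV_MS))
```

On the posted figures this gives a gap of 660 to 1309 ms, roughly 1.2% to 2.3% of the Mov time, or between about half a cycle and one cycle per iteration under the clock assumption.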

Code:
```
format PE GUI 4.0
entry start
include '%fasminc%\win32a.inc'

start:
call MakeSeed
mov ebx,0FFFFFFFFh
call [GetTickCount]
mov esi,eax
.lp1:
call Random32
xor eax,eax
dec ebx
jnz .lp1
call [GetTickCount]
sub eax,esi
mov esi,eax
mov ebx,0FFFFFFFFh
call [GetTickCount]
mov edi,eax
.lp2:
call Random32
mov eax,0
dec ebx
jnz .lp2
call [GetTickCount]
sub eax,edi
push eax
push esi
push fmt
push buffer
call [wsprintf]
add esp,16  ; wsprintf is cdecl: the caller removes its 4 arguments
push 4  ; MB_YESNO; the answer is not checked, the program just exits
push buffer
push buffer
push 0
call [MessageBox]
.ending:
push 0
call [ExitProcess]

Random32:
push ebp
push ebx
mov ebp,RandomSeed
mov eax,[ebp]
mov ebx,[ebp+4]
mov ecx,[ebp+8]
mov edx,[ebp+12]
shld ebx,eax,1
ror eax,7
bswap eax
shld edx,ecx,1
rol ecx,5
mov [ebp],eax
mov [ebp+4],ebx
mov [ebp+8],ecx
mov [ebp+12],edx
xor eax,ecx
pop ebx
pop ebp
ret 0

SetSeed:
.seed equ esp+4 ;,+8,+12,+16
movdqu xmm0,[.seed]
movntdq dqword[RandomSeed],xmm0
ret 16

MakeSeed:
rdtsc
mov edx,eax
call [GetTickCount]
mov ecx,eax
mul edx
mov [RandomSeed],eax
xor edx,ecx
mov [RandomSeed+4],edx
bswap ecx
xor eax,ecx
mov [RandomSeed+8],eax
not edx
bswap edx
mul edx
mov [RandomSeed+12],eax
ret 0

fmt db 'Xor: %lu  Mov: %lu',0
buffer rb 0ffh

align 16
RandomSeed dd 1318699, 1015727, 1235239, 412943

section '.idata' import data readable writeable

library kernel32,'KERNEL32.DLL',\
user32,'USER32.DLL'
include  "%fasminc%\apia\kernel32.inc"
include  "%fasminc%\apia\user32.inc"

```
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 31 May 2005, 20:30
First, I want to congratulate you on a job well done. I think it was EXCELLENT work, and I am impressed that you followed up on this topic. GOOD.
Umm, there are a couple of small points to address.
1. Let's assume, for sake of argument, that your benchmark study was perfect, and it is really quite good, so even if it had some small flaw, one can usefully draw some conclusions based on these results. My subsequent comments will focus on suggested "improvements", though, one person's improvements are another person's disposables!!! So, looking at your millisecond data,
Xor: 54894 - 55175
Mov: 55835 - 56203
we can see, very clearly, xor completes faster than mov, at least for this sequence of instructions. So, this certainly VINDICATES your position, in my opinion, you have succeeded in demonstrating, at least to my satisfaction, that I am wrong, and you are correct, xor is faster than mov.
What one might wish to ask about this data, though, is this: The delta, roughly 650-1300 milliseconds over 55-56k milliseconds, represents about a 1-2% increase in speed. In other words, is the obscurity associated with use of a boolean operator to clear a register, worth gaining such a modest improvement in execution speed, particularly realizing that the average program of 10,000 lines of code will have register clearing only about 10% of the time, i.e. 1000 total instructions? The time savings, while measurable, will not be noticeable by a human: even at a full clock cycle saved per clear (about the most your data suggests), 1000 clears save about 1000 clock cycles, i.e. well under 1 microsecond slower with mov than with xor on a 3.2 GHz cpu.
2. But, was this an ideal test??: I think a better test is one which creates a buffer of 10 million random 32 bit values (some number much larger than the 1 megabyte cache on a modern cpu.) After creating this large buffer, then, one has three groups of instructions to time: initially, (no clearing) assigning the first random value to the register, then, manipulating that value in some way, say, adding one to it, and repeat until finished. Then, for the second group of instructions, the register is cleared, by means of xor, and in the THIRD iteration, with mov. NO LOOPS in any of these three groups. ALL INLINE CODE (need a big editor, and a lot of cutting and pasting!!!) Timing is done as follows:
FIRST: load the first value into the register, perform the increment, BUT NO SUBSEQUENT CLEARING OF REGISTER, go to the next location for the second random value, and so on, repeated (but without loops), performed a million times (i.e. a million different 32 bit random numbers,) measure the time (I doubt it will be in milliseconds, just as well, for it is difficult to measure times of brief duration accurately.)
SECOND: load the first value, increment just as before, but then, CLEAR the register using xor, otherwise, exactly the same as first step, measure the time.
THIRD: load the first value, and increment just as before, but now, CLEAR the register using mov, otherwise, exactly the same as first step, measure the time.
I note that you were working in a Windows environment. This is not an appropriate testing environment for measuring execution times of these two instructions. You need a cpu with MINIMAL overhead, no interrupts, no polling, no i/o processes. Otherwise, random background is much too noisy. Best is NO OPERATING system. Apart from these minor, and trivial comments, good job.
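
Rather than cutting and pasting, the three inline instruction groups described above could be generated by a short script. The following Python sketch only produces the assembly text; `buf` is a hypothetical label for the buffer of random values, and `inc eax` stands in for the "adding one" step.

```python
def unrolled_group(count, clear=None, buf="buf"):
    """Return fasm source lines for `count` unrolled load/increment steps.

    clear: None (no clearing), "xor" or "mov".
    buf:   hypothetical label of the buffer holding the random dwords.
    """
    clear_line = {None: None, "xor": "xor eax,eax", "mov": "mov eax,0"}[clear]
    lines = []
    for i in range(count):
        lines.append(f"mov eax,[{buf}+{4 * i}]")  # load the i-th 32-bit value
        lines.append("inc eax")                   # manipulate it (add one)
        if clear_line is not None:
            lines.append(clear_line)              # the instruction under test
    return lines


if __name__ == "__main__":
    # Print a tiny three-step sample of each group.
    for mode in (None, "xor", "mov"):
        print("\n".join(unrolled_group(3, mode)))
        print()
```

The timing code around each group, and the choice of timer, are left out on purpose; the sketch only shows that the three groups differ by exactly one instruction per step.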
UCM

Joined: 25 Feb 2005
Posts: 285
UCM 31 May 2005, 22:21
This is what you could do for a better test:
a. Use fasm's repeat...
Code:
```
; body.asm
repeat [some_large_amount]
...
end repeat
```

then, make a beginning of a com file...
Code:
```
; head.asm
format binary
org 256
.... ....
cli
```

then, make a footer...
Code:
```
; ft.asm
sti
...
```

and then, in WinNT's cmd.exe...
[code]
fasm head.asm head.bin
fasm body.asm body.bin
fasm ft.asm ft.bin
type head.bin > prog.com
type body.bin >> prog.com
type body.bin >> prog.com
type body.bin >> prog.com
type body.bin >> prog.com
type body.bin >> prog.com
......
type ft.bin >> prog.com
[/code]
then you have a nice long com file.

b. Copy your prog.com onto a floppy, then put DOS on it (making it bootable). Then boot from the floppy and, in real mode, execute prog.com.

PS. I don't think it matters whether you use mov or xor! The marginal speed improvement doesn't actually make any difference (unless you're looping like around a trillion trillion times clearing registers). It's only more satisfying, and also smaller.
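
The "smaller" part is easy to check. The Python sketch below lists the usual 32-bit encodings of the register-clearing instructions discussed in this thread and their lengths. The byte sequences are the common forms; an assembler may legitimately pick an equivalent alternative (for example 33 C0 for xor), and the names `ENCODINGS` and `size_table` are made up for illustration.

```python
# Usual 32-bit machine-code encodings (AND uses the short imm8 form).
ENCODINGS = {
    "xor eax,eax": bytes.fromhex("31C0"),
    "sub eax,eax": bytes.fromhex("29C0"),
    "and eax,0":   bytes.fromhex("83E000"),
    "mov eax,0":   bytes.fromhex("B800000000"),
}


def size_table():
    """Map each instruction to the length of its encoding in bytes."""
    return {insn: len(code) for insn, code in ENCODINGS.items()}


if __name__ == "__main__":
    for insn, size in size_table().items():
        print(f"{insn:12} {size} bytes")
```

So xor eax,eax takes 2 bytes where mov eax,0 takes 5. Note that xor, sub and and also modify the flags, while mov leaves them alone.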

_________________
This calls for... Ultra CRUNCHY Man!
Ta da!! *crunch*
smiddy

Joined: 31 Oct 2004
Posts: 557
smiddy 01 Jun 2005, 01:28
Holy smokes, ladies and gents. I didn't want to open a can of worms here. I am currently on holiday and won't return until June 5th. I have some comments, but alas I'm having too much fun to add to the current force of this thread. I will when I return...as it seems it won't die before then.

BTW, thanks everyone for your input; I am learning quite a lot from your posts.

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Code:
```
times_4294967295:
XOR EAX,EAX ;111124clk 100.0% xor's got a very tricky logic
SUB EAX,EAX ;109257clk  98.3% carries make a long dependency
MOV EAX,0   ;105052clk  94.5% too much memory overhead
AND EAX,0   ; 85680clk  77.1% and's got the sweetest logic
```

I don't believe this test either, because my tests in a non-threaded environment show that xors issue 50% faster than movs, whether in a cacheable or a cacheless state, because of memory-read bottlenecks. If you provide cache hints, you can get as much as 250% in the best-case* scenario.

*This means that there are no bottlenecks in the µops, i.e. xor handles a different register in any 3 consecutive instructions. mov, on the other hand, does not seem to give a rat's a$$ about written registers, so no tricks are needed there.
02 Jun 2005, 08:56