flat assembler
Message board for the users of flat assembler.
r22 31 May 2005, 02:09
If you have no reason to optimize your code for speed, then there is no "logical" point to using assembly, as you could develop much faster using C/C++.
XOR reg32,reg32 is an optimization; if you can't think of it as being a synonym for MOV reg32,0, then that's a personal fault of yours, not of the person using it. You can equate this to an educated person with a large vocabulary: when speaking to a child they would dumb their vocabulary down so that they could be understood (MOV reg32,0), but when speaking with a similarly educated person they would use more complex words (XOR reg32,reg32) to get their point across in the most EFFICIENT way. Ambiguous ASM code should have comments to more clearly represent the purpose of the instructions. But using slower instructions in your code because of clarity is illogical (if you want clear code, use a HLA with syntax that people can understand). Expecting machine code to be written so EVERYONE can more easily understand its purpose undermines the purpose of writing machine code in the first place (speed).
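One practical caveat to the "synonym" claim, sketched in fasm syntax. This fragment is illustrative only and not from the thread; the label names are invented. Both routines are meant to return 1 in EAX if ECX is below EDX (unsigned) and 0 otherwise:

```asm
use32

; Variant with mov: the register can be cleared between cmp and jb.
is_below_mov:
        cmp     ecx,edx         ; set the flags from ecx - edx
        mov     eax,0           ; mov leaves the flags untouched
        jb      .below          ; still tests the result of the cmp
        ret
  .below:
        inc     eax             ; eax = 1
        ret

; Variant with xor: the clear has to come before the compare.
is_below_xor:
        xor     eax,eax         ; xor rewrites the flags (CF=0, ZF=1)
        cmp     ecx,edx         ; so the compare is done afterwards
        jb      .below
        ret
  .below:
        inc     eax             ; eax = 1
        ret
```

Because xor rewrites the flags, it only stands in for mov reg,0 where no later instruction depends on flags set earlier. Placed between the cmp and the jb, it would clear the carry flag and the branch would never be taken.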
tom tobias 31 May 2005, 12:00
Thank you, r22, for your thoughtful ideas; much appreciated. I have some comments in response, but these thoughts are directed to the forum as a whole, not to you as criticisms.
First, your very provocative, BUT WIDELY ACCEPTED as valid, statement that one can successfully complete a programming assignment with C/C++ (hll) more efficiently than with Assembly: sorry, I simply do not share this opinion, but I do acknowledge that 95%+ of the world's programmers agree with you. ASM is much easier to read and understand than hll, in my opinion, when accessing any hardware component.
Second: I do not share your opinion that one is "dumbing down" to employ mov eax,zero rather than xor eax,eax. However, if someone feels more productive, or more professional, or less amateurish to write xor eax,eax instead of mov eax,zero, then fine, no problem. Were such a person employed by me, I would then inquire: WHY does such an individual feel that this conduct represents a sign of superior skill? I would then inquire whether there may be OTHER instructions which perhaps ALSO assign a value of zero to a register. Does use of one of those instructions also convey a sense of superiority, or professionalism? Does such a person acknowledge the FACT that, (mis)used frequently enough simply to clear the contents of a register, xor will be overlooked BY OTHERS reading this person's code upon encountering the instruction in its intended setting: xor eax,ecx?
Third: machine code is not synonymous with Assembly, nor should the two be confounded in a discussion of the merits of Assembly.
Fourth: "But using slower instructions in your code because of clarity is illogical...". Hmm. How do you know that mov eax,zero is SLOWER than xor eax,eax? Can you offer a benchmark study (which requires assignment of successively DIFFERENT, random values from memory, i.e. not data sitting in cache, prior to clearing the register by the two instructions)? Consider two identical programs, one written in asm in 1965 for the IBM 360, and the other written in hll (i.e. inefficient) in 2005, for one of the desktop CPUs widely available. Which program would execute faster???
Hardware improvements during the past four decades have fundamentally rendered moot the question of speed of execution. Rarely does one reject a program today because it executes too slowly. Therefore, instead of focusing on instruction speed, one ought to attend to the EASE of WRITING and READING a good program, defining good by virtue of facility for modification by another author.
r22 31 May 2005, 19:13
EAX is set to a random 32-bit value and then cleared using Xor; this is repeated 0xFFFFFFFF times.
EAX is set to a random 32-bit value and then cleared using Mov; this is repeated 0xFFFFFFFF times.
This simple benchmark shows Xor clearing to be faster than Mov on a P4 3.2 GHz machine with 1 GB of RAM. I've reversed the execution (testing the Mov loop first and Xor second) just to make sure; the results remain the same, Xor is faster. After multiple tests, the results of Xor-loop-then-Mov-loop testing AND Mov-then-Xor testing are as follows. (Smaller numbers mean faster.)

OpCode   Range of results in milliseconds
Xor:     54894 - 55175
Mov:     55835 - 56203

Code:
    format PE GUI 4.0
    entry start

    include '%fasminc%\win32a.inc'

    section '.code' code readable executable

    start:
            call    MakeSeed

            ; first loop: random value, then clear with xor
            mov     ebx,0FFFFFFFFh
            call    [GetTickCount]
            mov     esi,eax
      .lp1:
            call    Random32
            xor     eax,eax
            dec     ebx
            jnz     .lp1
            call    [GetTickCount]
            sub     eax,esi
            mov     esi,eax         ; esi = elapsed ms for the xor loop

            ; second loop: random value, then clear with mov
            mov     ebx,0FFFFFFFFh
            call    [GetTickCount]
            mov     edi,eax
      .lp2:
            call    Random32
            mov     eax,0
            dec     ebx
            jnz     .lp2
            call    [GetTickCount]
            sub     eax,edi         ; eax = elapsed ms for the mov loop

            push    eax
            push    esi
            push    fmt
            push    buffer
            call    [wsprintf]
            add     esp,16          ; wsprintf is cdecl: caller removes the four pushed dwords

            push    4               ;yes/no yes continue / no end program
            push    buffer
            push    buffer
            push    0
            call    [MessageBox]
      .ending:
            push    0
            call    [ExitProcess]

    Random32:
            push    ebp
            push    ebx
            mov     ebp,RandomSeed
            mov     eax,[ebp]
            mov     ebx,[ebp+4]
            mov     ecx,[ebp+8]
            mov     edx,[ebp+12]
            shld    ebx,eax,1
            adc     eax,0
            ror     eax,7
            bswap   eax
            shld    edx,ecx,1
            adc     ecx,0
            rol     ecx,5
            mov     [ebp],eax
            mov     [ebp+4],ebx
            mov     [ebp+8],ecx
            mov     [ebp+12],edx
            xor     eax,ecx
            pop     ebx
            pop     ebp
            ret     0

    SetSeed:
      .seed equ esp+4 ;,+8,+12,+16
            movdqu  xmm0,[.seed]
            movntdq dqword[RandomSeed],xmm0
            ret     16

    MakeSeed:
            rdtsc
            mov     edx,eax
            call    [GetTickCount]
            mov     ecx,eax
            mul     edx
            mov     [RandomSeed],eax
            xor     edx,ecx
            mov     [RandomSeed+4],edx
            bswap   ecx
            xor     eax,ecx
            mov     [RandomSeed+8],eax
            not     edx
            bswap   edx
            mul     edx
            mov     [RandomSeed+12],eax
            ret     0

    section '.data' data readable writeable

    fmt        db 'Xor: %lu Mov: %lu',0
    buffer     rb 0ffh
    align 16
    RandomSeed dd 1318699, 1015727, 1235239, 412943

    section '.idata' import data readable writeable

    library kernel32,'KERNEL32.DLL',\
            user32,'USER32.DLL'

    include "%fasminc%\apia\kernel32.inc"
    include "%fasminc%\apia\user32.inc"

    section '.reloc' fixups data discardable
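A rough per-iteration reading of the quoted timings, as a back-of-the-envelope sketch rather than anything r22 posted. It assumes each loop really runs 0xFFFFFFFF = 4,294,967,295 times and that the clock is the stated 3.2 GHz, and it uses the fastest run of each loop:

```latex
\begin{align*}
t_{\text{xor}} &\approx \frac{54894\ \text{ms}}{4\,294\,967\,295} \approx 12.8\ \text{ns} \approx 41\ \text{cycles at } 3.2\ \text{GHz} \\
t_{\text{mov}} &\approx \frac{55835\ \text{ms}}{4\,294\,967\,295} \approx 13.0\ \text{ns} \approx 42\ \text{cycles at } 3.2\ \text{GHz}
\end{align*}
```

On these figures each iteration costs around 40 cycles, which suggests that most of the measured time goes to the call to Random32 and the loop itself rather than to the clearing instruction being compared.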
tom tobias 31 May 2005, 20:30
First, I want to congratulate you on a job well done. I think it was EXCELLENT work, and I am impressed that you followed up on this topic. GOOD.
Umm, there are a couple of small points to address.
1. Let's assume, for the sake of argument, that your benchmark study was perfect, and it is really quite good, so even if it had some small flaw, one can usefully draw some conclusions based on these results. My subsequent comments will focus on suggested "improvements", though one person's improvements are another person's disposables!!! So, looking at your millisecond data, Xor: 54894 - 55175 Mov: 55835 - 56203, we can see, very clearly, that xor completes faster than mov, at least for this sequence of instructions. So, this certainly VINDICATES your position. In my opinion, you have succeeded in demonstrating, at least to my satisfaction, that I am wrong and you are correct: xor is faster than mov.
What one might wish to ask about this data, though, is this: the delta, approximately 300-400 milliseconds over 55-56k milliseconds, represents about 0.5% increase in speed in the best case. In other words, is the obscurity associated with use of a boolean operator to clear a register worth gaining such a modest improvement in execution speed, particularly realizing that the average program of 10,000 lines of code will have register clearing only about 10% of the time, i.e. 1000 total instructions? The time savings, while measurable, will not be noticeable by a human (1000 x .005 x 2 clock cycles (maximum), or about 1 microsecond slower with mov than with xor).
2. But was this an ideal test?? I think a better test is one which creates a buffer of 10 million random 32-bit values (some number much larger than the 1 megabyte cache on a modern cpu). After creating this large buffer, one then has three groups of instructions to time: initially (no clearing), assigning the first random value to the register, then manipulating that value in some way, say, adding one to it, and repeating until finished. Then, for the second group of instructions, the register is cleared by means of xor, and in the THIRD iteration, with mov.
NO LOOPS in any of these three groups. ALL INLINE CODE (need a big editor, and a lot of cutting and pasting!!!) Timing is done as follows:
FIRST: load the first value into the register, perform the increment, BUT NO SUBSEQUENT CLEARING OF REGISTER, go to the next location for the second random value, and so on, repeated (but without loops), performed a million times (i.e. a million different 32-bit random numbers), and measure the time (I doubt it will be in milliseconds; just as well, for it is difficult to measure times of brief duration accurately).
SECOND: load the first value, increment just as before, but then CLEAR the register using xor; otherwise exactly the same as the first step. Measure the time.
THIRD: load the first value, and increment just as before, but now CLEAR the register using mov; otherwise exactly the same as the first step. Measure the time.
I note that you were working in a Windows environment. This is not an appropriate testing environment for measuring execution times of these two instructions. You need a cpu with MINIMAL overhead: no interrupts, no polling, no i/o processes. Otherwise, random background is much too noisy. Best is NO OPERATING system. Apart from these minor and trivial comments, good job.
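Recomputing the gap directly from the two quoted ranges (an editorial check, not part of the original post) gives a somewhat larger figure than the 300-400 ms and 0.5% mentioned in point 1:

```latex
\begin{align*}
\Delta_{\min} &= 55835 - 55175 = 660\ \text{ms}, & \frac{660}{55835} &\approx 1.2\% \\
\Delta_{\max} &= 56203 - 54894 = 1309\ \text{ms}, & \frac{1309}{56203} &\approx 2.3\%
\end{align*}
```

Spread over 4,294,967,295 iterations, that is about 0.15 to 0.30 ns per clear, or roughly half a cycle to one cycle at 3.2 GHz. The broader point that the saving is far too small to notice in ordinary code is not affected by the corrected figure.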
UCM 31 May 2005, 22:21
This is what you could do for a better test:
a. Use fasm's repeat...
Code: [body.asm]
    repeat [some_large_amount]
    ...
    end repeat
then, make a beginning of a com file...
Code: [head.asm]
    format binary
    org 256
    ....
    ....
    cli
then, make a footer...
Code: [ft.asm]
    sti
    ...
and then, in WinNT's cmd.exe...
Code:
    fasm head.asm head.bin
    fasm body.asm body.bin
    fasm ft.asm ft.bin
    type head.bin >> prog.com
    type body.bin >> prog.com
    type body.bin >> prog.com
    type body.bin >> prog.com
    type body.bin >> prog.com
    type body.bin >> prog.com
    ......
    type ft.bin >> prog.com
then you have a nice long com file.
b. Copy your prog.com onto a floppy, then put DOS on it (making it bootable). Then boot the floppy and, in real mode, execute prog.com.
PS. I don't think it matters whether you use mov or xor! The marginal speed improvement doesn't actually make any difference (unless you're looping like around a trillion trillion times clearing registers). It's only more satisfying, and also smaller.
_________________
This calls for... Ultra CRUNCHY Man! Ta da!! *crunch*
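The head/body/footer idea can also be kept in a single source file. What follows is only a sketch under assumptions: `COUNT`, `CLEAR` and `values` are names invented here, the timing code is left out, and the buffer is assumed to be filled with random values before the unrolled block runs. Assembling it three times with `CLEAR` set to 0, 1 and 2 would give the three variants tom tobias describes (no clearing, xor, mov):

```asm
; Hypothetical single-file variant; not the code posted in the thread.
format binary
org 256                         ; .COM programs are loaded at offset 100h

COUNT = 1000                    ; assumed step count, small enough for one 64K segment
CLEAR = 1                       ; 0 = no clearing, 1 = clear with xor, 2 = clear with mov

        cli                     ; interrupts off for the measured block
        ; (start-of-measurement code would go here)

repeat COUNT                    ; unrolled at assembly time: no loop in the output
        mov     eax,[values+(%-1)*4]    ; % is fasm's repeat counter, starting at 1
        inc     eax
    if CLEAR = 1
        xor     eax,eax
    else if CLEAR = 2
        mov     eax,0
    end if
end repeat

        ; (end-of-measurement code would go here)
        sti
        ret                     ; near return ends a .COM program

values: rd COUNT                ; placeholder buffer for the random dwords
```

Keeping the three variants in one file means the only difference between the builds is the clearing instruction itself, which is the property the comparison relies on.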
smiddy 01 Jun 2005, 01:28
Holy smokes, ladies and gents. I didn't want to open a can of worms here. I am currently on holiday and won't return until June 5th. I have some comments, but alas I'm having too much fun to add to the current force of this thread. I will when I return...as it seems it won't die before then.
BTW, thanks everyone for your input. I am learning quite a lot from your posts.
Madis731 02 Jun 2005, 08:56
Code:
    times_4294967295:
    XOR EAX,EAX ;111124clk 100.0% xor's got a very tricky logic
    SUB EAX,EAX ;109257clk  98.3% carries make a long dependency
    MOV EAX,0   ;105052clk  94.5% too much memory overhead
    AND EAX,0   ; 85680clk  77.1% and's got the sweetest logic
I don't believe this test either, because my tests in a non-threaded environment show that xors issue 50% faster than movs, whether in a cacheable or cacheless state, because of memory-read bottlenecks. If you provide it with cache hints, you can get it as fast as 250% in the best-case* scenario.
*This means that there are no bottlenecks in the µops, so that xor handles a different register over any 3 consecutive instructions. mov, on the other hand, does not seem to give a rat's a$$ about written registers, so no tricks are needed there.
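For the size side of the comparison, a small fasm sketch (illustrative, with invented label names) that lets the assembler report how many bytes each of the four clearing instructions occupies in 32-bit code. The sizes in the comments are the usual shortest encodings and are stated as expectations; the assembler's own output is the authority:

```asm
use32

z_xor:  xor     eax,eax         ; usually 2 bytes
z_sub:  sub     eax,eax         ; usually 2 bytes
z_mov:  mov     eax,0           ; usually 5 bytes (opcode + 32-bit immediate)
z_and:  and     eax,0           ; usually 3 bytes (sign-extended 8-bit immediate)
z_end:

; sizes computed from the label addresses
size_xor = z_sub - z_xor
size_sub = z_mov - z_sub
size_mov = z_and - z_mov
size_and = z_end - z_and

; print each size as a single digit during assembly
display 'xor: ','0'+size_xor,13,10
display 'sub: ','0'+size_sub,13,10
display 'mov: ','0'+size_mov,13,10
display 'and: ','0'+size_and,13,10
```

Size is a separate question from speed: the byte counts stay the same however the instructions happen to time on a particular CPU.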
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.