flat assembler
Message board for the users of flat assembler.
XOR EAX,EAX
FrozenKnight 19 Jul 2007, 08:50
i bet that soon one of these guys is going to post about the advantages of using .net in your asm coding.
tom tobias 19 Jul 2007, 09:58
r22 wrote: (If you want to port it to linux so the weirdos can use it please do so).
r22 wrote:
This is surely a step forward in the discussion. Thank you for this contribution. Well done. There are two problems:
a. we need the "plain vanilla" version, i.e. a standard 32 bit cpu, not the more "exotic" 64 bit cpu architecture.
b. we need a real world application (hint: FASM itself!!!!) which has had the XORs replaced by MOVs.
Then we ALSO need ACTUAL TIMES (using the clock on the motherboard, as Michael Abrash showed, many decades ago), not percentages, because if the real world application runs one third slower with MOV than with XOR, and still executes, FROM the perspective of the end user, in the SAME time, then the fact that it is one third faster with XOR is MEANINGLESS.
CPU architecture has improved execution speeds more than 1000 times during the past three decades. The same cannot be said for software development times, which drag on, seemingly forever. My point is all about IMPROVING SOFTWARE IMPLEMENTATION TIMES, not execution times. I have no doubt that your 64 bit task above executes faster with XOR than with MOV; I made no claim to superiority of speed or memory utilization associated with MOV.
My argument, which thus far seems a tad difficult for FASM forumers to comprehend, is that neither EXECUTION SPEED nor memory reduction remains a current focus of interest in developing commercially viable software. Banks, Factories, Hospitals, i.e. places which use computers and need reliable software, do not have the slightest interest in whether a program written with one "coding" philosophy executes in 439 microseconds, while another program, implementing precisely the same task, requires 439 MILLIseconds, i.e. 1000 times slower, JUST SO LONG as the end user cannot detect any difference.
But if the slower executing program is FAR MORE READILY understood by the programming staff of the hospital or factory, then the contract will be awarded to the software developer who wrote a PROGRAM, instead of the other developer who wrote (much faster executing) CODE. XOR here is not the MAIN point; it serves only to illustrate what is wrong with the "code" written on the FASM forum, including FASM itself.
r22 wrote: XOR is faster that's all their is to it. If you want to continue using mov to clear your registers then QUIETLY continue being slow/wrong.
revolution 19 Jul 2007, 10:32
vid wrote: tom: as i have told you countless times, the XOR instruction on x86 is not the same as boolean XOR. Boolean XOR operates on true/false values, but the XOR instruction operates on a set of 32 true/false values, and modifies one of its operands. Do not confuse the two.
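The distinction in the quote can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the thread: the helper names `xor32` and `boolean_xor` are made up here, and the register is simulated as a plain integer masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register holds 32 independent true/false bits


def xor32(dest: int, src: int) -> int:
    """Model of `xor dest, src` on 32-bit registers.

    Returns the new value of the destination operand; every bit
    position is XORed independently of the others.
    """
    return (dest ^ src) & MASK32


def boolean_xor(a: bool, b: bool) -> bool:
    """Logical XOR on two single truth values."""
    return a != b


eax = 0xDEADBEEF
print(hex(xor32(eax, 0x0000FFFF)))  # flips only the low 16 bits of eax
print(xor32(eax, eax))              # a value XORed with itself is always zero
print(boolean_xor(True, False))     # one truth value in, one truth value out
```

The second `print` is the case the thread is about: `xor eax, eax` yields zero whatever `eax` held before.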
FrozenKnight 19 Jul 2007, 13:29
tom tobias, here is a list of real world applications where that may make a difference; unfortunately i don't have the time to do the research or coding to make the examples.
Encryption of large amounts of data, memory checkers, random number generators (i would show you my example of this but i managed to find a way to remove all 3 of the xors from it for better optimization).
but tom, my point doesn't end at xor. if we were to remove every optimization just to gain readability then why are we developing in asm? The only advantage to coding in asm over other languages is that you have control over the execution speed and size by using such optimizations. if i just wanted to make a readable program then C or C++ would work much better. i code in ASM to show off tricks and skills, not to make things easy for the lazy among us. yes, this has hit me in the a** a few times and it probably will a few more, but big deal. even when i'm coding in C i sometimes wonder why that line is there and why i put it there. asm won't change that, and neither will using non optimized code.
Note to all newbs: coders like tom tobias are why you now need a 2 GHz or better cpu to run any game you buy at the store.
vador 23 Jul 2007, 08:09
I believe in the future they will be programming games in C# or VB.NET, just because of readability and because the game would ship earlier. That makes me sad...
calpol2004 23 Jul 2007, 13:46
I can't believe we're arguing about something so petty.
My stance on the subject is this: if you're going to use complex instructions just to save a few bytes then go ahead, just make sure you comment it and maybe even comment in the alternative, easier to understand method. A lot of people only use assembly for speed and size and couldn't give a rat's ass about readability; if they did then they'd use a HLL. Tom does have a point however: one of the main problems with assembly is that to a begginner the code looks like mud. And vador, unfortunately it seems to be going that way, but communities like this will always exist. I feel your pain however, the college course i'm doing teaches visual basic and .NET and all that crap.
kohlrak 23 Jul 2007, 21:57
vador wrote: I believe in the future they will be programming games in C# or VB.NET, just because of readability and because the game would ship earlier. That makes me sad...
This is a sad truth.
vid 23 Jul 2007, 22:25
Code:
XOV VS MOV 64-BIT Benchmark Started... (time in processor ticks)
Note: Req32 is used because the upper half of the 64bit register is cleared
Function1 time (xor r32,r32): 0x10834D66E
Function2 time (mov r32,0x0): 0x10420351E
Percentage speed difference : -1.568697%
feed
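For readers who want to check the "Percentage speed difference" line against the two tick counts, here is a small Python sketch. The formula is an assumption inferred from the posted numbers (the difference taken relative to the mov timing); the benchmark's own source is not shown in this post, and the name `percent_difference` is invented for this example.

```python
def percent_difference(xor_ticks: int, mov_ticks: int) -> float:
    """Difference between the two timings as a percentage of the mov timing.

    Assumed formula, inferred from the posted output: a negative
    result means the xor version needed MORE ticks than the mov version.
    """
    return (mov_ticks - xor_ticks) / mov_ticks * 100.0


xor_ticks = 0x10834D66E  # Function1 time (xor r32,r32)
mov_ticks = 0x10420351E  # Function2 time (mov r32,0x0)

print(f"xor: {xor_ticks} ticks, mov: {mov_ticks} ticks")
print(f"{percent_difference(xor_ticks, mov_ticks):.6f}%")
```

Under that assumption the negative sign says that, in this particular run, the xor version was the slightly slower one.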
kohlrak 23 Jul 2007, 22:30
What about sub eax, eax, or shl eax, 32/shr eax, 32? And there's and eax, 0... Then there's also mul 0... I'm sure there's others too.
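Not every candidate in that list actually produces zero. The Python sketch below simulates three of them on a 32-bit register; the helper names are invented for this example. It relies on one documented x86 behaviour: for 32-bit operands, processors from the 80286 onwards use only the low 5 bits of a shift count, so a count of 32 behaves like a count of 0. `mul` is left out because it takes a register or memory operand rather than an immediate.

```python
MASK32 = 0xFFFFFFFF


def sub32(dest: int, src: int) -> int:
    """Model of `sub dest, src` with 32-bit wraparound."""
    return (dest - src) & MASK32


def and32(dest: int, src: int) -> int:
    """Model of `and dest, src` on 32-bit operands."""
    return dest & src & MASK32


def shl32(value: int, count: int) -> int:
    """Model of `shl r32, imm8`: the count is masked to its low 5 bits."""
    return (value << (count & 31)) & MASK32


eax = 0x12345678
print(hex(sub32(eax, eax)))   # sub eax, eax
print(hex(and32(eax, 0)))     # and eax, 0
print(hex(shl32(eax, 32)))    # shl eax, 32: the masked count is 0
```

In this model `sub` and `and` clear the register, while `shl eax, 32` leaves it unchanged.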
LocoDelAssembly 23 Jul 2007, 22:54
None of them performs register renaming, so the processor just waits for the instruction to finish before storing the obvious zero result.
vid, what processor did you use?
vid 23 Jul 2007, 23:00
Quote: vid, what processor did you use?
Intel Core 2 Duo T5500
tom tobias 24 Jul 2007, 00:06
r22 wrote: XOR is faster that's all their [sic] is to it.
LocoDelAssembly 24 Jul 2007, 00:48
Since the zeroing constitutes a third of the total loop code, I wonder why you get such a small negative difference. You did run the test several times, right?
tom tobias 24 Jul 2007, 02:31
Jeff Reilly and Dave Salvator, Intel Corp wrote: The ideal benchmark uses the applications and performs the operations you need. If this is not the case, you have to assess how representative the benchmark is of your needs....
http://www.automotivedesignline.com/howto/bodyelectronics/197700787;jsessionid=VZ0N0HAJRIY0CQSNDLQCKIKCJUNN2JVN
I think we need a benchmark that tests XOR and MOV without loops (that's correct, a BIG program; guess what, you have LOTS of memory, USE it!!), and without any overhead, such as accessing the stack, or using call/return. I believe it is advantageous to develop such a benchmark here, on the forum, and thereby refute the argument that assembly language is difficult for "begginners [sic] to understand". No, it is clearly difficult, when written as CODE, for anyone to understand.
The notion that one should perform a test initially with a 64 bit cpu seems to me counterproductive. Let's first get a good testing program running ON ANY 32 bit x86 cpu.
With regard to LocoDelAssembly's query re: quantity of iterations, the test ought to produce the same result every time: that means NOTHING else is running on the computer, so the notion of "setting priority" is nonsense. Of course such a test CANNOT BE EXECUTED, meaningfully, under windows or linux. It must run in protected mode, in Ring zero, not ring 3, with NO OPERATING SYSTEM overhead.
Proper benchmarking is DIFFICULT. The only way to have meaningful results is to create such a huge program that one can measure the time in minutes/seconds with a wristwatch. The moment one starts accessing hardware on the computer, with interrupts, the fragile times required for performing the actual instruction of interest are disturbed by the interrupts. No, a proper test needs to disable all interrupts during the test. Executing the task must be the sole activity of the cpu.
To avoid switching times, there are two identical versions of the same test. One version uses XOR, the other MOV. Both tests must be initiated separately: power on, enter protected mode, prompt to start (user must record the time), disable interrupts, run program, enable interrupts, prompt user upon completion, approximately 10-20 seconds later.
The test itself ought to include assigning zero to all four registers, EAX, EBX, ECX, EDX, then assigning any arbitrary integer to the same four registers, and then back to zero again, incrementing the integer, and repeat, but without loops. A truly slick program would create the benchmark: i.e. a program to write a program on the fly.
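The "program to write a program" idea can be sketched in Python. This is one possible reading of the description above, not anyone's actual benchmark: the function names, the starting integer and the repeat count are all hypothetical, and the output is only the straight-line instruction text. Protected-mode entry, interrupt handling and the user prompts are deliberately left out.

```python
REGISTERS = ("eax", "ebx", "ecx", "edx")


def zero_line(reg: str, use_xor: bool) -> str:
    """One zeroing instruction, in either the XOR or the MOV flavour."""
    return f"xor {reg}, {reg}" if use_xor else f"mov {reg}, 0"


def generate_body(repeats: int, use_xor: bool, start: int = 1) -> list[str]:
    """Emit a loop-free instruction sequence.

    Each repeat zeroes the four registers, then loads an integer that is
    incremented from one repeat to the next. A final zeroing pass ends
    the sequence ("and then back to zero again").
    """
    lines = []
    for i in range(repeats):
        value = start + i
        lines += [zero_line(reg, use_xor) for reg in REGISTERS]
        lines += [f"mov {reg}, {value}" for reg in REGISTERS]
    lines += [zero_line(reg, use_xor) for reg in REGISTERS]
    return lines


# Two otherwise identical test bodies, differing only in the zeroing idiom.
xor_version = generate_body(repeats=3, use_xor=True)
mov_version = generate_body(repeats=3, use_xor=False)
print("\n".join(xor_version[:8]))
```

Generating both variants from the same function keeps them identical apart from the instruction under test, which is the property the post asks for.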
handyman 24 Jul 2007, 03:32
Quote:
Plue, you are missing the reason for commenting. Comments should not repeat the command; instead they should indicate what is happening in the logic flow and/or the reason that the instruction is being done, if it clarifies the code to your mind. Commenting is needed for when you may want to come back to your own code a year from now and want to know what is happening and why.
LocoDelAssembly 24 Jul 2007, 03:34
I think you risk losing easily that way, Tom. The prefetching must be really good to keep the processor completely fed all the time.
Can you provide the code that must be repeated across all the available RAM? (assembly code, not descriptive human language). Note that I think we must still use a loop to make better measurements, but we can unroll the loop N times so we don't need to iterate more than 4 times on a 1 GB RAM system. The program could be designed to ensure that the body of the loop will be executed N times and will be unrolled until it fills all available memory, reducing the loop count from N to N/unrolling_factor. About the execution environment, I think we could use some of the OSes developed by fasm members and hack and strip them to our needs (apart from starting our own boot code).
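The arithmetic behind the unrolling proposal can be sketched in Python. Everything numeric here is a made-up illustration: the 16-byte body size and the iteration count are assumptions chosen so the result lands on 4 passes with 1 GB of RAM, and `plan_unrolling` is a hypothetical helper, not part of any existing tool.

```python
import math


def plan_unrolling(total_iterations: int, body_bytes: int,
                   available_bytes: int) -> tuple[int, int]:
    """Return (unrolling_factor, loop_count).

    The loop body is copied as many times as fits in memory (but never
    more often than it has to run), and the loop count shrinks from N
    to N / unrolling_factor, rounded up.
    """
    max_copies = available_bytes // body_bytes
    unrolling_factor = max(1, min(total_iterations, max_copies))
    loop_count = math.ceil(total_iterations / unrolling_factor)
    return unrolling_factor, loop_count


one_gib = 1 << 30                 # assumed RAM available for the unrolled code
body_bytes = 16                   # assumed size of one copy of the loop body
n = 4 * (one_gib // body_bytes)   # assumed number of body executions wanted

factor, passes = plan_unrolling(n, body_bytes, one_gib)
print(f"unroll {factor} times, loop {passes} times")
```

When N is not an exact multiple of the unrolling factor, the rounded-up loop count runs the body a few more times than N, so a real harness would need to handle the remainder separately.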
DOS386 24 Jul 2007, 06:27
Forum wrote: Replies: 116
My most successful thread
vid 24 Jul 2007, 08:28
Loco: i ran it twice, with similar results. I can play with it more later
vador 24 Jul 2007, 08:59
[joke]
i don't have a 64-bit processor so i'll run this benchmark inside qemu-x64 running itself inside bochs 2.3 running inside virtualbox running inside VMWare workstation on a Transmeta processor
[/joke]