flat assembler
Message board for the users of flat assembler.
XOR EAX,EAX
Borsuc 24 Jul 2007, 12:38
Now having come back to this forum and seeing THIS thread still going? sorry for offtopic...
Tom, I understand the "readability" you seek, and its importance for you. It is not something hard to understand. What really bugs me, however, is that you keep saying "assembly can be as 'readable' as Pascal". Of course it can, but why are there both Pascal and assembly? There are different tools for different ideologies and different programmers. If you want readability, Pascal is your thing (or any other language that suits this). Why shouldn't assembly be used for readability (for human logic, by the way)? Because by definition it is the computer's logic, and readable for computers. We, as humans, are different and limited. I still don't see a point in using assembly AT ALL if you want human-readability. Look at this example:
-> You want readability --> assembly is slow --> A HLL is more readable and is faster because you're lazy enough, but at least the compiler will optimize for you.
-> You want small/fast/control --> assembly is perfect for this --> A HLL abstracts information and code for you.
The point is, there is absolutely no point in using assembly for readability. Not all programs are designed to be read by people like you, because not all people are biologists, architects, astronauts, computer programmers, artists, etc. all at once. It's like the saying in art: "You either have it or you don't, baby" (no pun intended).
That last phrase probably took me off track a bit. Eh, it's been a while since I was last on these forums.
r22 24 Jul 2007, 13:07
re: Vid
Thanks, so I guess on the mobile Core2s there's no XOR vs MOV benefit. I ran the benchmark on an AMD X2 3800+. It would be great if people with other 64-bit processors ran the benchmark code and shared their results. Then we'd be able to tell whether there's an overall speed benefit to using XOR over MOV for register clearing, or whether it only applies to the older architectures.
0.1 24 Jul 2007, 13:51
Much ado about ...
LocoDelAssembly 24 Jul 2007, 15:32
vid: Perhaps it is the power saving that clocks down your processor? Try changing the code to make it call TestMov first and TestXor second.
Tom: About prefetching: if instructions are longer, the prefetching capabilities of the processor are pushed to the limit, and hence you risk not keeping all the execution units of the processor busy. Although the benchmark doesn't take this into account, the MOV version could still run slower because of it.
About using all the memory: replicating code 10 million times produces a big block of code that will probably use all the physical memory on some systems. The loop is there to ensure that the requested number of repetitions can be executed independently of the amount of available RAM (the user would be warned about how many times the loop was unrolled).
And yes, the RTC still works (though it doesn't have millisecond precision); what doesn't work is the clock maintained by the PIT's interrupt handler.
Anyway, if you will not provide the test code then we have nothing to test.
r22 25 Jul 2007, 00:36
Results on my Core2 E6300 were the same as vid's mobile Core2.
I also tried running the MOV test before the XOR test and saw that the benchmark had a ~1.5% error. So it appears that the optimization is not valid on the Core2, but is useful on an AMD 64 (the new K10 architecture may remove the XOR benefit). It's always more interesting when rants and drivel are backed up with code.
rugxulo 25 Jul 2007, 06:29
I can hear it now:
Quote:
And then Intel goes out of business.
FrozenKnight 25 Jul 2007, 08:46
AMD does seem to make really nice processors. They run a bit hot, though.
Madis731 25 Jul 2007, 13:56
Feed
XOV VS MOV 64-BIT Benchmark Started... (time in processor ticks)
Note: Req32 is used because the upper half of the 64bit register is cleared
Function1 time (xor r32,r32): 0x108565374
Function2 time (mov r32,0x0): 0x105023ED4
Percentage speed difference : -1.275221%
T7200 (mobile, if you didn't know) and Server 2003 Enterprise x64 Edition, not that it matters.
============AND============
XOV VS MOV 64-BIT Benchmark Started... (time in processor ticks)
Note: Req32 is used because the upper half of the 64bit register is cleared
Function1 time (xor r32,r32): 0x10A484DDC
Function2 time (mov r32,0x0): 0x10668FB58
Percentage speed difference : -1.475688%
E6700 (the last of the FSB1066 series) and Server 2003 Enterprise x64 Edition, not that it matters.
Both tests ran in a relatively clean environment: only the desktop showing, with no taskbar mini-icons.
EDIT: Btw, try putting 6 NOPs in front of the XOR loop like this:
Code:
times 6 nop
rdtsc
mov r14d,eax
mov r15d,edx
jmp xorlp
and you will get EQUAL results. NOTE!!! This is extremely dependent on the OS and CPU architecture, and also on the situation in which the thread is given the go-signal. This is a problem of 16-byte code alignment, NOT data alignment. XOR takes exactly 16 bytes but MOV takes 22, so MOV cannot fit in one 16-byte block and must take two, while XOR may or may not fit in one, depending on where it starts.
EDIT2: Put add edx,eax instead of add rdx,rax and XOR beats MOV. Changing every line of that code to use 32-bit registers makes no difference: XOR still beats MOV. But that is about as fair as code that runs fast only on Intel or only on AMD, or only on ATI or nVidia. These fights have been held before and will be held in the future.
levicki 29 Jul 2007, 19:06
Hello everyone, I just had to register when I saw this discussion going on and on.
I am a developer whose main focus is code optimization. I write mainly in C/C++, and I also write in assembler, using SIMD extensively; I am fluent in SSE, SSE2, SSE3, SSSE3 and SSE4.1.
Tom, I really do not understand how you can keep arguing about this. You even cite book examples of assembler code which use MOV eax, 0 as some sort of proof that MOV reg, 0 is preferred over XOR reg, reg, which is complete nonsense because anyone can find counter-examples to "prove" you wrong.
Today's compilers are much more advanced than they were just a few years ago. They analyze complex data flow in a program in ways a human being can match only with tremendous effort, and they use all known micro-architectural shortcuts in order to make code execute as fast as possible. In other words, they are not pragmatic, but opportunistic.
Nowadays, there is no sense in using assembler for large portions of code where readability might be important. It is usually used sparingly, in situations where the compiler cannot optimize your high-level code to your satisfaction. So the main reason for using assembler is to squeeze extra performance from the underlying micro-architecture in order to meet the performance level you have been asked to provide. That means readability is no longer the most important issue; performance is.
You may argue that MOV reg, 0 is not much slower than XOR reg, reg, but when it comes to performance you cannot observe this independently of other code, because various penalties accumulate and you get what is called the "death by a thousand papercuts" effect.
You may argue that the purpose of XOR reg, reg is not obvious. You have already been told to make a macro CLEAR(reg) or ZERO(reg) which internally does XOR reg, reg, and you refused to consider it. From your stubbornness on the subject, it is obvious that you are a keyboard warrior and an adamant brute who insists on imposing their own, often flawed, views on others.
In this case, your views and knowledge are old school, but not in a positive way at all. Your evolution as an assembler programmer has obviously stopped at the 386 code level, which was many, many years ago. Mind you, your views are not flawed because you are stupid, which you clearly aren't, but because you are ignorant.
While you were happily ignoring progress, CPU micro-architectures have evolved several times. Modern CPUs have mind-bogglingly complex OOO (Out-Of-Order) execution engines. Careless use of instructions and registers creates complex dependency chains and penalties on those modern CPUs, not to mention a reduction of overall code and data throughput.
Did you know that when a modern CPU sees an XOR reg, reg instruction, it automatically knows that code which uses that register after the XOR does not depend on the code using the same register before it? That is a hint which you cannot pass to the CPU by using the MOV instruction, so XOR is often used to break dependency chains in the code.
So Tom, I suggest you start your "Back to the future" trip here: http://agner.org/optimize/
If you are by any chance programming for modern CPUs made by Intel, I would strongly advise you to read the Intel® 64 and IA-32 Architectures Optimization Reference Manual. Relevant documentation for AMD CPUs also exists and I am sure you know how to use Google to find it, but most of the tricks explained in Intel's and Agner's manuals can be used on all modern CPUs; the only thing one has to consider is the level of SIMD support on the particular CPU which the code is targeting.
After reading all those manuals, you will hopefully be able to accept the fact that both MOV reg, 0 and XOR reg, reg have their valid place under the Sun. If that doesn't help, and if after all this time you still haven't got used to XOR reg, reg, maybe it is time to retire and leave the real work to those who care about saving precious bytes and CPU cycles?
I strongly urge someone in power to sticky and close this thread because further discussion is pointless.
LocoDelAssembly 22 Dec 2007, 22:37
Discussion about Intel C/C++ compiler continues here
revolution 23 Dec 2007, 06:24
http://www.nytimes.com/2007/12/23/us/technology/23newinstruction.html?_r=1&ref=technology&oref=slogin
Santa Clara, CA - Today Intel announced an addition to the IA32 and IA32-64 instruction sets. For many years developers, coders and forum participants have been caught in an unending battle about standards, efficiency and readability of the instruction choices used in programs. To help stem the tide of vitriolic and sometimes abusive postings from all over the world, Intel have introduced a specialised instruction designed to ease the path of the affected programmers.
Paul Otellini, Intel CEO, said that "This new instruction will allow all developers to move forward in peace and harmony for ever and ever. The time has come to settle these long and deep arguments once and for all". The developer community is hailing it as a superb upgrade to the 29-year-old instruction set. Industry analysts are picking it to be the "Best thing since sliced bread".
Since its introduction, the IA32 instruction set has lacked a clear and concise way to do certain things. Some developers declared that readability was paramount, while others said efficiency in size and execution time was vital to any modern program and eclipsed any desire for other things.
With this new instruction Intel has raised the bar for other chip makers to follow. AMD has yet to officially comment, but the inside word is that they will eventually follow like lambs and bow to the superiority of their bigger brother, though they may use a different mnemonic just so that they don't look like mere followers.
Technical details of this new instruction are still not confirmed, as the instruction is still under wraps. Experts in the field expect it to be unveiled on Dec-25, allowing Intel to give a Christmas present of goodwill to the world. An unnamed source has leaked the details to us and we can now share the joy with everyone.
An unnamed source wrote:
Intel also announced that the new instruction will retroactively be made valid on all of its previous IA32 architecture chips, by using a previously unannounced secret remote access backdoor to update all processors from the 8086 up without needing to remove them from the system.
tom tobias 23 Dec 2007, 13:03
revolution wrote: Actually I meant to lock the XOR EAX,EAX topic. This split thread [referring to the C/C++ nonsense over in HEAP] still [has] some life in it yet, but limited I fear.
I remain convinced, more than ever, that readability trumps execution speed, and that software development time is far more critical than application throughput. Designing and writing good programs, as opposed to sloppily written code, is not trivial, and is independent of the language used.
For those (many) who believe that homo sapiens is so sapiens that he (or she, for 0.1) arrives on planet earth with an innate ability to distinguish XOR eax, ecx from a page full of XOR eax, eax, here's a genuine link to a real article, which lays to rest the myth of homo's self-anointed brilliance: http://news.nationalgeographic.com/news/2007/12/071203-AP-chimp-memory.html
Thank you very much, Loco, for splitting the thread, and for (again) requesting the benchmark code, which I have not forgotten about.
This thread is moribund only in the minds of those who believe they are so intelligent that they need not (or should not!) write clearly, but can prosper with muddled garfle mixed up with obfuscated nonsense. Why would we want to "lock" any thread on the FASM forum? This issue (readability versus execution speed) is at least as important today, in the era of multicore CPUs, as it ever was in the bygone days of single-core computing.
revolution 23 Dec 2007, 13:18
This thread:
Code:
MerryGoRound:
Optimise for size
Optimise for speed
Optimise for readability
jmp MerryGoRound
FrozenKnight 29 Dec 2007, 11:40
tom tobias wrote:
revolution wrote:
tom tobias, it's not that we feel that we are smarter; we just feel that if readability is to be the primary goal when programming in ASM, then what is the advantage of using ASM over other, more readable languages like Basic, C or Python? But since you think you need readable ASM programs, I'll let you waste your time trying to make probably one of the more unreadable languages readable. IDK what you do with your time; my only use for ASM is to trick the CPU into performing beyond what a compiler can output. Other than that, I'll use a compiler.
kohlrak 29 Dec 2007, 18:12
revolution wrote: This thread:
Readability is BS as far as I'm concerned. Comments fix everything in that regard, and if you can't read the code you shouldn't be programming in that language. ASM is the best for readability in that regard, since there are no one-liners.
revolution 30 Dec 2007, 01:52
kohlrak wrote: ASM is the best for readability in that regard, since there are no 1 liners
kohlrak 30 Dec 2007, 05:05
Well... There aren't any one-liners that do more than making a raw text file...
revolution 30 Dec 2007, 05:07
kohlrak wrote: Well... There aren't any one-liners that do more than making a raw text file...
revolution 30 Dec 2007, 05:11
e.g.
Code:
irp x,include 'win32ax.inc',.code,start:,<invoke MessageBox,HWND_DESKTOP,"Hi! I'm the example program!","Win32 Assembly",MB_OK>,<invoke ExitProcess,0>,.end start{x}
Copyright © 1999-2024, Tomasz Grysztar.