flat assembler
Message board for the users of flat assembler.

Index > Heap > Intel C/C++ compiler discussion

rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
Edit by Loco: This is a continuation of this post from the XOR EAX, EAX thread


levicki wrote:
I am a developer whose main focus is code optimization. I write mainly in C/C++, and I also write in assembler, using SIMD extensively; I am fluent in SSE, SSE2, SSE3, SSSE3 and SSE4.1.
...
Today's compilers are much more advanced than they were just a few years ago. They analyze complex data flow in a program in ways a human being can match only with tremendous effort, and they use all known micro-architectural shortcuts to make code execute as fast as possible. In other words, they are not pragmatic, but opportunistic.


Sorry, but that's a big generalization. There are hundreds of compilers in existence, and you're obviously saying that new ones are somehow hundreds of times better than "older" pre-SSE ones. I doubt it. Plus, you fail to mention even one specific compiler. Please be more specific with whom you refer (MS VC++, Digital Mars, GCC??).

Secondly, not trying to overstate the obvious, but compilers are braindead. No matter how much you tell it to do, it can never beat a human being. Sorry, it just ain't gonna happen (else the computers would actually do what they're supposed to do instead of annoying us users all the time, heh).

levicki wrote:

Nowadays, there is no sense in using assembler for large portions of code where readability might be important. It is usually used sparingly in situations where compiler cannot optimize your high-level code to your satisfaction.


No sense? You can surely write readable assembly, but maybe your boss (or partners) don't like it or understand it. You are only limited by your own skills and according to external factors like OS support, people you work with / for, etc. Very high-level assembly is not new, and some people can write it quite fast, efficiently (ahem, Octavio), and readable too (err, maybe not Laughing ).
Post 31 Jul 2007, 06:18
Mac2004



Joined: 15 Dec 2003
Posts: 313
Mac2004
Quote:
but compilers are braindead. No matter how much you tell it to do, it can never beat a human being. Sorry, it just ain't gonna happen (else the computers would actually do what they're supposed to do instead of annoying us users all the time, heh).


rugxulo: Well said. Very Happy

regards,
Mac2004
Post 31 Jul 2007, 19:26
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
rugxulo wrote:
but compilers are braindead. No matter how much you tell it to do, it can never beat a human being. Sorry, it just ain't gonna happen (else the computers would actually do what they're supposed to do instead of annoying us users all the time, heh).
True.

The only thing people don't realize is that, if compilers (actually 'computers') ever become "conscious" and optimize like us (i.e., they understand the code as an algorithm, rather than just a bunch of symbols), what will make them "optimize" for us? If someone comes to me and asks me to optimize his code, will I necessarily do it? I have my own mind, so if I don't want to, then I don't.

Not saying that assembly means better algorithms always. But if you know assembly, you design your code differently, you think differently, and usually this results in better algorithms. And often it's not even tied to a specific architecture (of course I'm not talking about instruction tricks like XOR eax, eax, but more on a larger scale).

I usually start my projects in C (I hate C++'s OOP abstraction thing Wink ), just to 'get them running', even if they aren't optimized -- just to test it works the way I want (the algorithm). But even in that phase, before I design the algorithm, I think just how efficient it would really be. A very simple example would be that division is way slower than multiplication. But the list goes a lot beyond that.

After that, once I'm satisfied with a given design, I move to asm. Mind you, I also learnt a lot from the compiler's assembly listings Laughing

Also the C code is useful if I want to port the algorithm to a different system --> it's usually easier to translate C code to assembly than assembly to assembly Wink

Plus I love the macros for assemblers, like Fasm and Nasm.
Post 03 Aug 2007, 12:15
levicki



Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
levicki
rugxulo wrote:
Sorry, but that's a big generalization. There are hundreds of compilers in existence, and you're obviously saying that new ones are somehow hundreds of times better than "older" pre-SSE ones. I doubt it. Plus, you fail to mention even one specific compiler. Please be more specific with whom you refer (MS VC++, Digital Mars, GCC??).


Like MSVC 2005, GCC4 and especially Intel C/C++ 10.0. And while MSVC and GCC produce reasonable code by today's standards, the Intel compiler is ahead of them because it has an auto-vectorizer and an auto-parallelizer, plus it can perform numerous loop transformations to squeeze out maximum efficiency and keep the pipeline busy. Happy now?

rugxulo wrote:
Secondly, not trying to overstate the obvious, but compilers are braindead. No matter how much you tell it to do, it can never beat a human being. Sorry, it just ain't gonna happen (else the computers would actually do what they're supposed to do instead of annoying us users all the time, heh).


Then you obviously haven't tried one recently, otherwise you wouldn't be talking like that.

Of course a human being can beat a compiler, but (at least when it comes to the Intel compiler) only at great cost, and the benefit is generally very small.

Consider this simple but common code sequence:

Code:
// test.cpp
float a[1000], b[1000], c = 3.14f;

int main() {
    // scale every element of b by the constant c
    for (int i = 0; i < 1000; i++) {
        a[i] = b[i] * c;
    }
    return 0;
}


Surely you can optimize it in assembler: use SIMD to vectorize the loop, write a proper tail if the number of elements is not divisible by four, write preroll code to align the vectors if you can't assume alignment, and so on. But it won't be any faster than the code generated by Intel C++, and it will take you much more time to do all that than it will take me to type icl /O3 /QxT test.cpp and get an executable optimized for Core 2 Duo.
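A minimal sketch of what that hand-vectorized loop involves, written in C with SSE intrinsics rather than assembler. The function name scale_floats and the HAVE_SSE macro are made up for illustration. Unaligned loads are assumed, so no preroll is needed, and the scalar loop at the end is the tail for element counts not divisible by four:

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: dst[i] = src[i] * k for n floats. */
void scale_floats(float *dst, const float *src, float k, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    __m128 vk = _mm_set1_ps(k);            /* broadcast k into all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);  /* unaligned load of four floats */
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, vk));
    }
#endif
    for (; i < n; i++)                     /* scalar tail (whole loop if no SSE) */
        dst[i] = src[i] * k;
}
```

Whether this matches or beats an auto-vectorizing compiler depends on the compiler and the CPU; the sketch only shows how much bookkeeping the manual version needs.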

What if you need a version which works well on AMD64/Opteron as well as Core 2 Duo? I just type icl /O3 /QaxOT test.cpp, and I get code with a runtime dispatcher based on the detected CPU and three versions of the function (A64/Opteron, C2D, x86 generic). How about you? Will you waste your time coding a dispatcher and several versions of the same function when you know that the compiler can do it for you?
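For comparison, a hand-written dispatcher could look roughly like the sketch below. This is not how the Intel compiler implements it; the names scale, scale_generic, scale_tuned and cpu_has_sse2 are hypothetical, the "tuned" body is only a stand-in for a real SIMD version, and CPU detection assumes GCC or Clang on x86 (anything else falls back to the generic path):

```c
#include <stddef.h>

typedef void (*scale_fn)(float *dst, const float *src, float k, size_t n);

/* Baseline version that runs on any CPU. */
static void scale_generic(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Stand-in for a CPU-specific version; a real one would use SIMD here. */
static void scale_tuned(float *dst, const float *src, float k, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {           /* four elements per iteration */
        dst[i]     = src[i]     * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
    }
    for (; i < n; i++)                     /* leftover elements */
        dst[i] = src[i] * k;
}

/* Assumption: GCC/Clang builtins on x86; other targets report "no". */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Public entry point: picks an implementation on first call, then reuses it. */
void scale(float *dst, const float *src, float k, size_t n)
{
    static scale_fn impl = NULL;
    if (impl == NULL)
        impl = cpu_has_sse2() ? scale_tuned : scale_generic;
    impl(dst, src, k, n);
}
```

The lazy initialization here is not thread-safe; a production dispatcher would resolve the pointer once at startup.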

How about future CPUs, say Penryn? No problem for me: I type icl /O3 /QxS test.cpp and I have heavily optimized code for a CPU which is not yet available in retail. You will have to learn SSE4.1 first, then find an assembler which properly implements those instructions, and then figure out how to use them efficiently.

Then, if you want to fine-tune the code, you have to schedule instructions manually, something the compiler does automatically because it has a table with latencies, throughput, and execution units for each instruction and for each CPU it supports.

rugxulo wrote:
No sense? You can surely write readable assembly, but maybe your boss (or partners) don't like it or understand it. You are only limited by your own skills and according to external factors like OS support, people you work with / for, etc. Very high-level assembly is not new, and some people can write it quite fast, efficiently (ahem, Octavio), and readable too (err, maybe not Laughing ).


I can write it too, but what is the point if the compiler produces code which is good enough? Why should I waste my precious time writing 90% of the unimportant code in assembler, when I can focus on improving compiler output for that 10% where it matters?

The_Grey_Beast wrote:
(i.e they understand the code as an algorithm, rather than just a bunch of symbols)


They have been well above that level for a long time. You guys must have been sleeping. When we write code we analyze data flow and then figure out the best way to accomplish the task. Modern compilers do the same. They no longer just parse the code; they analyze data flow as well, they look for dependencies, etc. Nowadays, compilers are a science.

The_Grey_Beast wrote:
But if you know assembly, you design your code differently, you think differently, and usually this results in better algorithms.


I agree that you think differently, in the sense that an assembler programmer would never write:

Code:
for (int j = 0; j < 100; j++) {
    for (int i = 0; i < 100; i++) {
        a[i][j] = 0; // instead of a[j][i] = 0;
    }
}
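The difference between the two index orders can be sketched as follows. C stores a[row][col] row by row, so the first function touches consecutive addresses while the second jumps N ints between stores. The names clear_row_order and clear_column_order are made up for illustration:

```c
enum { N = 100 };

/* Inner loop walks the last index: stores go to consecutive addresses. */
void clear_row_order(int a[N][N])
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[j][i] = 0;
}

/* Inner loop walks the first index: each store lands N ints after the previous one. */
void clear_column_order(int a[N][N])
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = 0;
}
```

Both functions leave the array in the same state; only the memory access pattern differs, which is what matters for the cache.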
    


But I disagree on the algorithm part. Knowing assembler doesn't have anything to do with knowing how to write a better algorithm.

The_Grey_Beast wrote:
A very simple example would be that division is way slower than multiplication.


Any decent compiler replaces division with multiplication if precision allows.
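Two small illustrations of that substitution, written out by hand in C (the function names div10 and div4 are made up). For unsigned integers, division by a constant can be turned into a multiply and a shift; for floats, the swap is exact only when the reciprocal is exactly representable, which is the "if precision allows" part:

```c
#include <stdint.h>

/* x / 10 for any 32-bit unsigned x, computed as multiply-and-shift.
   0xCCCCCCCD is 2^35 / 10 rounded up. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x / 4.0f gives the same result as x * 0.25f, because 0.25 is a power of two. */
float div4(float x)
{
    return x * 0.25f;
}
```

Dividing by 3.0f is a different story: 1/3 is not exactly representable, so multiplying by the reciprocal can change the last bit of some results, and compilers generally make that swap only when told that relaxed floating-point accuracy is acceptable.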

What I am trying to say here is that compilers are like computer chess programs. They get better and better because they have more and more rules and cases they can recognize and optimize well.

Today no human chess player can beat a computer chess program; the best they can get is a draw. I am just saying that the same is going to happen soon with assembler programmers vs. compilers.

Don't get me wrong, I am saying all this as someone who loves assembler very much. I grew up with it, starting with Z80, then MC68000 and then X86. Back when I was younger I wanted to write everything in assembler because I was obsessed with speed. Everything had to be as fast as possible. Nowadays I understand that if I do so, I won't be able to finish any project on time.

I still use assembler and I can still beat any compiler, but the margin has shrunk dramatically over the years. I wouldn't like you to doubt my knowledge, so here is a link to an article I wrote some time ago for Intel Developer Services:

http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/code/languages/194751.htm

That code sample is a perfect example of a case where the programmer knows better than the compiler. So I am not denying that such cases exist, but don't forget that the next version of the compiler will learn that trick.

I suggest you do some research and even run some tests of your own to see how good you are against the machine. There is a free trial version of Intel C++ available for download and there is even a free non-commercial version for Linux. Give it a try and then we can discuss it further if you want.
Post 04 Aug 2007, 04:01
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
levicki, thanks for posting all that, I appreciate it very much (especially your first post, which synthesized very well why to use XOR).
Post 04 Aug 2007, 04:51
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
levicki: excellent article, thanks.

Quote:
There is a free trial version of Intel C++ available for download

You mean there is a trial version for Windows too? I do have the Linux version, but I boot Linux only very seldom...
Post 04 Aug 2007, 10:40
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias
LocoDelAssembly, clearly naive, wrote:

levicki, thanks for posting all that, I appreciate it very much (especially your first post, which synthesized very well why to use XOR).
Well, "levicki", whoever you are, I think you may profit from reading ANY philosophy textbook, because your long winded diatribe amounts to a repudiation of the rationale for this forum:
levicki, a prototype fascist, wrote:

I strongly urge someone in power to sticky and close this thread because further discussion is pointless.

Notwithstanding your command to silence, having written the DEFINITIVE proclamation, explaining to everyone's satisfaction, or, at least to Loco's satisfaction, and thus OBVIATING need for any further discussion on XOR, you curiously submitted a SECOND rejoinder, which clearly has NOTHING whatsoever to do with XOR, thus confirming, to me at least, that you are a rather arrogant person, who believes that he knows quite a bit more than anyone else, and can therefore expound on whatever topic, whenever it pleases him, regardless of the thrust of the thread:
levicki, further demonstrating his intolerance, wrote:

Why should I waste my precious time writing 90% of the unimportant code in assembler, when I can focus on improving compiler output for that 10% where it matters?

I am not going to address any of the nonsense you have written about C compilers, take that garbage not simply to another thread, but to some other forum.
I will write my opinion of your PRONOUNCEMENTS, or perhaps DICTA is a superior term to explain your attitude, regarding XOR, since that is the topic of this thread.
Misquoting, and misrepresenting are further HALLMARKS of fascism. How I hate the fascists.
levicki wrote:

After reading all those manuals, you will hopefully be able to accept the fact that both MOV reg, 0 and XOR reg, reg have their valid place under the Sun.

I defy anyone to find a quote of mine contradicting this. I never claimed that XOR reg, reg, is INVALID. I claimed, AND STILL CLAIM, notwithstanding Agner Fog, or the Israelis who designed the cpu, or anyone else, especially a minor leaguer like you "levicki", that XOR reg, reg, OUGHT TO BE RESERVED for use with Exclusive OR operations involving TWO different registers. In that situation, and ONLY that situation, I too, would use XOR. I have ZERO interest in SPEED, or SAVING MEMORY, I have interest ONLY in readability.
levicki, continuing with his condescending, know it all attitude, wrote:

Did you know that when a modern CPU sees XOR reg, reg instruction, it automatically knows that code which uses that register following XOR instruction does not depend on the code using the same register before it? That is a hint which you cannot pass to the CPU by using MOV instruction so XOR is often used to break dependency chains in the code.
So many points, so much nonsense. I do not like, and do not seek, CODE, in which furtiveness is extolled. I like OPENNESS, and sunlight, not darkness and concealment. "Automatic"??? Everything is automatic. I believe you are here explaining how convenient it is to write CODE, that cannot be understood by simply reading the text, but which requires a knowledge that transcends the instruction itself. This is PRECISELY why I have no interest in clearing a register, or performing ANY OTHER TASK, with special, secret, "optimized" instructions. NOPE. NO INTEREST. I prefer SIMPLE, EASILY READ, EASILY UNDERSTOOD programs, not code. I am COMPLETELY disinterested in the Intel/AMD architecture, so those "refinements", which "AUTOMATICALLY" interpret program flow, in certain cases executing instructions out of sequence, are of no interest to me. I do not seek to have Intel's engineers changing my program flow.
levicki, again, misrepresenting, wrote:

You even cite book examples of assembler code which uses MOV eax, 0 as some sort of proof that MOV reg, 0 is preferred over XOR reg, reg which is a complete nonsense because anyone can find counter-examples to "prove" you wrong.
Nonsense. I cited the textbook of Dandamudi, late professor of computer science, as an illustration of WELL WRITTEN, WELL DOCUMENTED PROGRAMMING. In fact, I suspect, though I do not remember off hand, that he also has used XOR reg1, reg1 to clear reg1. His book is well done, ANYWAY. I forgive him for misusing XOR, if in fact he did--I simply don't remember.
levicki, with another dictum, wrote:

Nowadays, there is no sense in using assembler for large portions of code where readability might be important.
Oh, I must be wrong then.....
Wow. What a revelation. To have been deceived for all those DECADES, thinking that my teachers, authors from Europe, mainly, were correct, having taught me in futility that readability was of paramount importance, well, what a sad day, to learn they were all wrong, but, praise be to allah, that levicki could explain what so many of them could not.
levicki, hitting the nail on the head, wrote:

That means readability is no longer the most important issue -- performance is.
Oh yes, I see the light. Thank you levicki. Thank you for teaching me what so many other FASM forumer's have endeavored for so long to explain. READABILITY is DEAD. Long live performance. Hallelujah. Epiphany.
levicki, who doesn't beat around the bush, wrote:

If that doesn't help, and if after all this time you still haven't got yourself used to the XOR reg, reg, maybe it is time to retire and leave the real work to those who care about saving precious bytes and CPU cycles?
thank you for stating the obvious, I am too old, and too decrepit, and too incompetent to appreciate the modern era with electricity and motor cars---where's my horse?
Post 04 Aug 2007, 11:28
0.1



Joined: 24 Jul 2007
Posts: 474
Location: India
0.1
Listen! Listen! Listen!
The biggest debate ever on xor eax, eax is here!
Wow! Asm people are just fun to read/hear/listen Wink
Post 04 Aug 2007, 12:48
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Tom, yes, I didn't agree with that last part of the first post either; I just meant the content regarding the topic.
Post 04 Aug 2007, 14:54
r22



Joined: 27 Dec 2004
Posts: 805
r22
Levicki, good article and more importantly, logical points.
Just ignore Tobias's response because he only deals in verbiage and petty one-upmanship.

Most people here know that ASM programming is largely ::THIS WILL LIKELY BE QUOTED OUT OF CONTEXT:: unneeded, but that won't stop people from using it. COBOL is still widely used today even though newer languages may be considered superior. If people know the language then they will prefer using it, that's as simple as it gets.
Your chess example is perfect... people won't stop playing chess just because a computer can beat them at it.

Programming in ASM is a hobby for most or used to optimize algorithms/bottlenecks that compilers still may have trouble with. Any large projects done in ASM are done because the programmer prefers the language not for some bloated superiority complex. So, there's really no need to push any philosophy.

The debates about performance are just best-practice discussions for people who enjoy using ASM. The misinformed views about compiler technology are just due to ignorance of the topic, mostly because of lack of interest in it. I think if Intel offered their compiler as a full version free of charge then the facts would be more widely available.

MOV RAX, QWORD [Rant_Over]
RET 0
Post 04 Aug 2007, 20:03
FrozenKnight



Joined: 24 Jun 2005
Posts: 128
FrozenKnight
Wow, this seems to get more and more heated. But it's fun, so I'll throw another log onto the fire.

Yes, I will admit that on modern CPUs the benefit of using XOR is minor, but there are several reasons I still stick to using the old XOR.

First, if anyone has doubts as to what it does, they can look it up on Google.

Second, both AMD and Intel are optimizing the XOR instruction for best performance when zeroing a register.

Third, XOR is smaller, shaving a few bytes off downloads on slow computers.

Fourth, the prefetch; this is the most overlooked part of optimization. What does the prefetch have to do with anything? Well, it is simple: the prefetch is the amount of data the CPU can grab ahead of time to read and execute an instruction. Say you have a 32-bit CPU with a 32-bit prefetch (not uncommon, especially on older computers) and you use MOV. Since MOV takes 5 bytes to zero a register, the instruction is 40 bits long; because the CPU can only grab 32 bits at a time, it has to wait one extra cycle to grab the remaining 8 bits. With XOR, which is only 16 bits, you can zero 2 registers in as little as one cycle (if the CPU can process the rest of the prefetch at the same time; if not, it only does one at a time), which can offer a significant performance boost in some critical code.
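The size argument can be put into numbers with a small sketch. The byte sequences are the usual 32-bit encodings of the two instructions; fetches_needed is a hypothetical helper that models only the simplified fixed-width fetch described above, not the front end of any particular CPU:

```c
#include <stddef.h>

/* 32-bit x86 encodings of the two ways to zero EAX. */
static const unsigned char MOV_EAX_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0   */
static const unsigned char XOR_EAX_EAX[] = { 0x31, 0xC0 };                   /* xor eax, eax */

/* Number of fetches of fetch_bytes each needed to cover insn_bytes (rounds up). */
size_t fetches_needed(size_t insn_bytes, size_t fetch_bytes)
{
    return (insn_bytes + fetch_bytes - 1) / fetch_bytes;
}
```

With a 4-byte fetch, fetches_needed(sizeof MOV_EAX_0, 4) is 2 and fetches_needed(sizeof XOR_EAX_EAX, 4) is 1. Real processors fetch larger blocks at a time, so the actual penalty, if any, depends on the microarchitecture.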

I know some of you are going to argue that this doesn't affect modern CPUs, but that isn't entirely true. Even though modern CPUs are fast and have large caches, they still have to pass instructions through a prefetch buffer (which I don't think is documented any more). Anyway, unless the buffer is large enough to handle 72 bits on a 64-bit CPU, the CPU will still have to wait to process the instruction (this is possibly why AMD's XOR is faster than its MOV). Because only the designer knows for certain how the data is processed in the CPU, it's better to stick with what you know will work at its best in all cases. (By the way, this is why XOR has been preferred over and over again even though it could have memory access problems on older CPUs. The only other instruction that was ever considered besides "XOR reg, reg" for these cases was "SUB reg, reg", but XOR eventually won out.)

But in the end, who really cares how you zero a register? It's the coder's choice: if they choose to optimize, then everyone is happier; if not, I hope they have a good reason for it (like planning a million updates a year).
Post 05 Aug 2007, 06:43
GISTAPO



Joined: 05 Aug 2007
Posts: 11
Location: world
GISTAPO
XOR is the simplest and easiest way for electronic devices. Look at this --> (+)(+)=(-) "XOR"
Post 05 Aug 2007, 07:26
levicki



Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
levicki
LocoDelAssembly wrote:
levicki, thanks for posting all that, I appreciate it very much (especially your first post, which synthesized very well why to use XOR).


vid wrote:
levicki: excellent article, thanks.


You are welcome.

vid wrote:
You mean there is a trial version for Windows too? I do have the Linux version, but I boot Linux only very seldom...


Sure there is, not only for Windows but also for Mac OS:
http://www.intel.com/cd/software/products/asmo-na/eng/219690.htm

tom tobias wrote:
Well, "levicki", whoever you are, I think you may profit from reading ANY philosophy textbook, because your long winded diatribe amounts to a repudiation of the rationale for this forum:


Since you are recommending philosophy textbooks to me on an assembler forum, it seems that you are more interested in philosophy than in assembler. That leads me to conclude that you are more interested in the debate itself than in its substance, so I won't take any of your posts to heart. But I will reply, and I apologize in advance for the off-topic.

tom tobias wrote:
levicki, a prototype fascist, wrote:

I strongly urge someone in power to sticky and close this thread because further discussion is pointless.


Notwithstanding your command to silence, having written the DEFINITIVE proclamation, explaining to everyone's satisfaction, or, at least to Loco's satisfaction, and thus OBVIATING need for any further discussion on XOR,


You obviously do not understand the meaning of the word "command" which is pretty basic vocabulary. I wrote "I strongly urge", which is a suggestion, not a command.

Since you are having trouble with simple words, one cannot get mad at you for using more complicated words such as fascist to qualify a person (in this case me) who disagrees with your point of view.

By the way, if you had really read any of those philosophy textbooks you are recommending to me, you would at least have learned how to disagree with someone in a more civilized manner.

You have also misunderstood my call for closing of this thread. I am not against all debate (that is why I posted again contradicting myself a bit). I just consider this XOR debate pointless.

This thread has been dragging on since December 8th, 2006, and the arguments I presented in my post hadn't been posted by others so far. So yes, I believe I know a lot on the subject, and I believe that I have contributed the DEFINITIVE answer to the XOR debate for those who want to see it.

Unfortunately there is something called delusion and also self-deception. That is exactly why no amount of reasoning can make some individuals accept the facts.

As I said, XOR is more than just a way to zero a register, it is a hint for breaking data dependency chains. As such, its use is more than justified and if someone here on this board believes they are smarter than the compiler writers who use XOR and the CPU architects who made it work the way it does, then I think they are gravely mistaken.

tom tobias wrote:
levicki, further demonstrating his intolerance, wrote:
Why should I waste my precious time writing 90% of the unimportant code in assembler, when I can focus on improving compiler output for that 10% where it matters?

I am not going to address any of the nonsense you have written about C compilers, take that garbage not simply to another thread, but to some other forum.


So you claim I am intolerant for saying that there are other languages than assembler, then you say that I write nonsense without providing any valid argument, and finally you show how tolerant you really are by telling me to leave. Or should I say how much of a hypocrite?

tom tobias wrote:
I claimed, AND STILL CLAIM, notwithstanding Agner Fog, or the Israelis who designed the cpu, or anyone else, especially a minor leaguer like you "levicki", that XOR reg, reg, OUGHT TO BE RESERVED for use with Exclusive OR operations involving TWO different registers.


And what if those two different registers hold the same value resulting in ZERO in destination? Would you ban that as well because it is not obvious enough? I can already see the revised claim:

XOR reg, reg, OUGHT TO BE RESERVED for use with Exclusive OR operations involving TWO different registers having TWO different values

tom tobias wrote:
I have ZERO interest in SPEED, or SAVING MEMORY, I have interest ONLY in readability.


Then answer a simple question you have been asked many times so far in this thread: Why do you write in assembler if your goal is neither SPEED nor SAVING MEMORY?

If you do not answer this you will confirm my suspicion that you are just a keyboard warrior.

tom tobias wrote:
So many points, so much nonsense.


Again without proof or valid counter-argument.

tom tobias wrote:
I like OPENNESS, and sunlight, not darkness and concealment.


Darkness and concealment are dispelled by learning, not by running away from them.

tom tobias wrote:
but which requires a knowledge that transcends the instruction itself.


What are you talking about?!? The only knowledge you need to understand the XOR instruction is that of Boolean algebra. There is no witchcraft involved in clearing registers, just Boolean algebra rules which say that a result bit will be 1 only if the two input bits are different, and 0 otherwise.
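That rule, and why XOR of a register with itself always yields zero, fits in a few lines of C (the function names xor_bit and xor_self are made up for illustration):

```c
#include <stdint.h>

/* XOR of two single bits: 1 if they differ, 0 if they match. */
unsigned xor_bit(unsigned a, unsigned b)
{
    return (a ^ b) & 1u;
}

/* XOR of a value with itself: every bit meets an identical bit, so every result bit is 0. */
uint32_t xor_self(uint32_t x)
{
    return x ^ x;
}
```

The same holds for two different registers that happen to contain the same value.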

tom tobias wrote:
I am COMPLETELY disinterested in the Intel/AMD architecture, so those "refinements", which "AUTOMATICALLY" interpret program flow, in certain cases executing instructions out of sequence, are of no interest to me. I do not seek to have Intel's engineers changing my program flow.


Well guess what? Modern CPUs always execute your instructions out of sequence because that is what makes them fast and efficient, so you can now go and cry in your little corner if that bothers you so much.

tom tobias wrote:
I forgive him for misusing XOR, if in fact he did--I simply don't remember.


Wow, you forgive him? Shocked Laughing
This is becoming more and more ridiculous by the minute.

tom tobias wrote:
levicki, with another dictum, wrote:
Nowadays, there is no sense in using assembler for large portions of code where readability might be important.

...To have been deceived for all those DECADES, thinking that my teachers, authors from Europe, mainly, were correct, having taught me in futility that readability was of paramount importance...


Nice example of word twisting serving the ultimate purpose of showing off your educated background. Unfortunately you missed the proverbial nail and hit your own thumb.

What I tried to convey is that for large portions of code where readability might be important, assembler is not exactly the most readable language available out there.

tom tobias wrote:
levicki, hitting the nail on the head, wrote:
That means readability is no longer the most important issue -- performance is.

Oh yes, I see the light. Thank you levicki. Thank you for teaching me what so many other FASM forumer's have endeavored for so long to explain. READABILITY is DEAD. Long live performance.


Another nice example, this time of ripping my words out of context to imply that I claimed something I didn't. I never said readability is not important or dead. I just claimed that there are parts of code (10% or less I spoke of earlier) where readability is willingly sacrificed for performance.

Again, that doesn't mean one cannot or should not use comments to explain what has been done and why.

tom tobias wrote:
thank you for stating the obvious, I am too old, and too decrepit, and too incompetent to appreciate the modern era with electricity and motor cars---where's my horse?


No, the problem is that you won't look at things from a distance. You are not seeing the proverbial forest for the trees.

For example, someone tells you to write a function, giving you requirements in the form of input and desired output. You write MOV EAX, 0 in that function and the code is readable, but you are completely ignoring anything outside the code you were told to write. What if said function gets called 1,000,000,000 times in a loop? What if your MOV takes just one extra cycle? 1 cycle x 1,000,000,000 loop iterations = 1 second longer on a 1 GHz CPU. What if that code is part of a larger processing application and gets run 600 times to process some dataset? 600 frames x 1 second = 600 seconds, or 10 minutes of wasted time. Ten minutes may not look like a big deal, but if you are processing a medical dataset from an X-ray scan of a person who just had their spine broken in a car accident and is waiting for an operation, those ten minutes can easily decide between life and death.
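The arithmetic above, as a small C helper (wasted_seconds and its parameter names are hypothetical). It assumes the extra cycle is fully exposed on every call, which out-of-order execution may or may not allow in practice:

```c
/* Extra wall-clock seconds: cycles wasted per call, times calls per run,
   divided by the clock rate in Hz, times the number of runs. */
double wasted_seconds(double extra_cycles_per_call, double calls_per_run,
                      double clock_hz, double runs)
{
    return extra_cycles_per_call * calls_per_run / clock_hz * runs;
}
```

With the figures from the example, wasted_seconds(1.0, 1e9, 1e9, 600.0) gives 600 seconds, i.e. 10 minutes.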

r22 wrote:
Levicki, good article and more importantly, logical points.


Thanks.

r22 wrote:
If people know the language then they will prefer using it, that's as simple as it gets.


I know, but the problem is that they believe they know the language (in this case assembler) but they don't. Knowing assembler is not just knowing the syntax, you have to understand the architecture you are targeting with your code and because architecture is constantly evolving you have to evolve with it as well.

r22 wrote:
Programming in ASM is a hobby for most or used to optimize algorithms/bottlenecks that compilers still may have trouble with. Any large projects done in ASM are done because the programmer prefers the language not for some bloated superiority complex. So, there's really no need to push any philosophy.


I agree with you. Again I never said ASM is not needed. It is still needed, and I like using it in every situation where it gives me an advantage.

r22 wrote:
The debates about performance are just best practice discussions for people who enjoy using ASM. The misinformed views about compiler technology are just due to ignorance of the topic mostly because of lack of interest in it. I think if Intel offered their compiler as a full version free of charge then the facts would be more widely available.


If people were more interested in how today's compilers work, they would be able to write better code themselves. As for those facts, they have been available for quite a long time in the Intel 64 and IA-32 Architectures Optimization Reference Manual (PDF, 3.04MB), which includes a neat list of assembler and compiler coding rules. Anyone can follow those rules and write good assembler code.
Post 05 Aug 2007, 18:05
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias
FrozenKnight wrote:
...Yes i will admit that on modern cpu's the benefit of using XOR is minor....
You mean, the time saved may be less significant than was formerly believed to be the case. Certainly I was not protesting this point. I don't know if MOV is as fast, or as slow, as XOR. So far, I haven't seen convincing data, one way or the other. But, for me, if for no one else here, it is quite simply irrelevant how much faster or slower XOR may be. XOR should not be used to clear a register, because, among other reasons, XOR, but not MOV, changes flags, without that fact being apparent upon looking at the code. What if there are MILLIONS and BILLIONS of clearing operations to perform??? Shouldn't we be worried about saving every picosecond possible???
Yes and no. It really depends on the application. For 99% of the world's applications, reducing execution time is far less important than saving development/improvement/revision time, therefore, readability, particularly on large scale projects, is of far greater significance than saving several tens of milliseconds. Yes, there are a couple of important applications where one must use liquid nitrogen to cool custom designed proprietary gate arrays, because every picosecond matters. The Intel architecture may be inadequate for those applications requiring genuine real time performance assessing critical data arriving by digital cameras, such as real time satellite based remote surgical operating room protocols, and flight simulators for the world's various military establishments. For most applications, the cpu is just sitting there, idle. What gain is there if the cpu is idle 74.5% of the time (XOR, perhaps), instead of 67.9% (MOV, perhaps)??? Most of the childish arguments against MOV are based on thinking from 35 years ago, and perhaps, back then, there may have been a significant performance degradation, one perceived by the user, but I doubt it.
levicki wrote:
...The only knowledge you need to understand XOR instruction is that of Boolean Algebra. There is no witchcraft involved in clearing registers, just Boolean algebra rules which say that the bit will be 1 only if both bits are different and 0 otherwise. ....

Maybe I am reading an older edition of the manual than everyone else.
In my copy, the following sentence occurs:
ancient, dog eared, old copy of 486 manual, with slobber all over it, representing the drooling of a senile old goat wrote:

Flags Affected
The CF and OF flags are cleared; the SF, ZF, and PF flags are set according to the result; the AF flag is undefined.

Alas, George Boole himself, could not figure this out, absent a manual from Intel. http://www.kerryr.net/pioneers/boole.htm
My opposition to use of XOR to clear a register INCLUDES the fact that it is counterintuitive, but is not BASED upon that fact.
levicki wrote:
...And what if those two different registers hold the same value resulting in ZERO in destination? Would you ban that as well because it is not obvious enough? ...

Of course not. I already explained, MANY times, that it is not Boolean Algebra that I dislike, it is MISUSE of Boolean algebra to perform a clearing operation on a single register. There is nothing wrong with performing an exclusive OR function, or any other logical function, so long as the algorithm requires it. There is something wrong, in my opinion, with employing a Boolean operation, on a single register, not because it is mathematically incorrect, no, I don't write that, I appreciate that there is no error involved for mathematicians, it is wrong, rather, because WE ARE NOT MATHEMATICIANS, and our goal, is not to perform a mathematical function, but rather, to simply clear a register, an activity readily accomplished by the INTUITIVELY OBVIOUS instruction: MOV REG, ZERO. Whatever trivial penalty one must pay for use of a more easily understood, more easily modified instruction sequence, it is almost always worth it. Maybe, (PERHAPS), 30 years ago, the situation was a little bit different, I can not say for sure, but today, THE LEAST of our problems with typical software, is that it is too slow because of misusing MOV, when one ought to be using instead, XOR!!!!
levicki wrote:
...
so you can now go and cry in your little corner if that bothers you so much...
Frankly, I don't care if you especially, or anyone else for that matter, agrees with me or not. For me, XOR is only significant as a symptom of the much greater problem that FASM forumers tend to create code, instead of authoring programs.
levicki wrote:
...Why do you write in assembler if your goal is neither SPEED nor SAVING MEMORY?
If you do not answer this you will confirm my suspicion that you are just a keyboard warrior. ...
I am sufficiently disinterested in you, to avoid answering any question you may ask. There is someone on the forum who knows the answer to your riddles--I write for his "benefit", not yours.
Post 05 Aug 2007, 20:27
MichaelH



Joined: 03 May 2005
Posts: 402
MichaelH
Quote:

ancient, dog eared, old copy of 486 manual, with slobber all over it, representing the drooling of a senile old goat wrote:


LOL Smile
Post 05 Aug 2007, 22:58
levicki



Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
levicki
tom tobias wrote:
You mean, the time saved may be less significant than was formerly believed to be the case.


I admit that on modern CPUs the difference between MOV and XOR may not be perceptible, especially if the code is not in a performance-critical path. But there are two distinct cases where the difference may be noticed:

1. You have a data dependency chain which XOR can break and MOV can't

2. You are hitting the limit of instruction decoding bandwidth, where using shorter instructions can help to keep as much code in the instruction/trace cache as possible.

Those cases are not imaginary, they exist in high-performance code. Ignoring them won't get you anywhere.

tom tobias wrote:
XOR, but not MOV, changes flags, without that fact being apparent upon looking at the code.


Whoever looks at the assembler code should have proper (PRIOR) architectural knowledge. One should not expect some random Joe Sixpack to come in off the street and start meddling with XORs and MOVs.

tom tobias wrote:
For most applications, the cpu is just sitting there, idle. What gain is there if the cpu is idle 74.5% of the time (XOR, perhaps), instead of 67.9% (MOV, perhaps)???


It matters a lot, at least to me as a user and I believe others will agree. Having more idle time makes for smoother multi-tasking and the ability to run more background tasks without choking the machine.

tom tobias wrote:
Maybe I am reading an older edition of the manual than everyone else.


You are indeed, that is why I gave you a link to those "new and improved" manuals. But it seems that you believe you already know everything worth knowing.

tom tobias wrote:
The CF and OF flags are cleared; the SF, ZF, and PF flags are set according to the result; the AF flag is undefined.


So what? That cannot hurt you unless you are changing instruction order and that means you are writing out-of-order code which you already said you dislike.

tom tobias wrote:
it is MISUSE of Boolean algebra to perform a clearing operation on a single register.


How can a Boolean algebra be misused?!?

That sounds as crazy as saying that calculating PI to 1,000,000 digits using the Chudnovsky algorithm is a misuse of the Fourier transform and that everyone should use only Gauss-Legendre to calculate PI.

tom tobias wrote:
There is nothing wrong with performing an exclusive OR function, or any other logical function, so long as the algorithm requires it.


Well my algorithm for clearing a register requires it. I want all my 0's in a register XOR-ed with 0's and all 1's XOR-ed with 1's. Very Happy

tom tobias wrote:
THE LEAST of our problems with typical software, is that it is too slow because of misusing MOV, when one ought to be using instead, XOR!!!!


For once you are right. XOR is the least of our problems. There are many others:

- using 8-bit or 16-bit register parts leading to partial stalls or LCP stalls
- using LAHF/SAHF and other legacy junk also leading to partial stalls
- changing floating point rounding mode too often
- blocking store to load and load to store forwarding
- using self-modifying code leading to machine state clears
- using plain FPU/ALU when you can use SIMD units

And in general using a legacy 386 code mix on the latest processors, which in my opinion is a sin akin to pouring diesel into a military aircraft engine.

tom tobias wrote:
For me, XOR is only significant as a symptom of the much greater problem that FASM forumers tend to create code, instead of authoring programs.


So when you "author a program" you are not "creating the code"? Shocked

Wow... it seems like someone here has more issues with Boolean algebra (particularly with Venn diagrams) than just with XOR.

tom tobias wrote:
I am sufficiently disinterested in you, to avoid answering any question you may ask. There is someone on the forum who knows the answer to your riddles--I write for his "benefit", not yours.


That question wasn't asked by me, I just repeated it to remind you that it was asked. Frankly, I don't care if you won't answer it.
Post 06 Aug 2007, 01:07
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Quote:
- changing floating point rounding mode too often
- using self-modifying code leading to machine state clears

How big are these performance hits?

I can imagine cases where self-modifying code could give a drastic speed improvement, for example something like this:
Code:
abc = read_some_settings();  // non-time-critical initialization
for (x=0; x<N; x++)   //time-critical loop
{
  do_something();
  if (abc)
    do_something1();
  else
    do_something2();
}    

Of course you can use a switch and have two loops, but that becomes nasty to code and maintain with every additional value you need to check inside the loop.

With SMC, you can "generate" the best code for the loop before executing it. In some cases I would expect a drastic speed improvement, because you can save at least a few comparisons per loop iteration and make the code more likely to fit in a single memory page.
Post 06 Aug 2007, 09:22
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
I never said compilers were useless or inferior in every way, just that they are overrated, sometimes really really stupid (because they really can't think for themselves), and are usually VERY bloated AND expensive (vs. tiny FASM.EXE) and unable to support everything the processor supports (easily ... if at all). If compilers were REALLY useless, nobody would use them ... ever! But they serve a purpose ... up to a point. Just not for everything (same with assembly ... good but not for 100% of stuff, you usually don't write your .BATs and makefiles in assembly but use external tools).

Compilers have indeed improved (thankfully!), but they are not the panacea that you claim. Even chess programs are pretty much relying on tons of RAM or huge databases just to "compete" with a reasonable human. Besides, like already mentioned, you have to know how to use it (asm or HLL) or else all the optimizations (or lack of) don't mean squat.

Believe me, there are plenty of projects that are ACHING for speed and size improvements, but so far no compiler has even come close. Way too much bloat / entropy / whatever. Sorry, you won't convince me that anything (esp. Intel) will ever remove that with their "newest $600 version!!!" or anything else.

It's good for some stuff, and some people can use 'em with ease, but it's definitely not for everything or everybody.
Post 06 Aug 2007, 16:10
levicki



Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
levicki
vid wrote:
How big are these performance hits?


A rounding mode change is the most noticeable performance hit on the NetBurst architecture. A long time ago I wrote a SUPER PI patch (northwood_pi and prescott_pi) which changes the float-to-int conversion routine to use SSE2 or SSE3 instructions, resulting in a ~12% speedup. Many fans of that CPU "benchmark" hated me because of that.

The Core architecture handles rounding mode changes much better. Still, they are very easy to avoid. SSE3 has the FISTTP instruction, which explicitly truncates a float to an int just like a C-style cast does. With it you can truncate without changing the mode. For SIMD there is CVTTPS2DQ, which does it on a vector of four single-precision elements, as well as the scalar version CVTTSS2SI. Double-precision counterparts also exist.

vid wrote:
I can imagine cases where self-modifying code could give a drastic speed improvement, for example something like this:


I believe that SMC is a penalty if you write directly ahead of the current EIP. However, if you write code into another memory page marked as data, then mark it as executable and jump into it, that should be ok. After all, that is what .exe decompressors are doing, right?

About that code: if the first do_something() doesn't depend on the other two and vice versa, then you can write it like this:

Code:
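abc = read_some_settings();   // non-time-critical initialization
for (x = 0; x < N; x++) {     // first loop: the common work only
    do_something();
}
if (abc) {                    // the setting is tested once, outside the loops
    for (x = 0; x < N; x++) {
        do_something1();
    }
} else {
    for (x = 0; x < N; x++) {
        do_something2();
    }
}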
    


You would be surprised how much splitting complex loops into 2-3 simple ones can speed things up. Yet another way to accomplish what you want is to use a function pointer:

Code:
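abc = read_some_settings();   // non-time-critical initialization
if (abc) {                    // choose the variant once, before the loop
    do_something_ptr = do_something1;
} else {
    do_something_ptr = do_something2;
}
for (x = 0; x < N; x++) {     // time-critical loop, no branch on abc
    do_something();
    do_something_ptr();
}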
    


SMC should be avoided if possible, and it is possible in most cases.

rugxulo wrote:
I never said compilers were useless or inferior in every way, just that they are overrated, sometimes really really stupid (because they really can't think for themselves), and are usually VERY bloated AND expensive (vs. tiny FASM.EXE) and unable to support everything the processor supports (easily ... if at all).


You sound like you are repeating something you may have heard a long time ago in some discussion, and you don't sound convinced at all.

I suggest you try the Intel compiler; you can try it for free, and the full version doesn't cost $600 either. Try to compile some simple code and take a look at the assembler listing. You may end up surprised and may even learn some neat tricks.

Compilers have come a long way and they have really improved considerably. I haven't tried FASM but if the homepage has correct info, it supports up to SSE3 and I need SSSE3 and SSE4.1 instructions too. With recent compilers you can use those instructions either via SIMD classes, built-in intrinsics or via inline assembler, or better yet let the compiler use them for you.

rugxulo wrote:
Believe me, there are plenty of projects that are ACHING for speed and size improvements, but so far no compiler has even come close.


I agree with the first part. However, the problem is not with the compiler. The problem is with the developers. Many of them do not have a clue about the underlying architecture. Those are hyper-productive code monkeys spewing millions of lines of poor and inefficient code. Even if they wrote in pure assembler their code would still be poor and inefficient, because they don't know basics such as memory layout and access cost, cache associativity and aliasing, data type alignment, paging and TLB priming, I/O stack, API layers, etc.

To illustrate my point I will tell you about a developer in a company I worked for. That guy had an assignment to pad an image. It seems that he didn't know how to do it, and he somehow figured out that a friend of mine had written a Transpose() function which does the padding as a side effect. So he simply called Transpose() twice!!!

Mind you, that was time-critical code and it was in a loop which iterated 360-720 times. If the processing time hadn't jumped up considerably he would have gotten away with it. Of course, no compiler could optimize those two Transpose() calls away, but no developer should do such a thing in the first place, and believe me, such things are being done a lot.
Post 07 Aug 2007, 04:34
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
levicki, fasm has supported Supplemental SSE3 since version 1.67.10. There is no support for SSE4 yet, though.

[edit]Note that the preprocessor layer and interpreter layer are so powerful that you could add SSE4 ISA via macros if you want. Look at Macros to choose target CPU written by revolution (the best macro writer in the forum IMHO) as an example.[/edit]
Post 07 Aug 2007, 05:15


Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.