flat assembler
Message board for the users of flat assembler.

Index > Heap > Intel C/C++ compiler discussion

macoln
Joined: 17 Sep 2007
Posts: 6
Is there a table I could look at that gives some relative indicator of opcode speeds?
Post 19 Oct 2007, 06:16
levicki
Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
macoln wrote:
Is there a table I could look at that gives some relative indicator of opcode speeds?


There is; for Intel it is in document order #248966, the Intel 64 and IA-32 Architectures Optimization Reference Manual, Appendix C.

You can also look at Agner's manuals at http://www.agner.org/optimize/
Post 19 Oct 2007, 13:42
Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
levicki wrote:
Second, instead of casting only when it is needed, you cast all the time? Whatever way you look at it, that doesn't count as an optimization in my book.
Casting is not a speed-hit. The compiler does it all the time when you don't cast explicitly. It's just more typing, that's all (like comments), but the final .exe is the same whether you cast manually or not.

However, sometimes the compiler is too 'stupid' to know that you need a different pointer type (i.e. a different amount of memory read).

Actually 'stupid' isn't quite the right word; the compiler is pretty much brain-dead. I agree that "brains" built the compiler. However, those brains built it for the generic case.

What I mean by brain-dead is that the compiler knows next to nothing about your algorithms; it only processes some code. It optimizes that code based on 'general' methods, but your specific purpose might require something else; you might need to modify the algorithm to make it better, etc.

Note that this doesn't necessarily concern assembly language, but relying on your compiler for optimization is IMHO a dumb assumption. The best optimizer is between your ears (again, this doesn't necessarily mean assembly).

The advantages of assembly, at least for me, are that it allows me to think differently about the code; I think much more low-level, and I usually skip abstract code (which is at the mercy of the compiler). I write C code and assembly, and I have to admit, compilers don't magically optimize algorithms. Most code I first write in C and then, if I need to, in asm (and I usually need to, for my purposes :P). Compilers only optimize code. They are dumb.

Another area where assembly is useful to me is my perfectionist streak (you know). I understand it's a waste for 80% of you out there, and I am also not dumb enough to assume that my code will ever be perfect (in fact there are always better ways of doing it, so I'm not really going as far as "crunching" my code by 1 byte if it requires a week to do so). However, when something terribly simple, like the calling convention, is ignored, I somehow feel frustrated at my output executable.

Please note that I was not talking about straightforward "asm" tricks (which the compiler is good at), but about a design approach and how you perceive it with asm.

levicki wrote:
It would be total chaos without conventions. By being annoyed by them you are again showing how disorganized you actually are.
Well, that depends on how you view chaos and how lazy you are about organizing that chaos with comments.

For example, each function can have some comments stating what convention it uses. Obviously any good programmer will read the function's definition to use it and its parameters correctly; even with conventions, they would still read it anyway to ensure that it works how they want and to see what parameters it needs.

Secondly, this can be done automatically (or semi-automatically) by the linker, so there's no excuse for those programmers who can't even read the definition of a function not to want it :P

levicki wrote:
Why don't you try to do that by hand for some larger project and let us in on the hard facts? Like giving us the performance numbers before and after such "optimization"?
Did I say I was interested in "noticeably high-performance commercial code" or something like that?

A compiler that ignores this possible optimization (because the "brains" that built it seem to think it's a waste of time, sheesh) is still stupid and dumb in my book. Stupidity is not defined only by the noticeable optimizations. ;)

levicki wrote:
You know, we have a saying here in Serbia:
"Selo gori, baba se čeĊĦlja".
"Village is burning while the old woman is combing her hair".

It means you are devoting your attention to unimportant things.
I wouldn't necessarily call that unimportant. I understand normal people are not interested in perfectionism, but somehow it became an art of mine to write code that way.

Anyway, I wouldn't like to be the next Microsoft with bloated OSes either ;)

levicki wrote:
There is an optimizer which processes assembler output generated by compiler, it was mentioned earlier in this thread.
I seem to have skipped that part; could you please post a link to it? :)

levicki wrote:
The same brain writes the compiler. Actually, not one brain, dozens of brains. You seem to be holding yourself in high regard if you believe you are smarter than all of them together. High self-esteem is nice up to a certain point, where it becomes pure arrogance and starts to inflate your own value by underestimating others'.
See above about my definition of brain-dead compilers.

Besides, about the fish stuff... obviously assembly is not useful if you are a newbie; I thought this was clear. But trust me, for me it allows me to think from a different perspective and optimize my algorithms for the better. (Not to mention that 'some' things can only be done in asm; yeah, I understand they are small things, but I'm the perfectionist, you know? :) )
Post 24 Nov 2007, 22:19
bitRAKE
Joined: 21 Jul 2003
Posts: 2913
Location: [RSP+8*5]
Coding is definitely an art form for me, too. Often I find myself trying to code an algorithm in a particular fashion: maybe without using the stack, or with only relative instructions, or using only one register.

Conventions are nice. For example, try to code all routines that use the stack so they use only a multiple of 32/64 bytes (including the return address). That way the stack is always aligned from program start. (Well, at least in my thread. (c: )

I coded a UTF-8 query function which confirms a byte stream conforms to the standard; I guarantee a compiler wouldn't have generated the code I came up with. Not that it is particularly fast, as I was trying to keep the code size down without being very slow. But the point is that all the choices were mine, to use the resources of the machine (space and time) in the way I desire.

After reading the docs on SSE4.2 I got to wondering about a parser engine that used the new instructions, or a regex compiler. Should be great fun trying to find ways to use these new instructions.
Post 25 Nov 2007, 06:55
levicki
Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
@The_Grey_Beast:

Tsk, tsk, tsk... this is not a proper way to discuss things.

You conveniently skipped the first part where I asked you:
Quote:
First, what is the point of being able to change the size?


And you completely ignored this part too:
Quote:
In my opinion it only shows that you haven't thought things out carefully in the beginning if you need to change sizes half-way through the development process.


Let me show you how a serious discussion should look:

The_Grey_Beast wrote:
Casting is not a speed-hit.


I never said it is -- I said that being able to cast or change types is not a code optimization.

The_Grey_Beast wrote:
The compiler does it all the time when you don't cast explicitly.


No it doesn't, not if it doesn't have to -- if your types are thought out properly before you start coding, there is very little need for casting.

Example:
Code:

void *malloc(size_t size);

char v, *p;

p = (char *)malloc(1024); // you need a cast here

v = p[0]; // and mostly nowhere else if the types are the same


The_Grey_Beast wrote:
It's just more typing, that's all (like comments), but the final .exe is the same whether you cast manually or not.


Whatever. Point is that you can change types in C/C++ too:
Code:

void *malloc(size_t size);

typedef char my_type, *my_type_ptr;
//typedef unsigned short my_type, *my_type_ptr;

#define my_type_size sizeof(my_type)

void somefunc(void)
{
        my_type_ptr     p;
        my_type         v;

        p = (my_type_ptr)malloc(1024 * my_type_size);
        v = p[0];
}


It is as simple as commenting one line of code and uncommenting the other.

The_Grey_Beast wrote:
However, sometimes the compiler is too 'stupid' to know that you need a different pointer type (i.e. a different amount of memory read).


Give me an example.

The_Grey_Beast wrote:
What I mean by brain-dead is that the compiler knows next to nothing about your algorithms; it only processes some code. It optimizes that code based on 'general' methods, but your specific purpose might require something else; you might need to modify the algorithm to make it better, etc.


The underlined part is an assumption on your part. Real optimizing compilers (such as the Intel C/C++ compiler) "know" a lot of common algorithms and optimize them just fine. Besides, it is almost always easier to use the compiler to generate the code and then fine-tune it by hand if needed than to write it from scratch.

The_Grey_Beast wrote:
but relying on your compiler for optimization is IMHO a dumb assumption.


I would dare to say it is equally as dumb as your assumption that compilers do not analyze your algorithm.

They do: they analyze data flow, dependencies and constant propagation, they do complex loop transformations you probably wouldn't think of, they "know" new instructions before you do (SSE4.1 is already supported and used automatically by the Intel C/C++ Compiler even though the CPUs are not yet in stores), etc.
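
To make that concrete, here is a minimal sketch (names made up, not from the original post) of a loop that is a typical candidate for such automatic vectorization -- no intrinsics or asm needed:
Code:
/* A vectorizing compiler can turn this scalar loop into SIMD code on its own. */
void saxpy(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}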

While I have agreed from the very beginning of this discussion that the assembler is irreplaceable for some tasks, you (and some others here) have to at least try to open your minds and let some fresh air in. We are not in the 80s anymore.

The_Grey_Beast wrote:
The advantages of assembly, at least for me, are that it allows me to think differently about the code


And how does C prevent you from thinking the same way? You either have that sort of thinking on all the time or you don't have it at all.

The_Grey_Beast wrote:
However, when something terribly simple, like the calling convention, is ignored, I somehow feel frustrated at my output executable.


It is not ignored. A good compiler optimizes across calls. As a matter of fact, compiler writers care about generated code performance, but they need feedback because they simply can't test every corner case. I recently filed a complaint about the compiler needlessly preserving an unused register across an inline ASM block and it will be fixed in the next release.

The_Grey_Beast wrote:
I understand normal people are not interested in perfectionism, but somehow it became an art of mine to write code that way.


You see, it is not about perfectionism or art -- it is all about job requirements.

For example, a friend of mine and I had to write a function which calculates a tube section from a set of points given by the user -- it is used to slice the mandibular canal for 3D viewing purposes. By nature it is a computationally intensive function (Hermite interpolation, some trigonometry, flood fill, etc.).

Needless to say, our employers needed an interactive frame rate, so at least 25 FPS was a job requirement. That means somewhat less than 40 ms was available per single function call in order to accomplish the required frame rate.

Should I say that we managed 4.32 ms on an E6300 (dual-core) and 2.88 ms on a Q6600 (quad-core)? Now comes the funny part... in C++, without a single line written in assembler!

So everyone please stop with that "black magic" and "pure art" crap which you are trying to associate with assembler -- it just doesn't work that way.

The_Grey_Beast wrote:
I seem to have skipped that part; could you please post a link to it? :)


http://www.dalsoft.com/

bitRAKE wrote:
I coded a UTF-8 query function which confirms a byte stream conforms to the standard; I guarantee a compiler wouldn't have generated the code I came up with. Not that it is particularly fast, as I was trying to keep the code size down without being very slow. But the point is that all the choices were mine, to use the resources of the machine (space and time) in the way I desire.


Have you actually tried writing it in C/C++ and then comparing the resulting code with your own, or do you just "know" a compiler wouldn't have generated the code you came up with?

/sarcasm on
I am sure you were thinking along the lines of "Woohoo! I am great! I am smart! Pure genius! Nobody else on the planet could write it like this! Just look at this acme code!" when you finished it.
/sarcasm off

bitRAKE wrote:
After reading the docs on SSE4.2 I got to wondering about a parser engine that used the new instructions, or a regex compiler. Should be great fun trying to find ways to use these new instructions.


By the time you figure it out compilers will already use them throughout the code.
Post 21 Dec 2007, 17:10
bitRAKE
Joined: 21 Jul 2003
Posts: 2913
Location: [RSP+8*5]
levicki wrote:
Have you actually tried writing it in C/C++ and then comparing the resulting code with your own, or do you just "know" a compiler wouldn't have generated the code you came up with?

/sarcasm on
I am sure you were thinking along the lines of "Woohoo! I am great! I am smart! Pure genius! Nobody else on the planet could write it like this! Just look at this acme code!" when you finished it.
/sarcasm off

bitRAKE wrote:
After reading the docs on SSE4.2 I got to wondering about a parser engine that used the new instructions, or a regex compiler. Should be great fun trying to find ways to use these new instructions.


By the time you figure it out compilers will already use them throughout the code.
I write software in the language that brings me the most enjoyment; given this kind of aggressive discourse, maybe you should find another area of study.

A debugger is my favorite place to be with code: it is a quick way to get another perspective on my own, and it gives me the comfort of seeing all other code on my terms. So, yes, I have seen much compiled code, my own and others' (since ~1983). Yes, I know a compiler would not have devised the code I have.
Post 21 Dec 2007, 17:41
edfed
Joined: 20 Feb 2006
Posts: 4237
Location: 2018
I don't see any link between the XOR eax,eax optimisation and C/C++ coding...

Don't try to sell high-level languages on the FASM forum; it's endless. If we code in asm, it's first because WE like/love it.

There is a topic named High Level Languages for that. ;)
Post 21 Dec 2007, 19:31
MichaelH
Joined: 03 May 2005
Posts: 402
levicki wrote:

So everyone please stop with that "black magic" and "pure art" crap which you are trying to associate with assembler -- it just doesn't work that way.



My assembler code is "pure art"... pure art at the level of a three-year-old. My mum pins it on the wall and calls me brilliant, so it must be really good :)
Post 21 Dec 2007, 22:57
levicki
Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
bitRAKE wrote:
Yes, I know a compiler would not have devised the code I have.


You still haven't answered any of my questions:

1. Have you tried writing that UTF-8 query function in high-level language first and compiling it with any decent compiler?

2. Have you done any benchmarks with your code and compared it to the compiler code?

3. Have you considered the time it took you to write the assembler code vs. the speed benefit (if any)?

If the answers are "no" then you simply cannot know what modern compilers would generate. You can only assume and that is what you obviously prefer to do. No point in arguing further unless you want to give some code examples of you beating the compiler.

edfed wrote:
I don't see any link between the XOR eax,eax optimisation and C/C++ coding...

Don't try to sell high-level languages on the FASM forum; it's endless. If we code in asm, it's first because WE like/love it.

There is a topic named High Level Languages for that. ;)


I am sorry for going off-topic.

I am not selling anything -- as I already said, I use assembler often; it just happens that the people who are evangelizing it get on my nerves.

@MichaelH:

That's a good one :)

If someone wants to continue this discussion please start a new thread in the proper topic and post a link here.
Post 22 Dec 2007, 13:44
Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Since this is off-topic, I'll only briefly mention some things.

levicki wrote:
You conveniently skipped the first part where I asked you:
Quote:
First, what is the point of being able to change the size?
Let's say you need to read some bytes from a file (say, a compressed format), and then suddenly a bunch of 4-byte reads would improve performance (when 4 bytes at a time are needed). Obviously you could still read 1 byte at a time, but it would be slower. Actually, the compiler won't be able to do this without being told explicitly, because this 'optimization' is only possible at run-time (i.e. it CAN'T know beforehand what the memory WILL contain at run-time). And note that the memory will not always be in such a 'special case', so running the app once to 'see' the memory will only produce bugs (or incorrect assumptions).

Yet you, as the programmer, know what kind of data it is, so you can fine-tune the code. And note that I didn't even talk about assembly.
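
A minimal C sketch of that idea (the function name and the assumption of a run of literal bytes are made up for illustration):
Code:
#include <stdint.h>
#include <string.h>

/* The programmer knows from the data format that a run of literal
   bytes follows, so it can be moved 4 bytes at a time; the compiler
   cannot prove this on its own. */
size_t copy_literals(const uint8_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {    /* fast path: 4 bytes at a time */
        uint32_t word;
        memcpy(&word, src + i, 4);  /* safe unaligned 4-byte read */
        memcpy(dst + i, &word, 4);
    }
    for (; i < n; i++)              /* tail: 1 byte at a time */
        dst[i] = src[i];
    return n;
}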

levicki wrote:
And you completely ignored this part too:
Quote:
In my opinion it only shows that you haven't thought things out carefully in the beginning if you need to change sizes half-way through the development process.
You have to understand real computer code flow. It's not simply an "all data at the beginning, then the algo" process. Jumps can be spaghetti and can reuse the data for a different purpose; spaghetti code usually means indirect jumps. What's the problem? Easy: the compiler CANNOT know how the code will behave at run-time UNLESS you tell it somehow. No matter how many times it analyzes the code at run-time (profile-guided optimization?), it CANNOT take it as a fact that the code runs ONLY that way, because maybe it didn't test enough of ALL the cases, so it SHOULDN'T make such an assumption, by design.

If you think compilers think magically and know the algorithms, then perhaps you should let one write the code for you, which I don't see happening. Sure, you can put as much 'knowledge' into the compiler as you want (e.g. general algorithms, like Z-buffers, etc.), but that won't make it any more 'creative'. You need creativity to get good code (whether you use asm or not).

I have a lot of experience with compilers and their listings when it comes to code generation (not necessarily optimization). It is not their creators' fault that the compilers are dumb. Being 'smart' doesn't mean simply looking up all the algorithms you know in a table/book; it means being creative. That is, making new algorithms. As far as I'm concerned, even one SINGLE byte, or one SINGLE instruction, or anything, makes up a NEW algorithm (that is, it doesn't have to be on the large scale).

The compiler is brain-dead because all it does is this:

Code:
-> Analyze the code
-> See if the code matches any algorithms in its database (i.e. the look-it-up-in-the-book thingy above)
-> If so, then the programmer probably wanted this algorithm, so optimize it according to the database
-> Otherwise, try to find some tricks in the database
-> None available? Then don't optimize the code
-> If some tricks were found, apply them (this includes instruction re-orderings, etc.)
-> Repeat


This pretty much qualifies it as brain-dead. Making 'good' code, like I said, means more than just simply "looking up some things in a table" (i.e. algorithm optimizations, etc.). It means to be creative. It means you MAKE algorithms for YOUR code, not use EXISTING algorithms (or even if you do, you MODIFY them for YOUR code). The compiler knows only about EXISTING algorithms; this is where it errs.

The compiler doesn't "think", and THAT is the part that makes it brain-dead. It doesn't "understand" what a piece of code should do; it only processes it and, let's say, 'translates' it using ITS DICTIONARY (of generic algorithms). Generic algorithms are bad as far as GOOD, SUPER-optimized code is concerned. And optimized means either size or speed, not only the latter, btw!

levicki wrote:
Example:
Code:

void *malloc(size_t size);

char v, *p;

p = (char *)malloc(1024); // you need a cast here

v = p[0]; // and mostly nowhere else if the types are the same
    
Look at the beginning of this post. I admit I don't have any examples with me right now, though :(

levicki wrote:
Whatever. Point is that you can change types in C/C++ too:
Code:

void *malloc(size_t size);

typedef char my_type, *my_type_ptr;
//typedef unsigned short my_type, *my_type_ptr;

#define my_type_size sizeof(my_type)

void somefunc(void)
{
        my_type_ptr     p;
        my_type         v;

        p = (my_type_ptr)malloc(1024 * my_type_size);
        v = p[0];
}


It is as simple as commenting one line of code and uncommenting the other.
Well, I use the same as the above code, but with 'define' instead of 'typedef', to be more low-level.

levicki wrote:
The_Grey_Beast wrote:
However, sometimes the compiler is too 'stupid' to know that you need a different pointer type (i.e. a different amount of memory read).


Give me an example.
You read from a supposed byte stream and need individual bytes (because it's compressed data), but then you, as a clever guy, figure out that you can do it in SOME CASES (the algo is complex) 4 bytes at a time instead -- and how can you do that?

Secondly, what if you want your pointers to have a specific size, known to you, instead of being abstract? Declaring them as ints or longs or shorts makes it possible, while declaring them as '*' (i.e. pointers) will only abstract their internal size. In case you do this in some packed formats (not necessarily files; it can be RAM as well), this is very important, or the offsets will be generated badly!
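
A minimal C sketch of the packed-format case (field names made up): offsets are stored as fixed-width integers rather than native pointers, so the layout has exactly the size the programmer decided on:
Code:
#include <stdint.h>

struct node {
    uint32_t next_offset;  /* always 4 bytes, on every platform; an offset
                              from the start of the buffer, never a native
                              pointer */
    uint16_t length;       /* payload length in bytes */
};

/* Turn a stored offset back into a usable pointer (assumes the offset
   points at a properly aligned record). */
static struct node *node_at(uint8_t *base, uint32_t offset)
{
    return (struct node *)(base + offset);
}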

levicki wrote:
The underlined part is an assumption on your part. Real optimizing compilers (such as the Intel C/C++ compiler) "know" a lot of common algorithms and optimize them just fine.
Read the bold word. "common" algorithms are a bad thing, as explained above, and just knowing stuff by heart doesn't qualify anyone (or any 'thing') as smart or creative, which is what is required for GOOD code. If you think otherwise, then let the compiler write the code for you, and I mean it, without any intervention.

levicki wrote:
Besides, it is almost always easier to use the compiler to generate the code and then fine-tune it by hand if needed than to write it from scratch.
I mostly agree here, and this is what I usually do anyway. But I rewrite the compiler output a bit, to make it friendlier to the user (with FASM macros, etc.). This way I'm learning a lot and also seeing my code from many viewpoints. Of course I do this after I am done with the algorithm.

But sometimes re-thinking the problem in asm can make you observe things from a different scale, and enable you to consider a different approach to the solution. If you think in C, you might simply use some brute-force solution to the problem.

levicki wrote:
While I have agreed from the very beginning of this discussion that the assembler is irreplaceable for some tasks, you (and some others here) have to at least try to open your minds and let some fresh air in. We are not in the 80s anymore.
Actually, to make a long story short, let's say I wasn't even programming in anything at all in the 80s, or even the 90s. When it comes to compilers, including any new ones (like I said, not those from the 80s), I am quite experienced, and in fact it is this experience that made me realize they are not self-thinking.

Fact is, compilers are not self-thinking; they rely solely on databases. This is bad as far as good code is concerned, because every possible program is made out of a SINGLE, UNIQUE component, not out of many as people like to think (i.e. break the problem into multiple, simpler ones). If you simply used a 'common' algorithm, unchanged, then you might as well have copied the code from somewhere else.

If they really were self-thinking, then we would have AIs, and that would mean they could understand the algorithm rather than simply parse it through a database. It would mean two compilers might arrive at different output, each with its own opinion on the matter, because each one thought about how the algorithm could be improved. That would mean, hands down, that they were equally as capable as humans (if they thought at the same rate, or even higher); but even a less capable being (let's say a human) could in such a case come up with a solution that the self-thinking AI didn't even think of, but recognized afterwards (it's called inspiration). Brain-dead compilers, on the other hand, don't think; they simply parse the code, see if it's in the database, and use that.

Generic code is bad. Problem is, this is what compilers optimize :(

levicki wrote:
And how does C prevent you from thinking the same way? You either have that sort of thinking on all the time or you don't have it at all.
I never said C prevents it, I said knowing assembly is a big + to your thinking.

When I write C code, I always translate it into assembly (in my mind) as I write it, instantly; it became a habit out of experience. This doesn't mean I instantly write it down in assembly, but in my mind I already have the outcome of the 'goal' code.

levicki wrote:
You see, it is not about perfectionism or art -- it is all about job requirements.
Probably I didn't use the correct word. It's more about quality vs. quantity. When people care only about what they think is "practical", the individual byte, clock cycle, atom, watt of energy, or anything else for that matter, is lost in the process. Do *most* people care that John xyz died yesterday? No, but they do care about larger things, like 100 people dying (and even then, very few care). Problem is, if there were 100 Johns and they died, but not all at the same time, people would care less than about 100 people who died instantly (let's say, in a plane crash) and appeared in the news, etc...

Do many rich people care about the individual poor? (Note: individual, not group.) Do they care about ants? They probably crush more under their feet than they think (and this also applies to me :P). If, let's say, ants were to go extinct, that would be big news, and perhaps we would do something about it. But a single individual ant is forgotten; yet if you think about it, it had a really complex life of its own.

With so much processing power and memory, and with the job requirements, the individual xyz is forgotten, but "pure quality" includes even the "insignificant". ;)


I dunno, I think someone said this before on this forum, a long time ago:

Cars have so much more horsepower than in the 50s that we could use square wheels and they'd run just fine :D

levicki wrote:
For example, a friend of mine and I had to write a function which calculates a tube section from a set of points given by the user -- it is used to slice the mandibular canal for 3D viewing purposes. By nature it is a computationally intensive function (Hermite interpolation, some trigonometry, flood fill, etc.).

Needless to say, our employers needed an interactive frame rate, so at least 25 FPS was a job requirement. That means somewhat less than 40 ms was available per single function call in order to accomplish the required frame rate.

Should I say that we managed 4.32 ms on an E6300 (dual-core) and 2.88 ms on a Q6600 (quad-core)? Now comes the funny part... in C++, without a single line written in assembler!

So everyone please stop with that "black magic" and "pure art" crap which you are trying to associate with assembler -- it just doesn't work that way.
What does this prove, anyway? That C/C++ is useful for business or quantity-oriented commercial work? That has been proven plenty by now; in fact it is known that C and C++ are probably the best commercially oriented languages (i.e. useful for that purpose).

levicki wrote:
http://www.dalsoft.com/
Thanks, looks cool, I'll try it :D

levicki wrote:
bitRAKE wrote:
After reading the docs on SSE4.2 I got to wondering about a parser engine that used the new instructions, or a regex compiler. Should be great fun trying to find ways to use these new instructions.


By the time you figure it out compilers will already use them throughout the code.
Will they? He was talking about "designing" the code to work with those instructions, not just replacing instructions. Like I said, this requires a creative process, whether you work in asm or not (i.e. in C you need to use a more parallelizable algorithm too!).

I know this was a pretty long post, but trust me, it's brief compared to what I would otherwise write (not that I have more time to waste on this anyway). ;)
Post 22 Dec 2007, 16:52
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17271
Location: In your JS exploiting you and your system
The_Grey_Beast wrote:
Since this is off-topic, I'll only briefly ... <snip lots and lots>
Wow, that was brief? I would be impressed to see thorough and dismayed to see verbose.
Post 22 Dec 2007, 17:08
Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Sorry, I said (at the end) that it would take a lot more if it weren't.

Next time I'll probably use a different word.
Post 22 Dec 2007, 17:29
bitRAKE
Joined: 21 Jul 2003
Posts: 2913
Location: [RSP+8*5]
levicki wrote:
If the answers are "no" then you simply cannot know what modern compilers would generate. You can only assume and that is what you obviously prefer to do. No point in arguing further unless you want to give some code examples of you beating the compiler.
I can know because I know what my requirements are; I work for myself and not someone else. You wouldn't get any satisfaction from my examples unless you understood my requirements. You don't understand my requirements because you continually talk around them.

Blue is my favorite color. Compete with that.
Post 22 Dec 2007, 17:45
LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
About splitting: do you agree with starting the split from rugxulo's post and adding a link to point to the Intel C/C++ Compiler discussion?

I don't follow this thread much, so perhaps some posts that relate directly to XOR reg, reg will be sacrificed by the proposed split.

Any ideas are welcome.
Post 22 Dec 2007, 17:52
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17271
Location: In your JS exploiting you and your system
LocoDelAssembly wrote:
About splitting: do you agree with starting the split from rugxulo's post and adding a link to point to the Intel C/C++ Compiler discussion?
Yes, I think you found the proper spot.

levicki wrote:
I strongly urge someone in power to sticky and close this thread because further discussion is pointless.
I think it might be useful to lock it also (but not to sticky it). People are just wasting time going over the same things without moving forward.
Post 22 Dec 2007, 21:08
LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
Done. About locking the thread: if someone wants that, request it from another moderator; I would like this thread to evolve a little more.

It would be great if both parties started to back their claims with code in a challenging fashion. For example, the assembly team provides code and the HLL team then proves that the compiler can replicate the code or even do better.

There is a part that is hard to verify, however, and that is what happens with huge projects. For this maybe we would need to pick an open source project that is known to be optimized C/C++ code and spend some time with IDA Pro counting how many misoptimizations the compiler makes. This perhaps will not prove whether the compiler beats the brain or not, but it will give us an estimate of how far from perfection it is, and also verify whether the compiler makes misoptimizations big enough to prove that the brain still beats the compiler even on aspects of huge projects.
Post 22 Dec 2007, 22:56
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17271
Location: In your JS exploiting you and your system
Actually, I meant to lock the XOR EAX,EAX topic. This split thread still has some life in it yet, but limited, I fear.
Post 22 Dec 2007, 23:10
levicki
Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
The_Grey_Beast wrote:
Let's say you need to read some bytes from a file (say, a compressed format), and then suddenly a bunch of 4-byte reads would improve performance (when 4 bytes at a time are needed).


First, WTF?!?

I mean, it turns out that reading 4 bytes instead of one is an optimization, even though all those bytes (and a lot of the bytes which follow them -- hint: read-ahead) are already cached, either indirectly in some system buffer or explicitly because you read more than you need at the moment and parse the data from memory?!?

Second, if I need to change the amount of bytes read at run-time, then the file should contain the "size" of the data somewhere in the header, and I will read it and then use that as the read size. If it doesn't... well, that is bad design and we are getting back to my first argument -- you should think about such important things before you start coding.

Third, what does all this have to do with the ability to change types while compiling?!?

The_Grey_Beast wrote:
(i.e. it CAN'T know beforehand what the memory WILL contain at run-time)


Are you trying to say that assembler somehow handles that automatically?!?

The_Grey_Beast wrote:
What's the problem? Easy: the compiler CANNOT know how the code will behave at run-time UNLESS you tell it somehow.


The last time I checked asm code generated by ICC, there were those neat labels with comments next to them saying something like:

Code:
        je        $B1$28        ; Prob 10%                      ;85.20
        ...
        cmp       edx, eax                                      ;92.14
        jne       $B1$25        ; Prob 62%                      ;92.14
    


"Prob" stands for the probability of the said jump being taken.

The_Grey_Beast wrote:
No matter how many times it analyzes the code at run-time (profile-guided optimization?), it CANNOT take it as a fact that the code runs ONLY that way, because maybe it didn't test enough of ALL the cases, so it SHOULDN'T make such an assumption, by design.


What are you talking about?!? Profile-guided optimization is useless if you do not profile the program using real data! Even then it can be useless if your algorithm sucks so much that no amount of profiling can fix it.

The_Grey_Beast wrote:
If you think compilers think magically and know the algorithms then perhaps you should let it write the code for you, which I don't see happening.


Let us be honest here -- today programmers are mostly reusing the same code snippets for data compression, for FFT/DCT/wavelets, for image and sound processing, etc. Very little brand-new code is being written. Over time compilers have matured, and they are now generating better code for those common cases than an average assembler programmer.

Moreover, the majority of optimized code for those common cases is sort of "finished", i.e. there is very little (if any) room for improvement apart from inventing a completely new algorithm, which I don't see happening. ;)

The_Grey_Beast wrote:
As far as I'm concerned, even one SINGLE byte, or one SINGLE instruction, or anything, makes up a NEW algorithm (that is, it doesn't have to be on the large scale).


I am sorry to disappoint you, but you are not allowed to have your own definition of "new algorithm".

A new algorithm is when you do something fundamentally different -- for example, if you use the Chudnovsky method to calculate PI instead of Gauss-Legendre. Oh, and... to be able to say that you created a new algorithm, you would also have to be one of the Chudnovsky brothers. Reversing operands and replacing addition with subtraction to shave off a nanosecond doesn't count in the real world; even compilers are capable of that nowadays.

The_Grey_Beast wrote:
The compiler is brain-dead because all it does is this:


Whoa!!! Have you actually ever considered how complex real optimizing compilers are? What you wrote could also be applied to humans; this is how programmers think, so they are stupid too:

Code:
1. Find a relevant code snippet on Google using the +fastest keyword
2. No code? Find an algorithm explanation on Wikipedia
3. No algorithm? If you are smart, figure it out
4. Not smart? Use brute force and tell your boss to get faster computers


The_Grey_Beast wrote:
This pretty much qualifies it as brain-dead. Making 'good' code, like I said, means more than just simply "looking up some things in a table" (i.e. algorithm optimizations, etc.).


Yet again you are assuming that the compiler does not analyze the code in a meaningful way. Is it perhaps time to read some books?
http://www.gamedev.net/columns/books/bookdetails.asp?productid=257

The_Grey_Beast wrote:
It means to be creative.


Will you stop saying "creative"? It really gets on my nerves.

The_Grey_Beast wrote:
It means you MAKE algorithms for YOUR code


Okay, you asked for it... MAKE a new algorithm for FFT and inverse FFT calculation without using any existing code or algorithm. MAKE it at least 50% FASTER than any of the existing algorithms.

Just so you know what you are going to have to compete with: I will benchmark your algorithm against the Intel Integrated Performance Primitives library, the FFTW library, the CUFFT library which runs on a GPU (I have an 8800GTX, mind you) and a piece of code written by Takuya Ooura which we further optimized using SIMD and threading. If you don't thread your code you will lose, because I am going to test it on a quad-core CPU.

The_Grey_Beast wrote:
Well, I use the same as the above code, but with 'define' instead of 'typedef', to be more low-level.


typedef lets the compiler do type checking; #define does not.
The const keyword lets it enforce read-only access and propagate constants throughout the code; defining constants using #define does not.
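
A minimal sketch of both points (names made up):
Code:
typedef char *str_t;          /* a real type: a and b are both char * */
str_t a, b;

#define STR_T char *          /* mere text substitution... */
STR_T c, d;                   /* ...expands to "char *c, d;" so d is a plain char! */

const int table_size = 1024;  /* typed, scoped, usable for constant propagation */
#define TABLE_SIZE 1024       /* an untyped, unscoped token */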

The_Grey_Beast wrote:
You read from a supposed byte stream and need individual bytes (because it's compressed data), but then you, as a clever guy, figure out that you can do it in SOME CASES (the algo is complex) 4 bytes at a time instead -- and how can you do that?


As long as it is "SOME CASES", and not "MOST CASES", it is completely irrelevant whether you do it or not. The difference between reading 4 bytes instead of one probably can't even be measured, not to mention that the memory bus transfers 8 bytes at a time and that data is read into the L1 and L2 caches 32, 64 or 128 bytes at a time. Whether you pull one byte or four out of the cache line simply doesn't matter, and if you read past the line boundary in one operation you incur a penalty (a cache line split).

The_Grey_Beast wrote:
Secondly, what if you want your pointers to have a specific size, known to you, instead of being abstract?


Then you are working against the compiler and thus against yourself. See below.

The_Grey_Beast wrote:
Declaring them as ints or longs or shorts makes it possible, while declaring them as '*' (i.e. pointers) will only abstract their internal size.


It will abstract it for you, so you don't have to worry about the size, and it makes the size known to the compiler so it can do what it needs to do. I see nothing wrong there. If you need to change a pointer's size for some purpose (it happened to me too) you just remember to cast it through char and everything will be fine. I admit it is a bit cumbersome to type, but it happens so rarely that it doesn't really matter, at least for me.

The_Grey_Beast wrote:
Read the bold word. "common" algorithms are a bad thing, as explained above


I disagree.

Moreover, do you think that you can do better scheduling than the compiler? Have you actually verified that you have gained performance by benchmarking before and after your changes? Have you checked whether the code still does the same thing numerically?

Those things are often overlooked by assembler programmers. For example, some instruction sequence may look like it could have been shorter and when you rewrite it, you end up with slower code.

The_Grey_Beast wrote:
If you think in C, you might simply use some brute-force solution to the problem.


Well, I once did an optimization in assembler by splitting a loop manually into three loops. It worked a lot faster that way. Then I wrote those three loops in C and the compiler generated slightly faster code than the one I had written.

Then I fine-tuned my code by learning from the compiler-generated code and managed to make it faster again, but the point is I wasted a ton of time writing it and fine-tuning it -- I did it because I love to learn, but I could just as well have done the same thing with loops in C and gotten the same performance level without writing a single line in assembler. That would leave me more time for other important things in life, like seeing my girlfriend, walking the dog, playing guitar, taking photos, wasting time on Internet forums ;), etc.
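
A minimal sketch of that kind of loop splitting (loop fission), with made-up arrays and computations:
Code:
/* One loop writing three output streams... */
void fused(float *a, float *b, float *c, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = i * 0.5f;
        b[i] = i * 2.0f;
        c[i] = i + 1.0f;
    }
}

/* ...split into three loops, each streaming through one array. */
void fissioned(float *a, float *b, float *c, int n)
{
    for (int i = 0; i < n; i++) a[i] = i * 0.5f;
    for (int i = 0; i < n; i++) b[i] = i * 2.0f;
    for (int i = 0; i < n; i++) c[i] = i + 1.0f;
}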

The_Grey_Beast wrote:
Fact is, compilers are not self-thinking


I never said they are, but if you write your code following certain guidelines on writing good code, the compiler will produce excellent output in 99% of the cases. Many lousy C/C++ programmers complain about compilers not being able to optimize some junk they wrote, without realizing that they made the code impossible to optimize, not the compiler.

A banal example of this is declaring and initializing an array of constant values in a function without using the static keyword. Then they end up asking "why does my code push those values onto the stack each time the function is executed?"
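
A minimal sketch of that banal example (values made up):
Code:
int lookup_bad(int i)   /* table is rebuilt on the stack on every call */
{
    const int table[4] = { 10, 20, 30, 40 };
    return table[i & 3];
}

int lookup_good(int i)  /* table is emitted once, in read-only data */
{
    static const int table[4] = { 10, 20, 30, 40 };
    return table[i & 3];
}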

The_Grey_Beast wrote:
because every possible program is made out of a SINGLE, UNIQUE component, not out of many as people like to think (i.e. break the problem into multiple, simpler ones).


I disagree here too. It has always been "divide and conquer".

The_Grey_Beast wrote:
What does this prove, anyway? That C/C++ is useful for business or quantity-oriented commercial work?


It proves that you can get required performance level using high-level language.

The_Grey_Beast wrote:
Will they? He was talking about "designing" the code to work with those instructions, not just replacing instructions. Like I said, this requires a creative process, whether you work in asm or not (i.e. in C you need to use a more parallelizable algorithm too!).


What's there to "design"?!? Do you design your code to use CPUID? Or RDTSC? How about NOP?

Engineers who created those instructions did the design part; you will just use them in the only way possible -- like the manual says. What you will use them for is another story. If you find some new use for them, good for you; don't forget to dash all the way to the USPTO.

I really hate it when people try to mystify what they do in order to look more interesting and powerful in the eyes of those who know nothing, or next to nothing, about the subject.
Post 23 Dec 2007, 06:07
0.1
Joined: 24 Jul 2007
Posts: 474
Location: India
Hey, impressive talk! :)
But I feel that I need to know a lot about the processor, architecture, instructions, etc. even if I just want to write in C++ and want fully optimized code.
So is it possible to completely ignore the processor-specific stuff and still get optimized C++ code?
Post 31 Dec 2007, 09:23
levicki
Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
0.1 wrote:
Hey, impressive talk! :)
But I feel that I need to know a lot about the processor, architecture, instructions, etc. even if I just want to write in C++ and want fully optimized code.
So is it possible to completely ignore the processor-specific stuff and still get optimized C++ code?


Your feeling is right: you need to have at least a basic knowledge of how the computer (particularly the memory and CPU) works.

You don't need to know the full instruction set, but a rough idea of what your high-level code turns into won't hurt.

The first step is algorithmic optimization; most of the time just picking a better algorithm for the task at hand gives results that are acceptable performance-wise.

However, there are situations where you have to do more than that, and in those situations you need to know more about the underlying platform.

Applications generally fall into two main categories -- memory intensive and computationally intensive.

To improve the speed of memory intensive applications you usually do one of the following:

- Reduce the size of the dataset

If the numeric range is known in advance and there is no chance of overflow, you can for example store data in memory as short instead of int, or float instead of double. This effectively doubles the bandwidth at your disposal, if you manage to process double the amount of data by using SIMD.
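
A minimal sketch, assuming the samples are known to fit in 16 bits with no overflow:
Code:
#include <stdint.h>
#include <stddef.h>

/* int16_t instead of int32_t halves the memory traffic for the same
   element count, and a SIMD register holds twice as many elements. */
void scale_samples(int16_t *samples, size_t n, int16_t gain)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = (int16_t)(samples[i] * gain); /* no overflow, per the assumption above */
}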

- Improve memory access pattern

Many programmers with a Fortran background make an error in C/C++ by reversing the multidimensional array indices. For example, they write:

Code:
for (int z = 0; z < zs; z++) {
    for (int y = 0; y < ys; y++) {
        for (int x = 0; x < xs; x++) {
            a[x][y][z] = value; // instead of a[z][y][x]
        }
    }
}


Because of that error, memory is accessed using a large stride, and the CPU prefetchers cannot mask the slow main memory access because they cannot accommodate large strides -- they work best for linear access patterns or small strides.

- Improve data locality

This can be done in several ways; reducing the dataset size is one of them. Another is packing data which will be processed together into one structure, so instead of having to prefetch two streams you have only one. Proper data alignment is crucial for that. Finally, data can be blocked -- processed in smaller chunks which fit into 1/4 to 1/2 of the L2 cache.
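
A minimal sketch of blocking, here applied to a matrix transpose; BLOCK is a tuning parameter picked so one tile of each matrix fits in a fraction of the L2 cache:
Code:
#include <stddef.h>

#define BLOCK 64

void transpose_blocked(const float *src, float *dst, size_t n)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK)
            /* process one BLOCK x BLOCK tile at a time */
            for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}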

- Use non-temporal streaming

You may have noticed that so far you could get away with general knowledge. This is where it gets a bit complicated.

Streaming means using the CPU write buffers to write data directly to memory, bypassing the cache hierarchy and thus reducing cache pollution. Usually it is done by copying a small amount of data (say between 1/8 and 1/2 of the L1 cache), one cache line (32, 64 or 128 bytes) at a time, using prefetchnta as a hint inside the copy loop.

Then you can process the copied data and write it to the destination using non-temporal stores (MOVNTPS/MOVNTDQ). Before using the data you wrote with non-temporal stores, you have to execute the SFENCE instruction to preserve memory coherency.

All of the above can still be done from C/C++; there are intrinsics for those instructions. With time, those things become part of your programming style: you write code with all that in mind and you generally get satisfactory results.
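
A minimal sketch of the streaming pattern using those SSE intrinsics, assuming dst is 16-byte aligned and n is a multiple of 4:
Code:
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _mm_stream_ps, _mm_sfence */

void scale_stream(const float *src, float *dst, size_t n, float k)
{
    __m128 factor = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        _mm_prefetch((const char *)(src + i + 64), _MM_HINT_NTA);
        __m128 v = _mm_mul_ps(_mm_loadu_ps(src + i), factor);
        _mm_stream_ps(dst + i, v);  /* non-temporal store (MOVNTPS) */
    }
    _mm_sfence();  /* order the streamed stores before the data is reused */
}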

For computationally intensive applications all of the above applies, but to get further improvements you need to go low-level. After you have found the best algorithm and performed all known numeric optimizations, taking care not to change the result of the computation, you profile your application using real data. How you do that is your choice: you can use time-based sampling, non-intrusive event-based sampling (VTune, CodeAnalyst), or your own code that profiles at the function level -- the result of profiling should be that you have found a hotspot.

After you find a hotspot, you check the compiler-generated code. Only at this point will you need full knowledge of assembler and the instruction set of the target CPU. You will also need to be familiar with the various code transformation techniques modern compilers use, in order to be able to figure out what the code is doing and how it is doing it.

Of course, you can ignore that code completely and try to write your own from scratch, but perhaps the compiler already wrote code which is similar to yours, and you would need to do better than that in order to get any improvement. That is why you should check it out -- so you don't waste your time. After you find a way to improve the code, you profile the application again and repeat the manual optimization step until you get an acceptable level of performance.

Sometimes while doing this final manual optimization step you realize that you need to redesign things at a higher level in order to gain any performance -- for example, to pack the data differently in some other part of the code, or to move some part of the processing so it happens earlier (or later) in the chain. It is an inevitable part of the process, because nobody is perfect and even writing the whole thing in assembler won't save you from that.

Hope this answers your question.
Post 07 Jan 2008, 01:30