flat assembler
Message board for the users of flat assembler.

Code optimization (AKA C vs. Asm)

baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 11 May 2009, 17:27
Don't you think that a good C compiler could arrange instructions better than you (regarding µop scheduling, U/V pairing, common code path coalescing)?

Please be constructive, no need for another flame war.

_________________
"Don't belong. Never join. Think for yourself. Peace." – Victor Stone.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 11 May 2009, 17:50
Tried it. I almost ALWAYS examine the code output, and some C compilers don't do better. I admit I haven't tested any "new" ones, but as far as I know, the Intel Compiler, for instance, won't really help here, except maybe in vector processing. Am I right?

It's not actually the fault of the compiler, but of the language. It sometimes does many unnecessary things, and it lacks proper tricks. You will never see special "hacks", such as a jump to a function where that would serve better than a dumb old standardized call, which only helps would-be reverse engineers figure out its purpose more easily (not to mention slowing it down).
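To make the call-versus-jump point concrete, here is a minimal C sketch (the `gcd` function is a hypothetical example, not code from the thread) of a function whose last action is a call. In this tail position a `call`/`ret` pair can in principle be replaced by a single `jmp`; GCC, for instance, has a `-foptimize-sibling-calls` optimization, enabled at `-O2`, aimed at exactly this case. Whether a particular compiler and flag set actually emits the jump is something to confirm in the disassembly.

```c
#include <assert.h>

/* Euclid's algorithm, written recursively on purpose (hypothetical example).
   The recursive call is the very last thing the function does, so it is in
   tail position: an optimizing compiler may reuse the current stack frame
   and emit a jmp back to the top instead of a call/ret pair. */
unsigned gcd(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b); /* tail call */
}
```

For example, `gcd(48, 18)` returns 6 whether or not the tail call is turned into a jump; only the generated code differs.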

Don't get me wrong, I code in C a lot, but I also examine the output ;) Needless to say, it's one of THE best high-level languages in this regard; all the others are even worse. So I can't really blame C at all; in fact, I love it. (I hate OOP, for instance.)

Simply put, I do my share of disassembly, and C disassembly is quite predictable in some cases. Small optimizations, maybe, but I count them too 8)

_________________
Previously known as The_Grey_Beast
r22



Joined: 27 Dec 2004
Posts: 805
r22 11 May 2009, 18:03
http://board.flatassembler.net/topic.php?t=4467
Whatever compiler MS used to build Win XP64 failed to optimize the RtlInitUnicodeString function, which is used repeatedly by most kernel functions.

Most of the time, 'yes', an optimizing compiler can produce faster-running code, but there are always corner cases (e.g. SIMD, heavily nested loops, specialized string operations).
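For reference, the core of a routine like RtlInitUnicodeString is one of those "specialized string operations": a scan for the 16-bit terminator, followed by filling in a length/buffer descriptor. The C below is a simplified sketch under stated assumptions, not Microsoft's code: `ustring_t` and `init_ustring` are hypothetical names that only mirror the documented UNICODE_STRING fields (byte lengths plus a buffer pointer), and the real function's length cap and other details are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the UNICODE_STRING layout:
   lengths are in bytes, Buffer points at 16-bit characters. */
typedef struct {
    uint16_t Length;        /* bytes, excluding the terminating NUL */
    uint16_t MaximumLength; /* bytes, including the terminating NUL */
    const uint16_t *Buffer;
} ustring_t;

/* Simplified sketch: the hot part is a plain scan for a 16-bit NUL. */
void init_ustring(ustring_t *dst, const uint16_t *src)
{
    dst->Buffer = src;
    if (src == NULL) {
        dst->Length = dst->MaximumLength = 0;
        return;
    }
    size_t n = 0;
    while (src[n] != 0)     /* scalar loop: the candidate for SIMD tricks */
        ++n;
    /* This sketch simply refuses strings too long for 16-bit byte counts. */
    assert(n * sizeof *src <= 0xFFFCu);
    dst->Length = (uint16_t)(n * sizeof *src);
    dst->MaximumLength = (uint16_t)(dst->Length + sizeof *src);
}
```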
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 11 May 2009, 19:54
Compilers are made by human beings. If you are better at optimizing than the average person who worked on the compiler's code, you will achieve better performance.
Actually, the key goes even deeper: you need to know what makes the compiler tick so you can push the right buttons. I've learned a lot with the Intel C Compiler, and sometimes even Intel (on their support forum) is surprised by which tricks it is (or isn't) able to resolve.

Here: http://www.devmaster.net/forums/showthread.php?t=1884&page=5
is a heated discussion about a fundamental part of a 3D engine. In that example I could not find any more places to optimize.

The solution was to first write it in ASM (another option is the compiler's /S switch, which outputs assembly), then find the critical parts, optimize them, and finally make the C compiler digest the code the way I want.

1) The compiler won't decide to use SSE4.1 on its own; you need to tell it!
2) The compiler reads hints but can't predict everything, so you need to pet it a little and reassure it that nothing bad will happen if you "#pragma vectorize" a little more ;)
3) The compiler doesn't know how to properly vectorize sequences like: subtract <1,2,3,4> from xmm1, then <5,6,7,8> from xmm2, etc., so I needed some helper tables. In the assembly you can see the elegance of the predefined dqword xmm_0_1_2_3 and the use of shuffle+shift to get the next variable (xmm_4_4_4_4).
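The helper-table idea from point 3 can be sketched with SSE2 intrinsics in C. This is an illustration under assumptions, not Madis731's code: it assumes an x86/x86-64 target with SSE2 and 32-bit integer data, the name `subtract_ramp` is hypothetical, and it builds the step vector <4,4,4,4> with `_mm_set1_epi32` rather than with the shuffle+shift trick. Instead of loading <1,2,3,4>, <5,6,7,8>, ... from memory, it keeps one index register and adds the step to it on every iteration.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics: assumes an x86/x86-64 target */

/* Hypothetical example: data[i] -= i + 1 for i = 0..n-1, four lanes at a time. */
void subtract_ramp(int *data, int n)
{
    assert(n >= 0);
    __m128i idx = _mm_setr_epi32(1, 2, 3, 4); /* first chunk <1,2,3,4> */
    const __m128i step = _mm_set1_epi32(4);   /* plays the role of xmm_4_4_4_4 */
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        v = _mm_sub_epi32(v, idx);            /* subtract the current chunk */
        _mm_storeu_si128((__m128i *)(data + i), v);
        idx = _mm_add_epi32(idx, step);       /* next chunk: <5,6,7,8>, ... */
    }
    for (; i < n; ++i)                        /* scalar tail for leftovers */
        data[i] -= i + 1;
}
```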

The compiler always takes the safest route, which is why you don't see some of the tricks and hacks. What it does very well is turning IMUL into shifts and adds, IDIV into shifts, adds and multiplies, etc.
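Two textbook instances of that strength reduction, written out by hand in C as an illustration (the function names are hypothetical): multiplying by 9 as a shift plus an add, and unsigned division by 3 as a multiply by a "magic" reciprocal followed by a shift. The constant 0xAAAAAAAB is ceil(2^33 / 3), which makes the result exact for every 32-bit unsigned input; signed (IDIV-style) division needs an extra correction step that is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* x * 9 == x * 8 + x, computed modulo 2^32 just like the plain multiply.
   On x86 a compiler may emit this as a single LEA. */
uint32_t mul9(uint32_t x)
{
    return (x << 3) + x;
}

/* x / 3 for unsigned 32-bit x without a DIV: widen, multiply by
   0xAAAAAAAB (= ceil(2^33 / 3)) and keep bits 33 and up. */
uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```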

Compilers sometimes inline small chunks of code, or code that is used only once. I've seen the push EBP / mov EBP,ESP prologue removed on 32-bit, and 64-bit is already clever by design (e.g. on Windows x64 the first four integer parameters are passed in RCX, RDX, R8 and R9 instead of on the stack).

The Intel Compiler will vectorize many tedious loads, like one of the simple methods discussed there... but I still remain true to assembly ;)

[rant]
I just can't help it. C is really fast for prototyping (if you know it), but sometimes the pointers are just magic (yes, that's what I call it), and I'm better off with assembly, where [mem] is a memory reference and mem is a label. In C there are all those casts and whatnot: *mem, &mem, mem[x], mem+x, (*)&mem, ... and they never work the way I intuitively hope they will.
& is not a label and * is not [label], but sometimes they are :)
[/rant]
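A small C sketch of the correspondences the rant is wrestling with, assuming a flat memory model and using a hypothetical array `table` in place of an assembly data label. The main trap for an assembly programmer is that C pointer arithmetic is scaled by the element size: `table + i` means `i*sizeof(int)` bytes past `table`, whereas `[table+i]` in assembly adds `i` bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data, standing in for a labelled block of dwords. */
static int table[4] = {10, 20, 30, 40};

/* Rough asm analogues (element size = sizeof(int), usually 4):
     table        -> the address `table`        (array decays to a pointer)
     *table       -> [table]                     (load through the address)
     table[i]     -> [table + i*sizeof(int)]     (identical to *(table + i))
     &table[i]    -> table + i*sizeof(int)       (address only, no load)   */
int load_by_index(size_t i)
{
    assert(i < 4);
    return table[i];
}

int load_by_pointer(size_t i)
{
    assert(i < 4);
    return *(table + i);        /* same access as table[i] */
}

const int *address_of(size_t i)
{
    assert(i < 4);
    return &table[i];           /* no memory access, just arithmetic */
}
```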
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 11 May 2009, 19:55
The output of any HLL (optimised or not) sucks. The only reason an HLL compiler could generate better code is if the programmer sucks even worse at asm. But then again, if one has a requirement for really high performance code then one should not be using an HLL in the first place.

It is really, really hard to get an HLL compiler to create good code. If the programmer uses a very verbose programming style, the compiler would need an inordinate amount of intelligence to extract the intended algorithm and produce the desired code. If the programmer is aware of the common cases that modern compilers have been taught to recognise, the programmer can help improve the generated code by always coding in the way the compiler expects. This constrains the programmer to certain styles in order to "help" the compiler get it right.
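One well-known example of writing code "in a way the compiler is expecting" is the rotate idiom. C has no rotate operator, so a 32-bit rotate has to be spelled with shifts; GCC and Clang recognize the masked form below and can compile it to a single ROL instruction, while the naive `(x << r) | (x >> (32 - r))` has undefined behaviour when `r` is 0, because it shifts by 32. A minimal sketch (the name `rotl32` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by r bits (r taken modulo 32). Masking both shift counts
   keeps them in 0..31, so no shift is ever undefined, and this shape is a
   pattern GCC and Clang are known to match. */
uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << (r & 31)) | (x >> (-r & 31));
}
```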
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 11 May 2009, 21:55
I also noticed that on some compilers, when you don't use the linker's 'global optimizations' mode, making your functions "static" lets the compiler optimize calls between them better, avoiding the stupid standard calling convention in trivial, small functions, which can add a lot of overhead and some wasted bytes. Not to mention it simply feels wrong to have it there when you know it could have been avoided.

for example:
Code:
static void Sort();    

However, you can only use such a function in the current object file. You can "include" other .cpp files, but the downside is that you have to recompile the whole thing every time. It helps optimization, but it's bad for prototyping large projects. Thankfully I haven't had a large project yet, just moderate ones at most, so I could afford to make one single big .obj file (from multiple .cpp files, though).

(Yes, that's not a typo: I do include .cpp files, not .h files :P, and compile only the "parent" big .cpp that includes them, not the others, which would be pointless.)

And I'm not kidding when I say I can tell whether an app was compiled with an average C compiler without hacks like the one above (static and all that).
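A minimal sketch of the `static` approach in C (the keyword works the same way for free functions in C++). All names are hypothetical, and the commented-out `#include "sort.c"` only indicates where a unity build would pull in other source files. Because the helpers have internal linkage, no other object file can call them, so the compiler is free to inline them or give them a cheaper, non-standard calling sequence without any link-time optimization; whether it actually does so is compiler-dependent.

```c
#include <assert.h>
#include <stddef.h>

/* In a unity build the "parent" file would also pull in other sources, e.g.
   #include "sort.c"   -- hypothetical file name, shown for illustration only */

/* Internal linkage: only callable from this translation unit. */
static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

static void insertion_sort(int *v, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        for (size_t j = i; j > 0 && v[j - 1] > v[j]; --j)
            swap_int(&v[j - 1], &v[j]);
}

/* The only externally visible entry point keeps the standard convention. */
void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    insertion_sort(v, n);
}
```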

_________________
Previously known as The_Grey_Beast
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 12 May 2009, 06:49
revolution is right that you need to know the compiler you're writing for. Only then can you hope for optimized code. Otherwise it's like playing hide-and-seek with the compiler.

Actually, you started your post with these tags: µop scheduling, U/V pairing, common code path coalescing. I'll comment on them in order, because I haven't seen them answered yet.

µops - if you read enough of Agner Fog's manuals, you get a nice bird's-eye view of the whole process, but from what I've seen, even the Intel Compiler can't predict with 100% accuracy how each of Intel's CPUs will schedule the instructions in its out-of-order (OOO) unit. For me it's compiler output vs. my own assembly and a lot of trial and error.

U/V pairing - oh, the times of the Pentium!!! :) Back then the simple rules of the two execution pipes were easy for people to understand. http://www.azillionmonkeys.com/qed/tech.shtml - many of this guy's sources are from that time. Nowadays there are 6 execution ports, numbered 0, 1, 5, 2, 3, 4: ports 0, 1 and 5 for ALU operations, 2 for memory reads, 3 for store address calculation, 4 for memory writes. Not every instruction can go to all of ports 0, 1 and 5, so you need to learn them by heart or keep switching between Agner's docs and your code.

common code path coalescing - I don't know what that is :) so I'd better leave it to the compiler ;)
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 12 May 2009, 08:08
revolution,

Pray tell me that you always write functions without an ebp-based stack frame. ;)

How long (in instructions) can you hold a value in a register before using it? A compiler can do it indefinitely.

Even heavily commented assembly source won't help much when you need to modify it, especially if you didn't write it yourself.

The purpose of symbolic programming languages is to make programs human-readable, isn't it?


Madis731,

"Common code path coalescing": by that I mean that the most probable code sequence is streamlined. I forgot to add "etc." to the end of the list of examples, sorry.

You're right about µops.

Hand-crafted code is hard to either optimize further or modify. That's the problem with low-level programming: it makes you specify exactly how to do things, while 99% of the time you just need to say what to do.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 12 May 2009, 08:08
Madis731 wrote:
common code path coalescing - I don't know what that is :) so I'd better leave it to the compiler ;)
Just from the name, I would imagine it means that the expected (most probable) execution paths are all placed together, and that branches out of that path are expected to be the less frequently used cases.
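In C, GCC and Clang let the programmer state which path is the common one with the `__builtin_expect` builtin, which the compiler may use to lay out the expected path as straight-line fall-through code and move the unlikely branch out of the way. A minimal sketch; the `UNLIKELY` macro and `sum_checked` function are hypothetical names, and the macro degrades to a plain condition on compilers without the builtin:

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect is a GCC/Clang extension; fall back to the bare condition. */
#if defined(__GNUC__)
#  define UNLIKELY(c) __builtin_expect(!!(c), 0)
#else
#  define UNLIKELY(c) (c)
#endif

/* Hypothetical example: sum a buffer, with a rarely taken error path. */
long sum_checked(const int *v, size_t n)
{
    if (UNLIKELY(v == NULL))
        return -1;              /* cold path: may be moved out of line */
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;                   /* hot path: laid out as fall-through */
}
```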
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 12 May 2009, 08:13
baldr wrote:
revolution,

Pray tell me that you always write functions without an ebp-based stack frame. ;)
Indeed, I modified the standard proc macros to support esp-based frames. I even posted them somewhere on this board.
baldr wrote:
How long (in instructions) can you hold a value in a register before using it? A compiler can do it indefinitely.
I also sometimes hold the flags, as well as register values, for as long as needed. Not a difficult thing to do, really. And remember that only a small portion of code needs to be written tightly; the rest can just sit there with whatever inefficiencies it has, since, as tom tobias likes to say, it only takes a microsecond to run, so where is the utility in modifying it for speed?
asmcoder



Joined: 02 Jun 2008
Posts: 784
asmcoder 12 May 2009, 21:01
[content deleted]


Last edited by asmcoder on 14 Aug 2009, 14:51; edited 1 time in total
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 13 May 2009, 00:14
asmcoder wrote:
C sucks. It always will; it should be eliminated ASAP.
Wow, why are you saying that? C is one of the most low-level serious high-level languages ever. I understand some people might not like HLLs, but saying C should be eliminated is dumb.

If you want to eliminate something ASAP because it is too high-level, then start with other, much more bloated languages, like C#, Java, Pascal, and most OOP languages (even C++ before C).

If you want to eliminate all that's far from assembly, C will be one of the LAST that would get eliminated, because it's closer to asm than any other HLL I'm aware of.

_________________
Previously known as The_Grey_Beast
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 13 May 2009, 11:57
Borsuc wrote:
If you want to eliminate all that's far from assembly, C will be one of the LAST that would get eliminated, because it's closer to asm than any other HLL I'm aware of.

Actually, in my own experience, having both extremes, like assembly and Java, is what I find most useful. The "middle ground" is what I consider less crucial.
asmcoder



Joined: 02 Jun 2008
Posts: 784
asmcoder 13 May 2009, 13:29
[content deleted]


Last edited by asmcoder on 14 Aug 2009, 14:51; edited 1 time in total
TmX



Joined: 02 Mar 2006
Posts: 843
Location: Jakarta, Indonesia
TmX 13 May 2009, 18:09
asmcoder wrote:
this must be changed.


How?
By rewriting UNIX in 100% assembly?

I can hardly imagine that...
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 17 May 2009, 14:59
baldr wrote:
Don't you think that a good C compiler could arrange instructions better than you (regarding µop scheduling, U/V pairing, common code path coalescing)?
Theoretically, you could construct a compiler that would always either
A) outperform hand-written code, or
B) match the performance of hand-written code.

But this would require pretty much full knowledge of the CPU's instructions, and it would be an NP-complete problem... so it's not really doable, IMHO. It's also not really necessary, as long as you have the option of linking in assembly code for the speed-critical parts.

The languages I use generate quite adequate code for what I use them for. Faster would of course always be better, but not super necessary.

_________________
carpe noctem
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 17 May 2009, 17:35
Let's take a very trivial example:
1) You tell the compiler to sum the integers from 1 to 1000000 in a loop. Your code will probably equal the compiler's: both precalculate the result, 1000000·(1000000+1)/2 = 500000500000.

2) You tell the compiler the same thing, but replace 1000000 with user input.
You know that the user only ever inputs 1000000 and nothing else, but the compiler doesn't know that. Whatever optimizations the compiler applies, it will always remain O(N). You can make it O(1) and beat it.
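The two cases from the example, sketched in C (function names hypothetical): the loop the source literally asks for, and the O(1) closed form n(n+1)/2 that the programmer can substitute when they know more than the compiler. Halving whichever of n and n+1 is even before multiplying keeps the intermediate product from overflowing before the final result would.

```c
#include <assert.h>
#include <stdint.h>

/* O(N): sum 1..n exactly as written (n must be below UINT64_MAX,
   otherwise i <= n is always true and the loop never ends). */
uint64_t sum_loop(uint64_t n)
{
    assert(n < UINT64_MAX);
    uint64_t s = 0;
    for (uint64_t i = 1; i <= n; ++i)
        s += i;
    return s;
}

/* O(1): Gauss's formula n(n+1)/2, dividing the even factor first. */
uint64_t sum_closed(uint64_t n)
{
    return (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
}
```

Both return 500000500000 for n = 1000000; only the amount of work differs.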

This exact case never happens in practice, of course, but there are real problems like converting floating point to fixed point because you know you won't need that precision, or replacing 64-bit registers with 32- or even 16-bit ones because you don't need the larger range, etc.
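As an illustration of the floating-point-to-fixed-point trade, here is a minimal 16.16 fixed-point sketch in C. The format, type and function names are hypothetical choices for this example; the right split between integer and fraction bits depends on the range and precision the data actually needs. It assumes `>>` on a negative `int64_t` is an arithmetic shift, which is implementation-defined in C but is what mainstream compilers such as GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16;              /* 16 integer bits . 16 fraction bits */
#define FIX16_ONE ((fix16)1 << 16)

/* Valid for -32768 <= i <= 32767; larger values overflow the format. */
fix16 fix16_from_int(int32_t i)
{
    assert(i >= -32768 && i <= 32767);
    return (fix16)(i * FIX16_ONE);
}

/* Widen to 64 bits so the 32.32 product fits, then shift the extra
   16 fraction bits back out (rounds toward minus infinity). */
fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}
```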
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 17 May 2009, 21:26
Whole-program optimization is difficult for an x86 coder to beat: reusing cached memory, overlapping multiple code/data paths.
kalambong



Joined: 08 Nov 2008
Posts: 165
kalambong 17 May 2009, 23:47
baldr wrote:
Don't you think that a good C compiler could arrange instructions better than you (regarding µop scheduling, U/V pairing, common code path coalescing)?

Please be constructive, no need for another flame war.
Hmm... how about a real life example?

Scenery generators (some call them terrain generators) such as Terragen generate the scene through billions and billions of calculations, pixel by pixel.

Would such a program benefit more from C alone, or from C with ASM?
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 18 May 2009, 00:58
Madis731: yep, that and a lot of other "I know" scenarios. Many of them can be used in C algorithms too, but asm offers more possibilities. :)

@bitRAKE: I thought "Whole Program Optimization" was the linker's job, not the compiler's? :?

_________________
Previously known as The_Grey_Beast
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.