flat assembler
Message board for the users of flat assembler.
Code optimization (AKA C vs. Asm)
Borsuc 11 May 2009, 17:50
Tried it. I almost ALWAYS examine the code output, and some C compilers don't. I admit I haven't tested any "new" ones, but as far as I know the Intel Compiler, for instance, won't really help here, except maybe in vector processing. Am I right?
It's not actually the fault of the compiler but of the language. It sometimes does many unnecessary things, and it lacks proper tricks. You will never see special "hacks", or a jump to a function where that would serve better than a dumb old standardized call, which only helps reverse engineers figure out the code's purpose more easily (not to mention slowing it down). Don't get me wrong, I code in C a lot, but I also examine the output. Needless to say, it's one of THE best high-level languages in this regard; all the others are even worse. So I can't really blame C at all; in fact I love it. (I hate OOP, for instance.) Simply put, I do my share of disassembly, and C disassembly is quite predictable in some cases. These are small optimizations maybe, but I count them too.
_________________
Previously known as The_Grey_Beast
r22 11 May 2009, 18:03
http://board.flatassembler.net/topic.php?t=4467
Whatever MS used to compile Win XP64 failed to optimize the RtlInitUnicodeString function, which is called repeatedly by most kernel functions. Most of the time, yes, an optimizing compiler can produce faster code, but there are always corner cases (e.g. SIMD, heavily nested loops, specialized string operations).
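For context, RtlInitUnicodeString is a small routine: it scans a NUL-terminated UTF-16 string for its length and fills in a three-field descriptor. Below is a minimal portable sketch of that behavior. It assumes a hypothetical `ustr` struct that mirrors the UNICODE_STRING layout and a hypothetical `init_ustr` standing in for the real API. The clamp to 0xFFFC bytes is this sketch's own assumption. The length-counting loop is the kind of specialized string operation listed above as a corner case, where hand-written code may do better than a naive compiled scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Windows UNICODE_STRING layout. */
typedef struct {
    uint16_t Length;        /* string size in bytes, excluding the terminator */
    uint16_t MaximumLength; /* Length plus the terminator, in bytes */
    const uint16_t *Buffer; /* NUL-terminated UTF-16 code units */
} ustr;

/* Sketch of the documented behavior of RtlInitUnicodeString. */
static void init_ustr(ustr *dst, const uint16_t *src)
{
    assert(dst != NULL);
    dst->Buffer = src;
    if (src == NULL) {                /* NULL source gives an empty descriptor */
        dst->Length = 0;
        dst->MaximumLength = 0;
        return;
    }
    size_t n = 0;                     /* the hot loop: count code units up to NUL */
    while (src[n] != 0)
        n++;
    size_t bytes = n * sizeof(uint16_t);
    if (bytes > 0xFFFC)               /* assumption: clamp so MaximumLength fits in 16 bits */
        bytes = 0xFFFC;
    dst->Length = (uint16_t)bytes;
    dst->MaximumLength = (uint16_t)(bytes + sizeof(uint16_t));
}
```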
Madis731 11 May 2009, 19:54
Compilers are made by human beings. If you are better at optimizing than the average person (who worked on the compiler's code), then you achieve better performance.
Actually, the key goes even deeper: you need to know what makes the compiler tick so you can push the right buttons. I've learned a lot with the Intel C Compiler, and sometimes even Intel (on their support forum) is surprised by which tricks it is (or isn't) able to resolve. Here: http://www.devmaster.net/forums/showthread.php?t=1884&page=5 is a heated discussion about a fundamental part of a 3D engine. In that example I could not find any more places to optimize. The solution was to first write it in ASM (another possibility is the compiler's /S switch), then find the critical parts, optimize them, and finally make the C compiler digest the code the way I want.
1) The compiler won't choose SSE4.1 on its own; you need to tell it!
2) The compiler reads hints but can't predict everything, so you need to pet it a little and assure it that nothing bad will happen if you "#pragma vectorize" a little more.
3) The compiler doesn't know how to properly vectorize sequences like: subtract <1,2,3,4> from xmm1, then <5,6,7,8> from xmm2, etc., so I needed some helper tables. In the assembly you can see the elegance of a predefined dqword xmm_0_1_2_3 and the use of shuffle+shift to get the next variable (xmm_4_4_4_4).
The compiler always takes the safest route, which is why you don't see some of the tricks and hacks. What it does very well is turning IMUL into shifts and adds, IDIV into shifts, adds and multiplies, etc. Compilers sometimes inline small chunks of code, or code that is used only once. I've seen the push EBP / mov EBP,ESP sequence removed on 32-bit, and 64-bit is already clever by design (i.e. the first 4 parameters are passed in registers instead of on the stack). The Intel Compiler will vectorize many tedious loads, like one of the simple methods discussed... but I still remain true to assembly.
[rant] I just can't help it. C is really fast for prototyping (if you know it), but sometimes pointers are just magic (yes, that's what I call them), and I'm better off with assembly, where [mem] is a memory reference and mem is a label. In C there are all those casts and what-not: *mem, &mem, mem[x], mem+x, (*)&mem, ... and they never work the way I intuitively hope. & is not a label and * is not [label], but sometimes they are. [/rant]
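The IMUL/IDIV rewrites mentioned above can be reproduced by hand in C. This is a minimal sketch, assuming unsigned 32-bit operands. Multiplying by 10 becomes two shifts and an add. Dividing by 10 becomes a multiply by a precomputed fixed-point reciprocal plus a shift, which is the kind of sequence optimizing compilers typically emit for division by a constant. The names `mul10` and `div10` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 as shifts and adds: 8x + 2x (wraps modulo 2^32 just like x * 10). */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* x / 10 as a multiply by a fixed-point reciprocal and a shift.
   0xCCCCCCCD is ceil(2^35 / 10); the rounding error is small enough
   that the result equals x / 10 for every 32-bit unsigned x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

Comparing `div10` against a plain `x / 10` in the compiler's /S output is a quick way to see whether it picked the same constant.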
revolution 11 May 2009, 19:55
The output of any HLL (optimised or not) sucks. The only reason an HLL compiler could generate better code is if the programmer sucks even worse at asm. But then again, if one has a requirement for really high performance code then one should not be using an HLL in the first place.
It is really, really hard to get an HLL compiler to create good code. If the programmer uses a very verbose programming style, the compiler would need an inordinate amount of intelligence to extract the intended algorithm and generate the desired code. If the programmer is aware of the common cases that modern compilers have been taught to recognise, the programmer can help improve the generated code by always writing it the way the compiler expects. This constrains the programmer to certain styles just to "help" the compiler get it right.
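As an illustration of writing code "the way the compiler is expecting": C has no rotate operator, so a rotate has to be spelled out with shifts and OR. This is a minimal sketch with hypothetical helper names. The masked form is a pattern GCC and Clang commonly reduce to a single rotate instruction, but whether a particular compiler does so should be checked in its assembly output.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left in the idiomatic form many compilers recognize as one ROL.
   Masking both shift counts keeps every n (including 0) free of undefined behavior. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << (n & 31)) | (x >> (-n & 31));
}

/* The "obvious" version: only valid for 1 <= n <= 31, because x >> 32 is undefined. */
static uint32_t rotl32_naive(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}
```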
Borsuc 11 May 2009, 21:55
I also noticed on some compilers, when not using the 'global optimizations' mode in the linker, that making your functions "static" makes calls between them optimize better, avoiding the stupid standardized calling sequence in trivial, small functions, which can amount to a lot of overhead and some wasted bytes. Not to mention it simply feels wrong to have it there when you know you could've avoided it.
For example:
Code:
static void Sort();
You can, however, only use it in the current object file. You can "include" other .cpp files, but the downside is that you'll have to recompile the whole thing every time. It helps optimization, but it sucks for prototyping large projects. Thankfully I haven't had a large project yet, just moderate ones at most, so I could afford to make one single big .obj file (from multiple .cpps). (Yes, that's not a typo: I do include .cpps, not .hs, and compile only the "parent" big .cpp that includes them, not the others, which would be pointless.) And I'm not kidding when I say I can tell whether an app has been compiled with an average C compiler without hacks like the above (with static and all that).
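This is a minimal sketch of the static trick. It assumes a single translation unit, or a "unity build" in which one parent file #includes the others. `Sort` is a hypothetical insertion sort that uses a hypothetical helper, `swap_ints`. Because both functions have internal linkage, the compiler sees every call site. It may then inline the helper or skip the standard calling sequence. It can't safely do that for an externally visible function unless link-time optimization is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Internal linkage: invisible outside this translation unit, so the
   compiler is free to inline it or use a non-standard calling sequence. */
static void swap_ints(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical Sort: a simple insertion sort, also with internal linkage. */
static void Sort(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && v[j - 1] > v[j]; j--)
            swap_ints(&v[j - 1], &v[j]);
}
```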
Madis731 12 May 2009, 06:49
Revolution is right that you need to know the compiler you're writing for. Only then can you hope for optimized code; otherwise it's like playing hide-and-seek with the compiler.
Actually, you started your post with these tags: µops scheduling, U/V pairing, common code path coalescing. I will comment on them in order, because I haven't seen them answered yet.
µops: if you read enough of Agner Fog's manuals, you get a nice bird's-eye view of the whole process, but what I've seen is that even the Intel Compiler can't predict with 100% accuracy how each of Intel's CPUs will schedule the instructions in its out-of-order (OOO) unit. For me it's compiler output vs. my own assembly and a lot of trial and error.
U/V pairing: oh, the times of the Pentium! Back then the simple rules of two execution pipes were easy for people to understand. Many of the sources at http://www.azillionmonkeys.com/qed/tech.shtml are from that time. Nowadays there are 6 execution ports, numbered 0 to 5: ports 0, 1 and 5 for ALU operations, port 2 for memory reads, port 3 for store address calculation, and port 4 for memory writes. Not every instruction can go to all of ports 0, 1 and 5, so you need to learn them by heart or shuffle between Agner's docs and your code.
Common code path coalescing: I don't know what that is, so I'd better leave it to the compiler.
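Port details differ from CPU to CPU, but one factor carries across all of them: dependency chains. This is a minimal sketch of the usual experiment, with hypothetical function names. The same sum is written once with a single accumulator, which forms a serial chain of dependent adds. It is written again with four independent accumulators, which the out-of-order core can overlap across its ports. The actual speedup depends on the CPU and the compiler flags and should be measured. For floating point, a compiler generally won't make this reassociation on its own because it changes rounding, unless told to with options such as GCC's -ffast-math.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add waits for the previous one (a single serial chain). */
static double sum_one_chain(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: up to four adds can be in flight at once. */
static double sum_four_chains(const double *v, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```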
baldr 12 May 2009, 08:08
revolution,
Pray tell me that you always write functions without an ebp-based stack frame. How long (in instructions) can you hold a value in a register before its use? A compiler can do it indefinitely. Even heavily commented assembly source won't help much when you need to modify it, especially if you didn't write it yourself. The purpose of symbolic programming languages is to make programs human-readable, isn't it?
Madis731, by "common code path coalescing" I mean that the most probable code sequence is streamlined. I forgot to add "etc." to the end of the list of examples, sorry. You're right about µops. Hand-crafted code is hard to either optimize or modify. That's the problem with low-level programming: 99% of the time you don't have to specify exactly how to do something, you just need to say what to do.
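Streamlining the most probable code path is something a C programmer can nudge with branch hints. This is a minimal sketch, assuming GCC or Clang, which provide `__builtin_expect`. The `LIKELY`/`UNLIKELY` macros and `count_spaces` are hypothetical names. The hint only affects code layout and static prediction. Profile-guided optimization is the other common way to give the compiler the same information.

```c
#include <assert.h>
#include <stddef.h>

/* GCC and Clang provide __builtin_expect; on other compilers the hint compiles away. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(c)   __builtin_expect(!!(c), 1)
#define UNLIKELY(c) __builtin_expect(!!(c), 0)
#else
#define LIKELY(c)   (c)
#define UNLIKELY(c) (c)
#endif

/* Hypothetical example: the NULL check is marked unlikely, so the compiler
   can lay out the common path as fall-through code and move the error path aside. */
static long count_spaces(const char *s, size_t n)
{
    if (UNLIKELY(s == NULL))
        return -1;
    long count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] == ' ')
            count++;
    return count;
}
```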
revolution 12 May 2009, 08:08
Madis731 wrote: common code path coalescing - I don't know what that is so I better leave that to the compiler
revolution 12 May 2009, 08:13
baldr wrote: revolution, baldr wrote: How long (in instructions) can you hold a value in a register before its use? Compiler can do it indefinitely long.
asmcoder 12 May 2009, 21:01
[content deleted]
Last edited by asmcoder on 14 Aug 2009, 14:51; edited 1 time in total
Borsuc 13 May 2009, 00:14
asmcoder wrote: C sucks. It always will; it should be eliminated ASAP. If you want to eliminate something ASAP because it's high-level, start with other, much more bloated languages, like C#, Java, Pascal and most OOP languages (even C++ before C). If you want to eliminate everything that's far from assembly, C will be one of the LAST to be eliminated, because it's closer to asm than any other HLL I'm aware of.
Tomasz Grysztar 13 May 2009, 11:57
Borsuc wrote: If you want to eliminate all that's far from assembly, C will be one of the LAST that would get eliminated, because it's closer to asm than any other HLL I'm aware of. Actually, having both extremes, like assembly and Java, is what I feel to be the most useful in my own experience. The "middle ground" is the thing I consider less crucial.
asmcoder 13 May 2009, 13:29
[content deleted]
Last edited by asmcoder on 14 Aug 2009, 14:51; edited 1 time in total
TmX 13 May 2009, 18:09
asmcoder wrote: this must be changed. How? By rewriting UNIX in 100% assembly? I can hardly imagine that...
f0dder 17 May 2009, 14:59
baldr wrote: Don't you think that a good C compiler could arrange instructions better than you (regarding µops scheduling, U/V pairing, common code path coalescing)? A) outperform hand-written code. B) reach the performance of hand-written code. But this would require pretty much full knowledge of the CPU instructions, and it would be an NP-complete problem... so it's not really doable, imho. It's also not really necessary, as long as you have the option of linking with assembly code for the speed-critical parts. The languages I use generate quite adequate code for what I use them for. Faster would of course always be better, but it's not strictly necessary.
_________________
- carpe noctem
Madis731 17 May 2009, 17:35
Let's take a very trivial example:
1) You tell the compiler to sum the integers from 1 to 1000000 in a loop. Your code will probably equal the compiler's: both precalculate (1+1000000)/2*1000000 = 500000500000.
2) You tell the compiler the same thing, but you replace 1000000 with user input. You know that the user only ever inputs 1000000, but the compiler doesn't. The compiler can apply whatever optimizations it likes, but the loop will always remain O(N). You can make it O(1) and beat it.
This exact situation never happens in practice, but there are real cases like it: converting floating point to fixed point because you know you won't need the precision, replacing 64-bit registers with 32- or even 16-bit ones because you don't need the larger range, etc.
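This is a minimal sketch of case 2, with hypothetical function names. It shows the O(N) loop a compiler gets when n arrives at run time, next to the O(1) closed form n(n+1)/2 that you can write because you know the math. Halving the even factor before multiplying keeps the integer arithmetic exact. Evaluating (1+n)/2 first in integer arithmetic would truncate whenever n is even. The sketch assumes n is small enough for the result to fit in 64 bits. Whether a particular compiler already rewrites such a loop for you is worth checking in its assembly output.

```c
#include <assert.h>
#include <stdint.h>

/* What the compiler sees when n comes from user input: an O(N) loop. */
static uint64_t sum_loop(uint64_t n)
{
    uint64_t s = 0;
    for (uint64_t i = 1; i <= n; i++)
        s += i;
    return s;
}

/* The O(1) closed form n(n+1)/2. Exactly one of n and n+1 is even,
   so halve that one first to keep the result exact in integers. */
static uint64_t sum_closed(uint64_t n)
{
    return (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
}
```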
bitRAKE 17 May 2009, 21:26
Whole-program optimization is difficult for an x86 coder to beat: reusing cached memory, overlapping multiple code/data paths.
kalambong 17 May 2009, 23:47
baldr wrote: Don't you think that a good C compiler could arrange instructions better than you (regarding µops scheduling, U/V pairing, common code path coalescing)? Scenery generators (some call them terrain generators) such as Terragen generate the scene through billions and billions of calculations, pixel by pixel. Would such a program benefit from C alone, or from C with ASM?
Borsuc 18 May 2009, 00:58
Madis731: yep, that and a lot of other "I know" scenarios. Some of them can even be used in C algorithms, but asm offers more possibilities.
@bitRAKE: I thought "Whole Program Optimization" was the linker's job, not the compiler's?
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.