flat assembler
Message board for the users of flat assembler.
revolution 12 Aug 2017, 05:51
Furs wrote: I don't agree with revolution's testing for one reason: I believe in good coding practice more than tests, unless you really want to target one particular CPU. For example, AMD and Intel have vastly different CPUs. Testing on Intel doesn't mean it runs well on AMD, and vice-versa. Even Intel have different CPUs if they have vastly different microarchitectures.
Furs 12 Aug 2017, 16:11
system error wrote: Vivik is asking for a simple demo code. Well, since you've already committed 10,000 strong expert words in this thread, he's getting the impression that you're an expert; exactly the effect that you wanted people to believe of you xD

HLL code is not directly representable in asm -- it may be in some cases (because it's simple and the optimal situation is easy to see) but not always. A naive view would be that a goto, for instance, always results in a jump (or conditional jump), but that is obviously not the case. Like I said, a goto can even inline its destination (yes, GCC can duplicate basic blocks if it thinks it's worth it) with no branch at all. Or it can inline the destination in one part of the goto and have the "fall through" case actually branch to that part -- if it determines that the basic block from which the goto is issued is "hotter" than the fall-through path. You simply can't translate HLL code from a compiler to asm directly.

I know you always babble about me trying to sound smart, but I am a contributor to GCC (even though it's a huge project so I don't know how most of it works, nobody probably does; I only have experience in memory aliasing optimization in GIMPLE, walking virtual SSA defs/uses, and some basic RTL optimizations based on patterns) -- whether you believe that or not, I couldn't care less.

revolution wrote: I think you missed the point. Tests are primarily to discover if one is wasting time worrying about things that make no discernible difference. And secondly as a way to optimise those parts that the first tests show are making a difference. I have surprised myself with thinking that some particular part of my code would be the hot section only to discover with testing that some other portion was the real bottleneck. I could've wasted a lot of time trying to optimise the wrong part. One might argue that optimising every part is best but there are only so many hours in each day so we have to prioritise.

You can't know what the hot path is (unless the function is large) since you don't know how people would necessarily call it. Just because an allocator's speed makes no difference in your specific use case doesn't mean it shouldn't be designed for absolute speed. For someone who actually does use it in a critical loop it makes all the difference.
revolution 12 Aug 2017, 19:05
Furs wrote: But what if the person in question is developing a library? Or something with no immediate use but which might be used in hot code in the future?

Perhaps a few examples:

Align procedure calls and loop entries to the cache line size: Okay, good, except different CPUs have different cache line sizes. Plus, this makes the code larger and can push other code out of the cache, causing cache thrashing.

Avoid the loop instruction: On some CPUs it makes no difference, so it is pointless. Even on CPUs where loop is slow, the alternative instructions can cause cache problems through their extra size, or by crossing a cache line boundary.

Use branch hints: Useless on many CPUs and only serves to bloat code. And even where they have an effect it is tiny, so only the most heavily (ab)used code loops (i.e. used continuously for days and days) will see any benefit.

Never use div unless there is no suitable alternative: And naturally the meaning of "suitable" is ambiguous. But anyway, some modern CPUs can do DIV in a separate unit outside of the ALU so that other ALU-intensive instructions can execute simultaneously, so you might be missing out on some great optimisation opportunities.
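To make the div rule concrete, here is a minimal C sketch of what a "suitable alternative" usually means. It is not from the thread and the function names are made up for illustration. For unsigned values and a power-of-two divisor, the quotient is a right shift and the remainder is a bit mask. An optimising compiler will typically do this by itself when the divisor is a constant, so whether writing it by hand gains anything is exactly the sort of thing that has to be measured.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Plain division and remainder by 8. With a constant divisor, compilers
   usually emit a shift or a mask here rather than a div instruction. */
uint32_t div8_plain(uint32_t x) { return x / 8u; }
uint32_t mod8_plain(uint32_t x) { return x % 8u; }

/* Hand-written equivalents. These are only valid for unsigned operands
   and power-of-two divisors. */
uint32_t div8_shift(uint32_t x) { return x >> 3; }
uint32_t mod8_mask(uint32_t x)  { return x & 7u; }
```

For signed operands or divisors that are not known at compile time the shift trick does not apply as written, which is part of why "suitable" is ambiguous.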
Furs 12 Aug 2017, 20:26
Hah I kind of like your link, though it doesn't really apply to such scenarios.
"Saving time" usually means you code something as a tool for yourself. But I think nobody does that in asm (?), they probably use a scripting language (not even C/C++) or something along those lines. Well, unless you need the speed that C/C++ provide.

Of course, "saving time" doesn't really apply if you distribute the application and it's not just a "productivity tool". Millions (?) of people will enjoy its performance improvements if you improve it. Also, for some interactive or realtime apps, performance isn't as much about saving time as it is about the app being usable at all, e.g. nobody wants to play a game at 10 FPS instead of 60 FPS (yeah, dramatic example, but you get the point), or use a DSP effect that crackles or can't even be previewed in realtime, etc. (obviously talking about the case where the speed difference exists in the first place).

I mean I hate bloat personally but seriously, sometimes I see all these casual apps (like web browsers) taking so damn long to load on some PCs/tablets/whatever (instead of being instant) just because Joe doesn't want to work more to improve the experience of everyone who uses it. I know the saying "But programmers are far less common than users so their time is more important", but we have too much software that does the same shit, so to me it's a poor excuse. In some cases, alternative software was even born to be "lightweight" compared to another, and then in the end it resorts to the exact same bloat! WTF.

I'd rather have one insanely optimized piece of software than 5 bloated pieces of crap that all do basically the same thing. If only all those programmers worked on just one instead (and gave it enough settings/options to satisfy everyone).

Anyway sorry, off topic rant.
revolution 12 Aug 2017, 20:54
Well the point is it is not possible to optimise for everything at once. You have to choose which system you optimise for. It is possible to make many different versions of your app for each different system, but in practice no one does that, except for some very specific software where runtime is absolutely the most important measurement (aside from correctness of course). And the link to time saving still applies for distributed code. Just adjust your "time saved" values across all systems where it runs.
If your app has a really important requirement to be fast then you have to know each system you optimise for; there aren't any shortcuts here. Don't expect to use "best practices" and get good results across the board, because you will be disappointed. And your rant about browsers isn't because they fail to use branch hints and other tiny micro adjustments, it is because of entirely different reasons.
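As a rough illustration of the "test it on the system you care about" point, here is a minimal C timing sketch. It is not from the thread: sum_squares is a made-up stand-in for the code under test, time_calls is a hypothetical helper, and clock() from the standard library only gives coarse CPU time, so treat it as a starting point rather than a proper benchmark.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for the code being measured. */
unsigned long sum_squares(size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (unsigned long)i * i;
    return total;
}

/* Calls fn(n) `reps` times and returns the CPU time used, in seconds.
   Each result is added through a volatile pointer so the compiler
   cannot simply drop it. */
double time_calls(unsigned long (*fn)(size_t), size_t n, int reps,
                  volatile unsigned long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink += fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running the same harness on each target CPU, with inputs that resemble real use, is what tells you whether a given micro adjustment made any discernible difference there.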
system error 12 Aug 2017, 22:25
Furs wrote: He's asking demo code for __builtin_expect, but that is not possible. Sorry, i can't write code. All I wanted to do was to impress people with my "essays" and third-party quotes so I look smart and important. No code, no proof. No nothing. I am a Circus Monkey. My job is to entertain and to impress.

You don't have to punish yourself like that, broh! xD
vivik 30 Aug 2017, 09:27
Here is an example of likely/unlikely in action; it seems to mostly rearrange the if/else branches. Furs said that gcc may also place frequently called functions closer together, but it's harder to write an example of that. Glad there are people who actually read the gcc docs.

Compiled with this:

g++.exe -S -masm=intel -O3 -fverbose-asm -Wall -std=c++0x -nostdlib -ffreestanding -mconsole -fno-stack-check -fno-stack-protector -mno-stack-arg-probe -fno-inline-functions -fno-exceptions -fno-asynchronous-unwind-tables -c ltalloc.cc

Don't try to actually run this code, by the way. It's only half of what it should be. I used WinMerge to see the difference, because I have no idea how to get diff on Windows otherwise. I think I managed to get a diff in Emacs once, but I forgot how.
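A minimal sketch of the kind of code involved, for anyone who wants to try the same comparison. This is not the ltalloc code: sum_bytes and the non-GCC fallback macros are made up for illustration, while __builtin_expect itself is the documented GCC builtin.

```c
#include <stddef.h>

/* The usual wrappers around GCC's __builtin_expect. The !! turns any
   non-zero value into 1 so it can match the expected value. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical example: sums a buffer and treats a NULL pointer as the
   rare error case. */
long sum_bytes(const unsigned char *buf, size_t len)
{
    if (unlikely(buf == NULL))
        return -1;      /* hinted as cold: the compiler may move it out of line */

    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;       /* hinted as hot: usually kept on the fall-through path */
}
```

Compiling this with -O2 -S -masm=intel once as written and once with the unlikely() removed, then diffing the two .s files, should show whether the block order changes on your compiler version. The hint only affects code layout, never the result.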
vivik 30 Aug 2017, 09:28
>Attachment cannot be added, since the max. number of 3 Attachments in this post was achieved
vivik 03 Sep 2017, 08:30
Would be awesome if gcc generated a warning if likely/unlikely in code contradicted the actual benchmarking results. Wonder if gcc is smart enough for that.
revolution 03 Sep 2017, 08:40
vivik wrote: Would be awesome if gcc generated a warning if likely/unlikely in code contradicted the actual benchmarking results. Wonder if gcc is smart enough for that.
vivik 03 Sep 2017, 09:01
@revolution
Profiling, not benchmarking; I used the wrong word. But yeah, gcc can, optionally. The docs actually recommend using it instead of setting likely/unlikely manually.

CFLAGS_PROFILE=-g -pg -ggdb -fprofile-arcs
CFLAGS_RELEASE_PROFILE=-fbranch-probabilities

I would still like to have direct control over this, but have gcc around to show me when I made a mistake. Because to make good profiling hints, I need to execute 100% of the program before recompiling it. That looks troublesome: it requires writing special coverage tests, which will be artificial by nature and won't reflect the real usage of the program.
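A rough sketch of how those two flags fit together, assuming a single-file program. The file name app.c and the function count_negative are hypothetical; the flags are the ones above. GCC also offers -fprofile-generate and -fprofile-use as umbrella options for the same two steps.

```c
#include <stddef.h>

/* Sketch of the profile-feedback workflow (file name app.c is
 * hypothetical; app.c must also contain a main that feeds
 * count_negative() realistic data):
 *
 *   gcc -O2 -fprofile-arcs app.c -o app           instrumented build
 *   ./app                                         run on typical input; writes .gcda counts
 *   gcc -O2 -fbranch-probabilities app.c -o app   rebuild using the recorded counts
 */

/* No likely()/unlikely() here: after the profiled run, GCC can read from
 * the recorded counts how often the branch was actually taken. */
size_t count_negative(const int *values, size_t n)
{
    size_t negatives = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] < 0)
            negatives++;
    }
    return negatives;
}
```

The quality of the result depends entirely on how representative the profiled run is, which is the coverage problem described above.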
Furs 03 Sep 2017, 11:55
revolution wrote: Does GCC do some internal benchmarking of procedures? How would it know the typical input patterns?
vivik 05 Sep 2017, 09:26
@Furs
Hm, profiling should count every branch (every if and for), not only function calls. I say that because I've seen things like loops in profiler reports.
DimonSoft 05 Sep 2017, 09:58
system error wrote: ^

Wasn't that you in another thread with the same rude and ignorant posts? If you have information to the contrary, feel free to share links to the documentation/specifications. If you don't, feel free not to write something useless.
vivik 05 Sep 2017, 16:38
jesus, do it in private messages
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.