flat assembler
Message board for the users of flat assembler.
Thread: Use many function parameters (page 2)
rugxulo 05 May 2018, 23:06
I think I read about the Website Obesity Crisis on OS News. While I haven't re-read it, I think the point was that, since machines have gotten faster with more RAM, websites (and software) have compensated by getting more bloated. (Seriously, I pity anyone on dialup, it sounds almost totally unusable.)
Wirth's Law wrote:
I'm not denying optimizations in any form, but saying seven bytes is "huge" is wrong. It's a rounding error. It's rare, even in DOS, that you can do anything useful in seven bytes. Think of cluster size, alignment, inlined procedures, no good smartlinker, sloppy code, support for too many options, too much help, bloated UI, too many features, too many formats, etc. There are so many other problems beyond x86 encodings (which are somewhat compact already but could be dozens of times smaller). In fact, Forth (language) brags about much smaller code, and they're right!
rugxulo 05 May 2018, 23:16
Things That Turbo Pascal is Smaller Than (specifically, TP3 for DOS)
Furs 05 May 2018, 23:42
Well, you could always automate this with a smart macro or a plugin in a compiler (I've actually done one that encodes with enter in such cases, need to expand it a bit in other areas though).
Then it's "worth" it since it's fully automatic behind your back. A "write once, enjoy forever" kind of thing is always worth it once it's written...
rugxulo 07 May 2018, 17:43
I also forgot that ENTER/LEAVE is buggy/broken in some non-mainstream emulators (Fake86, 8086tinyplus).
This reminds me of the dumb CMOV optimization that GCC always uses for -march=i686 or higher. Linus talked about its flaws once. Basically, only use it for non-performance-critical code or when you need the absolute smallest size. But things like this lose CPU compatibility (rarely important these days but still worth noting), plus the smaller code makes it harder to binary patch (more or less requiring a full rebuild), if needed. Several DJGPP-built .EXEs were accidentally compiled with CMOV, and that accident hurts on older computers (or even DOSBox, which is 486/586 only). Rebuilding things from scratch is rarely as easy as it should be. I guess Linux doesn't care because they probably have CMOV emulation in the kernel (can't remember, don't have any good links to that). It just seems very lazy to require an actual 686 just for dumb ol' CMOV. It's just annoying. Sure, we all want to save a few bytes, but it rarely matters, and there are too many tradeoffs. Better to keep it simple. I know lowballing with "386 only" is a bit ridiculous in this day and age, so I don't actively suggest that. But at some point the constant craving to keep up with new instructions (ahem, AVX) is a waste of time unless you really need the extra boost (or are just bored/curious, of course). I'm still somewhat sympathetic, but you have to be very careful.
rugxulo 07 May 2018, 17:56
Okay, so a simple example (compiled by Oberon-M 1.2): 8086 output is 10011 bytes while 186 output is 9663 bytes. So that's only a 348-byte size difference (which is almost nothing when UPX'd, only 90 bytes). And yet the 186 code is roughly 50% slower.
A quick disassembly shows this: Quote:
So rarely does it use more than one nesting level, and surprisingly (well, to me) no nesting level of 0. So, even without the "8086" compiler option, I could just manually unnest procedures and thus avoid the penalty. That's what I've done for two other compilers that use ENTER/LEAVE. In fairness, nested procedures are nice, but with proper units/modules, they're less useful. (Of course, there are various other causes of slowdown, too. Usually overall slowdown has little to do with specific instructions. Prefer better algorithms, simpler code, less code, not recalculating values, smarter loops, buffering, short-circuiting, careful heap use, appropriately-sized types, etc.)
Furs 07 May 2018, 20:04
Well, Linus is both right and wrong with respect to cmov. Even a predictable branch will waste some of the predictor tables. And also, if the branch is not predictable, it's like the branch "cmov" is stalling the CPU. I do hate that cmov doesn't have an "immediate" form though. Seriously.
And when you don't actually need a "mov", branches or "other tricks" are usually smaller than cmov (which needs some preprocessing to put the value in a register).
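For readers wondering what a non-cmov "other trick" can look like, here is a minimal sketch in C. Which tricks Furs has in mind isn't stated, so the mask-based select below is only one assumed example, and the function names are made up for illustration. Whether a compiler turns either version into a jump, a cmov, or something else depends on the compiler and flags.

```c
#include <stdint.h>

/* Plain conditional: the compiler is free to emit a branch or a cmov here. */
static int32_t select_branch(int cond, int32_t a, int32_t b)
{
    if (cond)
        return a;
    return b;
}

/* Mask trick: picks a or b with AND/OR only, so the source itself
 * asks for neither a branch nor a cmov. */
static int32_t select_mask(int cond, int32_t a, int32_t b)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);

    /* Converting back to int32_t assumes the usual two's complement
     * behaviour that mainstream compilers provide. */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Both functions return `a` when `cond` is nonzero and `b` otherwise, so one can be swapped for the other when comparing code size or speed.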
Melissa 08 May 2018, 00:22
Considering that cmov was created to avoid branches, I guess that future-proof programs can use it.
If it is not good now, it will be in the future. Same for the `gather` instruction. On Haswell it is not at all faster than a series of movs, but on future processors, who knows?
revolution 08 May 2018, 01:18
All the talk about the speed of cmov is CPU and code dependent. Just because it was slower in a synthetic test on a CPU in 2007 doesn't mean it is still the same 11 years later on real code on a newer CPU. And it depends upon where you use it also. There is no absolute last word on speed. If one person experiences a slowdown in their code on their CPU, that may or may not mean you will also experience a slowdown in your code on your CPU. You might even see an improvement. Don't assume, test it.
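As a rough illustration of "test it", the sketch below times two variants of the same operation over your own data. Everything here (`max_branch`, `max_mask`, `time_variant`) is a hypothetical helper written for this example, not something from the thread, and `clock()` is a coarse timer, so treat the numbers as a first hint only. The compiler may also compile both variants to the same instructions, so check the disassembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef int32_t (*max_fn)(int32_t, int32_t);

/* Variant 1: ordinary conditional (may become a branch or a cmov). */
static int32_t max_branch(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

/* Variant 2: mask-based select written without a conditional. */
static int32_t max_mask(int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Applies fn to each adjacent pair in data, `rounds` times over.
 * Stores the sum of results in *checksum (so the two variants can be
 * compared for correctness) and returns the CPU time used in seconds. */
static double time_variant(max_fn fn, const int32_t *data, size_t n,
                           int rounds, int64_t *checksum)
{
    int64_t sum = 0;
    clock_t start = clock();

    for (int r = 0; r < rounds; r++)
        for (size_t i = 0; i + 1 < n; i++)
            sum += fn(data[i], data[i + 1]);

    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `time_variant` once per variant on the same array, confirm the checksums match, then compare the times on the CPU you actually care about. Data that resembles your real input matters, since a predictable pattern and a random one can give opposite results.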
rugxulo 09 May 2018, 06:22
revolution wrote: "Just because it was slower in a synthetic test on a CPU in 2007 doesn't mean it is still the same 11 years later on real code on a newer CPU."
My point was that CMOV isn't worth much. We assume it's "better" by default because it's "new", which isn't true. The savings are minimal, at best. We need to be more pessimistic about "new" extensions. Make it optional!
revolution wrote: "Don't assume, test it."
Test that newer stuff is even worth it before making 686-only binaries. That's my complaint, that so many binaries gained nothing with it, yet the incompatibility for older CPUs was still there. Too much emphasis is put on targeting "new" when "old" still works fine (or even better! faster!). Of course, non-lazy developers can use CPUID or make separate binaries for each target. But some people don't do that (ugh).
fasmnewbie 09 May 2018, 07:15
Makes me wonder if CMOV is supported by default by ALL "mainstream" 64-bit CPUs. I think it is, but not sure. Should be.
rugxulo 09 May 2018, 08:02
AFAIK, CMOV is on (almost??) all PPro/P6 or newer CPUs since 1995! Check CPUID first (duh).
FYI, if this tells you anything: Differences between AMD64 and Intel 64
fasmnewbie 09 May 2018, 08:38
Interesting link, rugx.
Still gives me that similar POPCNT kind of paranoia, if you know what that means.
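Since the thread keeps coming back to "check CPUID first", here is a small sketch of what that check can look like in C. The feature bits are the documented ones for CPUID leaf 1 (CMOV is EDX bit 15, POPCNT is ECX bit 23). The helper names are made up for this example, and the `query_leaf1` wrapper assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`. Other compilers need their own way of running CPUID.

```c
#include <stdint.h>

/* CPUID leaf 1 feature bits */
#define CPUID1_EDX_CMOV   (UINT32_C(1) << 15)
#define CPUID1_ECX_POPCNT (UINT32_C(1) << 23)

/* Decode helpers: take the raw register values returned by CPUID leaf 1. */
static int has_cmov(uint32_t leaf1_edx)
{
    return (leaf1_edx & CPUID1_EDX_CMOV) != 0;
}

static int has_popcnt(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID1_ECX_POPCNT) != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Assumption: GCC/Clang on x86. Fills *ecx and *edx from CPUID leaf 1
 * and returns 1, or returns 0 if that leaf is not available. */
static int query_leaf1(uint32_t *ecx, uint32_t *edx)
{
    unsigned int a, b, c, d;

    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    *ecx = c;
    *edx = d;
    return 1;
}
#endif
```

A program would call `query_leaf1` once at startup and pick its CMOV or POPCNT code path only when the matching helper returns nonzero, falling back to plain code otherwise.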
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.