flat assembler
Message board for the users of flat assembler.

Use many function parameters

rugxulo 05 May 2018, 23:06
I think I read about the Website Obesity Crisis on OS News. I haven't re-read it, but I think the point was that as machines have gotten faster and gained more RAM, websites (and software) have grown more bloated to match. (Seriously, I pity anyone on dialup; it sounds almost totally unusable.)

Wirth's Law wrote:

https://en.wikipedia.org/wiki/Niklaus_Wirth

In 1995, he popularized the adage now known as Wirth's law: "Software is getting slower more rapidly than hardware becomes faster." In his 1995 paper A Plea for Lean Software he attributes it to Martin Reiser.


I'm not against optimizations of any kind, but calling seven bytes "huge" is wrong. It's a rounding error. Even in DOS, you can rarely do anything useful in seven bytes.

The bigger culprits are elsewhere: cluster size, alignment, inlined procedures, the lack of a good smartlinker, sloppy code, support for too many options, too much help, bloated UIs, too many features, too many formats, etc.

There are so many other problems beyond x86 encodings (which are already somewhat compact but could be dozens of times smaller). In fact, Forth brags about much smaller code, and they're right!

rugxulo 05 May 2018, 23:14
Furs wrote:

7 bytes is barely "barely", it's a huge amount of code.


But your sentence itself takes 55 bytes.

Furs wrote:

rugxulo wrote:
It always saves those few bytes. Doesn't matter the total amount of memory, it's a measurable thing, which is always saved. :P



Yes, it's an objective saving, which is (usually) good ... except when you're rounding up to cluster size anyway. Besides, I'm not denying the usefulness of ENTER/LEAVE, but when you lose 8086 compatibility (admittedly, rarely important) and run much slower (on newer CPUs, at least mine), then it's "barely" worth it. Oberon-M 1.2 for DOS added optional "8086" support, and the output is only 5% bigger, which isn't much (especially since 16-bit code is already small, although even that is somewhat suboptimal in normal use).
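
A small sketch of the cluster-size point, with made-up numbers: the file sizes and the 512-byte cluster below are assumptions for illustration, not measurements from this thread. Files occupy whole clusters, so a 7-byte saving changes nothing on disk unless it happens to cross a cluster boundary, in which case it saves a whole cluster.

```python
def on_disk(size_bytes, cluster_bytes):
    """Bytes a file occupies once rounded up to whole clusters."""
    clusters = -(-size_bytes // cluster_bytes)  # integer ceiling division
    return clusters * cluster_bytes

# Hypothetical sizes of one program before and after a 7-byte saving,
# stored with an assumed 512-byte cluster.
for before, after in [(10000, 9993), (10245, 10238)]:
    print(before, after, on_disk(before, 512), on_disk(after, 512))
# 10000 9993 10240 10240   (no difference on disk)
# 10245 10238 10752 10240  (crossed a boundary: one whole cluster saved)
```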

The point isn't that savings are bad or that you shouldn't make them. It's not even about the incidental tradeoffs and circumstances that shrink the savings. It's that most people don't care. Hey, I'm sympathetic, but even I wouldn't (necessarily) spend time trying to save 1 KB on an 8 MB machine. Unless it really, really mattered or I was ultra bored, I "might" not bother. Of course, in the quest for perfection, everything counts. But nowadays we're at the mercy of our compilers, libraries, APIs, OSes, networks, and a billion other volatile dependencies.

But we have bigger problems in software than shaving a few bytes.

rugxulo 05 May 2018, 23:16
Things That Turbo Pascal is Smaller Than (specifically, TP3 for DOS)

Furs 05 May 2018, 23:42
Well, you could always automate this with a smart macro or a compiler plugin (I've actually written one that encodes with ENTER in such cases; it needs expanding a bit in other areas, though).

Then it's "worth" it, since it happens fully automatically behind your back. A "write once, enjoy forever" kind of thing is always worth it once it's written... ;)
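
A minimal sketch of that kind of automatic choice, written in Python rather than as a real fasm macro or compiler plugin (`frame_code` and `sub_sp` are hypothetical names, not Furs's code). It assumes 16-bit code, models only nesting levels 0 and 1, and uses the standard encodings: ENTER imm16,imm8 is 4 bytes (C8 iw ib), LEAVE is 1 byte (C9), `push bp`/`pop bp` are 1 byte each, `mov bp, sp`/`mov sp, bp` are 2 bytes each, and `sub sp, N` is 3 bytes with a sign-extended imm8 or 4 bytes with an imm16.

```python
# Hypothetical prologue/epilogue chooser for 16-bit code (an illustration,
# not Furs's actual plugin). Only nesting levels 0 and 1 are modelled:
# deeper levels make ENTER copy extra frame pointers, not expanded here.

def sub_sp(locals_size):
    """Return (instructions, byte_count) for reserving the locals."""
    if locals_size == 0:
        return [], 0
    if locals_size <= 127:
        return [f"sub sp, {locals_size}"], 3  # 83 EC ib (sign-extended imm8)
    return [f"sub sp, {locals_size}"], 4      # 81 EC iw (imm16)

def frame_code(locals_size, level, allow_186=True):
    """Return (prologue, epilogue, total_bytes), picking the shorter form."""
    if level not in (0, 1):
        raise ValueError("only nesting levels 0 and 1 are modelled")
    if not 0 <= locals_size <= 0xFFFF:
        raise ValueError("locals size must fit in 16 bits")

    # 8086-compatible expansion of ENTER locals_size, level
    sub_text, sub_bytes = sub_sp(locals_size)
    prologue = ["push bp", "mov bp, sp"]  # 55 / 8B EC: 1 + 2 bytes
    pro_bytes = 3
    if level == 1:
        prologue.append("push bp")        # 55: the frame pointer ENTER N,1 pushes
        pro_bytes += 1
    prologue += sub_text
    pro_bytes += sub_bytes

    # 8086-compatible expansion of LEAVE
    if level == 0 and locals_size == 0:
        # Nothing below BP; assuming a stack-balanced body, SP == BP already.
        epilogue, epi_bytes = ["pop bp"], 1                # 5D
    else:
        epilogue, epi_bytes = ["mov sp, bp", "pop bp"], 3  # 8B E5 / 5D
    size_8086 = pro_bytes + epi_bytes

    size_186 = 4 + 1  # ENTER imm16,imm8 (C8 iw ib) + LEAVE (C9)
    if allow_186 and size_186 < size_8086:
        return [f"enter {locals_size}, {level}"], ["leave"], size_186
    return prologue, epilogue, size_8086

print(frame_code(0, 0))  # (['push bp', 'mov bp, sp'], ['pop bp'], 4)
print(frame_code(2, 1))  # (['enter 2, 1'], ['leave'], 5)
```

Under this model, ENTER/LEAVE saves 2 to 6 bytes per procedure, except for a level-0 procedure with no locals, where the plain 8086 form is a byte shorter. Passing `allow_186=False` forces the 8086 form, which is the "optional 8086 support" trade in miniature.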

rugxulo 07 May 2018, 17:43
I also forgot that ENTER/LEAVE is buggy/broken in some non-mainstream emulators (Fake86, 8086tinyplus).

This reminds me of the dumb CMOV optimization that GCC emits for -march=i686 or higher. Linus once wrote about its flaws. Basically: only use it in non-performance-critical code or when you need the absolute smallest size.

But things like this lose CPU compatibility (rarely important these days, but still worth noting), plus the smaller code makes binary patching harder (more or less requiring a full rebuild), if it's ever needed. Several DJGPP-built .EXEs were accidentally compiled with CMOV, and that accident hurts on older computers (or even DOSBox, which only emulates up to a 486/586). Rebuilding things from scratch is rarely as easy as it should be.

I guess Linux doesn't care because they probably have CMOV emulation in the kernel (can't remember, don't have any good links to that). It just seems very lazy to require an actual 686 just for dumb ol' CMOV.

It's just annoying. Sure, we all want to save a few bytes, but it rarely matters, and there are too many tradeoffs. Better to keep it simple. I know lowballing with "386 only" is a bit ridiculous in this day and age, so I don't actively suggest that. But at some point the constant craving to keep up with new instructions (ahem, AVX) is a waste of time unless you really need the extra boost (or are just bored or curious, of course). I'm still somewhat sympathetic, but you have to be very careful.

rugxulo 07 May 2018, 17:56
Okay, here's a simple example (compiled by Oberon-M 1.2): the 8086 output is 10011 bytes, while the 186 output is 9663 bytes. That's a difference of only 348 bytes (almost nothing once UPX'd: only 90 bytes). And yet the 186 code is roughly 50% slower.
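
The size arithmetic behind those figures, using only the two uncompressed sizes quoted in this post (the UPX figure is not recomputed here):

```python
size_8086 = 10011  # bytes, Oberon-M 1.2 output with the "8086" option
size_186 = 9663    # bytes, default 186 output (uses ENTER/LEAVE)

diff = size_8086 - size_186
print(diff)                               # 348
print(round(100 * diff / size_186, 1))    # 3.6  (8086 build is ~3.6% bigger)
print(round(100 * diff / size_8086, 1))   # 3.5  (186 build is ~3.5% smaller)
```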

A quick disassembly shows this:

Quote:

1 enter 0x0,0x3
1 enter 0x10e,0x2
1 enter 0x18,0x1
1 enter 0x4,0x1
1 enter 0x8,0x1
1 enter 0xc,0x1
2 enter 0x4,0x2
3 enter 0x0,0x2
8 enter 0x2,0x1
33 enter 0x0,0x1


So it rarely uses more than one nesting level, and, surprisingly (well, to me), it never uses nesting level 0.
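
Tallying the disassembly listing by nesting level (the second ENTER operand) backs this up. The tuples below are transcribed by hand from the quote, with the occurrence count first:

```python
from collections import Counter

# (count, locals size, nesting level), copied from the ENTER histogram above
enters = [
    (1, 0x0, 3), (1, 0x10E, 2), (1, 0x18, 1), (1, 0x4, 1), (1, 0x8, 1),
    (1, 0xC, 1), (2, 0x4, 2), (3, 0x0, 2), (8, 0x2, 1), (33, 0x0, 1),
]

by_level = Counter()
for count, _size, level in enters:
    by_level[level] += count

total = sum(by_level.values())
print(dict(sorted(by_level.items())), total)            # {1: 45, 2: 6, 3: 1} 52
print(f"{100 * by_level[1] / total:.0f}% at level 1")   # 87% at level 1
```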

So, even without the "8086" compiler option, I could just manually unnest procedures and thus avoid the penalty. That's what I've done for two other compilers that use ENTER/LEAVE. In fairness, nested procedures are nice, but with proper units/modules, they're less useful.
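
What "manually unnesting" a procedure means at the source level can be sketched in Python (the function names are hypothetical; the programs in question are Oberon). The transformation is usually called lambda lifting: whatever the nested procedure read from its enclosing procedure becomes an explicit parameter, and whatever it modified is returned, so it no longer needs to reach into an enclosing procedure's stack frame.

```python
# Nested version: add() reads 'factor' and updates 'total' in the
# enclosing procedure's scope.
def sum_scaled_nested(values, factor):
    total = 0

    def add(v):
        nonlocal total
        total += v * factor

    for v in values:
        add(v)
    return total

# Unnested (lambda-lifted) version: the outer variables are passed in as
# parameters and the updated value is returned instead of written back.
def add_scaled(total, v, factor):
    return total + v * factor

def sum_scaled_flat(values, factor):
    total = 0
    for v in values:
        total = add_scaled(total, v, factor)
    return total

print(sum_scaled_nested([1, 2, 3], 4), sum_scaled_flat([1, 2, 3], 4))  # 24 24
```

The cost of the flat version is a longer parameter list per call, which is the usual trade made when nested procedures are removed.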

(Of course, there are various other causes of slowdown, too. Usually overall slowdown has little to do with specific instructions. Prefer better algorithms, simpler code, less code, not recalculating values, smarter loops, buffering, short-circuiting, careful heap use, appropriately-sized types, etc.)

Furs 07 May 2018, 20:04
Well, Linus is both right and wrong with respect to cmov. Even a predictable branch uses up some of the branch predictor's tables. And if the branch is not predictable, then it's the branch, not cmov, that stalls the CPU. I do hate that cmov doesn't have an "immediate" form, though. Seriously.

And when you don't actually need a "mov", branches or "other tricks" are usually smaller than cmov (which needs an extra instruction beforehand to put the value in a register).
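
To put byte counts on this, here are hand-assembled 32-bit encodings for "if ZF is set, set EAX to 5". The scenario is an assumption for illustration: the flags were set by an earlier compare, and ECX is free to clobber. Because CMOV has no immediate form, the constant has to be loaded into a register first:

```python
# Assumed scenario: 32-bit code, "if ZF is set, EAX = 5", ECX free to clobber.

# CMOV form: no immediate operand exists, so load the constant first.
cmov_form = bytes.fromhex(
    "B9 05 00 00 00 "  # mov ecx, 5
    "0F 44 C1"         # cmovz eax, ecx
)

# Branch form: jump over the move when ZF is clear.
branch_form = bytes.fromhex(
    "75 05 "           # jnz short +5 (skips the 5-byte mov)
    "B8 05 00 00 00"   # mov eax, 5
)

print(len(cmov_form), len(branch_form))  # 8 7
```

The gap is only a byte here; the point is that the missing immediate form forces an extra load that the branch version never needs.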

Melissa 08 May 2018, 00:22
Considering that cmov was created to avoid branches, I guess future-proof programs can use it.
If it isn't good now, it will be in the future. Same for the `gather` instruction. On Haswell it is not at all faster than a series of movs, but on future processors, who knows?

revolution 08 May 2018, 01:18
All the talk about the speed of cmov is CPU- and code-dependent. Just because it was slower in a synthetic test on a CPU in 2007 doesn't mean it is still the same 11 years later on real code on a newer CPU. And it also depends on where you use it. There is no absolute last word on speed. If one person experiences a slowdown in their code on their CPU, that may or may not mean you will also experience a slowdown in your code on your CPU. You might even see an improvement. Don't assume, test it.

rugxulo 09 May 2018, 06:22
revolution wrote:
Just because it was slower in a synthetic test on a CPU in 2007 doesn't mean it is still the same 11 years later on real code on a newer CPU.


My point was that CMOV isn't worth much. We assume it's "better" by default because it's "new", which isn't true. The savings are minimal, at best. We need to be more pessimistic about "new" extensions. Make it optional!

revolution wrote:
Don't assume, test it.


Test that newer stuff is even worth it before making 686-only binaries. That's my complaint, that so many binaries gained nothing with it, yet the incompatibility for older cpus was still there. Too much emphasis put on targeting "new" when "old" still works fine (or even better! faster!).

Of course, non-lazy developers can use CPUID or build separate binaries for each target. But some people don't do that (ugh).
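
A minimal sketch of the CPUID approach: choose an implementation once, at startup, from the feature bits. CMOV support is reported in bit 15 of EDX for CPUID leaf 1. Python cannot execute CPUID itself, so this sketch assumes the EDX value has already been obtained (for example by a small assembly stub), and `select_branchless` is only a branch-free stand-in for a real CMOV code path; all function names are hypothetical.

```python
# CPUID leaf 1, EDX bit 15 reports CMOV support.
CPUID1_EDX_CMOV = 1 << 15

def select_branchy(cond, a, b):
    """Portable path: an ordinary branch, valid on any x86 CPU."""
    if cond:
        return a
    return b

def select_branchless(cond, a, b):
    """Stand-in for a CMOV code path: picks a or b with a mask (integers only)."""
    mask = -int(bool(cond))           # -1 (all bits set) if cond, else 0
    return (a & mask) | (b & ~mask)

def pick_select(cpuid1_edx):
    """Choose an implementation once, from the already-read EDX value."""
    if cpuid1_edx & CPUID1_EDX_CMOV:
        return select_branchless
    return select_branchy

select = pick_select(0)      # e.g. an older CPU that does not report CMOV
print(select(True, 7, 9))    # 7
```

Callers only ever use `select`, so the same binary runs on CPUs with and without CMOV; the separate-binaries alternative moves this decision to build time instead.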

fasmnewbie 09 May 2018, 07:15
Makes me wonder if CMOV is supported by default by ALL "mainstream" 64-bit CPUs. I think it is, but not sure. Should be.

rugxulo 09 May 2018, 08:02
AFAIK, CMOV is on (almost??) all PPro/P6 or newer CPUs since 1995! Check CPUID first (duh). :P

FYI, if this tells you anything: Differences between AMD64 and Intel 64

fasmnewbie 09 May 2018, 08:38
Interesting link, rugx.
Still gives me that same POPCNT kind of paranoia, if you know what I mean.
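
The same defensive pattern applies to POPCNT: it has its own CPUID flag (leaf 1, ECX bit 23), so portable code checks that bit and otherwise falls back to a software count. A sketch of the check plus the classic SWAR bit-count fallback for 32-bit values; the function names are hypothetical, the ECX value is assumed to come from elsewhere, and `popcount32_hw` merely stands in for a path that would use the POPCNT instruction.

```python
# CPUID leaf 1, ECX bit 23 reports POPCNT support.
CPUID1_ECX_POPCNT = 1 << 23

def popcount32_swar(x):
    """Software fallback: classic SWAR bit count of a 32-bit value."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                 # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # 8-bit partial sums
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24    # add the four byte sums

def popcount32_hw(x):
    """Stand-in for a code path that would use the POPCNT instruction."""
    return bin(x & 0xFFFFFFFF).count("1")

def pick_popcount(cpuid1_ecx):
    """Use the 'hardware' path only when the CPU reports POPCNT."""
    if cpuid1_ecx & CPUID1_ECX_POPCNT:
        return popcount32_hw
    return popcount32_swar

print(pick_popcount(0)(0xF0F0))  # 8
```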

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.