flat assembler
Message board for the users of flat assembler.

Index > Heap > HLLs suck!

f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
DOS386 wrote:
vid wrote:
Did you ever hear about optimisation by aligning loops and funcs?
Did you ever look at FASM source ? I can't find such "optimisations" inside
Perhaps because FASM isn't particularly optimized, and written for old CPUs?

DustWolf wrote:
f0dder wrote:
Would you consider your usage scenario something that standard users of Office do, though?
I would call it a relevant real-life example. I have no idea what other users of Office tend to use it for, although it is perfectly possible they don't use it for stuff it obviously doesn't do well. Such as getting real work done, for example.
My point is that a piece of software is going to be targeted at what most people use it for - and it sounds like you are a bit on the heavy end.

I'm not a big fan of the degeneration MS Office, and I'm not a particularly big fan of OpenOffice either - both are bloated and slow. At least in the case of MSO you can still save in the non-XML formats, which are much faster when working with big documents (OOo has partial .doc support, but it's not exactly über-stable nor feature-complete).

DustWolf wrote:
My bracketed statements are not provided solely for the entertainment of those who agree with me. DO read the manual and its recommendation regarding -O3 optimization. It tends to break working code (then again, I read that manual a few years back... It might have gotten lost by now. Usually the recommendation not to use -O3 is included in READMEs of software sources).
In other words, GCC is (or has been) buggy, and you use that as an attack on *languages* rather than that specific compiler? And yes, I've seen notes about O3 in various places; the most recent place is the Gentoo docs:
Quote:
Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
...so, either GCC is broken, or a lot of Linux software is. My bet is it's a combination of the two :)

DustWolf wrote:
f0dder wrote:
String-based switch statement? care to elaborate?
Yep. May no longer be relevant as I only made that particular mistake once. In MSVC's superior optimization skills, a switch statement is compiled into an array of jumps, where the index into this array corresponds to the input value (an ingenious way to break a perfectly good optimization method for switch statements, if you ask me, but then again this is Microsoft).
Interesting, which compiler version was this? None of the VC versions I've tried support switching on strings (neither char* nor std::string). And the switch optimizations I've seen have constructed Jcc binary search trees... can't remember if the compiler constructs a directly indexed jumptable in some circumstances.

Btw, Windows tends to be compiled with a special build of the compiler... and who knows which compiler settings they use.
Post 27 Dec 2009, 12:58
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
DustWolf wrote:
The HLL is obviously designed in such a way that enables bugs like those. Saying that a list of real-life bugs does not apply to the language because the flawless-world theory does not imply their existence is bizarre to say the least.
Using that logic then everything sucks. Because there is always someone to screw it up in some way. What about all the crappy assemblers out there that screw up and break people's code, does that mean the assembly *language* sucks? Clearly assembly is "designed in such a way that enables bugs like those".
Post 27 Dec 2009, 13:47
DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
f0dder wrote:
Would you consider your usage scenario something that standard users of Office do, though?
I would call it a relevant real-life example. I have no idea what other users of Office tend to use it for, although it is perfectly possible they don't use it for stuff it obviously doesn't do well. Such as getting real work done, for example.
My point is that a piece of software is going to be targeted at what most people use it for - and it sounds like you are a bit on the heavy end.

And my point is that it's a classic example of a program gone bloated due to the way in which it was developed, where the developers can afford it and the end result is unusable in a real-life situation.

Quote:
I'm not a big fan of the degeneration MS Office, and I'm not a particularly big fan of OpenOffice either - both are bloated and slow. At least in the case of MSO you can still save in the non-XML formats, which are much faster when working with big documents (OOo has partial .doc support, but it's not exactly über-stable nor feature-complete).


I don't think those actually work faster, because they need to be converted first.

Quote:
And yes, I've seen notes about O3 in various places; the most recent place is the Gentoo docs:
Quote:
Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
...so, either GCC is broken, or a lot of Linux software is. My bet is it's a combination of the two :)


I agree. I also think it's a real-life, practical example of what happens when you develop programs in C. That it happens on a different level, one you can excuse, doesn't make the problem any less relevant.

Quote:
Interesting, which compiler version was this? None of the VC versions I've tried support switching on strings (neither char* nor std::string). And the switch optimizations I've seen have constructed Jcc binary search trees... can't remember if the compiler constructs a directly indexed jumptable in some circumstances.


In all fairness, I don't remember. I just remember I did it once and was scratching my head for a while, until I ran the binary through a debugger and found the super-padded jump table.
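To illustrate how a directly indexed jump table can end up "super-padded", here is a minimal Python sketch of the arithmetic only. It is not a description of what MSVC or any other compiler actually does; the function names and the sample case values are made up for illustration.

```python
def jump_table_stats(case_values):
    """Return (slots, used, density) for a table indexed by value - min(case_values).

    Every slot that does not correspond to a real case has to be filled
    with the address of the default branch, i.e. it is padding.
    """
    lo, hi = min(case_values), max(case_values)
    slots = hi - lo + 1
    used = len(set(case_values))
    return slots, used, used / slots


def compare_tree_depth(n_cases):
    """Rough worst-case depth of a balanced compare-and-branch tree over n_cases values."""
    return n_cases.bit_length()


dense = [0, 1, 2, 3, 4, 5]       # hypothetical case labels, no gaps
sparse = [1, 10, 100, 1000]      # hypothetical case labels, large gaps

print(jump_table_stats(dense))   # (6, 6, 1.0)
print(jump_table_stats(sparse))  # 1000 slots for 4 real cases
print(compare_tree_depth(len(sparse)))
```

With the sparse labels almost every table entry points at the default branch, which is where a tree of compares and conditional jumps (the Jcc search tree mentioned in the quote) becomes the more compact choice.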

Quote:
Btw, Windows tends to be compiled with a special build of the compiler... and who knows which compiler settings they use.


No doubt.

LP,
Jure
Post 27 Dec 2009, 17:39
DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
revolution wrote:
DustWolf wrote:
The HLL is obviously designed in such a way that enables bugs like those. Saying that a list of real-life bugs does not apply to the language because the flawless-world theory does not imply their existence is bizarre to say the least.
Using that logic then everything sucks. Because there is always someone to screw it up in some way. What about all the crappy assemblers out there that screw up and break people's code, does that mean the assembly *language* sucks? Clearly assembly is "designed in such a way that enables bugs like those".


If FASM assembler was broken, I'd say FASM sucks.

Because C compilers are broken, I say C sucks.

Wow. :o

Oh... and by the way, GAS sucks.

LP,
Jure
Post 27 Dec 2009, 17:42
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
DOS386 wrote:
vid wrote:
Did you ever hear about optimisation by aligning loops and funcs?
Did you ever look at FASM source ? I can't find such "optimisations" inside

You should learn about optimizations from something other than the FASM source. That would prevent you from embarrassing yourself by calling an optimization "crappy code".
Post 27 Dec 2009, 18:49
DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
vid wrote:
DOS386 wrote:
vid wrote:
Did you ever hear about optimisation by aligning loops and funcs?
Did you ever look at FASM source ? I can't find such "optimisations" inside

You should learn about optimizations from something other than the FASM source. That would prevent you from embarrassing yourself by calling an optimization "crappy code".


Isn't aligning code and loops dependent on the pipeline length (or I/O speed if you use that, but you probably don't)? If yes, what exactly do you optimize your code for? 15? 30? It changes so much from CPU to CPU, I doubt it's worth it.

LP,
Jure
Post 27 Dec 2009, 18:55
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Agner Fog's Optimizing Assembly wrote:
11.4 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important
subroutine entry or jump label happens to be near the end of a 16-byte block then the
microprocessor will only get a few useful bytes of code when fetching that block of code. It
may have to fetch the next 16 bytes too before it can decode the first instructions after the
label. This can be avoided by aligning important subroutine entries and loop entries by 16.
Aligning by 8 will assure that at least 8 bytes of code can be loaded with the first instruction
fetch, which may be sufficient if the instructions are small. We may align subroutine entries
by the cache line size (typically 64 bytes) if the subroutine is part of a critical hot spot and
the preceding code is unlikely to be executed in the same context.
A disadvantage of code alignment is that some cache space is lost to empty spaces before
the aligned code entries.
In most cases, the effect of code alignment is minimal. So my recommendation is to align
code only in the most critical cases like critical subroutines and critical innermost loops.
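The arithmetic behind the quoted passage can be sketched in a few lines of Python. This assumes the 16-byte fetch block mentioned in the quote; the function names and the sample addresses are hypothetical and only serve the illustration.

```python
def padding_to_align(address, alignment=16):
    """Bytes of padding needed so that `address` becomes a multiple of `alignment`."""
    return (-address) % alignment


def useful_bytes_in_first_fetch(address, block=16):
    """Bytes from `address` up to the end of the aligned fetch block that contains it."""
    return block - (address % block)


# Hypothetical loop label sitting 3 bytes before the end of a 16-byte block:
label = 0x40100D
print(useful_bytes_in_first_fetch(label))   # 3 useful bytes in the first fetch
print(padding_to_align(label))              # 3 bytes of padding would align it by 16

# Aligning by 8 instead guarantees at least 8 useful bytes in the first fetch:
other = 0x401005
aligned8 = other + padding_to_align(other, 8)
print(useful_bytes_in_first_fetch(aligned8))  # 8
```

The padding bytes are the "empty spaces" the quote counts as lost cache space.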
Post 27 Dec 2009, 19:16
DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
Okay... so why again should one align with a superfluous instruction? Like why risk a dependency problem when you can align with nops without this danger?

LP,
Jure
Post 27 Dec 2009, 19:23
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
He does say the effect is minimal and in fact, I don't see *any* part of the Windows OS being critical.

Not to mention, doesn't it apply only for the *first* iteration? If so, what's the big deal? Critical loops are slow because they loop a lot of times, not because the first iteration is 1 clock cycle slower...

alignment optimization sucks.
Post 27 Dec 2009, 20:25
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
Quote:
Not to mention doesn't it apply only for the *first* iteration?

No, why should it?
Post 27 Dec 2009, 21:53
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
DustWolf wrote:
And my point is that it's a classic example of a program gone bloated due to the way in which it was developed, where the developers can afford it and the end result is unusable in a real-life situation.
In one real-life situation, yes, but not in all - it works fine for others. While saving in the XML-based formats is slow (MSO as well as OOo), I've used both of the pigs to handle 100+ page documents with lots of tables and formatting. That page count might not be a lot in the grand scheme of things, but I wager that it's more than what a lot of people deal with.

DustWolf wrote:
f0dder wrote:
I'm not a big fan of the degeneration MS Office, and I'm not a particularly big fan of OpenOffice either - both are bloated and slow. At least in the case of MSO you can still save in the non-XML formats, which are much faster when working with big documents (OOo has partial .doc support, but it's not exactly über-stable nor feature-complete).
I don't think those actually work faster, because they need to be converted first.
Saving in a binary blob format is a lot faster than spitting out a huge chunk of (horribly formatted!) XML which is then zipped... which is how both OOXML and ODF work. DOC might not be perfect, but at least only changed regions get flushed to disk, not the entire document stream.

Don't get me wrong, XML is (in theory - not in OOXML nor ODF) a decent exchange format. But it's not the best native format for an office suite - at least not in the way MSO and OOo work.

DustWolf wrote:
I agree. I also think it's a real-life, practical example of what happens when you develop programs in C. That it happens on a different level, one you can excuse, doesn't make the problem any less relevant.
I can't recall ever being bitten by a compiler bug doing C or C++. I know Visual C++ has had its share, and VC6 also had a whole bunch of conformance issues (but that was 1998...), but I honestly can't recall being bitten by any. Not with GCC either. But I've had weird problems when writing quirky code or using language features I didn't fully understand at the time - C++ is a complex beast, and I'm not going to claim it's a perfect language.

DustWolf wrote:
f0dder wrote:
Interesting, which compiler version was this? None of the VC versions I've tried support switching on strings (neither char* nor std::string). And the switch optimizations I've seen have constructed Jcc binary search trees... can't remember if the compiler constructs a directly indexed jumptable in some circumstances.
In all fairness, I don't remember. I just remember I did it once and was scratching my head for a while, until I ran the binary through a debugger and found the super-padded jump table.
Are you sure it was Visual C++, then? And that the result was from a switch statement, and not some automatically generated or manually coded junk?
Post 27 Dec 2009, 22:03
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
vid wrote:
Quote:
Not to mention doesn't it apply only for the *first* iteration?

No, why should it?
Because after that isn't it loaded into the cache?

_________________
Previously known as The_Grey_Beast
Post 27 Dec 2009, 22:48
DOS386



Joined: 08 Dec 2006
Posts: 1901
vid wrote:
DOS386 wrote:
vid wrote:
Did you ever hear about optimisation by aligning loops and funcs?
Did you ever look at FASM source ? I can't find such "optimisations" inside

You should learn about optimizations from something other than the FASM source. That would prevent you from embarrassing yourself by calling an optimization "crappy code".


NO.

1. There is more "crappy code" in the examples than just excessive aligns. Why did you "skip" the other problems ?

2. How much benefit does such aligning give ? Any real-world test ?

3. Aligning might itself speed up the loop (minimally), but the cost:

a. it increases BLOAT
b. because of a. , increased risk of near jumps rather than short -> even more BLOAT
c. because of a. and b. , less code fits into inner cache -> more accesses to outer cache and memory -> slow down
d. if paging is enabled, because of a. and b. , less code fits into one page -> more pages needed, more page table lookups -> slow down
e. won the "optimization" battle ... or not :?
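Point b. can be made concrete with a small Python sketch. It uses the 32-bit-mode x86 encodings (2 bytes for a rel8 jump, 5 bytes for `jmp rel32`, 6 bytes for `Jcc rel32`); the function name and the sample distances are made up, and the displacement is taken as measured from the end of the jump instruction.

```python
def jump_size(displacement, conditional=False):
    """Encoded size in bytes of a relative x86 jump in 32-bit mode.

    `displacement` is the signed distance from the end of the jump
    instruction to its target.
    """
    if -128 <= displacement <= 127:
        return 2                       # EB rel8 or 7x rel8 (short form)
    return 6 if conditional else 5     # 0F 8x rel32 or E9 rel32 (near form)


# Hypothetical forward conditional jump over 120 bytes of code:
print(jump_size(120, conditional=True))        # 2

# The same jump after 12 bytes of alignment padding land in between:
print(jump_size(120 + 12, conditional=True))   # 6
```

So padding that pushes a target past the rel8 range costs the padding itself plus 3 or 4 extra bytes per affected jump.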

Quote:
Perhaps because FASM isn't particularly optimized, and written for old CPUs?


You are making Tomasz happy :)
Post 28 Dec 2009, 08:14
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
Optimisation is a process, not a formula. Every program is different. There are no hard-and-fast rules that work for everything.

This happens quite frequently:

Original code + optimisation rule 1 = faster (yay)
Original code + optimisation rule 2 = faster (yay)
Original code + optimisation rules 1 and 2 = slower (boo)

Repeat this for all combinations of the myriad of optimisation "rules" you can find lying around the 'Net and the difficulty becomes clear.

For anything non-trivial: Test it one way, test it another way, test it a third way, and so on, until you die. And you still won't be able to guarantee the fastest code.
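A minimal Python sketch of what "test every combination" means in practice. The rule names and the cost numbers in `toy_measure` are invented purely to reproduce the pattern above; in real use `measure` would build and time the actual code.

```python
from itertools import combinations


def all_rule_subsets(rules):
    """Yield every subset of `rules`, from the empty set up to all of them."""
    for k in range(len(rules) + 1):
        for subset in combinations(rules, k):
            yield frozenset(subset)


def fastest(rules, measure):
    """Return the (subset, cost) pair with the lowest measured cost."""
    return min(((s, measure(s)) for s in all_rule_subsets(rules)),
               key=lambda pair: pair[1])


def toy_measure(subset):
    """Made-up cost model: each rule helps alone, both together hurt."""
    cost = 100
    if "rule1" in subset:
        cost -= 10
    if "rule2" in subset:
        cost -= 10
    if {"rule1", "rule2"} <= subset:
        cost += 30   # invented interaction penalty
    return cost


for s in all_rule_subsets(["rule1", "rule2"]):
    print(sorted(s), toy_measure(s))
print(fastest(["rule1", "rule2"], toy_measure))
```

The number of subsets doubles with every additional rule, which is why exhaustive testing stops being practical very quickly.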
Post 28 Dec 2009, 08:23
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
Borsuc wrote:
Because after that isn't it loaded into the cache?

If I understand this correctly (optimization is not really my area), this is about the CPU fetching opcodes (from the nearest cache), which happens on every instruction execution, not about caching RAM in the L1/L2 cache. I might be wrong, but then I would be surprised that Agner and others write what they write.

DOS386 wrote:
1. There is more "crappy code" in the examples than just excessive aligns. Why did you "skip" the other problems ?

*MOST* of what you called "crappy code" was in fact optimization. With some other points I agreed, or wasn't sure. That I didn't comment on all instances changes nothing about the fact that you called an optimization "crappy code", "NOPE", and ":lol:" so many times.

Quote:
2. How much benefit does such aligning give ? Any real-world test ?

Nope. How much benefit do other optimizations you lack give? Do YOU have any real-world test on your side, when you are requesting them from me? Did you have any real-world test when you started bashing this code in the first place?

Quote:
3. Aligning might itself speed up the loop (minimally)

How do you know it is minimal? Any real-world test? Or a comparison to the other optimizations you suggest, which supposedly aren't "minimal"?

Of course I agree there are (many) cases where this kind of optimization isn't really the best idea. But that still leaves you in the embarrassment of calling this optimization "NOPE", "crappy code", and "LOL", whereas it is an incorrectly chosen optimization at best. And even now, when you know the reason for it, you can't be sure whether it is suboptimal or not.
Post 28 Dec 2009, 11:55
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Borsuc wrote:

Not to mention doesn't it apply only for the *first* iteration?

That would hardly be the case, because the first iteration is not started right after a jump; this optimization has more chance of speeding up the 2nd and later iterations than the 1st one.

DustWolf wrote:

Okay... so why again should one align with a superfluous instruction? Like why risk a dependency problem when you can align with nops without this danger?

Because that way you waste fewer cycles (and the dependency thing is not non-deterministic: you should be able to see whether the previous instructions wrote the register recently). However, note that modern processors support the multi-byte NOP, and AMD in its optimization manual recommends a train of 66h prefixes plus NOPs and gives you the patterns you should use for every amount of padding. (I guess that compilers don't frequently resort to either of them, to keep both brands evenly happy.)
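As a rough Python sketch of the prefix idea: the snippet below fills a gap with NOPs that each carry a few 66h operand-size prefixes, so fewer instructions have to be decoded than with a run of single-byte NOPs. The limit of three prefixes per NOP and the way the remainder is split are assumptions made for this illustration; the vendor manuals give their own per-length patterns, which are not reproduced here.

```python
def nop_padding(n, max_prefixes=3):
    """Return `n` bytes of padding built from 66h-prefixed NOPs (opcode 90h).

    Each emitted instruction is up to `max_prefixes` 66h bytes followed by 90h.
    `max_prefixes` is an assumed limit, not a value taken from a vendor table.
    """
    out = bytearray()
    while n > 0:
        size = min(n, max_prefixes + 1)
        out += b"\x66" * (size - 1) + b"\x90"
        n -= size
    return bytes(out)


def instruction_count(padding):
    """Number of NOP instructions in a padding produced by nop_padding."""
    return padding.count(b"\x90")


pad = nop_padding(7)
print(pad.hex())               # 66666690666690
print(instruction_count(pad))  # 2 instructions instead of 7 single-byte NOPs
```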
Post 28 Dec 2009, 16:05
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
When I view some exe in a hex editor, I see a lot of redundant 0xCCs (int3) between functions... so many in fact that I would say 25% of the freaking code is "padding".

When UPX compresses it down to a ratio of 6:1 or more it really starts to be fishy.
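A quick way to put a number on that impression, sketched in Python. It counts only runs of at least two 0xCC bytes, since a lone 0xCC may just as well be part of an operand or a real int3; the function name, the `min_run` threshold and the sample bytes are assumptions for illustration, and a real check would first have to extract the code section from the file.

```python
import re


def padding_fraction(code, filler=0xCC, min_run=2):
    """Fraction of `code` occupied by runs of at least `min_run` filler bytes."""
    if not code:
        return 0.0
    pattern = re.compile(re.escape(bytes([filler])) + b"{%d,}" % min_run)
    padded = sum(len(m.group()) for m in pattern.finditer(code))
    return padded / len(code)


# Made-up byte sequence: ret, three filler bytes, a short prologue,
# one lone 0xCC (not counted), ret.
sample = bytes.fromhex("c3 cc cc cc 55 8b ec cc c3")
print(padding_fraction(sample))
```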

_________________
Previously known as The_Grey_Beast
Post 28 Dec 2009, 20:20
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
What exe, for example?
Post 28 Dec 2009, 23:22
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Actually it was a .dll, but I don't remember. Of course I'm pretty sure it wasn't just the padding which made it 6:1 ratio obviously...

_________________
Previously known as The_Grey_Beast
Post 28 Dec 2009, 23:29
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Borsuc wrote:
When I view some exe in a hex editor, I see a lot of redundant 0xCCs (int3) between functions... so many in fact that I would say 25% of the freaking code is "padding".

When UPX compresses it down to a ratio of 6:1 or more it really starts to be fishy.
Sounds like an executable built with "edit and continue" support, which is great while developing & debugging... or it could be for alignment purposes.

_________________
carpe noctem
Post 30 Dec 2009, 00:20

Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.