flat assembler
Message board for the users of flat assembler.

Usefulness of assembly in modern projects

marywilliam



Joined: 07 Jun 2018
Posts: 10
Been reading up on compilers and hand written assembly code efficiency. It seems modern compilers are incredibly efficient at optimizing code, leaving little room for hand written assembly.

Specifically, in modern C/C++ projects, it seems gcc with -O3/-Ofast does an amazing job at producing compact and fast assembly code!

Could someone give a concrete example of where it is still better to use hand-written assembly code in a C/C++ project (perhaps with sample code)? I don't mean embedded systems or anything like that. I would like an example in a normal, simple C/C++ project, for example, solving a maze using the A* algorithm. Where could assembly programming still be advantageous in a project like that?
Post 27 Aug 2019, 00:20
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16778
Location: In your JS exploiting you and your system
It depends upon your goal. High level languages have both advantages and disadvantages. You can use them in places where they are advantageous, and avoid them where they are disadvantageous. There will always be specific examples of where an HLL is the "right" choice. But the opposite is also true.

I think it is wrong to assume that it is an all or nothing thing. Use what makes sense for the task at hand. The trick is to know which places are suited to what code. And if no one knows how to write assembly then we miss out on discovering those places where it is best used.
Post 27 Aug 2019, 00:47
Melissa



Joined: 12 Apr 2012
Posts: 70
Hand-written assembly is more compact, so using it for a library is OK. I prefer standalone libraries over inline assembly.
Post 27 Aug 2019, 05:20
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
Here is an example no compiler would produce:
https://board.flatassembler.net/topic.php?t=19103

The conversation in that thread also alludes to other work benefiting from analysis closer to the CPU than a compiler can generate. The model of the processor used by compilers is highly restrictive but sufficient for the high-level language.

_________________
¯\(°_o)/¯ unlicense.org
Post 27 Aug 2019, 07:48
edfed



Joined: 20 Feb 2006
Posts: 4209
Location: 2018
I currently use fasm to code proofs of concept or to test win32/dll functions in a way that is far simpler and faster than in C/C++.

If I do this step in C/C++ it takes 10 times longer because of the tools, makefiles, linking etc. around these languages.

When I get all I want from the function, I go to C/C++ because the client... wants C/C++.
Post 27 Aug 2019, 08:58
marywilliam



Joined: 07 Jun 2018
Posts: 10
bitRAKE wrote:
Here is an example no compiler would produce:
https://board.flatassembler.net/topic.php?t=19103

The conversation in that thread also alludes to other work benefiting from analysis closer to the CPU than a compiler can generate. The model of the processor used by compilers is highly restrictive but sufficient for the high-level language.


Thanks for the link. That was a fascinating thread to read through. You obviously have a lot of deep experience using asm, so areas for optimization pop out at you. In the other thread, it was mentioned that a compiler "could never figure out how to use an instruction like popcnt". Why is that the case? Could you explain why your solution would never be possible with a compiler? Also, do you have any books or tutorials that could help with asm optimization?
Post 27 Aug 2019, 18:05
redsock



Joined: 09 Oct 2009
Posts: 327
Location: Australia
If your point of reference is C/C++, it is worth visiting https://godbolt.org to witness often-eye-watering compiler output on a function-by-function basis.

_________________
2 Ton Digital - https://2ton.com.au/
Post 27 Aug 2019, 19:46
marywilliam



Joined: 07 Jun 2018
Posts: 10
redsock wrote:
If your point of reference is C/C++, it is worth visiting https://godbolt.org to witness often-eye-watering compiler output on a function-by-function basis.


Yes, I've been using that site for the past few days to compare c/c++ and asm code. It seems the gcc compiler already produces near optimum asm code. Do you have a simple example on godbolt of a C/C++ function and its asm equivalent where the asm version is faster?
Post 27 Aug 2019, 19:51
DimonSoft



Joined: 03 Mar 2010
Posts: 572
Location: Belarus
I doubt anyone keeps a collection of such pieces of code. As someone in the linked thread said, a compiler is just a set of finite state machines and logic over an AST. This stuff might be quite tricky but is still limited to what it is programmed to do. It can produce fast code in many cases, but since it doesn’t (and cannot) perform complete analysis it will never be able to beat a professional in all cases.

Compilers have the advantage of being able to process a lot of data quickly, which is somewhat useful for better register allocation and pipeline scheduling, but these kinds of optimizations generally give less performance improvement than changes in the algorithm, for example taking advantage of side effects of certain instructions to achieve better data throughput. Not to mention that, as was said in that same topic, the HLL code has to be written in a quite specific way to allow a compiler to recognize the case for vectorization and other cool tricks.

It’s a battle between “fast but stupid” and “slow but clever”: the first one can be good enough but the second one will always win, just some time later.
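
To make the remark about writing HLL code "in a quite specific way" a bit more concrete, here is a minimal C sketch. It is an illustration only, not code from the linked thread, and the function names `add_may_alias` and `add_no_alias` are made up for this example. Both functions do the same element-wise addition; the second one uses the C99 `restrict` qualifier to promise that the arrays do not overlap, which is one common hint that makes a loop easier for a compiler to vectorize. Whether a particular compiler actually vectorizes either version depends on the compiler, its version and its flags, so the generated assembly is worth checking on godbolt.

```c
#include <stddef.h>

/* The compiler has to assume dst may overlap a or b, so it may add
   runtime overlap checks or fall back to a scalar loop. */
void add_may_alias(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* restrict is the programmer's promise that the three arrays do not
   overlap; passing overlapping arrays here is undefined behaviour. */
void add_no_alias(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

For non-overlapping inputs both functions produce identical results; only the information available to the optimizer differs.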
Post 27 Aug 2019, 20:39
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16778
Location: In your JS exploiting you and your system
Prime95 is an example of a code base that has hand optimised assembly for each architecture. Competing HLL code to do FFTs doesn't come close to the same performance.

I am unsure why the OP wants to ignore embedded systems. They, by far, make up the bulk of all CPUs used in the world. So to wave away the most significant segment is going to skew the results terribly.
Post 27 Aug 2019, 21:35
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
marywilliam wrote:
In the other thread, it was mentioned that a compiler "could never figure out how to use an instruction like popcnt". Why is that the case? Could you explain why your solution would never be possible with a compiler? Also, do you have any books or tutorials that could help with asm optimization?
POPCNT isn't a keyword in C/C++. So, without the intrinsic the compiler isn't going to discover the transformation.
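
As a rough illustration of the intrinsic point (a sketch, not code from the linked thread), the two functions below count set bits in C. The names `popcount_loop` and `popcount_builtin` are made up for this example. The first is plain portable C; the second uses `__builtin_popcountll`, a GCC/Clang builtin that typically compiles to a single POPCNT instruction when the target allows it (for example with -mpopcnt). Whether a given compiler turns the plain loop into POPCNT by itself depends on the compiler and its version, so it is worth comparing the output of both on godbolt.

```c
#include <stdint.h>

/* Portable version: each iteration clears the lowest set bit,
   so the loop runs once per set bit. */
unsigned popcount_loop(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* GCC/Clang builtin: tells the compiler directly that a population
   count is wanted, so it can pick the best instruction sequence. */
unsigned popcount_builtin(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x);
}
```

Both functions return the same value for every input; the difference is only in how much the compiler has to infer.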

Greater clarity might be achieved by taking a step back to look at the broader process of software engineering in general. Roughly, the process could be stratified into the following layers:

1. algorithmic (mathematical language)
2. coding (programming languages)
3. architecture (machine code)
4. processor? (electronics -> physics)

Most simplistically it can be stated that translation from one layer to another is an NP complexity problem. Constraints are used to find a desirable solution in a reasonable time. Assembly language can minimize the second layer effects.

One technique for discovery outside of the typical high-level process might be to find algorithms which translate ideally to the architecture, which has the effect of shunting the complexity away from the algorithmic perspective.

Only familiarity with the layers will help one transcend them, imho.

Post 28 Aug 2019, 06:35
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16778
Location: In your JS exploiting you and your system
bitRAKE wrote:
Greater clarity might be achieved by taking a step back to look at the broader process of software engineering in general. Roughly, the process could be stratified into the following layers:

1. algorithmic (mathematical language)
2. coding (programming languages)
3. architecture (machine code)
4. processor? (electronics -> physics)
Yes, I agree.

But there is the complication of the CPU designers working their way upwards through the layers and designing the machine instructions to suit the popular HLLs. Notably, the x86 has ENTER and LEAVE as part of that process. Other architectures do similar things.

Of course the irony is that now ENTER and LEAVE are often slower to use than equivalent RISCy sequences (naturally it depends upon your CPU and the exact code sequences used).
Post 28 Aug 2019, 07:17
KerimF



Joined: 23 Aug 2019
Posts: 16
Location: Aleppo-Syria
I believe that, for every application a programmer works on, he chooses an available tool/language which is 'optimum' for it (based on his experience). More available tools and languages mean more options to choose from.

In general, an HLL (such as C) is usually seen as being good enough for PC programs. After all, knowing all the needed details of a PC (including its various added peripherals) is not always possible for everyone.

On the other hand, the internal hardware and functions of embedded systems (CPUs, MCUs and the like) can usually be known in great detail (from their datasheets and application notes). Writing their code in assembly becomes the obvious choice if one wants to get the most out of a chip and, perhaps, to be sure about the exact timing of every function/routine.

I mean, though I may be wrong, that there is no absolute best solution for an application; instead there is always an optimum one for it that depends on the programmer’s situation.

In this thread, programmers tend to share their various experiences and this is very interesting.


Last edited by KerimF on 28 Aug 2019, 17:58; edited 1 time in total
Post 28 Aug 2019, 09:16
marywilliam



Joined: 07 Jun 2018
Posts: 10
bitRAKE wrote:
marywilliam wrote:
In the other thread, it was mentioned that a compiler "could never figure out how to use an instruction like popcnt". Why is that the case? Could you explain why your solution would never be possible with a compiler? Also, do you have any books or tutorials that could help with asm optimization?
POPCNT isn't a keyword in C/C++. So, without the intrinsic the compiler isn't going to discover the transformation.

Greater clarity might be achieved by taking a step back to look at the broader process of software engineering in general. Roughly, the process could be stratified into the following layers:

1. algorithmic (mathematical language)
2. coding (programming languages)
3. architecture (machine code)
4. processor? (electronics -> physics)

Most simplistically it can be stated that translation from one layer to another is an NP complexity problem. Constraints are used to find a desirable solution in a reasonable time. Assembly language can minimize the second layer effects.

One technique for discovery outside of the typical high-level process might be to find algorithms which translate ideally to the architecture, which has the effect of shunting the complexity away from the algorithmic perspective.

Only familiarity with the layers will help one transcend them, imho.


Thanks for the detailed answer. Is there a list of keywords like POPCNT that are not available in C/C++ but are in asm?

And I guess experience weighs heavily in being able to further optimize some piece of code. You clearly have been doing that for years :) Having a list of unavailable keywords might help me turn C/C++ programs into a form of assembly that would be impossible for the compiler to generate, and get me thinking in a new dimension.
Post 28 Aug 2019, 17:31
bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
https://en.wikipedia.org/wiki/Intrinsic_function

Many have gone the route of working within C/C++: both extending their reach within their preferred language, and those transitioning from assembly. I know the value and strength of assembly language but I'm not of the mind that it's the only way. In fact, research continues on "super optimizers" which produce machine code no programmer would have thought of. Maybe, one day, the machine code will redefine the architecture - like the way DNA forms proteins, creating sites for enzymes; and we will program in abstractions as the algorithm specifies. Or maybe we are already there.

Post 29 Aug 2019, 05:42
Melissa



Joined: 12 Apr 2012
Posts: 70
Quote:

research continues on "super optimizers" which produce machine code no programmer would have thought of

Those writing it thought of it. But assembly is the same: perhaps the compiler writers didn't think of it :P
Post 30 Aug 2019, 03:15
st



Joined: 12 Jul 2019
Posts: 26
Location: Russia
Just imagine some compiler generating random instruction flows. Then it checks if that machine code produces the desired side-effect. Then it chooses the best combination. I suppose there is no human in the world who is able to think that much. We just want programs to work right now, so we do not wait until such a super-compiler finishes its mega-work. :D
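
The idea can be sketched in a few lines of C. Everything here is a toy built on assumptions: the one-register "machine", its seven operations and the names `run_program`, `target` and `superoptimize` are invented for this illustration, and the search enumerates programs systematically (shortest first) instead of generating them randomly. The target behaviour is "clear the lowest set bit" of an 8-bit value, chosen only because an 8-bit input space can be checked exhaustively.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-register toy machine: r starts out equal to the input x. */
enum op { OP_INC, OP_DEC, OP_NOT, OP_NEG, OP_AND, OP_OR, OP_XOR, OP_COUNT };

#define MAX_LEN 4

/* Execute a candidate program of len operations on input x. */
uint8_t run_program(const enum op *prog, size_t len, uint8_t x)
{
    uint8_t r = x;
    for (size_t i = 0; i < len; i++) {
        switch (prog[i]) {
        case OP_INC: r = (uint8_t)(r + 1); break;
        case OP_DEC: r = (uint8_t)(r - 1); break;
        case OP_NOT: r = (uint8_t)~r;      break;
        case OP_NEG: r = (uint8_t)(0 - r); break;
        case OP_AND: r = (uint8_t)(r & x); break;
        case OP_OR:  r = (uint8_t)(r | x); break;
        case OP_XOR: r = (uint8_t)(r ^ x); break;
        default: break;
        }
    }
    return r;
}

/* The desired behaviour: clear the lowest set bit of x. */
uint8_t target(uint8_t x)
{
    return (uint8_t)(x & (x - 1));
}

/* Exhaustive check, feasible only because there are just 256 inputs. */
static int matches_target(const enum op *prog, size_t len)
{
    for (unsigned x = 0; x < 256; x++)
        if (run_program(prog, len, (uint8_t)x) != target((uint8_t)x))
            return 0;
    return 1;
}

/* Try every program of length 1..MAX_LEN, shortest first.
   Returns the length of the first match (stored in out), or 0 if none. */
size_t superoptimize(enum op out[MAX_LEN])
{
    for (size_t len = 1; len <= MAX_LEN; len++) {
        unsigned long total = 1;
        for (size_t i = 0; i < len; i++)
            total *= OP_COUNT;
        for (unsigned long code = 0; code < total; code++) {
            unsigned long c = code;
            for (size_t i = 0; i < len; i++) {  /* decode base-7 digits */
                out[i] = (enum op)(c % OP_COUNT);
                c /= OP_COUNT;
            }
            if (matches_target(out, len))
                return len;
        }
    }
    return 0;
}
```

On this toy machine the search finds a two-operation program (decrement, then AND with the original input). The cost grows as 7^len for the program space and 2^bits for the input check, which is why this only works at toy scale.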
Post 30 Aug 2019, 12:22
Furs



Joined: 04 Mar 2016
Posts: 1430
st wrote:
Then it checks if that machine code produces the desired side-effect.
The thing is that for any non-trivial problem, checking all inputs is completely infeasible. And compilers are not allowed to make mistakes in corner cases.
Post 30 Aug 2019, 13:04
st



Joined: 12 Jul 2019
Posts: 26
Location: Russia
Yes, of course the above-mentioned hypothetical method has a few (NP) problems. :) However, I do not know an easy way to prove that arbitrary code is the best possible.

Moreover, why would we assume any non-trivial hand-written code is correct? It is not proven on all inputs.
Post 30 Aug 2019, 14:48
DimonSoft



Joined: 03 Mar 2010
Posts: 572
Location: Belarus
st wrote:
Moreover, why would we assume any non-trivial hand-written code is correct? It is not proven on all inputs.

Isn’t proving that what any good programmer does as part of writing code?
Post 30 Aug 2019, 16:56

Copyright © 1999-2019, Tomasz Grysztar.
