flat assembler
Message board for the users of flat assembler.
Assembly Language As a Service
redsock 27 Oct 2019, 09:29
Random thought of the day:
Fact: every single piece of HLL code I have ever transcoded, with meaning and intent, to machine code has ended up at least 20% faster in execution speed (for problems that are not ring0-bound).
Question: what if there were a way, similar to what Mr. Godbolt has done, to submit a piece of HLL for $$ and get back a performance gain at or above some arbitrary threshold? The user interface is the main issue here, and of course remuneration ... surely we, as a community of "those who can talk to the CPU", can come up with something like this.
Thoughts?
Lol: edit, fixed the link finally.
Last edited by redsock on 11 Dec 2019, 10:37; edited 1 time in total
redsock 27 Oct 2019, 09:44
I hear what you are saying, and readily acknowledge the difficulty here .... however, most of my clients fall into two neat categories:
1) Embedded SoC systems
2) x86_64 systems
I am not suggesting that my AMD 1950 versus some Core X9900X (hah) is a reasonable metric, but of all of my SSE2+ processor restrictions, if I submit an HLL bit of test code, and you or I can't beat the compiler output, well... that is what I am talking about. If we are really talking about the fine feathers of single clock cycle gains, well, that isn't Assembly Language as a Service, that is more about processor-specific optimisation. Who can optimise anything when the target is totally opaque?
revolution 27 Oct 2019, 09:49
Would people just submit some one million line source and expect to get back a binary blob that is x% faster than the HLL compiled binary blob?
redsock 27 Oct 2019, 09:56
Maybe some "hot code path" with a clear test case would be prerequisite... Surely it could be abused...
I have myself had many moments where I look at my total system design and think "hmm, that might have been better had I thought about that sooner" .. Assembly Language as a Service though is when I look at my profiling output and _know_ it can be better ... Maybe wishful thinking, maybe not. I would happily pay you guys a decent wage when I run myself into a wall with perf counters.
revolution 27 Oct 2019, 10:02
There is definitely a market for performance enhancement.
I've worked with places that have 1000s of cores running some problem and they would save a lot of money if their code could run 5% faster.
donn 30 Oct 2019, 22:47
Yeah, how would they consume the output? What if it were just a function, as a library? What if they wanted to version control the result and improve it over time, perhaps in some simple way that does not impact performance, would they have to resubmit?
Is there a way they could build the result with fasmg so they have control over the rebuilding? So, maybe X amount of lines of x86 are in what they receive, the rest are macros they can tinker with..? Would be cool if there were a table at the bottom that 'proved' test execution times were faster once optimized.
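A table like that could come from a small acceptance harness. The following is only a rough sketch in Python, not anything proposed in the thread: `baseline_sum_squares` and `optimized_sum_squares` are hypothetical stand-ins for the submitted HLL routine and the returned hand-tuned routine, and the 20% threshold is just the figure from the opening post.

```python
import time

def best_time(fn, arg, repeats=5):
    """Return the fastest of `repeats` wall-clock timings of fn(arg), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

def gain_percent(baseline_s, optimized_s):
    """Percentage reduction in run time relative to the baseline."""
    return (baseline_s - optimized_s) / baseline_s * 100.0

def report(rows, threshold_percent=20.0):
    """rows: list of (name, baseline_s, optimized_s). Returns the table as text lines."""
    lines = [f"{'test':<12}{'baseline(s)':>12}{'optimized(s)':>14}{'gain':>9}  ok"]
    for name, base, opt in rows:
        gain = gain_percent(base, opt)
        ok = "yes" if gain >= threshold_percent else "no"
        lines.append(f"{name:<12}{base:>12.6f}{opt:>14.6f}{gain:>8.1f}%  {ok}")
    return lines

# Hypothetical stand-in for the customer's hot code path.
def baseline_sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Hypothetical stand-in for the optimized replacement (closed form).
def optimized_sum_squares(n):
    return (n - 1) * n * (2 * n - 1) // 6

if __name__ == "__main__":
    n = 200_000
    # The "clear test case": both versions must agree before timing means anything.
    assert baseline_sum_squares(n) == optimized_sum_squares(n)
    rows = [("sum_squares",
             best_time(baseline_sum_squares, n),
             best_time(optimized_sum_squares, n))]
    print("\n".join(report(rows)))
```

Taking the best of several runs is one simple way to damp scheduling noise; a real service would presumably pin the target hardware and use perf counters instead.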
revolution 30 Oct 2019, 23:51
donn wrote: Is there a way they could build the result with fasmg so they have control on the rebuilding?
donn 31 Oct 2019, 00:08
Does fasmg not perform size optimizations (instruction 'flattening'?) like fasm1 does? If not, then yes agree they would be missing some performance gains.
Thought I read fasm attempted to emit the smallest instruction sizes possible pass to pass. I guess if they used fasm1, they could just receive a separate .inc file that shows externs they could link against and rename if desired.
revolution 31 Oct 2019, 00:14
I mean that fasm runs a lot faster than fasmg. They can both generate the same binary output, but fasm will finish much earlier. On my system fasm is about 50x faster than fasmg, so for large code the difference is very significant.
donn 10 Dec 2019, 23:57
At first, I was thinking the fasmg compilation times wouldn't really matter much if most of a compilation was x86, and very few macros were used. I'm using fasmg on a bigger project now on GitHub (will post links later) and yes, I'm seeing it's slower. I also saw a fasmg user (Maoko?) ran into some compilation times of >30 seconds.
Has there been any word if the performance of fasmg will improve? Can any compilation involve optimizations or caching? Are there any posted performance tips? My GitHub project also involves building tests with MSVC and GoogleTest, which is much slower than fasmg, but I'm also starting to realize some of the x86 instruction compilations probably use macros too, right? Not really concerned, but just curious in general.
Tomasz Grysztar 11 Dec 2019, 10:28
donn wrote: At first, I was thinking the fasmg compilation times wouldn't really matter much if most of a compilation was x86, and very few macros were used.
donn wrote: Has there been any word if the performance of fasmg will improve? Can any compilation involve optimizations or caching? Are there any posted performance tips?
The very idea of doing things this way implied terrible performance. Keep in mind that if an x86 instruction is just a macro, fasmg ends up producing hundreds of lines to process from just a single instruction, and they all need to be re-interpreted each time, because macros are at their heart just a simple textual replacement. For this reason I did in fact expect it to be even worse than it turned out. I was actually amazed when I discovered that fasmg is able to self-host in less than 10 seconds. It is perhaps this amazement that made me stick to the idea, especially because having everything in the form of macros allows for many fun tricks and incredible extensibility.
Nonetheless, I am working on a new sub-project (still only in the design phase, under the codename "calm") that may become an interesting solution to many of the problems. I hope it will make it possible to build something closer to fasm 2 while preserving fasmg's way of building output formats (including relocations) with customizable macros (I mentioned that problem in another thread).
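To make the cost of textual replacement concrete, here is a toy model in Python. It is not how fasmg is implemented, and the macro names (`mov`, `parse_operand`, and so on) are invented purely for illustration. It only shows why the number of lines processed grows with every use of an instruction when the macro body has to be re-interpreted each time.

```python
# Toy macro table: name -> body lines. All names here are hypothetical.
MACROS = {
    "mov": ["parse_operand dest", "parse_operand src", "emit_opcode"],
    "parse_operand": ["check_register", "check_memory", "check_immediate"],
}

def expand(line, counter):
    """Expand one source line by textual replacement, counting every line processed."""
    counter[0] += 1
    name = line.split()[0]
    body = MACROS.get(name)
    if body is None:
        return [line]              # primitive line: nothing left to expand
    out = []
    for body_line in body:         # the body is re-interpreted on every single use
        out.extend(expand(body_line, counter))
    return out

def assemble(source):
    """Return (fully expanded lines, total number of lines processed)."""
    counter = [0]
    output = []
    for line in source:
        output.extend(expand(line, counter))
    return output, counter[0]

if __name__ == "__main__":
    for uses in (1, 100):
        _, processed = assemble(["mov eax, ebx"] * uses)
        print(uses, "instruction(s) ->", processed, "lines processed")
```

In this toy, one `mov` costs ten processed lines and a hundred cost a thousand, since nothing is cached between uses. Real instruction macros are far larger, which is the effect described above.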
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.