flat assembler
Message board for the users of flat assembler.

Index > Main > Assembly Language As a Service

redsock



Joined: 09 Oct 2009
Posts: 435
Location: Australia
redsock 27 Oct 2019, 09:29
Random thought of the day:

Fact: Every single piece of HLL I have ever transcoded, with meaning and intent, to machine code has ended up at least 20% faster in execution (for non-ring0-bound problems).

Question: What if there were a way, similar to what Mr. Godbolt has done, to submit a piece of HLL, for $$, and get back a version with at least some arbitrary threshold of performance gain?

The user interface is the main issue here, and of course remuneration... surely we as a community of "those who can talk to the CPU" can come up with something like this.

Thoughts?

Lol: edit, fixed the link finally.

_________________
2 Ton Digital - https://2ton.com.au/


Last edited by redsock on 11 Dec 2019, 10:37; edited 1 time in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20453
Location: In your JS exploiting you and your system
revolution 27 Oct 2019, 09:35
Wouldn't we also need to match our system with the buyer's system to be able to test the results?

If I have a 4GB single-channel AMD E350 running Linux with 100 background tasks and a buyer has a 128GB quad-channel Intel i9-9970X running Windows 10 Enterprise with zero other tasks, the results aren't going to be comparable.

BTW: Your godbolt link is bad.
redsock 27 Oct 2019, 09:44
I hear what you are saying, and readily acknowledge the difficulty here... However, most of my clients fall into two neat categories:

1) Embedded SoC systems
2) x86_64 systems.

I am not suggesting that my AMD 1950 versus some Core X9900X (hah) is a reasonable metric, but given my SSE2+ processor restrictions, if I submit a bit of HLL test code and you or I can't beat the compiler output, well... that is what I am talking about.

If we are really talking about the fine feathers of single clock cycle gains, well, that isn't Assembly Language as a Service, that is more about processor-specific optimisation.

Who can optimise anything, when the target is totally opaque?
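A hedged sketch of the kind of transformation I mean under an SSE2 restriction (labels and register assignments here are hypothetical, and modern compilers can often auto-vectorize loops like this themselves, so take it purely as an illustration): summing an array of 32-bit integers four lanes at a time instead of one.

```
; Hypothetical hot path: sum rdx dwords starting at [rsi].
; Scalar form, roughly what a non-vectorizing compiler emits:
        xor     eax, eax
        xor     ecx, ecx
.scalar:
        add     eax, [rsi+rcx*4]
        add     rcx, 1
        cmp     rcx, rdx
        jb      .scalar

; SSE2 form, four dwords per iteration (assumes rdx is a multiple
; of 4 and [rsi] is 16-byte aligned; remainder handling omitted):
        pxor    xmm0, xmm0
        xor     ecx, ecx
.vector:
        paddd   xmm0, dqword [rsi+rcx*4]
        add     rcx, 4
        cmp     rcx, rdx
        jb      .vector
        pshufd  xmm1, xmm0, 0x4e        ; fold high qword onto low
        paddd   xmm0, xmm1
        pshufd  xmm1, xmm0, 0xb1        ; fold remaining pair of lanes
        paddd   xmm0, xmm1
        movd    eax, xmm0               ; final sum in eax
```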

revolution 27 Oct 2019, 09:49
Would people just submit some one million line source and expect to get back a binary blob that is x% faster than the HLL compiled binary blob?
redsock 27 Oct 2019, 09:56
Maybe some "hot code path" with a clear test case would be a prerequisite... Sure, it could be abused...

I have myself had many moments where I look at my total system design and think "hmm, that might have been better had I thought about that sooner"...

Assembly Language as a Service, though, is for when I look at my profiling output and _know_ it can be better...

Maybe wishful thinking, maybe not. I would happily pay you guys a decent wage when I run myself into a wall with perf counters.

revolution 27 Oct 2019, 10:02
There is definitely a market for performance enhancement.

I've worked with places that have 1000s of cores running some problem and they would save a lot of money if their code could run 5% faster.
donn



Joined: 05 Mar 2010
Posts: 321
donn 30 Oct 2019, 22:47
Yeah, how would they consume the output? What if it were just a function, as a library? And what if they wanted to version-control the result and improve it over time, perhaps in some simple way that does not impact performance; would they have to resubmit?

Is there a way they could build the result with fasmg so they have control over the rebuilding? So, maybe X amount of lines of x86 are in what they receive, and the rest are macros they can tinker with...?

It would be cool if there were a table at the bottom that 'proved' test execution times were faster once optimized.
revolution 30 Oct 2019, 23:51
donn wrote:
Is there a way they could build the result with fasmg so they have control on the rebuilding?
If they were concerned about performance, they would probably want to use fasm.
donn 31 Oct 2019, 00:08
Does fasmg not perform size optimizations (instruction 'flattening'?) like fasm1 does? If not, then yes, I agree they would be missing some performance gains.

I thought I read that fasm attempts to emit the smallest instruction encodings possible, pass to pass.

I guess if they used fasm1, they could just receive a separate .inc file that shows externs they could link against and rename if desired.
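For reference, fasm does perform this kind of size optimization: it makes multiple passes over the source and settles on the shortest encodings that still reach their targets. A minimal sketch (the labels and padding amounts are made up for illustration):

```
        use64
start:
        jmp     near_target     ; within rel8 range, so fasm emits the
                                ; 2-byte short form (EB xx)
        db      16 dup (0x90)
near_target:
        jmp     far_target      ; target is more than 127 bytes away, so
        db      200 dup (0x90)  ; a later pass widens this jump to the
far_target:                     ; 5-byte near form (E9 xx xx xx xx)
        ret
```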
revolution 31 Oct 2019, 00:14
I mean that fasm runs a lot faster than fasmg. They can both generate the same binary output, but fasm will finish much earlier. On my system fasm is about 50x faster than fasmg, so for large code the difference is very significant.
donn 10 Dec 2019, 23:57
At first, I was thinking the fasmg compilation times wouldn't really matter much if most of a compilation was x86 and very few macros were used. I'm using fasmg on a bigger project now on GitHub (will post links later) and yes, I'm seeing it's slower. I also saw that a fasmg user (Maoko?) ran into compilation times of >30 seconds.

Has there been any word if the performance of fasmg will improve? Can any compilation involve optimizations or caching? Are there any posted performance tips?

My GitHub project also involves building tests with msvc and GoogleTest, which is much slower than fasmg, but I'm also starting to realize some of the x86 instruction compilations probably use macros too, right? Not really concerned, just curious in general.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 11 Dec 2019, 10:28
donn wrote:
At first, I was thinking the fasmg compilation times wouldn't really matter much if most of a compilation was x86, and very few macros were used.
If most of your source is x86 code, then it is actually mostly macros, because fasmg has all x86 instructions implemented in the form of (rather complex) macros. Therefore you cannot have "very few macros used" when you assemble x86 instructions with fasmg - this is an important detail to keep in mind.
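A toy illustration of this principle (this is not the actual implementation - the x86 macros shipped with fasmg are vastly more elaborate - just a minimal, hypothetical sketch of a mnemonic being an ordinary macro that emits bytes):

```
; drastically simplified "instruction as macro":
macro inc? operand
        match =eax?, operand
                db 0xFF, 0xC0           ; opcode FF /0, ModRM for eax
        else match =ecx?, operand
                db 0xFF, 0xC1           ; ModRM for ecx
        else
                err 'operand not handled in this sketch'
        end match
end macro

        inc eax         ; expands to: db 0xFF, 0xC0
```

Every `inc eax` in the source goes through that match-and-emit machinery, which is why x86-heavy sources are macro-heavy by definition.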

donn wrote:
Has there been any word if the performance of fasmg will improve? Can any compilation involve optimizations or caching? Are there any posted performance tips?
I've been doing many small optimizations ever since fasmg became capable of self-hosting, and I managed to progressively get the assembly time down to less than half of what it was initially. But there is only so far we can get while keeping all instructions implemented as regular macros. Because what makes fasmg so fun for me - the fact that everything in the source text can be overridden and re-interpreted at any time - makes it next to impossible to use any fasm-like pre-parsing tricks.

The very idea of doing things this way implied terrible performance - keep in mind that if an x86 instruction is just a macro, fasmg ends up producing hundreds of lines to process from a single instruction, and they all need to be re-interpreted each time, because macros are at heart just simple textual replacement. For this reason I did in fact expect it to be even worse than it turned out. I was actually amazed when I discovered that fasmg is able to self-host in less than 10 seconds. It is perhaps this amazement that made me stick with the idea, especially because having everything in the form of macros allows for many fun tricks and incredible extensibility.

Nonetheless, I am working on a new sub-project (still only in the design phase, under the codename "calm") that may become an interesting solution to many of these problems. I hope it will allow making something closer to a fasm 2 while preserving fasmg's way of building output formats (including relocations) with customizable macros (I mentioned that problem in another thread).


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.