flat assembler
Message board for the users of flat assembler.
Projects and Ideas > Rewriting Applications in Assembler for Benchmarking
revolution 03 Nov 2008, 04:52
I always estimate about double the speed, although take that figure with a grain of salt: there are so many dependencies that generic figures are impossible.
The main thing to keep in mind is that only certain parts of a program get a good benefit from ASM. Heavily used computation loops are prime candidates for ASM optimisation. Fluffy stuff like GUIs and user interaction will mostly see little or no benefit from ASM optimisation.
roboman 03 Nov 2008, 15:38
The other thing to remember is that many programs, such as games, already have the speed-critical sections written in inline asm.
baldr 03 Nov 2008, 18:32
roboman,
Another thing to remember is that the operating system (under whose control these programs run) has its speed-critical sections written in inline Visual Basic™, so don't expect much speed gain (unless your program is heavily CPU-bound).
f0dder 04 Nov 2008, 14:15
And keep in mind that large parts of games are GPU- and not CPU-dependent nowadays...
baldr 04 Nov 2008, 18:10
f0dder,
Aha! So it's about the GPU, "Loading..." (how did I miss it?)
f0dder 05 Nov 2008, 07:36
baldr: loading screens might take time, but usually the CPU load isn't very high during load - at least not on any system I've had in the last few years. So I guess it's about disk I/O and throwing textures at the GPU?
zir_blazer 05 Nov 2008, 08:40
roboman wrote: The other thing to remember is that many programs, such as games, already do have the speed critical sections written in inline asm

I recall that this was done with the Wolfenstein 3D engine (again, id Software), but that was at a time when you had to get the most out of the current hardware, not one where you just go and buy faster hardware to compensate for developer laziness or rushed products. I doubt that more than a small minority of software and games get special optimization treatment like that these days. The only developer I know of who wrote a relatively modern game in ASM is the old-school Chris Sawyer ( http://www.chrissawyer.com/faq3.htm ); besides RollerCoaster Tycoon, I suppose he must have done Transport Tycoon and Locomotion in ASM too. Besides games, for console emulation purposes (for those who ever used it), around 2002 NeoRageX for Neo Geo ROMs (from some arcade machines) was also written in ASM and made a world of difference on my old K6-II 500 MHz compared to the official MAME32 client (I remember being enthusiastic at the time because of Metal Slug 3: MAME32 emulated it like a PowerPoint presentation, while NeoRageX allowed full playability in real time), as does No$gba for running Nintendo DS games.

f0dder wrote: And keep in mind that large game parts are GPU and not CPU dependant nowadays...

This depends on load balancing and varies from game to game, and also with graphics settings. At the high end of the performance spectrum the GPU matters much more, but at the low end the processor usually helps a little, with the exception of cases where you are seriously bottlenecked by the GPU even at minimal settings (IGPs).

If anything, one could imagine a sort of hybrid, modular engine that lets the CPU take care of a few extra things when there is enough processing headroom (SMP viability included), but these ideas would be ridiculously hard to implement, as they would require major rework even of engines with available source code, and let's not even talk about hacking your way into improving one you have no access to. Another engine I recall that could render in software or in OpenGL was Quake 2 (not a precise case, though, since they weren't the same version of the engine for CPU and GPU use; the OpenGL one was vastly superior graphically).

f0dder wrote: baldr: loading screens might take time, but usually the CPU load isn't very high during load - at least not on any system I've had in the last few years. So I guess it's about disk I/O and throwing textures at the GPU?

But the code that does that loading could be optimized and improved, too.
Sean4CC 16 Apr 2015, 00:39
Both gcc and the Java HotSpot compiler will do a better job of optimizing machine code than you can, unless you have lots and lots of time on your hands. Sometimes you can do better with a smart sequence of instructions that the compiler can't possibly work out; I found HADDPS works well but seems to be unknown to compilers. The Java HotSpot just-in-time compiler is much better at optimizing than gcc.

The problem with both of those is undefined behavior. For C: http://blog.regehr.org/archives/213 With Java, low-level operations are fully defined, but memory behavior is undefined when dealing with large data sets. It seems you are supposed to just use trial and error to pick memory settings. That is nonsense and means your code can flake at any moment. However, Java does have memory-mapped files, which are a way around the problem. If you are writing security-related software or code that must work reliably, then using assembly is a good way to have fully defined behavior, even if the code is actually a little slower than using a compiler.
gens 16 Apr 2015, 15:12
try reading some code that gcc spits out
JIT is even better
revolution 16 Apr 2015, 15:32
Sean4CC wrote: Both gcc and the java hotspot compiler will do a better job of optimizing machine code than you can, unless you have lots and lots of time on your hands ...

Compilers aren't magic. They can do okay sometimes if the algorithm can be expressed well in the language used. But they can't make use of the full CPU instruction set and will miss many little tricks and techniques.

Last edited by revolution on 17 Apr 2015, 12:39; edited 1 time in total
randall 17 Apr 2015, 12:03
I agree with revolution. It is not that hard to beat the compiler. The most important thing, in my opinion, is to know the machine and to use tools (IACA, VTune). Recently I have been reading a lot about Haswell, AVX, AVX2 and FMA. I have written a simple raymarching demo in ASM and in C (using intrinsics). The hand-written ASM version is about 11% faster than the C version compiled with MSVC 2013 (in a simple test scene). The ASM version is here if someone is interested: https://github.com/michal-z/qjulia (requires an AVX2- and FMA-capable CPU). This is WIP; my main goal is to render quaternion Julia sets in real time on the CPU.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.