flat assembler
Message board for the users of flat assembler.
Index
> Main > HLL compilers generate better code than hand written asm? Goto page Previous 1, 2, 3, 4, 5, 6, 7 Next |
Author |
|
HaHaAnonymous 11 Mar 2015, 05:27
That is true. I tested the same program with an "Intel(R) Atom(TM) CPU N270" CPU:
Code: 11531ms ; xor 11759ms ; mov It ran the same code (in almost 12 seconds o_O). |
|||
11 Mar 2015, 05:27 |
|
revolution 11 Mar 2015, 05:39
Long gone are the simple days of counting CPU instructions and knowing how long it will take to run. Current CPUs run code in an indeterminate time.
Note that long term tests (more than ~10ms) will usually require getting an average of many runs to help eliminate the effect of periodic OS tasks that affect the timings. |
|||
11 Mar 2015, 05:39 |
|
HaHaAnonymous 11 Mar 2015, 18:25
Another thing is that compilers often generate unnecessary instructions (even at the highest possible optimization levels):
Code: 40: 55 push ebp 41: 89 e5 mov ebp,esp 43: 83 ec 04 sub esp,0x4 46: 89 5d fc mov DWORD PTR [ebp-0x4],ebx 49: b8 19 00 00 00 mov eax,0x19 4e: 50 push eax 4f: e8 fc ff ff ff call test123 54: 5b pop ebx 55: 50 push eax 56: e8 fc ff ff ff call test123 5b: 5b pop ebx 5c: 50 push eax 5d: e8 fc ff ff ff call test123 62: 5b pop ebx 63: 50 push eax 64: e8 fc ff ff ff call test123 69: 5b pop ebx 6a: 50 push eax 6b: e8 fc ff ff ff call test123 70: 5b pop ebx 71: 89 c2 mov edx,eax 73: 01 d0 add eax,edx 75: 8b 5d fc mov ebx,DWORD PTR [ebp-0x4] 78: c9 leave 79: c3 ret While it could do this (if any error please let me know): Code: push $19 call test123 push eax call test123 push eax call test123 push eax call test123 push eax call test123 add esp,$04*5 add eax,eax ret Or even this: Code: push $19 call test123 mov [esp],eax call test123 mov [esp],eax call test123 mov [esp],eax call test123 mov [esp],eax call test123 add esp,$04 add eax,eax ret Unless, for some obscure reason the compiler generated code was "smarter" than the "hand written asm" above... o_O |
|||
11 Mar 2015, 18:25 |
|
AsmGuru62 11 Mar 2015, 20:22
Or even this:
Code:
mov edi, test123
...
call edi
...
call edi
...
call edi
...
call edi
...
call edi
|
|||
11 Mar 2015, 20:22 |
|
revolution 12 Mar 2015, 01:46
HaHaAnonymous: Which compiler are you using? Every compiler does different things. Some are crap. Some are slightly better than crap.
|
|||
12 Mar 2015, 01:46 |
|
HaHaAnonymous 12 Mar 2015, 01:54
AsmGuru62
That would require an additional "push-pop" but will end up using less bytes, I suppose... Quote:
I am using Free Pascal Compiler (FPC). Which is not known for being the best (not the worst as well), but this is the only HLL I know enough to write something other than "Hello World". Now another "real world" comparison, regular human fails again to generate faster code with HLL: Code: e0: 55 push ebp e1: 89 e5 mov ebp,esp e3: 83 ec 24 sub esp,0x24 e6: 89 5d dc mov DWORD PTR [ebp-0x24],ebx e9: 89 75 e0 mov DWORD PTR [ebp-0x20],esi ec: 89 7d e4 mov DWORD PTR [ebp-0x1c],edi ef: 8b 5d 08 mov ebx,DWORD PTR [ebp+0x8] f2: b1 00 mov cl,0x0 f4: 83 fb 02 cmp ebx,0x2 f7: 72 47 jb 140 f9: 89 5d fc mov DWORD PTR [ebp-0x4],ebx fc: 8b 45 fc mov eax,DWORD PTR [ebp-0x4] ff: 89 45 f0 mov DWORD PTR [ebp-0x10],eax 102: c7 45 f4 00 00 00 00 mov DWORD PTR [ebp-0xc],0x0 109: df 6d f0 fild QWORD PTR [ebp-0x10] 10c: d9 fa fsqrt 10e: df 7d e8 fistp QWORD PTR [ebp-0x18] 111: 9b fwait 112: 8b 7d e8 mov edi,DWORD PTR [ebp-0x18] 115: be 02 00 00 00 mov esi,0x2 11a: 39 f7 cmp edi,esi 11c: 72 20 jb 13e 11e: 4e dec esi 11f: 90 nop 120: 46 inc esi 121: 89 d8 mov eax,ebx 123: ba 00 00 00 00 mov edx,0x0 128: f7 f6 div esi 12a: 85 d2 test edx,edx 12c: 75 0c jne 13a 12e: 39 de cmp esi,ebx 130: 75 0e jne 140 132: 39 de cmp esi,ebx 134: 75 04 jne 13a 136: b1 01 mov cl,0x1 138: eb 06 jmp 140 13a: 39 f7 cmp edi,esi 13c: 77 e2 ja 120 13e: b1 01 mov cl,0x1 140: 88 c8 mov al,cl 142: 8b 5d dc mov ebx,DWORD PTR [ebp-0x24] 145: 8b 75 e0 mov esi,DWORD PTR [ebp-0x20] 148: 8b 7d e4 mov edi,DWORD PTR [ebp-0x1c] 14b: c9 leave 14c: c3 ret This hand written assembly alternative was written to have the exactly same behavior: Code: 0: 83 7c 24 04 02 cmp DWORD PTR [esp+0x4],0x2 5: 72 36 jb 3d 7: 55 push ebp 8: 50 push eax 9: 8b 6c 24 0c mov ebp,DWORD PTR [esp+0xc] d: db 44 24 0c fild DWORD PTR [esp+0xc] 11: d9 fa fsqrt 13: db 1c 24 fistp DWORD PTR [esp] 16: b9 02 00 00 00 mov ecx,0x2 1b: 31 d2 xor edx,edx 1d: 89 e8 mov eax,ebp 1f: f7 f1 div ecx 21: 85 d2 test edx,edx 23: 74 0d je 32 25: 83 c1 01 add ecx,0x1 28: 3b 0c 24 cmp ecx,DWORD PTR [esp] 2b: 76 ee jbe 1b 2d: b0 01 mov al,0x1 2f: 5d pop ebp 30: 5d pop ebp 31: c3 ret 32: 39 cd cmp ebp,ecx 34: 75 05 jne 3b 36: b0 01 mov al,0x1 38: 5d pop ebp 39: 5d pop ebp 3a: c3 ret 3b: 5d pop ebp 3c: 5d pop ebp 3d: 31 c0 xor eax,eax 3f: c3 ret What is difference between them, and why the hand written code runs almost 5 times faster o_O!? I really hope there are unnecessary steps that I am unaware of in the high level code, otherwise it is pretty shameful. And there are people who still believe assembly code is bug prone... I prefer to believe otherwise. Code: 1590ms ; compiler generated 335ms ; "hand written asm" EDIT: It seems I used an incorrect loop and the compiler was redoing the fsqrt without the need (stupid)... The newer is faster now. Code: 342ms ; compiler generated 335ms ; "hand written asm" EDIT2: Now with an unrolled loop in the "hand written asm" I could get even more speed: Code: 0: 83 7c 24 04 02 cmp DWORD PTR [esp+0x4],0x2 5: 0f 82 83 00 00 00 jb 8e b: 55 push ebp c: 50 push eax d: 8b 6c 24 0c mov ebp,DWORD PTR [esp+0xc] 11: db 44 24 0c fild DWORD PTR [esp+0xc] 15: d9 fa fsqrt 17: db 1c 24 fistp DWORD PTR [esp] 1a: b9 02 00 00 00 mov ecx,0x2 1f: 31 d2 xor edx,edx 21: 89 e8 mov eax,ebp 23: f7 f1 div ecx 25: 85 d2 test edx,edx 27: 74 57 je 80 29: 83 c1 01 add ecx,0x1 2c: 3b 0c 24 cmp ecx,DWORD PTR [esp] 2f: 77 48 ja 79 31: 31 d2 xor edx,edx 33: 89 e8 mov eax,ebp 35: f7 f1 div ecx 37: 85 d2 test edx,edx 39: 74 45 je 80 3b: 83 c1 01 add ecx,0x1 3e: 3b 0c 24 cmp ecx,DWORD PTR [esp] 41: 77 36 ja 79 43: 31 d2 xor edx,edx 45: 89 e8 mov eax,ebp 47: f7 f1 div ecx 49: 85 d2 test edx,edx 4b: 74 33 je 80 4d: 83 c1 01 add ecx,0x1 50: 3b 0c 24 cmp ecx,DWORD PTR [esp] 53: 77 24 ja 79 55: 31 d2 xor edx,edx 57: 89 e8 mov eax,ebp 59: f7 f1 div ecx 5b: 85 d2 test edx,edx 5d: 74 21 je 80 5f: 83 c1 01 add ecx,0x1 62: 3b 0c 24 cmp ecx,DWORD PTR [esp] 65: 77 12 ja 79 67: 31 d2 xor edx,edx 69: 89 e8 mov eax,ebp 6b: f7 f1 div ecx 6d: 85 d2 test edx,edx 6f: 74 0f je 80 71: 83 c1 01 add ecx,0x1 74: 3b 0c 24 cmp ecx,DWORD PTR [esp] 77: 76 a6 jbe 1f 79: b0 01 mov al,0x1 7b: 5d pop ebp 7c: 5d pop ebp 7d: c3 ret 7e: 90 nop 7f: 90 nop 80: 39 cd cmp ebp,ecx 82: 75 08 jne 8c 84: b0 01 mov al,0x1 86: 5d pop ebp 87: 5d pop ebp 88: c3 ret 89: 90 nop 8a: 90 nop 8b: 90 nop 8c: 5d pop ebp 8d: 5d pop ebp 8e: 31 c0 xor eax,eax 90: c3 ret Code: 342ms ; compiler generated 302ms ; "hand written asm" Good enough for a newbie assembly coder, now imagine what more experienced coders could achieve... Compilers are very far from the best possible code. o_O |
|||
12 Mar 2015, 01:54 |
|
edfed 12 Mar 2015, 08:25
does hll compiler coders write better code than asm compiler coders?
|
|||
12 Mar 2015, 08:25 |
|
TmX 12 Mar 2015, 09:29
edfed wrote: does hll compiler coders write better code than asm compiler coders? Better is a fuzzy term. Maybe you should use terms like "faster", "shorter", "cleaner" instead. In general, could be yes, could be no. Anyway, every time I take a look at assembly listings generated by the compiler, I get dizzy. On the other hand, my aim is not the most compact/fastest code. As long as it's maintainable and perforrms reasonable well then it's good enough to me. |
|||
12 Mar 2015, 09:29 |
|
revolution 12 Mar 2015, 09:41
I'm not so sure that either "faster" or "cleaner" are unambiguous. The only really objective judgement that I can think of is "shorter". But "shorter" is pretty useless for making any qualitative assessment of code.
|
|||
12 Mar 2015, 09:41 |
|
TmX 12 Mar 2015, 09:53
"Faster" as in execution time, not development time.
I agree "cleaner" actually is also fuzzy. Usually it means: - not involving reduncany parts - descriptive names - the program flow in general is easy to understand - the role of each variable is easy to understand - how each function/method interact with each other/variable is easy to understand - etc Kinda subjective, right? |
|||
12 Mar 2015, 09:53 |
|
revolution 12 Mar 2015, 09:59
"Faster" is a different value for each system. Some systems run A faster and others systems run B faster, so which is "best" A or B?
|
|||
12 Mar 2015, 09:59 |
|
TmX 12 Mar 2015, 13:59
Well I forget that.
Hmm, what about measuring which code is faster most of the time? |
|||
12 Mar 2015, 13:59 |
|
revolution 12 Mar 2015, 14:03
System A runs code A faster all of the time. System B runs code B faster all of the time. How would you define "most of the time" there?
|
|||
12 Mar 2015, 14:03 |
|
TmX 12 Mar 2015, 14:55
Sorry, bad wordings.
English is not my native language Let's say you have 100 different machines to test. It's possible that code A runs faster on some machines, while B does on other machines. If, for example, on most machines code A runs faster that B, I think we can safely conclude that A is faster. Well that's my idea |
|||
12 Mar 2015, 14:55 |
|
JohnFound 12 Mar 2015, 17:20
Well, in order to be precise, it is always possible to write such code A and code B, so on all possible machines code A will be faster than B (or vice versa).
I mean, you guys always assume that code A and B are optimized to some extent. But it is not always the case. Also, IMO, the bigger possible mistake is that all these benchmarks and code comparisons, in most cases compare too small chunks of code, assuming that if every small chunk of the program is optimal, the whole program will be optimal as well. But this assumption is not true at all. |
|||
12 Mar 2015, 17:20 |
|
HaHaAnonymous 12 Mar 2015, 19:24
Quote:
I thought compilers greatest strength was in single routines and similar because of all the additional garbage code they introduce in a whole program (e.g., making "10 jumps" just to call a function, calling many more functions to execute a trivial task when just a 4 instructions would get the job done, unnecessary memory to register/register to memory operations, and others I am not aware of...). In other words (ignorantly speaking), the compilers advantage (if any) tends to decrease in whole programs. Not to mention the fact that is (in my opinion) much harder to write good performance code in a HLL than in (x86) assembly. And I hope you did not forget to push the optimization button otherwise you will get one of worst possible machine code sequence. |
|||
12 Mar 2015, 19:24 |
|
revolution 12 Mar 2015, 23:58
JohnFound wrote: Well, in order to be precise, it is always possible to write such code A and code B, so on all possible machines code A will be faster than B (or vice versa). JohnFound wrote: I mean, you guys always assume that code A and B are optimized to some extent. But it is not always the case. JohnFound wrote: Also, IMO, the bigger possible mistake is that all these benchmarks and code comparisons, in most cases compare too small chunks of code, assuming that if every small chunk of the program is optimal, the whole program will be optimal as well. But this assumption is not true at all. |
|||
12 Mar 2015, 23:58 |
|
HaHaAnonymous 13 Mar 2015, 03:36
Quote:
OK, so let's consider code generation is like chess... And it came to a degree of sophistication and complexity that humans are not able to beat the computers at such task anymore. But I will still continue writing imperfect and inefficient assembly code for every architecture I can get my hands on as the speed differences are negligible when compared to code generated by "state of the art" compilers, so negligible that is completely non sense to lose all the benefits of writing programs completely in assembly code where you can customize every bit and downgrade to a HLL/compiler. Believe or not, HLL users often use the same statement to explain why they do not code in assembly or do it only for speed critical parts when they cannot make their compilers to generate an acceptable output. Do I use an HLL? I use, but it is more for emergency cases than anything else because... What if I need to write some code for a completely unknown architecture(e.g., SPARC, ARM, MIPS)/OS(e.g., Windows, MAC, OS/2, OS X Solaris, FreeBSD, DOS o_O) or where speed is critical (i.e. development time or I have to write the last program of my life)? That is why it is important (to me) to not forget it. And that is why I stick to only one (I would end up forgetting the others anyway since my brain do not like to keep knowledge I do not use (that is why practicing is really important, I almost forgot x86 when I stop coding for almost 1 month o_O)). I guess I talk too much. D: Sorry if you wasted your time reading this (what you can potentially consider bullshit). |
|||
13 Mar 2015, 03:36 |
|
JohnFound 13 Mar 2015, 06:53
HaHaAnonymous wrote: OK, so let's consider code generation is like chess... It is much more complex than chess. And that is why, the compilers can't optimize the whole program - it needs understanding. On small chunks such understanding is possible to be hard-coded by using heuristics, because the meaning of a small chunks of code is often limited. revolution wrote: I think what you are saying there is that B is not optimised at all. The code B might be optimized for size, instead of for speed. Sometimes the size optimizations are also faster, but not always. And here comes the question - is changing the algorithm can be considered optimization? _________________ Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9 |
|||
13 Mar 2015, 06:53 |
|
Goto page Previous 1, 2, 3, 4, 5, 6, 7 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.