flat assembler
Message board for the users of flat assembler.
HLL compilers generate better code than hand written asm?
revolution 19 Mar 2015, 23:54
l_inc wrote: I think those who stubbornly reject these tools and continue to do trivial tasks by hand are doomed to stay aside from the evolution and stagnate.
l_inc 20 Mar 2015, 00:05
revolution
I assume you missed my P.S. By that sentence I meant only those who stubbornly reject, which is pretty much what the sentence states. As for the rest of the reasoning, read the P.S.
_________________
Faith is a superposition of knowledge and fallacy
Tyler 20 Mar 2015, 01:05
l_inc, you said it better than I did, but that's basically my opinion on the subject.
Though I'll admit I never made it to the point of actually being able to outdo a compiler. For me, assembly was a useful step in my learning. Unavoidable, even, given that I didn't like not knowing how HLLs could work. I only spent enough time with it to gain a basic understanding of how most things could be done, then I moved to C. (And stayed there until, one day, I was writing a linked list/vector library and it hit me that classes would be really nice.) I never thought about caches, pipelines, or most other lower-level stuff until my MIPS class at uni.
HaHaAnonymous 20 Mar 2015, 01:17
Quote:
I think those who must care about this are not the hobbyists, casual or solitary/isolated coders (HaHaAnonymous fits in this category) but those who work for some random corporation writing software for money. Or those simply coding for money, where the competition moves really fast and the end users do not care about actual quality, only about speed (speed of development). Or those who aim to get rich ($$$$$), or to make some money by writing code, where the latest development technologies cannot be ignored if they want to succeed.
Quote:
But I do (practice and learn), it is harder (many rules and "bla-bla-bla"). The reason more than one is not needed (for me at least). Unless you chose a crap one that does not support as many things as possible (e.g. python, lua, C#, visual basic...).
These are just HaHaAnonymous' opinions and they may be out of touch with reality, or not. D:
I apologize for any inconvenience, if any.
Tyler 20 Mar 2015, 01:58
HaHaAnonymous wrote:
E.g. finding this ODE system and plotting 10,000 different initial conditions (using Python).
http://thardin.name/1_1.png
http://thardin.name/1_1_zoom.mp4 (Watch in VLC and use the + key to speed it up to x4.)
Or this nbody simulation using Python with OpenCL. (Speed these up by x8.)
http://thardin.name/nbody-square.mkv (Simple, symmetric system.)
http://thardin.name/nbody-perturb.mkv (Simple system, with small perturbation.)
Or a 3D plotter in Python. (In case you can't tell, it's super easy to be productive in Python.) And a game of life thingy in C++/Gtk. A thread pool in C++. A 2048 solver in C++. And I'm doing a fluid simulator in CS for my senior project.
All of this was less than 1000 lines. (Most of them A LOT less.) Well, my capstone will probably be longer, but that's because I'm being "extrinsically motivated" to continue it.
I think the most I ever did in ASM was a crappy attempt at a boot sector calculator, an LF->CRLF converter, and a lot of crappy attempts at OS dev. (And lots of toy programs, like prime finder/lister/checker, factoring, Collatz, etc.)
Tyler 20 Mar 2015, 02:14
DISCLAIMER: None of that is meant to imply you can't accomplish things in asm. Fasm alone is 1000x cooler than anything I've made. Or Roller Coaster Tycoon #1. Or many other awesome asm projects.
But those things take real dedication over a very extended time period. I'm just trying to work within my constraints.
l_inc 20 Mar 2015, 02:19
Tyler
I'm glad I managed to give a correct notion of my viewpoint after all.
Quote: Though I'll admit I never made it to the point of actually being able to outdo a compiler
It has already been mentioned that compilers are very bad at automatic vectorization. So I guess a couple of days of getting familiar with SSE and an appropriate task will give you the pleasure of outdoing any C compiler. I had a class on advanced computer architectures and made a vectorization over 13 times faster than gcc's compilation output, which was close to the highest score for that homework. The compiler intrinsics, however, allowed for a similar performance boost, though a slightly less impressive one.
HaHaAnonymous
Quote: I think those who must care about this are not the hobbyists, casual or solitary/isolated coders (HaHaAnonymous fits in this category) but those who work for some random corporation
That's not true. As a hobbyist you might wanna find a tool sufficiently expressive for your ideas. If your ideas are limited to making an as-small-as / as-fast-as possible whatever, then you'll probably be happy with an assembler alone. If you want a cool web-server, and then you want a nice game of your dream, and then you want a yet-another-one-whatever, and all of it with your own hands and crossplatform and ASAP, cause tomorrow you'll have a dozen more projects, then you'd probably need a HLL that allows you to express your ideas more efficiently without being forced to think whether you store your counter into ecx because it could then be used by the loop instruction or into ebx because it then won't be corrupted by an intermediate call to a library function. In that respect your tools and methods might be similar to those of a commercial corporation.
Whatever goals and whatever tools you choose, remember that your life is finite.
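For readers who have not met "compiler intrinsics" before, here is a minimal sketch of what they look like in C. It is not l_inc's homework code; the function name add_arrays is made up for illustration, and the loop is deliberately trivial (a loop this simple is one gcc will typically vectorize by itself at -O3, so it shows the shape of the technique, not a case where hand work wins). The SSE path is guarded so the file still builds on targets without SSE.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration: two unaligned loads, one packed add, one store. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when SSE is not available). */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Each intrinsic maps more or less directly onto one SSE instruction, which is why the approach sits between plain C and hand-written assembly: the programmer chooses the instructions, the compiler still does register allocation and scheduling.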
m3ntal 21 Mar 2015, 17:27
Algorithm is everything.
revolution:
Quote: A lot of the work I do requires complete knowledge and justification of what the CPU is doing, as per the spec.
Please criticize yourself for a change. Please take one look at your code then look at mine. You should be asking for advice instead of imposing it. Here:
REVOLUTION'S ARM MACROS
Code:
macro def_ustring labl,[string] {common labl dU string}
macro def_astring labl,[string] {common labl dB string}

macro apscall function,[parameter] {
 common
  local pcount,tempcount,found,.skip,.size,param,last_value,temp,size,instr,i_s,msize
  virtual
    nop
    temp=$-$$
  end virtual
  if temp<>4
    halt ;APSCALL macro NOT usable in thumb mode
  end if
  if ~ parameter eq
    if .size
      b .skip
    end if
    temp=$
    tempcount=0
 reverse
    local ..arg
    found equ no
    match i[like]za,:parameter: \{ found equ \}
    match =no:*ustring,found:parameter \{
      def_ustring ..arg,ustring,0
      found equ \}
    match =no:some=,more,found:parameter \{
      def_astring ..arg,parameter,0
      found equ \}
    match =no,found \{
      if parameter eqtype ''
        def_astring ..arg,parameter,0
      end if \}
    tempcount=tempcount+1
 common
    pcount=tempcount
    align 4
    .size=$-temp
    if .size
      .skip:
    end if
    lastvalue=1 shl 63
    tempcount=0
 reverse
    if tempcount<(pcount-4)
      found equ no
      define param parameter
      match [address],parameter \{
        LDR lr,[address]
        lastvalue=1 shl 63
        str lr,[sp,-tempcount*4-4]
        found equ yes \}
      irp i_s,b:byte:byte,sb:sbyte:byte,h:hword:hword,sh:shword:hword,:word:word \{
        match instr:msize:size,LDR\#i_s \\{
          match =msize[address],parameter \\\{
            instr lr,size[address]
            lastvalue=1 shl 63
            str lr,[sp,-tempcount*4-4]
            found equ yes \\\}\\}\}
      match =addr address,param \{
        lea lr,[address]
        lastvalue=1 shl 63
        str lr,[sp,-tempcount*4-4]
        found equ yes \}
      match value =no,parameter found \{
        if defined ..arg
          lastvalue=1 shl 63
          ADD lr,pc,..arg-$-8
          str lr,[sp,-tempcount*4-4]
        else if value eqtype r0
          str value,[sp,-tempcount*4-4]
        else if value eqtype 0
          virtual
            dw value
            load temp word from $-4
          end virtual
          if temp <> lastvalue
            MOV lr,value
          end if
          lastvalue=temp
          str lr,[sp,-tempcount*4-4]
        else
          MOV lr,value
          lastvalue=1 shl 63
          str lr,[sp,-tempcount*4-4]
        end if \}
    end if
    rept 4 p:0 \{\reverse
      if tempcount=pcount-p-1
        found equ no
        match [address],parameter \\{
          LDR r\#p,[address]
          found equ yes \\}
        irp i_s,b:byte:byte,sb:sbyte:byte,h:hword:hword,sh:shword:hword,:word:word \\{
          match instr:msize:size,LDR\\#i_s \\\{
            match =msize[address],parameter \\\\{
              instr r\#p,size[address]
              found equ yes \\\\}\\\}\\}
        match =addr address,param \\{
          lea r\#p,[address]
          found equ yes \\}
        match value =no,parameter found \\{
          if ~ defined ..arg & value eqtype 0
            virtual
              dw value
              load temp word from $-4
            end virtual
          end if
          if defined ..arg
            ADD r\#p,pc,..arg-$-8
          else if value eqtype 0 & lastvalue = temp
            MOV r\#p,lr
          else if ~ r\#p eq value
            MOV r\#p,value
          end if \\}
      end if \}
    tempcount=tempcount+1
 common
  else
    pcount=0
  end if
  if pcount>4
    sub sp,sp,(pcount-4)*4
  end if
  if defined _#function & _#function-$-8<4096 & _#function-$-8>-4096
    mov lr,pc
    ldr pc,[pc,_#function-$-8]
  else
    bl function
  end if
  if pcount>4
    add sp,sp,(pcount-4)*4
  end if
}
Code:
;;;;;;;;;;; LEA: LOAD EFFECTIVE ADDRESS ;;;;;;;;;;

; lea r1, [r2]
; lea r1, [r2+r3]
; lea r1, [r2+10000000h]
; lea r1, [r2-20000000h]
; lea r1, [r2*3]
; lea r1, [r2*4]
; lea r1, [r2*5]
; lea r1, [r2*10]
; lea r1, [r2+r3*4]
; lea r1, [r4+r7*2+30000000h]
; lea r2, 40000000h
; lea r3, [50000000h]
; lea r4, [60000000h+r7]
; lea r5, [70000000h+r7*8]

macro lea [p] {
 common
  define ?s 0
  match r=,[x], p \{ ; r,[?]
    if x is.r32? ; r,[r]
      mov r, x
    else ; r,[?]
      match =0 \
        a+b*c, ?s x \\{
        match n+i, c \\\{ ; a+b*c+i
          addms r, a, b, n
          if use.ror?
            add r, i
          else
            ldr r12, =i
            add r, a, r12
          end if
          define ?s 1 \\\}
        if ?s eq 0
          if a is.r32? \ ; r+r*c
            & b is.r32?
            addms r, a, b, c
          else if b is.r32? \ ; i+r*c
            & a is.i?
            if use.ror.a?
              mov r, a
              addms r, r, b, c
            else
              ldr r12, =i
              add r, a, r12
            end if
          else
            'Error'
          end if
        end if
        define ?s 1 \\}
      match =0 \
        a+b, ?s x \\{
        if a is.r32? \ ; r+i
          & b is.i?
          if use.ror.b?
            add r, a, b
          else
            ldr r, =b
            add r, a, r
          end if
        else if \
          b is.r32? & \ ; i+r
          a is.i?
          if use.ror.a?
            add r, b, a
          else
            ldr r, =a
            add r, b, r
          end if
        else ; assume
          add r, a, b ; r=a+b
        end if
        define ?s 1 \\}
      match =0 \
        a-b, ?s x \\{ ; r-i
        if a is.r32? \
          & b is.i?
          if use.ror.b?
            sub r, a, b
          else
            ldr r, =b
            sub r, a, r
          end if
        else ; ?,?
          'Error'
        end if
        define ?s 1 \\}
      match =0 \
        a*b, ?s x \\{ ; r=a*n
        if b eq 4
          mov r, a, lsl 2 ; r=a*4
        else if b eq 2
          mov r, a, lsl 1 ; r=a*2
        else if b eq 3
          add r, a, a, lsl 1 ; r=a*3
        else if b eq 5
          add r, a, a, lsl 2 ; r=a*5
        else if b eq 10
          add r, a, a, lsl 2 ; r=a*10
          add r, r
        else ; *?
          'Error'
        end if
        define ?s 1 \\}
      if ?s eq 0 ; r=[i]
        ldr r, =x
      end if
    end if
    define ?s 1 \}
  match =0 \ ; no []
    a=,b, ?s p \{
    if a is.r32? ; r,?
      if b is.r32? ; r,r
        mov a, b
      else if b is.i? ; r,i
        ldr a, =b
      else ; r,?
        'Error'
      end if
    else ; ?,?
      'Error'
    end if
    define ?s 1 \}
  if ?s eq 0
    'Error'
  end if
}
Last edited by m3ntal on 27 Mar 2015, 18:20; edited 2 times in total
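The *3, *5 and *10 cases in the lea macro above lean on ARM's shifted second operand: `add r, a, a, lsl 1` computes a + (a << 1) in one instruction. A minimal C sketch of the same arithmetic identities follows. The function names are made up for illustration, and the *10 case assumes the macro's trailing `add r, r` doubles r, as its `r=a*10` comment suggests.

```c
#include <stdint.h>

/* a*3 = a + 2a, mirrors: add r, a, a, lsl 1 */
uint32_t times3(uint32_t a)
{
    return a + (a << 1);
}

/* a*5 = a + 4a, mirrors: add r, a, a, lsl 2 */
uint32_t times5(uint32_t a)
{
    return a + (a << 2);
}

/* a*10 = (a + 4a) * 2, mirrors: add r, a, a, lsl 2 followed by doubling r */
uint32_t times10(uint32_t a)
{
    uint32_t r = a + (a << 2);
    return r + r;
}
```

With unsigned 32-bit arithmetic these agree with plain multiplication for every input, including the wrap-around cases, since both sides are computed modulo 2^32.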
m3ntal 21 Mar 2015, 18:09
Listen, whenever someone compliments me, I always respond with self-criticism: "No, I'm not that good", but what is "that good"? I know in my mind that I'll never be "that good". No such thing. I set my standards way above what I could ever reach.
l_inc 22 Mar 2015, 00:18
m3ntal
Quote: I agree with most of the things you say except "vectorization" (multi-byte copy) is easy and can be automated.
Vectorization is not multi-byte copy. Vectorization is a transformation of an algorithm that works on some data into an algorithm that works on parallel flows resulting from a clever split of that data. A compiler needs to find that split by recognizing the similarity in the processing of different parts of that data, which is very hard to automate, especially in the general case. You can have a look at some slides about it here.
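To make the "split into parallel flows" idea concrete, here is a small sketch in portable C, with no SIMD instructions at all. The names sum_scalar and sum_lanes are made up for illustration. The first function is a single dependency chain; the second splits the same data into four independent running sums, which is the shape a vector unit can execute as one packed add per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward reduction: every add depends on the previous one. */
uint32_t sum_scalar(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Same reduction, split into four independent flows ("lanes").
 * Lane k accumulates elements k, k+4, k+8, ... */
uint32_t sum_lanes(const uint32_t *v, size_t n)
{
    uint32_t lane[4] = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        lane[0] += v[i + 0];
        lane[1] += v[i + 1];
        lane[2] += v[i + 2];
        lane[3] += v[i + 3];
    }
    /* Combine the lanes, then handle the leftover elements. */
    uint32_t s = lane[0] + lane[1] + lane[2] + lane[3];
    for (; i < n; i++)
        s += v[i];
    return s;
}
```

The split is only valid because unsigned addition can be reordered freely. With floating-point data the two functions can give different results, so a compiler has to prove (or be told) that such a reordering is allowed before it may do this on its own, which is one reason the general case is hard.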
HaHaAnonymous 24 Mar 2015, 19:14
[ Post removed by author. ]
Last edited by HaHaAnonymous on 25 Mar 2015, 00:33; edited 1 time in total
AsmGuru62 24 Mar 2015, 22:13
Probably try it without MOVZX.
Also, suspiciously high # of labels.
EDIT: I had fun reading it! Not a waste.
redsock 24 Mar 2015, 23:39
that was a fun example HaHaAnonymous, threw one together myself to see how long it'd take me: 2m35s
Code:
format ELF64

; two arguments: rdi == ptr to string, esi == nonzero length of same
; returns high word of eax == count of numbers, low word == count of letters
public getcount
getcount:
        xor     eax, eax
        add     rdi, rsi
        neg     rsi
.loop:
        movzx   ecx, byte [rdi+rsi]
        add     eax, [ecx*4+.table]
        add     rsi, 1
        jnz     .loop
        ret
.table:
        repeat 256
                c = % - 1
                if (c >= 'A' & c <= 'Z') | (c >= 'a' & c <= 'z')
                        dd 0x1
                else if (c >= '0' & c <= '9')
                        dd 0x10000
                else
                        dd 0
                end if
        end repeat

public _start
_start:
        mov     rdi, .teststr
        mov     esi, .teststrlen
        call    getcount
        int3
        nop
        mov     eax, 60         ; exit
        xor     edi, edi        ; return code
        syscall
.teststr db 'here are some letters, here are some numbers 823482389041209'
.teststrlen = $ - .teststr
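For anyone who reads C more easily than fasm, here is a rough sketch of what the listing above does: a 256-entry table is filled once (the fasm `repeat 256` block does this at assembly time; here it is a runtime helper), each input byte adds its table entry to one packed counter, and the caller unpacks the two 16-bit halves. The names build_table, getcount_table, letters_of and digits_of are made up for illustration and are not part of the original post.

```c
#include <stddef.h>

static unsigned table[256];

/* Fill the table the way the `repeat 256` block does at assembly time. */
void build_table(void)
{
    for (int c = 0; c < 256; c++) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            table[c] = 0x1;       /* letter: bump the low word */
        else if (c >= '0' && c <= '9')
            table[c] = 0x10000;   /* digit: bump the high word */
        else
            table[c] = 0;
    }
}

/* One table lookup and one add per byte, no branches on the data. */
unsigned getcount_table(const unsigned char *s, size_t len)
{
    unsigned packed = 0;
    for (size_t i = 0; i < len; i++)
        packed += table[s[i]];
    return packed;
}

/* Unpack the two counters from the packed result. */
unsigned letters_of(unsigned packed) { return packed & 0xFFFF; }
unsigned digits_of(unsigned packed)  { return packed >> 16; }
```

Packing both counters into one 32-bit value is what lets the loop body stay at a single add, at the price that more than 65535 letters would carry into the digit count.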
HaHaAnonymous 25 Mar 2015, 00:48
[ Post removed by author. ]
Last edited by HaHaAnonymous on 25 Mar 2015, 23:52; edited 1 time in total
redsock 25 Mar 2015, 00:58
HaHaAnonymous wrote: redsock
Whoaoaoa man, take it easy on yourself... I didn't mean to offend you in any way. Your earlier post with the example actually was fun, and made me stop and consider the implications of what you were saying, which all told is a good thing here on the board. In fact, over my early lunch here I was coding the same argument up in C in a variety of different ways to see what gcc -O3 does with the same thing. Lighten up! I enjoyed your thought experiment.
redsock 25 Mar 2015, 01:27
Back to the point of the topic, and using HaHaAnonymous' example of counting the numbers and letters in a string, here's my $0.02 on the subject:
Since my C compiler can't work out my intent, even if I am willing to let it sit there and chew on my source for a very long time, this code is an interesting and simple example of exactly what I mean. So here we have the "poor man's simple version", which is to say, I wrote the getcount function without much thought/care about _how_ it does it. Consider the following C code:
Code:
unsigned getcount(const char *s, int len) {
    unsigned ret = 0;
    for (int i = 0; i < len; i++)
        if ((s[i] >= 'A' && s[i] <= 'Z') || (s[i] >= 'a' && s[i] <= 'z'))
            ret += 0x1;
        else if (s[i] >= '0' && s[i] <= '9')
            ret += 0x10000;
    return ret;
}
Code:
getcount:
.LFB15:
        .cfi_startproc
        test    esi, esi
        jle     .L6
        xor     edx, edx
        xor     eax, eax
        jmp     .L5
        .p2align 4,,10
        .p2align 3
.L9:
        add     rdx, 1
        add     eax, 1
        cmp     esi, edx
        jle     .L8
.L5:
        movzx   ecx, BYTE PTR [rdi+rdx]
        mov     r8d, ecx
        and     r8d, -33
        sub     r8d, 65
        cmp     r8b, 25
        jbe     .L9
        sub     ecx, 48
        lea     r8d, [rax+65536]
        cmp     cl, 9
        cmovbe  eax, r8d
        add     rdx, 1
        cmp     esi, edx
        jg      .L5
.L8:
        rep ret
.L6:
        xor     eax, eax
        ret
        .cfi_endproc
Code:
unsigned getcount(const char *s, int len) {
    static const unsigned table[256] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0x10000, 0x10000, 0x10000, 0x10000, 0x10000, 0x10000, 0x10000, 0x10000, 0x10000, 0x10000, 0, 0, 0, 0, 0, 0,
        0, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
        0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0, 0, 0, 0, 0,
        0, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
        0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    };
    unsigned ret = 0;
    const char *e = s + len;
    while (s < e)
        ret += table[*s++];
    return ret;
}
Code:
getcount:
.LFB15:
        .cfi_startproc
        movsx   rsi, esi
        xor     eax, eax
        add     rsi, rdi
        cmp     rdi, rsi
        jae     .L4
        .p2align 4,,10
        .p2align 3
.L3:
        add     rdi, 1
        movsx   rdx, BYTE PTR [rdi-1]
        add     eax, DWORD PTR table.2189[0+rdx*4]
        cmp     rdi, rsi
        jne     .L3
        rep ret
.L4:
        rep ret
        .cfi_endproc
Perhaps a better question is: Should we expect gcc to produce a table-based version of the original C function, or walk the const char * in a different fashion? I think it, as with most HLL compilers, must not stray too far from the original programmer's choices. Good C produces pretty decent assembler IMO, bad C does not.
EDIT: first one wasn't in masm=intel for some reason, modified accordingly.
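One detail worth flagging in the table version above: it indexes the table with a plain `char`. With gcc on x86, plain `char` is signed by default, which is why the second listing loads the byte with `movsx`. Any input byte from 0x80 to 0xFF therefore becomes a negative index and reads memory in front of the table. The sketch below is one way to avoid that, under the assumption that arbitrary bytes can appear in the input. The names getcount_ref and getcount_safe are made up for illustration, and the table is filled at first use only to keep the example short.

```c
#include <stddef.h>

/* The branchy version from the post, used here as the reference. */
unsigned getcount_ref(const char *s, int len)
{
    unsigned ret = 0;
    for (int i = 0; i < len; i++)
        if ((s[i] >= 'A' && s[i] <= 'Z') || (s[i] >= 'a' && s[i] <= 'z'))
            ret += 0x1;
        else if (s[i] >= '0' && s[i] <= '9')
            ret += 0x10000;
    return ret;
}

static unsigned safe_table[256];
static int safe_table_ready = 0;

/* Fill the table once with the same values as the literal table in the post. */
static void init_safe_table(void)
{
    for (int c = 0; c < 256; c++) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            safe_table[c] = 0x1;
        else if (c >= '0' && c <= '9')
            safe_table[c] = 0x10000;
        else
            safe_table[c] = 0;
    }
    safe_table_ready = 1;
}

/* Table version with the index converted to unsigned char,
 * so it is always in the range 0..255. */
unsigned getcount_safe(const char *s, int len)
{
    if (!safe_table_ready)
        init_safe_table();
    unsigned ret = 0;
    const char *e = s + len;
    while (s < e)
        ret += safe_table[(unsigned char)*s++];
    return ret;
}
```

With the cast in place a compiler would be expected to load the byte with a zero-extending move instead of `movsx`, which matches what the hand-written fasm loop does with `movzx`.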
m3ntal 27 Mar 2015, 18:04
A highly experienced C+ASM programmer can write better code in C than an average programmer can write in ASM.
l_inc: In the past, I've used parallel arithmetic (MMX/XMM+) when they were first released for about 3-4 months, mainly inline ASM in VC6, but in the last 7+ years, I've been using mostly portable (.386) instructions that can be converted to other CPUs easily. (Sorry, too drunk last post, edited).
HaHa: None of that's true. You're not "stupid". It just takes time, practice and dedication to learn programming. Never had any serious problem with you, always thought you were funny. When we get upset, it's only for a minute then life goes on.
m3ntal 27 Mar 2015, 18:43
Tyler: Exactly. Knowing and doing are 2 different things. Programming is not all about being able to memorize and utter scriptures. We must BE a good programmer ourselves. Knowledge is not everything, either. You've gotta be clever, inventive, imaginative and these are things that can't be taught.
Tyler 28 Mar 2015, 03:28
m3ntal wrote: Tyler: Exactly. Knowing and doing are 2 different things. Programming is not all about being able to memorize and utter scriptures. We must BE a good programmer ourselves. Knowledge is not everything, either. You've gotta be clever, inventive, imaginative and these are things that can't be taught.
I guess that is one of the draws of assembly. In assembly, almost nothing is trivial and almost everything provides an opportunity to make it interesting, just by trying to write it as optimized as possible... I had forgotten this in all my time away from assembly, but remember now that it was the reason I enjoyed it.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.