flat assembler
Message board for the users of flat assembler.
AsmGuru62 31 Mar 2015, 17:00
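If you make a few macros like this:
...
Code:
mov al, [int8]      ; signed byte
cbw                 ; sign-extend AL into AX
push ax             ; put the word in memory (on the stack)
mov si, sp          ; SP can't be a base register in 16-bit code; DS = SS in a .COM
fiadd word [si]     ; add it to ST0 as a 16-bit integer

And, of course, do not forget to restore the stack.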
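Here is a rough Python model of what that sequence hands to the FPU. The function names are made up for illustration. CBW copies bit 7 of AL into every bit of AH, and `fiadd word [si]` then reads those two bytes as a signed 16-bit integer.

```python
def cbw(al: int) -> int:
    """Model of CBW: sign-extend the byte in AL to 16-bit AX (returned as 0..0xFFFF)."""
    al &= 0xFF
    return al | 0xFF00 if al & 0x80 else al


def as_int16(ax: int) -> int:
    """Interpret a 16-bit value the way fiadd m16int does (two's complement)."""
    return ax - 0x10000 if ax & 0x8000 else ax


print(hex(cbw(0x80)), as_int16(cbw(0x80)))
```

For unsigned data, such as screen coordinates above 127, CBW turns the values negative (`as_int16(cbw(200))` is -56). In that case clearing AH (zero extension) is the variant you want.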
HaHaAnonymous 31 Mar 2015, 21:07
But wouldn't the additional code alone require even more memory? Please explain, if possible... Thank you!
revolution 01 Apr 2015, 10:10
Make it a function, not a macro, if it is used in many places. Otherwise, if it is in a loop and only used in that one place, it makes no difference.
Kuemmel 01 Apr 2015, 15:37
@HaHaAnonymous:
I have about 80 single bytes of data that I need to process with some x87 code. Storing them as words would take an additional 80 bytes. A small loop with load, CBW, store, etc. takes far fewer bytes than that, because reserving the extra memory for the 16-bit copies is "free" (it does not add to the file size). I'll see... when the code is ready I'll post it here.
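The size argument can be written as a small calculation. This sketch rests on two assumptions. First, the 16-bit copies sit in reserved space at the very end of the program, which fasm does not write to a flat binary. Second, the widening loop's size is just a parameter; the 14 bytes used below is only an example value.

```python
def table_file_bytes(n_values: int, as_words: bool, loop_bytes: int) -> int:
    """Bytes a .COM file spends on a table of n small values.

    Assumption: the 16-bit copies live in reserved (uninitialized) space at the
    end of the program, so they cost nothing in the file itself.
    """
    if as_words:
        return 2 * n_values  # every value stored as an initialized word
    return n_values + loop_bytes  # byte table plus the code that widens it


print(table_file_bytes(80, True, 14), table_file_bytes(80, False, 14))
```

With 80 values, `table_file_bytes(80, True, 14)` gives 160 and `table_file_bytes(80, False, 14)` gives 94. The conversion loop pays for itself as long as it is shorter than the number of values.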
tthsqe 02 Apr 2015, 05:17
Hey Kuemmel!
Haven't seen you here for a long time. How's it going? Have you tried any 3D fractals yet?
Kuemmel 02 Apr 2015, 15:34
Hi tthsqe,
indeed, I just don't have much time for coding. When I do code, it's mostly under RISC OS on ARM dev boards, for fun (e.g. on a Pandaboard). But I got kind of curious about 256-byte DOS intros, as there are such nice productions out there on pouet. For RISC OS/ARM I ported a Mandelbulb GLSL shader to assembler, but it's still nowhere near real time... nowadays graphics and 3D are all done in GLSL (like what you see on www.shadertoy.com). But I'm not really a GLSL coder; I just collect ideas from there. These graphics cards nowadays are so powerful... you can't keep up, even with multi-core assembler code doing those things on the CPU.
bitRAKE 03 Apr 2015, 02:10
Code:
fiadd word [di]   ; load constant times 256
scasb             ; next byte
fmul [de_scale]   ; 1/256
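Here is one reading of this trick, sketched in Python. It assumes DI starts one byte before the first value of interest. Because x86 is little-endian, the word at [di] is data[di] + 256*data[di+1]. After the 1/256 scale, the next byte therefore lands in the integer part and the current byte only adds a fraction below 1. SCASB then moves DI on by one. The helper name is hypothetical, and each step starts from 0.0 instead of adding into an existing ST0.

```python
import struct


def overlapping_word_values(table: bytes) -> list[float]:
    """Model of 'fiadd word [di] / scasb / fmul [de_scale]' stepping one byte at a time.

    Hypothetical helper: each value starts from 0.0 instead of adding to ST0.
    """
    de_scale = 1 / 256  # a power of two, so the scaling itself is exact
    values = []
    for di in range(len(table) - 1):  # the last byte has no successor to pair with
        (word,) = struct.unpack_from("<h", table, di)  # signed little-endian word, like m16int
        values.append(word * de_scale)
    return values


print(overlapping_word_values(bytes([0, 120, 98])))
```

With a zero pad byte in front, the first value comes out exact (120.0). The second (98.46875) carries 120/256 from the byte below it. Because fiadd reads a signed word, a high byte of 128 or more comes out negative: `bytes([0, 199])` gives -57.0.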
randall 03 Apr 2015, 08:45
Kuemmel wrote: These graphics cards nowadays are so powerful... you can't keep up, even with multi-core assembler code doing those things on the CPU.
Intel KNL (Knights Landing) is on the horizon: http://www.zdnet.com/article/intels-next-big-thing-knights-landing/
According to the article: "Knights Landing will deliver 3 teraflops double-precision and 6 teraflops single-precision."
HaHaAnonymous 03 Apr 2015, 16:41
You can have a GPU that will achieve the same performance for a tiny fraction of the price. But if you are rich enough, dive in... That's all I can say. D: I apologize for any inconvenience I may have caused.
randall 03 Apr 2015, 18:20
HaHaAnonymous wrote:
A GPU with 3 teraflops of double-precision performance (NVIDIA K80) is also very expensive.
Kuemmel 04 Apr 2015, 08:23
Code:
fiadd word [di]   ; load constant times 256
scasb             ; next byte
fmul [de_scale]   ; 1/256

Nice idea! "scasb"... more obscure opcodes whose usage I still have to learn. Compared to my ARM-stuck brain, it looks like the art in x86 is to find the "one" weird opcode that fits. How "cheaply" can I get my data address into "di"? Just with lea di,[data_label]?
revolution 04 Apr 2015, 08:27
LEA would take one byte more than MOV, IIRC.
Also, what is the advantage of "scasb" over "inc di"?
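For the size question, the 16-bit encodings with a direct address can be written out by hand. The helper names in this Python sketch are illustrative. The bytes are hand-assembled from the opcode tables, not produced by any assembler library.

```python
def mov_di_imm16(addr: int) -> bytes:
    """MOV DI, imm16: opcode B8h + register number (DI = 7), then the 16-bit immediate."""
    return bytes([0xB8 + 7]) + addr.to_bytes(2, "little")


def lea_di_direct(addr: int) -> bytes:
    """LEA DI, [disp16]: opcode 8Dh, ModRM (mod=00, reg=DI, r/m=110 direct), then disp16."""
    modrm = (0b00 << 6) | (7 << 3) | 0b110  # = 3Eh
    return bytes([0x8D, modrm]) + addr.to_bytes(2, "little")


INC_DI = bytes([0x40 + 7])  # INC r16 short form (16-bit code)
SCASB = bytes([0xAE])       # compare AL with ES:[DI], then step DI by 1

print(mov_di_imm16(0x1234).hex(), lea_di_direct(0x1234).hex())
```

MOV DI,imm16 is 3 bytes and LEA DI,[disp16] is 4. INC DI (47h) and SCASB (AEh) are both a single byte. SCASB additionally reads ES:[DI], overwrites the flags, and moves DI according to the direction flag (forward when DF is clear). So any difference between the two lies in those side effects, not in size.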
bitRAKE 04 Apr 2015, 23:03
revolution wrote: Also, what is the advantage of "scasb" over "inc di"?
It would be nice to see the inner loop code. We can all have some fun pondering alternatives. My last size coding was just some silliness: http://www.pouet.net/prod.php?which=61073
Kuemmel 14 Apr 2015, 20:43
...finally got some code running. It basically draws 12 cubic Bezier curves in the shape of the good old Acorn logo, each represented as many dots (so it displays as an outline; there is no space for a line algorithm). Of course it's a bit silly to also draw the horizontal and vertical lines as Beziers... but it's just an exercise to get me started on that weird but fun 16-bit DOS coding. I'm down to 236 bytes.
I'm sure I missed lots of size optimizations. If you spot some, just tell me; any comment is welcome. I could also provide BASIC code as an explanation if you need it. I'm still thinking about those string instructions and rep loops... can they be of use here, for example in my byte-to-word copy routine? Or are "in" and "out" useful here?
Code:
org 100h
use16

max_bez_lines = 12              ;12 bezier lines
max_bez_dots  = 255             ;represented as 255 dots...no space for a line algo

start:
        push 0a000h             ;vga
        pop es
        mov al,13h              ;mode 13, 320x200
        int 10h

;copy routine to copy all the bezier data and 2 more variables from
;byte to word data for FPU access
        mov di,bez_b
        mov cl,(max_bez_lines*8+2)              ;is ch always zero ? seems to be...
L0:     mov al,byte[di]
        mov word[di+bx+(max_bez_lines*8+2)],ax  ;ah also seems to be zero...
        inc di
        inc bx                                  ;bx is zero at .com start
        loop L0

        mov di,count_w          ;could be replaced by add offset, but not shorter
        mov bl,((max_bez_lines-1)*16)           ;bh is zero already
L1:     mov word[di],cx         ;cx is always zero here
        mov dx,max_bez_dots
L2:     fild word[di]           ;get counter
        fidiv word[di-2]        ;t = counter/max_segments
        fld1                    ;get 1.0
        fsub st0,st1            ;at=1.0-t | t
        fld st0                 ;at | at | t
        fmul st0,st0            ;at*at | at | t
        fmul st0,st1            ;at*at*at | at | t
        fld st2                 ;t | at*at*at | at | t
        fmul st0,st0            ;t*t | at*at*at | at | t
        fmul st0,st3            ;t*t*t | at*at*at | at | t
        fild word[di-4]         ;3 | t*t*t | at*at*at | at | t
        fmul st0,st3            ;3*at ...
        fmul st0,st4            ;3*at*t ...
        fmul st3,st0            ;3*at*t | t*t*t | at*at*at | 3*at*at*t | t
        fmulp st4,st0           ;t*t*t | at*at*at | 3*at*at*t | 3*at*t*t = b0 | b1 | b2 | b3
        lea si,[di+bx-196]      ;196 = (max_bez_lines*8+2)*2, so si = bez_w + bx
        mov cl,4
        fldz
        fldz                    ;nx | ny | b0 | b1 | b2 | b3 (nx = ny = 0)
L3:     fld st2                 ;bk | nx | ny | bk ...
        fimul word[si]          ;bk*x | nx | ny | bk ...
        faddp st1,st0           ;nx_new | ny | bk ...
        fxch st2                ;bk | ny | nx_new ...
        fimul word[si+2]        ;bk*y | ny | nx_new ...
        faddp st1,st0           ;ny_new | nx_new | next b ...
        fxch st1                ;nx_new | ny_new | next b ... (used weight is consumed)
        add si,4
        loop L3                 ;afterwards only nx | ny remain on the FPU stack
        fistp word[di+2]        ;get x
        fistp word[di+4]        ;get y, FPU stack is empty again
        imul si,word[di+4],320
        add si,word[di+2]       ;screen address is x + y * 320
        mov byte[es:si],2       ;store pixel with green colour
        inc byte[di]            ;inc counter for FPU
        dec dx
        jnz L2
        sub bx,16
        jns L1
L5:     mov ah,01h              ;wait for keyboard
        int 16h
        jz L5
        mov ax,03h
        int 10h
        ret                     ;exit

bez_b     db 199, 98,120, 98,123,  4,196,  4
          db 199, 98,120, 98,120, 98,199, 98
          db 156,143,118,103,118,126,139,140
          db 201,103,118,103,118,103,201,103
          db 163,143,201,103,201,126,180,140
          db 156,147,156,143,156,143,156,147
          db 163,149,163,143,163,143,163,149
          db 117,137,156,147,146,145,133,141
          db 208,166,163,149,179,155,193,160
          db 198,170,116,140,150,150,174,158
          db 117,137,116,140,116,140,117,137
          db 208,166,198,170,198,170,208,166
three_b   db 3
max_seg_b db max_bez_dots
bez_w     dw 12*8 dup ?
three_w   dw 1 dup ?
max_seg_w dw 1 dup ?
count_w   dw 1 dup ?
coord     dw 2 dup ?
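The arithmetic can be checked outside DOS with a short Python sketch of what one dot computes. It is an illustration only. The names are made up, and only the first two rows of bez_b are copied. It follows the weight order in the FPU comments: t³, (1-t)³, 3(1-t)²t, 3(1-t)t². That order means each row lists the end point first, then the start point, then the two control points.

```python
# First two rows of the bez_b table (the remaining rows are omitted here).
BEZ_B = bytes([
    199, 98, 120, 98, 123, 4, 196, 4,
    199, 98, 120, 98, 120, 98, 199, 98,
])

SCREEN_WIDTH = 320  # mode 13h: 320x200, one byte per pixel


def widen(table: bytes) -> list[int]:
    """Byte-to-word copy as zero extension (what the L0 loop does when AH is 0)."""
    return list(table)


def bezier_dot(words: list[int], line: int, count: int, max_dots: int = 255) -> tuple[int, int]:
    """One dot of curve `line`, using the same weight order as the FPU code."""
    t = count / max_dots
    at = 1.0 - t
    weights = (t * t * t, at * at * at, 3 * at * at * t, 3 * at * t * t)
    row = words[line * 8:line * 8 + 8]
    nx = sum(w * row[2 * k] for k, w in enumerate(weights))
    ny = sum(w * row[2 * k + 1] for k, w in enumerate(weights))
    # fistp uses the default x87 rounding (nearest, ties to even), like round()
    return round(nx), round(ny)


def pixel_offset(x: int, y: int) -> int:
    """Offset into the A000h segment: x + y * 320."""
    return x + y * SCREEN_WIDTH


words = widen(BEZ_B)
print(bezier_dot(words, 0, 0), pixel_offset(*bezier_dot(words, 0, 0)))
```

At count 0, the first curve starts at (120, 98), which is pixel offset 31480. The second row is the straight bottom edge, so every dot on it keeps y = 98.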
Kuemmel 21 May 2015, 18:21
...released my final version and some extra procedural graphics on pouet:
http://www.pouet.net/prod.php?which=65630
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.