flat assembler
Message board for the users of flat assembler.
Index
> DOS > Circle algorithm Goto page Previous 1, 2 |
Author |
|
bogdanontanu 11 Mar 2005, 16:37
Bresenham's circle algorithm
======================= This draws a circle from 0 to PI/4 by symetry you can draw all 8 points actually because of y extending down and inc/dec setup it is between 3*pi/2 and 7*pi/4 but you ca figure that out by yourself :p Code: ; setup t_crt = 2 - 3*radius x_crt = 1 y_crt = radius loop_circle: test t_crt,t_crt js move_on_y ; only/always move on x t = t + 4*x + 6 jmp next_pixel move_on_y: ; when we move on y t = t + 4*(x-y) + 10 dec y_crt next_pixel: ; draw the pixels here 8 pixels one for each octant inc x_crt cmp y_crt,x_crt ja loop_circle I agree that when carefully counting there are 2 compares but one of them is the default one at the end of the loop... after all we do need a loop to draw many points :p and in one path there are: +1 addition + one shift + one addition with a constant while on the other path there is: +1 addition + one shift + one substraction + one addition with a constant Of course it can be optimized with LEA for older CPU's It also depends how many calculations are actually made. Bresenham make exactly the number of pixels needed (but then the same does Taylor?) Of course using one IMUL or MUL for drawing pixels will make things extra slow but that is not the fault of the algorithm... is the fault of the programmer _________________ "Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." |
|||
11 Mar 2005, 16:37 |
|
Octavio 11 Mar 2005, 20:13
l1:
add bp,si inc si lea dx,[bp+di-rd/2] dec bx cmp dx,ax ja l1 sub di,cx dec cx inc ax cmp si,cx jna l1 11 instructions without the putpixels routine. loop_circle: test t_crt,t_crt js move_on_y ; only/always move on x t = t + 4*x + 6 jmp next_pixel move_on_y: ; when we move on y t = t + 4*(x-y) + 10 dec y_crt next_pixel: ; draw the pixels here 8 pixels one for each octant inc x_crt cmp y_crt,x_crt ja loop_circle This is not optimized assembly code but i think that at least 11 32bits instructions are needed and more in 16bits , why should it be faster? Quote:
that is not fault of the algorithm because is not part of the algorithm, but i can´t blame you ,because you didn´t post your faster code. |
|||
11 Mar 2005, 20:13 |
|
Matrix 11 Mar 2005, 23:43
ok, here goes fsincos again
read carefully Code: ;Circle 005F by Z - MATRIX org 256 mov al,13h ; ah=0 so -1 bytes int 10h push $a000 pop es finit ; init FPU call set320x200x256 mainloop: mov ecx,200 ; y call randomz3 mov bx,ax mov ecx,20 ; radius call randomz3 mov dx,ax mov ecx,256 ; color call randomz3 mov ch,al push cx mov ecx,320 ; x call randomz3 pop cx pushad call circle005f popad push eax xor ax,ax call bkeycheck cmp al,27 pop eax jnz mainloop call breadkey call set80x25t int 20h ;exit circle005f: ; ax=x bx=y ch=col dx=radius mov [x12],ax mov [y12],bx xor bx,bx mov [rad],dx fldz fstp dword [degcnt] fld1 fild dword [rad] fmul dword [prec] fdivp st1,st0 fld st0 fmul st,st0 fld1 fsubrp st1,st fsqrt fpatan fstp dword [cirad] .go: fld dword [degcnt] fld dword [cirad] faddp st1,st0 fst dword [degcnt] fsincos fimul dword [rad] ; (x) fist dword [tmpxnc] fmul dword [xcorrection] ; ( 1.2 ) fist dword [tmpx] fiadd dword [x12] ; ( cos ) fistp dword [xz12] fimul dword [rad] ; (y) fist dword [tmpy] mov bx,di fmul dword [xcorrection] ; ( 1.2 ) fistp dword [tmpycc] fild dword [tmpy] fiadd dword [y12] ; ( sin ) fistp dword [yz12] fwait mov eax,[yz12] lea edi,[4*eax+eax] shl di,6 add di,[xz12] cmp bx,di je .skip mov [es:di],ch sub di,word [tmpx] sub di,word [tmpx] mov [es:di],ch mov eax,[tmpy] lea eax,[4*eax+eax] shl ax,6 sub di,ax sub di,ax mov [es:di],ch add di,word [tmpx] add di,word [tmpx] mov [es:di],ch mov bx,di mov eax,[y12] add eax,[tmpxnc] lea edi,[4*eax+eax] shl di,6 add di,[x12] add di,[tmpycc] mov [es:di],ch sub di,word [tmpycc] sub di,word [tmpycc] mov [es:di],ch mov eax,[y12] sub eax,[tmpxnc] lea edi,[4*eax+eax] shl di,6 add di,[x12] sub di,[tmpycc] mov [es:di],ch add di,[tmpycc] add di,[tmpycc] mov [es:di],ch mov di,bx .skip: fld dword [pi_4] fcomp dword [degcnt] fstsw ax and ah,1001b jz .go ret degcnt: dd 0 cirad : dd ?;0.07 xcorrection : dd 1.2 prec: dd 4.0 ; prec 4, 666*2*2 circles per second (radius 99, xcorrection 1.2) rad : dd ? x12 : dd ? y12 : dd ? xz12 : dd ? yz12 : dd ? tmpx : dd ? tmpy : dd ? tmpxnc : dd ? tmpycc : dd ? pi_4: dd 0.785398163397448309615660845819876 set320x200x256: mov ax,$13 int $10 ret set80x25t: mov ax,$07 int $10 ret ;cls320x200x256: ;push $a000 ;pop es ;xor eax,eax ;mov di,ax ;mov cx,$3e80 ;rep stosd ;ret breadkey: ;returns: AH = BIOS scan code AL = ASCII character note: enhanced mov ah,$10 int $16 ret initrandomz: ; modifies edx, eax rdtsc ret randomz3: ; Z3 ecx=range transparent push ebx ; number returned is: 0 <= n < ecx push edx mov ebx,eax rdtsc mov dx,cx mov cl,al xor cl,ah ror ch,4 xor bl,ch and cx,$f rol bx,cl btc bx,cx xor bx,ax ror eax,16 xor bx,ax ror eax,16 mov cl,al xor cl,ah ror ch,4 xor bl,ch and cx,$f ror bx,cl btc bx,cx mov cl,bl .minorloop: rol eax,cl and cx,$f btc ax,cx loopw .minorloop mov cl,bh .majorloop: ror eax,cl and cl,$f btc ax,cx loopw .majorloop mov cx,dx mul ecx mov eax,edx pop edx pop ebx ret bkeycheck: ;returns: AH = BIOS scan code AL = ASCII character note: enhanced mov ah,0x11 ;Return: ZF set if no keystroke available, ZF clear if keystroke available int 0x16 ;only checks buffer without removing key ret |
|||
11 Mar 2005, 23:43 |
|
bogdanontanu 12 Mar 2005, 02:02
Yup i did not posted the optimized code, mainly because i was working to an general non-optimized version.
I do not really feel like tweaking instructions now, but i will do the teste in SolarOS using your code above and whatever i can squezee from my code when i feel like doing it. When i have a version i will post it here, So i guess you are after 16bits code? I was thinking more like 32bits but doh... |
|||
12 Mar 2005, 02:02 |
|
Octavio 12 Mar 2005, 10:20
bogdanontanu wrote: Yup i did not posted the optimized code, mainly because i was working to an general non-optimized version. I have already tested both versions with optimized 32bit code, the average time without draw pixels is 5.5 cpu clocks by iteration for both algorithms, so it was true that bresenham is not faster, but it has the advantage of using less variables,and a simpler initialization. |
|||
12 Mar 2005, 10:20 |
|
bogdanontanu 12 Mar 2005, 10:52
Aha, good to know
I will post my result when i have the time to make the tests |
|||
12 Mar 2005, 10:52 |
|
Dilshod 14 Mar 2005, 12:13
bogdanontanu,
You want to say that this is the Brasanhams Algorithm? I dont think so. Code: radius = 50 ' setup tcrt = 2 - 3 * radius xcrt = 1 ycrt = radius cic: IF tcrt > 0 THEN GOTO OnY tcrt = tcrt + 2 * xcrt + 6 GOTO nextp OnY: tcrt = tcrt + 4 * (xcrt - ycrt) + 10 ycrt = ycrt - 1 nextp: PSET (xcrt + 160, ycrt + 100) PSET (-xcrt + 160, ycrt + 100) PSET (xcrt + 160, -ycrt + 100) PSET (-xcrt + 160, -ycrt + 100) PSET (ycrt + 160, xcrt + 100) PSET (-ycrt + 160, xcrt + 100) PSET (ycrt + 160, -xcrt + 100) PSET (-ycrt + 160, -xcrt + 100) xcrt = xcrt + 1 IF xcrt < ycrt THEN GOTO cic try this program and my program yourself, and youll understand. May be I have any wring here in the code. and this code has more arithmetic operations then main, Is it? and How I said Brasanhams Algorithm not much differs main. |
|||
14 Mar 2005, 12:13 |
|
bogdanontanu 14 Mar 2005, 14:16
Yes that is the Bresenham algorithm
Ok, i will try and then i will understand Apparently Octavio already tested them... Anyway i will also test it myself to be sure. |
|||
14 Mar 2005, 14:16 |
|
Goto page Previous 1, 2 < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.