flat assembler
Message board for the users of flat assembler.
Bresenham's linedrawing algorithm in ASM [SOLVED]
Madis731 03 Aug 2006, 11:26
AAAAAAAAAAAAAAAAHHHHHHHHHHH so sweeet http://enos.itcollege.ee/~mkalme/PAHN/MenuetOS/untitled.PNG
every SINGLE blue line is in its place (A) nothing's off... and I think you want to know what was wrong:
Code:
    lea rdx,[rix+rdi]
    test rdx,rdx
    jne @f
    neg rdi
    neg rix
    @@:
This was missing from the part where dx>dy. I had to put a name to the jump because the new code introduced a jump. Look why @@s are bad:
Code:
    cmp rex,rfx
    jc .ver2
    xor rsi,rsi ; xinc1 = 0
    xor rjx,rjx ; yinc2 = 0
    ;=>RIGHT HERE
    mov rkx,rex ; den = deltax
    mov rlx,rex
    shr rlx,1 ; num = deltax/2
    mov rbp,rfx ; numadd = deltay
    mov rdx,rex ; numpixels = deltax
    jmp .endloc
    .ver2:
THANKS to everybody who contributed! You got me closer to the answer and finally it's here... now I'll try to make this code fragment more transparent and fit it somewhere in the code, meaning - I will optimize a bit.
Last edited by Madis731 on 03 Aug 2006, 11:37; edited 2 times in total
shoorick 03 Aug 2006, 11:33
Madis731 03 Aug 2006, 11:59
ARGH you won't believe how angry I am with MYSELF !!!!
I wrote down RSI, RDI for Xs and RIX, RJX for Ys, but EVERYWHERE else I used them as RSI, RIX for Xs and RDI, RJX for Ys. I wondered why this fragment of code fitted everywhere and then I finally just optimized it to:
Code:
    xchg rdi,rix
C'mon - everyone can only DREAM about this kind of optimization. THEN I remembered what Shoorick said and Voila! there it was - rsi and rdi standing as if they were Xs.
Okay - the WORKING unoptimized, but small and fast code:
Code:
    mov rex,rcx ;rcx=x2
    sub rex,rax ;rax=x1
    sbb rhx,rhx
    xor rex,rhx
    sub rex,rhx ;rex=abs(rcx-rax)
    mov rfx,rdx ;rdx=y2
    sub rfx,rbx ;rbx=y1
    sbb rhx,rhx
    xor rfx,rhx
    sub rfx,rhx ;rfx=abs(rdx-rbx)
    or rsi,-1
    or rix,-1
    cmp rcx,rax
    jc .X2notlessthanX1
    neg rsi
    neg rix
    .X2notlessthanX1:
    or rdi,-1
    or rjx,-1
    cmp rdx,rbx
    jc .Y2notlessthanY1
    neg rdi
    neg rjx
    .Y2notlessthanY1:
    cmp rex,rfx
    jc .moreYthanX
    xor rsi,rsi ; xinc1 = 0
    xor rjx,rjx ; yinc2 = 0
    mov rkx,rex ; den = deltax
    mov rlx,rex
    shr rlx,1 ; num = deltax/2
    mov rbp,rfx ; numadd = deltay
    mov rdx,rex ; numpixels = deltax
    jmp .endadjust
    .moreYthanX:
    xor rdi,rdi ; yinc1 = 0
    xor rix,rix ; xinc2 = 0
    mov rkx,rfx ; den = deltay
    mov rlx,rfx
    shr rlx,1 ; num = deltay/2
    mov rbp,rex ; numadd = deltax
    mov rdx,rfx ; numpixels = deltay
    .endadjust:
    .linedraw:
    push rax rbx rcx rdx rex rfx rsi rdi rix rjx
    mov rex,rbx
    add rex,1
    mov rdx,rax
    mov rcx,rbx
    mov rbx,rax
    mov rax,38
    mov rfx,255
    int 60h
    pop rjx rix rdi rsi rfx rex rdx rcx rbx rax
    add rlx,rbp ; num += numadd
    cmp rlx,rkx ; num >= den
    jc @f
    sub rlx,rkx ; num -= den
    add rax,rsi
    add rbx,rdi
    @@:
    add rax,rix
    add rbx,rjx
    sub rdx,1
    jnc .linedraw
Talk about Optimized to death (http://board.flatassembler.net/topic.php?t=5584)
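For anyone who finds the register aliases hard to follow, the same scheme can be sketched in C. This is only an illustration and not Madis731's code: the function name `bresenham_points`, the `point_t` type and the output buffer are assumptions made for the example, and it collects points instead of calling the MenuetOS `int 60h` drawing service. The variable names follow the comments in the listing, with the register each one appears to live in noted alongside.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical point type, used only by this sketch. */
typedef struct { int64_t x, y; } point_t;

/* Walks from (x1,y1) to (x2,y2), both endpoints included.
   Stores at most cap points in out and returns how many points the line has. */
size_t bresenham_points(int64_t x1, int64_t y1, int64_t x2, int64_t y2,
                        point_t *out, size_t cap)
{
    int64_t deltax = x2 >= x1 ? x2 - x1 : x1 - x2;   /* rex */
    int64_t deltay = y2 >= y1 ? y2 - y1 : y1 - y2;   /* rfx */

    /* Both increments of an axis start out as the direction of travel. */
    int64_t xinc1 = x2 >= x1 ? 1 : -1, xinc2 = xinc1; /* rsi, rix */
    int64_t yinc1 = y2 >= y1 ? 1 : -1, yinc2 = yinc1; /* rdi, rjx */

    int64_t den, num, numadd, numpixels;              /* rkx, rlx, rbp, rdx */
    if (deltax >= deltay) {
        /* x is the major axis: it steps every pixel, y only on overflow. */
        xinc1 = 0;
        yinc2 = 0;
        den = deltax; num = deltax / 2; numadd = deltay; numpixels = deltax;
    } else {
        /* y is the major axis: it steps every pixel, x only on overflow. */
        xinc2 = 0;
        yinc1 = 0;
        den = deltay; num = deltay / 2; numadd = deltax; numpixels = deltay;
    }

    int64_t x = x1, y = y1;
    size_t n = 0;
    /* numpixels + 1 iterations, like the sub rdx,1 / jnc loop. */
    for (int64_t i = 0; i <= numpixels; i++) {
        if (n < cap)
            out[n] = (point_t){ x, y };
        n++;
        num += numadd;
        if (num >= den) {        /* error term overflowed: minor-axis step */
            num -= den;
            x += xinc1;
            y += yinc1;
        }
        x += xinc2;              /* major-axis step */
        y += yinc2;
    }
    return n;
}
```

Written this way, the pairing is easy to check by eye: `xinc1`/`yinc1` are applied together on overflow and `xinc2`/`yinc2` together on every pixel, which is the pairing the mixed-up registers broke.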
r22 03 Aug 2006, 21:51
Slightly faster absolute value using compare and move instruction
Code:
    mov reg2, reg1
    neg reg1
    cmovl reg1, reg2 ;;;; cmovb if using UNsigned integers
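To compare the two absolute-value idioms in this thread side by side, here is a rough C model of each. The function names `abs_select` and `absdiff_mask` are made up for this sketch, and a compiler is free to emit something other than `cmov` or `sbb` for them, so treat it as a description of the arithmetic rather than of the generated instructions.

```c
#include <stdint.h>

/* Models: mov reg2,reg1 / neg reg1 / cmovl reg1,reg2
   Negate, then fall back to the saved copy if the negation came out negative.
   Assumes the usual two's-complement conversion back to int64_t. */
int64_t abs_select(int64_t v)
{
    int64_t negated = (int64_t)(0 - (uint64_t)v);  /* neg, wraps like the CPU */
    return negated < 0 ? v : negated;              /* the conditional move */
}

/* Models: sub a,b / sbb mask,mask / xor a,mask / sub a,mask
   The borrow from the subtraction becomes an all-ones or all-zero mask. */
uint64_t absdiff_mask(uint64_t a, uint64_t b)
{
    uint64_t diff = a - b;
    uint64_t mask = (a < b) ? UINT64_MAX : 0;      /* sbb reg,reg after a borrow */
    return (diff ^ mask) - mask;                   /* conditional two's-complement negate */
}
```

For example, `absdiff_mask(3, 10)` and `absdiff_mask(10, 3)` both give 7, and `abs_select(-5)` gives 5.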
Tomasz Grysztar 04 Aug 2006, 04:04
r22 wrote:
But the absolute value doesn't make much sense with the UNsigned integers, does it?
Madis731 04 Aug 2006, 07:20
Not the UNsigned integers themselves, but the difference of two unsigned ones...
@r22: Did you test it on AMD/Intel? I've heard that on AMD the conditional move is slow and cmp/jcc is faster :S
Tomasz Grysztar 04 Aug 2006, 08:54
Madis731 wrote: Not the UNsigned integers themselves, but the difference of two unsigned ones...
For difference it doesn't really matter whether it is of signed or unsigned values.
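One way to read this point: in two's complement the subtraction itself produces the same bit pattern whichever way the operands are interpreted, and only the flags you test afterwards differ. A small C sketch of that reading follows. The helper names are invented for the example, and it assumes operands small enough that the signed subtraction does not overflow, since signed overflow is undefined in C.

```c
#include <stdint.h>

/* Subtract treating the operands as unsigned; wraps modulo 2^64. */
uint64_t diff_bits_unsigned(uint64_t a, uint64_t b)
{
    return a - b;
}

/* Subtract treating the operands as signed, then expose the raw bits.
   Assumes a - b does not overflow int64_t. */
uint64_t diff_bits_signed(int64_t a, int64_t b)
{
    return (uint64_t)(a - b);
}
```

With 3 and 10, both helpers return the same 64-bit pattern, the one that reads as -7 when viewed as a signed value.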
r22 04 Aug 2006, 12:10
Very true, BUT the comment would be valid IF it was used for a min/max function. That's exactly why I don't comment code; half the time I just type nonsense after ;'s.
The AMD64 optimization guide shows the cmov reg,reg as having a 1 clock latency and I've tested it on my AMD64 x2 3800+ and it's shown to be slightly faster than the carry method.
Madis731 05 Aug 2006, 14:48
@Tomasz: Of course - you are right - I wasn't thinking clearly. It does NOT matter! And more importantly you can even use it on signed values (i.e. if one or both of the source operands are negative in this sense).
Copyright © 1999-2025, Tomasz Grysztar.