flat assembler
Message board for the users of flat assembler.
Tomasz Grysztar
Alternatively:
Code:
putpixel320x200x256:        ; al=color, bx=x, cx=y
    push    $A000
    pop     es
    mov     di,cx           ; y
    shl     cx,2
    add     di,cx           ; 5y
    shl     di,6            ; 320y
    add     di,bx
    stosb
    ret
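The shift-and-add sequence relies on 320 = 5 * 64. As a rough cross-check, here is a minimal C model of the same arithmetic; the function names are hypothetical and are not part of the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register arithmetic in the routine above:
   di = y; di += y << 2 (5y); di <<= 6 (320y); di += x. */
static uint16_t mode13h_offset_shift(uint16_t x, uint16_t y)
{
    assert(x < 320 && y < 200);          /* mode 13h bounds */
    uint16_t di = y;                     /* mov di,cx */
    di = (uint16_t)(di + (y << 2));      /* shl cx,2 / add di,cx : 5y */
    di = (uint16_t)(di << 6);            /* shl di,6 : 320y */
    di = (uint16_t)(di + x);             /* add di,bx */
    return di;
}

/* Reference version using a plain multiplication. */
static uint16_t mode13h_offset_mul(uint16_t x, uint16_t y)
{
    return (uint16_t)(y * 320u + x);
}
```

Both helpers should agree for every pixel of the 320x200 screen, and the largest offset (x=319, y=199) still fits in 16 bits.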
Matrix
Okay, I see someone knows something here. I agree with you, because shl is slightly faster than mul.
Optimize this if you can:
Code:
cls320x200x256:
    push    $a000
    pop     es
    xor     eax,eax
    mov     di,ax
    mov     cx,$3e80
    rep     stosd
    ret
Last edited by Matrix on 07 Sep 2004, 02:34; edited 1 time in total
Tomasz Grysztar
Matrix wrote: I agree with you, because shl is slightly faster than mul.
However, this generally doesn't hold on modern processors.
Tomasz Grysztar
If you give such procedures to a beginner, it would be good manners to preserve and restore the ES register. On the other hand, for advanced users there shouldn't be any ES-setting code at all: the programmer should set ES to $A000 only once for the whole drawing process, which makes those procedures faster.
Also, some blitting procedure would, in my opinion, be much more useful in the general case. I will put a new version of the "kelvar engine" example on this website soon, where one can find some nice blitters (for VESA modes, too).
Octavio
Matrix wrote: Okay, I see someone knows something here. I agree with you, because shl is slightly faster than mul.
Yes, I can: replace 'xor ax,ax' with 'xor eax,eax'. A faster way is to disable the GPU; this doubles the bandwidth on some video cards. Also, set a multi-plane video mode that allows you to set 32 pixels at a time.
neonz
Matrix wrote: Okay, I see someone knows something here. I agree with you, because shl is slightly faster than mul.
Well, this will be faster on Pentium and newer CPUs:
Code:
cls320x200x256:
    xor     eax, eax
    push    $A000
    pop     es
    xor     di, di
    mov     cx, $3E80
    rep     stosd
    ret
In my code, the "pop es" and "xor di, di" instructions will execute simultaneously on P5+ CPUs, and you will save 2 CPU cycles.
Matrix
Oh, sorry for that, I meant EAX; it was 3 AM for me.
Say, you know something too.
MATRIX
Matrix
Privalov is right.
You should insert this line at the beginning of the code:
push es
and this line at the end of the code:
pop es
This way you won't be surprised if your program somehow hangs, because your procedure will be "transparent": it won't change ES. The same goes for DS, because you might want to change it too when working with strings. Just take a look at the movsb, movsw, movsd, stosb, stosw, stosd section of your handbook.
MATRIX
Matrix
Hi,
I'm back with some code.
Code:
putpixel320x200x256n:       ; bx=x, ax=y, cl=color
    push    es              ; MATRIX PUTPIXEL 19/20 bytes
    push    $A000           ; yeah, of course it is nicer if you do this once in your program
    pop     es              ; memory usage is very slow
    cwd                     ; optional: cwd extends ax into dx:ax (not eax), and di only takes the low 16 bits of 5*eax
    lea     di,[4*eax+eax]  ; 5y
    shl     di,6            ; 5y*64
    add     di,bx
    mov     [es:di],cl
    pop     es
    ret
putpixel320x200x256:        ; es=segment, bx=x, ax=y, cl=color
                            ; MATRIX PUTPIXEL 13/14 bytes
    cwd                     ; optional: cwd extends ax into dx:ax (not eax), and di only takes the low 16 bits of 5*eax
    lea     di,[4*eax+eax]  ; 5y
    shl     di,6            ; 5y*64
    add     di,bx
    mov     [es:di],cl
    ret
MATRIX
Last edited by Matrix on 15 Oct 2004, 20:26; edited 2 times in total
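A note on the `cwd` / `lea di,[4*eax+eax]` pair: since the destination is the 16-bit register di, only the low 16 bits of 5*eax are kept, and those depend only on ax. The following C model is a sketch of that reasoning; `putpixel_di` is a hypothetical name, and the upper half of eax is deliberately filled with garbage in the check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the offset computation in putpixel320x200x256.
   eax may hold garbage in its upper 16 bits; bx holds x. */
static uint16_t putpixel_di(uint32_t eax, uint16_t bx)
{
    assert((eax & 0xFFFFu) < 200 && bx < 320);
    uint16_t di = (uint16_t)(eax * 4u + eax);  /* lea di,[4*eax+eax] : low 16 bits of 5y */
    di = (uint16_t)(di << 6);                  /* shl di,6 : 320y */
    di = (uint16_t)(di + bx);                  /* add di,bx */
    return di;
}
```

If this model is right, the routine computes the same offset whether or not the upper half of eax was cleared beforehand.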
Matrix
Let's clear the screen.
Code:
cls320x200x256s:            ; 18 bytes
    push    es
    xor     eax, eax
    push    $A000
    pop     es
    xor     di, di
    mov     cx, $3E80
    rep     stosd
    pop     es
    ret
cls320x200x256n:            ; 21 bytes
    mov     bx,es           ; 2 bytes
    mov     ax,$a000        ; this will be faster because it is not using the stack
    mov     es,ax
    cbw                     ; al=0, so ax=0
    cwde                    ; eax=0 (cwd would only clear dx and leave the upper half of eax undefined)
    mov     di,ax           ; xor di,di ; move is simpler than xor
    mov     cx, $3E80
    rep     stosd
    mov     es,bx           ; 2 bytes
    ret
cls320x200x256:             ; es=segment; 12 bytes
    xor     eax,eax
    xor     di,di
    mov     cx,$3E80
    rep     stosd
    ret
coloredcls320x200x256:      ; es=segment, al=color ; 19 bytes
    mov     ah,al
    mov     cx,ax
    shl     eax,16
    mov     ax,cx
    xor     di,di
    mov     cx,$3E80
    rep     stosd
    ret
MATRIX
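For readers following the colored clear: the color byte is copied into all four bytes of eax, and $3E80 (16000) dword stores cover the 64000 bytes of the screen. A minimal C sketch of that idea follows; the names and the plain memory buffer are assumptions for illustration, not the original routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SCREEN_BYTES  = 320 * 200,          /* 64000 bytes in mode 13h */
    SCREEN_DWORDS = SCREEN_BYTES / 4    /* 16000 = $3E80 dword stores */
};

/* Models: mov ah,al / mov cx,ax / shl eax,16 / mov ax,cx */
static uint32_t replicate_color(uint8_t al)
{
    uint16_t ax = (uint16_t)((al << 8) | al);   /* color in both bytes of ax */
    return ((uint32_t)ax << 16) | ax;           /* color in all four bytes of eax */
}

/* Models: xor di,di / mov cx,$3E80 / rep stosd, on a hypothetical buffer. */
static void colored_cls(uint32_t *screen, uint8_t color)
{
    assert(screen != NULL);
    uint32_t eax = replicate_color(color);
    for (size_t i = 0; i < SCREEN_DWORDS; i++)
        screen[i] = eax;
}
```

Calling `colored_cls` on a 64000-byte buffer should leave every byte equal to the chosen color.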
Slai
Maybe this putpixelVGA code is faster? The clock cycles are for the 80486 and are probably not very accurate.
Code:
macro pxl1 x,y,col {
    mov     ax, y           ; 1
    mov     bx, x           ; 1
    xchg    ah, al          ; 3
    add     bx, ax          ; 1
    shr     ax, 2           ; 3?
    add     bx, ax          ; 1
    mov     al, col         ; 1
    mov     [es:bx], al }   ; 8?
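The macro gets 320y as 256y + 64y: the byte swap turns y into 256y (assuming y fits in al, so ah starts at zero), and the shift right by 2 turns that into 64y. A small C model of the register steps, with a hypothetical function name, can be used to check this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the pxl1 macro's offset arithmetic.
   y is limited to 8 bits, mirroring the assumption that ah = 0 after mov ax, y. */
static uint16_t pxl1_offset(uint16_t x, uint8_t y)
{
    assert(x < 320 && y < 200);
    uint16_t ax = y;                            /* mov ax, y */
    uint16_t bx = x;                            /* mov bx, x */
    ax = (uint16_t)((ax << 8) | (ax >> 8));     /* xchg ah, al : 256y */
    bx = (uint16_t)(bx + ax);                   /* add bx, ax  : x + 256y */
    ax >>= 2;                                   /* shr ax, 2   : 64y */
    bx = (uint16_t)(bx + ax);                   /* add bx, ax  : x + 320y */
    return bx;
}
```

The result should match 320*y + x for every on-screen coordinate.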
Madis731
You can roughly divide the clock cycles by three going from the 486 to current Pentiums.
Code:
macro pxl1 x,y,col {
    mov     ax, y           ; 1 uop @ port 2 - CLK 1
    mov     bx, x           ; 1 uop @ port 2 - CLK 2 (port 2 full)
    xchg    ah, al          ; 3 uops @ port 01 - CLK 2,3
    add     bx, ax          ; 1 uop @ port 01 - CLK 3
    shr     ax, 2           ; 1 uop @ port 1+ 4 latency - CLK 4,5
    add     bx, ax          ; 1 uop @ port 01 - CLK 6
    mov     al, col         ; 1 uop @ port 2 - CLK 6
    mov     [es:bx], al }   ; 1 uop @ port 4 - CLK 6
6 clocks exactly when you are LUCKY. This means that a clock must start on your "MOV AX,Y" instruction, and you can expect the 6th clock to end at "MOV [ES:BX], al".
Hayden
Here is a very fast pixel proc that I was made aware of last year...
Code:
; Very fast pixel proc for mode 13h - 32-bit p/m code
; btw, #A0000 / 8 = #14000
macro PutPixel x, y, c {
    mov     ebx, x
    mov     edx, y
    mov     cl, c
    lea     edx, [edx + edx*4]
    lea     edx, [edx*8 + 14000h]
    mov     [ebx + edx*8], cl
}
_________________
New User..
Hayden McKay.
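The trick here is folding the A0000h base into the second lea as A0000h / 8 = 14000h, so the final scaled index multiplies it back. A short C model of the three address steps (the function name is hypothetical) makes the arithmetic easier to follow:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PutPixel macro's address computation
   in 32-bit protected mode with a flat address space. */
static uint32_t putpixel_linear(uint32_t x, uint32_t y)
{
    assert(x < 320 && y < 200);
    uint32_t edx = y + y * 4;       /* lea edx,[edx + edx*4]    : 5y */
    edx = edx * 8 + 0x14000u;       /* lea edx,[edx*8 + 14000h] : 40y + A0000h/8 */
    return x + edx * 8;             /* [ebx + edx*8]            : x + 320y + A0000h */
}
```

Under these assumptions the result is always 0xA0000 + 320*y + x, which is the linear address of the pixel in the VGA window.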
edfed
This is faster, no?
Code:
;es is the video memory or video buffer, as you want
;x and y are contiguous dwords in memory
;al=color
putpxl:
    mov     edi,screen13h.xl
    imul    edi,[Y]
    add     edi,[X]
    stosb
    ret
Last edited by edfed on 30 Dec 2008, 18:07; edited 1 time in total
rain_storm
"add edi,[X]"?? Don't you mean "add eax,[X]"?
vid
no, he doesn't
rugxulo
edfed, I would assume using imul and stosb would be a good amount slower than Hayden's method. But feel free to prove me wrong!
edfed
imul exists to be used, so I use it, and it is more adaptable (nyah-nyah!). If you look at the timings for the new Pentiums, you'll see that imul is fast, and it's only five instructions per pixel. Yes!
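One way to read the adaptability argument: with a multiply, the screen width is just a value, so the same routine works for other resolutions, whereas the shift-based versions hard-code 320. A minimal C sketch of that idea follows; the names and the plain byte buffer are assumptions for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width-independent offset: width * y + x (the imul/add pair). */
static size_t pixel_offset(size_t width, size_t x, size_t y)
{
    assert(x < width);
    return width * y + x;
}

/* Stores one color byte at (x, y), like the stosb at the end of putpxl. */
static void put_pixel(uint8_t *buffer, size_t width, size_t x, size_t y, uint8_t color)
{
    assert(buffer != NULL);
    buffer[pixel_offset(width, x, y)] = color;
}
```

For example, `put_pixel(buf, 640, 10, 2, 15)` writes to offset 1290 of a 640-pixel-wide buffer, with no change to the routine itself.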
Sahrian
edfed, I'm sorry for you but rugxulo is right. The problem is not imul, but stosb. imul is slow on older CPUs also.
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.