flat assembler
Message board for the users of flat assembler.
Hey, i'm back, 320X200 MODE STUFF?, use them with care! :)
Tomasz Grysztar 04 Sep 2004, 23:06
Alternatively:
Code:
putpixel320x200x256: ; al=color, bx=x, cx=y
    push $A000
    pop es
    mov di,cx ; y
    shl cx,2
    add di,cx ; 5y
    shl di,6 ; 320y
    add di,bx
    stosb
    ret
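The shift-and-add arithmetic in the routine above can be sanity-checked off-target. The following is only a minimal C sketch, not part of the original post: `offset_shift_add` and `check_all_pixels` are hypothetical names, and `uint16_t` stands in for the 16-bit registers `di` and `cx`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the offset computation: y + 4y = 5y, then 5y * 64 = 320y. */
uint16_t offset_shift_add(uint16_t x, uint16_t y)
{
    uint16_t di = y;                        /* mov di,cx ; y      */
    uint16_t cx = (uint16_t)(y << 2);       /* shl cx,2  ; 4y     */
    di = (uint16_t)(di + cx);               /* add di,cx ; 5y     */
    di = (uint16_t)(di << 6);               /* shl di,6  ; 320y   */
    di = (uint16_t)(di + x);                /* add di,bx ; 320y+x */
    return di;
}

/* Asserts that the model matches 320*y + x for every on-screen pixel. */
void check_all_pixels(void)
{
    for (unsigned y = 0; y < 200; y++)
        for (unsigned x = 0; x < 320; x++)
            assert(offset_shift_add((uint16_t)x, (uint16_t)y) == 320u * y + x);
}
```

The largest on-screen offset is 199*320 + 319 = 63999, so nothing overflows the 16-bit offset inside the $A000 segment.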
Matrix 04 Sep 2004, 23:24
Okay, i see someone knows somethin' here, i agree with u because shl is slightly faster than mul.
optimize this if u can :
Code:
cls320x200x256:
    push $a000
    pop es
    xor eax,eax
    mov di,ax
    mov cx,$3e80
    rep stosd
    ret
Last edited by Matrix on 07 Sep 2004, 02:34; edited 1 time in total
Tomasz Grysztar 04 Sep 2004, 23:36
Matrix wrote: i agree with u because shl is slightly faster than mul.
However this generally doesn't hold on modern processors.
Tomasz Grysztar 05 Sep 2004, 08:04
If you give such procedures to a beginner, it would be good manners to preserve and restore the ES register. On the other hand, for advanced users there shouldn't be any ES-setting code at all: the programmer should set ES to $A000 only once for the whole drawing process, which will make those procedures faster.
Also, some blitting procedure would in my opinion be much more useful in the general case. I will put a new version of the "kelvar engine" example on this website soon, where one can find some nice blitters (for VESA modes, too).
Octavio 05 Sep 2004, 13:22
Matrix wrote: Okay, i see someone knows somethin' here, i agree with u because shl is slightly faster than mul.
Yes i can, replace 'xor ax,ax' by 'xor eax,eax'. A faster way is to disable the GPU, this doubles the bandwidth in some video cards. And also set a multi-plane video mode that allows you to set 32 pixels at a time.
neonz 05 Sep 2004, 15:22
Matrix wrote: Okay, i see someone knows somethin' here, i agree with u because shl is slightly faster than mul.
Well, this will be faster for Pentium and newer CPUs:
Code:
cls320x200x256:
    xor eax, eax
    push $A000
    pop es
    xor di, di
    mov cx, $3E80
    rep stosd
    ret
In my code, "pop es" and "xor di, di" instructions will execute simultaneously on P5+ CPUs and you will save 2 CPU cycles. I moved "xor eax, eax" to the beginning of the code, as instructions with an operand size prefix (32bit instructions in 16bit code and 16bit instructions in 32bit code) can pair only if they are 1 byte instructions (like "push eax") or loaded from cache. You need "xor eax, eax" not "xor ax, ax" because you are using "rep stosd" not "rep stosw".
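The count in `cx` works out as $3E80 = 16000 dword stores, and 16000 * 4 = 64000 = 320 * 200 bytes. A rough C sketch of that bookkeeping follows; it models only the store count, not the timing or pairing discussed above. `rep_stosd` and `cls_covers_screen` are hypothetical names, and an ordinary array stands in for video memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SCREEN_BYTES = 320 * 200 };   /* one byte per pixel in mode 13h */

/* Hypothetical stand-in for "rep stosd": store `value` `count` times,
   four bytes at a time, starting at dst. */
void rep_stosd(uint8_t *dst, uint32_t value, uint16_t count)
{
    while (count--) {
        memcpy(dst, &value, sizeof value);
        dst += sizeof value;
    }
}

/* Returns 1 if $3E80 dword stores clear the whole screen and nothing beyond it. */
int cls_covers_screen(void)
{
    static uint8_t screen[SCREEN_BYTES + 4];

    assert(0x3E80 * 4 == SCREEN_BYTES);
    memset(screen, 0xFF, sizeof screen);     /* pre-fill, including 4 guard bytes */
    rep_stosd(screen, 0, 0x3E80);            /* eax = 0, cx = $3E80 */
    for (int i = 0; i < SCREEN_BYTES; i++)
        if (screen[i] != 0)
            return 0;
    return screen[SCREEN_BYTES] == 0xFF;     /* first guard byte untouched */
}
```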
Matrix 05 Sep 2004, 16:01
Oh sorry for that, i meant EAX, it was just 3AM for me.
Say, u know somethin' too.
MATRIX
Matrix 05 Sep 2004, 21:02
Privalov is right,
you should insert this line at the beginning of the code:
push es
and this line at the end of the code (just before ret):
pop es
This way you won't be surprised if somehow your program hangs, because your procedure will be "transparent": it won't change es. The same goes for ds, because you might want to change it too when working with strings. Just take a look at the movsb, movsw, movsd, stosb, stosw, stosd section of your handbook.
MATRIX
Matrix 15 Oct 2004, 19:57
Hi,
i'm back with some code
Code:
putpixel320x200x256n: ; bx=x, ax=y, cl=color
    push es ; MATRIX PUTPIXEL 19/21 bytes (without/with cwde)
    push $A000 ; yeah, of course it is nice if you do this once in your program
    pop es ; memory usage is very slow
    cwde ; ax to eax (cwd would only fill dx); not needed if you put y in eax via movzx
    lea di,[4*eax+eax] ; 5y
    shl di,6 ; 5y*64
    add di,bx
    mov [es:di],cl
    pop es
    ret
putpixel320x200x256: ; es=segment, bx=x, ax=y, cl=color
; MATRIX PUTPIXEL 13/15 bytes (without/with cwde)
    cwde ; ax to eax (cwd would only fill dx); not needed if you put y in eax via movzx
    lea di,[4*eax+eax] ; 5y
    shl di,6 ; 5y*64
    add di,bx
    mov [es:di],cl
    ret
MATRIX
Last edited by Matrix on 15 Oct 2004, 20:26; edited 2 times in total
Matrix 15 Oct 2004, 20:08
let's clear the screen
Code:
cls320x200x256s: ; 18 bytes
    push es
    xor eax, eax
    push $A000
    pop es
    xor di, di
    mov cx, $3E80
    rep stosd
    pop es
    ret
cls320x200x256n: ; 21 bytes
    mov bx,es ; 2 bytes
    mov ax,$a000 ; this will be faster because it is not using stack
    mov es,ax
    xor eax,eax ; cbw/cwd would only clear ax and dx; stosd stores all of eax
    mov di,ax ;xor di,di ;move is simpler than xor
    mov cx, $3E80
    rep stosd
    mov es,bx ; 2 bytes
    ret
cls320x200x256: ;es=segment; 12 bytes
    xor eax,eax
    xor di,di
    mov cx,$3E80
    rep stosd
    ret
coloredcls320x200x256: ;es=segment, al=color ; 19 bytes
    mov ah,al
    mov cx,ax
    shl eax,16
    mov ax,cx
    xor di,di
    mov cx,$3E80
    rep stosd
    ret
MATRIX
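In coloredcls320x200x256 above, the first four instructions copy the colour byte in `al` into all four bytes of `eax`, so that each `stosd` writes four identical pixels. A small C sketch of that step, as an illustration only (`replicate_color` is a hypothetical name, and the multiply shown in the assertion is just an equivalent formulation, not something the post uses):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register shuffling in coloredcls320x200x256. */
uint32_t replicate_color(uint8_t al)
{
    uint16_t ax = (uint16_t)((al << 8) | al);   /* mov ah,al  ; ax = al:al     */
    uint16_t cx = ax;                           /* mov cx,ax  ; save low word  */
    uint32_t eax = (uint32_t)ax << 16;          /* shl eax,16 ; move to high   */
    eax |= cx;                                  /* mov ax,cx  ; refill low     */

    /* Same value as multiplying the byte by 01010101h. */
    assert(eax == al * 0x01010101u);
    return eax;
}
```

For example, colour 0x0F becomes 0x0F0F0F0F. Whatever was in the upper half of `eax` beforehand is shifted out by `shl eax,16`, so this routine does not depend on its initial contents.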
Slai 08 Mar 2006, 17:41
maybe this putpixelVGA code is faster? the clock cycles are for 80486, and are probably not very accurate
Code:
macro pxl1 x,y,col {
    mov ax, y ; 1
    mov bx, x ; 1
    xchg ah, al ; 3
    add bx, ax ; 1
    shr ax, 2 ; 3?
    add bx, ax ; 1
    mov al, col ; 1
    mov [es:bx], al }; 8?
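The macro above gets 320y as 256y + 64y: `xchg ah,al` turns y into 256y, and `shr ax,2` then turns that into 64y. This only works if y fits in `al` (y < 256), which holds for a 200-line screen. A minimal C sketch of the arithmetic, with hypothetical names `offset_xchg` and `check_xchg_offsets`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of pxl1's address arithmetic; assumes y < 256. */
uint16_t offset_xchg(uint16_t x, uint8_t y)
{
    uint16_t ax = y;                            /* mov ax,y   ; ah=0, al=y */
    uint16_t bx = x;                            /* mov bx,x                */
    ax = (uint16_t)((ax << 8) | (ax >> 8));     /* xchg ah,al ; 256y       */
    bx = (uint16_t)(bx + ax);                   /* add bx,ax  ; x + 256y   */
    ax = (uint16_t)(ax >> 2);                   /* shr ax,2   ; 64y        */
    bx = (uint16_t)(bx + ax);                   /* add bx,ax  ; x + 320y   */
    return bx;
}

/* Asserts that the model matches 320*y + x for every on-screen pixel. */
void check_xchg_offsets(void)
{
    for (unsigned y = 0; y < 200; y++)
        for (unsigned x = 0; x < 320; x++)
            assert(offset_xchg((uint16_t)x, (uint8_t)y) == 320u * y + x);
}
```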
Madis731 09 Mar 2006, 09:30
You can roughly divide the clock cycles by three going from the 486 to current Pentiums.
Code:
macro pxl1 x,y,col {
    mov ax, y ; 1 uop @ port 2 - CLK 1
    mov bx, x ; 1 uop @ port 2 - CLK 2 (port 2 full)
    xchg ah, al ; 3 uops @ port 01 - CLK 2,3
    add bx, ax ; 1 uop @ port 01 - CLK 3
    shr ax, 2 ; 1 uop @ port 1+ 4 latency - CLK 4,5
    add bx, ax ; 1 uop @ port 01 - CLK 6
    mov al, col ; 1 uop @ port 2 - CLK 6
    mov [es:bx], al }; 1 uop @ port 4 - CLK 6
6 clocks exactly when you are LUCKY - this means that you must start a clock on your "MOV AX,Y" instruction and can expect the 6th clock to end at "MOV [ES:BX], al".
Hayden 25 Jun 2007, 02:07
here is a very fast pixel proc that i was made aware of last year...
Code:
; Very fast pixel proc for mode 13h - 32-bit p/m code
; btw, #A0000 / 8 = #14000
macro PutPixel x, y, c {
    mov ebx, x
    mov edx, y
    mov cl, c
    lea edx, [edx + edx*4]
    lea edx, [edx*8 + 14000h]
    mov [ebx + edx*8], cl
}
_________________
New User.. Hayden McKay.
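The two `lea` instructions and the scaled addressing mode in the macro above fold the framebuffer base into the multiplication: (5y * 8 + 14000h) * 8 + x = 320y + A0000h + x. A short C sketch of that computation, assuming a flat 32-bit address space as in the post (`linear_address` and `check_linear_addresses` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the address formed by the PutPixel macro. */
uint32_t linear_address(uint32_t x, uint32_t y)
{
    uint32_t edx = y;                 /* mov edx,y                             */
    edx = edx + edx * 4;              /* lea edx,[edx+edx*4]    ; 5y           */
    edx = edx * 8 + 0x14000u;         /* lea edx,[edx*8+14000h] ; 40y + A0000h/8 */
    return x + edx * 8;               /* [ebx+edx*8]            ; x + 320y + A0000h */
}

/* Asserts that the model matches A0000h + 320*y + x for every on-screen pixel. */
void check_linear_addresses(void)
{
    for (uint32_t y = 0; y < 200; y++)
        for (uint32_t x = 0; x < 320; x++)
            assert(linear_address(x, y) == 0xA0000u + 320u * y + x);
}
```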
edfed 05 Oct 2007, 00:42
this is faster, no?
Code:
;es is the video memory or video buffer as you want
;x and y are contiguous dwords in memory
;al=color
putpxl:
    mov edi,screen13h.xl
    imul edi,[Y]
    add edi,[X]
    stosb
    ret
Last edited by edfed on 30 Dec 2008, 18:07; edited 1 time in total
rain_storm 06 Oct 2007, 17:07
"add edi,[X]" ?? Don't you mean "add eax,[X]"
vid 06 Oct 2007, 21:56
no, he doesn't
rugxulo 07 Oct 2007, 02:59
edfed, I would assume using imul and stosb would be a good amount slower than Hayden's method. But feel free to prove me wrong!
edfed 08 Oct 2007, 18:09
imul exists to be used, so i use it
and it is more flexible, nananananère. If you look at the new Pentium timings you'll see that imul is fast, and it's only five instructions for a pixel. yes!!!!
Sahrian 09 Oct 2007, 15:07
edfed, I'm sorry for you but rugxulo is right. The problem is not imul, but stosb. imul is slow on older CPUs also.
Copyright © 1999-2025, Tomasz Grysztar.