Hey, i'm back, 320X200 MODE STUFF?, use them with care! :)

Matrix

Joined: 04 Sep 2004
Posts: 1164
Location: Overflow

Matrix 04 Sep 2004, 22:55

Very fast routines for the 320x200x256 mode (mode 13h)

Code:

set320x200x256:
mov ax,$13
int $10
ret

set80x25t:
mov ax,$03
int $10
ret

putpixel320x200x256a0: ; al=color bx=x cx=y, 17 bytes
push $a000
pop es
push ax
mov ax,320
mul cx
add ax,bx
mov di,ax
pop ax
stosb
ret

However, I have made a better version with mul (added 2004.10.05):

Code:

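putpixel320x200x256a1: ; al=color bx=x cx=y , uses mul, 20 bytes
push es
push ax        ; save the color: mul overwrites ax (and dx)
mov ax,320
mul cx         ; dx:ax = 320*y
mov di,ax
mov cx,$a000
mov es,cx
add di,bx      ; di = 320*y + x
pop ax         ; restore the color into al
stosb
pop es
ret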

putpixel320x200x256a2: ; bx=x, dx=y, cl=color, uses mul 20 bytes
push es
mov ax,320
mul dx
mov di,ax
mov dx,$a000
mov es,dx
add di,bx
mov [es:di],cl
pop es
ret
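
One thing to watch in the mul-based versions: a 16-bit `mul` leaves its product in dx:ax, so ax (and with it al) no longer holds whatever was passed in. The following is a minimal Python model of that behaviour, not DOS code; `mul16` is a made-up helper name used only for illustration.

```python
def mul16(ax, src):
    """Model of 16-bit `mul src`: returns (dx, ax) holding the 32-bit product."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    return (product >> 16) & 0xFFFF, product & 0xFFFF

# mov ax,320 / mul cx with cx = y = 199
dx, ax = mul16(320, 199)
al = ax & 0xFF          # low byte of the product, not a color

print(hex(dx), hex(ax), hex(al))
```

So a routine that receives the color in al has to save it before the multiply (or take the color in another register, as the a2 variant does with cl), and dx should be treated as clobbered.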

MATRIX

Last edited by Matrix on 15 Oct 2004, 21:02; edited 2 times in total

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8434
Location: Kraków, Poland

Tomasz Grysztar 04 Sep 2004, 23:06

Alternatively:

Code:

putpixel320x200x256:
; al=color, bx=x, cx=y
        push    $A000
        pop     es
        mov     di,cx   ; y
        shl     cx,2
        add     di,cx   ; 5y
        shl     di,6    ; 320y
        add     di,bx
        stosb
        ret
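
The shift-and-add sequence relies on 320 = 5 * 64. A small Python model of the 16-bit register steps (a sketch for checking the arithmetic, not DOS code; `offset_shift_add` is a hypothetical name) suggests it matches the plain 320*y + x formula for every on-screen pixel:

```python
def offset_shift_add(x, y):
    """Model of the shift/add sequence on 16-bit registers."""
    mask = 0xFFFF
    di = y & mask              # mov di,cx   -> y
    cx = (y << 2) & mask       # shl cx,2    -> 4y
    di = (di + cx) & mask      # add di,cx   -> 5y
    di = (di << 6) & mask      # shl di,6    -> 320y
    di = (di + x) & mask       # add di,bx   -> 320y + x
    return di

# Every on-screen pixel should land on the same offset as the plain formula.
assert all(offset_shift_add(x, y) == 320 * y + x
           for y in range(200) for x in range(320))
print(hex(offset_shift_add(319, 199)))   # last pixel: 0xf9ff
```

The largest offset, 63999, still fits in 16 bits, which is why the whole screen fits in one 64 KB segment.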

Matrix 04 Sep 2004, 23:24

Okay, I see someone knows something here. I agree with you, because shl is slightly faster than mul. :)

Optimize this if you can:

Code:

cls320x200x256:
push $a000
pop es
xor eax,eax
mov di,ax
mov cx,$3e80
rep stosd
ret

Last edited by Matrix on 07 Sep 2004, 02:34; edited 1 time in total

Tomasz Grysztar 04 Sep 2004, 23:36

Matrix wrote:

I agree with you, because shl is slightly faster than mul.

However, this generally doesn't hold on modern processors.

Tomasz Grysztar 05 Sep 2004, 08:04

If you give such procedures to a beginner, it would be good manners to preserve and restore the ES register. On the other hand, for advanced users there shouldn't be any ES-setting code at all: the programmer should set ES to $A000 only once for the whole drawing process, which makes these procedures faster.

Also, a blitting procedure would, in my opinion, be much more useful in the general case. I will put a new version of the "kelvar engine" example on this website soon, where one can find some nice blitters (for VESA modes, too).

Octavio

Joined: 21 Jun 2003
Posts: 366
Location: Spain

Octavio 05 Sep 2004, 13:22

Matrix wrote:

Okay, I see someone knows something here. I agree with you, because shl is slightly faster than mul.
Optimize this if you can:
Code:
cls320x200x256:
push $a000
pop es
xor ax,ax
mov di,ax
mov cx,$3e80
rep stosd
ret    

Yes, I can: replace 'xor ax,ax' with 'xor eax,eax'.
A faster way is to disable the GPU; this doubles the bandwidth on some video cards. You can also set a multi-plane video mode that lets you set 32 pixels at a time.

neonz

Joined: 02 Aug 2003
Posts: 62
Location: Latvia

neonz 05 Sep 2004, 15:22

Matrix wrote:

Okay, I see someone knows something here. I agree with you, because shl is slightly faster than mul.
Optimize this if you can:
Code:
cls320x200x256:
push $a000
pop es
xor ax,ax
mov di,ax
mov cx,$3e80
rep stosd
ret    

Well, this will be faster for Pentium and newer CPUs:

Code:

cls320x200x256:
   xor eax, eax
   push $A000
   pop es
   xor di, di
   mov cx, $3E80
   rep stosd
   ret

In my code, the "pop es" and "xor di, di" instructions will execute simultaneously on P5+ CPUs, saving 2 CPU cycles :)

I moved "xor eax, eax" to the beginning of the code, because instructions with an operand-size prefix (32-bit instructions in 16-bit code and 16-bit instructions in 32-bit code) can pair only if they are 1-byte instructions (like "push eax") or loaded from cache. You need "xor eax, eax", not "xor ax, ax", because you are using "rep stosd", not "rep stosw".
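
The point about "xor eax, eax" versus "xor ax, ax" can be checked with a small Python model of "rep stosd" (a sketch, not DOS code). The helper name `rep_stosd`, the 0xAA "old pixel" filler and the stale value 0x12340000 are assumptions chosen only for illustration.

```python
import struct

def rep_stosd(eax, count, size=0x10000):
    """Model of `rep stosd` starting at di=0: store eax `count` times, 4 bytes each."""
    mem = bytearray(b"\xAA" * size)          # pretend the segment holds old pixels
    for i in range(count):
        struct.pack_into("<I", mem, 4 * i, eax & 0xFFFFFFFF)
    return mem

SCREEN_BYTES = 320 * 200                      # 64000
assert 0x3E80 * 4 == SCREEN_BYTES             # cx = $3E80 dwords covers the screen

clean = rep_stosd(0x00000000, 0x3E80)         # after xor eax,eax
dirty = rep_stosd(0x12340000, 0x3E80)         # after xor ax,ax with stale upper bits

print(clean[:4].hex(), dirty[:4].hex())
```

With only ax cleared, every second pair of pixels receives whatever was left in the upper half of eax, so the screen is not actually blank.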

Matrix 05 Sep 2004, 16:01

Oh, sorry about that, I meant EAX; it was 3 AM for me :)
Say, you know something too. :)

MATRIX

Matrix 05 Sep 2004, 21:02

Privalov is right:
you should insert this line at the beginning of the code
push es

and this line at the end of the code, just before ret
pop es

This way you won't be surprised if your program somehow hangs,
because your procedure will be "transparent":
it won't change es.

The same goes for ds, because you might want to change it too when working with strings.
Just take a look at the movsb, movsw, movsd, stosb, stosw, stosd section of your handbook :)

MATRIX

Matrix 15 Oct 2004, 19:57

Hi,
I'm back with some code

Code:

putpixel320x200x256n: ; bx=x, ax=y, cl=color
push    es           ; MATRIX PUTPIXEL 19 bytes
push    $A000 ; of course it is nicer to do this once in your program,
pop     es    ; since stack (memory) access is slow
; no need to extend ax to eax first: di only receives the low 16 bits of
; 4*eax+eax, and those depend only on ax (cwd would extend ax into dx:ax, not eax)
lea di,[4*eax+eax] ; 5y
shl di,6 ; 5y*64 = 320y
add di,bx
mov [es:di],cl
pop     es
ret

putpixel320x200x256: ; es=segment, bx=x, ax=y, cl=color
                     ; MATRIX PUTPIXEL 13 bytes
lea di,[4*eax+eax] ; 5y (low 16 bits only, so the upper half of eax is irrelevant)
shl di,6 ; 5y*64 = 320y
add di,bx
mov [es:di],cl
ret
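
Because di is a 16-bit destination, "lea di,[4*eax+eax]" keeps only the low 16 bits of the sum, and those depend only on ax. A small Python model (not DOS code; `lea_di_5y` and `offset_lea` are made-up names, and the "garbage" upper halves are arbitrary test values) suggests that whatever sits in the upper half of eax does not affect the resulting offset:

```python
def lea_di_5y(eax):
    """Model of `lea di,[4*eax+eax]`: 32-bit address math, 16-bit destination."""
    return (5 * (eax & 0xFFFFFFFF)) & 0xFFFF

def offset_lea(eax, bx):
    di = lea_di_5y(eax)            # 5y (low 16 bits only)
    di = (di << 6) & 0xFFFF        # shl di,6 -> 320y
    return (di + bx) & 0xFFFF      # add di,bx -> 320y + x

# Try a clean eax and two with arbitrary junk in the upper 16 bits.
for upper in (0x00000000, 0xDEAD0000, 0xFFFF0000):
    assert all(offset_lea(upper | y, x) == 320 * y + x
               for y in range(200) for x in (0, 160, 319))
```

This means no sign- or zero-extension of ax is needed before the lea, as long as only di is used afterwards.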

MATRIX

Last edited by Matrix on 15 Oct 2004, 20:26; edited 2 times in total

Matrix 15 Oct 2004, 20:08

Let's clear the screen:

Code:

cls320x200x256s: ; 18 bytes
push es
   xor eax, eax
   push $A000
   pop es
   xor di, di
   mov cx, $3E80
   rep stosd
pop es
   ret

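cls320x200x256n: ; 21 bytes, clobbers bx
mov bx,es ; 2 bytes
mov ax,$a000    ; this will be faster because it is not using the stack
mov es,ax
xor eax,eax ; rep stosd stores all of eax; cbw/cwd would only clear ax and dx
mov di,ax ; di = 0 (mov instead of xor di,di)
mov cx, $3E80 ; 16000 dwords = 64000 bytes
rep stosd
mov es,bx ; 2 bytes
ret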

cls320x200x256: ;es=segment; 12 bytes
xor eax,eax
xor di,di
mov cx,$3E80
rep stosd
ret

coloredcls320x200x256: ;es=segment, al=color ; 19 bytes
mov ah,al
mov cx,ax
shl eax,16
mov ax,cx
xor di,di
mov cx,$3E80
rep stosd
ret
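
The first four instructions of coloredcls320x200x256 copy the color byte into all four bytes of eax, so that each "stosd" writes four identical pixels. Here is a minimal Python model of those steps (a sketch, not DOS code; `replicate_color` and the stale value 0xDEADBEEF are assumptions for illustration):

```python
def replicate_color(al, eax_before=0):
    """Model of the four-instruction byte-to-dword replication."""
    al &= 0xFF
    eax = (eax_before & 0xFFFFFF00) | al      # al = color on entry
    eax = (eax & 0xFFFF00FF) | (al << 8)      # mov ah,al   -> ax = cc cc
    cx = eax & 0xFFFF                         # mov cx,ax
    eax = (eax << 16) & 0xFFFFFFFF            # shl eax,16  -> cccc0000
    eax = (eax & 0xFFFF0000) | cx             # mov ax,cx   -> cccccccc
    return eax

# Same result as multiplying the byte by 01010101h, whatever eax held before.
assert all(replicate_color(c, 0xDEADBEEF) == c * 0x01010101 for c in range(256))
print(hex(replicate_color(0x2A)))   # 0x2a2a2a2a
```

Unlike the cbw/cwd approach, this sequence does overwrite the upper half of eax, so stale bits there cannot leak into the fill.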

MATRIX

Slai

Joined: 11 Jan 2006
Posts: 40
Location: NY/Bulgaria

Slai 08 Mar 2006, 17:41

Maybe this putpixelVGA code is faster? The clock cycles are for the 80486 and are probably not very accurate :)

Code:

 macro pxl1 x,y,col {
        mov  ax, y        ; 1
        mov  bx, x        ; 1
        xchg ah, al       ; 3
        add  bx, ax       ; 1
        shr  ax, 2        ; 3?
        add  bx, ax       ; 1
        mov  al, col      ; 1
        mov  [es:bx], al }; 8?
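
The macro uses 320y = 256y + 64y: swapping ah and al turns y into 256y, and shifting that right by 2 gives 64y. A small Python model of the register steps (a sketch, not DOS code; `offset_xchg` is a hypothetical name) suggests this holds for all on-screen rows, and that it depends on y fitting in one byte:

```python
def offset_xchg(x, y):
    """Model of the xchg/shr address calculation on 16-bit registers."""
    ax = y & 0xFFFF                          # mov ax,y
    bx = x & 0xFFFF                          # mov bx,x
    ax = ((ax << 8) | (ax >> 8)) & 0xFFFF    # xchg ah,al -> 256y when y < 256
    bx = (bx + ax) & 0xFFFF                  # add bx,ax  -> x + 256y
    ax >>= 2                                 # shr ax,2   -> 64y
    bx = (bx + ax) & 0xFFFF                  # add bx,ax  -> x + 320y
    return bx

assert all(offset_xchg(x, y) == 320 * y + x
           for y in range(200) for x in range(320))

# With y >= 256 the byte swap is no longer a multiply by 256.
print(offset_xchg(0, 256), (320 * 256) & 0xFFFF)
```

Since mode 13h has only 200 rows, the y < 256 restriction is harmless here; the macro does clobber ax and bx.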

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 09 Mar 2006, 09:30

You can roughly divide the clock cycles by three going from the 486 to current Pentiums.

Code:

 macro pxl1 x,y,col {
        mov  ax, y        ; 1 uop @ port 2 - CLK 1
        mov  bx, x        ; 1 uop @ port 2 - CLK 2 (port 2 full)
        xchg ah, al       ; 3 uops @ port 01 - CLK 2,3
        add  bx, ax       ; 1 uop @ port 01 - CLK 3
        shr  ax, 2        ; 1 uop @ port 1+ 4 latency - CLK 4,5
        add  bx, ax       ; 1 uop @ port 01 - CLK 6
        mov  al, col      ; 1 uop @ port 2 - CLK 6
        mov  [es:bx], al }; 1 uop @ port 4 - CLK 6

6 clocks exactly when you are LUCKY. This means that a clock must start on your "MOV AX,Y" instruction, and you can expect the 6th clock to end at "MOV [ES:BX],AL".

Hayden

Joined: 06 Oct 2005
Posts: 132

Hayden 25 Jun 2007, 02:07

Here is a very fast pixel proc that I was made aware of last year...

Code:

; Very fast pixel proc for mode 13h - 32-bit p/m code
; btw, #A0000 / 8 = #14000

macro PutPixel x, y, c
{
    mov ebx, x
    mov edx, y
    mov cl,  c
    lea edx, [edx + edx*4]
    lea edx, [edx*8 + 14000h]
    mov [ebx + edx*8],  cl
}
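
The two lea instructions fold the frame buffer base into the scaling: A0000h / 8 = 14000h, so ((5y * 8) + 14000h) * 8 = 320y + A0000h. A minimal Python model of the 32-bit arithmetic (a sketch, not protected-mode code; `linear_address` is a made-up name, and a flat segment with base 0 is assumed) checks this:

```python
M32 = 0xFFFFFFFF

def linear_address(x, y):
    """Model of the lea chain: returns the flat address written by the macro."""
    ebx = x & M32                      # mov ebx,x
    edx = y & M32                      # mov edx,y
    edx = (edx + edx * 4) & M32        # lea edx,[edx+edx*4]     -> 5y
    edx = (edx * 8 + 0x14000) & M32    # lea edx,[edx*8+14000h]  -> 40y + A0000h/8
    return (ebx + edx * 8) & M32       # [ebx+edx*8]             -> x + 320y + A0000h

assert all(linear_address(x, y) == 0xA0000 + 320 * y + x
           for y in range(200) for x in range(320))
print(hex(linear_address(0, 0)), hex(linear_address(319, 199)))
```

The trick works because the base address A0000h happens to be divisible by 8, which lets the final scale factor of the store's addressing mode finish the multiply.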

_________________
New User.. Hayden McKay.

edfed

Joined: 20 Feb 2006
Posts: 4350
Location: Now

edfed 05 Oct 2007, 00:42

This is faster, no?

Code:

;es is the video memory or video buffer as you want
;x and y are contiguous dwords in memory
;al=color
putpxl:
  mov edi,screen13h.xl
  imul edi,[Y]
  add edi,[X]
  stosb  
  ret

Last edited by edfed on 30 Dec 2008, 18:07; edited 1 time in total

rain_storm

Joined: 05 Apr 2007
Posts: 67
Location: Ireland

rain_storm 06 Oct 2007, 17:07

"add edi,[X]"?? Don't you mean "add eax,[X]"?

vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7103
Location: Slovakia

vid 06 Oct 2007, 21:56

No, he doesn't ;)

rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)

rugxulo 07 Oct 2007, 02:59

edfed, I would assume that using imul and stosb is quite a bit slower than Hayden's method. But feel free to prove me wrong! :D

edfed 08 Oct 2007, 18:09

imul exists to be used, so I use it,
and it is more adaptable
nananananèreu

If you look at the timings for the new Pentiums, you'll see that imul is fast,

and it takes only five instructions for a pixel. Yes!!!!

Sahrian

Joined: 17 Mar 2007
Posts: 16

Sahrian 09 Oct 2007, 15:07

edfed, I'm sorry, but rugxulo is right. The problem is not imul but stosb. imul is also slow on older CPUs.
