flat assembler
Message board for the users of flat assembler.

Hey, i'm back, 320X200 MODE STUFF?, use them with care! :)

Matrix 04 Sep 2004, 22:55
Very fast routines for mode 13h (320x200, 256 colors):

Code:
set320x200x256:
mov ax,$13
int $10
ret

set80x25t:
mov ax,$03
int $10
ret

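putpixel320x200x256a0: ; al=color bx=x cx=y, 17 bytes; changes es, dx, di
push $a000
pop es      ; es = video segment
push ax     ; save the color, mul overwrites ax
mov ax,320
mul cx      ; dx:ax = 320*y
add ax,bx   ; + x
mov di,ax
pop ax      ; restore the color
stosb       ; [es:di] = al (assumes the direction flag is clear)
ret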


However, I have made a better version with mul (2004.10.05):

Code:
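putpixel320x200x256a1: ; al=color bx=x cx=y, uses mul, 20 bytes; preserves es, changes cx, dx, di
push es
push ax      ; save the color: mov ax,320 and mul would destroy al
mov ax,320
mul cx       ; dx:ax = 320*y
mov di,ax
mov cx,$a000
mov es,cx
add di,bx    ; di = 320*y + x
pop ax       ; restore the color in al
stosb
pop es
ret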

putpixel320x200x256a2: ; bx=x, dx=y, cl=color, uses mul 20 bytes
push es
mov ax,320
mul dx
mov di,ax
mov dx,$a000
mov es,dx
add di,bx
mov [es:di],cl
pop es
ret
    


MATRIX


Last edited by Matrix on 15 Oct 2004, 21:02; edited 2 times in total

Tomasz Grysztar 04 Sep 2004, 23:06
Alternatively:
Code:
putpixel320x200x256:
; al=color, bx=x, cx=y
        push    $A000
        pop     es
        mov     di,cx   ; y
        shl     cx,2
        add     di,cx   ; 5y
        shl     di,6    ; 320y
        add     di,bx
        stosb
        ret    
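The shift sequence works because 320 = 5 * 64, so 320*y = (y + 4*y) shl 6. The same product can also be split as 64*y + 256*y. The variant below is only a sketch of that second split: the label name is hypothetical, and it assumes the caller has already loaded ES with $A000, which the routine above does itself.

```asm
; Sketch: offset = 320*y + x computed as 64*y + 256*y + x.
; Assumes: al=color, bx=x, cx=y (0..199), es=$A000 set by the caller, 186+ CPU.
; Changes cx and di.
putpixel_shift_sketch:          ; hypothetical name
        shl     cx,6            ; cx = 64*y
        mov     di,cx           ; di = 64*y
        shl     cx,2            ; cx = 256*y
        add     di,cx           ; di = 320*y
        add     di,bx           ; di = 320*y + x
        mov     [es:di],al      ; plain store, independent of the direction flag
        ret
```

For y up to 199 every intermediate value fits in 16 bits (256*199 = 50944), so nothing overflows.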

Matrix 04 Sep 2004, 23:24
Okay, i see someone knows somethin' here, i agree with u because shl is slightly faster than mul. :)
optimize this if u can :

Code:
cls320x200x256:
push $a000
pop es
xor eax,eax
mov di,ax
mov cx,$3e80
rep stosd
ret    


Last edited by Matrix on 07 Sep 2004, 02:34; edited 1 time in total

Tomasz Grysztar 04 Sep 2004, 23:36
Matrix wrote:
i agree with u because shl is slightly faster than mul. :)

However this generally doesn't hold on modern processors.

Tomasz Grysztar 05 Sep 2004, 08:04
If you give such procedures to a beginner, it would be good manners to preserve and restore the ES register. On the other hand, for advanced users there shouldn't be any ES-setting code at all; the programmer should set ES to $A000 only once for the whole drawing process, which will make those procedures faster.

Also, some blitting procedure would, in my opinion, be much more useful in the general case. I will put a new version of the "kelvar engine" example on this website soon, where one can find some nice blitters (for VESA modes, too).
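As a rough sketch of the "set ES once" idea (not code from this thread): the small .COM program below loads ES with $A000 a single time and then calls a pixel routine that never touches ES. The label names, the horizontal line it draws and the key wait are illustrative assumptions.

```asm
; Sketch only: DOS .COM program in fasm syntax, needs a 186+ CPU (push imm, shl imm).
        org     100h

        mov     ax,$13          ; 320x200, 256 colors
        int     $10

        push    $A000           ; set ES once for the whole drawing process
        pop     es

        mov     bx,0            ; x
draw_line:
        mov     cx,100          ; y = 100 (putpixel_es changes cx, so reload it)
        mov     al,15           ; color 15 (white in the default palette)
        call    putpixel_es
        inc     bx
        cmp     bx,320
        jb      draw_line

        xor     ah,ah           ; wait for a key
        int     $16

        mov     ax,$03          ; back to 80x25 text mode
        int     $10
        ret                     ; .COM exit through the int 20h at PSP:0

; al=color, bx=x, cx=y, es=$A000 (set by the caller); changes cx and di
putpixel_es:                    ; hypothetical name
        mov     di,cx           ; y
        shl     cx,2
        add     di,cx           ; 5y
        shl     di,6            ; 320y
        add     di,bx
        mov     [es:di],al
        ret
```

The routine stores with mov instead of stosb, so it does not depend on the direction flag.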

Octavio 05 Sep 2004, 13:22
Matrix wrote:
Okay, i see someone knows somethin' here, i agree with u because shl is slightly faster than mul. :)
optimize this if u can :

Code:
cls320x200x256:
push $a000
pop es
xor ax,ax
mov di,ax
mov cx,$3e80
rep stosd
ret    


Yes, I can: replace 'xor ax,ax' with 'xor eax,eax'.
A faster way is to disable the GPU; this doubles the bandwidth on some video cards. You can also set a multi-plane video mode that allows you to set 32 pixels at a time.

neonz 05 Sep 2004, 15:22
Matrix wrote:
Okay, i see someone knows somethin' here, i agree with u because shl is slightly faster than mul. :)
optimize this if u can :

Code:
cls320x200x256:
push $a000
pop es
xor ax,ax
mov di,ax
mov cx,$3e80
rep stosd
ret    


Well, this will be faster for Pentium and newer CPUs:

Code:
cls320x200x256:
   xor eax, eax
   push $A000
   pop es
   xor di, di
   mov cx, $3E80
   rep stosd
   ret
    


In my code, the "pop es" and "xor di, di" instructions will execute simultaneously on P5+ CPUs and you will save 2 CPU cycles :). I moved "xor eax, eax" to the beginning of the code, as instructions with an operand size prefix (32-bit instructions in 16-bit code and 16-bit instructions in 32-bit code) can pair only if they are 1-byte instructions (like "push eax") or loaded from cache. You need "xor eax, eax", not "xor ax, ax", because you are using "rep stosd", not "rep stosw".
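The last point can be seen by comparing store widths: "rep stosd" writes all 32 bits of eax, so the upper half must be zero too, while "rep stosw" only writes ax and needs twice the count. A minimal sketch of the 16-bit-only variant follows; the label name is hypothetical, and like the routines above it leaves ES changed.

```asm
; Sketch: clear the screen without any 32-bit registers (186+ for push imm).
; 64000 bytes = 32000 words = 16000 dwords.
cls320x200x256w:                ; hypothetical name
        push    $A000
        pop     es
        xor     ax,ax           ; stosw stores only ax
        xor     di,di
        mov     cx,32000        ; $7D00 words
        cld                     ; make sure stosw moves forward
        rep stosw
        ret
```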

Matrix 05 Sep 2004, 16:01
Oh, sorry for that, I meant EAX; it was just 3 AM for me :).
Say, u know somethin' too. :)
MATRIX

Matrix 05 Sep 2004, 21:02
Privalov is right:
you should insert this line at the beginning of the code
push es

and you should insert this line at the end of the code
pop es

This way you won't be surprised if your program somehow hangs,
because your procedure will be "transparent":
it won't change es.

The same goes for ds, because you might want to change it too when working with strings.
Just take a look at the movsb, movsw, movsd, stosb, stosw, stosd section of your handbook :)
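To illustrate saving both segment registers around a string instruction, here is a hedged sketch that copies an off-screen buffer to video memory with "rep movsd". The routine name and the convention of passing the buffer segment in ax are assumptions made for the example.

```asm
; Sketch: copy a 64000-byte off-screen buffer to the screen, 386+.
; Assumes ax = segment of the buffer (hypothetical convention).
; Preserves ds and es; changes cx, si, di.
copybuffer320x200x256:          ; hypothetical name
        push    ds
        push    es
        mov     ds,ax           ; source:      ds:si = buffer:0
        push    $A000
        pop     es              ; destination: es:di = $A000:0
        xor     si,si
        xor     di,di
        mov     cx,$3E80        ; 16000 dwords = 64000 bytes
        cld                     ; string instructions count upward
        rep movsd
        pop     es              ; restore in reverse order
        pop     ds
        ret
```

Because ds and es come back unchanged, the caller can keep using its own data and any ES it set up earlier.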


MATRIX

Matrix 15 Oct 2004, 19:57
Hi,
I'm back with some code

Code:
putpixel320x200x256n: ; bx=x, ax=y, cl=color
push    es           ; MATRIX PUTPIXEL 19 bytes
push    $A000 ; of course, it is nicer to do this once in your program,
pop     es    ; because stack (memory) access is slow
; no need to extend ax to eax: di only receives the low 16 bits of the result,
; and those depend only on ax (cwd would extend ax into dx, not into eax)
lea di,[4*eax+eax] ; 5y
shl di,6 ; 5y*64 = 320y
add di,bx
mov [es:di],cl
pop     es
ret

putpixel320x200x256: ; es=segment, bx=x, ax=y, cl=color
                     ; MATRIX PUTPIXEL 13 bytes
lea di,[4*eax+eax] ; 5y (low 16 bits only, so the upper half of eax is irrelevant)
shl di,6 ; 5y*64 = 320y
add di,bx
mov [es:di],cl
ret
    


MATRIX


Last edited by Matrix on 15 Oct 2004, 20:26; edited 2 times in total

Matrix 15 Oct 2004, 20:08
Let's clear the screen:
Code:
cls320x200x256s: ; 18 bytes
push es
   xor eax, eax
   push $A000
   pop es
   xor di, di
   mov cx, $3E80
   rep stosd
pop es
   ret

cls320x200x256n: ; 21 bytes
mov bx,es ; 2 bytes
mov ax,$a000    ; this will be faster because it is not using stack
mov es,ax
cbw       ; al=0, so ax=0
cwde      ; eax=0 (cwd would only clear dx, and rep stosd stores all of eax)
mov di,ax ;xor di,di ;move is simpler than xor
mov cx, $3E80
rep stosd
mov es,bx ; 2 bytes
ret

cls320x200x256: ;es=segment; 12 bytes
xor eax,eax
xor di,di
mov cx,$3E80
rep stosd
ret

coloredcls320x200x256: ;es=segment, al=color ; 19 bytes
mov ah,al
mov cx,ax
shl eax,16
mov ax,cx
xor di,di
mov cx,$3E80
rep stosd
ret
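Another way to spread the color byte over all four bytes of eax is a multiplication by $01010101. This is only a sketch with a hypothetical label name; it uses two instructions instead of four for the replication, but it is not smaller and not necessarily faster than the shift version above.

```asm
; Sketch: fill the screen with one color, replicating the byte with imul.
; es=segment, al=color; 386+; changes eax, cx, di.
coloredcls_imul:                ; hypothetical name
        movzx   eax,al          ; eax = 000000cc
        imul    eax,eax,$01010101 ; eax = cccccccc (color in all four bytes)
        xor     di,di
        mov     cx,$3E80        ; 16000 dwords
        rep stosd
        ret
```

The multiplication cannot carry between bytes, because each partial product is the same value below 256 shifted by a whole number of bytes.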
    


MATRIX

Slai 08 Mar 2006, 17:41
Maybe this putpixelVGA code is faster? The clock cycles are for the 80486, and are probably not very accurate :)
Code:
 macro pxl1 x,y,col {
        mov  ax, y        ; 1
        mov  bx, x        ; 1
        xchg ah, al       ; 3
        add  bx, ax       ; 1
        shr  ax, 2        ; 3?
        add  bx, ax       ; 1
        mov  al, col      ; 1
        mov  [es:bx], al }; 8?    

Madis731 09 Mar 2006, 09:30
You can roughly divide the clock cycles by three when going from the 486 to current Pentiums.
Code:
 macro pxl1 x,y,col {
        mov  ax, y        ; 1 uop @ port 2 - CLK 1
        mov  bx, x        ; 1 uop @ port 2 - CLK 2 (port 2 full)
        xchg ah, al       ; 3 uops @ port 01 - CLK 2,3
        add  bx, ax       ; 1 uop @ port 01 - CLK 3
        shr  ax, 2        ; 1 uop @ port 1+ 4 latency - CLK 4,5
        add  bx, ax       ; 1 uop @ port 01 - CLK 6
        mov  al, col      ; 1 uop @ port 2 - CLK 6
        mov  [es:bx], al }; 1 uop @ port 4 - CLK 6
    

6 clocks exactly when you are LUCKY - this means that you must start a clock on your "MOV AX,Y" instruction and can expect the 6th clock to end at "MOV [ES:BX], al".

Hayden 25 Jun 2007, 02:07
Here is a very fast pixel proc that I was made aware of last year...

Code:
; Very fast pixel proc for mode 13h - 32-bit p/m code
; btw, #A0000 / 8 = #14000

macro PutPixel x, y, c
{
    mov ebx, x
    mov edx, y
    mov cl,  c
    lea edx, [edx + edx*4]
    lea edx, [edx*8 + 14000h]
    mov [ebx + edx*8],  cl
}
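The two lea instructions fold the multiplication and the video base into the addressing: 5*y, then 8*(5*y) + A0000h/8, then a final scale of 8 gives 320*y + A0000h. The same math is written out below as a procedure with a worked example in the comments. The label name is hypothetical, and the sketch assumes a flat 32-bit data segment with base 0 in which video memory is visible at linear address A0000h.

```asm
; Sketch: Hayden's address math as a procedure, 32-bit protected mode.
; ebx=x, edx=y, cl=color; changes edx.
; Worked example for x=160, y=100:
;   5*y                = 500
;   500*8 + 14000h     = 14FA0h
;   160 + 14FA0h*8     = A7DA0h = A0000h + 320*100 + 160
use32
putpixel_flat:                  ; hypothetical name
        lea     edx,[edx+edx*4]         ; edx = 5*y
        lea     edx,[edx*8+14000h]      ; edx = 40*y + A0000h/8
        mov     [ebx+edx*8],cl          ; address = x + 320*y + A0000h
        ret
```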
    

_________________
New User.. Hayden McKay.

edfed 05 Oct 2007, 00:42
This is faster, no?
Code:
;es is the video memory or video buffer as you want
;x and y are contiguous dwords in memory
;al=color
putpxl:
  mov edi,screen13h.xl
  imul edi,[Y]
  add edi,[X]
  stosb  
  ret
    


Last edited by edfed on 30 Dec 2008, 18:07; edited 1 time in total

rain_storm 06 Oct 2007, 17:07
"add edi,[X]"?? Don't you mean "add eax,[X]"?

vid 06 Oct 2007, 21:56
no, he doesn't ;)

rugxulo 07 Oct 2007, 02:59
edfed, I would assume using imul and stosb would be a good amount slower than Hayden's method. But feel free to prove me wrong! :D

edfed 08 Oct 2007, 18:09
imul exists to be used, so I use it,
and it is more adaptable.
Nyah nyah nyah!

If you look at the timings of the new Pentiums, you'll see that imul is fast,

and only five instructions for a pixel. Yes!!!!

Sahrian 09 Oct 2007, 15:07
edfed, I'm sorry for you but rugxulo is right. The problem is not imul, but stosb. imul is slow on older CPUs also.
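If stosb is the concern, the store can be done with a plain mov while keeping the imul. The following is only a sketch of that substitution: the label name is hypothetical, 320 stands in for screen13h.xl (an assumption about that constant), and whether it is measurably faster depends on the CPU.

```asm
; Sketch: edfed's routine with the store done by mov instead of stosb.
; Assumes es = video memory or buffer segment, al = color, X and Y are dwords.
putpxl_mov:                     ; hypothetical name
        imul    edi,dword [Y],320 ; edi = 320*y
        add     edi,dword [X]     ; edi = 320*y + x
        mov     [es:edi],al       ; plain store, edi is left unchanged
        ret

X       dd      0
Y       dd      0
```

In 16-bit code this relies on the offset staying below 64 KB, which holds for any pixel inside the 320x200 screen.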

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
