First mmx optimization, wrong results
Zahariash 23 Nov 2004, 18:18
I tried to optimize a piece of code using MMX. It's a simple fire demo.

The original code:

Code:
;ds:si points to last pixel (8bit)
;cx - pixel count
        xor     ax,ax
.pixelki:
        mov     al,[ds:si]
        mov     bl,[ds:si+dl_x-1]
        add     ax,bx
        mov     bl,[ds:si+dl_x]
        add     ax,bx
        mov     bl,[ds:si+dl_x+1]
        add     ax,bx
        shr     ax,2

        cmp     ax,[temp_max]
        ja      .skok
        cmp     ax,[temp_min]
        jb      .skok
        dec     ax
        dec     ax
.skok:
        inc     ax

        mov     [ds:si],al
        dec     si
        loop    .pixelki


I rewrote it to this:
Code:
        pxor    mm0,mm0

.pixelki:
        ;mov al,[ds:si]
        movq    mm1,[ds:si]
        punpcklbw mm1,mm0

        ;mov bl,[ds:si+dl_x-1]
        movq    mm2,[ds:si+dl_x-1-3]
        punpcklbw mm2,mm0

        ;add ax,bx
        paddw   mm1,mm2

        ;mov bl,[ds:si+dl_x]
        movq    mm2,[ds:si+dl_x-3]
        punpcklbw mm2,mm0

        ;add ax,bx
        paddw   mm1,mm2

        ;mov bl,[ds:si+dl_x+1]
        movq    mm2,[ds:si+dl_x+1-3]
        punpcklbw mm2,mm0

        ;add ax,bx
        paddw   mm1,mm2

        ;shr ax,2
        psrlw   mm1,2

        movq    qword [buf],mm1
        ;mov ax, word [buf]

        mov     bx,8
.kolor:
        sub     bx,2
        mov     ax,word [buf+bx]
        cmp     ax,[temp_max]
        ja      .skok
        cmp     ax,[temp_min]
        jb      .skok
        dec     ax
        dec     ax
.skok:
        inc     ax
        mov     [ds:si],al
        dec     si
        cmp     bx,0
        ja      .kolor

        sub     cx,3
        loop    .pixelki


Sorry for the Polish labels.

What is wrong with this code? It gives wrong results.
I just started coding in assembler, so please be patient.

My first post here...
Matrix 23 Nov 2004, 20:21
Hi Zahariash,

Nice try for a first attempt. However, I don't see any advantage in converting that simple piece of code to MMX. In some cases SIMD (single instruction, multiple data) helps with speed, for example when adding 1 to each pixel or multiplying each pixel by 3.

Also note that al is a byte (8-bit) register, and mm0 is a qword (64-bit) register.
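
To make the SIMD point concrete, here is a minimal sketch (not from the original posts) of the "add 1 to each pixel" case. It assumes ds:si points to at least eight pixel bytes, and it uses a hypothetical constant named ones. Note that paddusb saturates at 255 instead of wrapping around.

Code:
; sketch only: 'ones' is a hypothetical constant, not part of the demo
        movq    mm0,[ds:si]             ; load 8 pixels at once
        paddusb mm0,qword [ones]        ; add 1 to each byte, saturating at 255
        movq    [ds:si],mm0             ; store the 8 pixels back
        emms                            ; empty the MMX state before any FPU code

; data, placed outside the code path
ones    dq 0101010101010101h

One load, one add and one store handle eight pixels, where the byte-wise version needs eight of each.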
Zahariash 24 Nov 2004, 17:42
I was trying to rewrite this code, from the label '.pixelki' to the end, using MMX, and then change the fire generation to something more interesting (some multiply and shift operations). This code should generate four pixels per loop iteration, so I think some speed improvement will be visible.

After the line 'psrlw mm1,2', mm1 should look like this:
Code:
           [ds:si-3]          [ds:si-2]          [ds:si-1]            [ds:si]
|??????|pixel color|??????|pixel color|??????|pixel color|??????|pixel color|
    


Am I wrong?
So why does it generate wrong results?

At the moment, speed isn't as important as correct results.

So where is the bug?
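
One likely cause, offered as an untested sketch rather than a confirmed fix: the three loads from the neighbouring row use an offset of -3, but the first load reads from [ds:si] without it. punpcklbw widens the four lowest-addressed bytes, so mm1 starts with pixels si..si+3 while mm2 holds values aligned to si-3..si, and the sums pair up the wrong columns. The version below keeps the register usage and the names from the post (dl_x, buf, temp_max, temp_min are assumed to be defined as in the original program). It uses movd because only four bytes are needed per load, which also avoids reading four bytes past the data at the end of the buffer. The loop counter is changed to sub cx,4 / ja, an assumption on my part, so that a count that is not a multiple of 4 still terminates.

Code:
        pxor    mm0,mm0                 ; mm0 = 0, used to widen bytes to words
.pixelki:
        movd    mm1,[ds:si-3]           ; pixels si-3..si (note the -3)
        punpcklbw mm1,mm0               ; 4 bytes -> 4 words
        movd    mm2,[ds:si+dl_x-1-3]
        punpcklbw mm2,mm0
        paddw   mm1,mm2
        movd    mm2,[ds:si+dl_x-3]
        punpcklbw mm2,mm0
        paddw   mm1,mm2
        movd    mm2,[ds:si+dl_x+1-3]
        punpcklbw mm2,mm0
        paddw   mm1,mm2
        psrlw   mm1,2                   ; divide each of the 4 sums by 4
        movq    qword [buf],mm1         ; buf+0 = pixel si-3 ... buf+6 = pixel si

        mov     bx,8
.kolor:
        sub     bx,2                    ; bx = 6,4,2,0: pixel si first, si-3 last
        mov     ax,word [buf+bx]
        cmp     ax,[temp_max]
        ja      .skok
        cmp     ax,[temp_min]
        jb      .skok
        dec     ax
        dec     ax
.skok:
        inc     ax
        mov     [ds:si],al
        dec     si
        cmp     bx,0
        ja      .kolor

        sub     cx,4                    ; four pixels done per iteration
        ja      .pixelki                ; stop at zero or on borrow
        emms                            ; empty the MMX state after the loop

With the -3 on the first load, the layout in the diagram holds when read in memory order: the lowest word of mm1 is the pixel at [ds:si-3] and the highest word is the pixel at [ds:si].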
Matrix 27 Nov 2004, 21:07
Well, I don't see the rest of the code, but you should first try basic optimizations without MMX, because the slowest thing is accessing memory: reading 4 x 1 byte is much slower than reading 1 x 4 bytes (a dword).

Code:
;ds:si points to last pixel (8bit)
;cx - pixel count
        xor     ax,ax
.pixelki:
        mov     al,byte [ds:si]
        mov     ebx,dword [ds:si+dl_x-1] ; one read fetches all three neighbours
        rol     ebx,16                  ; bl = byte at +dl_x+1
        xor     bh,bh
        add     ax,bx
        xor     bl,bl
        rol     ebx,8                   ; bl = byte at +dl_x
        add     ax,bx
        xor     bl,bl
        rol     ebx,8                   ; bl = byte at +dl_x-1
        add     ax,bx
        shr     ax,2

        cmp     ax,[temp_max]
        ja      .skok
        cmp     ax,[temp_min]
        jb      .skok
        sub     ax,2
.skok:
        inc     ax
        mov     [ds:si],al
        dec     si
        loop    .pixelki
    

I don't know whether other registers are free to use in this snippet, but it could probably be improved even further.
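
One possible direction for going further, tying back to the goal of doing the whole loop body in MMX: the compare-and-adjust step can be done without branches. This is a sketch under assumptions, not tested against the demo. It assumes mm1 already holds four averaged pixels as words (0..255) loaded from [ds:si-3], that temp_max and temp_min fit in a signed word, and that three hypothetical qwords exist: max4 and min4 (temp_max and temp_min repeated in all four words) and one4. Unlike the scalar code, packuswb saturates, so a result of -1 is stored as 0 and 256 as 255 instead of wrapping.

Code:
; sketch only: max4, min4 and one4 are hypothetical, not part of the demo
        movq    mm2,mm1
        pcmpgtw mm2,qword [max4]        ; FFFFh where pixel > temp_max
        movq    mm3,qword [min4]
        pcmpgtw mm3,mm1                 ; FFFFh where temp_min > pixel
        por     mm2,mm3                 ; FFFFh where pixel is outside the range
        psubw   mm1,qword [one4]        ; pixel - 1 in every word
        psubw   mm1,mm2                 ; subtracting FFFFh (-1) adds 1 ...
        psubw   mm1,mm2                 ; ... twice: outside words become pixel + 1
        packuswb mm1,mm1                ; 4 words -> 4 bytes, unsigned saturation
        movd    [ds:si-3],mm1           ; store pixels si-3..si

; data, placed outside the code path
one4    dq 0001000100010001h

Words inside the range end up as pixel - 1 and words outside it as pixel + 1, which matches the dec/dec/inc logic of the scalar loop for every value that does not hit the saturation limits.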