flat assembler
Message board for the users of flat assembler.
First MMX optimization, wrong results (DOS)
Matrix 23 Nov 2004, 20:21
Hi Zahariash,
nice try for a first attempt, however I don't see any advantage in converting that simple piece of code to MMX. SIMD (single instruction, multiple data) helps speed in some cases, like adding 1 to each pixel or multiplying each pixel by 3. Also note that al is a byte (8-bit) register, while MM0 is a qword (64-bit) register.
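To make the point about packed data concrete, here is a small Python model (not MMX code) of what a packed byte add such as MMX `paddb` does: eight 8-bit pixels held in one 64-bit value each get +1 at once, instead of one pixel per instruction in `al`. The function name `add1_per_byte` is made up for this sketch; the masking trick is the usual "SIMD within a register" technique, used here only to show the per-lane behaviour.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every byte lane
HIGH = 0x8080808080808080   # top bit of every byte lane
ONES = 0x0101010101010101   # the value 1 in every byte lane

def add1_per_byte(qword):
    """Add 1 to each of the eight 8-bit pixels packed in a 64-bit value,
    wrapping modulo 256 per lane as MMX paddb does.

    Only the low 7 bits of each lane are added, so no carry can cross into
    the neighbouring pixel; bit 7 of each lane is then fixed up with XOR."""
    summed = (qword & LOW7) + ONES
    return summed ^ (qword & HIGH)

# Eight pixels in memory order (lowest address = lowest byte of the qword).
pixels = bytes([0, 1, 2, 3, 127, 128, 254, 255])
result = add1_per_byte(int.from_bytes(pixels, "little"))
print(list(result.to_bytes(8, "little")))   # [1, 2, 3, 4, 128, 129, 255, 0]
```

In real MMX code the same effect is a single `paddb` (or `paddusb`, which saturates at 255 instead of wrapping) on eight pixels at once, which is where SIMD pays off.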
Zahariash 24 Nov 2004, 17:42
I was trying to rewrite the code from the label '.pixelki' to the end using MMX, and then change the fire generation to something more interesting (some multiply and shift operations). This code should generate four pixels per loop iteration, so I think some speed improvement will be visible.
After the line 'psrlw mm1,2', mm1 should look like this:
Code:
[ds:si-3]          [ds:si-2]          [ds:si-1]          [ds:si]
|??????|pixel color|??????|pixel color|??????|pixel color|??????|pixel color|
Am I wrong? Then why does it generate wrong results? At the moment, correct results matter more than speed. So where is the bug?
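One way to check the expected layout after 'psrlw mm1,2' is to model the instruction. The Python sketch below (the helper names `psrlw`, `pack` and `unpack` are hypothetical) treats a 64-bit value as four 16-bit lanes, as `psrlw` does: each lane is shifted on its own and zeros come in from the top, so with zero-extended sums of four pixels (at most 4 * 255 = 1020) every high byte ends up 0 and the low byte holds the average. Since the rest of the MMX code is not posted, it also shows one thing worth ruling out rather than a diagnosis: MMX has no byte-wise shift, so applying `psrlw` to bytes that were never unpacked to words lets bits of one pixel fall into its neighbour.

```python
def psrlw(reg, count):
    """Model of MMX psrlw: shift each 16-bit lane of a 64-bit value right by
    `count`, filling with zeros; bits never cross from one lane to the next."""
    out = 0
    for lane in range(4):
        word = (reg >> (16 * lane)) & 0xFFFF
        out |= (word >> count) << (16 * lane)
    return out

def pack(values, width):
    """Pack values into one integer, lowest lane first (memory order)."""
    reg = 0
    for i, v in enumerate(values):
        reg |= v << (width * i)
    return reg

def unpack(reg, width, n):
    """Split an integer back into n lanes of `width` bits, lowest lane first."""
    mask = (1 << width) - 1
    return [(reg >> (width * i)) & mask for i in range(n)]

# Intended use: four 16-bit lanes, each the sum of four neighbouring pixels.
sums = [40, 1020, 10, 0]
print(unpack(psrlw(pack(sums, 16), 2), 16, 4))   # [10, 255, 2, 0]

# Pitfall: the same shift on eight packed bytes that were not unpacked first.
pixels = [0, 3, 0, 3, 0, 3, 0, 3]
print(unpack(psrlw(pack(pixels, 8), 2), 8, 8))   # [192, 0, 192, 0, 192, 0, 192, 0]
```

In the second case each pixel of 0 becomes 192 because the low two bits of its neighbour (3) are shifted into it; a per-byte shift would have given all zeros.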
Matrix 27 Nov 2004, 21:07
Well, I don't see the rest, but with this you should first try basic optimizations without MMX, because memory access is the slowest part: reading four separate bytes is much slower than reading one dword (1x4 bytes).
Code:
;ds:si points to last pixel (8bit)
;cx - pixel count
.pixelki:
        movzx ax, byte [ds:si]          ; current pixel; movzx (386+, like ebx) also clears ah,
                                        ; which 'inc ax' may have set on the previous pixel
        mov ebx, dword [ds:si+dl_x-1]   ; one dword read instead of three byte reads
        rol ebx, 16
        xor bh, bh                      ; bx = byte at [si+dl_x+1]
        add ax, bx
        xor bl, bl
        rol ebx, 8                      ; bx = byte at [si+dl_x]
        add ax, bx
        xor bl, bl
        rol ebx, 8                      ; bx = byte at [si+dl_x-1]
        add ax, bx
        shr ax, 2                       ; average of the four pixels
        cmp ax, [temp_max]
        ja .skok
        cmp ax, [temp_min]
        jb .skok
        sub ax, 2                       ; in range: net -1 after the inc below
.skok:
        inc ax
        mov [ds:si], al
        dec si
        loop .pixelki
I don't know which other registers are free to use in this snippet, but it could probably be improved even more.
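Since correct results matter more than speed at this point, a plain reference model can help: run it on the same buffer as the assembly and compare the bytes. The Python sketch below mirrors the scalar loop above, including the single 4-byte read; the name `fire_step` and its parameters are hypothetical, `dl_x` is assumed to be the row width in bytes (so `si+dl_x` is the pixel below), and the model assumes the high byte of `ax` is zero at the start of every pixel.

```python
def fire_step(buf, si, count, dl_x, temp_min, temp_max):
    """Reference model of the scalar '.pixelki' loop: walks backwards from
    index si over count pixels, updating buf (a bytearray) in place.
    Assumes buf reaches index si + dl_x + 2 and that si - count + 1 >= 0."""
    for _ in range(count):
        # One 4-byte read instead of separate byte reads: pixels at
        # si+dl_x-1 .. si+dl_x+2; the fourth byte is loaded but not used.
        d = int.from_bytes(buf[si + dl_x - 1 : si + dl_x + 3], "little")
        b0, b1, b2 = d & 0xFF, (d >> 8) & 0xFF, (d >> 16) & 0xFF
        ax = (buf[si] + b0 + b1 + b2) >> 2    # average of four pixels, 0..255
        if temp_min <= ax <= temp_max:
            ax -= 1                           # 'sub ax,2' followed by 'inc ax'
        else:
            ax += 1                           # only 'inc ax' (ja/jb skipped the sub)
        buf[si] = ax & 0xFF                   # 'mov [ds:si],al' keeps the low byte
        si -= 1                               # 'dec si'
    return buf

# Usage: rows of 4 pixels; update the last two pixels of the first row.
row = bytearray([0, 0, 0, 0,  8, 12, 16, 20,  0, 0, 0, 0])
fire_step(row, si=1, count=2, dl_x=4, temp_min=0, temp_max=100)
print(list(row[:2]))   # [4, 8]
```

The model deliberately keeps the odd-looking branch exactly as written (values inside [temp_min, temp_max] go down by 1, values outside go up by 1), so any mismatch with the assembly points at the code rather than at the formula.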
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.