Bitmap Blue Channel

Andy



Hi guys,

I'm trying to write some simple code to extract the blue channel from a bitmap. Here is my code:
Code:
mov esi,dword[esp+4]    ; PixelData 
mov ecx,dword[esp+8]    ; Width * Height (pixels number)    
NextPixel:              
mov eax,[esi]           ; mov in eax one pixel
mov dl,al               ; save blue color in dl
mov eax,0               ; eax = 0
shr eax,16              ; rotate eax 16 bits to right
mov ah,255              ; mov in ah alpha channel
shl eax,16              ; rotate eax 16 bits to left
mov al,dl               ; mov in al blue color
mov [esi],eax           ; save pixel
add esi,4               ; esi += 4 (advance to the next 32-bit pixel)
loop NextPixel          ; ecx -= 1; if ecx is not zero then jump to next pixel
mov eax,1               ; eax = 1
ret                     ; return
    


This code works, but can it be written using fewer operations?
Post 12 May 2012, 17:35
idle
Code:
...
NextPixel:
and dword[esi],$00'ff'00'00 ;= 255 shl 16 
add esi,4
loop NextPixel
    

shl/shr work like a meat chopper, NOT a circle: bits shifted out are gone, they do not rotate back in (that would be rol/ror)
hi Andy, i recognize myself in you, good luck!
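
A small C model may make the shift/rotate difference concrete. It is only a sketch: the function names (`shr16`, `ror16`, `andy_pixel`, `and_mask_pixel`) are made up for illustration, and it assumes each pixel is read as one 32-bit dword, as `mov eax,[esi]` does. It also shows what the loop body in the first post leaves behind: the low byte (the one copied into `dl`) is kept, alpha becomes 255, and everything else is cleared. The single `and` with `$00ff0000` keeps bits 16..23 instead and leaves alpha at 0, so whether that byte is the blue one depends on the pixel layout.

```c
#include <stdint.h>

/* SHR: bits shifted out on the right are discarded, zeros enter on the left. */
uint32_t shr16(uint32_t x) { return x >> 16; }

/* ROR: bits shifted out on the right re-enter on the left. */
uint32_t ror16(uint32_t x) { return (x >> 16) | (x << 16); }

/* Result of the loop body in the first post for one pixel:
   low byte kept (AL saved in DL), alpha forced to 255, the rest zero. */
uint32_t andy_pixel(uint32_t p) { return 0xFF000000u | (p & 0x000000FFu); }

/* Result of "and dword[esi],$00ff0000": only bits 16..23 survive. */
uint32_t and_mask_pixel(uint32_t p) { return p & 0x00FF0000u; }
```

For example, `andy_pixel(0x80112233)` gives `0xFF000033`, while `and_mask_pixel(0x80112233)` gives `0x00110000`.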
Post 12 May 2012, 18:20
Andy
Thanks guys, I'll try to understand all your code.

Quote:
is it zeroing the other channels?

Yes, in this case it was for the blue channel. Actually, this code is only meant to be assembled to raw binary code and then run from an HLL.

Br,
Andy
Post 12 May 2012, 18:54
shutdownall
idle wrote:
Code:
...
NextPixel:
and dword[esi],$00'ff'00'00 ;= 255 shl 16 
add esi,4
loop NextPixel
    



I think this would execute faster, at the cost of one more register (ebx).
Note that the data is modified in descending order (not ascending), but this probably does not matter.

Code:
...
mov ebx,$00ff0000
sub esi,4
NextPixel:
and dword[esi+ecx*4],ebx
loop NextPixel
    


By the way, this code does not do exactly the same as the thread starter's (TE's) code,
which also sets the alpha channel to 255 (0FFh). Doing that needs a second register (edx) to avoid immediates inside the loop.

To do this, the example needs one additional instruction in the loop:

Code:
...
mov ebx,$00ff0000
mov edx,$ff000000
sub esi,4
NextPixel:
and dword[esi+ecx*4],ebx
or dword[esi+ecx*4],edx
loop NextPixel
    


This could be an alternative to the code above, using two more registers (eax is used by lodsd, edi is needed for stosd). Memory is processed in ascending order (because of cld). I'm not sure whether it would be faster than the code above; you may have to experiment.

Code:
cld
mov ebx,$00ff0000
mov edx,$ff000000
mov edi,esi
NextPixel:
lodsd
and eax,ebx
or eax,edx
stosd
loop NextPixel  
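
As a cross-check of the loops in this post, here is a hedged C model of the same per-pixel operation: AND with the value held in ebx, then OR with the value held in edx. The function and macro names are hypothetical. It only illustrates that the descending `[esi+ecx*4]` walk and the ascending lodsd/stosd walk produce identical buffers; it says nothing about speed.

```c
#include <stddef.h>
#include <stdint.h>

#define KEEP_MASK  0x00FF0000u  /* value loaded into ebx in the posts above */
#define ALPHA_BITS 0xFF000000u  /* value loaded into edx in the posts above */

/* Descending order, like [esi+ecx*4] with esi pre-decremented by 4. */
void mask_descending(uint32_t *px, size_t count)
{
    for (size_t i = count; i != 0; --i)
        px[i - 1] = (px[i - 1] & KEEP_MASK) | ALPHA_BITS;
}

/* Ascending order, like the lodsd/stosd loop after cld. */
void mask_ascending(uint32_t *px, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        px[i] = (px[i] & KEEP_MASK) | ALPHA_BITS;
}
```

Unlike the assembly loops, both C functions do nothing when `count` is 0.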
    
Post 12 May 2012, 22:10
idle
shutdownall, great!
The last variant gives ~1/3 better results than the one above it.
Code:
format pe gui 4.0
include 'win32ax.inc'


section '' code import readable writable
library kernel32,'kernel32.dll',\
        user32,'user32.dll'

include 'api\kernel32.inc'
include 'api\user32.inc'


entry $
rept 2 ?:2{
        invoke  GetTickCount
        push    eax                   ;time in
        push    10000                 ;etc counter
     @@:mov     esi,base
        mov     ecx,(base.-base)/4
        mov     ebx,$00ff0000
        mov     edx,$ff000000
        call    variant#?             ;variant2/3
        dec     dword[esp]
        jnz     @b
        pop     eax                   ;kill counter
        invoke  GetTickCount          ;+time out
        neg     dword[esp]            ;-time in
        add     [esp],eax   }         ;=time
        invoke  wsprintfA,lpOut,lpFmt ,,
        add     esp,8*4
        invoke  MessageBoxA,0,lpOut,0,0
        invoke  ExitProcess,eax

  variant2:
        sub     esi,4
    .NextPixel:
        and     [esi+ecx*4],ebx
        or      [esi+ecx*4],edx
        loop    .NextPixel
        ret

  variant3:
        cld
        mov     edi,esi
    .NextPixel:
        lodsd
        and     eax,ebx
        or      eax,edx
        stosd
        loop    .NextPixel
        ret


base: file '1.asm'
 .:
lpFmt db 'variant3: %u ms',10,'variant2: %u ms',0
lpOut rb MAX_PATH
    
Post 13 May 2012, 00:13
idle
Can we make the counter of rept (and similar directives) count downwards, e.g. 3, 2, ...?
Post 13 May 2012, 00:17
revolution
idle wrote:
Can we make the counter of rept (and similar directives) count downwards, e.g. 3, 2, ...?
Yes. Use reverse inside the definition:
Code:
rept ... { reverse
  ...
}    
Post 13 May 2012, 00:24
LostCoder
You can try changing
Code:
loop .NextPixel    
to
Code:
dec ecx
jnz .NextPixel    
because the loop instruction is quite slow on newer processors. At least that is true for my Intel i5-430M. Also, it is sometimes better to change
Code:
dec ecx    
to
Code:
sub ecx,1    
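
For readers comparing the two forms: `loop` decrements ecx and jumps while the result is nonzero, and `dec ecx` or `sub ecx,1` followed by `jnz` counts the same way through the zero flag. In both cases the body runs before the counter is tested, so an initial count of 0 wraps around instead of skipping the loop. The C sketch below (hypothetical names, not code from this thread) models that counting behaviour and shows the usual guard.

```c
#include <stdint.h>

/* How often the loop body runs for an initial ECX value, for both
   "loop label" and "sub ecx,1 / jnz label": the body executes first, then
   the counter is decremented and tested, so 0 wraps to 2^32 iterations. */
uint64_t body_runs(uint32_t ecx)
{
    return ecx == 0 ? (UINT64_C(1) << 32) : (uint64_t)ecx;
}

/* Same loop shape with a guard in front of it. */
void mask_pixels_guarded(uint32_t *px, uint32_t count)
{
    if (count == 0)            /* like "test ecx,ecx / jz .done" before the loop */
        return;
    do {
        *px = (*px & 0x00FF0000u) | 0xFF000000u;
        ++px;
    } while (--count != 0);    /* like "sub ecx,1 / jnz .NextPixel" */
}
```

None of the routines in this thread has such a guard, so they rely on the caller passing a nonzero pixel count.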
Post 14 May 2012, 13:05
Madis731
You won't need to change the direction of counting if you change the direction with STD and CLD. At loop entry you should always know how many iterations to run, so you can always count down by loading ecx with this count first.

And yes, the LOOP-instruction is usually slow. On the other hand REPx STOSx/MOVSx/CMPSx are faster when you meet certain criteria.
http://www.agner.org/optimize/optimizing_assembly.pdf (16.10 String instructions) "REP MOVSD and REP STOSD are quite fast if the repeat count is not too small."
Post 16 May 2012, 10:01
shutdownall
I tested it myself and was surprised.
The difference between loop and dec ecx/jnz is not very big; jnz is about 10% faster.
But the difference between dec ecx and sub ecx,1 is very big:
sub ecx,1 is about 75% faster than dec ecx (only about 25% of the overall execution time) in this loop:

Code:
NextPixel:
lodsd
and eax,ebx
or eax,edx
stosd
;loop NextPixel
;dec ecx
sub ecx,1
jnz NextPixel 
    
Post 16 May 2012, 11:51
Madis731
One way to go even faster is with the help of SSE/AVX.
Code:
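;init xmm1 to 00FF0000h x 4
;and  xmm2 to FF000000h x 4
;rsi = pixel data, 16-byte aligned (movdqa faults on unaligned addresses)
;rcx = size in bytes (pixels*4), a nonzero multiple of 16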

nextPixelBlock:
  movdqa xmm0,[rsi]
  pand   xmm0,xmm1
  por    xmm0,xmm2
  movdqa [rsi],xmm0
  add    rsi,16
  sub    rcx,16
  jnz    nextPixelBlock
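
The same idea can be sketched in C with SSE2 intrinsics. This is an illustration under assumptions, not a translation of the benchmark code: the function name is hypothetical, it uses unaligned loads and stores (`movdqu` rather than `movdqa`), and it adds a scalar tail so the pixel count does not have to be a multiple of 4. When the compiler does not target SSE2, only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_and_si128, _mm_or_si128, ... */
#endif

void mask_pixels_simd(uint32_t *px, size_t count)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i keep  = _mm_set1_epi32((int)0x00FF0000);    /* role of xmm1 */
    const __m128i alpha = _mm_set1_epi32((int)0xFF000000u);   /* role of xmm2 */
    for (; i + 4 <= count; i += 4) {
        /* unaligned load of 4 pixels */
        __m128i v = _mm_loadu_si128((const __m128i *)(px + i));
        v = _mm_or_si128(_mm_and_si128(v, keep), alpha);      /* pand, por */
        _mm_storeu_si128((__m128i *)(px + i), v);
    }
#endif
    for (; i < count; ++i)                                    /* tail or fallback */
        px[i] = (px[i] & 0x00FF0000u) | 0xFF000000u;
}
```

If the buffer is known to be 16-byte aligned, the aligned load/store intrinsics could be used instead, which corresponds to the `movdqa` form shown above.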
    
Post 16 May 2012, 12:12
revolution
Madis731 wrote:
One way to go even faster is with the help of SSE/AVX.
Assuming the target CPU supports it.
Post 16 May 2012, 12:15
Madis731
Well, AVX is a relatively new thing, but it's safe to assume that SSE is supported.
I tested SSE and some other approaches taken from previous posts.

i5 650 @ 3.2GHz
Variant1 - 921 ms
Variant2 - 2839 ms
Variant3 - 5663 ms
Variant4 - 6567 ms
Code:
  variant1:
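        sub     esi,16                ;bias base: [esi+ecx] runs from last block down to first
        shl     ecx,2                 ;pixels -> bytes (must be a multiple of 16)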
        movdqa  xmm1,dqword[hex00FF0000]
        movdqa  xmm2,dqword[hexFF000000]
    .NextPixelBlock:
        movdqa  xmm0,[esi+ecx]
        pand    xmm0,xmm1
        por     xmm0,xmm2
        movdqa  [esi+ecx],xmm0
        sub     ecx,16
        jnz     .NextPixelBlock
        ret

  variant2:
        sub     esi,4
    .NextPixel:
        mov     eax,[esi+ecx*4]
        and     eax,ebx
        or      eax,edx
        mov     [esi+ecx*4],eax
        sub     ecx,1
        jnz     .NextPixel
        ret

  variant3:
        sub     esi,4
    .NextPixel:
        and     [esi+ecx*4],ebx
        or      [esi+ecx*4],edx
        loop    .NextPixel
        ret

  variant4:
        cld
        mov     edi,esi
    .NextPixel:
        lodsd
        and     eax,ebx
        or      eax,edx
        stosd
        loop    .NextPixel
        ret
    

_________________
My updated idol :D http://www.agner.org/optimize/


Last edited by Madis731 on 16 May 2012, 12:51; edited 1 time in total
Post 16 May 2012, 12:32
revolution
Madis731 wrote:
... but it's safe to assume that SSE is supported.
Erm, are you sure? I think not. The OP never mentioned the expected target(s) for the code.
Post 16 May 2012, 12:42
Madis731
:) Only Andy will know that, but
wiki wrote:
...designed by Intel and introduced in 1999 in their Pentium III series...
It's as safe as conditional move or shifts greater than 1, etc.
EDIT: ...or there's always MMX ;)
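
When the target CPU is unknown, the question can be settled at run time with CPUID. The sketch below is an assumption-laden illustration, not code from this thread: the function names are hypothetical, and `__get_cpuid` from `<cpuid.h>` is specific to GCC and Clang on x86. CPUID leaf 1 reports MMX in EDX bit 23, SSE in bit 25, and SSE2 in bit 26. The integer XMM instructions used above (`movdqa`, and `pand`/`por` on xmm registers) belong to SSE2, so that is the bit checked here.

```c
#include <stdint.h>
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Feature bits in EDX after CPUID with EAX = 1. */
int edx_has_mmx(uint32_t edx)  { return (int)((edx >> 23) & 1u); }
int edx_has_sse(uint32_t edx)  { return (int)((edx >> 25) & 1u); }
int edx_has_sse2(uint32_t edx) { return (int)((edx >> 26) & 1u); }

/* Returns 1 if the movdqa/pand/por path can be used, else 0. */
int can_use_xmm_integer(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx_has_sse2(edx);
#endif
    return 0;   /* unknown CPU or compiler: use the scalar loop */
}
```

A caller would test `can_use_xmm_integer()` once and then pick either the SSE2 routine or one of the plain integer loops.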
Post 16 May 2012, 12:52
LocoDelAssembly
shutdownall, what CPU have you used? Pentium IV?
Post 16 May 2012, 13:03
shutdownall
LocoDelAssembly wrote:
shutdownall, what CPU have you used? Pentium IV?


[Attached image: Zwischenablage99.jpg, 6.91 KB]


Post 16 May 2012, 14:10
shutdownall
Madis, why didn't you try this one?
Should be somewhere nearer to your SSE timings.

shutdownall wrote:


Code:
NextPixel:
lodsd
and eax,ebx
or eax,edx
stosd
;loop NextPixel
;dec ecx
sub ecx,1
jnz NextPixel 
    
Post 16 May 2012, 14:12
bzdashek
shutdownall wrote:
Madis, why didn't you try this one?
Should be somewhere nearer to your SSE timings.

Isn't lodsd slower than mov eax,dword[esi] ? I'm not sure, hence the question.
Post 16 May 2012, 18:26
shutdownall
bzdashek wrote:

Isn't lodsd slower than mov eax,dword[esi] ? I'm not sure, hence the question.


I tried it on my computer, and lodsd is considerably faster.
Post 16 May 2012, 19:07