flat assembler
Message board for the users of flat assembler.
Bitmap Blue Channel
idle 12 May 2012, 18:20
Code:
    ...
    NextPixel:
            and dword[esi],$00'ff'00'00 ;= 255 shl 16
            add esi,4
            loop NextPixel
shl/shr work like a meat chopper, NOT a circle: bits shifted out are lost, unlike rol/ror, which rotate them back in.
Hi Andy, I recognize myself in you, good luck!
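Here is a minimal C sketch of what idle's loop does, for readers coming from a high-level language. It assumes the bitmap is a plain array of 32-bit little-endian pixels, and the function names are hypothetical. Which colour sits in bits 16..23 depends on the pixel byte order. In a Windows 32-bit BGRA DIB it is red; with R,G,B,A byte order it is blue. The two helpers illustrate the "meat chopper vs. circle" remark: a shift discards the bits it pushes out, while a rotate wraps them around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C version of idle's loop: AND every 32-bit pixel with 00FF0000h
   (255 shl 16), keeping bits 16..23 and zeroing all other bits. */
static void keep_bits_16_23(uint32_t *px, size_t count)
{
    for (size_t i = 0; i < count; i++)
        px[i] &= 0x00FF0000u;
}

/* shl: bits pushed out of the top are lost (the "meat chopper"). */
static uint32_t shl32(uint32_t x, unsigned n)
{
    assert(n < 32);              /* x << 32 is undefined behaviour in C */
    return x << n;
}

/* rol: bits pushed out of the top re-enter at the bottom (the "circle"). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}
```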
Andy 12 May 2012, 18:54
Thanks guys, I'll try to understand all your code.
Quote: is it zeroing the other channels?
Yes, in this case it was for the blue channel. Actually, this code is just for generating raw binary code that is then run from an HLL.
Br, Andy
shutdownall 12 May 2012, 22:10
idle wrote:
This would execute faster, I think, by using one more register (ebx). The data is then modified in descending order (not ascending), but that is probably not important.
Code:
    ...
    mov ebx,$00ff0000
    sub esi,4
    NextPixel:
            and dword[esi+ecx*4],ebx
            loop NextPixel
By the way, this code does not do exactly the same as the code from the thread starter (TE), which also sets the alpha channel to 255 (0ffh). That needs a second register (edx) to avoid immediates inside the loop, so the example needs one additional line:
Code:
    ...
    mov ebx,$00ff0000
    mov edx,$ff000000
    sub esi,4
    NextPixel:
            and dword[esi+ecx*4],ebx
            or dword[esi+ecx*4],edx
            loop NextPixel
This could be an alternative to the code above, using two more registers (eax is used by lodsd, edi is needed for stosd). Memory is processed in ascending order (with cld). I'm not sure whether it would execute faster than the code above; I may have to experiment.
Code:
    cld
    mov ebx,$00ff0000
    mov edx,$ff000000
    mov edi,esi
    NextPixel:
            lodsd
            and eax,ebx
            or eax,edx
            stosd
            loop NextPixel
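A C sketch of shutdownall's descending AND/OR variant, again assuming 32-bit pixels and a hypothetical function name. The `sub esi,4` is what lets `[esi+ecx*4]` work with a counter that runs from count down to 1. The address becomes element ecx-1, so the last pixel is processed first and element 0 last. The AND mask plays the role of ebx and the OR mask that of edx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors "sub esi,4 / and [esi+ecx*4],ebx / or [esi+ecx*4],edx / loop":
   ecx runs from count down to 1 and addresses element ecx-1. */
static void mask_pixels_desc(uint32_t *px, size_t count,
                             uint32_t and_mask, uint32_t or_mask)
{
    assert(px != NULL || count == 0);
    for (size_t ecx = count; ecx != 0; ecx--) {
        px[ecx - 1] &= and_mask;   /* and [esi+ecx*4],ebx */
        px[ecx - 1] |= or_mask;    /* or  [esi+ecx*4],edx */
    }
}
```

One difference from the real instruction: `loop` with ecx = 0 on entry decrements to FFFFFFFFh and runs about 4 billion times, whereas this C loop simply does nothing for count == 0.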
idle 13 May 2012, 00:13
shutdownall, great!
The last variant gives ~1/3 better results than the one above it:
Code:
    format pe gui 4.0
    include 'win32ax.inc'
    section '' code import readable writable
    library kernel32,'kernel32.dll',\
            user32,'user32.dll'
    include 'api\kernel32.inc'
    include 'api\user32.inc'
    entry $
    rept 2 ?:2{
            invoke GetTickCount
            push eax                ;time in
            push 10000              ;etc counter
        @@: mov esi,base
            mov ecx,(base.-base)/4
            mov ebx,$00ff0000
            mov edx,$ff000000
            call variant#?          ;variant2/3
            dec dword[esp]
            jnz @b
            pop eax                 ;kill counter
            invoke GetTickCount     ;+time out
            neg dword[esp]          ;-time in
            add [esp],eax
    }                               ;=time
            invoke wsprintfA,lpOut,lpFmt,,
            add esp,8*4
            invoke MessageBoxA,0,lpOut,0,0
            invoke ExitProcess,eax
    variant2:
            sub esi,4
      .NextPixel:
            and [esi+ecx*4],ebx
            or [esi+ecx*4],edx
            loop .NextPixel
            ret
    variant3:
            cld
            mov edi,esi
      .NextPixel:
            lodsd
            and eax,ebx
            or eax,edx
            stosd
            loop .NextPixel
            ret
    base: file '1.asm'
      .:
    lpFmt db 'variant3: %u ms',10,'variant2: %u ms',0
    lpOut rb MAX_PATH
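idle's benchmark repeats each variant 10000 times between two GetTickCount calls, with rept generating one timing block per variant. A portable C analogue of that harness could look like the sketch below. It assumes the standard `clock()` function as the timer and a hypothetical `variant3_c` written in the style of the cld/lodsd/stosd loop. Treat any numbers it returns with care. A C compiler is free to vectorize or reorder these loops, so it will not reproduce the instruction-level differences the FASM variants measure. GetTickCount itself also only has a coarse resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*pixel_fn)(uint32_t *px, size_t count);

/* Ascending AND/OR pass in the style of cld + lodsd/and/or/stosd:
   read through one pointer (esi), write back through another (edi). */
static void variant3_c(uint32_t *px, size_t count)
{
    const uint32_t *src = px;          /* esi */
    uint32_t *dst = px;                /* mov edi,esi */
    while (count--) {                  /* ecx as the counter */
        uint32_t eax = *src++;         /* lodsd */
        eax = (eax & 0x00FF0000u) | 0xFF000000u;
        *dst++ = eax;                  /* stosd */
    }
}

/* Calls fn `reps` times on the same buffer and returns the elapsed
   processor time in milliseconds (GetTickCount before/after, in spirit). */
static double time_variant(pixel_fn fn, uint32_t *px, size_t count, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(px, count);
    clock_t end = clock();
    return (double)(end - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```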
idle 13 May 2012, 00:17
Can we make the rept (and other) counters count downwards, e.g. 3, 2, ...?
revolution 13 May 2012, 00:24
idle wrote: can we make rept(and others') counter count downwards: e.g. 3,2..
Code:
    rept ... {
    reverse
            ...
    }
LostCoder 14 May 2012, 13:05
You can try changing
Code:
    loop .NextPixel
to
Code:
    dec ecx
    jnz .NextPixel
and then
Code:
    dec ecx
to
Code:
    sub ecx,1
Madis731 16 May 2012, 10:01
You won't need to change the direction of counting if you change the direction with STD and CLD. At loop entry you should always know how much to count, so you can always count down by setting ecx to this count first.
And yes, the LOOP instruction is usually slow. On the other hand, REPx STOSx/MOVSx/CMPSx are faster when certain criteria are met. http://www.agner.org/optimize/optimizing_assembly.pdf (16.10 String instructions): "REP MOVSD and REP STOSD are quite fast if the repeat count is not too small."
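Madis731's point about STD/CLD can be illustrated with a small C emulation, using hypothetical names and assuming 32-bit pixels. The counter in ecx always counts down. Only the starting address and the step of esi/edi change with the direction flag, and with STD the pointers must start at the last pixel, not the first. If you use STD in real code, clear the flag again with CLD before calling or returning into other code. The common x86 calling conventions expect the direction flag to be clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates "lodsd / and / or / stosd / loop" under the direction flag.
   df == 0 (cld): esi/edi start at the first pixel and move up.
   df == 1 (std): esi/edi start at the LAST pixel and move down.
   In both cases ecx simply counts down from count to 0. */
static void mask_with_df(uint32_t *px, size_t count, int df)
{
    assert(px != NULL || count == 0);
    ptrdiff_t step = df ? -1 : 1;
    ptrdiff_t esi = df ? (ptrdiff_t)count - 1 : 0;   /* index, not pointer */
    ptrdiff_t edi = esi;                             /* mov edi,esi */
    for (size_t ecx = count; ecx != 0; ecx--) {
        uint32_t eax = px[esi];                      /* lodsd */
        esi += step;
        eax = (eax & 0x00FF0000u) | 0xFF000000u;     /* and eax,ebx / or eax,edx */
        px[edi] = eax;                               /* stosd */
        edi += step;
    }
}
```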
shutdownall 16 May 2012, 11:51
I tested it myself and was surprised.
The difference between loop and dec/jnz is not very big; jnz is about 10% faster. But the difference between dec ecx and sub ecx,1 is very big: sub ecx,1 is about 75% faster than dec ecx (only about 25% of the overall execution time) in this loop:
Code:
    NextPixel:
            lodsd
            and eax,ebx
            or eax,edx
            stosd
            ;loop NextPixel
            ;dec ecx
            sub ecx,1
            jnz NextPixel
Madis731 16 May 2012, 12:12
Another way to go faster is with the help of SSE/AVX.
Code:
    ;init xmm1 to 00FF0000h x 4
    ;and xmm2 to FF000000h x 4
    nextPixelBlock:
            movdqa xmm0,[rsi]
            pand xmm0,xmm1
            por xmm0,xmm2
            movdqa [rsi],xmm0
            add rsi,16
            sub rcx,16              ;rcx = bytes left, must be a multiple of 16
            jnz nextPixelBlock
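The loop above maps almost one-to-one onto SSE2 intrinsics: `pand` is `_mm_and_si128` and `por` is `_mm_or_si128`. The sketch below uses a hypothetical function name and makes two assumptions the assembly does not. First, it uses unaligned loads and stores (`movdqu`, via `_mm_loadu_si128`/`_mm_storeu_si128`), so the buffer needs no 16-byte alignment. Second, it finishes any leftover pixels (count not a multiple of 4) with a scalar loop. When the compiler does not target SSE2 (`__SSE2__` undefined, e.g. on non-x86 builds), only the scalar loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* AND with 00FF0000h and OR with FF000000h, four pixels (16 bytes) per
   iteration when SSE2 is available; the scalar loop handles the tail. */
static void mask_pixels_sse2(uint32_t *px, size_t count)
{
    assert(px != NULL || count == 0);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i and_mask = _mm_set1_epi32(0x00FF0000);           /* xmm1 */
    const __m128i or_mask  = _mm_set1_epi32((int)0xFF000000u);     /* xmm2, bit pattern FF000000h */
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(px + i));    /* movdqu load */
        v = _mm_and_si128(v, and_mask);                            /* pand */
        v = _mm_or_si128(v, or_mask);                              /* por */
        _mm_storeu_si128((__m128i *)(px + i), v);                  /* movdqu store */
    }
#endif
    for (; i < count; i++)                                         /* tail / fallback */
        px[i] = (px[i] & 0x00FF0000u) | 0xFF000000u;
}
```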
revolution 16 May 2012, 12:15
Madis731 wrote: Another way to go faster is with the help of SSE/AVX.
Madis731 16 May 2012, 12:32
Well, AVX is a relatively new thing, but it is safe to assume that SSE is supported.
I tested SSE and some other approaches taken from previous posts, on an i5 650 @ 3.2GHz:
Variant1 - 921 ms
Variant2 - 2839 ms
Variant3 - 5663 ms
Variant4 - 6567 ms
Code:
    variant1:
            sub esi,16
            shl ecx,2               ;pixel count -> byte count
            movdqa xmm1,dqword[hex00FF0000]
            movdqa xmm2,dqword[hexFF000000]
      .NextPixelBlock:
            movdqa xmm0,[esi+ecx]
            pand xmm0,xmm1
            por xmm0,xmm2
            movdqa [esi+ecx],xmm0
            sub ecx,16
            jnz .NextPixelBlock
            ret
    variant2:
            sub esi,4
      .NextPixel:
            mov eax,[esi+ecx*4]
            and eax,ebx
            or eax,edx
            mov [esi+ecx*4],eax
            sub ecx,1
            jnz .NextPixel
            ret
    variant3:
            sub esi,4
      .NextPixel:
            and [esi+ecx*4],ebx
            or [esi+ecx*4],edx
            loop .NextPixel
            ret
    variant4:
            cld
            mov edi,esi
      .NextPixel:
            lodsd
            and eax,ebx
            or eax,edx
            stosd
            loop .NextPixel
            ret
Last edited by Madis731 on 16 May 2012, 12:51; edited 1 time in total
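Madis731's variant1 relies on two unstated preconditions, which the small C helpers below spell out (the names are hypothetical). First, `movdqa` faults on an address that is not 16-byte aligned. Second, ecx holds a byte count (`shl ecx,2`) that is reduced by 16 until it reaches exactly zero, so the pixel count must be a non-zero multiple of 4. Otherwise `sub ecx,16 / jnz` wraps around and runs off the buffer. The second helper lists the byte offsets the loop touches after `sub esi,16`, highest first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when variant1 can run as written: 16-byte aligned base for movdqa
   and a non-zero pixel count that is a multiple of 4 (16 bytes). */
static int variant1_preconditions_ok(const void *base, size_t pixels)
{
    return ((uintptr_t)base % 16 == 0) && pixels != 0 && pixels % 4 == 0;
}

/* Byte offsets from base touched by variant1: after "sub esi,16",
   [esi+ecx] with ecx = 4*pixels, 4*pixels-16, ..., 16 is base + (ecx-16).
   Writes them into out (highest first) and returns how many there are. */
static size_t variant1_offsets(size_t pixels, size_t *out)
{
    assert(pixels % 4 == 0);     /* otherwise ecx never hits exactly 0 */
    size_t n = 0;
    for (size_t ecx = pixels * 4; ecx != 0; ecx -= 16)
        out[n++] = ecx - 16;
    return n;
}
```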
revolution 16 May 2012, 12:42
Madis731 wrote: ... but safe to assume that SSE is supported. |
Madis731 16 May 2012, 12:52
Only Andy will know that, but
wiki wrote: ...designed by Intel and introduced in 1999 in their Pentium III series...
EDIT: ...or there's always MMX.
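On revolution's question, note that variant1 uses the XMM integer forms of `movdqa`, `pand` and `por`. These are SSE2 instructions, which arrived with the Pentium 4, not with the Pentium III's original SSE. If the target machine is unknown, one option is a runtime check before choosing the SSE2 loop over the scalar one. The sketch below assumes GCC or Clang on x86, which provide `__builtin_cpu_init` and `__builtin_cpu_supports`. On anything else it conservatively reports no SSE2.

```c
#include <assert.h>

/* Returns 1 if the CPU running this code supports SSE2, 0 otherwise.
   Only implemented for GCC/Clang on x86; elsewhere it reports 0. */
static int cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}
```

On x86-64 the check always succeeds, since SSE2 is part of the baseline for that architecture; it only matters for 32-bit code that may still meet pre-Pentium 4 CPUs.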
LocoDelAssembly 16 May 2012, 13:03
shutdownall, what CPU have you used? Pentium IV?
shutdownall 16 May 2012, 14:10
LocoDelAssembly wrote: shutdownall, what CPU have you used? Pentium IV?
shutdownall 16 May 2012, 14:12
Madis, why didn't you try this one?
It should come somewhat closer to your SSE timings.
shutdownall wrote:
bzdashek 16 May 2012, 18:26
shutdownall wrote: Madis, why didn't you try this one?
Isn't lodsd slower than mov eax,dword[esi]? I'm not sure, hence the question.
shutdownall 16 May 2012, 19:07
bzdashek wrote:
I tried it on my computer, and lodsd is quite a bit faster.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.