flat assembler
Message board for the users of flat assembler.

what is faster?

Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 02:11
Code:
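    putpixelf:
        ; in(x,y,color) : (eax,ebx,edx)
        mov edi,[ModeInfoBlock.PhysBasePtr]
        mov ecx,ebx
        shl ebx,10      ; y * 1024  (y1)
        shl ecx,08      ; y * 256   (y2)
        add ebx,ecx     ; y1 + y2 = y * 1280  (y3)
        add ebx,eax     ; y3 + x    (r)
        mov ecx,ebx
        shl ebx,01      ; r * 2
        add ebx,ecx     ; r * 2 + r = r * 3 (bytes)
        mov ecx,edx
        shr ecx,16      ; cl = third color byte
        mov word[edi+ebx],dx    ; write exactly 3 bytes, so the first
        mov byte[edi+ebx+2],cl  ; byte of the next pixel is left alone
        ret ; (1024 * y + 256 * y + x) * 3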

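    putpixel:
        ; in(x,y,color) : (ebx,eax,ecx)
        mov edi,[ModeInfoBlock.PhysBasePtr]
        mov edx,1280
        mul edx         ; 1280 * y  (mul clobbers edx, not ebx)
        add eax,ebx     ; 1280 * y + x  (r)
        mov edx,03      ; multiplier goes in edx so the color in ecx survives
        mul edx         ; r * 3
        mov edx,ecx
        shr edx,16      ; dl = third color byte
        mov word[edi+eax],cx    ; write exactly 3 bytes, so the first
        mov byte[edi+eax+2],dl  ; byte of the next pixel is left alone
        ret ; (1280 * y + x) * 3    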

putpixelf or putpixel ? and is it possible to optimize more?

_________________
Sorry if bad english.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 26 Feb 2011, 07:59
Teehee wrote:
what is faster?
Short answer: It depends

Long answer: We don't know. What CPU? What RAM timings? What mobo? What video card? What OS? What is in cache? How many times do you call it? etc. etc. etc.

Helpful answer: If you can't notice any change in your program's runtime then it doesn't matter which one you use.
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 10:56
hmm.. interesting, rev.. i just ask bc i have heard that MUL instruction is slower than shift.
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 26 Feb 2011, 11:11
it's true on some X86 models, and false on others.

it depends on the parallelisation,
on the implementation of the instruction (a hardwired mul is very fast, an iterative mul is slow),
etc.
then, only one way to know in your case, compare the execution time of both, with the same amount of pixels, on the same machine.
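
A minimal sketch of such a comparison, assuming a CPU that supports RDTSC and the putpixel routine from the first post with its (ebx,eax,ecx) inputs. TEST_CALLS, time_putpixel, t_start and calls_left are hypothetical names, and a tight loop like this only gives a rough hint, since real workloads can behave differently.

```asm
TEST_CALLS = 100000             ; hypothetical number of test calls

time_putpixel:
        rdtsc                   ; edx:eax = time-stamp counter
        mov  [t_start],eax      ; low 32 bits are enough for a short run
        mov  dword [calls_left],TEST_CALLS
.next:  mov  ebx,10             ; x  (inputs are reloaded on every pass
        mov  eax,20             ; y   because the routine may clobber them)
        mov  ecx,0FFFFFFh       ; color
        call putpixel           ; routine under test
        dec  dword [calls_left] ; counter lives in memory, not in a register
        jnz  .next
        rdtsc
        sub  eax,[t_start]      ; eax = elapsed cycles (modulo 2^32)
        ret

t_start    dd 0
calls_left dd 0
```

Running the same harness with putpixelf (and its own input registers) on the same machine gives the second number to compare against.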
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 11:49
im doing loops to fill my screen with pixels but i need to go byte-by-byte bc my buffer has 3 bytes per color. Is there a way to loop faster than byte-by-byte?

Code:
    clrscr:
        ; in(color) : (dl)
        mov edi,[ModeInfoBlock.PhysBasePtr]
        mov ecx,[src_lenght]        ; buffer length in bytes (must be > 0)
    @@: mov byte[edi+ecx-1],dl      ; -1 so the offsets run from length-1 down to 0
        dec ecx
        jnz @b
        ret    
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 26 Feb 2011, 12:08
Using putpixel is not a good idea, especially for filling big areas with the same color.
Use dword chunks of data. For 24-bit color this is not very convenient, but it can still be done: use a 4-pixel array, which is exactly 3 dwords, load them into 3 registers and then fill the area dword by dword.
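
A rough sketch of the 4-pixel idea, assuming the color sits in the low 24 bits of edx and the area is a whole number of 4-pixel groups (1280 pixels per line divides evenly by 4). The name fill24 and the register assignment are hypothetical and not taken from any existing library.

```asm
fill24:
        ; in: edi = destination, edx = color, ecx = number of 4-pixel groups (> 0)
        ; change ( eax, ebx, ecx, edx, esi, edi )
        and  edx,0FFFFFFh       ; c = b0 | b1<<8 | b2<<16
        mov  eax,edx
        shl  eax,24
        or   eax,edx            ; dword 0, bytes in memory: b0 b1 b2 b0
        mov  ebx,edx
        shr  ebx,8
        mov  esi,edx
        shl  esi,16
        or   ebx,esi            ; dword 1, bytes in memory: b1 b2 b0 b1
        mov  esi,edx
        shl  esi,8
        shr  edx,16
        or   edx,esi            ; dword 2, bytes in memory: b2 b0 b1 b2
.group: mov  [edi],eax          ; 12 bytes = 4 complete pixels
        mov  [edi+4],ebx
        mov  [edi+8],edx
        add  edi,12
        dec  ecx
        jnz  .group
        ret
```

The three dwords are built once, so the loop does three stores per four pixels instead of twelve byte stores.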
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 26 Feb 2011, 12:15
yep.

use a double buffer in 32 bits.
and just transfer pixels for 24 bpp with dwords. :D

basically, you can create a virtual screen, of any resolution (very big is possible), in 32 bpp, and only vsync will do the refresh to the target resolution and bpp.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 26 Feb 2011, 12:21
Teehee: There is really no way we can answer your question. There are just far too many variables involved. It depends. edfed gave you good advice: "only one way to know in your case, compare the execution time of both, with the same amount of pixels, on the same machine."

Also, see my previous post :P

Now with edfed's advice in mind, and my "answers" in mind, you could try using stosb. It may, or may not, be faster for you. But until you try it, there is no way to tell just by looking at the source code.
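
For reference, a stosb variant of clrscr could look roughly like this. It is only a sketch: it assumes a flat memory model where es and ds address the same memory, and it reuses the ModeInfoBlock.PhysBasePtr and src_lenght names from the clrscr post. The name clrscr_stos is hypothetical.

```asm
clrscr_stos:
        ; in(color) : (dl)
        ; change ( al, ecx, edi )
        mov  edi,[ModeInfoBlock.PhysBasePtr]
        mov  ecx,[src_lenght]   ; buffer length in bytes, as in clrscr
        mov  al,dl              ; stosb stores the byte held in al
        cld                     ; direction flag clear: edi counts upwards
        rep  stosb              ; store al at es:[edi], ecx times
        ret
```

Whether this beats the explicit loop still has to be measured on the target machine.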
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 15:18
i think i got it.

Question: In a big asm system is it better to pass parameters by push or by registers?

Code:
push eax
call something

something:
  cmp dword[esp+4],0   ; [esp] holds the return address, the argument sits above it
  je $
  ret 4    
Code:
mov eax,XXX
call something

something:
  cmp eax,0
  je $
  ret    
cod3b453



Joined: 25 Aug 2004
Posts: 618
cod3b453 26 Feb 2011, 15:31
Do you only want to support one specific resolution?


Other possible optimisations include:

- lea eax,[2*eax+eax] for multiply by 3 (Question)

- Using MMX or XMM registers (for 24 bit you could use 6 or 3 respectively to fill 16 pixels at a time without trashing other pixels; these would also be aligned accesses)

- Configuring the LFB to be a write combining memory region using MTRRs [requires CPU support]
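
A sketch of how the first point and the resolution question might combine, assuming the same (ebx,eax,ecx) inputs as putpixel. ModeInfoBlock.BytesPerScanLine is an assumed field name for the 16-bit bytes-per-line value of the VBE mode info block; the thread itself only shows PhysBasePtr. The name putpixel_lea is hypothetical.

```asm
putpixel_lea:
        ; in(x,y,color) : (ebx,eax,ecx)
        ; change ( eax, ebx, edx, edi )
        mov   edi,[ModeInfoBlock.PhysBasePtr]
        movzx edx,word [ModeInfoBlock.BytesPerScanLine] ; assumed field name
        imul  eax,edx           ; y * bytes per line, for any resolution
        lea   ebx,[ebx+ebx*2]   ; x * 3 without mul or shift/add pairs
        add   edi,eax           ; edi -> start of line y
        mov   [edi+ebx],cx      ; first two color bytes
        mov   edx,ecx
        shr   edx,16
        mov   [edi+ebx+2],dl    ; third color byte, nothing beyond it is touched
        ret
```

Reading the line length from the mode info instead of hard-coding 1280 keeps the routine usable if the mode changes.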
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 26 Feb 2011, 18:18
Teehee wrote:
Question: In a big asm system is it better to pass parameters by push or by registers?
You don't want to do "big asm systems" in the first place :)

But that snarky comment aside, what is important is being consistent in what you do - use one of the existing calling conventions, whether it be STDCALL, FASTCALL or CDECL. If register vs. stack parameter passing makes much of a difference, you're designing functions at the wrong granularity level.

PutPixel is something you really shouldn't spend your time on - heck, I'd say a graphics library shouldn't even include this function! It's shoot-me-in-the-face bad for performance. Design larger primitives, and preferably use GPU acceleration if possible, we're not in the early '90s anymore :)
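
As an illustration of a "larger primitive", here is a sketch of a horizontal span routine that computes the address once and then only steps a pointer. It assumes the 1280-pixel, 3-bytes-per-pixel layout from the first post with no padding at the end of a line; hline and its register inputs are hypothetical.

```asm
hline:
        ; in(x,y,width,color) : (ebx,eax,esi,ecx), width > 0
        ; change ( eax, ebx, edx, esi, edi )
        mov  edi,[ModeInfoBlock.PhysBasePtr]
        imul eax,1280*3         ; y * bytes per line
        lea  ebx,[ebx+ebx*2]    ; x * 3
        add  edi,eax
        add  edi,ebx            ; edi -> first pixel of the span
        mov  edx,ecx
        shr  edx,16             ; dl = third color byte
.pixel: mov  [edi],cx           ; first two color bytes
        mov  [edi+2],dl         ; third color byte
        add  edi,3              ; next pixel: no multiply, no call
        dec  esi
        jnz  .pixel
        ret
```

The per-pixel cost drops to two stores and a pointer increment, since the multiplications happen once per span instead of once per pixel.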
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 18:22
i'm just making my poor OS. So im trying to follow some pattern :P
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 26 Feb 2011, 18:45
double buffer fixed resolution and bpp > vsync > resolution and bpp convert to screen

this can be a method.
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 26 Feb 2011, 19:39
Teehee wrote:
i'm just making my poor OS. So im trying to follow some pattern :P


Never use CCALL if you can avoid it. And "simple" is better than "fast".
With time, every register passed argument tends to become stack passed.
For me STDCALL is the choice.
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 20:05
JohnFound wrote:
And "simple" is better than "fast".


It's bc some of my code seems weird (ugly/unoptimized) to me, like this one that i've made:

Code:
        push COLOR_WHITE ; color
        push 10   ; height
        push 100  ; width
        push 50   ; y
        push 50   ; x
        call drawfillrect 

; ...

; + drawfillrect: Draw a filled rectangle.
;       in     ( x push, y push, width push, height push, color push )
;       change ( eax, ebx, ecx, edx, ebp, edi )
     
    drawfillrect:
        mov  ecx,[esp+5*4] ; c
        mov  ebp,[esp+4*4] ; h
        mov  edx,[esp+3*4] ; w
        mov  eax,[esp+2*4] ; y
        mov  ebx,[esp+1*4] ; x
        add  edx,ebx       ; edx = x + w (first column past the rectangle)
        add  ebp,eax       ; ebp = y + h (first row past the rectangle)
    @@: push eax ecx edx ebp ; save everything putpixel may clobber
        call putpixel
        pop  ebp edx ecx eax
        inc  ebx           ; x++
        cmp  ebx,edx       ; if (x != right edge) loop
        jne  @b            ; else
        mov  ebx,[esp+1*4] ; reset x
        inc  eax           ; y++
        cmp  eax,ebp       ; if (y != bottom edge) loop
        jne  @b
        ret  5*4    

if there was a way to put them all into regs i would have like 6 lines less (mov ecx,[esp+5*4]...).

Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 26 Feb 2011, 20:09
Maybe its just something of my head, like i thought that a simple line (or a couple) its enough to determine that my code will be slow.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 27 Feb 2011, 03:07
Teehee wrote:
Maybe its just something of my head, like i thought that a simple line (or a couple) its enough to determine that my code will be slow.
Modern CPUs are so complex that it has become impossible to tell anything about execution speed from just looking at source code snippets. The CPUs hold internal state and have all sorts of buffers and other things that you can't know the contents of. On top of that are the varied CPU architectures; each has its own little things that it does well and things that it does poorly. The only way to know if your code is fast, or not, is to test it in the real system running normally (i.e. not some artificial loop). And even then you only get timings for your system; other systems will almost certainly give different results.
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 27 Feb 2011, 12:50
use ebp for stack frame access; then you will be able to use pop and push anywhere in the code.
and putpixel can use ebp+c to locate the color.
then you will free the ecx register, which is useful for loops.

just use this code with ebp instead of esi
Code:
        mov eax,[esi+.x]
        mov ebx,[esi+.y]
        mov ecx,[esi+.c]
        mov edi,[esi+.xl]
        mov edx,[esi+.yl]
@@:
        call pixel
        inc eax
        dec edi
        jne @b
        mov edi,[esi+.xl]
        mov eax,[esi+.x]
        inc ebx
        dec edx
        jne @b
.end:
    


or something like that.
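
For comparison, a sketch of the ebp stack frame idea combined with a standard stdcall layout, using the same push order as the drawfillrect call earlier in the thread (x pushed last). It writes the pixels directly instead of calling a pixel routine, and assumes 1280 pixels per line, 3 bytes per pixel, no line padding, and width and height greater than zero. The name fillrect is hypothetical.

```asm
fillrect:
        ; stdcall: [ebp+8]=x, [ebp+12]=y, [ebp+16]=width, [ebp+20]=height, [ebp+24]=color
        push ebp
        mov  ebp,esp            ; arguments now sit at fixed offsets from ebp,
        push ebx esi edi        ; so pushing more registers does not shift them
        mov  edi,[ModeInfoBlock.PhysBasePtr]
        mov  eax,[ebp+12]
        imul eax,1280*3         ; y * bytes per line
        add  edi,eax
        mov  eax,[ebp+8]
        lea  eax,[eax+eax*2]    ; x * 3
        add  edi,eax            ; edi -> top-left pixel of the rectangle
        mov  ecx,[ebp+24]       ; color
        mov  edx,ecx
        shr  edx,16             ; dl = third color byte
        mov  ebx,[ebp+20]       ; rows left
.row:   mov  esi,[ebp+16]       ; pixels left in this row
        mov  eax,edi            ; remember where the row starts
.pixel: mov  [edi],cx           ; first two color bytes
        mov  [edi+2],dl         ; third color byte
        add  edi,3
        dec  esi
        jnz  .pixel
        lea  edi,[eax+1280*3]   ; same x, one line further down
        dec  ebx
        jnz  .row
        pop  edi esi ebx        ; restore the callee-saved registers
        pop  ebp
        ret  5*4                ; callee removes the five arguments
```

It is called exactly like drawfillrect: push color, height, width, y and x, then call fillrect.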
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 27 Feb 2011, 15:31
edfed: great, überslow code and custom calling convention - nothing can go wrong with that!
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 27 Feb 2011, 15:34
huh? am i missing something? sorry, i didn't understand you both. :(

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.