flat assembler
Message board for the users of flat assembler.
what is faster?
revolution 26 Feb 2011, 07:59
Teehee wrote: what is faster?
Long answer: We don't know. What CPU? What RAM timings? What mobo? What video card? What OS? What is in cache? How many times do you call it? etc. etc. etc.
Helpful answer: If you can't notice any change in your program's runtime then it doesn't matter which one you use.
Teehee 26 Feb 2011, 10:56
Hmm, interesting, rev. I only ask because I have heard that the MUL instruction is slower than a shift.
edfed 26 Feb 2011, 11:11
It's true on some x86 models, and false on others.
It depends on the parallelisation and on the implementation of the instruction (a hard-wired mul is very fast, an iterative mul is slow), etc. Then, only one way to know in your case, compare the execution time of both, with the same amount of pixels, on the same machine.
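A rough sketch of the measurement edfed describes, assuming a CPU that supports RDTSC and CPUID. The routine name time_mul3 and the ITERATIONS count are made up for illustration. It times a multiply by 3; to compare, replace the marked line with a shift-and-add or LEA variant and run both on the same machine. RDTSC readings are noisy, so treat a single run as indicative only.

```asm
use32                           ; 32-bit code (fasm's flat binary default is 16-bit)

ITERATIONS = 1000000            ; hypothetical repeat count

; time_mul3: returns in eax the low 32 bits of the TSC delta
; changes eax, ecx, edx
time_mul3:
        push    ebx esi edi     ; cpuid overwrites ebx; esi/edi are used below
        xor     eax,eax
        cpuid                   ; serialize so earlier work is finished first
        rdtsc
        mov     esi,eax         ; esi = start time (low dword)
        mov     ecx,ITERATIONS
        mov     edi,5           ; arbitrary operand
.next:
        mov     eax,edi
        imul    eax,eax,3       ; <-- variant under test, e.g. lea eax,[edi+edi*2]
        dec     ecx
        jnz     .next
        rdtsc
        sub     eax,esi         ; eax = elapsed cycles (low dword)
        pop     edi esi ebx
        ret
```

Calling it several times and keeping the smallest result reduces the effect of interrupts and cache misses.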
Teehee 26 Feb 2011, 11:49
I'm doing loops to fill my screen with pixels, but I need to go byte by byte because my buffer has 3 bytes per color. Is there a way to loop faster than byte by byte?
Code:
clrscr: ; in(color) : (dl)
        mov edi,[ModeInfoBlock.PhysBasePtr]
        mov ecx,[src_lenght]
@@:     mov byte[edi+ecx-1],dl ; -1 so that offsets src_lenght-1 down to 0 are written
        dec ecx
        jnz @b
        ret
JohnFound 26 Feb 2011, 12:08
Using putpixel is not a good idea, especially for filling big areas with the same color.
Use dword chunks of data. For 24 bit color this is not very convenient, but it can still be implemented: use a 4-pixel array, which is exactly 3 dwords, load them into 3 registers and then fill the area on a dword basis.
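A minimal sketch of the 4-pixels-in-3-dwords idea, assuming each pixel is stored in memory as blue, green, red and that the pixel count is a non-zero multiple of 4. The routine name fill24 and its register interface are invented for this example.

```asm
use32                           ; 32-bit code (fasm's flat binary default is 16-bit)

; fill24: fill a 24 bpp buffer with one color, 12 bytes (4 pixels) per pass
; in:     edi = destination
;         ecx = pixel count (assumed non-zero multiple of 4)
;         eax = color as 0x00RRGGBB (top byte must be zero)
; change: eax, ebx, ecx, edx, esi, edi
fill24:
        mov     ebx,eax
        shl     ebx,24
        or      ebx,eax         ; ebx = bytes B G R B
        mov     edx,eax
        shl     edx,16
        mov     esi,eax
        shr     esi,8
        or      edx,esi         ; edx = bytes G R B G
        mov     esi,eax
        shl     esi,8
        shr     eax,16
        or      esi,eax         ; esi = bytes R B G R
        shr     ecx,2           ; ecx = number of 4-pixel groups
.group:
        mov     [edi],ebx
        mov     [edi+4],edx
        mov     [edi+8],esi
        add     edi,12
        dec     ecx
        jnz     .group
        ret
```

A leftover of 1 to 3 pixels would need a short byte-wise tail, which is left out here.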
edfed 26 Feb 2011, 12:15
Yep.
Use a double buffer in 32 bits, and just transfer the pixels to 24 bpp with dwords. Basically, you can create a virtual screen of any resolution (very big is possible) in 32 bpp, and only the vsync does the refresh to the target resolution and bpp.
revolution 26 Feb 2011, 12:21
Teehee: There is really no way we can answer your question. There are just far too many variables involved. It depends. edfed gave you good advice: "only one way to know in your case, compare the execution time of both, with the same amount of pixels, on the same machine."
Also, see my previous post. Now with edfed's advice in mind, and my "answers" in mind, you could try using stosb. It may, or may not, be faster for you. But until you try it, there is no way to tell just by looking at the source code.
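For reference, a sketch of what the stosb suggestion could look like next to a dword variant. The routine names are hypothetical. Like the byte loop above, both write the same byte value everywhere, so they only suit colors whose three bytes are equal. Whether either is faster than the plain loop still has to be measured.

```asm
use32                           ; 32-bit code (fasm's flat binary default is 16-bit)

; fill_stosb: in: edi = destination, ecx = byte count, al = byte value
; change: ecx, edi
fill_stosb:
        cld                     ; make the string instruction count upwards
        rep stosb
        ret

; fill_stosd: same interface, but stores 4 bytes at a time where possible
; change: eax, ecx, edx, edi
fill_stosd:
        cld
        movzx   eax,al
        imul    eax,eax,01010101h ; copy the byte into all four bytes of eax
        mov     edx,ecx
        shr     ecx,2           ; whole dwords first
        rep stosd
        mov     ecx,edx
        and     ecx,3           ; then the 0..3 remaining bytes
        rep stosb
        ret
```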
Teehee 26 Feb 2011, 15:18
I think I got it.
Question: In a big asm system is it better to pass parameters by push or by registers?
Code:
        push eax
        call something

something:
        cmp dword [esp+4],0 ; [esp] holds the return address, so the argument is at [esp+4]
        je $
        ret 4
Code:
        mov eax,XXX
        call something

something:
        cmp eax,0
        je $
        ret
cod3b453 26 Feb 2011, 15:31
Do you only want to support one specific resolution?
Other possible optimisations include:
- lea eax,[2*eax+eax] for multiply by 3
- Using MMX or XMM registers (for 24 bit you could use 6 or 3 respectively to fill 16 pixels at a time without trashing other pixels; these would also be aligned accesses)
- Configuring the LFB to be a write combining memory region using MTRRs [requires CPU support]
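As an illustration of the first item, here is one place where a multiply by 3 typically shows up in 24 bpp code: turning x and y into a byte offset. The routine name and register interface are assumptions made for this sketch.

```asm
use32                           ; 32-bit code (fasm's flat binary default is 16-bit)

; pixel_offset24: byte offset of pixel (x, y) in a 24 bpp buffer
; in:     eax = x, ebx = y, edx = bytes per scan line
; out:    eax = y * bytes per scan line + x * 3
; change: eax, ebx
pixel_offset24:
        lea     eax,[eax+eax*2] ; x * 3 without mul
        imul    ebx,edx         ; y * bytes per scan line
        add     eax,ebx
        ret
```

The scan line length is not a fixed small constant, so that multiply stays an imul here.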
f0dder 26 Feb 2011, 18:18
Teehee wrote: Question: In a big asm system is it better to pass parameters by push or by registers?
But that snarky comment aside, what is important is being consistent in what you do: use one of the existing calling conventions, whether it be STDCALL, FASTCALL or CDECL. If register vs. stack parameter passing makes much of a difference, you're designing functions at the wrong granularity level.
PutPixel is something you really shouldn't spend your time on. Heck, I'd say a graphics library shouldn't even include this function! It's shoot-me-in-the-face bad for performance. Design larger primitives, and preferably use GPU acceleration if possible; we're not in the early '90s anymore.
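To make the difference between two of those conventions concrete, a small sketch with invented routine names: under STDCALL the callee removes its arguments with ret n, under CDECL the caller does it after the call. Both return the result in eax.

```asm
use32                           ; 32-bit code (fasm's flat binary default is 16-bit)

; STDCALL: the callee pops its two dword arguments
add2_stdcall:
        mov     eax,[esp+4]     ; first argument (pushed last)
        add     eax,[esp+8]     ; second argument
        ret     8

; CDECL: the caller pops the arguments
add2_cdecl:
        mov     eax,[esp+4]
        add     eax,[esp+8]
        ret

caller:
        push    2
        push    1
        call    add2_stdcall    ; stack is already balanced on return
        push    2
        push    1
        call    add2_cdecl
        add     esp,8           ; caller removes the two arguments
        ret
```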
Teehee 26 Feb 2011, 18:22
I'm just making my poor OS, so I'm trying to follow some pattern.
edfed 26 Feb 2011, 18:45
Double buffer at a fixed resolution and bpp > vsync > convert resolution and bpp to the screen.
This can be one method.
JohnFound 26 Feb 2011, 19:39
Teehee wrote: I'm just making my poor OS, so I'm trying to follow some pattern.
Never use CCALL if you can avoid it. And "simple" is better than "fast". With time, every register passed argument tends to become stack passed. For me STDCALL is the choice.
Teehee 26 Feb 2011, 20:05
JohnFound wrote: And "simple" is better than "fast".
It's because some of my code seems weird (ugly/unoptimized) to me, like this that I've made:
Code:
        push COLOR_WHITE ; color
        push 10 ; height
        push 100 ; width
        push 50 ; y
        push 50 ; x
        call drawfillrect
        ; ...

; + drawfillrect: Draw a filled rectangle.
; in ( video_ptr edi, x push, y push, width push, height push, color push )
; change ( eax, ebx, ecx, edx, ebp, edi )
drawfillrect:
        mov ecx,[esp+5*4] ; c
        mov ebp,[esp+4*4] ; h
        mov edx,[esp+3*4] ; w
        mov eax,[esp+2*4] ; y
        mov ebx,[esp+1*4] ; x
        add edx,ebx
        add ebp,eax
@@:
        push eax edx
        call putpixel
        pop edx eax
        inc ebx ; x++
        cmp ebx,edx ; if (x != w) loop
        jne @b
        ; else
        mov ebx,[esp+1*4] ; reset x
        inc eax ; y++
        cmp eax,ebp ; if (y != h) loop
        jne @b
        ret 5*4
If there were a way to put them all into regs, I would have about 6 fewer lines (mov ecx,[esp+5*4]...).
_________________
Sorry if bad english.
Teehee 26 Feb 2011, 20:09
Maybe it's just something in my head, like I thought that a single line (or a couple) is enough to determine that my code will be slow.
revolution 27 Feb 2011, 03:07
Teehee wrote: Maybe it's just something in my head, like I thought that a single line (or a couple) is enough to determine that my code will be slow.
edfed 27 Feb 2011, 12:50
Use ebp for stack frame access; then you will be able to use pop and push anywhere in the code.
And putpixel can use ebp+c to locate the color. Then you will free the ecx register, which is useful for loops. Just use this code with ebp instead of esi:
Code:
        mov eax,[esi+.x]
        mov ebx,[esi+.y]
        mov ecx,[esi+.c]
        mov edi,[esi+.xl]
        mov edx,[esi+.yl]
@@:
        call pixel
        inc eax
        dec edi
        jne @b
        mov edi,[esi+.xl]
        mov eax,[esi+.x]
        inc ebx
        dec edx
        jne @b
.end:
or something like that.
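One possible way to combine the ebp frame with the earlier advice about larger primitives, sketched for a 32 bpp buffer. This is not edfed's code: the routine name fillrect32, the ARG_ constants and the argument list are assumptions for illustration. Arguments are pushed in reverse order (color first, destination last) and the routine cleans them up STDCALL style.

```asm
use32                           ; 32-bit code (fasm's flat binary default is 16-bit)

; offsets from ebp once the frame is set up
ARG_DST    = 8                  ; address of the top-left pixel
ARG_PITCH  = 12                 ; bytes per scan line
ARG_WIDTH  = 16                 ; pixels per row (assumed non-zero)
ARG_HEIGHT = 20                 ; number of rows (assumed non-zero)
ARG_COLOR  = 24                 ; 32 bpp color value

; fillrect32: fill a rectangle one row at a time
; change: eax, ecx, edx
fillrect32:
        push    ebp
        mov     ebp,esp         ; arguments stay at fixed offsets from ebp
        push    edi             ; push/pop no longer shifts the argument offsets
        cld
        mov     edx,[ebp+ARG_DST]
        mov     eax,[ebp+ARG_COLOR]
.row:
        mov     edi,edx
        mov     ecx,[ebp+ARG_WIDTH]
        rep stosd               ; one whole row per iteration
        add     edx,[ebp+ARG_PITCH]
        dec     dword [ebp+ARG_HEIGHT]
        jnz     .row
        pop     edi
        pop     ebp
        ret     5*4             ; callee removes the five arguments
```

Because the arguments are addressed through ebp, the six mov lines that load them into registers at the top of the routine are no longer needed.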
f0dder 27 Feb 2011, 15:31
edfed: great, überslow code and custom calling convention - nothing can go wrong with that!
Teehee 27 Feb 2011, 15:34
Huh? Am I missing something? Sorry, I didn't understand either of you.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.