flat assembler
Message board for the users of flat assembler.
Optimising programs - nothing serious though
cod3b453 15 May 2011, 23:08
1) Reads themselves are not particularly expensive, but the cost depends on how you read memory (e.g. if you read bytes at random offsets, or from a structure larger than your CPU's cache, it'll mostly/always cause a cache miss, which is slower) and how often (e.g. in C "for (i=0;i<X;i++) {...}": if i were a memory location, it'd be slower to read/compare/inc/write it than to simply compare/inc a register).
EDIT: I forgot to add alignment - aligning to 4/8/16 bytes according to the instruction can be faster.
2) There are a few ways to do this, but which is better (speed/size) is very specific to the code. Usually there are much more obvious factors; otherwise it doesn't really matter. Sometimes you can reuse results from previous calculations as your zero source.
3) I'm not sure exactly; it's usually easier to deal with blocks, but ultimately that's just the way it was defined.
4) Copying and zeroing usually benefit from more registers; for larger structures you'll find precaching, aligned accesses and streaming registers (e.g. MMX/SSE) help performance.
5) I'm not sure about stalling, but branching is covered in the respective CPU manuals, which describe how they treat branch prediction. For frequent loops, it may be faster to rewrite a compare/jump to benefit from the way branch hinting is performed.
Hope that's helped a little.
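To make point 1 a bit more concrete, here is a rough C sketch of the same data being read in two different orders. The function names and the flat row-major layout are assumptions for illustration only, not something from this thread. Both functions keep their counters and running total in locals (which a compiler will normally hold in registers) and only differ in how they walk memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: m points to rows * cols ints stored row by row. */

/* Walks memory sequentially, so consecutive reads share cache lines. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Jumps cols * sizeof(int) bytes between reads. Once that stride is
   larger than a cache line, each read lands on a different line. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both return the same sum; any speed difference only tends to show up once the array is much larger than the cache, so it is worth timing on your own machine rather than taking it on faith.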
bitshifter 16 May 2011, 01:03
My general rules of thumb...
1) If you need to use it more than once, load it into a register.
2) sub eax,eax sets all the flags and is my preferred way to zero a register.
3) The alignment is only for a 'section' which is usually 1024 bytes (IIRC)
4) Use esi and edi as pointers (using string instructions is optional)
5) Don't bother to optimize until it is proven to be slow by profiling.
I first optimize for size when writing, then optimize for speed when needed.
_________________
Coding a 3D game engine with fasm is like trying to eat an elephant, you just have to keep focused and take it one 'byte' at a time.
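Rule 1 has a direct analogue in C, sketched below with made-up function names (they are assumptions for illustration, not from any library). In the first version the compiler may have to re-read *factor from memory on every pass, because a store through out could in principle change it. The second version loads it once into a local, which can then stay in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Reads *factor inside the loop: out and factor may alias, so the
   compiler may have to reload the value after each store. */
void scale_reload(int *out, size_t n, const int *factor)
{
    assert(out != NULL && factor != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] *= *factor;
}

/* Loads the value once up front; the loop only touches out[]. */
void scale_hoisted(int *out, size_t n, const int *factor)
{
    assert(out != NULL && factor != NULL);
    const int f = *factor;
    for (size_t i = 0; i < n; i++)
        out[i] *= f;
}
```

The two only behave identically when factor does not point into out, which is exactly the guarantee the compiler lacks in the first version. Per rule 5, whether the difference is measurable is something to confirm with a profiler.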
MattDiesel 16 May 2011, 08:01
Thanks guys, between the two of you you have most of it covered. I think the alignment is 512 bytes btw.
LocoDelAssembly 16 May 2011, 14:47
Quote:
MattDiesel 17 May 2011, 18:36
One last thing on stalling... What about moving from memory to memory? Usually I go via eax, like:
mov eax,[mem]
mov [mem2],eax
Is there a better way to do that?
bitshifter 17 May 2011, 21:04
Code:
mov eax,[mem]
;do something else here which doesn't involve eax
mov [mem2],eax
Also, some things I have profiled show that memory reads and writes issued within a certain number of clocks of each other caused a slowdown (with SSE2 anyway).
cod3b453 17 May 2011, 21:08
For a small single read/write, there's very little scope for improvement. Note that you're not forced to use eax. If you're moving a lot of memory, the same advice as in (4) applies.
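As a rough illustration of the "more registers" idea for larger copies, here is a C sketch that moves four words per pass through four independent temporaries. The name copy_words and the unroll factor of four are assumptions picked for the example, and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n 32-bit words from src to dst.
   Assumes the two buffers do not overlap. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;

    /* Main loop: four independent loads, then four stores, so no
       store has to wait on the load directly before it. */
    for (; i + 4 <= n; i += 4) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Tail: the remaining 0 to 3 words go through one temporary,
       like the mov eax,[mem] / mov [mem2],eax pair. */
    for (; i < n; i++) {
        uint32_t t = src[i];
        dst[i] = t;
    }
}
```

In real code the library memcpy is usually the better choice, since it tends to already use wide, aligned moves. The sketch is only meant to show the shape of the idea.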
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.