flat assembler
Message board for the users of flat assembler.

Optimising programs - nothing serious though
MattDiesel
Joined: 31 Oct 2010
Posts: 34
Location: England
I have a few questions that I guess need a better understanding of the internals of executables to really answer...

1) How expensive is reading memory? For example, at what point is it better to load the memory into a register and use that? Functions are similar as well: I've noticed that when optimising C using Visual Studio, it will load a commonly used function address into edi or another similar register and use that.

2) What is the best way to zero a register, and what about memory? xor is one I see a lot, and I can understand why... But I also saw somewhere there were different instructions depending on whether you want to have a smaller size or faster execution.

3) Why do exes need to be padded to such a large alignment? And what if I want to have a smaller exe at the cost of startup time?

4) What is the best way to copy data from one struct to another? I disassembled an optimised C program that just did struct1 = struct2 and it literally copied them across via the registers (using 4 of them to avoid stalling). What about zeroing a large struct?

5) What is the cost of stalling and branching? Is it something I should be thinking of in a normal program or is it just for people writing insanely fast stuff?

I understand that the answers may depend to some extent on machine, and that I am almost certainly going to get answers that I will be studying for the next week, but hopefully I will learn.

Thanks,

Mat

_________________
Cogito Cogito Ergo Essum
Post 15 May 2011, 20:28
cod3b453
Joined: 25 Aug 2004
Posts: 619
1) Reads themselves are not particularly expensive, but the cost depends on how you read memory (e.g. if you read a byte at random offsets, or from a structure larger than your CPU's cache, it will mostly or always cause a cache miss, which is slower) and how often (e.g. in C "for (i=0;i<X;i++) {...}", if i were a memory location it would be slower to read/compare/inc/write than to simply compare/inc a register).

EDIT: I forgot to add alignment - aligning to 4/8/16 bytes according to the instruction can be faster.

2) There are a few ways to do this, but which is better (speed/size) is very specific to the code. Usually there are much more obvious factors; otherwise it doesn't really matter. Sometimes you can reuse results from previous calculations as your zero source.

3) I'm not sure exactly. It's usually easier to deal with blocks, but ultimately that's just the way it was defined.

4) Copying and zeroing usually benefit from more registers; for larger structures you'll find precaching, aligned accesses and streaming registers help performance (e.g. MMX/SSE)

5) I'm not sure about stalling but branching is covered in the respective CPU manuals about how they treat branch prediction. For frequent loops, it may be faster to rewrite a compare/jump to benefit from the way branch hinting is performed.
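
To make the loop-counter point in (1) concrete, here is a minimal fasm sketch, not a benchmark. It assumes a 32-bit Linux ELF target only so that it assembles on its own; the two loops would look the same inside a PE file. The names COUNT, counter, loop_mem and loop_reg are made up for illustration.

```asm
format ELF executable 3
entry start

COUNT = 1000                            ; hypothetical iteration count

segment readable executable

start:
        ; Version A: counter kept in memory.
        ; Each iteration does a read-modify-write plus one more read.
        mov     dword [counter],0
loop_mem:
        ; ... loop body would go here ...
        inc     dword [counter]         ; load, add, store
        cmp     dword [counter],COUNT   ; load again for the compare
        jb      loop_mem

        ; Version B: counter kept in a register.
        ; The counter itself causes no memory traffic.
        xor     ecx,ecx
loop_reg:
        ; ... loop body would go here ...
        inc     ecx
        cmp     ecx,COUNT
        jb      loop_reg

        mov     eax,1                   ; sys_exit
        xor     ebx,ebx                 ; exit status 0
        int     0x80

segment readable writeable

counter dd 0
```

How much version B actually gains depends on the CPU and on what the loop body does, so it is worth measuring rather than assuming.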


Hope that's helped a little.
Post 15 May 2011, 23:08
bitshifter
Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
My general rules of thumb...

1) If you need to use it more than once, load it into a register.

2) sub eax,eax sets all the flags and is my preferred way to zero a register.

3) The alignment is only for a 'section' which is usually 1024 bytes (IIRC)

4) Use esi and edi as pointers (using string instructions is optional)

5) Don't bother to optimize until it is proven to be slow by profiling.

I first optimize for size when writing, then optimize for speed when needed.
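
As a rough illustration of rule (4), the sketch below copies one struct to another with esi/edi and the string instructions, then zeroes the source. It is a size-oriented sketch, not a claim about the fastest method. It assumes a 32-bit Linux ELF target so that it assembles on its own, and STRUCT_SIZE, src_struct and dst_struct are hypothetical names.

```asm
format ELF executable 3
entry start

STRUCT_SIZE = 64                        ; hypothetical size, a multiple of 4

segment readable executable

start:
        cld                             ; string instructions count upwards

        ; Copy src_struct to dst_struct one dword at a time.
        mov     esi,src_struct
        mov     edi,dst_struct
        mov     ecx,STRUCT_SIZE/4
        rep     movsd

        ; Zero src_struct: stosd stores eax at [edi] and advances edi.
        xor     eax,eax
        mov     edi,src_struct
        mov     ecx,STRUCT_SIZE/4
        rep     stosd

        mov     eax,1                   ; sys_exit
        xor     ebx,ebx                 ; exit status 0
        int     0x80

segment readable writeable

src_struct db STRUCT_SIZE dup (0x55)    ; dummy contents to copy
dst_struct rb STRUCT_SIZE               ; uninitialised destination
```

For a small struct, a handful of plain mov instructions through several registers (as in the compiler output described in the first post) is the other common choice.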

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
Post 16 May 2011, 01:03
MattDiesel
Joined: 31 Oct 2010
Posts: 34
Location: England
Thanks guys, between the two of you you have most of it covered. I think the alignment is 512 bytes btw.
Post 16 May 2011, 08:01
LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
Quote:

2) sub eax,eax sets all the flags and is my preferred way to zero a register.
If you really need the AF flag cleared, OK; otherwise, what's wrong with XOR? Many processors (if not all) can't realize that the EAX register can be renamed with SUB EAX, EAX (i.e. it waits for any instruction writing to EAX to complete); however, when using XOR EAX, EAX or MOV EAX, 0 they can.
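
For reference, the zeroing variants mentioned in this thread can be put side by side as in the sketch below. The sizes in the comments are the usual 32-bit encodings, and the flag notes follow the instruction definitions. Which form a given CPU treats as dependency-breaking varies by model and is not shown here. The 32-bit Linux ELF wrapper and the name value are assumptions made only so the example assembles on its own.

```asm
format ELF executable 3
entry start

segment readable executable

start:
        xor     eax,eax                 ; 2 bytes; AF is left undefined
        sub     ebx,ebx                 ; 2 bytes; all arithmetic flags defined
        mov     ecx,0                   ; 5 bytes; flags are not touched
        and     edx,0                   ; 3 bytes; AF is left undefined
        mov     dword [value],0         ; zero a dword in memory directly

        mov     eax,1                   ; sys_exit
        xor     ebx,ebx                 ; exit status 0
        int     0x80

segment readable writeable

value dd 0x12345678                     ; dummy non-zero value to clear
```

The mov form is the one to reach for when the flags from an earlier instruction must survive the zeroing.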
Post 16 May 2011, 14:47
MattDiesel
Joined: 31 Oct 2010
Posts: 34
Location: England
One last thing on stalling... What about moving from memory to memory? Usually I go via eax, like:

mov eax,[mem]
mov [mem2],eax

Is there a better way to do that?
Post 17 May 2011, 18:36
bitshifter
Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
Code:
mov eax,[mem]
; do something else here which doesn't involve eax
mov [mem2],eax
    

Also, some things I have profiled show that memory reads and writes within a certain number of clocks caused a slowdown (with SSE2 anyway).
Post 17 May 2011, 21:04
cod3b453
Joined: 25 Aug 2004
Posts: 619
For a small single read/write, there is very little scope for improvement. Note that you're not forced to use eax. If you're moving a lot of memory, it's the same as (4).
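
A few equivalent ways to move one dword from memory to memory are sketched below, mainly to show that eax is not special. None of them is claimed to be faster than the plain two-mov version. The labels mem and mem2 come from the question; the 32-bit Linux ELF wrapper is an assumption made only so the example assembles on its own.

```asm
format ELF executable 3
entry start

segment readable executable

start:
        ; Through any free register, not only eax.
        mov     edx,[mem]
        mov     [mem2],edx

        ; Through the stack, using no general-purpose register.
        push    dword [mem]
        pop     dword [mem2]

        ; Through the string instruction, with esi/edi as pointers.
        mov     esi,mem
        mov     edi,mem2
        movsd                           ; copies one dword from [esi] to [edi]

        mov     eax,1                   ; sys_exit
        xor     ebx,ebx                 ; exit status 0
        int     0x80

segment readable writeable

mem     dd 0x12345678                   ; dummy source value
mem2    dd 0
```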
Post 17 May 2011, 21:08

Copyright © 1999-2020, Tomasz Grysztar.