flat assembler
Message board for the users of flat assembler.

Memory Move Optimization (BTW, I have read the other thread)

Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Do we need this in WIN32/64 FAQ?
Post 19 Dec 2007, 09:11
levicki



Joined: 29 Jul 2007
Posts: 26
Location: Belgrade, Serbia
Fastest way to move memory should be as follows:

1. Allocate a block 1/4 or 1/2 the size of the L1 cache and lock it with VirtualLock so it doesn't get paged out.

2. Read data from the source in 128-byte chunks, prefetching with prefetchnta and loading with movaps, and write it into the block with movaps until the block is full.

3. Stream data out of the block to the destination in 128-byte chunks, reading it with movaps and writing it with movntps.

Destination alignment is much more important than source.
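
The three steps above can be sketched in C with SSE intrinsics (_mm_load_ps / _mm_store_ps compile to movaps, _mm_stream_ps to movntps, _mm_prefetch with _MM_HINT_NTA to prefetchnta). This is only a rough sketch, not a tuned routine. Assumptions that are not from the post: the name block_stream_copy is made up, BLOCK is set to 16 KiB on the guess of a 32 KiB L1 data cache, the caller supplies the bounce block (so VirtualLock is left out), all three pointers are 16-byte aligned, and the size is a multiple of 128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_store_ps, _mm_stream_ps, _mm_prefetch */

#define CHUNK 128u           /* bytes handled per inner loop pass */
#define BLOCK (16u * 1024u)  /* assumed: half of a 32 KiB L1 data cache */

/* Hypothetical helper: copy n bytes from src to dst through a small
   cache-resident bounce block of at least BLOCK bytes. */
void block_stream_copy(void *dst, const void *src, size_t n, void *block)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    char *b = (char *)block;

    /* movaps and movntps fault on addresses that are not 16-byte aligned. */
    assert((((uintptr_t)s | (uintptr_t)d | (uintptr_t)b) & 15u) == 0);
    assert(n % CHUNK == 0);

    while (n > 0) {
        size_t part = n < BLOCK ? n : BLOCK;
        size_t off, i;

        /* Step 2: source -> block, prefetching the next source chunk. */
        for (off = 0; off < part; off += CHUNK) {
            if (off + CHUNK < part)
                _mm_prefetch(s + off + CHUNK, _MM_HINT_NTA);
            for (i = 0; i < CHUNK; i += 16)
                _mm_store_ps((float *)(b + off + i),
                             _mm_load_ps((const float *)(s + off + i)));
        }

        /* Step 3: block -> destination with non-temporal stores. */
        for (off = 0; off < part; off += CHUNK)
            for (i = 0; i < CHUNK; i += 16)
                _mm_stream_ps((float *)(d + off + i),
                              _mm_load_ps((const float *)(b + off + i)));

        s += part;
        d += part;
        n -= part;
    }
    _mm_sfence();  /* make the streamed stores visible before returning */
}
```

Whether this beats a plain copy depends on the CPU and on the size of the transfer, so it needs measuring on the target machine.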

You can also test if the source data is already in cache before copying by timing the read from the source with the rdtsc instruction.
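
A rough sketch of that timing check, assuming GCC or Clang on x86 (x86intrin.h provides __rdtsc and _mm_lfence; MSVC keeps them in intrin.h). The names time_read and looks_cached are made up, and the cycle threshold is something you would have to calibrate per machine. Note that the timed read itself pulls the line into the cache, so this only tells you about the state before the probe.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang: __rdtsc, _mm_lfence */

/* Hypothetical helper: time a single 4-byte read in TSC ticks. */
static uint64_t time_read(const volatile uint32_t *p)
{
    uint64_t t0, t1;
    uint32_t v;

    _mm_lfence();        /* keep earlier loads out of the timed window */
    t0 = __rdtsc();
    v = *p;              /* the read being timed (volatile, so not elided) */
    _mm_lfence();        /* wait for the load before reading the TSC again */
    t1 = __rdtsc();
    (void)v;
    return t1 - t0;
}

/* Returns nonzero if the read finished in fewer than threshold ticks.
   The threshold is an assumption the caller must calibrate. */
int looks_cached(const void *p, uint64_t threshold)
{
    return time_read((const volatile uint32_t *)p) < threshold;
}
```

A single sample is noisy (interrupts, frequency scaling), so in practice you would probe a few lines and compare against a calibrated value.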

The Intel C/C++ compiler's runtime library already has a brutally optimized memcpy(), so you might want to take a look at their code. You can download a free trial from their website.
Post 22 Dec 2007, 14:05
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
In assembly, if we know something that the compiler doesn't, we can get better performance. E.g., if we know that source and dest are always aligned to a 64-byte boundary, we don't need extra code to check alignment and the like; our initialisation code can be just a few instructions setting up the loop counters. Anyhow, this is just one example where a bit of extra knowledge that can't be conveyed to the compiler works to our advantage.
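
To illustrate the point (in C with SSE intrinsics rather than assembly, purely as a sketch): when the caller guarantees that both pointers are 64-byte aligned and the size is a multiple of 64, the whole routine is the loop itself, with no alignment tests and no head or tail handling. The name copy_aligned64 and its contract are assumptions for this example.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_store_ps (both movaps) */

/* Hypothetical helper. Contract (not checked, by design):
   dst and src are 64-byte aligned, n is a multiple of 64,
   and the two regions do not overlap. */
void copy_aligned64(void *dst, const void *src, size_t n)
{
    float *d = (float *)dst;
    const float *s = (const float *)src;
    const float *end = s + n / sizeof(float);

    /* 64 bytes (16 floats) per pass: four loads, then four stores. */
    for (; s < end; s += 16, d += 16) {
        __m128 r0 = _mm_load_ps(s);
        __m128 r1 = _mm_load_ps(s + 4);
        __m128 r2 = _mm_load_ps(s + 8);
        __m128 r3 = _mm_load_ps(s + 12);
        _mm_store_ps(d, r0);
        _mm_store_ps(d + 4, r1);
        _mm_store_ps(d + 8, r2);
        _mm_store_ps(d + 12, r3);
    }
}
```

Breaking the contract means a fault or a partial copy, which is exactly the trade: the checks are gone because the knowledge lives with the caller.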
Post 22 Dec 2007, 14:09
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
The compiler doesn't need to know about memory alignment, the programmer does. I'd never use these optimizations for a simple memcpy (would probably stick with rep movsd there), but only use this kind of stuff where I know explicitly it gives a performance boost... and then I'd obviously also make sure that memory is aligned.

IMHO all the cache control, super-big moves, etc. don't belong in a generic memcpy, which will mostly be used for a few hundred bytes at most :)
Post 22 Dec 2007, 14:16
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.
