flat assembler
Message board for the users of flat assembler.

Fastest Memory Copying Algorithms




Matrix 25 Nov 2004, 21:36
MCD, sorry, but your code is buggy. Should I move it to the test area?
Or maybe you didn't mean to post it?
P.S.: Have you read the code we posted earlier?



MCD 29 Nov 2004, 17:13
Quote:

MCD, sorry, but your code is buggy. Should I move it to the test area?
Or maybe you didn't mean to post it?


Sorry, you must understand: I was in a great hurry when writing this because I'm never online at home, only in an Internet cafe with time-limited access.

But I've fixed them now, and I hope there aren't any bugs left!

In the first part, I forgot to change esi to edi in the target movntps, and in the second TSC example for DOS, I forgot to change some registers from my abbreviations to the regular register names!

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||



MCD 29 Nov 2004, 17:17
And yes, I read most, but not all, of the code you posted previously.


vid 15 Feb 2006, 15:27
Also look here for some more memcopy algos:
http://board.flatassembler.net/topic.php?t=4467



r22 16 Feb 2006, 02:17
Single-case algorithms aren't the best way to measure memcopy speed, unless you're building the algo for a custom task where, say, the data will ALWAYS be 16-byte aligned or the data size will ALWAYS be a multiple of 8.

For 32bit systems (without some overhead) the code doesn't know whether the CPU has SSE extensions. In the same sense (without overhead) the code doesn't know if the source or destination or both are aligned or unaligned.
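
To make the "overhead" above concrete, here is a minimal C sketch of the runtime alignment check a general-purpose copy has to pay for. The names is_aligned16 and copy_dispatch are hypothetical and are not taken from copy.asm or ntdll.dll; the 16-byte chunk loop only stands in for an SSE path, and the CPU feature check (CPUID) is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if p sits on a 16-byte boundary (what aligned SSE moves require). */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

enum copy_path { PATH_ALIGNED, PATH_GENERIC };

/* Hypothetical dispatcher: picks a path at run time and reports which one ran.
   The test below is the per-call overhead a single-case routine avoids. */
static enum copy_path copy_dispatch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    if (is_aligned16(d) && is_aligned16(s) && n % 16 == 0) {
        /* Stand-in for an aligned SSE loop: whole 16-byte chunks only. */
        for (i = 0; i < n; i += 16)
            memcpy(d + i, s + i, 16);
        return PATH_ALIGNED;
    }

    /* Fallback that makes no assumption about alignment or size. */
    for (i = 0; i < n; i++)
        d[i] = s[i];
    return PATH_GENERIC;
}
```

Calling copy_dispatch with two 16-byte-aligned buffers and a size that is a multiple of 16 takes the first path; anything else falls through to the byte loop.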

Here's the ASM of WinXP SP2's memmove API in ntdll.dll.
I left the disassembly addressing in (you can delete it with fasm's IDE block select).

On Win XP 64, the 64bit memcopy is by far the fastest for random align, random size copying. But the 32bit version doesn't seem to use any MMX or SSE which means no prefetching, so it'll probably only be faster for copies of less than say 4096 bytes that are unaligned.


Attachment: copy.asm (44.98 KB)




viki 06 Jul 2006, 13:04
I have an additional question about the move32 proc. Is it OK that we execute the movsb code first? I'm afraid that if the data is already aligned, this instruction will cause the following looped instructions to work on unaligned data. Am I right?
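
For reference, the usual head/body/tail pattern avoids that problem by computing the byte count from the address, so it comes out as zero when the destination is already aligned. Whether move32 does this depends on how its movsb count is set up, which isn't visible in this thread. A minimal C sketch of the pattern follows; head_bytes and copy_head_body_tail are hypothetical names, not the move32 code from the attachment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes to copy singly before dst reaches a 4-byte boundary.
   An already aligned dst yields 0, so the byte step is skipped. */
static size_t head_bytes(const void *dst, size_t n)
{
    size_t head = (size_t)((4 - ((uintptr_t)dst & 3)) & 3);
    return head < n ? head : n;
}

static void *copy_head_body_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = head_bytes(d, n);
    size_t i;

    /* Head: plays the role of the leading "rep movsb". */
    for (i = 0; i < head; i++)
        *d++ = *s++;
    n -= head;

    /* Body: dword-sized steps, with d now 4-byte aligned. */
    for (; n >= 4; n -= 4, d += 4, s += 4) {
        uint32_t w;
        memcpy(&w, s, 4);   /* src may still be unaligned */
        memcpy(d, &w, 4);
    }

    /* Tail: the 0..3 bytes that are left over. */
    for (i = 0; i < n; i++)
        *d++ = *s++;

    return dst;
}
```

With an aligned destination, head_bytes returns 0 and the body loop starts on the same address it was given. If the byte count were instead fixed or taken from the size alone, the concern raised above would be justified.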
