flat assembler
Message board for the users of flat assembler.

memcpy

packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 02 Jan 2008, 16:33
Sorry for my ignorance, but I wrote a memcpy function and then looked at the other memcpy functions on this site, and I don't understand why they are so complicated.

Here is my code. Is something wrong with it? I need speed too.
Code:
#include <stdio.h>

void memcpy2(void *dest, const void *src, unsigned long count, unsigned int sz);

int main(void)
{
    int sz[] = {1234, 12634};
    int de[] = {0, 0};
    memcpy2(de, sz, 2, sizeof(int));
    printf("%d %d\n", de[0], de[1]);
    getchar();
    return 0;
}

void memcpy2(void *dest, const void *src, unsigned long count, unsigned int sz)
{
    count = count * sz;         /* total number of bytes to copy */
    __asm {
        push ecx
        push esi
        push edi
        ; saved the old values
        mov esi, [src]
        mov edi, [dest]
        mov ecx, 0
        ; loaded the new values
        cmp sz, 1
        je cpyloopbyte
        ; use the loop that matches the element size
cpyloop:
        mov eax, [esi + ecx]
        mov [edi + ecx], eax
        inc ecx
        imul ecx, sz
        cmp ecx, count
        jae end
        jmp cpyloop
cpyloopbyte:
        mov al, [esi + ecx]
        mov [edi + ecx], al
        inc ecx
        cmp ecx, count
        jae end
        jmp cpyloopbyte
end:
        ; restore the old values in reverse order of the pushes
        pop edi
        pop esi
        pop ecx
    }
}


Thanks for your help.
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 02 Jan 2008, 16:57
wow... does that code work?

are you sure about "imul ecx, sz" there?
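
To see why that instruction is suspicious, here is a rough C sketch that replays only the index arithmetic of the dword loop from the first post (copy at offset ecx, then inc ecx, then imul ecx, sz). The function name trace_offsets and its parameters are made up for this illustration; it copies nothing and only records which byte offsets the loop would touch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replays the offset arithmetic of the dword loop:
 *     copy 4 bytes at offset ecx; inc ecx; imul ecx, sz;
 *     stop once ecx >= count
 * trace_offsets is a hypothetical helper, not part of the posted code.
 * Writes the visited byte offsets to out and returns how many there were
 * (never more than max_out).
 */
size_t trace_offsets(unsigned long count, unsigned int sz,
                     unsigned long *out, size_t max_out)
{
    unsigned long ecx = 0;
    size_t n = 0;

    assert(sz > 1);              /* sz == 1 takes the byte loop instead */
    assert(out != NULL && max_out > 0);

    do {
        out[n++] = ecx;          /* a 4-byte copy would happen here */
        ecx = (ecx + 1) * sz;    /* inc ecx, then imul ecx, sz */
    } while (ecx < count && n < max_out);

    return n;
}
```

With count = 8 and sz = 4 (the two ints from the test program) the offsets are 0 and 4, so the test happens to pass. With count = 12 the offsets are still only 0 and 4, because the next value is 20, so the third int at offset 8 is never copied. Advancing the offset by the element size on each pass, instead of multiplying the running offset, would avoid this.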
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 02 Jan 2008, 17:01
Think about using 'rep movsb' or similar instructions.
packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 02 Jan 2008, 17:35
Of course the code works.
imul ecx, sz: yes, I know it's only for recent processors.

Edit:

After following revolution's advice: is this code faster?
Code:
void memcpy2(void *dest, const void *src, unsigned long count, unsigned int sz)
{
    count = count * sz;         /* total number of bytes to copy */
    __asm {
        push ecx
        push esi
        push edi
        ; saved the old values
        mov esi, [src]
        mov edi, [dest]
        mov ecx, count
        ; loaded the new values
        cld                     ; clear the direction flag so the copy runs forward
cpy:
        rep movsb
        ; copies ecx bytes from [esi] to [edi]
end:
        ; restore the old values in reverse order of the pushes
        pop edi
        pop esi
        pop ecx
    }
}
    

Why are the other implementations so complicated anyway? Or am I missing something?
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 02 Jan 2008, 18:05
packet_50071 wrote:
imul ecx, sz -- yeah - i know its only for recent processors
I recall it's available from the 386 onward, and the 386 dates from the mid-1980s, so not exactly 'recent'.

Your second code should be faster.

The others are so "complicated" because they use MMX or SSE to move more than 32 bits per instruction: 64 bits (8 bytes) at a time with MMX, and 128 bits (16 bytes) at a time with SSE.

Not only that, but there are other factors involved: cache behaviour, prefetching, and so on.

I hope you know about those? (they are related to how the processor works)
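
As a rough illustration of the "16 bytes at a time" idea, here is a minimal C sketch. It assumes a compiler that provides the SSE2 intrinsics header emmintrin.h and falls back to a plain byte loop elsewhere; the name copy_wide and the HAVE_SSE2 macro are made up for this example. It uses unaligned loads and stores and does nothing about cache or prefetch, so it is far simpler than the routines discussed on this board.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>           /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/*
 * copy_wide is a hypothetical name for this sketch.
 * Copies n bytes from src to dest; the two regions must not overlap.
 * Where SSE2 is available, the bulk goes 16 bytes per iteration.
 */
void *copy_wide(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(dest != NULL && src != NULL);

#if HAVE_SSE2
    while (n >= 16) {
        /* unaligned 128-bit load, then unaligned 128-bit store */
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif

    while (n > 0) {              /* tail bytes, or everything without SSE2 */
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

Even this simple version needs a separate tail loop for the last 0 to 15 bytes, which is one reason the tuned routines grow long.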
packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 02 Jan 2008, 18:28
So using SSE is obviously much faster than mine, right?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 02 Jan 2008, 18:33
The answer is 'it depends'. Sorry but it really does. There is no universal copy routine that is always best. SSE and cache blocking etc. are great if the amount of data to transfer is large, but fail miserably if the data sizes are small. rep movsd (and variants) are pretty good and simple to use but do have shortcomings with certain data sizes also.

If you can accurately profile your data copying requirements it would help to make a better judgement of what will work best for you.
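
One possible way to do that kind of profiling is sketched below, under the assumption that the candidate routines can be built and timed as an ordinary hosted C program first (clock() is not available inside a freestanding kernel). The names copy_fn, byte_copy and time_copy are hypothetical, and clock() is a coarse timer, so many repetitions are needed for meaningful numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by every candidate copy routine (hypothetical typedef). */
typedef void *(*copy_fn)(void *dest, const void *src, size_t n);

/* Baseline candidate: a plain byte-by-byte copy. */
void *byte_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}

/*
 * Returns the CPU time, in seconds, spent on `reps` calls of `fn`,
 * each copying `n` bytes from src to dest.
 */
double time_copy(copy_fn fn, void *dest, const void *src,
                 size_t n, unsigned long reps)
{
    clock_t start, stop;
    unsigned long i;

    assert(fn != NULL);

    start = clock();
    for (i = 0; i < reps; i++)
        fn(dest, src, n);
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy with the same routine at several sizes (for example 16 bytes, 4 kB and 1 MB, chosen here only as examples) and comparing candidates at each size shows where one starts to win over another on a given machine.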
packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 02 Jan 2008, 18:42
I want something that won't fail no matter what the data size is and that copies as fast as it can. So I need to have both, right?

'Data sizes are small': what would you consider a small size?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 02 Jan 2008, 18:58
You want to have your cake and eat it too. Sorry, you can't have both. Just use the OS memcpy. It is pretty good for most purposes. If you have some particularly nasty edge case then you can look at optimising your own routine specifically for your needs.
packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 02 Jan 2008, 19:33
LOL, I am writing a simple OS myself, so I cannot use other resources.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 02 Jan 2008, 19:48
Okay, can I assume you are using a modern CPU like P4 or later?

'rep movsb' is hard to beat. The CPU has some stuff in there to make it faster under certain conditions. I think for you this might be the best solution. Try not to spend too much time worrying about optimisation yet. Standard advice is to 'get it working first then optimise if necessary'. It is good advice with many experienced programmers across the years behind it.
packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 02 Jan 2008, 20:08
Thanks, revolution. I need this function to scroll the screen up.
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 02 Jan 2008, 20:41
Let me stop you right there.
To scroll the screen up, you need a specialised function named screen_move_up, and nothing else.
Screen pixels are 1, 2, 3 or 4 bytes long.
Optimising this code means inlining it fully, with as few calls as possible and a really short loop. And where do the two offsets point? Into kernel memory? Program memory? Or some other memory?
packet_50071



Joined: 31 Oct 2007
Posts: 15
packet_50071 03 Jan 2008, 00:17
Nah, I am following this as a guide, and I don't think it's wrong:

http://www.osdever.net/bkerndev/
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 03 Jan 2008, 08:59
packet_50071: for text mode I think a basic rep movsd would be quite adequate. It is only moving a few kB at a low update rate.

I think edfed assumed you were doing graphics mode moving. For that he is correct, it requires very different techniques, especially if you want to use the GPU to assist the CPU.
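
For the text-mode case, scrolling up could look roughly like the C sketch below. It assumes the standard 80x25 VGA text layout, where each cell is 16 bits (character in the low byte, attribute in the high byte). The name scroll_up is hypothetical, and the buffer is passed in as a parameter so the sketch can be tried on ordinary memory; in a kernel it would normally point at the text buffer at 0xB8000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { COLS = 80, ROWS = 25 };   /* assumed: standard VGA text mode */

/*
 * scroll_up is a hypothetical name for this sketch.
 * screen points to ROWS * COLS 16-bit cells.
 * Moves rows 1..ROWS-1 up by one row and fills the last row with `blank`.
 */
void scroll_up(uint16_t *screen, uint16_t blank)
{
    size_t i;

    assert(screen != NULL);

    /* Source and destination overlap, so memmove is used, not memcpy. */
    memmove(screen, screen + COLS,
            (size_t)(ROWS - 1) * COLS * sizeof screen[0]);

    /* Clear the row that scrolled into view at the bottom. */
    for (i = 0; i < COLS; i++)
        screen[(size_t)(ROWS - 1) * COLS + i] = blank;
}
```

A typical blank cell is 0x0720 (a space with light grey on black). The whole move is (ROWS - 1) * COLS * 2 = 3840 bytes, which fits the "few kB" estimate. A freestanding kernel has no library memmove, but because the destination lies below the source here, a forward copy such as rep movsd with the direction flag clear handles the overlap safely.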

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.