flat assembler
Message board for the users of flat assembler.

dword aligned rep movsd

Kain



Joined: 26 Oct 2003
Posts: 108
Here's my feeble attempt. I'm sure there are much better algos. I'd like to hear suggestions.

Code:
testalign_cld:
        ; test for dword alignment in esi
        ; esi = src memory address
        ; edi = dest memory address
        ; ecx = number of bytes to move
        mov             eax, esi
        and             eax, -4
        sub             eax, esi
        jz              passed
        movsb
        dec             ecx
        jz              done
        jmp             testalign_cld
passed:
        shr             ecx, 2
        rep             movsd
done:
        ret
    

_________________
:sevag.k
Post 25 Nov 2006, 04:45
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
I think you can replace
Code:
        mov             eax, esi
        and             eax, -4
        sub             eax, esi
    
With
Code:
        test            esi, 3
    


Or do you need EAX for something on return?
Post 25 Nov 2006, 05:03
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
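Wait, your algo has a bug: what if ESI is aligned but ECX < 4? Then shr ecx, 2 leaves ECX = 0 and rep movsd copies nothing. Or what if ESI = (addr & -4) + 1..3 and ECX = 4? After the alignment loop only 1..3 bytes remain, so again no dword is copied and those bytes are lost.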

Sorry, it's too late here now to think of a solution :P

I'll take another look at it tomorrow (if no other forum member provides a solution).

Regards
Post 25 Nov 2006, 05:20
Kain



Joined: 26 Oct 2003
Posts: 108
Thanks, that's much better :)

I should have mentioned that I won't even bother to call this if ecx <= 4 (I'll do a rep movsb on the spot), which means there was also an unnecessary jz in there. And the buffer always has extra padding bytes to catch overflows, so...

Code:
testalign:
        ; test for dword boundary
        ; esi = src memory address
        ; edi = dest memory address
        ; ecx > 4       number of bytes to move
        test    esi, 3
        jz              @f
        movsb
        dec             ecx
        jmp             testalign
@@:
        shr             ecx, 2
        rep             movsd
        ret
    

_________________
:sevag.k
Post 25 Nov 2006, 09:14
smiddy



Joined: 31 Oct 2004
Posts: 559
Nice code! (I hope you don't mind if I use it.)

I realize I'm opening up a can of worms with this question, but are there issues with not ensuring DWORD alignment? For that matter, what about paragraph alignment or other alignments; is there a rule of thumb to use? I recall there were issues in VGA with segment alignments, but if you're using 32-bit memory, are these still issues?
Post 25 Nov 2006, 14:02
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
smiddy: do you have Visual Studio installed? Look at

"c:\Program Files\Microsoft Visual Studio 8\VC\crt\src\intel\memcpy.asm"
Post 25 Nov 2006, 14:15
smiddy



Joined: 31 Oct 2004
Posts: 559
I have an older version installed, but I expect the information should be the same. When I get the opportunity I'll give it a look-see. A quick look suggests it assumes ES and DS are the same. I'll have to look at it critically, thanks. I never considered looking at these puppies.
Post 25 Nov 2006, 14:52
Kain



Joined: 26 Oct 2003
Posts: 108
smiddy wrote:
Nice code! (I hope you don't mind if I use it.)

I realize I'm opening up a can of worms with this question, but are there issues with not ensuring DWORD alignment? For that matter, what about paragraph alignment or other alignments; is there a rule of thumb to use? I recall there were issues in VGA with segment alignments, but if you're using 32-bit memory, are these still issues?


You may use any code I post, but with caution. Sometimes it is buggy and sometimes it is very specialized. I haven't done much with alignment before, so this is pretty new for me.

The code above would be safer with:
Code:
@@:
        push    ecx
        shr     ecx, 2
        rep     movsd
        pop     ecx
        ; take care of any remaining bytes
        and     ecx, 3
        rep     movsb
        ret
    

_________________
:sevag.k
Post 26 Nov 2006, 00:19
Kain



Joined: 26 Oct 2003
Posts: 108
Here is a more specialized function based on what I have learned so far.

It seems to work with a couple dozen test cases.

Typically, this is for opening up a gap of n spaces in a string.

esi = address of last useful byte in buffer
edi = esi + n
ecx = esi - (addr buffer + split_index )

Code:
fixalign_std:
        ; aligned rep movsd with direction flag set (copies downward)
        ; esi   points to src address = byte from which to start
        ; edi   points to dst address
        ; ecx   num bytes to move

        std
        test    ecx, ecx
        jz      done            ; nothing to move
align_on_3:
        ; make sure esi = 3 (mod 4), i.e. both low bits are set,
        ; so that esi - 3 = dword alignment
        ; (an odd esi is not enough: esi = 1 (mod 4) gives esi - 3 = 2 (mod 4))
        test    esi, 1
        jz      move_byte
        test    esi, 2
        jnz     test_count
move_byte:
        movsb
        dec     ecx
        jnz     align_on_3
        jmp     done

test_count:
        ; esi = 3 (mod 4), check remaining bytes to move
        cmp     ecx, 3
        ja      align_ok
        rep     movsb
        jmp     done

align_ok:
        ; ecx > 3 & (esi-3 = align 4)
        sub     esi, 3
        sub     edi, 3
        push    ecx
        shr     ecx, 2
        rep     movsd
        ; take care of remaining bytes, if any
        pop     ecx
        and     ecx, 3
        jz      done
        ; readjust after movsd for byte operation
        add     esi, 3
        add     edi, 3
        rep     movsb

done:
        cld
        ret
    



Now, in this case, esi is aligned, but edi will only be aligned if n is a multiple of 4. I wonder if both src and dest have to be aligned to avoid a stall.

_________________
:sevag.k
Post 26 Nov 2006, 00:49
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Kain wrote:
Here's my feeble attempt. I'm sure there are much better algos. I'd like to hear suggestions.

Code:
testalign_cld:
        ; test for dword alignment in esi
        ; esi = src memory address
        ; edi = dest memory address
        ; ecx = number of bytes to move
        mov             eax, esi
        and             eax, -4
        sub             eax, esi
        jz              passed
        movsb
        dec             ecx
        jz              done
        jmp             testalign_cld
passed:
        shr             ecx, 2
        rep             movsd
done:
        ret
    


Hi Kain,
you have another bug there:
you align the address, but what if ecx is not a multiple of 4?

In addition to dword-aligning esi, we should round ecx down to a multiple of 4, save the remainder, and after the dword copy completes, copy those remaining few bytes too, so that the last 1 to 3 bytes are not left out.
This is a special case, but not so special if you are doing some serious programming and you want a stable memory copy algorithm that copies exactly <ecx> bytes.
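
The steps described above can be sketched as one routine. This is only a minimal sketch under stated assumptions: a flat 32-bit model where DS = ES, a caller that does not mind eax being clobbered, and made-up names (copy_fwd and its local labels are not from this thread). It is untimed and aligns the source only.

```asm
copy_fwd:                       ; hypothetical name, for illustration only
        ; esi = src, edi = dst, ecx = number of bytes to move (0 is allowed)
        ; assumes DS = ES (flat model); clobbers eax
        cld
.head:
        test    ecx, ecx
        jz      .done           ; ran out of bytes while aligning
        test    esi, 3
        jz      .aligned        ; src is on a dword boundary
        movsb
        dec     ecx
        jmp     .head
.aligned:
        mov     eax, ecx        ; save the full remaining count
        shr     ecx, 2          ; number of whole dwords
        rep     movsd
        mov     ecx, eax
        and     ecx, 3          ; 0..3 leftover bytes
        rep     movsb
.done:
        ret
```

Because the count is checked on every pass through the head loop, short copies (ecx < 4) fall through to the byte copy without needing a separate path in the caller.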

Also, what if the source address is aligned and the destination address is not?
Or the source address is not aligned and the destination address is?
This is an interesting question, I think.
Which one would you like to align? :)
Source address or destination address, or both? ;)
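
Whether both can be aligned at once depends only on the low two bits of the two pointers: copying k head bytes advances esi and edi by the same k, so both reach a dword boundary together exactly when (esi xor edi) and 3 is zero. Otherwise one of them stays misaligned whatever you do, and which one to favour is best settled by timing on the target CPU. A small sketch of that check follows; can_coalign is a made-up name, and clobbering eax is an assumption about what the caller tolerates.

```asm
can_coalign:                    ; hypothetical name, for illustration only
        ; in:  esi = src address, edi = dst address
        ; out: ZF = 1 if src and dst can be dword aligned by the same
        ;      number of head bytes, ZF = 0 otherwise
        ; clobbers eax
        mov     eax, esi
        xor     eax, edi        ; bits that differ between src and dst
        test    eax, 3          ; only the low two bits matter for dwords
        ret
```

A caller could follow the call with jz to a path that aligns both pointers, and fall through to a path that aligns only one of them.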

By the way,
you might find this thread interesting: Fastest Memory Copying Algorithms

I have a lot to do right now, having just migrated from Windows to Linux, but I'm here, and I'm going to have more time from now on.
Post 26 Nov 2006, 15:47
Vasilev Vjacheslav



Joined: 11 Aug 2004
Posts: 392
For faster memory copying, also check the AMD code optimization guide; it's very useful.
Post 26 Nov 2006, 17:18
Kain



Joined: 26 Oct 2003
Posts: 108
Thanks Matrix, that's a useful article, but I'm sticking with movsd for compatibility. Have you tried the MEM_copy_5 or less algos with an aligned source?

I may also look into the floating-point ones. What kind of timing did you get with those compared to movsd?
Post 26 Nov 2006, 18:25

Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.