flat assembler
Message board for the users of flat assembler.
dword aligned rep movsd
LocoDelAssembly 25 Nov 2006, 05:03
I think you can replace

Code:
    mov eax, esi
    and eax, -4
    sub eax, esi

with

Code:
    test esi, 3

Or do you need EAX for something at return?
LocoDelAssembly 25 Nov 2006, 05:20
Wait, your algorithm has a bug: what if ESI is aligned but ECX < 4? Or what if ESI = (addr & -4) + 1..3 and ECX = 4?
Sorry, it's too late here to think of a solution now; I'll take another look tomorrow (if no other forum member provides one). Regards
Kain 25 Nov 2006, 09:14
Thanks, that's much better.
I should have mentioned that I won't even bother calling this if ecx <= 4 (I'll do a rep movsb on the spot), which means there was also an unnecessary jz in there. And the buffer always has extra padding bytes to catch overflows, so...

Code:
testalign:            ; test for dword boundary
                      ; esi = src memory address
                      ; edi = dest memory address
                      ; ecx > 4 = number of bytes to move
    test esi, 3
    jz   @f
    movsb             ; copy one byte until esi is dword aligned
    dec  ecx
    jmp  testalign
@@:
    shr  ecx, 2       ; whole dwords only: the ecx & 3 leftover bytes are not copied
    rep  movsd
    ret

_________________
:sevag.k
smiddy 25 Nov 2006, 14:02
Nice code! (I hope you don't mind if I use it.)
I realize I'm opening up a can of worms with this question, but are there issues with not ensuring DWORD alignment? For that matter, what about paragraph alignment or other alignments; is there a rule of thumb to use? I recall there were issues in VGA with segment alignment, but if you're using 32-bit memory, are these still issues?
vid 25 Nov 2006, 14:15
smiddy: do you have Visual Studio installed? Look at
"c:\Program Files\Microsoft Visual Studio 8\VC\crt\src\intel\memcpy.asm"
smiddy 25 Nov 2006, 14:52
I have an older version installed, but I expect the information is the same. When I get the opportunity I'll take a look. A quick look shows it assumes ES and DS are the same. I'll have to look at it critically; thanks. I never considered looking at these puppies.
Kain 26 Nov 2006, 00:19
smiddy wrote:
Nice code! (I hope you don't mind if I use it),

You may use any code I post, but with caution. Sometimes it is buggy and sometimes it is very specialized. I haven't done much with alignment before, so this is pretty new for me. The code above would be safer with:

Code:
@@:
    push ecx
    shr  ecx, 2
    rep  movsd
    pop  ecx          ; take care of any remaining bytes
    and  ecx, 3
    rep  movsb
    ret
Kain 26 Nov 2006, 00:49
Here is a more specialized function based on what I have learned so far.
It seems to work with a couple dozen test cases. Typically, this is for inserting n spaces into a string at a split point:
esi = address of last useful byte in buffer
edi = esi + n
ecx = esi - (addr buffer + split_index)

Code:
fixalign_std:             ; aligned rep movsd with direction flag set
                          ; esi points to src address = byte from which to start
                          ; edi points to dst address
                          ; ecx = num bytes to move (must be > 0)
    std
align_on_odd:             ; make sure (esi & 3) == 3
                          ; so that esi - 3 is dword aligned
    inc  esi
    test esi, 3
    lea  esi, [esi-1]     ; undo the inc; lea leaves the flags alone
    jz   test_count
    movsb
    dec  ecx
    jz   done
    jmp  align_on_odd
test_count:               ; esi - 3 is aligned, check remaining bytes to move
    cmp  ecx, 3
    ja   align_ok
    rep  movsb
    jmp  done
align_ok:                 ; ecx > 3 & (esi - 3 = align 4)
    sub  esi, 3
    sub  edi, 3
    push ecx
    shr  ecx, 2
    rep  movsd
                          ; take care of remaining bytes, if any
    pop  ecx
    and  ecx, 3
    jz   done
                          ; readjust after movsd for byte operation
    add  esi, 3
    add  edi, 3
    rep  movsb
done:
    cld
    ret

Now, in this case, esi is aligned, but edi will only be aligned if n is a multiple of 4. I wonder whether both src and dest have to be aligned to avoid a stall.
Matrix 26 Nov 2006, 15:47
Kain wrote:
Here's my feeble attempt. I'm sure there are much better algos. I'd like to hear suggestions.

Hi Kain, you have another bug there: you align the address, but what if ecx is not a multiple of 2? In addition to dword-aligning esi, we should round ecx down to a multiple of 4, store the remainder, and after completing the copy, copy the remaining few bytes too, so the byte count is exact rather than off by up to 3. This is a special case, but not so special if you are doing some serious programming and want a stable memory copy algorithm that copies exactly <ecx> bytes.
Also, what if the source address is aligned and the destination address is not, or the source address is not aligned and the destination is? This is an interesting question, I think. Which one would you align: the source address, the destination address, or both?
By the way, you might find this thread interesting: Fastest Memory Copying Algorithms.
I have a lot to do right now, having just migrated from Windows to Linux, but I'm here, and I'm going to have more time from now on.
Vasilev Vjacheslav 26 Nov 2006, 17:18
For faster memory copying, also check the AMD code optimization guide; it's very useful.
|
Kain 26 Nov 2006, 18:25
Thanks Matrix, that's a useful article, but I'm sticking with movsd for compatibility. Have you tried the MEM_copy_5 or lower algos with an aligned source?
I may also look into the floating-point ones. What kind of timings did you get with those compared to movsd?
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.