flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
shaolin007 12 May 2005, 20:48
Hmmm seems to me that it wouldn't make much of a difference since you can traverse an array either forwards/backwards by setting the direction flag in string operations. I skimmed over the guide and where did you see that suggestion?
|
|||
![]() |
|
WytRaven 12 May 2005, 21:09
It's in Chapter 7 under subheading Minimize Pointer Arithmetic
_________________ All your opcodes are belong to us |
|||
![]() |
|
f0dder 12 May 2005, 23:31
I would travel the arrays forward, to make sure all *current* CPUs would handle access correctly...
|
|||
![]() |
|
r22 12 May 2005, 23:52
long a[200], b[2000], c[2000];
for(int x = 0; x < 2000; x++){ c[x] = a[x] + b[x];} ---- Normal Slow Bloated Code Code: mov edi, a mov esi, b mov ebx, c .aLoop: mov eax, [edi] mov edx, [esi] add eax, edx mov [ebx], eax add edi, 4 add esi, 4 add ebx, 4 add ecx, 1 CMP ecx, 2000 JL .aLoop Notice how edi, esi, ebx, and ecx all have to be incremented. By using the built in arithmatic you can avoid all this extra code. ---- Faster Shorter Better Code going in reverse Code: mov ecx, 1999 ;MAXSIZE = 2000, MAXSIZE - 1 = 1999 .aBetterLoop: mov eax, [ecx*4 + a] mov edx, [ecx*4 + b] add eax, edx mov [ecx*4 + c], eax DEC ecx JNS .aBetterLoop Notice how there's a lot less code. By transversing the arrays in reverse we don't have to use a CMP with the counter ECX just the DEC and the conditional jump while the sign flag is NOT set (ie: while ecx is >= 0). Another plus of using the built in arithmatic is that there is less register strain (EDI, ESI and EBX don't need to be used!!!!!!!!). ---- If you CAN'T go through the array in reverse shorter better code Code: mov ecx, 0FFFFF830h ; (-MAXSIZE) negative of 2000 .aBetterLoopB: mov eax, [ecx*4 + a + 2000*4]; first iteration= mov eax, [a] mov edx, [ecx*4 + b + 2000*4] add eax,edx mov [ecx*4 + c + 2000*4], eax INC ECX JNZ .aBetterLoopB If you have to, you can transverse the arrays from least index to greatest. The counter is -2000 (negative 2000) so it's incremented +1 until it reaches -1 (if its zero the loop ends). [-1*4 + c + 2000*4] = the last address of array c, c[1999] (arrays go from index 0 - (MAX-1)). ---- The *4 is used because LONGs (refer to FOR loop at the top) are currently 4 bytes 32bits. The loops can be unrolled and further optimized using the SSE2 instructions, if you are worried about cache. Caching doesn't really play a roll unless you are unrolling the loop, in which case using prefetch opcodes, reading 64bytes at a time (I believe this is the cache line size on AMD altholons) and transversing the arrays from least index to greatest would be the best choice. |
|||
![]() |
|
WytRaven 13 May 2005, 06:19
Thx for the replies guys. That certainly helped me a lot. Much appreciated.
_________________ All your opcodes are belong to us |
|||
![]() |
|
MCD 17 May 2005, 11:20
Well, it's possible to traverse some arrays forward without any additional CMPs!! check this out:
Code: ;These 2 procedures copy an integer number of dwords ;esi: source addr ;edi: destination addr ;ecx: number of bytes to copy AND (-1-3) ;First version: add esi,ecx add edi,ecx neg ecx .CopyLoop1: mov eax,[esi+ecx] mov [edi+ecx],eax add ecx,4 jnc .CopyLoop1 ;Second version: lea esi,[esi+ecx*4] lea edi,[edi+ecx*4] neg ecx .CopyLoop2: mov eax,[esi+ecx*4] mov [edi+ecx*4],eax inc ecx jnz .CopyLoop2 The only problem with this kind of array looping is that the start addresses in both esi and edi will be destroyed ![]() _________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || |
|||
![]() |
|
MCD 17 May 2005, 11:21
Well, it's possible to traverse some arrays forward without any additional CMPs!! check this out:
Code: ;These 2 procedures copy an integer number of dwords ;esi: source addr ;edi: destination addr ;ecx: number of bytes to copy AND (-1-3) ;First version: add esi,ecx add edi,ecx neg ecx .CopyLoop1: mov eax,[esi+ecx] mov [edi+ecx],eax add ecx,4 jnc .CopyLoop1 ;Second version: lea esi,[esi+ecx*4] lea edi,[edi+ecx*4] neg ecx .CopyLoop2: mov eax,[esi+ecx*4] mov [edi+ecx*4],eax inc ecx jnz .CopyLoop2 The only problem with this kind of array looping is that the start addresses in both esi and edi will be destroyed ![]() _________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || |
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.