flat assembler
Message board for the users of flat assembler.

Index > Main > Tranversing in reverse

Author
Thread Post new topic Reply to topic
WytRaven



Joined: 08 Sep 2004
Posts: 45
Location: Canberra, Australia
WytRaven 12 May 2005, 18:36
Ok, I'm an ASM try-hard, I'm trying hard to learn.

I came across something recently in the Athlon Optimization Guide from AMD recently which has me confused.

Basically it's recommending transversing arrays in reverse Shocked. That, I would have thought, would screw up caching. Obviously if AMD is recommending it then that can't be the case.

I was just wondering if anyone here has enough know-how to explain to a newb how it is that this doesn't screw with caching.

_________________
All your opcodes are belong to us
Post 12 May 2005, 18:36
View user's profile Send private message Reply with quote
shaolin007



Joined: 03 Sep 2004
Posts: 65
shaolin007 12 May 2005, 20:48
Hmmm seems to me that it wouldn't make much of a difference since you can traverse an array either forwards/backwards by setting the direction flag in string operations. I skimmed over the guide and where did you see that suggestion?
Post 12 May 2005, 20:48
View user's profile Send private message Reply with quote
WytRaven



Joined: 08 Sep 2004
Posts: 45
Location: Canberra, Australia
WytRaven 12 May 2005, 21:09
It's in Chapter 7 under subheading Minimize Pointer Arithmetic

_________________
All your opcodes are belong to us
Post 12 May 2005, 21:09
View user's profile Send private message Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 12 May 2005, 23:31
I would travel the arrays forward, to make sure all *current* CPUs would handle access correctly...
Post 12 May 2005, 23:31
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 12 May 2005, 23:52
long a[200], b[2000], c[2000];
for(int x = 0; x < 2000; x++){
c[x] = a[x] + b[x];}
----
Normal Slow Bloated Code
Code:
mov edi, a
mov esi, b
mov ebx, c
.aLoop:
mov eax, [edi]
mov edx, [esi]
add eax, edx
mov [ebx], eax
add edi, 4
add esi, 4
add ebx, 4
add ecx, 1
CMP ecx, 2000
JL .aLoop
    

Notice how edi, esi, ebx, and ecx all have to be incremented. By using the built in arithmatic you can avoid all this extra code.
----
Faster Shorter Better Code going in reverse
Code:
mov ecx, 1999   ;MAXSIZE = 2000, MAXSIZE - 1 = 1999
.aBetterLoop:
mov eax, [ecx*4 + a]
mov edx, [ecx*4 + b]
add eax, edx
mov [ecx*4 + c], eax
DEC ecx
JNS .aBetterLoop
    

Notice how there's a lot less code. By transversing the arrays in reverse we don't have to use a CMP with the counter ECX just the DEC and the conditional jump while the sign flag is NOT set (ie: while ecx is >= 0).
Another plus of using the built in arithmatic is that there is less register strain (EDI, ESI and EBX don't need to be used!!!!!!!!).
----
If you CAN'T go through the array in reverse shorter better code
Code:
mov ecx, 0FFFFF830h ; (-MAXSIZE) negative of 2000
.aBetterLoopB:
mov eax, [ecx*4 + a + 2000*4]; first iteration= mov eax, [a]
mov edx, [ecx*4 + b + 2000*4]
add eax,edx
mov [ecx*4 + c + 2000*4], eax
INC ECX
JNZ .aBetterLoopB
    

If you have to, you can transverse the arrays from least index to greatest. The counter is -2000 (negative 2000) so it's incremented +1 until it reaches -1 (if its zero the loop ends). [-1*4 + c + 2000*4] = the last address of array c, c[1999] (arrays go from index 0 - (MAX-1)).
----

The *4 is used because LONGs (refer to FOR loop at the top) are currently 4 bytes 32bits. The loops can be unrolled and further optimized using the SSE2 instructions, if you are worried about cache.

Caching doesn't really play a roll unless you are unrolling the loop, in which case using prefetch opcodes, reading 64bytes at a time (I believe this is the cache line size on AMD altholons) and transversing the arrays from least index to greatest would be the best choice.
Post 12 May 2005, 23:52
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
WytRaven



Joined: 08 Sep 2004
Posts: 45
Location: Canberra, Australia
WytRaven 13 May 2005, 06:19
Thx for the replies guys. That certainly helped me a lot. Much appreciated.

_________________
All your opcodes are belong to us
Post 13 May 2005, 06:19
View user's profile Send private message Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 17 May 2005, 11:20
Well, it's possible to traverse some arrays forward without any additional CMPs!! check this out:
Code:
;These 2 procedures copy an integer number of dwords

;esi: source addr
;edi: destination addr
;ecx: number of bytes to copy AND (-1-3)

;First version:
add esi,ecx
add edi,ecx
neg ecx
.CopyLoop1:
mov eax,[esi+ecx]
mov [edi+ecx],eax
add ecx,4
jnc .CopyLoop1

;Second version:
lea esi,[esi+ecx*4]
lea edi,[edi+ecx*4]
neg ecx
.CopyLoop2:
mov eax,[esi+ecx*4]
mov [edi+ecx*4],eax
inc ecx
jnz .CopyLoop2
    

The only problem with this kind of array looping is that the start addresses in both esi and edi will be destroyed Sad but ey, at least the actual loop, which takes the most of the execution time, is shorter and faster!

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 17 May 2005, 11:20
View user's profile Send private message Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 17 May 2005, 11:21
Well, it's possible to traverse some arrays forward without any additional CMPs!! check this out:
Code:
;These 2 procedures copy an integer number of dwords

;esi: source addr
;edi: destination addr
;ecx: number of bytes to copy AND (-1-3)

;First version:
add esi,ecx
add edi,ecx
neg ecx
.CopyLoop1:
mov eax,[esi+ecx]
mov [edi+ecx],eax
add ecx,4
jnc .CopyLoop1

;Second version:
lea esi,[esi+ecx*4]
lea edi,[edi+ecx*4]
neg ecx
.CopyLoop2:
mov eax,[esi+ecx*4]
mov [edi+ecx*4],eax
inc ecx
jnz .CopyLoop2
    

The only problem with this kind of array looping is that the start addresses in both esi and edi will be destroyed Sad but ey, at least the actual loop, which takes the most of the execution time, is shorter and faster!

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 17 May 2005, 11:21
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.