flat assembler
Message board for the users of flat assembler.

optimization help

ofoltd 26 Oct 2008, 01:13
Hi,

I have the following piece of code, which my app will have to traverse billions of times:

Code:
        mov     eax, [infDat + edx*sizeof.INF_DAT + INF_DAT.x]
        mov     eax, [infInfo + INF_INFO.pos + eax*sizeof.INF_INFO]
        shl     eax, 12
        mov     ebx, [infDat + edx*sizeof.INF_DAT + INF_DAT.y]
        mov     ebx, [infInfo + ebx*sizeof.INF_INFO + INF_INFO.pos]
        shl     ebx, 2
        add     eax, ebx
        add     eax, tLM
        mov     eax, [eax]
        add     eax, tLM
    


I have been trying to tweak and optimize it, but I can't get it any smaller or simpler. Looking it over, I'm sure I'm missing a way to shorten it, but I think I'm in brain freeze at the moment. Any ideas would be greatly appreciated :O)

tia

ofoltd
LocoDelAssembly 26 Oct 2008, 02:21
Something I've spotted:
Code:
        shl     ebx, 2
        add     eax, ebx
        add     eax, tLM
    


Can be
Code:
lea     eax, [eax+ebx*4+tLM]    
(Supposing you don't need EBX scaled later)
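
The equivalence can be checked with a small C model. This is only a sketch: uint32_t stands in for the 32-bit registers, the function names are made up for illustration, and tLM is passed as a plain number because its real address is not given in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Original sequence: shl ebx, 2 / add eax, ebx / add eax, tLM.
   uint32_t arithmetic wraps modulo 2^32, like the 32-bit registers. */
uint32_t three_instructions(uint32_t eax, uint32_t ebx, uint32_t tLM)
{
    ebx <<= 2;   /* shl ebx, 2   */
    eax += ebx;  /* add eax, ebx */
    eax += tLM;  /* add eax, tLM */
    return eax;
}

/* Replacement: lea eax, [eax+ebx*4+tLM].
   EBX is only read here, so it keeps its unscaled value. */
uint32_t single_lea(uint32_t eax, uint32_t ebx, uint32_t tLM)
{
    return eax + ebx * 4u + tLM;
}

/* Compare both forms on a few values, including ones that wrap around. */
void check_lea_equivalence(void)
{
    const uint32_t samples[] = { 0u, 1u, 5u, 0x3000u, 0x40000000u, 0xFFFFFFFFu };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j)
            assert(three_instructions(samples[i], samples[j], 0x400000u)
                   == single_lea(samples[i], samples[j], 0x400000u));
}
```

The one visible difference is the side effect: the SHL form leaves EBX multiplied by 4 and updates the flags, while LEA does neither.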

Could you tell us more about this code, like what it does and how it will be traversed billions of times? Is it inside a loop? It is important to know that so the code can be optimized to fit its surrounding code better.
r22 26 Oct 2008, 03:57
Code:
;;;ALIGN 16
        mov     eax, [infDat + edx*sizeof.INF_DAT + INF_DAT.x]
        mov     ebx, [infDat + edx*sizeof.INF_DAT + INF_DAT.y]
        mov     eax, [infInfo + INF_INFO.pos + eax*sizeof.INF_INFO]
        mov     ebx, [infInfo + ebx*sizeof.INF_INFO + INF_INFO.pos]
        shl     eax, 12
        lea     eax, [eax+ebx*4+tLM]
        mov     eax, [eax]
        add     eax, tLM
    

Make sure the code is aligned if it's in a loop or in a function you're calling. Proper code alignment can improve performance by 5% or more.

You'll take a penalty from all the consecutive reads/writes with EAX.
But if you want to really optimize a piece of code you should start with the structure of the DATA the code is accessing.

If you can't modify the underlying data structures you're trying to access, then the above (interleaving the eax/ebx MOVs, using LEA to combine the multiply and add [LocoDel's idea], and aligning the code) will probably be your best bet.
If you CAN modify the underlying data structures, then try to de-normalize: interleave the elements from the result sets to achieve a structure you can access with SIMD and perform more of 'whatever the hell your code is doing' per iteration.
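
As a rough idea of what de-normalizing could look like, here is a C sketch. It does not use SIMD; it only shows the data-layout step. It assumes that infDat and infInfo do not change between traversals, which the thread does not confirm. The struct layouts, the offsets array and the function names are all hypothetical, since only the field names x, y and pos appear in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: the thread only names the fields x, y and pos. */
typedef struct { uint32_t x, y; } InfDat;
typedef struct { uint32_t pos; } InfInfo;

/* One-time pass: fold both infInfo lookups and both shifts into a single
   precomputed byte offset per infDat entry. The result is only valid
   while infDat and infInfo stay unchanged. */
void build_offsets(const InfDat *dat, const InfInfo *info,
                   uint32_t *offsets, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        offsets[i] = (info[dat[i].x].pos << 12) + (info[dat[i].y].pos << 2);
}

/* Hot path after the rewrite: one load from offsets[], one load from tLM.
   Returns the DWORD that the original code reads with mov eax, [eax];
   the final add eax, tLM is left to the caller. */
uint32_t lookup(const uint32_t *tLM, const uint32_t *offsets, size_t edx)
{
    assert(offsets[edx] % 4u == 0u);  /* byte offsets are DWORD-aligned */
    return tLM[offsets[edx] / 4u];
}
```

This trades extra memory (one DWORD per infDat entry) for removing two dependent loads and a shift from every iteration. Whether that pays off depends on how often the tables change and on cache behaviour, so it would need measuring.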
baldr 26 Oct 2008, 04:27
ofoltd,

Or even better, simply
Code:
;;;     lea     eax, [eax+ebx*4+tLM]
;;;     mov     eax, [eax]
        mov     eax, [eax+ebx*4+tLM]    
At first glance I thought that tLM is a 1024*1024 array of DWORDs, indexed by infInfo[infDat[edx].x].pos and infInfo[infDat[edx].y].pos, but the last add eax, tLM confuses me. Does tLM contain offsets into itself?
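
The array reading of the shifts can be sketched in C. The 1024 dimension is a guess taken from the shl eax, 12 (4096 bytes = 1024 DWORDs per row); the thread never states the real size of tLM, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TLM_DIM 1024u  /* assumed dimension, not confirmed in the thread */

/* Byte offset the assembly computes: (pos_x << 12) + (pos_y << 2). */
uint32_t tlm_byte_offset(uint32_t pos_x, uint32_t pos_y)
{
    return (pos_x << 12) + (pos_y << 2);
}

/* The same element as a row-major index into a
   TLM_DIM x TLM_DIM array of DWORDs. */
uint32_t tlm_element_index(uint32_t pos_x, uint32_t pos_y)
{
    assert(pos_x < TLM_DIM && pos_y < TLM_DIM);
    return pos_x * TLM_DIM + pos_y;
}
```

For in-range inputs the byte offset is exactly 4 times the element index, for example (3 << 12) + (5 << 2) = 12308 = 4 * (3 * 1024 + 5). This says nothing about what the stored DWORDs mean, so the question about the last add eax, tLM stays open.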
Copyright © 1999-2023, Tomasz Grysztar.