flat assembler
Message board for the users of flat assembler.

get sum of a Byte/Word array part in Delphi asm

DavidB3



Joined: 09 Apr 2014
Posts: 3
DavidB3 09 Apr 2014, 06:27
Hi,

I've been a Delphi programmer for years, but this is my first attempt at asm.

I'm trying to write 2 functions similar to SumInt (which sums all items of an Integer array), but with 2 differences:
1) They should work with Byte and Word arrays;
2) They should sum only a part of the array.

The original SumInt function from Delphi:

Code:
function SumInt(const Data: array of Integer): Integer;
asm  // IN: EAX = ptr to Data, EDX = High(Data) = Count - 1
     // loop unrolled 4 times, 5 clocks per loop, 1.2 clocks per datum
      PUSH EBX
      MOV  ECX, EAX         // ecx = ptr to data
      MOV  EBX, EDX
      XOR  EAX, EAX
      AND  EDX, not 3
      AND  EBX, 3
      SHL  EDX, 2
      JMP  @Vector.Pointer[EBX*4]
@Vector:
      DD @@1
      DD @@2
      DD @@3
      DD @@4
@@4:
      ADD  EAX, [ECX+12+EDX]
      JO   RaiseOverflowError
@@3:
      ADD  EAX, [ECX+8+EDX]
      JO   RaiseOverflowError
@@2:
      ADD  EAX, [ECX+4+EDX]
      JO   RaiseOverflowError
@@1:
      ADD  EAX, [ECX+EDX]
      JO   RaiseOverflowError
      SUB  EDX,16
      JNS  @@4
      POP  EBX
end;    


So far I've written these:

Code:
function SumByte(const PDataStart: Pointer; const Count: Integer): Integer;
// EAX is PDataStart and the result, EDX is Count
asm
      MOV  ECX, EAX
      XOR  EAX, EAX
      CMP  EDX, 0
      JE   @end
      PUSH EBX
      XOR  EBX, EBX
      SUB  ECX, 1
      @loop:

      MOV  BL, [ECX + EDX]
      ADD  EAX, EBX

      DEC  EDX
      JNZ  @loop
      POP  EBX
      @end:
end;

function SumWord(const PDataStart: Pointer; const Count: Integer): Integer;
// EAX is PDataStart and the result, EDX is Count
asm
      MOV  ECX, EAX
      XOR  EAX, EAX
      CMP  EDX, 0
      JE   @end
      PUSH EBX
      SUB  ECX, 2
      SHL  EDX, 1
      XOR  EBX, EBX
      @loop:

      MOV  BX, [ECX + EDX]
      ADD  EAX, EBX

      SUB  EDX, 2
      JNZ  @loop
      POP  EBX
      @end:
end;    


Usage: SumByte/SumWord(@Array[Start], Count)

They seem to work but:
1) I don't know if it's the fastest code. Is there a faster (but still safe) version?
2) The speed tests give some weird results. Sometimes the functions are faster than regular code, sometimes they are slower or run at the same speed. It depends on how and where I add other code outside the test and outside these functions (which in theory shouldn't influence the result).

Could you please help me?
Thank you.

Delphi versions and OS: Delphi 7 on Windows XP SP3, and Delphi XE5 on Windows 7 SP1 x86

Regards,
David

PS: I tried to use Code tags but they don't seem to work, sorry

Moderator note (revolution): Use the forward slash. Okay, I already did it for you.
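
For comparison with the assembler versions above, the "regular code" baseline could look roughly like the plain Pascal sketch below. The names `SumBytePas`, `SumWordPas`, `PByteItem` and `PWordItem` are made up for this example and are not from the thread. The pointer types are declared locally so that nothing beyond the System unit is assumed. Unlike the asm versions, this sketch simply returns 0 for a negative Count.

```pascal
program SumBaseline;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical local pointer types, declared here to stay self-contained.
  PByteItem = ^Byte;
  PWordItem = ^Word;

// Sums Count bytes starting at PDataStart; returns 0 when Count <= 0.
function SumBytePas(const PDataStart: Pointer; const Count: Integer): Integer;
var
  P: PByteItem;
  I: Integer;
begin
  Result := 0;
  P := PDataStart;
  for I := 1 to Count do
  begin
    Result := Result + P^;
    Inc(P);                 // advances by SizeOf(Byte)
  end;
end;

// Sums Count words starting at PDataStart; returns 0 when Count <= 0.
function SumWordPas(const PDataStart: Pointer; const Count: Integer): Integer;
var
  P: PWordItem;
  I: Integer;
begin
  Result := 0;
  P := PDataStart;
  for I := 1 to Count do
  begin
    Result := Result + P^;
    Inc(P);                 // advances by SizeOf(Word)
  end;
end;

var
  Bytes: array[0..7] of Byte;
  Words: array[0..7] of Word;
  I: Integer;
begin
  for I := 0 to 7 do
  begin
    Bytes[I] := I + 1;            // 1, 2, ..., 8
    Words[I] := (I + 1) * 1000;   // 1000, 2000, ..., 8000
  end;
  // Sum 3 items starting at index 2, same calling pattern as the asm versions.
  WriteLn(SumBytePas(@Bytes[2], 3));   // 3 + 4 + 5 = 12
  WriteLn(SumWordPas(@Words[2], 3));   // 3000 + 4000 + 5000 = 12000
end.
```

Because the signatures match the asm functions, either version can be swapped into the same call site when comparing them.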
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20630
Location: In your JS exploiting you and your system
revolution 09 Apr 2014, 06:36
Speed tests for something like this will not be useful. As you saw, the results vary a lot because internally the CPU is doing other things that sometimes do and sometimes don't affect the timings.

I'd suggest that you need to look at your entire program to see if and where it needs optimising. Little functions like this are only worthwhile looking at if such a thing is called thousands of times per second for endless hours where saving 5% runtime might actually have some realtime benefit.

Also, what might be faster on your system could be slower on another system. If your algorithm is strong then such small linear timing details won't matter much.
DavidB3 09 Apr 2014, 07:06
Thank you.

revolution wrote:
Speed tests for something like this will not be useful. As you saw, the results vary a lot because internally the CPU is doing other things that sometimes do and sometimes don't affect the timings.


Ok, but how can I know if it's faster than regular code and whether it's worth using?

revolution wrote:
I'd suggest that you need to look at your entire program to see if and where it needs optimising. Little functions like this are only worthwhile looking at if such a thing is called thousands of times per second for endless hours where saving 5% runtime might actually have some realtime benefit.


I already optimized most of the code.
And these functions are sometimes called millions of times per minute. So it's worth trying to code them in assembler.
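
One common way to make such a comparison less sensitive to noise is to repeat the same measurement several times and look at the spread instead of a single run. The sketch below is only an illustration of that idea, not a method proposed in the thread. It assumes Delphi on Windows (it uses `QueryPerformanceCounter` and `QueryPerformanceFrequency` from the `Windows` unit), and `SumBytePas`, `TimeOneTrial`, `Sink` and the constants are hypothetical names. To time one of the asm functions, call it in place of `SumBytePas`.

```pascal
program SumTiming;
{$APPTYPE CONSOLE}

uses
  Windows;

const
  ItemCount     = 4096;    // bytes summed per call (assumed test size)
  CallsPerTrial = 20000;   // calls inside one timed trial
  Trials        = 15;      // number of repeated trials

type
  PByteItem = ^Byte;

var
  Data: array[0..ItemCount - 1] of Byte;
  Sink: Integer;           // keeps the results "used" so the calls stay in

// Plain Pascal function under test; replace with the asm version to compare.
function SumBytePas(const PDataStart: Pointer; const Count: Integer): Integer;
var
  P: PByteItem;
  I: Integer;
begin
  Result := 0;
  P := PDataStart;
  for I := 1 to Count do
  begin
    Result := Result + P^;
    Inc(P);
  end;
end;

// Runs one trial and returns its duration in performance-counter ticks.
function TimeOneTrial: Int64;
var
  T0, T1: Int64;
  I: Integer;
begin
  QueryPerformanceCounter(T0);
  for I := 1 to CallsPerTrial do
    Sink := Sink xor SumBytePas(@Data[0], ItemCount);
  QueryPerformanceCounter(T1);
  Result := T1 - T0;
end;

var
  Freq, T, Best, Worst: Int64;
  BestMs, WorstMs: Double;
  I: Integer;
begin
  Sink := 0;
  for I := 0 to ItemCount - 1 do
    Data[I] := I and $FF;

  QueryPerformanceFrequency(Freq);
  TimeOneTrial;            // warm-up run, result discarded

  Best := High(Int64);
  Worst := 0;
  for I := 1 to Trials do
  begin
    T := TimeOneTrial;
    if T < Best then Best := T;
    if T > Worst then Worst := T;
  end;

  BestMs := Best / Freq * 1000.0;
  WorstMs := Worst / Freq * 1000.0;
  WriteLn('best trial:  ', BestMs:0:3, ' ms');
  WriteLn('worst trial: ', WorstMs:0:3, ' ms');
  WriteLn('sink: ', Sink);
end.
```

If the gap between the best and worst trial is about as large as the difference between the two implementations, the measurement cannot tell them apart.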
revolution 09 Apr 2014, 08:44
DavidB3 wrote:
Ok, but how can I know if it's faster than regular code and whether it's worth using?
If you can't detect any significant changes in runtime then that is your answer: It is neither faster nor slower.

If it was significantly faster, and you are calling it millions of times, then you would know pretty quickly by the reduced runtimes.

Optimising for speed is hard. It is not just a simple matter of writing a tighter loop or avoiding div. Things like algorithm selection, cache management and streaming data to DRAM etc. are where the majority of speed-ups are to be found. Usually only horribly inefficient code will be responsive to simplistic instruction level improvements. And these days many compilers do a reasonable job of not producing horribly inefficient code (as long as you get the algorithms right).
DavidB3 09 Apr 2014, 09:16
revolution wrote:
If you can't detect any significant changes in runtime then that is your answer: It is neither faster nor slower.

If it was significantly faster, and you are calling it millions of times, then you would know pretty quickly by the reduced runtimes.


As I mentioned, the test results are weird.
I tried directly with the application's code. It showed a ~23% speed increase.
Ok. Then I added some code in other areas (that has nothing to do with the code involved in the test). I tested again, and this time there was no difference in speed (?!).
I'm not new to this kind of testing and I have ALWAYS got consistent results.
This happens ONLY when I start using the assembler from Delphi.
So either it's a bug in the Delphi assembler (but I doubt that) or it's a bug in my assembler code that lets it access memory it shouldn't.
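
The worry about reading memory outside the requested range can be checked separately from the timing question. The sketch below compares a candidate function against a simple reference for every start position and count inside a small buffer. A candidate that reads one item too many or too few would almost certainly return a different sum for some combination. All names (`TSumFunc`, `SumByteRef`, `SumByteDown`, `CountMismatches`) are hypothetical. `SumByteDown` is only a stand-in that walks the data backwards like the asm loop does; the idea is to pass the asm `SumByte` in its place.

```pascal
program SumCrossCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  BufSize = 64;

type
  PByteItem = ^Byte;
  // Same signature as SumByte in the thread, so it can be passed directly.
  TSumFunc = function(const PDataStart: Pointer; const Count: Integer): Integer;

var
  Buf: array[0..BufSize - 1] of Byte;

// Reference: straightforward forward loop.
function SumByteRef(const PDataStart: Pointer; const Count: Integer): Integer;
var
  P: PByteItem;
  I: Integer;
begin
  Result := 0;
  P := PDataStart;
  for I := 1 to Count do
  begin
    Result := Result + P^;
    Inc(P);
  end;
end;

// Stand-in candidate: walks from the last item down to the first.
function SumByteDown(const PDataStart: Pointer; const Count: Integer): Integer;
var
  P: PByteItem;
  I: Integer;
begin
  Result := 0;
  if Count <= 0 then
    Exit;
  P := PDataStart;
  Inc(P, Count - 1);        // point at the last item of the range
  for I := Count downto 1 do
  begin
    Result := Result + P^;
    Dec(P);
  end;
end;

// Tries every (Start, Count) pair that fits in Buf and counts disagreements.
function CountMismatches(Candidate: TSumFunc): Integer;
var
  Start, Count: Integer;
begin
  Result := 0;
  for Start := 0 to BufSize - 1 do
    for Count := 0 to BufSize - Start do
      if Candidate(@Buf[Start], Count) <> SumByteRef(@Buf[Start], Count) then
        Inc(Result);
end;

var
  I: Integer;
begin
  // Deterministic, non-uniform fill so off-by-one reads change the sum.
  for I := 0 to BufSize - 1 do
    Buf[I] := (I * 37 + 11) and $FF;
  WriteLn('mismatches: ', CountMismatches(SumByteDown));   // prints 0 here
end.
```

A result of 0 mismatches does not prove the candidate is correct, but any non-zero count points at a real bug rather than a timing artifact.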
revolution 09 Apr 2014, 09:22
DavidB3 wrote:
I added some code in other areas (that has nothing to do with the code involved in test).
Well, that could be your problem. Everything the CPU does affects the other things. This can happen by loading/evicting more data or code to/from the caches. Or because code or data alignment has changed. Or a number of other things that can have an effect. This is something I mentioned above: the whole program needs to be assessed to see where changes are most effective.

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
