flat assembler
Message board for the users of flat assembler.

Index > Windows > timing functions

nmake



Joined: 13 Sep 2012
Posts: 192
nmake 15 Dec 2012, 15:21
I made a third sample with 10 different ways of dividing a number by 2, comparing their speeds in a graph. :)

[Image]

Attachment: div_by_2.zip (6.46 KB, downloaded 236 times)

typedef



Joined: 25 Jul 2010
Posts: 2909
Location: 0x77760000
typedef 15 Dec 2012, 18:42
Doesn't work on Win7 (x64, Ultimate).



nmake 15 Dec 2012, 19:20
What happens?



typedef 15 Dec 2012, 19:35
Nothing. Probably an exception somewhere. Don't have time to debug it.

I double click it and nothing happens.



nmake 15 Dec 2012, 19:50
It has been tested on 32-bit and 64-bit Windows 7 and it works well here. It also has 100% exception handling, so no error goes by undetected. I seldom use this much exception handling, but since it is a library I decided to handle everything: not a single API call goes unchecked.
mindcooler



Joined: 01 Dec 2009
Posts: 423
Location: Västerås, Sweden
mindcooler 15 Dec 2012, 19:55
Requires SSE4.



nmake 15 Dec 2012, 19:56
The example requires SSE4, but the library does not. If you don't have SSE4, that might be it.

If you download the library you can customize many things in the source file. :D



typedef 15 Dec 2012, 20:53
Hmm. I wanted to make sure, so I used this nifty little tool (http://www.cpuid.com/).

I have up to 3S. :(

[Image]



nmake 15 Dec 2012, 20:56
Isn't that annoying? :)

I guess I have to implement a CPUID check for SSE now too. That will be my next task.
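
A minimal sketch of what such a check could look like. This is not the library's code. It assumes the standard 32-bit fasm Windows includes (win32a.inc) and msvcrt's printf for output; the labels and messages are made up for illustration. CPUID leaf 1 reports SSE4.1 in ECX bit 19 (SSE4.2 is ECX bit 20).
Code:
format PE console
entry start
include 'win32a.inc'

section '.text' code readable executable

start:
        mov     eax,1                   ; CPUID leaf 1: feature flags
        cpuid                           ; overwrites eax, ebx, ecx, edx
        bt      ecx,19                  ; CF = ECX bit 19 (SSE4.1)
        jnc     .no_sse41
        cinvoke printf,msg_yes
        jmp     .done
.no_sse41:
        cinvoke printf,msg_no           ; a real program would pick a non-SSE4 path here
.done:
        invoke  ExitProcess,0

section '.data' data readable writeable

msg_yes db 'SSE4.1 is available',13,10,0
msg_no  db 'SSE4.1 is not available',13,10,0

section '.idata' import data readable writeable

library kernel32,'kernel32.dll',\
        msvcrt,'msvcrt.dll'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,printf,'printf'

Testing bit 20 instead of bit 19 would check for SSE4.2 in the same way.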


Last edited by nmake on 15 Dec 2012, 20:57; edited 1 time in total



typedef 15 Dec 2012, 20:57
^^ Well, you can always make a backward-compatibility hack. ;)



nmake 16 Dec 2012, 07:02
In case you wonder what it would look like if you had SSE4, see the image below. I bought this CPU this summer, btw. Overclock-wise I regret the deal; feature-wise I love the choice I made, considering it supports the new on-chip random number instruction, AVX and AES, and has a small die size.

[Image]
Fred



Joined: 22 Oct 2010
Posts: 39
Fred 19 Dec 2012, 13:43
Hm, I'll have to try this. Can it time 64-bit code?



nmake 19 Dec 2012, 14:10
Well, I did test on a 64-bit Windows 7 machine and it worked, but of course that might not be the case for everyone.

If you time functions against each other, remember to warm up the functions before testing so that each function has a chance to get cached, or else the times may vary a bit.

Like this for example:
Code:
        rept 10 { stdcall function1 }   ; Prefetch before timing
        invoke start_timing,0           ; Start timing
        stdcall function1               ; function call
        invoke stop_ticks,FALSE         ; Stop timing
    

The rept 10 warms up and caches the function I want to time.

If you do not warm up, the function positioned closest to start_timing might be cached along with the code that starts the timing, while function 2 might not be within that cache line, and the results will differ. Cache both before timing.
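
A self-contained sketch of "cache both before timing". It uses plain rdtsc instead of the library's start_timing/stop_ticks, so it is a stand-in and not the library's method. The two functions, the time_call macro and the variable names are hypothetical, and the standard 32-bit fasm Windows includes plus msvcrt's printf are assumed. rdtsc is not a serializing instruction and a single measurement is noisy, so treat the numbers as rough.
Code:
format PE console
entry start
include 'win32a.inc'

section '.text' code readable executable

proc div2_shr num                       ; hypothetical function 1
        mov     eax,[num]
        shr     eax,1
        ret
endp

proc div2_sar num                       ; hypothetical function 2
        mov     eax,[num]
        sar     eax,1
        ret
endp

macro time_call func,result             ; stores the low 32 bits of the tick delta
{
        rdtsc
        mov     esi,eax                 ; start count (low dword)
        stdcall func,1000
        rdtsc
        sub     eax,esi                 ; end - start, modulo 2^32
        mov     [result],eax
}

start:
        rept 10 { stdcall div2_shr,1000 }       ; warm up function 1
        rept 10 { stdcall div2_sar,1000 }       ; warm up function 2
        time_call div2_shr,ticks1
        time_call div2_sar,ticks2
        cinvoke printf,fmt,[ticks1],[ticks2]
        invoke  ExitProcess,0

section '.data' data readable writeable

fmt     db 'function 1: %u ticks, function 2: %u ticks',13,10,0
ticks1  dd ?
ticks2  dd ?

section '.idata' import data readable writeable

library kernel32,'kernel32.dll',\
        msvcrt,'msvcrt.dll'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,printf,'printf'

Both warm-up loops run before the first measurement, so neither function gets an advantage from being timed second.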



typedef 19 Dec 2012, 16:12
Can these (SSE4) be found under virtualization? Or do virtual machines actually rely on the machine itself for that?

I think that'd be pretty cool.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 20 Dec 2012, 08:25
Actually, it really doesn't matter much whether you use:
Code:
proc div2_2 num
;     shr [num],1
     mov eax,[num]
     shr eax,1
     ret
endp
    

or
Code:
proc div2_2 num
;     shr [num],1
     mov eax,[num]
     shr eax,1 ; times 10 shr eax,1 doesn't work, I think I found a bug in FASM 1.70.02 :)
     shr eax,1 ; anyone care to test that in newer versions?
     shr eax,1 ; times 10 ror eax,1 does work
     shr eax,1
     shr eax,1
     shr eax,1
     shr eax,1
     shr eax,1
     shr eax,1
     shr eax,1
     ret
endp
    

when benchmarking, because the call/ret overhead is so great that the tiny shr has only a marginal effect.
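
One way to see this for yourself, as a rough sketch: time a loop of calls to a proc that does no shifting at all, then time the same number of shr instructions emitted inline with no call/ret. The names call_only, COUNT and the tick variables are made up for illustration, rdtsc stands in for the timing library, and the standard 32-bit fasm Windows includes plus msvcrt's printf are assumed. The inline shifts form a dependency chain on ecx, so that part measures latency rather than throughput.
Code:
format PE console
entry start
include 'win32a.inc'

COUNT = 1000

section '.text' code readable executable

proc call_only num                      ; call/ret and stack frame only, no shr
        mov     eax,[num]
        ret
endp

start:
        rept 10 { stdcall call_only,1000 }      ; warm up
        rdtsc
        mov     esi,eax
        mov     ebx,COUNT
.calls:
        stdcall call_only,1000          ; (a) COUNT calls without any shift
        dec     ebx
        jnz     .calls
        rdtsc
        sub     eax,esi
        mov     [ticks_call],eax

        mov     ecx,-1
        rdtsc
        mov     esi,eax
        times COUNT: shr ecx,1          ; (b) COUNT inline shifts, no call/ret
        rdtsc
        sub     eax,esi
        mov     [ticks_inline],eax

        cinvoke printf,fmt,COUNT,[ticks_call],COUNT,[ticks_inline]
        invoke  ExitProcess,0

section '.data' data readable writeable

fmt          db '%u calls: %u ticks, %u inline shr: %u ticks',13,10,0
ticks_call   dd ?
ticks_inline dd ?

section '.idata' import data readable writeable

library kernel32,'kernel32.dll',\
        msvcrt,'msvcrt.dll'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,printf,'printf'

If the first number is much larger than the second, the call overhead is what the benchmark is mostly measuring.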

Btw, the graphics look very nice. I love visual stuff.



nmake 20 Dec 2012, 08:58
Thanks, it is simple GDI: brushes and pens. :) (The source code is at the bottom of the first page of this thread.)
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 20 Dec 2012, 09:50
Madis731 wrote:
times 10 shr eax,1 doesn't work
The expression calculator is greedy so it tries to use as much of the line as possible. It finds that "10 shr eax" cannot be converted to a simple number.

edit: forgot to mention that you can fix it with a colon:
Code:
times 10: shr eax,1    
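
For comparison, a small sketch of the colon form next to two other standard fasm ways of repeating an instruction, rept (preprocessor) and repeat (assembler stage). Neither of those has the count and the instruction in one expression, so the greedy parsing never comes into play. The fragment below is only meant to be assembled as a flat binary to check the syntax; the value loaded into eax is arbitrary.
Code:
use32
        mov     eax,1 shl 30
        times 10: shr eax,1             ; colon ends the count expression
        rept 10 { shr eax,1 }           ; preprocessor emits the line 10 times
        repeat 10                       ; block form, evaluated by the assembler
                shr     eax,1
        end repeat
        ret
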



Madis731 20 Dec 2012, 11:06
revolution wrote:
The expression calculator is greedy....

I guessed it was something along that line.

revolution wrote:
you can fix it with a colon

Indeed I can, thanks!

_________________
My updated idol :D http://www.agner.org/optimize/
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
