flat assembler
Message board for the users of flat assembler.

timing functions
nmake



Joined: 13 Sep 2012
Posts: 192
nmake 10 Dec 2012, 11:06
I made a library with timing functions, if anyone wants it. Usage:

1. Call start_timing before the code you want to time.
2. Call stop_xxxx to stop timing and output the result.
xxxx is second, milli, micro or ticks (four functions to choose from).

Example is included.
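
For readers without the zip at hand, the start/stop pattern described above can be sketched in Python. This is only an analogue, not the library's fasm interface: the names start_timing and stop_micro are borrowed from the post, the module-level variable is hypothetical, and time.perf_counter_ns stands in for QueryPerformanceCounter (to my knowledge CPython's perf_counter is built on that API on Windows).

```python
import time

# Hypothetical state standing in for the counter value the library saves.
_start_ns = None


def start_timing():
    """Record the current high-resolution counter value."""
    global _start_ns
    _start_ns = time.perf_counter_ns()


def stop_micro():
    """Return the whole microseconds elapsed since start_timing()."""
    return (time.perf_counter_ns() - _start_ns) // 1_000


start_timing()
total = sum(range(100_000))   # the code being timed
elapsed_us = stop_micro()
print(f"{elapsed_us} microseconds")
```

The other stop functions would differ only in the divisor applied to the same counter difference.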

[Four images]

Attachment: timing.zip (9.72 KB)

Last edited by nmake on 12 Dec 2012, 05:35; edited 2 times in total
shutdownall



Joined: 02 Apr 2010
Posts: 517
Location: Munich
shutdownall 10 Dec 2012, 12:01
If this is always the same timespan, only measured in finer units, what does it show in timer ticks? 3 milliseconds should be about 10 million ticks, depending on your processor, not just 10,000? :o
nmake 10 Dec 2012, 12:07
QueryPerformanceCounter usually ticks by your cpu frequency, but not necessarily according to intel. Anyway, this is a ring 3 timing library, if you want to use instructions like rdtsc, you need ring 0 access.

(10849 / 3079) * 1000000 = 3,523,546

which is about the frequency of my cpu atm, proportionally.
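
The estimate above can be checked with a few lines of Python. The two input values are the ones quoted in this thread; the variable names are made up for illustration.

```python
ticks = 10849          # value reported for the run in ticks
microseconds = 3079    # value reported for the same run in microseconds

# ticks per microsecond, scaled to ticks per second
ticks_per_second = ticks / microseconds * 1_000_000
print(f"{ticks_per_second:,.0f} ticks per second")
```

The result is the frequency of the performance counter, which need not be the CPU clock.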
nmake 10 Dec 2012, 13:50
Fixed a bug in the function x87_support and re-uploaded the zip file.
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 10 Dec 2012, 14:01
Quote:

if you want to use instructions like rdtsc, you need ring 0 access.
Does it really not work on your computer? Although access from outside ring 0 can be disabled, Windows leaves it available.
nmake 10 Dec 2012, 14:09
I tried it many times; it doesn't work here. I think I need to debug my own program to make it work, or put it in its own driver.
marcinzabrze12



Joined: 07 Aug 2011
Posts: 61
marcinzabrze12 10 Dec 2012, 15:32
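Code:
format pe gui 4.0
entry start
section '.code' code readable executable
start:
        xor     eax, eax        ; select CPUID leaf 0
        cpuid                   ; serializing: earlier instructions finish before the timestamp is read
        rdtsc                   ; EDX:EAX = time-stamp counter; faults in ring 3 only if CR4.TSD is set
        ret                     ; return to whatever called the entry point

Try the code above. Do you get an exception?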
nmake 10 Dec 2012, 16:07
Hi, yes, I think I tried that some time ago. I will try it again when I have time. :)
shutdownall 10 Dec 2012, 22:02
nmake wrote:

(10849 / 3079) * 1000000 = 3,523,546

which is about the frequency of my cpu atm, proportionally.


You have a CPU with 3,523,546 Hz :D ?
That is less than the 8088 in the first IBM PC, which ran at 4.77 MHz.

I think you mean 3,523,546,000, which is about 3.5 GHz?

That is what I was asking about; I think you forgot to multiply by 1000.
nmake 10 Dec 2012, 23:26
"Proportionally" :) A CPU with a 133 MHz external clock and a multiplier of 10 runs at 1.33 GHz; the internal clock is not the same as the external one.
shutdownall 11 Dec 2012, 12:36
nmake wrote:
"Proportionally" :) A CPU with a 133 MHz external clock and a multiplier of 10 runs at 1.33 GHz; the internal clock is not the same as the external one.

I have used rdtsc several times, and it is derived directly from the CPU clock.
There may be an internal divider, but surely not by a factor of 1000, nor by a factor of 10.
So there must be something wrong with your measurements if you get only about 10 ticks in 3 microseconds. Be sure. :D

It may have to do with running under Windows, which gives your program/thread only part of the execution time. Try rdtsc on DOS and you will see a big difference.
shutdownall 11 Dec 2012, 12:41
nmake wrote:
Anyway, this is a ring 3 timing library, if you want to use instructions like rdtsc, you need ring 0 access.


Ah, I see. So these are not the CPU's timer ticks. I wouldn't call them ticks then, because people may think you are talking about the Intel timer ticks.

I would use only the microsecond timer, because a timer with a resolution of about 300 nanoseconds makes no big difference, and you have the problem that its value depends on the processor.
nmake 11 Dec 2012, 14:11
shutdownall, I can assure you it is correct. QueryPerformanceCounter works that way. The stop_ticks call does not divide anything; it simply subtracts the old counter value from the new one. The other functions (second, milli and micro) do divide that difference, but with ticks nothing else is done to the value. You can verify the frequency in OllyDbg if you want; the counter frequency really is that low. It is 3.5 million on my CPU and will probably be somewhere around that on yours as well. rdtsc is the best way to measure time, but it is also very limited in how you can use it.

Look here in olly, it is 3.5 million after calling QueryPerformanceFrequency

[Image: OllyDbg after the call to QueryPerformanceFrequency]

This agrees with my first post, where I had 10849 ticks and 3079 microseconds. 3.5 million ticks equal one second. If you divide 3.5 million by 1 million you get 3.5 ticks per microsecond, so the microsecond value is only a factor of 3.5 away from the tick value.

10849 / 3079 = 3.52 :P

I agree it should not really be called ticks, but there is no time measurement in cycles, so maybe it should be called cycles instead of ticks. I do like the name ticks better though.
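
The relation between the tick result and the other three results can be sketched as follows. This is not the library's code: the function name is hypothetical, and the frequency is the estimate derived from the two numbers in the first post rather than a value read from QueryPerformanceFrequency.

```python
def ticks_to_units(ticks, frequency, units_per_second):
    """Convert a counter difference to a time unit with integer arithmetic.

    Multiplying before dividing keeps the precision that would be lost
    by dividing first.
    """
    return ticks * units_per_second // frequency


FREQUENCY = 3_523_546   # assumed ticks per second (estimate from this thread)
elapsed_ticks = 10849   # new counter value minus old counter value

micro = ticks_to_units(elapsed_ticks, FREQUENCY, 1_000_000)
milli = ticks_to_units(elapsed_ticks, FREQUENCY, 1_000)
print(micro, "microseconds,", milli, "milliseconds")
```

With these inputs the microsecond result matches the 3079 reported earlier in the thread.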


Last edited by nmake on 12 Dec 2012, 02:43; edited 1 time in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 11 Dec 2012, 14:44
The frequency is related to the NTSC chrominance signal.

14.31818 MHz / 4 = 3.579545 MHz.

This is from the old IBM PC days, when the CPU clock was driven by a 14.31818 MHz crystal divided by 3 (giving ~4.77 MHz) and the video output used 14.31818 MHz divided by 4.
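
The two divisions can be checked, and compared with the counter frequency estimated earlier in the thread, with a short Python sketch. The measured value is the estimate from the 10849 ticks and 3079 microseconds quoted above, not a value read from the API.

```python
CRYSTAL_HZ = 14_318_180          # 14.31818 MHz crystal

cpu_hz = CRYSTAL_HZ / 3          # original IBM PC CPU clock
colorburst_hz = CRYSTAL_HZ / 4   # NTSC chrominance frequency

measured_hz = 3_523_546          # estimate from this thread (assumption)
deviation = (colorburst_hz - measured_hz) / colorburst_hz

print(f"CPU clock:  {cpu_hz / 1e6:.2f} MHz")
print(f"Crystal/4:  {colorburst_hz / 1e6:.6f} MHz")
print(f"Measured value is {deviation:.1%} below crystal/4")
```

The measured value is close to crystal/4 but not identical, so this comparison alone does not settle where the counter frequency comes from.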


Last edited by revolution on 11 Dec 2012, 16:17; edited 1 time in total
Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 11 Dec 2012, 16:14
revolution wrote:
The frequency is related to the NTSC chrominance signal.

14.31818 MHz / 4 = 3.579545 MHz.

No. AFAIK Windows QueryPerformanceFrequency is either based on the HPET, which runs at ~14.3 MHz and gives a QPF of 14318180, or based on the CPU clock (HFM frequency divided by 1024).

Since MS doesn't seem able to make up its mind about which one to use as the default from one Windows version to the next, you might want to think about using the "useplatformclock" option in the BCD.
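
As a rough plausibility check of the two cases, the counter frequency estimated earlier in the thread can be compared with the HPET value and scaled by 1024. The measured value below is that estimate, so the implied clock is only approximate.

```python
HPET_QPF = 14_318_180       # QPF expected when the HPET is the source
measured_qpf = 3_523_546    # estimate from this thread (assumption)

# If the counter were the CPU clock divided by 1024, the clock would be:
implied_cpu_hz = measured_qpf * 1024

print("Matches HPET value:", measured_qpf == HPET_QPF)
print(f"Implied CPU clock: {implied_cpu_hz / 1e9:.2f} GHz")
```

An implied clock of roughly 3.6 GHz is a plausible desktop CPU frequency, which would fit the divided-CPU-clock case.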
AsmGuru62



Joined: 28 Jan 2004
Posts: 1619
Location: Toronto, Canada
AsmGuru62 11 Dec 2012, 17:36
I think there is no point in knowing the frequency.
When talking about performance, all you need to know is whether
code #1 is faster or slower than code #2, and the counter is enough for that.
Is there really a need to know how many microseconds it is?
nmake 11 Dec 2012, 18:02
I can't think of any good use of microseconds, other than if your code runs so long that reading very large numbers becomes less practical, and so lowering the resolution helps reading numbers.

Oh maybe there is another area too, if you have a sorting program which sorts 40 TB of data with a quick sort algorithm and it needs 30 seconds to run, the tick counter would overflow and is not of much use.

Well, maybe it has a third use: if you develop your program on two different computers, it might be worth knowing microseconds instead of moving between two computers that have completely different tick resolutions.

I can't think of a fourth reason, no.. well maybe, if you are developing a music program and you need to accurately measure notes, then using ticks would not be practical.

Is there a fifth reason, I doubt it. I can't think of any more at the moment.

;)
AsmGuru62 11 Dec 2012, 18:10
I see. Good points.
nmake 12 Dec 2012, 05:36
Fixed two bugs; it should be bug-free now. Re-uploaded the zip archive. One bug was in the x87 support and the other was a bad overflow check. Also added support for 64-bit integer wrap-around, which will happen every 83,563 years :D (can't be too careful, perhaps some of us are still here after 80k years)
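
The 83,563-year figure can be reproduced under three assumptions that are inferred from the number rather than stated in the post: the counter is treated as a signed 64-bit value (2^63 ticks), the frequency is rounded to 3.5 million ticks per second, and a year has 365 days.

```python
SIGNED_64_RANGE = 2 ** 63            # ticks before a signed 64-bit counter wraps
FREQUENCY = 3_500_000                # assumed ticks per second (rounded)
SECONDS_PER_YEAR = 365 * 24 * 3600   # 365-day year

seconds_until_wrap = SIGNED_64_RANGE // FREQUENCY
years_until_wrap = seconds_until_wrap // SECONDS_PER_YEAR
print(f"{years_until_wrap:,} years")
```

An unsigned counter would take twice as long to wrap.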
nmake 15 Dec 2012, 10:29
I have improved the library a great deal and added the possibility to show a graph. The parameters of the functions have changed a bit.

1. start_timing,0 <- this takes one parameter: the slot in which to store the result. The slot can be 0 to MAX_SLOTS-1 (MAX_SLOTS is defined in the source).

2. stop_xxxxx,TRUE <- this takes one parameter as well: TRUE to display the result in a message box, FALSE to not display it. xxxxx is ticks, micro, milli or second.

I've added a new function called graph. It has no parameters; it displays a comparison graph of the timed functions so you can test functions against one another. It only covers the slots that have been used in start_timing calls. Results are shown as percentages. The function that performed worst serves as the reference for all the others and is displayed in red.
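
How such percentages might be derived is sketched below. The formula is an assumption (each slot's time as a share of the slowest slot's time); the library's actual calculation is in its source. The slot numbers and tick values are made up.

```python
def relative_percentages(results):
    """Express each slot's time as a percentage of the slowest slot."""
    worst = max(results.values())
    return {slot: ticks * 100 / worst for slot, ticks in results.items()}


# Hypothetical tick counts stored in slots 0..2
results = {0: 10849, 1: 5400, 2: 8100}

percentages = relative_percentages(results)
worst_slot = max(percentages, key=percentages.get)

for slot, pct in sorted(percentages.items()):
    marker = " (worst, reference)" if slot == worst_slot else ""
    print(f"slot {slot}: {pct:5.1f}%{marker}")
```

Because every value is relative to the slowest slot, the comparison does not depend on the counter frequency of the machine.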

You can customize things in the source file.

Examples are included. :)

[Image: comparison graph]

Attachment: Timing.zip (16.75 KB)

Copyright © 1999-2024, Tomasz Grysztar.