flat assembler
Message board for the users of flat assembler.

Is this the correct way to count how many CPU cycles?

flier mate



Joined: 24 May 2025
Posts: 29
flier mate 25 May 2025, 13:44
EDIT: Sorry folks, I don't have confidence in what I posted, which is why I deleted my comments and code.


Last edited by flier mate on 01 Jun 2025, 15:37; edited 2 times in total
Post 25 May 2025, 13:44
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 25 May 2025, 14:46
That's right.
If you disassemble GetTickCount(), it executes in just 15 instructions, since it simply reads a field from the user structure KUSER_SHARED_DATA:
Code:
0:000:x86> uf /i GetTickCount
15 instructions scanned

KERNELBASE!GetTickCount:
75299034  eb02            jmp     KERNELBASE!GetTickCount+0x4 (75299038)

KERNELBASE!GetTickCount+0x2:
75299036  f390            pause

KERNELBASE!GetTickCount+0x4:
75299038  8b0d2403fe7f    mov     ecx,dword ptr [SharedUserData+0x324 (7ffe0324)]
7529903e  8b152003fe7f    mov     edx,dword ptr [SharedUserData+0x320 (7ffe0320)]
75299044  a12803fe7f      mov     eax,dword ptr [SharedUserData+0x328 (7ffe0328)]
75299049  3bc8            cmp     ecx,eax
7529904b  75e9            jne     KERNELBASE!GetTickCount+0x2 (75299036)

KERNELBASE!GetTickCount+0x19:
7529904d  a10400fe7f      mov     eax,dword ptr [SharedUserData+0x4 (7ffe0004)]
75299052  f7e2            mul     eax,edx
75299054  c1e108          shl     ecx,8
75299057  0faf0d0400fe7f  imul    ecx,dword ptr [SharedUserData+0x4 (7ffe0004)]
7529905e  0facd018        shrd    eax,edx,18h
75299062  c1ea18          shr     edx,18h
75299065  03c1            add     eax,ecx
75299067  c3              ret
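
For anyone following the arithmetic after the retry loop, here is a minimal C sketch of what the instructions from GetTickCount+0x19 onward appear to compute: the 64-bit tick counter times the multiplier at offset 0x4, shifted right by 24 bits and truncated to 32 bits. The names tick_count_ms and tick_count_self_check, and the multiplier value 0x0FA00000 (15.625 ms per tick in 8.24 fixed point), are illustrative assumptions and are not taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the code at GetTickCount+0x19.
   tick_low and tick_high stand for the dwords at offsets 0x320 and 0x324,
   multiplier for the dword at offset 0x4 of the shared page. */
uint32_t tick_count_ms(uint32_t tick_low, uint32_t tick_high, uint32_t multiplier)
{
    uint64_t product = (uint64_t)tick_low * multiplier;   /* mul: edx:eax = low * multiplier */
    uint32_t low_part = (uint32_t)(product >> 24);        /* shrd eax,edx,18h */
    uint32_t high_part = (tick_high << 8) * multiplier;   /* shl ecx,8 then imul ecx,[multiplier] */
    return low_part + high_part;                          /* add eax,ecx */
}

/* Self-check with an assumed multiplier of 15.625 * 2^24 = 0x0FA00000:
   64 ticks of 15.625 ms each should come out as 1000 ms. */
void tick_count_self_check(void)
{
    assert(tick_count_ms(64, 0, 0x0FA00000u) == 1000u);
}
```

The split into a low and a high part works because shifting the high dword left by 8 is the same as shifting the full 64-bit product right by 24, so no 64-bit by 32-bit multiply is needed on x86.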
    
Post 25 May 2025, 14:46
flier mate



Joined: 24 May 2025
Posts: 29
flier mate 25 May 2025, 15:01
....


Last edited by flier mate on 30 May 2025, 17:03; edited 1 time in total
Post 25 May 2025, 15:01
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 25 May 2025, 15:32
@flier mate, profiling on new processors has many nuances.
For example: pinning the test to a single core (see SetProcessAffinityMask), using rdtscp instead of rdtsc, fencing the measured region with lfence so that earlier instructions finish before the timestamp is read, etc.
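
A rough C sketch of the rdtscp and lfence part, assuming an x86 target and GCC or Clang (the intrinsics come from x86intrin.h; MSVC declares them in intrin.h). The names tsc_begin, tsc_end, tsc_ticks and sample_work are made up for illustration, and pinning the thread to one core is left out here; on Windows that would be a separate call to SetProcessAffinityMask before measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang on x86; MSVC declares these intrinsics in <intrin.h> */

typedef void (*bench_fn)(void);

/* Timestamp at the start of the measured region. */
uint64_t tsc_begin(void)
{
    _mm_lfence();                  /* wait for earlier instructions to finish */
    uint64_t t = __rdtsc();
    _mm_lfence();                  /* keep the measured code from starting before the read */
    return t;
}

/* Timestamp at the end of the measured region. */
uint64_t tsc_end(void)
{
    unsigned int aux;              /* receives IA32_TSC_AUX, an OS-defined value */
    uint64_t t = __rdtscp(&aux);   /* rdtscp waits until earlier instructions have executed */
    _mm_lfence();                  /* keep later instructions from running ahead of the read */
    return t;
}

/* Elapsed TSC ticks (not necessarily core clock cycles) for one call of fn. */
uint64_t tsc_ticks(bench_fn fn)
{
    assert(fn != NULL);
    uint64_t start = tsc_begin();
    fn();
    uint64_t stop = tsc_end();
    return stop - start;
}

/* Placeholder workload so the sketch has something to time. */
void sample_work(void)
{
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 1000; i++)
        sink += i;
}
```

Calling tsc_ticks(sample_work) returns the tick delta for one run. The result still includes the cost of the fences and the timestamp reads themselves, so an empty measurement is usually taken as well and subtracted.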
Post 25 May 2025, 15:32
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20685
Location: In your JS exploiting you and your system
revolution 25 May 2025, 22:16
Sadly, there is no "correct way" to count CPU cycles. RDTSC is almost always wrong for counting CPU cycles. In many CPUs RDTSC doesn't count CPU clock cycles at all; instead it reads a fixed-speed timer completely unconnected to the CPU clocks.

The closest to CPU cycle counting are the performance monitoring registers.
Post 25 May 2025, 22:16
Jessé



Joined: 03 May 2025
Posts: 59
Location: Brazil
Jessé 26 May 2025, 01:40
I once discovered by accident that the TSC counts at the default (base) clock, whereas the CPU cores run at synthesized/modulated clock speeds that can be lower or higher than the default clock and also vary over time.
I test similar ideas on Linux, and because multitasking sometimes kicks in, benchmarking code in a single shot is completely unreliable, for two reasons: clock modulation and the multitasking environment.
Still, if you want to refine your tests, one tip is to replace cpuid with something lighter:

Code:
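    finit
    mfence    ; Measure empty benchmark frame impact
    rdtsc
    push edx
    push eax               ; first reading now at [esp]
    mfence    ; Do serialization
    rdtsc
    push edx
    push eax               ; second reading at [esp], first reading at [esp+8]
    fild qword [esp]       ; st0 = second reading
    fild qword [esp+8]     ; st0 = first reading, st1 = second reading
    fsubp st1, st0         ; st0 = second - first
    fistp dword [esp]      ; store the difference as a 32-bit integer
    pop dword [e_offset]   ; TSC ticks for the empty frame, save them elsewhere

    mfence    ; Do benchmark
    rdtsc
    push edx
    push eax
    ; test code goes here
    mfence
    rdtsc
    push edx
    push eax
    fild qword [esp]       ; same subtraction as above: end reading - start reading
    fild qword [esp+8]
    fsubp st1, st0
    fistp dword [esp]
    pop ecx
    sub ecx, [e_offset]    ; Number of TSC ticks, minus the empty-frame overhead
    add esp, 24    ; release allocated stack (8 pushes of 4 bytes, minus the 2 pops)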
    


Try it, and let me know how it goes.
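
On the point about one-shot measurements being unreliable: a common workaround is to repeat the measurement and keep the smallest value, since interruptions from the scheduler can only make a run slower. Below is a minimal C sketch of that idea, assuming a POSIX system (it uses clock_gettime for wall-clock nanoseconds rather than the TSC). The names now_ns, best_of_ns and sample_work are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*bench_fn)(void);

/* Monotonic time in nanoseconds (POSIX clock_gettime). */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn `runs` times and return the smallest elapsed time seen. */
uint64_t best_of_ns(bench_fn fn, int runs)
{
    assert(fn != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = now_ns();
        fn();
        uint64_t elapsed = now_ns() - start;
        if (elapsed < best)
            best = elapsed;    /* keep the least-disturbed run */
    }
    return best;
}

/* Placeholder workload so the sketch has something to time. */
void sample_work(void)
{
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 1000; i++)
        sink += i;
}
```

For example, best_of_ns(sample_work, 100) gives the best of 100 runs. The minimum filters out most multitasking noise, but it does not remove the effect of clock modulation, so the result is still a time and not a cycle count.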
Post 26 May 2025, 01:40
flier mate



Joined: 24 May 2025
Posts: 29
flier mate 26 May 2025, 06:10
....


Last edited by flier mate on 30 May 2025, 17:03; edited 1 time in total
Post 26 May 2025, 06:10
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 26 May 2025, 07:08
revolution wrote:
The closest to CPU cycle counting are the performance monitoring registers.

And what does the performance counter have to do with the CPU frequency? It works at the frequency of the HPET timer 12 MHz, which is a separate hardware device. Or am I wrong?

The problem is that the frequency of modern processors is not constant; it changes dynamically depending on the load. Moreover, one processor can have cores of different types (P and E), which also run at different frequencies. Therefore, the value of the TSC counter depends on the core, although it counts the clock cycles correctly.
Post 26 May 2025, 07:08
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20685
Location: In your JS exploiting you and your system
revolution 26 May 2025, 09:56
Core i7 wrote:
And what does the performance counter have to do with the CPU frequency? It works at the frequency of the HPET timer 12 MHz, which is a separate hardware device. Or am I wrong?
I don't mean the motherboard "performance counter"; I'm suggesting the CPU "performance monitoring registers". The naming is unfortunate, but they are completely different things.


Last edited by revolution on 26 May 2025, 10:01; edited 1 time in total
Post 26 May 2025, 09:56
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20685
Location: In your JS exploiting you and your system
revolution 26 May 2025, 10:00
Core i7 wrote:
Therefore, the value of the TSC counter depends on the core, although it counts the clock cycles correctly.
It really doesn't. It is totally uncoupled from the CPU cycle clock. The original Pentium did have the TSC coupled to the CPU clock. After that, things became much more complex in how the TSC is clocked. The primary function is in the name, "Time Stamp Counter"; it was never supposed to be a CPU cycle counter.

BTW: What even is a CPU cycle in a modern CPU? It isn't such an easy question to answer.
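
To put rough numbers on that decoupling, here is a small C sketch that estimates core clock cycles from a TSC delta. It assumes the TSC runs at a fixed, known rate and that the average core clock over the interval is also known; in practice neither figure is handed to you for free, and the function name and all values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate core clock cycles from a TSC delta.
   tsc_khz is the (assumed fixed) TSC rate, core_khz the (assumed known)
   average core clock over the measured interval, both in kHz. */
uint64_t tsc_ticks_to_core_cycles(uint64_t tsc_ticks, uint32_t tsc_khz, uint32_t core_khz)
{
    assert(tsc_khz != 0);
    /* Split the division so the intermediate products stay within 64 bits. */
    uint64_t whole = tsc_ticks / tsc_khz;
    uint64_t rest = tsc_ticks % tsc_khz;
    return whole * core_khz + rest * core_khz / tsc_khz;
}
```

With a 3 GHz TSC, a delta of 3,000,000 ticks is one millisecond. A core boosting to 4.5 GHz would have run about 4,500,000 cycles in that time, while a core throttled to 800 MHz would have run about 800,000, even though RDTSC reports the same delta in both cases.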
Post 26 May 2025, 10:00
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 26 May 2025, 11:06
I used to experiment a lot with timers and counters. https://codeby.net/threads/sistemnyye-taimery-chast-4-local-apic.73735/
As far as I know, there is an rdpmc instruction (performance counter), but it is also tied to the HPET timer. Specifically, the processor itself has only the LAPIC timer, which counts at the real (not effective) frequency of the system bus, and the processor then applies a multiplier to it. Among the LAPIC registers (in the Local Vector Table, LVT) there is also PerfMon, but that is an interrupt line, not a counter. In general, if you tell me which CPU registers are responsible for "performance monitoring", it would clarify the situation. Here is the log from my utility:


[Attachment: lapic_timer.png, 74.42 KB]


Post 26 May 2025, 11:06
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20685
Location: In your JS exploiting you and your system
revolution 26 May 2025, 11:22
If you have the Intel manuals then:

Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2: CHAPTER 21 PERFORMANCE MONITORING

There is an entire chapter covering the hundreds of registers and controls. I couldn't do it justice by reposting it here.
Post 26 May 2025, 11:22
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 27 May 2025, 03:15
Good registers, but they are not accessible from user mode,
and as I understand it, the QueryPerformanceFrequency (QPF) and QueryPerformanceCounter (QPC) functions have nothing to do with them.
Or are there other ways to access them?
Post 27 May 2025, 03:15
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20685
Location: In your JS exploiting you and your system
revolution 27 May 2025, 05:12
QPF and QPC are Windows functions that access the motherboard timer hardware. They are completely unrelated to instruction cycle times or anything else internal to the CPU.
Post 27 May 2025, 05:12
Mat Qua sar



Joined: 13 Jun 2025
Posts: 35
Mat Qua sar 20 Jun 2025, 05:40
Core i7 wrote:

If you disassemble GetTickCount(), it executes in just 15 instructions, since it simply reads a field from the user structure KUSER_SHARED_DATA:
Code:
0:000:x86> uf /i GetTickCount
15 instructions scanned...
    


Hi @Core i7, how do I use Unassemble Function (uf) in WinDbg? I opened an x86 executable first and then typed the following, but got an error:
Code:
0:000> uf GetTickCount
Couldn't resolve error at 'GetTickCount'
    


I notice that my prompt, 0:000, is not the same as your 0:000:x86.

It would be nice to be able to unassemble Win32 API functions.

Also, does "kd" mean "kernel debugging"?
Post 20 Jun 2025, 05:40
Mat Qua sar



Joined: 13 Jun 2025
Posts: 35
Mat Qua sar 20 Jun 2025, 05:41
And sorry for repeatedly changing my username and deleting my comments.
Post 20 Jun 2025, 05:41
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 20 Jun 2025, 06:52
Mat Qua sar, have you configured the *.pdb symbols for WinDbg? Without them, it cannot display symbol information.

1. Create a folder for symbols: C:\Symbols
2. Run WinDbg and press [Ctrl+S] (see the File menu).
3. In the window that appears, specify the server for downloading symbols into your folder: srv*c:\Symbols*https://msdl.microsoft.com/download/symbols
4. Save the settings and restart the debugger.
5. Make sure you are connected to the Internet.
6. Press [Ctrl+E] and open any *.exe in WinDbg.
7. Reload the symbols with the command .reload /f
8. Look in your "Symbols" folder to see whether any symbols were downloaded.
9. If not, start debugging with the command g, and check the symbol path with the command .sympath

Starting with Win7, the debugger cannot handle local kernel debugging (kd mode), and a second computer is needed. However, Mark Russinovich's "LiveKd" utility can take kernel dumps and display all of the kernel's structures without a second computer. On first launch, the utility also tries to configure symbols, with the "C:\Symbols" folder already specified by default; all you have to do is connect to the network and press Enter twice (accepting the defaults). LiveKd will not start without the Ntoskrnl and Ntdll .pdb symbols. You can open the offline help directly from the debugger window with the command .hh, for example .hh !peb. If there are problems with auto-loading symbols, I can also help with manual configuration.
Post 20 Jun 2025, 06:52
Mat Qua sar



Joined: 13 Jun 2025
Posts: 35
Mat Qua sar 20 Jun 2025, 07:29
(Feeling a bit faint) I am not familiar with debugging symbols, but I tried your steps 1-7, and it says the PDB file (i.pdb) for my executable (i.exe) is not loaded / not found. i.exe is a FASM program; must I generate it first using FASMW? Anyway, I will let you know later.
Post 20 Jun 2025, 07:29
Core i7



Joined: 14 Nov 2024
Posts: 109
Location: Socket on motherboard
Core i7 20 Jun 2025, 08:00
..yes, in WinDbg you only need to open *.exe files.
For your i.exe, a pdb is not necessary; symbols are needed for the system libraries. What OS and version of WinDbg do you have?
Open the exe and press "g" to load all the DLLs of your application. After that, request information about any DLL, for example kernel32, to see the state of its symbols:
Code:
0:000:x86> !lmi kernel32        ;<---------
Loaded Module Info: [kernel32] 
         Module:  kernel32
   Base Address:  0000000076990000
     Image Name:  C:\Windows\syswow64\kernel32.dll
   Machine Type:  332 (I386)
     Time Stamp:  66f77b38 Sat Sep 28 08:42:48 2024
           Size:  110000
       CheckSum:  11549b
Characteristics:  2102  perf
Debug Data Dirs:  Type  Size     VA  Pointer
              CODEVIEW    26, d0e50,   d0e50 RSDS - GUID: {EC4B15F0-9D87-42A0-BDD2-FBA9BE736232}
                Age: 2, Pdb: wkernel32.pdb
                 CLSID     4, d0e4c,   d0e4c [Data not mapped]
     Image Type: FILE     - Image read successfully from debugger.
                 C:\Windows\syswow64\kernel32.dll
    Symbol Type: PDB      - Symbols loaded successfully from symbol server.
                 c:\symbols\wkernel32.pdb\EC4B15F09D8742A0BDD2FBA9BE7362322\wkernel32.pdb   ;<---------------
    Load Report: private symbols , not source indexed 
                 c:\symbols\wkernel32.pdb\EC4B15F09D8742A0BDD2FBA9BE7362322\wkernel32.pdb
    

PS: For questions about WinDbg, it is better to create a separate topic, since there are a lot of nuances there.
Post 20 Jun 2025, 08:00

Copyright © 1999-2025, Tomasz Grysztar.