flat assembler
Message board for the users of flat assembler.
Core i7 25 May 2025, 14:46
That's right.
If you disassemble GetTickCount(), it executes in just 15 instructions, since it simply reads a field from the user structure KUSER_SHARED_DATA:
Code:
0:000:x86> uf /i GetTickCount
15 instructions scanned
KERNELBASE!GetTickCount:
75299034 eb02            jmp     KERNELBASE!GetTickCount+0x4 (75299038)
KERNELBASE!GetTickCount+0x2:
75299036 f390            pause
KERNELBASE!GetTickCount+0x4:
75299038 8b0d2403fe7f    mov     ecx,dword ptr [SharedUserData+0x324 (7ffe0324)]
7529903e 8b152003fe7f    mov     edx,dword ptr [SharedUserData+0x320 (7ffe0320)]
75299044 a12803fe7f      mov     eax,dword ptr [SharedUserData+0x328 (7ffe0328)]
75299049 3bc8            cmp     ecx,eax
7529904b 75e9            jne     KERNELBASE!GetTickCount+0x2 (75299036)
KERNELBASE!GetTickCount+0x19:
7529904d a10400fe7f      mov     eax,dword ptr [SharedUserData+0x4 (7ffe0004)]
75299052 f7e2            mul     eax,edx
75299054 c1e108          shl     ecx,8
75299057 0faf0d0400fe7f  imul    ecx,dword ptr [SharedUserData+0x4 (7ffe0004)]
7529905e 0facd018        shrd    eax,edx,18h
75299062 c1ea18          shr     edx,18h
75299065 03c1            add     eax,ecx
75299067 c3              ret
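To make the arithmetic at the end of that listing easier to follow, here is a minimal Python sketch of it. It only models the math after the consistency loop, not the shared-memory reads. The function and parameter names are made up for illustration, and the multiplier value 0x0FA00000 (15.625 ms per tick in 8.24 fixed point) is an assumption, not something taken from the dump.

```python
MASK32 = 0xFFFFFFFF

def get_tick_count(low_part, high1_time, multiplier):
    """Model the 32-bit arithmetic GetTickCount performs on the shared fields.

    low_part   : value read from SharedUserData+0x320
    high1_time : value read from SharedUserData+0x324
    multiplier : value read from SharedUserData+0x4
    (parameter names are hypothetical)
    """
    product = low_part * multiplier            # mul: 64-bit result in edx:eax
    low_ms = (product >> 24) & MASK32          # shrd eax,edx,18h
    high = (high1_time << 8) & MASK32          # shl ecx,8
    high_ms = (high * multiplier) & MASK32     # imul ecx,[multiplier]
    return (low_ms + high_ms) & MASK32         # add eax,ecx

# Assumed multiplier: 15.625 * 2**24 = 0x0FA00000, so 64 ticks give 1000 ms.
print(get_tick_count(64, 0, 0x0FA00000))  # 1000
```

In other words, the result is the 64-bit tick counter times a fixed-point multiplier, shifted right by 24 bits and truncated to 32 bits.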
flier mate 25 May 2025, 15:01
....
Last edited by flier mate on 30 May 2025, 17:03; edited 1 time in total
Core i7 25 May 2025, 15:32
@flier mate, profiling on new processors has many nuances.
For example: run the tests on only one core (see the SetProcessAffinityMask function), use rdtscp instead of rdtsc, fence the instruction stream with lfence so that earlier instructions finish before the timestamp is read, etc.
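As a rough illustration of the "one core only" point, here is a small Python sketch. It is not the Windows call named above: it swaps in os.sched_setaffinity, which exists only on some platforms (Linux, for instance), and simply does nothing elsewhere. The helper name pin_to_one_cpu is hypothetical.

```python
import os

def pin_to_one_cpu():
    """Restrict the current process to a single CPU.

    Returns the chosen CPU number, or None if pinning is unavailable.
    Uses os.sched_setaffinity as a stand-in for SetProcessAffinityMask.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                      # platform has no affinity API in os
    try:
        allowed = os.sched_getaffinity(0)
        cpu = min(allowed)               # pick the lowest CPU we may run on
        os.sched_setaffinity(0, {cpu})
        return cpu
    except OSError:
        return None                      # not permitted in this environment

cpu = pin_to_one_cpu()
print("pinned to CPU:", cpu)
```

Pinning keeps both timestamps of a measurement on the same core, which matters when the counter or clock differs between cores.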
revolution 25 May 2025, 22:16
Sadly there is no "correct way" to count CPU cycles. RDTSC is almost always wrong for counting CPU cycles. In many CPUs RDTSC doesn't count the CPU clock cycles at all; instead it is a fixed-speed timer completely unconnected to the CPU clocks.
The closest to CPU cycle counting are the performance monitoring registers.
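A small worked example of why a fixed-speed TSC is not a cycle count, written as a Python sketch. All frequencies here are hypothetical numbers chosen for round arithmetic, and the function name is made up.

```python
def core_cycles_from_tsc(tsc_delta, tsc_hz, core_hz):
    """Estimate real core cycles for a TSC interval.

    Assumes the TSC ticks at a constant tsc_hz while the core ran at a
    constant core_hz for the whole interval (both are assumptions).
    """
    return tsc_delta * core_hz // tsc_hz

TSC_HZ = 3_000_000_000          # hypothetical fixed TSC rate: 3.0 GHz
delta = 3_000_000               # 3,000,000 TSC ticks = 1 ms of wall time

print(core_cycles_from_tsc(delta, TSC_HZ, 1_500_000_000))  # 1500000
print(core_cycles_from_tsc(delta, TSC_HZ, 4_500_000_000))  # 4500000
```

The same TSC delta corresponds to half as many or one and a half times as many core cycles, depending on the clock the core happened to run at.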
Jessé 26 May 2025, 01:40
Once, I accidentally discovered that the TSC counts at the default clock rate, whereas the CPU cores run on separately synthesized/modulated clocks that can be lower or higher than the default clock and that also vary.
I test similar ideas in Linux, and since multitasking sometimes kicks in, it is completely unreliable to benchmark code in one shot, because of both factors: clock modulation and the multitasking environment. But a tip for your tests, if you want to refine them, is to replace cpuid with something lighter:
Code:
        finit
        mfence
; Measure empty benchmark frame impact
        rdtsc
        push edx
        push eax
        mfence                  ; Do serialization
        rdtsc
        push edx
        push eax
        fild qword [esp]
        fild qword [esp+8]
        fsubp st1, st0
        fistp dword [esp]
        pop dword [e_offset]    ; empty number of cycles*, save it elsewhere
        mfence
; Do benchmark
        rdtsc
        push edx
        push eax
; test code goes here
        mfence
        rdtsc
        push edx
        push eax
        fild qword [esp]
        fild qword [esp+8]
        fsubp st1, st0
        fistp dword [esp]
        pop ecx
        sub ecx, [e_offset]     ; Number of cycles
        add esp, 24             ; release allocated stack
Try it, and leave feedback once you have.
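The same two ideas (measure the empty frame first, and never trust a single shot) can be sketched in Python. This is only an analogy: it uses time.perf_counter_ns instead of rdtsc, so it reports nanoseconds rather than TSC ticks, and the names measure_ns and _elapsed_ns are invented for the example.

```python
import time

def _elapsed_ns(fn):
    """Time one call of fn with the monotonic high-resolution clock."""
    start = time.perf_counter_ns()
    fn()
    return time.perf_counter_ns() - start

def measure_ns(fn, repeats=1000):
    """Best-case time of fn in ns, minus the cost of the empty frame."""
    noop = lambda: None
    # Empty benchmark frame: the timing code around a do-nothing call.
    overhead = min(_elapsed_ns(noop) for _ in range(repeats))
    # Real benchmark: keep the minimum to filter out scheduler noise.
    best = min(_elapsed_ns(fn) for _ in range(repeats))
    return max(best - overhead, 0)       # clamp in case noise exceeds the gain

print(measure_ns(lambda: sum(range(1000))))
```

Taking the minimum over many runs is one common way to reduce the effect of preemption. It does not remove the effect of clock modulation.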
flier mate 26 May 2025, 06:10
....
Last edited by flier mate on 30 May 2025, 17:03; edited 1 time in total
Core i7 26 May 2025, 07:08
revolution wrote:
The closest to CPU cycle counting are the performance monitoring registers.
And what does the performance counter have to do with the CPU frequency? It works at the frequency of the HPET timer 12 MHz, which is a separate hardware device. Or am I wrong?
The problem is that the frequency of modern processors is not constant, and changes dynamically depending on the percentage of load. Moreover, one processor can have cores of different types - P and E, which also work at different frequencies. Therefore, the value of the TSC counter depends on the core, although it counts the clock cycles correctly.
revolution 26 May 2025, 09:56
Core i7 wrote:
And what does the performance counter have to do with the CPU frequency? It works at the frequency of the HPET timer 12 MHz, which is a separate hardware device. Or am I wrong?
Last edited by revolution on 26 May 2025, 10:01; edited 1 time in total
revolution 26 May 2025, 10:00
Core i7 wrote:
Therefore, the value of the TSC counter depends on the core, although it counts the clock cycles correctly.
BTW: What even is a CPU cycle in a modern CPU? It isn't such an easy question to answer.
Core i7 26 May 2025, 11:06
I used to experiment a lot with timers and counters. https://codeby.net/threads/sistemnyye-taimery-chast-4-local-apic.73735/
As far as I know, there is the rdpmc instruction (read performance counter), but it is also tied to the HPET timer. Specifically, the processor has only a LAPIC timer, which counts at the real (not effective) frequency of the system bus, and the processor then applies a multiplier to it. Among the LAPIC registers (in the Local Vector Table, LVT) there is also PerfMon, but this is an interrupt line, not a counter.
In general, if you tell me which CPU registers are responsible for "performance monitoring", it would clarify the situation. Here is the log from my utility:
revolution 26 May 2025, 11:22
If you have the Intel manuals then:
Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3B: System Programming Guide, Part 2
CHAPTER 21 PERFORMANCE MONITORING
There is an entire chapter with all the hundreds of registers and controls. I can't do it justice to repost it here.
Core i7 27 May 2025, 03:15
Good registers, but they are not accessible from user mode,
and as I understand it, the functions QueryPerformanceFrequency and QueryPerformanceCounter (QPF + QPC) have nothing to do with them. Or are there other ways to access them?
revolution 27 May 2025, 05:12
The QPF and QPC are Windows functions that access the mobo timer hardware. Completely unrelated to instruction cycle times, or anything else internal to the CPU.
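For completeness, this is roughly how a pair of QPC readings and the QPF value turn into a time interval. It is a Python sketch with made-up numbers; it does not call the Windows functions, and the 10 MHz frequency is an assumed value for the example only.

```python
def qpc_interval_us(start_ticks, end_ticks, frequency_hz):
    """Convert two counter readings into elapsed microseconds.

    start_ticks, end_ticks : values as returned by a QPC-style counter
    frequency_hz           : ticks per second as returned by a QPF-style call
    Multiplying before dividing keeps the integer math exact.
    """
    return (end_ticks - start_ticks) * 1_000_000 // frequency_hz

# Hypothetical readings: 25,000 ticks apart at an assumed 10 MHz counter.
print(qpc_interval_us(1_000_000, 1_025_000, 10_000_000))  # 2500
```

The result is wall-clock time. Nothing in the calculation depends on how many cycles the CPU core executed in between.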
Mat Qua sar 20 Jun 2025, 05:40
Core i7 wrote:
Hi @Core i7, I want to ask: how do I use Unassemble Function (uf) in WinDbg? I opened an x86 executable first and then typed the following, but got an error:
Code:
0:000> uf GetTickCount
Couldn't resolve error at 'GetTickCount'
I notice that my prompt, 0:000, is not the same as your 0:000:x86. It would be nice to be able to unassemble Win32 API functions. Also, does "kd" mean "kernel debugging"?
Mat Qua sar 20 Jun 2025, 05:41
And sorry for changing my username so often and for deleting my comments.
Core i7 20 Jun 2025, 06:52
Mat Qua sar, have you configured the *.pdb symbols for WinDBG? Without them, it will not be able to display information.
1. Create a folder for symbols: C:\Symbols
2. Run WinDBG and press [Ctrl+S] (see the menu).
3. In the window that appears, specify the server for downloading symbols to your folder: srv*c:\Symbols*https://msdl.microsoft.com/download/symbols
4. Save the settings and restart the debugger.
5. Make sure you are connected to the Internet, then hit [Ctrl+E] and open any *.exe in WinDBG.
6. Reload the symbols with the command .reload /f
7. Look in your "Symbols" folder to see whether any symbols were loaded.
8. If not, start debugging with the command g, and check the server with the command .sympath
Starting with Win7, the debugger cannot handle local kernel debugging (kd mode), and a second computer is needed. However, Mark Russinovich's "LiveKd" utility can make kernel dumps and display all its structures without a second computer. When first launched, the utility also tries to configure symbols, and the "C:\Symbols" folder is already specified by default. All you have to do is connect to the network and press Enter twice (to accept the defaults). LiveKd will not start if the Ntoskrnl + Ntdll.pdb symbols are missing.
You can call offline help directly from the debugger window with the command .hh, for example .hh !peb. If there are problems with auto-loading symbols, I can also help with manual configuration.
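The symbol path string from step 3 has three parts separated by asterisks. Here is a tiny Python sketch that pulls them apart, just to show what each field means. It handles only this three-field srv*cache*server form, and parse_symbol_path is a name invented for the example.

```python
def parse_symbol_path(sympath):
    """Split a 'srv*<local cache>*<server URL>' element into its parts."""
    kind, cache, server = sympath.split("*", 2)
    if kind.lower() != "srv":
        raise ValueError("not a symbol-server element: " + sympath)
    return {"cache": cache, "server": server}

parts = parse_symbol_path(
    r"srv*c:\Symbols*https://msdl.microsoft.com/download/symbols")
print(parts["cache"])    # local folder where downloaded PDBs are stored
print(parts["server"])   # server the debugger downloads them from
```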
Mat Qua sar 20 Jun 2025, 07:29
(A bit faint) I am not familiar with debugging symbols, but I tried your steps 1-7, and it says that the PDB file (i.pdb) for my executable (i.exe) is not loaded / not found. i.exe is a FASM program; must I generate the PDB first using FASMW? Anyway, I will let you know later.
Core i7 20 Jun 2025, 08:00
..yes, in WinDBG you only need to open *.exe files
For your i.exe, a pdb is not necessary - symbols are needed for the system libraries. What OS and version of WinDBG do you have? Open the exe and press "g" to load all the dlls of your application. After that, request information about any dll, for example kernel32, to get information about its symbols:
Code:
0:000:x86> !lmi kernel32    ;<---------
Loaded Module Info: [kernel32]
         Module: kernel32
   Base Address: 0000000076990000
     Image Name: C:\Windows\syswow64\kernel32.dll
   Machine Type: 332 (I386)
     Time Stamp: 66f77b38 Sat Sep 28 08:42:48 2024
           Size: 110000
       CheckSum: 11549b
Characteristics: 2102 perf
Debug Data Dirs: Type  Size     VA  Pointer
             CODEVIEW    26, d0e50,   d0e50 RSDS - GUID: {EC4B15F0-9D87-42A0-BDD2-FBA9BE736232}
               Age: 2, Pdb: wkernel32.pdb
                CLSID     4, d0e4c,   d0e4c [Data not mapped]
     Image Type: FILE     - Image read successfully from debugger.
                 C:\Windows\syswow64\kernel32.dll
    Symbol Type: PDB      - Symbols loaded successfully from symbol server.
                 c:\symbols\wkernel32.pdb\EC4B15F09D8742A0BDD2FBA9BE7362322\wkernel32.pdb    ;<---------------
    Load Report: private symbols , not source indexed
                 c:\symbols\wkernel32.pdb\EC4B15F09D8742A0BDD2FBA9BE7362322\wkernel32.pdb
PS: For questions about WinDBG, it is better to create a separate topic, since there are a lot of nuances there.
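When checking step 7 by hand, it helps to know where to look in the cache folder. Judging from the !lmi output above, the folder name is the GUID without braces and dashes followed by the Age value. The Python sketch below rebuilds that path from the fields in the listing. The function name is hypothetical, and writing the age in hexadecimal is my assumption (for Age 2 it makes no difference).

```python
def symbol_cache_path(cache_root, pdb_name, guid, age):
    """Rebuild <root>\\<pdb>\\<GUID+Age>\\<pdb> as seen in the !lmi output."""
    # Drop braces and dashes from the GUID, then append the age
    # (hexadecimal formatting of the age is an assumption).
    signature = guid.strip("{}").replace("-", "").upper() + format(age, "X")
    return "\\".join([cache_root, pdb_name, signature, pdb_name])

path = symbol_cache_path(r"c:\symbols", "wkernel32.pdb",
                         "{EC4B15F0-9D87-42A0-BDD2-FBA9BE736232}", 2)
print(path)
```

If that file exists on disk, the symbols for the module were downloaded.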
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.