flat assembler
Message board for the users of flat assembler.

System time acceleration after modifying QPC

SoHigh200



Joined: 18 Nov 2025
Posts: 3
SoHigh200 18 Nov 2025, 20:34
Hello, I am conducting research into Windows timing mechanisms and have modified RtlQueryPerformanceCounter in ntdll to return raw Time Stamp Counter values directly from RDTSC instead of the normalized values that Windows typically provides. The modification is implemented through kernel driver patches to the shared ntdll section.
The Problem:
After applying this modification, I observe that system time advances approximately four hundred times faster than real time. The system clock rapidly jumps forward, with the Windows Time Service showing synchronization timestamps weeks into the future. This occurs because RtlGetSystemTimePrecise and related functions calculate elapsed time by combining QueryPerformanceCounter deltas with SharedUserData calibration parameters at offsets 0x358 and 0x360. These parameters were established during boot initialization for 10 MHz normalized counter values, but my modified QueryPerformanceCounter returns raw TSC values at approximately 4 GHz, creating a frequency mismatch of roughly 400 to 1.
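The factor of four hundred follows directly from the two frequencies. A minimal sketch of the arithmetic in C, using the 4.008 GHz figure from the Environment section; the function names are made up for illustration and are not part of any Windows API:

```c
#include <stdint.h>

/* Factor by which elapsed time is overstated when a counter running at
 * raw_hz is interpreted with calibration meant for assumed_hz. */
double time_speedup_factor(uint64_t raw_hz, uint64_t assumed_hz)
{
    return (double)raw_hz / (double)assumed_hz;
}

/* Seconds the clock would appear to advance during real_seconds of
 * wall time under that mismatch. */
double apparent_seconds(double real_seconds, uint64_t raw_hz, uint64_t assumed_hz)
{
    return real_seconds * time_speedup_factor(raw_hz, assumed_hz);
}
```

With raw_hz = 4008000000 and assumed_hz = 10000000 the factor is about 400.8, so one real hour would show up as roughly 16.7 days of clock time, which fits the "weeks into the future" symptom.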
Current Solution:
I have resolved the time acceleration by patching RtlGetSystemTimePrecise to read SystemTime directly from SharedUserData at offset 0x14 using the proper sequence lock protocol, bypassing any QueryPerformanceCounter-based calculation. The x64 assembly implementation reads the KSYSTEM_TIME structure atomically and returns the kernel-maintained time value.
Code:
static UCHAR systemTimePatchCode[] = {
        0x49, 0xBA, 0x14, 0x00, 0xFE, 0x7F, 0x00, 0x00, 0x00, 0x00,  // movabs r10, 0x7FFE0014
        // retry_loop:
        0x45, 0x8B, 0x42, 0x04,                    // mov r8d, [r10+4]     (High1Time)
        0x41, 0x8B, 0x02,                          // mov eax, [r10]       (LowPart)
        0x45, 0x3B, 0x42, 0x08,                    // cmp r8d, [r10+8]     (High2Time)
        0x75, 0xF3,                                // jne retry_loop
        0x49, 0xC1, 0xE0, 0x20,                    // shl r8, 0x20
        0x4C, 0x09, 0xC0,                          // or rax, r8
        0x48, 0x89, 0x01,                          // mov [rcx], rax
        0xC3                                       // ret
    };

    // Locate RtlGetSystemTimePrecise
    PVOID systemTimeAddr = FindExportedFunctionAddres(mappedBase, "RtlGetSystemTimePrecise");
    if (!systemTimeAddr) {
        DbgPrint("[PhysPatch] Failed to locate RtlGetSystemTimePrecise\n");
        status = STATUS_NOT_FOUND;
        __leave;
    }    

This approach successfully prevents system time corruption and allows QueryPerformanceCounter to continue providing raw TSC values for high-performance interval timing. However, the precision of RtlGetSystemTimePrecise is reduced from microsecond-level resolution to approximately fifteen millisecond resolution, because the kernel updates SystemTime only during timer interrupts, which arrive at that interval.
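For readers who prefer C to raw opcode bytes, the sequence-lock read described above can be sketched roughly as follows. This is only an illustration: the struct name is hypothetical, it works on an ordinary in-memory copy rather than the real address 0x7FFE0014, and it relies on volatile plus x86 load ordering instead of explicit barriers.

```c
#include <stdint.h>

/* Mirrors the three 32-bit fields the patch reads at +0, +4 and +8.
 * SYSTEM_TIME_FIELDS is a hypothetical name used only in this sketch. */
typedef struct {
    volatile uint32_t LowPart;
    volatile int32_t  High1Time;
    volatile int32_t  High2Time;
} SYSTEM_TIME_FIELDS;

/* Read High1Time, then LowPart, then compare against High2Time.
 * If the two high copies differ, a writer was mid-update, so retry. */
int64_t read_system_time(const SYSTEM_TIME_FIELDS *t)
{
    int32_t  high;
    uint32_t low;
    do {
        high = t->High1Time;
        low  = t->LowPart;
    } while (high != t->High2Time);
    /* Join the halves the same way the shl/or pair does in the patch. */
    return (int64_t)(((uint64_t)(uint32_t)high << 32) | low);
}
```

The loop has the same shape as the retry_loop in the patch bytes: it only exits once both high words agree around the low-word read.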
The Question:
Are there alternative approaches that maintain both correctness and microsecond-level precision when QueryPerformanceCounter returns raw TSC values at processor frequency? Specifically, I am interested in:
Methods for recalibrating SharedUserData fields like QpcSystemTimeIncrement and BaselineSystemTimeQpc to work with raw TSC frequency while accounting for the kernel continuously updating these values during timer interrupts.
Techniques for implementing independent TSC-based time calculations that remain synchronized with kernel-maintained system time while providing continuously advancing timestamps between timer interrupt intervals.
Architectural approaches that allow high-performance raw TSC access through QueryPerformanceCounter while preserving the high-resolution characteristics that applications expect from RtlGetSystemTimePrecise.
I have investigated recalibrating the SharedUserData increment values but discovered that kernel timer interrupt handlers overwrite the baseline fields multiple times per second, invalidating any calibration attempts. I am seeking guidance on whether there exist established patterns or techniques for addressing this challenge within the Windows timing architecture.
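To make the second item in the list above more concrete, here is a rough sketch of interpolating between timer interrupts from a private anchor. It assumes something outside this code captures a (system time, TSC) pair each time the kernel-maintained value changes; the TIME_ANCHOR type and both function names are hypothetical and nothing here touches the real SharedUserData fields.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 10000000ULL  /* system time counts 100 ns units */

/* Hypothetical snapshot taken whenever the kernel-maintained time changes. */
typedef struct {
    uint64_t base_time;  /* system time at the snapshot, in 100 ns units */
    uint64_t base_tsc;   /* raw TSC value read at the same moment */
    uint64_t tsc_hz;     /* measured TSC frequency in Hz */
} TIME_ANCHOR;

/* Convert a TSC delta to 100 ns units. Splitting into whole seconds and
 * remainder keeps every product within 64 bits as long as tsc_hz stays
 * below roughly 1.8e12 Hz. */
uint64_t tsc_delta_to_ticks(uint64_t delta, uint64_t tsc_hz)
{
    uint64_t whole = delta / tsc_hz;
    uint64_t rem   = delta % tsc_hz;
    return whole * TICKS_PER_SECOND + (rem * TICKS_PER_SECOND) / tsc_hz;
}

/* Anchor time plus the TSC time elapsed since the anchor was taken. */
uint64_t interpolate_time(const TIME_ANCHOR *a, uint64_t tsc_now)
{
    return a->base_time + tsc_delta_to_ticks(tsc_now - a->base_tsc, a->tsc_hz);
}
```

At 4.008 GHz, 4008 TSC counts come out as 10 units of 100 ns, i.e. one microsecond. The hard part that this sketch leaves open is taking the anchor consistently and keeping the result monotonic when a new anchor arrives.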
Environment:
Windows 10 build 19044, x64 architecture, TSC frequency approximately 4.008 gigahertz, kernel driver implementation with system-wide ntdll patching.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1744
Location: Toronto, Canada
AsmGuru62 18 Nov 2025, 23:55
Nice research, but I think you can get the same value (no patching needed) using this function, designed for HLL:
https://learn.microsoft.com/en-us/cpp/intrinsics/rdtsc?view=msvc-170
SoHigh200



Joined: 18 Nov 2025
Posts: 3
SoHigh200 20 Nov 2025, 01:07
AsmGuru62 wrote:
Nice research, but I think you can get the same value (no patching needed) using this function, designed for HLL:
https://learn.microsoft.com/en-us/cpp/intrinsics/rdtsc?view=msvc-170



After extensive investigation and multiple implementation attempts, I have to report that I cannot achieve both raw Time Stamp Counter access through QueryPerformanceCounter and accurate system clock progression simultaneously within the Windows kernel architecture. This conclusion is based on concrete technical barriers rather than implementation challenges, and I want to give a complete explanation of what I discovered and the options available moving forward.
My research goal was to modify QueryPerformanceCounter to return raw processor cycle counts at the actual TSC frequency while maintaining normal system time progression. I successfully implemented the first part by patching the ntdll shared library section to return raw TSC values directly from the RDTSC instruction. However, this modification creates a frequency mismatch with the kernel's timing calibration parameters, causing the system clock to advance approximately four hundred times faster than real time.
I attempted several approaches to resolve this time acceleration issue. The first approach involved directly modifying the kernel's SharedUserData calibration parameters to account for the raw TSC frequency. While I successfully computed mathematically correct calibration values using the Windows-provided RtlGenerateQpcToIncrementConstants function, writing these values to SharedUserData caused the system clock to freeze completely. This occurred because the kernel maintains baseline performance counter values synchronized with the previous normalized frequency regime, creating an irreconcilable mismatch when I changed only the calibration parameters.
The second approach attempted to hook the kernel's KiUpdateSystemTime function to substitute corrected calibration parameters during timer interrupt processing. This implementation encountered critical failures related to executable memory allocation for the hook trampoline, resulting in system crashes with access violation exceptions. Even with corrected executable memory allocation, the complexity of maintaining synchronization between all timing subsystem components during the transition from normalized to raw counter values proved architecturally problematic.
The third approach proposed hooking KeQueryPerformanceCounter to provide context-aware return values, scaling raw TSC to normalized frequency for kernel timer interrupts while returning raw values for user-mode applications. However, this function is protected by Control Flow Guard, a kernel security feature that prevents modification of critical system functions to protect against exploitation. This represents a fundamental architectural barrier that cannot be circumvented without disabling core security features.
The technical constraints I encountered reflect deliberate design decisions by Microsoft to protect system timing integrity. The Windows kernel implements multiple layers of security and consistency checks specifically designed to prevent tampering with timing infrastructure, as accurate timekeeping is fundamental to scheduler operation, file system integrity, security event logging, and numerous other critical system functions.
Given these architectural limitations, my only option is to accept the time acceleration and move on.
bitshifter



Joined: 04 Dec 2007
Posts: 805
Location: Massachusetts, USA
bitshifter 20 Nov 2025, 01:56
Measuring time on multi core systems is not simple.
Agner Fog has nice low level performance timing code for many systems.
Maybe the answer can be found here.
https://agner.org/optimize/testp.zip
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20774
Location: In your JS exploiting you and your system
revolution 20 Nov 2025, 06:03
SoHigh200 wrote:
... return raw processor cycle counts at the actual TSC frequency ...
The TSC doesn't return CPU cycle counts, BTW. It returns a count that might, or might not, be a fixed frequency (depends upon CPU), and might, or might not be, clocked at the same rate as the CPU (also depends upon the CPU). Be aware that each different core can return a different value, also depends upon the CPU.

For almost all CPUs manufactured today the TSC runs at a fixed frequency, and is in no way related to the CPU cycle counts. But each CPU is different and future CPUs might use it differently.
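Whether a given CPU has a fixed-rate TSC can be checked at run time: CPUID leaf 0x80000007 reports an "invariant TSC" flag in EDX bit 8. A small C sketch, assuming GCC or Clang on x86 for the <cpuid.h> helper; the two function names are made up for this example:

```c
#include <stdint.h>

/* CPUID leaf 0x80000007 reports the invariant-TSC flag in EDX bit 8. */
int edx_reports_invariant_tsc(uint32_t edx)
{
    return (int)((edx >> 8) & 1u);
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Returns 1 if the CPU reports an invariant TSC, 0 if it does not,
 * and -1 if the extended leaf is not supported at all. */
int cpu_has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_reports_invariant_tsc(edx);
}
#endif
```

A result of 1 only says the rate is constant across power states. It says nothing about what that rate is or whether every core reads the same value.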
AsmGuru62



Joined: 28 Jan 2004
Posts: 1744
Location: Toronto, Canada
AsmGuru62 20 Nov 2025, 13:54
To measure (or research) performance, the same QueryPerformanceCounter function can be used.
All that is needed is a long loop in which the only difference is the coding approach being tested.
Here is an example:
Code:
; ---------------------------------------------------------------------------
; PROGRAM ENTRY POINT
; ---------------------------------------------------------------------------
align 16
start:
    mov     ebx, ParamTest_PushMem
    call    ParamTest_Main

    mov     ebx, ParamTest_PushReg
    call    ParamTest_Main

    int3
    ;
    ; Here:
    ;   ST0 = relative time taken by 'ParamTest_PushReg'
    ;   ST1 = relative time taken by 'ParamTest_PushMem'
    ;
    nop
    nop
    nop

    invoke  ExitProcess, 0
    

The rest of the functions follow:
Code:
; ---------------------------------------------------------------------------
; FILE: ParamTest.Asm
; DATE: November 20, 2025
; ---------------------------------------------------------------------------
align 16
proc ParamTest_PushMem uses ecx, param1:DWORD, param2:DWORD, param3:DWORD
; ---------------------------------------------------------------------------
    invoke  MulDiv, [param1], [param2], [param3]
    ret
endp

align 16
proc ParamTest_PushReg uses ecx, param1:DWORD, param2:DWORD, param3:DWORD
; ---------------------------------------------------------------------------
    mov     eax, [param1]
    mov     ecx, [param2]
    mov     edx, [param3]
    invoke  MulDiv, eax, ecx, edx
    ret
endp

align 16
proc ParamTest_Main
; ---------------------------------------------------------------------------
; INPUT:
;   ebx = pointer to a function to test
; ---------------------------------------------------------------------------
    local   low_int64:DWORD
    local   high_int64:DWORD
    ;
    ; Get the counter before code being tested
    ;
    lea     esi, [low_int64]
    invoke  QueryPerformanceCounter, esi
    fild    qword [esi]
    ;
    ; Call the code being tested in a long loop
    ;
    mov     ecx, 0x10000000
@@:
    stdcall ebx, 100, 25, 4
    loop    @r
    ;
    ; Get the counter after code being tested
    ;
    invoke  QueryPerformanceCounter, esi
    fild    qword [esi]
    ;
    ; Subtract two counter values
    ;
    fsubrp
    ;
    ; ST0 = difference between counter values.
    ; Just look at it in debugger later on.
    ;
    ret
endp
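
For comparison, the same measure-around-a-long-loop pattern looks roughly like this in C. It is only a sketch: the standard clock() function stands in for QueryPerformanceCounter, and muldiv_plain is a hypothetical replacement for MulDiv without its rounding and overflow handling.

```c
#include <time.h>

typedef int (*test_fn)(int, int, int);

/* Stand-in for MulDiv: multiply in 64 bits, then divide. */
int muldiv_plain(int a, int b, int c)
{
    return (int)(((long long)a * b) / c);
}

/* Calls fn `iterations` times with the same arguments as the assembly
 * version and returns the elapsed clock() ticks. The volatile sink keeps
 * the compiler from dropping the calls. */
double time_loop(test_fn fn, unsigned long iterations, int *last_result)
{
    volatile int sink = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; ++i)
        sink = fn(100, 25, 4);
    clock_t end = clock();
    if (last_result)
        *last_result = sink;
    return (double)(end - start);
}
```

As with the assembly version, only the ratio between two runs of time_loop with different functions is meaningful, not the absolute tick count.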
    
SoHigh200



Joined: 18 Nov 2025
Posts: 3
SoHigh200 21 Nov 2025, 21:46
AsmGuru62 wrote:

I fixed it by patching directly into the system process, thanks :)

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.