flat assembler
Message board for the users of flat assembler.

RDTSC Timer

daniel02 21 Feb 2023, 06:18
Hello, I'm trying to read the RDTSC timer. Am I doing it correctly? Thanks.

Code:
typedef double                                  ADouble;
#define CALLQ           WINAPI
__int64 microseccounter;
__int64 Ticks() 
{
    __int64 cpu_count;
    DWORD h32, l32;
  _asm{
      rdtsc
      mov h32, eax
      mov l32, edx
    }
    cpu_count = h32;
    cpu_count <<= 32;
    cpu_count += l32; 
    return cpu_count;
}    




Code:
ADouble CALLQ AProcessorTicksPerSecond()
{
    static ADouble ticks = -1.0;

    if (ticks <= 0.0)
    {
        HKEY hKey;
        DWORD procSpeed;
        DWORD buflen;
        DWORD ret;

        if (!RegOpenKeyExW(HKEY_LOCAL_MACHINE, L"HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0", 0, KEY_READ, &hKey))
        {
            procSpeed = 0;
            buflen = sizeof(procSpeed);

            ret = RegQueryValueExW(hKey, L"~MHz", 0, 0, (LPBYTE)&procSpeed, &buflen);

            if (ret != ERROR_SUCCESS)
                ret = RegQueryValueExW(hKey, L"~Mhz", 0, 0, (LPBYTE)&procSpeed, &buflen);

            if (ret != ERROR_SUCCESS)
                ret = RegQueryValueExW(hKey, L"~mhz", 0, 0, (LPBYTE)&procSpeed, &buflen);

            RegCloseKey(hKey);

            if (ret == ERROR_SUCCESS)
                ticks = (ADouble)procSpeed * 1000000.0;
        }
    }

    return ticks;
}    


Code:
int main()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);
    int tpriorityX;

    tpriorityX = GetThreadPriority(GetCurrentThread());

    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
   
    __int64 count1 = Ticks();
    Sleep(1000);
    microseccounter = (Ticks() - count1) / AProcessorTicksPerSecond();
    SetThreadPriority(GetCurrentThread(), tpriorityX);
    HWND hwnd;
    system("pause");
    hwnd = GetConsoleWindow();
    ShowWindow(hwnd, SW_HIDE);
    system("pause");
    return microseccounter;
}    

Ali.Z 21 Feb 2023, 06:28
the opposite, eax is the low order and edx is high order.
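To make the point above concrete, here is a minimal C sketch of combining the two halves that RDTSC leaves in EDX:EAX. The function name `combine_tsc` and its parameters are made up for illustration; the two 32-bit inputs stand in for the register values.

```c
#include <stdint.h>

/* RDTSC returns the 64-bit counter split across two registers:
 * EAX holds bits 0..31 (low half), EDX holds bits 32..63 (high half).
 * combine_tsc is a hypothetical helper name, not part of any API. */
uint64_t combine_tsc(uint32_t eax_low, uint32_t edx_high)
{
    /* Widen the high half before shifting so no bits are lost. */
    return ((uint64_t)edx_high << 32) | (uint64_t)eax_low;
}
```

With the halves swapped, as in the first post, the fast-changing low bits end up in the upper half of the result and the value is no longer a usable tick count.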

_________________
Asm For Wise Humans

revolution 21 Feb 2023, 06:28
I don't know about the HLL code, I'll pretend it works fine.

But for rdtsc, it can't be used as a measurement of real time (i.e. seconds, minutes etc.) because the clock speed to the CPU will change frequency depending upon the workload.

Windows provides other counters that are always at a fixed frequency and can be used to measure the passing of time.
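On Windows the usual fixed-frequency pair is QueryPerformanceCounter and QueryPerformanceFrequency. Whatever the tick source is, the conversion to microseconds is the same arithmetic. The sketch below shows only that arithmetic in portable C; `ticks_to_microseconds` is a hypothetical helper name and the tick values are assumed to come from such a counter.

```c
#include <stdint.h>

/* Convert a tick delta from a fixed-frequency counter to microseconds.
 * Splitting into whole seconds and a remainder avoids overflowing the
 * multiplication for large tick counts. Assumes ticks_per_second is small
 * enough that remainder * 1000000 fits in 64 bits, which holds for any
 * realistic counter frequency. Hypothetical helper, not a Windows API. */
uint64_t ticks_to_microseconds(uint64_t ticks, uint64_t ticks_per_second)
{
    if (ticks_per_second == 0)
        return 0; /* no usable frequency */

    uint64_t whole_seconds = ticks / ticks_per_second;
    uint64_t remainder     = ticks % ticks_per_second;

    return whole_seconds * 1000000u
         + (remainder * 1000000u) / ticks_per_second;
}
```

Note that dividing a tick delta by ticks per second alone, as the first post does, yields seconds rather than microseconds.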

bitRAKE 21 Feb 2023, 06:50
revolution wrote:
rdtsc, it can't be used as a measurement of real time (i.e. seconds, minutes etc.) because the clock speed to the CPU will change frequency depending upon the workload.


Unless invariance is specifically tested for.
AMD wrote:
The behavior of the RDTSC instruction is implementation dependent. The TSC counts at a constant rate, but may be affected by power management events (such as frequency changes), depending on the processor implementation. If CPUID Fn8000_0007_EDX[TscInvariant] = 1, then the TSC rate is ensured to be invariant across all P-States, C-States, and stop-grant transitions (such as STPCLK Throttling); therefore, the TSC is suitable for use as a source of time.
... and then maybe fall back to the OS for time measurement.
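A sketch of the invariance test described in the AMD quote, in C. The TscInvariant flag is bit 8 of EDX for CPUID function 8000_0007h. The function names here are made up for illustration, and the actual CPUID query is only compiled with GCC or Clang on x86, using the `__get_cpuid` helper from `<cpuid.h>`; on other toolchains it simply reports "unknown".

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* CPUID Fn8000_0007 EDX bit 8 is the TscInvariant flag. */
#define TSC_INVARIANT_BIT 8u

/* Pure bit test on an EDX value already obtained from CPUID. */
int edx_reports_invariant_tsc(uint32_t edx)
{
    return (int)((edx >> TSC_INVARIANT_BIT) & 1u);
}

/* Returns 1 if the TSC is reported invariant, 0 if not,
 * and -1 if the leaf cannot be queried in this build. */
int cpu_has_invariant_tsc(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_reports_invariant_tsc(edx);
#else
    return -1;
#endif
}
```

A caller would use the TSC only when this returns 1 and otherwise fall back to an OS time source.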

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup

revolution 21 Feb 2023, 07:19
Re: TscInvariant

A good feature. Does it also guarantee to work across cores? Some CPUs have a separate counter for each core, and if the OS migrates the thread to a different core the count value suddenly jumps, sometimes backwards.

Also VMs can make such things unreliable. There are lots of potential hazards when the OS and underlying operating conditions can vary so much.

bitRAKE 21 Feb 2023, 08:11
RDTSCP can be useful if there is some expectation that threads are being moved around, but this would depend on how they are created/configured. External tools can move most threads - assuming they exist long enough.

For VMs TscInvariant = 0. ;)

Hazards can be mitigated if one actually wants to use the features of the processor.
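One way the RDTSCP remark can be used, sketched in C: RDTSCP returns the counter together with the IA32_TSC_AUX value in ECX, which the OS usually initializes with a processor identifier. If the identifier differs between two readings, the thread was moved and the sample can be discarded. The struct and function names below are hypothetical, and the samples are passed in as plain values rather than read from the instruction.

```c
#include <stdint.h>

/* One RDTSCP reading: the 64-bit tick count plus the IA32_TSC_AUX value
 * returned in ECX. Hypothetical type for illustration. */
struct tsc_sample {
    uint64_t ticks;
    uint32_t aux;
};

/* Stores end - start in *elapsed and returns 1 when both samples were
 * taken on the same processor and the counter did not go backwards.
 * Returns 0 (and leaves *elapsed untouched) when the sample should be
 * thrown away. */
int elapsed_on_same_cpu(struct tsc_sample start, struct tsc_sample end,
                        uint64_t *elapsed)
{
    if (start.aux != end.aux)
        return 0; /* thread migrated between the two readings */
    if (end.ticks < start.ticks)
        return 0; /* counter appears to have jumped backwards */

    *elapsed = end.ticks - start.ticks;
    return 1;
}
```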


daniel02 21 Feb 2023, 18:16
Thank you guys so much! I want to go further, into the kernel, to bypass the kernel's timing as well.

Something like this:

Code:
int testrdtsc() {
        __int64 count_per_microsec;
        const ULONG numberOfProcessors = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
        PROCESSOR_NUMBER processorNumber;
        

        NTSTATUS status = KeGetProcessorNumberFromIndex(numberOfProcessors - 1, &processorNumber);
        if (!NT_SUCCESS(status))
        {
                
                return 0;
        }

        GROUP_AFFINITY affinity, oldAffinity;
        affinity.Group = processorNumber.Group;
        affinity.Mask = 1ULL << processorNumber.Number;
        affinity.Reserved[0] = affinity.Reserved[1] = affinity.Reserved[2] = 0;
        KeSetSystemGroupAffinityThread(&affinity, &oldAffinity);

        KIRQL originalIrql;
        KeRaiseIrql(HIGH_LEVEL, &originalIrql);
        _disable();

        __int64 count1 = __rdtsc();
        LARGE_INTEGER waitTime;
        waitTime.QuadPart = -10000000; // 1 second
        KeDelayExecutionThread(KernelMode, FALSE, &waitTime);
        count_per_microsec = (__rdtsc() - count1) / (1.0E+63 / (60 * 60 * 24 * 365)) / 1.0E+9; // cycle will end when its reach 584.94 Ghz 
        _enable();
        KeLowerIrql(originalIrql);
        return count_per_microsec;
}

This one gives me amazing precision in the kernel!
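The 584.94 figure in the code comment above looks like the wrap-around period, in years, of a 64-bit counter ticking at 1 GHz. That reading is an assumption on my part, but the arithmetic is easy to check with a small C sketch (`tsc_wrap_years` is a made-up name):

```c
/* Seconds in a non-leap year, the same constant the post uses. */
#define SECONDS_PER_YEAR (60.0 * 60.0 * 24.0 * 365.0)

/* Years until a 64-bit tick counter running at tsc_hz wraps around.
 * Hypothetical helper for illustration only. */
double tsc_wrap_years(double tsc_hz)
{
    const double counter_range = 18446744073709551616.0; /* 2^64 */
    return counter_range / tsc_hz / SECONDS_PER_YEAR;
}
```

At 1 GHz this gives roughly 585 years, and proportionally less at higher clock rates (about 195 years at 3 GHz).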

I 22 Feb 2023, 02:28
RDTSC is a counter, not a timer. IIRC Windows synchronizes cores with an MSR if available and enabled, but I still prefer to use affinity with RDTSC. Windows does not recommend using RDTSC but has itself been doing so for many years to provide QPC, unless the configuration is set to use something else such as platformclock / HPET. IIRC W7 used a simple shift right by 10 (base clock or HFM divided by 1024), then W10 used divide-by-multiply to give an equivalent frequency of 10 MHz. Although this implies 100 ns resolution, QPC is only updated within the timer resolution, something like 0.5 ms at best IIRC, or maybe when an event is triggered, if things haven't changed in that respect over the years.

If you are going to store edx:eax, then maybe store the dwords to the lower / upper halves of a 64-bit variable so you don't have to do the shift. If the times are always going to be less than a 32-bit count, then you can just use eax.
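Both suggestions can be sketched in C. The first helper stores the two dwords side by side and reads them back as one 64-bit value, which only gives the right answer on a little-endian machine (true for x86); the second keeps only the low dword and relies on unsigned subtraction wrapping modulo 2^32. All names here are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host (such as x86), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Store EAX at the lower address and EDX at the upper address, then read
 * the 8 bytes as one value. No shift is needed, but the layout assumes a
 * little-endian host. */
uint64_t tsc_from_halves_overlay(uint32_t eax_low, uint32_t edx_high)
{
    uint32_t halves[2] = { eax_low, edx_high };
    uint64_t value;
    memcpy(&value, halves, sizeof value);
    return value;
}

/* For intervals shorter than 2^32 ticks the low dword is enough:
 * unsigned subtraction stays correct across one wrap of EAX. */
uint32_t tsc_delta32(uint32_t start_low, uint32_t end_low)
{
    return end_low - start_low;
}
```

At 3 GHz the 32-bit variant covers intervals of a little over 1.4 seconds before it becomes ambiguous.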

The PC isn't typically a precision instrument, so note that the reference clock changes with temperature; HPET on my system reports a fixed clock interval regardless of whether it's actually running at something else.

Or maybe I'm just remembering this all wrong?

sinsi 22 Feb 2023, 06:02
Re: RDTSC
I remember in the XP days testing this on a Q6600, start 4 threads suspended and set the affinity to each core.
The only thing the thread did was store the value from RDTSC. All 4 values were different, one was significantly lower than the others.
The HAL in 2000/XP used to start each CPU in sequence, so if the starting value is 0 they will be different.
Some later processors have a constant TSC even in sleep mode.

daniel02 22 Feb 2023, 23:24
nvm


Last edited by daniel02 on 23 Feb 2023, 08:37; edited 1 time in total

revolution 22 Feb 2023, 23:47
Save and restore esi with push/pop. But the major problem is the calling convention is completely wrong. For 64-bit you need FASTCALL registers, so RCX is the first parameter.

But how does it cause BSOD? Is that a kernel bug?! At the worst it should only cause the process to crash.

BTW: I would hesitate to call it "sleep". Your CPU will still run at full power. More like an anxious person waiting at the dentist, it burns through energy and makes itself exhausted. :?

daniel02 22 Feb 2023, 23:54
revolution wrote:
Save and restore esi with push/pop. But the major problem is the calling convention is completely wrong. For 64-bit you need FASTCALL registers, so RCX is the first parameter.

But how does it cause BSOD? Is that a kernel bug?! At the worst it should only cause the process to crash.

BTW: I would hesitate to call it "sleep". Your CPU will still run at full power. More like an anxious person waiting at the dentist, it burns through energy and makes itself exhausted. :?


Sorry, I mean it works great in user-mode Win32, not in the kernel.

Thanks, I will try fastcall.

daniel02 23 Feb 2023, 01:21
This one did the trick ;)

Code:
.code

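FasterSleep proc ticks : DWORD
        mov ecx,ecx             ; zero-extend the 32-bit tick count (upper half of RCX is undefined)
        rdtsc                   ; EDX:EAX = current TSC
        shl rdx,32
        or rax,rdx              ; RAX = full 64-bit TSC
        add rcx,rax             ; RCX = deadline = now + ticks
        _loop:
        pause                   ; spin-wait hint, repeated to poll the TSC less often
        pause
        pause
        pause
        rdtsc
        shl rdx,32
        or rax,rdx
        cmp rax,rcx
        jb _loop                ; keep spinning until the deadline is reached
        ret
FasterSleep endp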

end    

I 24 Feb 2023, 05:46
@sinsi how did you synchronize reading TSC? Even in UEFI best I could get is within 150 cycles such as
Code:
Number of CPU's 12, Run Time    27014uS 
0000007E56D3E14C
0000007E56D3E149
0000007E56D3E12D
0000007E56D3E127
0000007E56D3E16B
0000007E56D3E16E
0000007E56D3E1C1
0000007E56D3E1BB
0000007E56D3E199
0000007E56D3E193
Max difference 154 cycles    

Note that only startup AP cores shown, also core threads are typically 6 or 7 cycles apart on this system with i7-8600K. Zero TSC offsets observed which might suggest all time stamp clocks are started at the same time in this system?

W10 does make some adjustments but not much, BCLK run with SS down

Image

Usually some variance each boot, this was with SS disabled.

Image

A faster "Go" flag would be nice if someone can suggest one; I tried a memory flag and a global MSR bit, but both take time.

Example
Code:
        lea     rbx,[Go]
 @@:
        cmp     byte[rbx],0
        jz      @b

        rdtsc
        shl     rdx,32
        or      rax,rdx
        mov     [TS+rsi*8],rax     ; rsi is CPU number from WhoAmI
    


@daniel02 "jb _loop" should be jl even if the chances of wrap around are really small. There's also the interrupt-driven IA32_TSC_DEADLINE, but it may be too much work / too restrictive to implement.
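The wrap-around question can be explored in C with three deadline checks. This is only a sketch with made-up function names: the first two mirror an unsigned (jb) and a signed (jl) comparison of the raw values, and the third compares the wrapped difference instead. As far as I can tell, the signed comparison fixes the wrap at 2^64 but moves the trouble spot to the 2^63 boundary, while the difference form handles both as long as the wait is shorter than 2^63 ticks. The casts to `int64_t` assume the usual two's-complement conversion.

```c
#include <stdint.h>

/* Mirrors "cmp rax,rcx / jb _loop": raw values compared as unsigned. */
int reached_unsigned(uint64_t now, uint64_t deadline)
{
    return now >= deadline;
}

/* Mirrors "cmp rax,rcx / jl _loop": raw values compared as signed. */
int reached_signed(uint64_t now, uint64_t deadline)
{
    return (int64_t)now >= (int64_t)deadline;
}

/* Compares the wrapped difference, so the absolute position of the
 * counter does not matter, only the distance to the deadline. */
int reached_by_difference(uint64_t now, uint64_t deadline)
{
    return (int64_t)(now - deadline) >= 0;
}
```

For example, with the counter at 0xFFFFFFFFFFFFFFF0 and a deadline 0x20 ticks later (which wraps to 0x10), the unsigned check reports the deadline as already reached, while the other two keep waiting.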

sinsi 24 Feb 2023, 06:36
Quote:
@sinsi how did you synchronize reading TSC? Even in UEFI best I could get is within 150 cycles such as

This was under XP, so CreateThread/SetThreadAffinityMask then activate the threads.

I 24 Feb 2023, 11:39
For me with just resume, RDTSC spans over 12 million cycles.
Code:
        xor     ebx,ebx
  @@:
        xor     ecx,ecx                                       ; pSecurity Attributes
        mov     edx,0x1000                                    ; Initial stack size
        lea     r8,[CpuThread]                                ; Thread start address
        mov     r9d,ebx                                       ; Parameter Processor Number
        mov     qword[rsp+20h],CREATE_SUSPENDED
        mov     qword[rsp+28h],0                              ; pThreadID
        call    [CreateThread]
        mov     [rbx*8+hThread],rax                           ; Thread handle

        mov     cl,bl
        mov     edx,4                                         ; start affinity from second core
        shl     edx,cl
        mov     rcx,rax
        call    [SetThreadAffinityMask]
        inc     ebx
        cmp     ebx,10                                        ; use 10 of 12 logical CPU's
        jb      @b

        xor     ebx,ebx                                       ; Resume threads
 @@:
        mov     rcx,[rbx*8+hThread]
        call    [ResumeThread]
        inc     ebx
        cmp     ebx,10
        jb      @b
    


Adding "Go" flag to workers gets it near 150 if all threads are active and not thrown off by the dispatcher.
Image

daniel02 24 Feb 2023, 22:21
Another way to read RDTSC with amazing precision in the kernel, without suffering from KeDelayExecutionThread or any sleep, thanks to Ryan Geiss:

Code:
extern "C" __forceinline void __fastcall Clocks(unsigned int dest);    


Code:
 .code

Clocks proc dest : DWORD
rdtsc
mov eax, dest
shl rdx, 32
or rax, rdx
ret
Clocks endp

end      


Code:
double RDTSCDirect(unsigned __int64 frequency)
{
        // returns < 0 on failure; otherwise, returns current cpu time, in seconds.
        // warning: watch out for wraparound!

        if (frequency == 0)
                return -1.0;

        // get high-precision time:
        __try
        {
                unsigned __int64 high_perf_time;
                unsigned __int64* dest = &high_perf_time;
                Clocks(dest);
                __int64 time_s = (__int64)(high_perf_time / frequency);  // unsigned->sign conversion should be safe here
                __int64 time_fract = (__int64)(high_perf_time % frequency);  // unsigned->sign conversion should be safe here
                // note: here, we wrap the timer more frequently (once per year) 
                // than it otherwise would (VERY RARELY - once every 585 years on
                // a 1 GHz), to alleviate floating-point precision errors that start 
                // to occur when you get to very high counter values.  
                double ret = (time_s % (60 * 60 * 24 * 365)) + (double)time_fract / (double)((__int64)frequency);
                return ret;
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
                return -1.0;
        }

        return -1.0;
}    
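The whole-seconds / fraction split used in RDTSCDirect can be isolated as a small portable C function. This is a sketch of the same arithmetic only, with a made-up name, taking the tick count as a parameter instead of reading the counter:

```c
#include <stdint.h>

/* Seconds in a non-leap year, as in the post's 60 * 60 * 24 * 365. */
#define SECONDS_PER_YEAR_U 31536000u

/* Convert ticks to seconds as a double. Whole seconds and the fractional
 * part are computed separately in integer arithmetic, and the whole part
 * is wrapped once per year so the double never has to hold a very large
 * value together with a small fraction. Returns -1.0 when frequency is 0.
 * Hypothetical helper for illustration. */
double ticks_to_seconds(uint64_t ticks, uint64_t frequency)
{
    if (frequency == 0)
        return -1.0;

    uint64_t whole_seconds  = ticks / frequency;
    uint64_t fraction_ticks = ticks % frequency;

    return (double)(whole_seconds % SECONDS_PER_YEAR_U)
         + (double)fraction_ticks / (double)frequency;
}
```

Because of the yearly wrap, two readings should be compared as differences, and a negative difference means the wrap happened in between.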


Code:
void testrdtsc(unsigned __int64 freq) {
        __int64 count_per_microsec;
        const ULONG numberOfProcessors = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
        PROCESSOR_NUMBER processorNumber;
        

        NTSTATUS status = KeGetProcessorNumberFromIndex(numberOfProcessors - 1, &processorNumber);
        if (!NT_SUCCESS(status))
        {
                
                return;
        }

        GROUP_AFFINITY affinity, oldAffinity;
        affinity.Group = processorNumber.Group;
        affinity.Mask = 1ULL << processorNumber.Number;
        affinity.Reserved[0] = affinity.Reserved[1] = affinity.Reserved[2] = 0;
        KeSetSystemGroupAffinityThread(&affinity, &oldAffinity);

        KIRQL originalIrql;
        KeRaiseIrql(HIGH_LEVEL, &originalIrql);
        _disable();

        ascwq(freq);

        _enable();
        KeLowerIrql(originalIrql);
}
    

revolution 24 Feb 2023, 22:32
daniel02 wrote:
Code:
rdtsc
mov eax, dest    
You just overwrote the lower 32 bits with the value of dest. That reduces your precision by 9 orders of magnitude.
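The "9 orders of magnitude" follows from 2^32 being about 4.3 billion: once the low dword is gone, the value only changes every 2^32 ticks. A small C sketch of that arithmetic, with a made-up function name:

```c
#include <stdint.h>

/* Time step, in seconds, of a tick counter whose lowest dropped_low_bits
 * bits have been lost, for a counter running at tsc_hz. Returns -1.0 for
 * bit counts that do not fit a 64-bit counter. Hypothetical helper. */
double lost_granularity_seconds(unsigned dropped_low_bits, double tsc_hz)
{
    if (dropped_low_bits >= 64u)
        return -1.0;

    /* Number of ticks per step of the truncated value: 2^dropped_low_bits. */
    double ticks_per_step = (double)(1ULL << dropped_low_bits);
    return ticks_per_step / tsc_hz;
}
```

With all 64 bits intact a 1 GHz counter resolves 1 ns; with the low 32 bits overwritten the step becomes about 4.3 seconds.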

daniel02 25 Feb 2023, 05:05
revolution wrote:
daniel02 wrote:
Code:
rdtsc
mov eax, dest    
You just overwrote the lower 32 bits with the value of dest. That reduces your precision by 9 orders of magnitude.


Thank you very much, I fixed it. I enjoy playing games now. I hope Microsoft will one day find a way to read RDTSC directly.

sinsi 25 Feb 2023, 08:18