flat assembler
Message board for the users of flat assembler.
RDTSC Timer
Ali.Z 21 Feb 2023, 06:28
It's the opposite: eax holds the low-order 32 bits and edx the high-order 32 bits.
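To make the ordering concrete, here is a minimal C sketch of how the two halves combine into one 64-bit count. The function name `tsc_from_halves` is made up for illustration; it only mirrors the usual `shl rdx,32` / `or rax,rdx` sequence.

```c
#include <stdint.h>

/* Combine the two registers written by RDTSC into one 64-bit value.
 * eax_low  = low-order 32 bits (EAX)
 * edx_high = high-order 32 bits (EDX)
 * Hypothetical helper name, for illustration only. */
static inline uint64_t tsc_from_halves(uint32_t eax_low, uint32_t edx_high)
{
    return ((uint64_t)edx_high << 32) | eax_low;
}
```

Swapping the two arguments would put the fast-changing bits in the upper half, which is the mistake being corrected here.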
revolution 21 Feb 2023, 06:28
I don't know about the HLL code, I'll pretend it works fine.
But rdtsc can't be used as a measurement of real time (i.e. seconds, minutes etc.) because the CPU clock frequency changes depending upon the workload. Windows provides other counters that always run at a fixed frequency and can be used to measure the passing of time.
bitRAKE 21 Feb 2023, 06:50
revolution wrote: rdtsc, it can't be used as a measurement of real time (i.e. seconds, minutes etc.) because the clock speed to the CPU will change frequency depending upon the workload.
Unless invariance is specifically tested for.
AMD wrote: The behavior of the RDTSC instruction is implementation dependent. The TSC counts at a constant rate, but may be affected by power management events (such as frequency changes), depending on the processor implementation. If CPUID Fn8000_0007_EDX[TscInvariant] = 1, then the TSC rate is ensured to be invariant across all P-States, C-States, and stop-grant transitions (such as STPCLK Throttling); therefore, the TSC is suitable for use as a source of time.
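A rough sketch of testing for that bit from C, assuming GCC or Clang on x86 (the `<cpuid.h>` helper `__get_cpuid` is theirs; MSVC would need `__cpuid` from `<intrin.h>` instead, which is not shown). The function names are made up for illustration. TscInvariant is bit 8 of EDX for CPUID function 8000_0007h.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* TscInvariant is bit 8 of EDX returned by CPUID function 8000_0007h. */
static inline bool tsc_invariant_from_edx(uint32_t edx)
{
    return ((edx >> 8) & 1u) != 0;
}

/* Returns true only if the CPU reports an invariant TSC.
 * On compilers/targets without <cpuid.h> this conservatively says false. */
static inline bool cpu_has_invariant_tsc(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;
    return tsc_invariant_from_edx(edx);
#else
    return false;
#endif
}
```

If the check fails, falling back to an OS-provided counter is the safer choice.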
revolution 21 Feb 2023, 07:19
Re: TscInvariant
A good feature. Is it also guaranteed to work across cores? Some CPUs have a separate counter for each core, and if the OS migrates the thread to a different core the count value suddenly jumps, sometimes backwards. Also VMs can make such things unreliable. There are lots of potential hazards when the OS and underlying operating conditions can vary so much.
bitRAKE 21 Feb 2023, 08:11
RDTSCP can be useful if there is some expectation that threads are being moved around, but this would depend on how they are created/configured. External tools can move most threads - assuming they exist long enough.
For VMs TscInvariant = 0. Hazards can be mitigated if one actually wants to use the features of the processor.
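As a sketch of why RDTSCP helps here: it returns the contents of IA32_TSC_AUX together with the counter, and the OS typically loads that register with a value identifying the logical processor. Comparing the value before and after a measurement shows whether the thread moved. The struct and function names below are made up for illustration, and `read_tscp` assumes a CPU that supports RDTSCP.

```c
#include <stdbool.h>
#include <stdint.h>

/* One RDTSCP reading: the counter plus the IA32_TSC_AUX value. */
typedef struct {
    uint64_t tsc;
    uint32_t aux;
} tsc_sample;

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
/* Assumes the CPU supports RDTSCP; otherwise the instruction faults. */
static inline tsc_sample read_tscp(void)
{
    unsigned int aux = 0;
    tsc_sample s;
    s.tsc = __rdtscp(&aux);
    s.aux = aux;
    return s;
}
#endif

/* An interval is only trusted if both readings carry the same AUX value
 * (no migration observed) and the counter did not go backwards. */
static inline bool interval_usable(tsc_sample start, tsc_sample end)
{
    return start.aux == end.aux && end.tsc >= start.tsc;
}
```

A measurement that fails `interval_usable` would simply be discarded and repeated.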
daniel02 21 Feb 2023, 18:16
Thank you guys so much! I want to go further, into the kernel, to bypass the kernel's time functions as well.
Something like this:
Code:
int testrdtsc()
{
    __int64 count_per_microsec;
    const ULONG numberOfProcessors = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
    PROCESSOR_NUMBER processorNumber;
    NTSTATUS status = KeGetProcessorNumberFromIndex(numberOfProcessors - 1, &processorNumber);
    if (!NT_SUCCESS(status)) {
        return 0;
    }

    // Pin the thread to the last logical processor.
    GROUP_AFFINITY affinity, oldAffinity;
    affinity.Group = processorNumber.Group;
    affinity.Mask = 1ULL << processorNumber.Number;
    affinity.Reserved[0] = affinity.Reserved[1] = affinity.Reserved[2] = 0;
    KeSetSystemGroupAffinityThread(&affinity, &oldAffinity);

    KIRQL originalIrql;
    KeRaiseIrql(HIGH_LEVEL, &originalIrql);
    _disable();

    __int64 count1 = __rdtsc();
    LARGE_INTEGER waitTime;
    waitTime.QuadPart = -10000000; // 1 second (relative, in 100 ns units)
    // NOTE: KeDelayExecutionThread is documented for IRQL <= APC_LEVEL,
    // so this call is not valid at HIGH_LEVEL with interrupts disabled.
    KeDelayExecutionThread(KernelMode, FALSE, &waitTime);
    count_per_microsec = (__rdtsc() - count1) / (1.0E+63 / (60 * 60 * 24 * 365)) / 1.0E+9; // cycle will end when its reach 584.94 Ghz

    _enable(); // re-enable interrupts before lowering the IRQL
    KeLowerIrql(originalIrql);
    KeRevertToUserGroupAffinityThread(&oldAffinity); // restore the previous affinity
    return (int)count_per_microsec;
}
This one gives me amazing precision in the kernel!
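For reference, the arithmetic a calibration like this usually boils down to is just the TSC delta divided by the known wait time. This is a plain C sketch with a made-up helper name, not the poster's formula; it assumes the wait time is known in microseconds from some other clock.

```c
#include <assert.h>
#include <stdint.h>

/* TSC ticks per microsecond from two counter readings taken
 * elapsed_us microseconds apart (elapsed_us measured independently).
 * Unsigned subtraction keeps the delta correct across a 64-bit wrap. */
static inline uint64_t ticks_per_us(uint64_t tsc_start, uint64_t tsc_end,
                                    uint64_t elapsed_us)
{
    assert(elapsed_us != 0);
    return (tsc_end - tsc_start) / elapsed_us;
}
```

With a one second wait (1,000,000 microseconds) and a 3 GHz TSC, the delta is about 3,000,000,000 ticks, giving 3000 ticks per microsecond.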
I 22 Feb 2023, 02:28
RDTSC is a counter, not a timer. IIRC Windows synchronizes the cores via MSR if available and enabled, but I still prefer to use affinity with RDTSC. Microsoft does not recommend using RDTSC, yet Windows itself has used it for many years to provide QPC, unless the configuration is set to use something else such as platformclock / HPET. IIRC Windows 7 used a simple shift right by 10 (dividing the base clock or HFM by 1024), and Windows 10 then used division by multiplication to give an equivalent frequency of 10 MHz. Although this implies 100 ns resolution, IIRC QPC is only updated within the timer resolution, something like 0.5 ms at best, or maybe when an event is triggered, if things haven't changed in that respect over the years.
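A small sketch of the two scalings described above, in portable C. The function names are made up, the TSC frequency is assumed to be known, and plain integer division stands in for the reciprocal-multiply trick, so this shows the intended result rather than how Windows actually computes it.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-based scaling as described for Windows 7: divide by 1024. */
static inline uint64_t scale_shift10(uint64_t tsc)
{
    return tsc >> 10;
}

/* Convert TSC ticks to 100 ns units (an equivalent 10 MHz counter).
 * Splitting into whole seconds and remainder avoids overflowing the
 * intermediate product for any realistic tsc_hz. */
static inline uint64_t ticks_to_100ns(uint64_t ticks, uint64_t tsc_hz)
{
    const uint64_t units_per_sec = 10000000u; /* 10 MHz */
    assert(tsc_hz != 0);
    uint64_t whole = ticks / tsc_hz;
    uint64_t rem = ticks % tsc_hz;
    return whole * units_per_sec + (rem * units_per_sec) / tsc_hz;
}
```

At 3 GHz, one 100 ns unit corresponds to 300 TSC ticks, so the scaled counter is far coarser than the raw TSC.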
If you are going to store edx:eax, then maybe store the dwords to the lower / upper halves of a 64-bit variable so you don't have to do the shift. If times are always going to be less than a 32-bit count, you can just use eax. The PC isn't typically a precision instrument, so note that the reference clock changes with temperature; HPET on my system reports a fixed clock interval regardless of whether it's actually running at something else. Or maybe I'm just remembering this all wrong?
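The "just use eax" case can be sketched in C as below (helper name made up). Unsigned 32-bit subtraction wraps modulo 2^32, so the difference stays correct even if the low dword rolls over once between the two readings, provided the true interval is shorter than 2^32 ticks.

```c
#include <stdint.h>

/* Elapsed ticks using only the low 32 bits (EAX) of two RDTSC readings.
 * Valid only when the real interval is below 2^32 ticks. */
static inline uint32_t elapsed32(uint32_t start_low, uint32_t end_low)
{
    return (uint32_t)(end_low - start_low);
}
```

At a few GHz, 2^32 ticks is only on the order of a second, so this suits short measurements only.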
sinsi 22 Feb 2023, 06:02
Re: RDTSC
I remember testing this in the XP days on a Q6600: start 4 threads suspended and set each one's affinity to a different core. The only thing each thread did was store the value from RDTSC. All 4 values were different, and one was significantly lower than the others. The HAL in 2000/XP used to start each CPU in sequence, so if the starting value is 0 they will be different. Some later processors have a constant TSC even in sleep mode.
daniel02 22 Feb 2023, 23:24
nvm
Last edited by daniel02 on 23 Feb 2023, 08:37; edited 1 time in total
revolution 22 Feb 2023, 23:47
Save and restore esi with push/pop. But the major problem is the calling convention is completely wrong. For 64-bit you need FASTCALL registers, so RCX is the first parameter.
But how does it cause BSOD? Is that a kernel bug?! At the worst it should only cause the process to crash.
BTW: I would hesitate to call it "sleep". Your CPU will still run at full power. More like an anxious person waiting at the dentist, it burns through energy and makes itself exhausted.
daniel02 22 Feb 2023, 23:54
revolution wrote: Save and restore esi with push/pop. But the major problem is the calling convention is completely wrong. For 64-bit you need FASTCALL registers, so RCX is the first parameter.
Sorry, I meant that it works great in user mode (Win32), just not in the kernel. Thanks, I will try fastcall.
daniel02 23 Feb 2023, 01:21
revolution wrote: Save and restore esi with push/pop. But the major problem is the calling convention is completely wrong. For 64-bit you need FASTCALL registers, so RCX is the first parameter.
This one did the trick:
Code:
.code
FasterSleep proc ticks : DWORD
    mov ecx, ecx        ; zero-extend the 32-bit tick count (upper half of RCX is not guaranteed clear)
    rdtsc
    shl rdx, 32
    or  rax, rdx        ; rax = current 64-bit TSC
    add rcx, rax        ; rcx = deadline = now + ticks
_loop:
    pause
    pause
    pause
    pause
    rdtsc
    shl rdx, 32
    or  rax, rdx
    cmp rax, rcx
    jb  _loop           ; spin until the TSC reaches the deadline
    ret
FasterSleep endp
end
I 24 Feb 2023, 05:46
@sinsi how did you synchronize reading TSC? Even in UEFI the best I could get is within 150 cycles, such as
Code:
Number of CPU's 12, Run Time 27014uS
0000007E56D3E14C
0000007E56D3E149
0000007E56D3E12D
0000007E56D3E127
0000007E56D3E16B
0000007E56D3E16E
0000007E56D3E1C1
0000007E56D3E1BB
0000007E56D3E199
0000007E56D3E193
Max difference 154 cycles
Note that only startup AP cores are shown; also, core threads are typically 6 or 7 cycles apart on this system with i7-8600K. Zero TSC offsets observed, which might suggest all time stamp clocks are started at the same time in this system? W10 does make some adjustments but not much. BCLK run with SS down. Usually some variance each boot; this was with SS disabled.
A faster "Go" flag would be nice if someone can suggest one; tried mem and a global msr bit, but it takes time. Example
Code:
    lea rbx,[Go]
@@:
    cmp byte[rbx],0
    jz @b
    rdtsc
    shl rdx,32
    or rax,rdx
    mov [TS+rsi*8],rax ; rsi is CPU number from WhoAmI
@daniel02 "jb _loop" should be jl even if the chances of wrap around are really small. There's also the interrupt driven IA32_TSC_DEADLINE, but maybe too much work / too restrictive to implement.
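On the wrap-around point, one common wrap-tolerant idiom (not necessarily what was meant above) is to test the sign of the difference rather than compare the two values directly. A C sketch with a made-up helper name; it assumes the deadline is always less than 2^63 ticks ahead and relies on the usual two's-complement conversion to `int64_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once 'now' has reached or passed 'deadline', even if the
 * deadline was computed past the 64-bit wrap point.
 * Assumes the deadline is less than 2^63 ticks in the future and that
 * uint64_t -> int64_t conversion is two's complement (true on
 * mainstream compilers). */
static inline bool deadline_reached(uint64_t now, uint64_t deadline)
{
    return (int64_t)(now - deadline) >= 0;
}
```

For example, with a start of 0xFFFFFFFFFFFFFFFA and 100 ticks to wait, the deadline wraps to 94; a plain unsigned `now < deadline` test would end the loop immediately, while the difference test keeps waiting until the counter itself has wrapped past 94.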
sinsi 24 Feb 2023, 06:36
Quote: @sinsi how did you synchronize reading TSC? Even in UEFI best I could get is within 150 cycles such as
This was under XP, so CreateThread/SetThreadAffinityMask then activate the threads.
I 24 Feb 2023, 11:39
For me with just resume, RDTSC spans over 12 million cycles.
Code:
    xor ebx,ebx
@@:
    xor ecx,ecx                         ; pSecurity Attributes
    mov edx,0x1000                      ; Initial stack size
    lea r8,[CpuThread]                  ; Thread start address
    mov r9d,ebx                         ; Parameter Processor Number
    mov qword[rsp+20h],CREATE_SUSPENDED
    mov qword[rsp+28h],0                ; pThreadID
    call [CreateThread]
    mov [rbx*8+hThread],rax             ; Thread handle
    mov cl,bl
    mov edx,4                           ; start affinity from second core
    shl edx,cl
    mov rcx,rax
    call [SetThreadAffinityMask]
    inc ebx
    cmp ebx,10                          ; use 10 of 12 logical CPU's
    jb @b

    xor ebx,ebx                         ; Resume threads
@@:
    mov rcx,[rbx*8+hThread]
    call [ResumeThread]
    inc ebx
    cmp ebx,10
    jb @b
Adding a "Go" flag to the workers gets it near 150 if all threads are active and not thrown off by the dispatcher.
daniel02 24 Feb 2023, 22:21
Another way to read RDTSC with amazing precision in the kernel, without suffering from KeDelayExecutionThread or any sleep, thanks to Ryan Geiss:
Code:
extern "C" void __fastcall Clocks(unsigned __int64* dest);
Code:
.code
Clocks proc
    rdtsc
    shl rdx, 32
    or  rax, rdx
    mov [rcx], rax      ; RCX = dest (first argument): store the 64-bit count
    ret
Clocks endp
end
Code:
double RDTSCDirect(unsigned __int64 frequency)
{
    // returns < 0 on failure; otherwise, returns current cpu time, in seconds.
    // warning: watch out for wraparound!
    if (frequency == 0)
        return -1.0;

    // get high-precision time:
    __try {
        unsigned __int64 high_perf_time;
        unsigned __int64* dest = &high_perf_time;
        Clocks(dest);
        __int64 time_s = (__int64)(high_perf_time / frequency); // unsigned->sign conversion should be safe here
        __int64 time_fract = (__int64)(high_perf_time % frequency); // unsigned->sign conversion should be safe here
        // note: here, we wrap the timer more frequently (once per year)
        // than it otherwise would (VERY RARELY - once every 585 years on
        // a 1 GHz), to alleviate floating-point precision errors that start
        // to occur when you get to very high counter values.
        double ret = (time_s % (60 * 60 * 24 * 365)) + (double)time_fract / (double)((__int64)frequency);
        return ret;
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        return -1.0;
    }
    return -1.0;
}
Code:
void testrdtsc(unsigned __int64 freq)
{
    __int64 count_per_microsec;
    const ULONG numberOfProcessors = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
    PROCESSOR_NUMBER processorNumber;
    NTSTATUS status = KeGetProcessorNumberFromIndex(numberOfProcessors - 1, &processorNumber);
    if (!NT_SUCCESS(status)) {
        return;
    }

    GROUP_AFFINITY affinity, oldAffinity;
    affinity.Group = processorNumber.Group;
    affinity.Mask = 1ULL << processorNumber.Number;
    affinity.Reserved[0] = affinity.Reserved[1] = affinity.Reserved[2] = 0;
    KeSetSystemGroupAffinityThread(&affinity, &oldAffinity);

    KIRQL originalIrql;
    KeRaiseIrql(HIGH_LEVEL, &originalIrql);
    _disable();
    ascwq(freq); // ascwq is not defined anywhere in this post
    _enable();
    KeLowerIrql(originalIrql);
}
revolution 24 Feb 2023, 22:32
daniel02 wrote:
daniel02 25 Feb 2023, 05:05
revolution wrote:
Thank you very much, I fixed it. I enjoy playing games now. I hope Microsoft will one day find a way to read rdtsc directly.
sinsi 25 Feb 2023, 08:18
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.