flat assembler
Message board for the users of flat assembler.
MP TSC synchronisation?
nasm64developer 27 May 2007, 18:37
Look at the source!
That of the Linux kernel, for example. It contains code which handles TSC synchronization.
Madis731 27 May 2007, 22:28
The simple answer is YES: they can be synchronised, but usually you don't need that, and syncing is really painful. There are IPIs to do that for you, which are relatively fast.
My logic says that the quick & dirty way goes something like:
1) Put all CPUs at a known address and in a known state (through IPIs)
2) HLT them
3) Raise some APIC-wide interrupt to make all CPUs start executing the same interrupt code
4) CPUs having the same state and the same conditions cache-wise will probably execute your nifty "TSC to Zero" code with equal timing.
Voila! That doesn't sound so simple anymore, but I couldn't write it in fewer words.
Last edited by Madis731 on 30 May 2007, 18:32; edited 1 time in total
lazer1 30 May 2007, 01:22
Madis731 wrote: The simple answer is YES - they can be synchronised, but usually you don't need that and syncing is really painful. There are IPIs to do that for you, which are relatively fast.
But are you guaranteed they will be exactly the same? E.g. the BSP APIC may be holding off hardware interrupts, which would be an asymmetry.
This machine only has 2 CPUs, so I use a one-way semaphore built from "mov" and "cmp", which doesn't need "lock":
BSP: reads TSC to registers
BSP: sets sem to 1
AP: waits for sem to be 1
AP: reads TSC to registers
AP: sets sem to 0
BSP: waits for sem to be 0
BSP: reads TSC
(The actual code to do this is a lot more complicated; the above is a meta presentation of it. In fact I run the code twice to guarantee everything goes via the caches. I switch off ints as well, as this idea doesn't use interrupts.)
Here the TSCs aren't changed; I just want to measure the difference AND quantify the error. In one trial the difference of the 2 TSCs was 5cca2h, which proves the BIOS HADN'T synched them, and the error factor is usually <= 2a0h. So here I can synch them via subtraction up to an error factor of 2a0h, that error factor being the difference between the 2 BSP TSC measurements, i.e. it takes some 700 clock cycles from the first BSP TSC read to the second one.
One question: do all TSCs ALWAYS change at the same speed? E.g. if the measured difference right now is 5cca2h, then in 10 hours am I guaranteed the difference is exactly the same regardless of what the CPUs do (disregarding the error factor of 2a0h)? Or can one of the TSCs start moving faster or slower than the others?
If the TSCs are guaranteed to always change at the same speed, I can do syncs by differences and the computed errors, and don't need IPIs. But if they can change, I need to keep taking some synch action via IPIs.
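The arithmetic behind this BSP/AP/BSP measurement can be sketched as follows. This is only an illustration, not lazer1's actual code: the function name `estimate_tsc_offset`, the choice of the midpoint as the estimate, and the sample readings are assumptions. Only the 5cca2h difference and the 2a0h window come from the post.

```python
def estimate_tsc_offset(bsp_first, ap_read, bsp_second):
    """Estimate AP_TSC - BSP_TSC from a BSP/AP/BSP read sequence.

    The AP read happens somewhere between the two BSP reads, so the true
    offset lies in [ap_read - bsp_second, ap_read - bsp_first].
    Returns (midpoint estimate, width of the uncertainty window).
    """
    if bsp_second < bsp_first:
        raise ValueError("second BSP read must not precede the first")
    window = bsp_second - bsp_first      # cycles between the two BSP reads
    midpoint = bsp_first + window // 2   # assumed BSP time of the AP read
    return ap_read - midpoint, window


# Hypothetical readings chosen to reproduce the figures in the post.
bsp_first = 0x1000_0000
bsp_second = bsp_first + 0x2A0
ap_read = bsp_first + 0x150 + 0x5CCA2

offset, window = estimate_tsc_offset(bsp_first, ap_read, bsp_second)
print(hex(offset), hex(window))
```

Taking the midpoint halves the worst-case error compared with using either BSP reading alone, but any value inside the window would be equally defensible.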
f0dder 30 May 2007, 11:41
Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2 unless you limit thread affinity, while they work fine on dual-core Intels... which is a TSC issue.
Madis731 30 May 2007, 18:31
Difference between AMD & Intel? I don't think there's much to counting TSCs...
A bit off-topic, but look at the number of results for this: http://www.google.com/search?q=dual+core+crash&start=0&start=0&ie=utf-8&oe=utf-8&client=mozilla&rls=org.mozilla:en-US:unofficial
I'm usually on the Intel side on everything, but I know that multiple cores/threads cause problems on ANY CPU if the program is not ready for it. The problem is smaller with HT when there's only one CPU. Though there was this car game where Core 2 f*d up the network traffic on LAN.
Anyway, there's a way you can guarantee "no-drifting-TSC": force all your CPUs to stay in S0 (or was it C0?), or make them work at full load and have Windows worry about setting them to C0.
I don't recall reading anything about BIOS TSC syncing. Maybe it exists, but I think it's at least optional and definitely not obligatory.
I just thought about the reasons why one would want to know both TSCs. I couldn't. Maybe the workaround would be just to use one affinity when getting TSC readings...
lazer1 30 May 2007, 20:30
f0dder wrote: Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2, unless you limit thread affinity, while they work fine on dualcore intels... which is a TSC issue.
This machine is an AMD, which uses the faster, better "real" SMP. I think the AMD code will run on the fake SMP; problems arise if you use fake-SMP code on real SMP. Probably on fake SMP the 2 TSCs are the same.
lazer1 30 May 2007, 20:54
Madis731 wrote: Difference between AMD & Intel? I don't think there's much to counting TSCs...
On the Intels the CPUs can share the same TSC, because the "2" CPUs are in fact 1 CPU pretending to be 2 ("hyperthreading"), so e.g. if the 1 CPU changes the TSC, the other TSC will also have changed. AMDs seem not to use that and really have 2 unconnected TSCs.
Quote:
AMD CPU design decisions totally outclass Intel. Intel CPU design is overgeneralised: too many options which are never used. AMD's design is almost always exactly correct. Where Intel are good is at low-level non-CPU architecture such as buses, SATA, USB2.
Quote:
CPUs sharing resources other than physical memory will create problems, as you have to keep track of which CPUs share resources. AFAIK AMD don't use hyperthreading, so no problems on AMD.
Quote:
What are S0 and C0?
Quote:
The BIOS on this machine has NOT synched its TSCs; they are some 5cca2h apart, whereas I can synch them in s/w to less than 300h.
Quote:
If you want high-res time and can synch the TSCs, it means only 1 CPU needs to keep track of real time. The other CPUs can calculate the high-res time when they need it, thus:
Find the current RTC time in seconds, e.g. 30th May 2007 21:03:12. TSC0 then was xyz, and at 21:03:11 was fgh. CPU0 maintains these numbers once a second in memory, protected by a semaphore, for all CPUs to read.
Say TSC1 right now is abc. We calculate TSC0 right now as abc - lag, where lag was measured when booting CPU1; e.g. on this machine on one bootup the lag was 5cca2h.
So on CPU1, (abc - lag) - xyz clock cycles have happened since 21:03:12, and the high-res time is:
30th May 2007 21:03:12 + ((abc - lag) - xyz)/(xyz - fgh) seconds
On my machine here that should give high-res time to microsecond accuracy.
The advantage of this is that you don't need inter-processor interrupts for calculating high-res time. This scheme will work for hyperthreading as well, as we don't modify the TSCs, we just measure the differences.
For this scheme to function we need the TSCs to always change at exactly the same rate. That was why I was asking whether the TSCs always change at the same rate.
The problem with going via IPIs is that if you aren't careful the system will freeze up.
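The interpolation formula in this post can be written out as a short sketch. The function name `high_res_seconds` and every number except the 5cca2h lag are made up for illustration (a 2 GHz tick rate is assumed), and the semaphore protecting the shared samples is left out.

```python
def high_res_seconds(tsc1_now, lag, tsc0_at_second, tsc0_at_prev_second):
    """Fraction of a second elapsed since the last RTC tick, seen from CPU1.

    lag is TSC1 - TSC0 as measured at boot; the two TSC0 samples are taken
    by CPU0 at consecutive RTC seconds, so their difference is the tick rate.
    """
    ticks_per_second = tsc0_at_second - tsc0_at_prev_second  # xyz - fgh
    if ticks_per_second <= 0:
        raise ValueError("TSC0 samples must be increasing")
    tsc0_now = tsc1_now - lag                                  # abc - lag
    return (tsc0_now - tsc0_at_second) / ticks_per_second


# Hypothetical values: a 2 GHz clock, the lag from the post, and CPU1
# reading its TSC a quarter of a second after the 21:03:12 tick.
lag = 0x5CCA2
fgh = 10_000_000_000               # TSC0 at 21:03:11
xyz = fgh + 2_000_000_000          # TSC0 at 21:03:12
abc = xyz + 500_000_000 + lag      # TSC1 right now

fraction = high_res_seconds(abc, lag, xyz, fgh)
print(f"21:03:12 + {fraction:.6f} s")
```

The result is only as good as the assumption stated in the post, namely that both TSCs advance at the same rate between calibrations.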
Madis731 31 May 2007, 07:27
f0dder wrote: Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2, unless you limit thread affinity, while they work fine on dualcore intels... which is a TSC issue.
I'm sorry, but you just confused me even more. Are you saying that Core 2 doesn't use multiple TSCs and isn't "real" SMP? Why? I KNOW that Core 2 has multiple sets of MSRs. This is NOT the reason why it crashes on AMD and not on Intel.
Let's get some things clear before any more confusion:
lazer1 wrote: on the Intel's the CPUs can share the same TSC, because the "2" CPUs are in fact 1 CPU pretending to be 2, "hyperthreading", so eg if the 1 CPU changes the TSC the other TSC will also have changed, AMD's seem not to use that and really have 2 unconnected TSC's,
1) AMD never came out with HT
2) Intel HT behaves as expected - it has got only ONE MSR, so only one TSC
3) AMD64x2-type CPUs are multi-core, like the Intel Core and Pentium D series. So if AMD seems not to use that, then neither does Intel with the Core architecture.
lazer1 wrote: AFAIK AMD dont use hyperthreading, so no problems on AMD
I don't quite follow. Now the problem is with HT and not with multiple cores? Or which AMD did you mean? Plain and simple without HT/DC, or the x2?
lazer1 wrote: what are S0 and C0?
Erm, I knew I should've looked it up. S and D states are meant for the system and its peripherals, as I understand it. C0 is the running state for the CPU. S0 means your whole system is running at full. C1, C2, C3 etc. are deeper and deeper sleep states: HLT, lower clocks etc. C1 is the most common. When the CPU doesn't do anything, it will be put into HLT and no clocks are counted. If one CPU is in HLT while the other is doing tasks, the TSCs will go out of sync. There are other things to consider, like T0...T7, which is throttling in 12.5% increments. T4 on CPU0 and T0 on CPU1 will make your CPUs' TSCs drift apart at half the clock speed, which is about 1GHz on a T7200 CPU.
And finally: yeah, I know IPIs are painful to use and dangerous, but they might be what you need if you don't want to use QueryPerformanceCounter().
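The throttling figure can be checked with a little arithmetic. This is a sketch under two assumptions taken from the post rather than verified here: that state Tn removes n × 12.5% of the clock, and that the TSC follows the throttled clock. The function name `tsc_drift_per_second` is made up, and the 2 GHz base clock is implied by the post's "half the clock is about 1GHz".

```python
def tsc_drift_per_second(base_hz, t_state_a, t_state_b):
    """Cycles per second by which two TSCs drift apart under throttling.

    Assumes state Tn removes n * 12.5% of the clock, and that the TSC
    follows the throttled clock, as the post describes.
    """
    for t in (t_state_a, t_state_b):
        if not 0 <= t <= 7:
            raise ValueError("T-state must be in T0..T7")
    rate_a = base_hz * (8 - t_state_a) / 8
    rate_b = base_hz * (8 - t_state_b) / 8
    return abs(rate_a - rate_b)


# T4 on CPU0 and T0 on CPU1, with an assumed 2 GHz base clock.
drift = tsc_drift_per_second(2_000_000_000, 4, 0)
print(f"{drift / 1e9:.1f} GHz")
```

At that rate a one-off offset measurement would be useless within microseconds, which is why the rate question matters more than the initial offset.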
f0dder 31 May 2007, 08:35
Madis: most programs won't have any problem with SMP/HT/whatever. As far as I see it, only programs that are threaded but doing it wrong will suffer.
Unreal Engine games (and some others as well) work fine on Intel dual-core machines, but fail on AMD64x2 (not sure what the scenario is on "real SMP", though!) - my guess is it's because of unsynced TSCs. Then again, it's my observation that QueryPerformanceCounter() on my AMD64x2 box uses RDTSC, while on Intel boxes it uses some chipset timer... so perhaps the games are really doing QPC, and TSCs might be unsynced on Intel as well.
Madis731 31 May 2007, 09:26
f0dder wrote: ...are threaded but doing it wrong will suffer...
Sorry, that's what I meant in the first place. I'm a man with many words but little meaning behind them.
Btw, I read the Intel optimization manual 248966.pdf, page 373, B.1.2 Counting Clocks: there are a million and one different ways. What I learned is this:
-With HT, the TSC stops counting ONLY if both threads are in a deeper sleep
-Non-halted Clock Ticks are not stopped in any power-saving mode (unless powered down, of course)
-Non-sleep Clock Ticks are not stopped in any sleep modes nor power-saving
-TSC is the one that is not per logical unit on HT, and as I understand it sleep/power modes don't affect this?!
Intel manuals wrote: most of the chip (including the performance monitoring hardware) being powered
f0dder 31 May 2007, 10:00
Also, SpeedStep/whatever affects the rate at which the TSC is increased, right?
IMHO, the net result of it all is that the TSC is okay for measuring code timing (i.e. for algorithm benchmarking), but shouldn't be used for much else.
lazer1 31 May 2007, 13:08
Madis731 wrote:
Darn! I wrote that without consulting the docs! I found it now: it is Intel vol. 3, section 7.8.1. Yes, each CPU has its own TSC all right; there isn't a problem with HT.
Quote:
Where is this documented? If I don't do HLT, will all CPU TSCs stay the same distance apart?
Quote:
On this machine the all-including-self shorthand doesn't seem to function: when I try it, it just interrupts the CPU causing the interrupt. All-excluding-self does function, so it looks like I cannot interrupt all CPUs symmetrically; I can only interrupt OTHER CPUs symmetrically (unless there is an error in my experiments).
lazer1 31 May 2007, 13:16
Quote:
That sounds like the difference between the TSCs is constant? The TSCs count clock cycles; is there just 1 clock shared by all the CPUs?
Madis731 31 May 2007, 13:19
Can you give your machine specs and/or the test source? (I hope it's not C, 'cause I can only read FASM this summer.)
C-states are in the Intel manuals, in the System Programming part (3A). And by accident I found this one on the net: http://acpi.sourceforge.net/documentation/processor.html
Madis731 31 May 2007, 13:21
lazer1 wrote:
On HT there's only one TSC, and on multiple cores the TSCs run most of the time and should be "non-drifting", but they won't be synced because the BIOS needn't do that (I need confirmation on this: does the BIOS sometimes sync TSCs?).
lazer1 31 May 2007, 14:33
Madis731 wrote: Can you give your machine specs and/or the test source (I hope its not C 'cuz I can only read FASM this summer )
The CPU is an AMD Turion X2.
lazer1 31 May 2007, 14:39
Quote:
Intel vol. 3, 7.8.1 says: The following features are duplicated for each logical processor:
.......................
* Time stamp counter MSRs
lazer1 31 May 2007, 16:10
f0dder wrote: Also, speedstep/whatever affects the rate TSC is increased, right?
AFAICS the TSC is the only way to do high-res time, e.g. what is the time now? 2007 May 31 1709 16.831468 seconds.
I don't know what SpeedStep is, but if it is done by the CPU in asm, then the CPU can recalibrate each time it does that to maintain TSC sanity.
lazer1 31 May 2007, 16:26
Some further experiments:
Inter-processor interrupts with "all including self" don't function on this machine; however, I have managed to interrupt all CPUs instead via dest=0ffh, i.e. destination shorthand == 00b and dest == 0ffh.
What I am looking at now is to use that each second to interrupt all CPUs, which then read and record their TSCs. That way the TSCs don't need to run at the same speed: the CPUs are recalibrated each second, so when the speed changes the time will be wrong for at most 1 second.
To read the time you then need to do cli, read the local copy of the time and TSC, then sti and interpolate using the current TSC value.
I am working on this at the moment; I have to be certain the system cannot freeze up.
There is no guarantee of accuracy, so e.g. t1 < t2 with event1 at t1 and event2 at t2 does NOT mean event1 happened before event2 if they are on different CPUs. Usually time WILL be accurate, so e.g. you could use it for precise timing, but every now and then it could be imprecise.
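The once-a-second recalibration idea might look like this in outline. It is a sketch, not the code being worked on in the post: the class `CpuClock`, its method names, and the choice to estimate the tick rate from the two most recent samples are all assumptions, and the cli/sti pair that would guard the read on real hardware is replaced by ordinary attribute access.

```python
class CpuClock:
    """Per-CPU record refreshed by the once-a-second interrupt (sketch)."""

    def __init__(self):
        self.samples = []   # the last two (seconds, tsc) pairs

    def on_tick(self, seconds, tsc):
        """Called from the hypothetical per-second interrupt handler."""
        self.samples = (self.samples + [(seconds, tsc)])[-2:]

    def now(self, tsc):
        """Interpolate the current time from a fresh TSC reading."""
        if len(self.samples) < 2:
            raise RuntimeError("need two ticks before interpolating")
        (s0, t0), (s1, t1) = self.samples
        ticks_per_second = (t1 - t0) / (s1 - s0)  # rate over the last interval
        return s1 + (tsc - t1) / ticks_per_second


cpu1 = CpuClock()
cpu1.on_tick(100, 5_000_000_000)
cpu1.on_tick(101, 7_000_000_000)   # 2e9 ticks in the last second
print(cpu1.now(7_500_000_000))
```

Because each CPU only ever uses its own TSC and its own samples, the TSCs never have to agree with each other. A rate change between ticks skews the interpolation until the next tick arrives, which matches the "wrong for at most 1 second" caveat in the post.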
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.