flat assembler
Message board for the users of flat assembler.

MP TSC synchronisation?

lazer1 20 May 2007, 18:06
echoing the TSCs of all the CPUs, they appear to be approximately the same,

if I zero one TSC, that one is then definitely different, which proves that
each CPU has its own private TSC

AFAICS it is impossible to read all the TSCs at exactly the same time
to see if they are exactly the same. Nothing can be guaranteed to
happen at the same time on a multiprocessor unless it is hardwired,

and it must be impossible to write them all at the same time
to make them the same,

are the TSCs set up by the BIOS identical?

can they be identical?

can they be synchronised?

if they are synchronised, can different CPUs then start
running at different speeds to save power, so that the
TSCs are no longer synchronised?

does anyone know where in the AMD volume 2 documentation
the TSC is discussed?
nasm64developer 27 May 2007, 18:37
Look at the source!

That of the Linux kernel, for example.

It contains code which handles TSC synchronization.
Madis731 27 May 2007, 22:28
The simple answer is YES - they can be synchronised, but usually you don't need that and syncing is really painful. There are IPIs to do that for you, which are relatively fast.

My logic says that the quick & dirty way goes something like:
1) Put all CPUs into a known address and state (through IPIs)
2) HLT them
3) Raise some APIC-wide interrupt to make all CPUs start executing the same interrupt code
4) CPUs in the same state, with the same cache conditions, will probably execute your nifty "TSC to zero" code with equal timing.

Voila! :D That doesn't sound so simple any more, but I couldn't put it in fewer words :P


Last edited by Madis731 on 30 May 2007, 18:32; edited 1 time in total
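Madis731's steps can be mimicked in user space to see why "equal timing" is only approximate. This is a minimal sketch under loud assumptions: POSIX threads stand in for CPUs, a `pthread_barrier_t` stands in for the APIC-wide interrupt that releases them, and `CLOCK_MONOTONIC` stands in for the TSC (real code would run in ring 0 and use RDTSC/WRMSR, which this sketch does not do). `NCPUS`, `cpu_entry` and `rendezvous_spread_ns` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NCPUS 4  /* hypothetical number of "CPUs" (threads) */

static pthread_barrier_t release_point;  /* stands in for the broadcast interrupt */
static uint64_t start_ns[NCPUS];         /* when each "CPU" began the common code  */

/* Monotonic clock in nanoseconds; stands in for RDTSC in this sketch. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static void *cpu_entry(void *arg)
{
    int id = (int)(intptr_t)arg;
    /* Steps 1-2: park every "CPU" at a known point. */
    pthread_barrier_wait(&release_point);
    /* Steps 3-4: all are released together and run the same code. */
    start_ns[id] = now_ns();
    return NULL;
}

/* Returns latest minus earliest start time: how simultaneous the
   "simultaneous" release really was. */
uint64_t rendezvous_spread_ns(void)
{
    pthread_t t[NCPUS];
    pthread_barrier_init(&release_point, NULL, NCPUS);
    for (int i = 0; i < NCPUS; i++)
        pthread_create(&t[i], NULL, cpu_entry, (void *)(intptr_t)i);
    for (int i = 0; i < NCPUS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&release_point);

    uint64_t lo = start_ns[0], hi = start_ns[0];
    for (int i = 1; i < NCPUS; i++) {
        if (start_ns[i] < lo) lo = start_ns[i];
        if (start_ns[i] > hi) hi = start_ns[i];
    }
    return hi - lo;
}
```

The returned spread is never guaranteed to be zero; that residual is the asymmetry lazer1 asks about in the next post.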
lazer1 30 May 2007, 01:22
Madis731 wrote:
The simple answer is YES - they can be synchronised, but usually you don't need that and syncing is really painful. There are IPIs to do that for you, which are relatively fast.

My logic says that the quick & dirty way goes something like:
1) Put all CPUs into a known address and state (through IPIs)
2) HLT them
3) Raise some APIC-wide interrupt to make all CPUs start executing the same interrupt code
4) CPUs in the same state, with the same cache conditions, will probably execute your nifty "TSC to zero" code with equal timing.

Voila! :D That doesn't sound so simple any more, but I couldn't put it in fewer words :P


but are you guaranteed they will be exactly the same? :o
e.g. the BSP's APIC may be holding off hardware interrupts, which would be an asymmetry,

this machine only has 2 CPUs, so I use a one-way semaphore built from plain "mov" and "cmp", which doesn't need "lock":

BSP: read TSC to registers
BSP: set sem to 1
AP: waits for sem to be 1
AP: reads TSC to registers
AP: set sem to 0
BSP: waits for sem to be 0
BSP: reads TSC

(the actual code to do this is a lot more complicated, but
the above is a meta presentation of it. In fact I run the code
twice so that everything is already in the caches. I also switch off
interrupts, as this idea doesn't use them)
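The one-way semaphore handshake above can be sketched in portable C. Assumptions, not the original code: two POSIX threads play the BSP and the AP, C11 release/acquire atomics replace the plain `mov`/`cmp` (on x86 plain moves already give that memory ordering, but a C compiler has to be told), and `CLOCK_MONOTONIC` replaces RDTSC so the sketch runs anywhere. A real measurement would pin each thread to a distinct CPU and read the TSC instead; note that RDTSC is not a serializing instruction, so real code usually fences it (e.g. with LFENCE or CPUID). `struct handshake` and `run_handshake` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for RDTSC: a monotonic clock in nanoseconds. */
static uint64_t read_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

struct handshake {
    atomic_int sem;       /* one-way semaphore: 0 -> 1 (BSP) -> 0 (AP) */
    uint64_t bsp_before;  /* BSP reading before signalling the AP       */
    uint64_t ap;          /* AP reading, taken inside the BSP's window  */
    uint64_t bsp_after;   /* BSP reading after the AP has answered      */
};

static void *ap_side(void *arg)
{
    struct handshake *h = arg;
    while (atomic_load_explicit(&h->sem, memory_order_acquire) != 1)
        ;                                                    /* AP: wait for sem == 1 */
    h->ap = read_counter();                                  /* AP: read counter      */
    atomic_store_explicit(&h->sem, 0, memory_order_release); /* AP: sem = 0           */
    return NULL;
}

/* One BSP/AP exchange; the calling thread plays the BSP. */
void run_handshake(struct handshake *h)
{
    pthread_t ap;
    atomic_init(&h->sem, 0);
    pthread_create(&ap, NULL, ap_side, h);

    h->bsp_before = read_counter();                          /* BSP: read        */
    atomic_store_explicit(&h->sem, 1, memory_order_release); /* BSP: sem = 1     */
    while (atomic_load_explicit(&h->sem, memory_order_acquire) != 0)
        ;                                                    /* BSP: wait for 0  */
    h->bsp_after = read_counter();                           /* BSP: read again  */

    pthread_join(ap, NULL);
}
```

After `run_handshake(&h)`, `h.ap` is bracketed by `h.bsp_before` and `h.bsp_after` on the shared clock; with per-CPU TSCs the AP value would also carry the unknown offset, and `h.bsp_after - h.bsp_before` is the error window.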

Here the TSCs aren't changed; I just want to measure
the difference AND quantify the error:

in one trial the difference between the 2 TSCs was 5cca2h,
which proves the BIOS HADN'T synched them,

and the error factor is usually <= 2a0h

so here I can synch them via subtraction up to an error factor of 2a0h,
that error factor being the difference between the 2 BSP TSC readings

so it takes some 700 clock cycles (2a0h = 672) from the first BSP TSC read
to the second one,
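The reasoning behind "synch via subtraction up to an error factor" can be made precise. The AP read its counter at some moment while the BSP counter was between its two readings, so the offset d = TSC_AP - TSC_BSP must lie in [ap - bsp_after, ap - bsp_before], an interval exactly as wide as the BSP window. A small sketch; `offset_from_handshake` and `offset_estimate` are hypothetical helpers, and the readings in the usage note are made up to match the thread's numbers.

```c
#include <assert.h>
#include <stdint.h>

/* The offset d = TSC_AP - TSC_BSP is known to lie in [lo, hi]. */
struct offset_bounds {
    int64_t lo;   /* ap - bsp_after  */
    int64_t hi;   /* ap - bsp_before */
};

/* The AP read happened while the BSP counter was somewhere in
   [bsp_before, bsp_after], so subtracting each end gives the bounds.
   The width hi - lo equals bsp_after - bsp_before (the "error factor"). */
struct offset_bounds offset_from_handshake(uint64_t bsp_before, uint64_t ap,
                                           uint64_t bsp_after)
{
    assert(bsp_before <= bsp_after);
    struct offset_bounds b;
    b.lo = (int64_t)ap - (int64_t)bsp_after;
    b.hi = (int64_t)ap - (int64_t)bsp_before;
    return b;
}

/* Best single estimate: the midpoint, off by at most half the width. */
int64_t offset_estimate(struct offset_bounds b)
{
    return b.lo + (b.hi - b.lo) / 2;
}
```

For hypothetical readings bsp_before = 1000h, ap = 5ddf2h, bsp_after = 12a0h, the bounds are [5cb52h, 5cdf2h], 2a0h wide, and the midpoint estimate is 5cca2h. Subtracting the midpoint rather than either end halves the worst-case error to 150h.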

one question: do all TSCs ALWAYS change at the same speed?

e.g. if the (not exactly measurable) difference right now is 5cca2h, then in 10 hours
am I guaranteed the difference is exactly the same regardless
of what the CPUs do (disregarding the error factor of 2a0h)?

or can one of the TSCs start moving faster or slower than
the others?

if the TSCs are guaranteed to always change at the same speed,
I can do syncs via the differences and the computed errors,
and don't need IPIs

but if they can change, I need to keep taking some sync action
via IPIs,
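One empirical way to approach this question, sketched under the assumption that the handshake is simply repeated later (say after the 10 hours mentioned): each run yields an interval for the offset, and only intervals that no longer overlap prove the rates differ. Overlap does not prove the rates are equal; it only bounds any drift by the interval widths. `offset_consistent` and `min_drift` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* d = TSC_AP - TSC_BSP lies in [lo, hi] for one handshake run. */
struct offset_bounds { int64_t lo, hi; };

/* True if two runs are consistent with a constant offset,
   i.e. their uncertainty intervals overlap. */
bool offset_consistent(struct offset_bounds a, struct offset_bounds b)
{
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* Lower bound on how far the offset moved between two runs
   (0 when the intervals overlap, so no drift is provable). */
int64_t min_drift(struct offset_bounds a, struct offset_bounds b)
{
    if (b.lo > a.hi) return b.lo - a.hi;
    if (a.lo > b.hi) return a.lo - b.hi;
    return 0;
}
```

Dividing a nonzero `min_drift` by the cycles elapsed between the two runs gives a lower bound on the relative rate difference.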
f0dder 30 May 2007, 11:41
Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2, unless you limit thread affinity, while they work fine on dualcore intels... which is a TSC issue.
Madis731 30 May 2007, 18:31
Difference between AMD & Intel? I don't think there's much to counting TSCs...
A bit off-topic, but look at the result count for this: http://www.google.com/search?q=dual+core+crash&start=0&start=0&ie=utf-8&oe=utf-8&client=mozilla&rls=org.mozilla:en-US:unofficial
I'm usually on the Intel side on everything, but I know that multiple cores/threads cause problems on ANY CPU if the program is not ready for it. The problem is smaller with HT when there's only one CPU. Though there was this car game where Core 2 f*d up the network traffic on LAN.

Anyway, there's a way you can guarantee "no-drifting-TSC" when you force all your CPUs to stay in S0 (or was it C0?) or make them work in full load and have Windows worry about setting them to C0.

I don't recall reading anything about BIOS TSC syncing. Maybe it exists, but I think it's at least optional and definitely not obligatory ^o)

I just thought about the reasons why one would want to know both TSCs. I couldn't think of any :) Maybe the workaround would be just to use a single-CPU affinity when taking TSC readings...
lazer1 30 May 2007, 20:30
f0dder wrote:
Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2, unless you limit thread affinity, while they work fine on dualcore intels... which is a TSC issue.


this machine is an AMD which uses the faster better "real" SMP,

I think code written for AMD will run on the fake SMP; the problems come
if you use fake-SMP code on real SMP

probably on fake SMP the 2 TSCs are the same,
lazer1 30 May 2007, 20:54
Madis731 wrote:
Difference between AMD & Intel? I don't think there's much to counting TSCs...


on the Intels the CPUs can share the same TSC, because the "2" CPUs
are in fact 1 CPU pretending to be 2, "hyperthreading",

so e.g. if the 1 CPU changes the TSC, the other TSC will also have
changed,

AMDs seem not to use that and really have 2 unconnected TSCs,

Quote:

A bit off-topic, but look at the result count for this: http://www.google.com/search?q=dual+core+crash&start=0&start=0&ie=utf-8&oe=utf-8&client=mozilla&rls=org.mozilla:en-US:unofficial
I'm usually on the Intel side on everything,


AMD CPU design decisions totally outclass Intel,

Intel CPU design is overgeneralised: too many options
which are never used.

AMD's design is almost always exactly correct,

where Intel are good is at low level non CPU architecture
such as buses, SATA, USB2,



Quote:

but I know that multiple cores/threads cause problems on ANY CPU if the program is not ready for it.


CPUs sharing resources other than physical memory will create problems,
as you have to keep track of which CPUs share resources,

AFAIK AMD don't use hyperthreading, so no problems on AMD,

Quote:

The problem is smaller with HT when there's only one CPU.
Though there was this car game where Core 2 f*d up the network traffic on LAN.

Anyway, there's a way you can guarantee "no-drifting-TSC" when you force all your CPUs to stay in S0 (or was it C0?) or make them work in full load and have Windows worry about setting them to C0.


what are S0 and C0? :o

Quote:

I don't recall reading anything about BIOS TSC syncing. Maybe it exists, but I think it's at least optional and definitely not obligatory ^o)


the BIOS on this machine has NOT synched its TSCs,

they are some 5cca2h apart, whereas I can synch them in s/w
to less than 300h,

Quote:

I just thought about the reasons why one would want to know both TSCs. I couldn't think of any :) Maybe the workaround would be just to use a single-CPU affinity when taking TSC readings...


if you want high-res time and can synch the TSCs, it means
only 1 CPU needs to keep track of real time.

the other CPUs can calculate the high res time when they need it thus:

find the current RTC time in seconds, e.g.

at 30th May 2007 21:03:12 TSC0 was xyz, and
at 21:03:11 it was fgh,

CPU0 maintains these numbers once a second
in memory protected by a semaphore for all
CPUs to read,


and say TSC1 right now is abc,

we now calculate TSC0 right now as abc - lag,
where lag = TSC1 - TSC0 was measured when booting CPU1,
e.g. on this machine on one bootup the lag was
5cca2h,


so on CPU1, (abc-lag) - xyz clock cycles have elapsed
since 21:03:12

so the high res time is:

30th May 2007 21:03:12 + ((abc-lag)-xyz)/(xyz - fgh) seconds,

on my machine here that should give high res time
to microsecond accuracy,
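The formula above can be written out directly. A minimal sketch, assuming the calibration triple (RTC second, xyz, fgh) is read consistently (the post protects it with a semaphore) and that lag = TSC1 - TSC0 was measured at boot; `struct calib` and `hires_time` are hypothetical names. As a sanity check on the accuracy claim: at a hypothetical 2 GHz clock, the 2a0h (672-cycle) error factor is about 0.34 microseconds.

```c
#include <stdint.h>

/* Published by CPU0 once a second (field names are hypothetical). */
struct calib {
    int64_t  rtc_seconds;  /* RTC time of the latest tick, as seconds since some epoch */
    uint64_t tsc0_now;     /* xyz: TSC0 at that tick                                   */
    uint64_t tsc0_prev;    /* fgh: TSC0 at the tick one second earlier                 */
};

/* High-res time on CPU1 from its own TSC reading (abc) and the boot-time
   lag = TSC1 - TSC0:  rtc + ((abc - lag) - xyz) / (xyz - fgh)  seconds. */
double hires_time(const struct calib *c, uint64_t tsc1, int64_t lag)
{
    int64_t tsc0_equiv   = (int64_t)tsc1 - lag;                  /* abc - lag */
    int64_t since_tick   = tsc0_equiv - (int64_t)c->tsc0_now;    /* ... - xyz */
    double  cycles_per_s = (double)(c->tsc0_now - c->tsc0_prev); /* xyz - fgh */
    return (double)c->rtc_seconds + (double)since_tick / cycles_per_s;
}
```

With a hypothetical 2 GHz calibration (xyz - fgh = 2,000,000,000) and a CPU1 reading that lies 1,000,000,000 cycles past xyz once the lag is removed, the result is the tick time plus 0.5 s.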

The advantage of this is that you don't need inter-processor interrupts
for calculating high-res time,

This scheme will also work with hyperthreading, as we don't
modify the TSCs, we just measure the differences,

for this scheme to function we need the TSCs to always change
at exactly the same rate. That was why I was asking if
the TSCs always change at the same rate,

the problem with going via IPIs is that if you aren't careful
the system will freeze up,
Madis731 31 May 2007, 07:27
f0dder wrote:
Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2, unless you limit thread affinity, while they work fine on dualcore intels... which is a TSC issue.


I'm sorry, but you just confused me even more :? Are you saying that Core 2 doesn't use multiple TSCs and isn't "real" SMP? Why? I KNOW that Core 2 has multiple sets of MSRs. This is NOT the reason why it crashes on AMD and not on Intel.

Let's get some things clear before any more confusion:
lazer1 wrote:
on the Intel's the CPUs can share the same TSC, because the "2" CPUs are in fact 1 CPU pretending to be 2, "hyperthreading", so eg if the 1 CPU changes the TSC the other TSC will also have changed, AMD's seem not to use that and really have 2 unconnected TSC's,

1) AMD never came out with HT
2) Intel HT behaves as expected - it has got only ONE MSR so only one TSC
3) AMD64x2 type CPUs are multi-core, like Intel Core and Pentium D series.

So if you say that AMD seems not to use that, then neither does Intel with the Core architecture.

lazer1 wrote:
AFAIK AMD dont use hyperthreading, so no problems on AMD

I don't quite follow. Now the problem is with HT and not with multiple cores? Or which AMD did you mean? Plain and simple without HT/DC, or the x2?

lazer1 wrote:
what are S0 and C0? :o

Erm, I knew I should've looked it up. S and D are meant for the system and its peripherals, as I understand it. C0 is the running state for the CPU. S0 means your whole system is running at full. C1, C2, C3 etc. are deeper and deeper sleep states: HLT, lower clocks etc. C1 is the most common. When the CPU doesn't do anything, it will be put into HLT and no clocks are counted. If one CPU is in HLT while the other is doing tasks, the TSCs will be out of sync.
There are other things to consider, like T0...T7, which is throttling in 12.5% increments. T4 on CPU0 and T0 on CPU1 will float your CPU TSCs apart at half the clock speed, which is about 1GHz on a T7200 CPU.
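The throttling figure is simple arithmetic, sketched below under the post's own assumptions: that each TSC advances at the throttled clock rate and that T-state n removes n * 12.5% of the clock. Whether a given processor's TSC really slows with throttling or stops in C-states varies by model (later CPUs advertise an invariant TSC), so this models the claim rather than any specific chip; `tsc_drift_hz` is a hypothetical helper.

```c
/* Rate (cycles per second) at which two TSCs drift apart when their CPUs
   sit in different T-states.  ASSUMPTION (from the post): the TSC follows
   the throttled clock and T-state n keeps (1 - n * 0.125) of the cycles. */
double tsc_drift_hz(double core_hz, int tstate_a, int tstate_b)
{
    double duty_a = 1.0 - 0.125 * tstate_a;
    double duty_b = 1.0 - 0.125 * tstate_b;
    double diff   = duty_a - duty_b;
    return core_hz * (diff < 0.0 ? -diff : diff);
}
```

`tsc_drift_hz(2.0e9, 4, 0)` gives 1.0e9, matching the "about 1GHz" figure for a 2.0 GHz T7200.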


And finally:
Yeah, I know IPIs are painful and dangerous to use :( but that might be what you need if you don't want to use QueryPerformanceCounter()

_________________
My updated idol Very Happy http://www.agner.org/optimize/
f0dder 31 May 2007, 08:35
Madis: most programs won't have any problem with SMP/HT/whatever; as far as I can see, only programs that are threaded but doing it wrong will suffer.

Unreal Engine games (and some others as well) work fine on Intel dual-core machines, but fail on AMD64x2 (not sure what the scenario is on "real SMP", though!) - my guess is it's because of unsynced TSC.

Then again, it's my observation that QueryPerformanceCounter() on my AMD64x2 box uses RDTSC, while on intel boxes it uses some chipset timer... so perhaps the games are really doing QPC, and TSCs might be unsynced on intel as well.
Madis731 31 May 2007, 09:26
f0dder wrote:
...are threaded but doing it wrong will suffer...

Sorry, that's what I meant in the first place :) I'm a man with many words but little meaning behind them :P


Btw, I read the Intel optimization manual 248966.pdf, page 373, B.1.2 Counting Clocks:
There are a million and one different ways :S What I learned is this:
-With HT, the TSC stops counting ONLY if both threads are in deeper sleep
-Non-halted Clock Ticks are not stopped in any power-saving mode (unless powered down, of course :D)
-Non-sleep Clock Ticks are not stopped in any sleep modes nor power-saving
-TSC is the one that is not per logical unit on HT and as I understand sleep/power modes don't affect this?!
Intel manuals wrote:
most of the chip (including the performance monitoring hardware) being powered
down. In this situation, it is possible for the time-stamp counter to continue incrementing
because the clock signal on the system bus is still active

f0dder 31 May 2007, 10:00
Also, speedstep/whatever affects the rate TSC is increased, right?

IMHO, the net result of it all is that the TSC is okay for measuring code timing (i.e., for algorithm benchmarking), but shouldn't be used for much else.
lazer1 31 May 2007, 13:08
Madis731 wrote:
f0dder wrote:
Do keep in mind the difference between AMD & Intel, and multi-core vs. "real" SMP. For instance, all the Unreal Engine games crash on AMD64x2, unless you limit thread affinity, while they work fine on dualcore intels... which is a TSC issue.


I'm sorry, but you just confused me even more :? Are you saying that Core 2 doesn't use multiple TSCs and isn't "real" SMP? Why? I KNOW that Core 2 has multiple sets of MSRs. This is NOT the reason why it crashes on AMD and not on Intel.

Let's get some things clear before any more confusion:
lazer1 wrote:
on the Intel's the CPUs can share the same TSC, because the "2" CPUs are in fact 1 CPU pretending to be 2, "hyperthreading", so eg if the 1 CPU changes the TSC the other TSC will also have changed, AMD's seem not to use that and really have 2 unconnected TSC's,

1) AMD never came out with HT
2) Intel HT behaves as expected - it has got only ONE MSR so only one TSC
3) AMD64x2 type CPUs are multi-core, like Intel Core and Pentium D series.

So if you say that AMD seems not to use that, then neither does Intel with the Core architecture.


darn! :oops:

I wrote that without consulting the docs!

I've found it now: it is Intel vol3 section 7.8.1,
yes, each logical CPU has its own TSC

alright, there isn't a problem with HT,

Quote:

lazer1 wrote:
AFAIK AMD dont use hyperthreading, so no problems on AMD

I don't quite follow. Now the problem is with HT and not with multiple cores? Or which AMD did you mean? Plain and simple without HT/DC, or the x2?

lazer1 wrote:
what are S0 and C0? :o

Erm, I knew I should've looked it up. S and D are meant for the system and its peripherals, as I understand it. C0 is the running state for the CPU. S0 means your whole system is running at full. C1, C2, C3 etc. are deeper and deeper sleep states: HLT, lower clocks etc. C1 is the most common. When the CPU doesn't do anything, it will be put into HLT and no clocks are counted. If one CPU is in HLT while the other is doing tasks, the TSCs will be out of sync.
There are other things to consider, like T0...T7, which is throttling in 12.5% increments. T4 on CPU0 and T0 on CPU1 will float your CPU TSCs apart at half the clock speed, which is about 1GHz on a T7200 CPU.


:?

where is this documented?

if I don't do HLT, will all the CPU TSCs stay the same distance apart?

Quote:

And finally:
Yeah, I know IPIs are painful and dangerous to use :( but that might be what you need if you don't want to use QueryPerformanceCounter()


on this machine the all-including-self shorthand doesn't seem to work:
when I try it, it only interrupts the CPU sending the IPI,

all-excluding-self does work,

so it looks like I cannot interrupt all CPUs symmetrically,
I can only interrupt OTHER CPUs symmetrically

(unless there is an error in my experiments)
lazer1 31 May 2007, 13:16
Quote:

Btw, I read the Intel optimization manual 248966.pdf, page 373, B.1.2 Counting Clocks:
There are a million and one different ways :S What I learned is this:
-With HT, the TSC stops counting ONLY if both threads are in deeper sleep
-Non-halted Clock Ticks are not stopped in any power-saving mode (unless powered down, of course :D)
-Non-sleep Clock Ticks are not stopped in any sleep modes nor power-saving
-TSC is the one that is not per logical unit on HT and as I understand sleep/power modes don't affect this?!


that sounds like the difference between the TSCs is constant?

the TSCs count clock cycles; is there just 1 clock shared by
all the CPUs?
Madis731 31 May 2007, 13:19
Can you give your machine specs and/or the test source (I hope it's not C :P 'cuz I can only read FASM this summer :) )

C-states are in Intel manuals, the System Programming part of it (3A).
And by accident I found this one on the net: http://acpi.sourceforge.net/documentation/processor.html
Madis731 31 May 2007, 13:21
lazer1 wrote:
Quote:

Btw, I read the Intel optimization manual 248966.pdf, page 373, B.1.2 Counting Clocks:
There are a million and one different ways :S What I learned is this:
-With HT, the TSC stops counting ONLY if both threads are in deeper sleep
-Non-halted Clock Ticks are not stopped in any power-saving mode (unless powered down, of course :D)
-Non-sleep Clock Ticks are not stopped in any sleep modes nor power-saving
-TSC is the one that is not per logical unit on HT and as I understand sleep/power modes don't affect this?!


that sounds like the difference between the TSCs is constant?

the TSCs count clock cycles; is there just 1 clock shared by
all the CPUs?

On HT there's only one TSC, and on multiple cores the TSCs run most of the time and should be "non-drifting", but they aren't guaranteed to be synced because the BIOS isn't required to do that (I need confirmation on this: does the BIOS sometimes sync the TSCs?).

lazer1 31 May 2007, 14:33
Madis731 wrote:
Can you give your machine specs and/or the test source (I hope it's not C :P 'cuz I can only read FASM this summer :) )

C-states are in Intel manuals, the System Programming part of it (3A).
And by accident I found this one on the net: http://acpi.sourceforge.net/documentation/processor.html


the CPU is an AMD Turion X2
lazer1 31 May 2007, 14:39
Quote:

On HT there's only one TSC


Intel vol 3, 7.8.1 says:

The following features are duplicated for each logical processor:

.......................

* Time stamp counter MSRs
lazer1 31 May 2007, 16:10
f0dder wrote:
Also, speedstep/whatever affects the rate TSC is increased, right?

IMHO, the net result of it all is that the TSC is okay for measuring code timing (i.e., for algorithm benchmarking), but shouldn't be used for much else.


AFAICS the TSC is the only way to do high-res time,

e.g. what is the time now?

2007 May 31, 17:09 and 16.831468 seconds,

I don't know what SpeedStep is, but if it is done by the CPU in asm (i.e. in software),
then the CPU can recalibrate each time it does that to maintain
TSC sanity,
lazer1 31 May 2007, 16:26
some further experiments:

inter-processor interrupts with "all including self" don't work
on this machine :(

however I have managed to interrupt all CPUs instead via dest=0ffh :P

i.e. destination shorthand == 00b (no shorthand) and destination == 0ffh,
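For reference, the two settings compared here map onto the local APIC's Interrupt Command Register (ICR): the destination shorthand is bits 19:18 of the low dword (00b none, 01b self, 10b all including self, 11b all excluding self) and, in xAPIC mode, the destination APIC ID is bits 31:24 of the high dword, where 0FFh in physical destination mode means broadcast. The sketch only computes the register values; the vector 40h used in the usage note and the helper names are hypothetical, and the actual MMIO writes (high dword at APIC base + 310h first, then the low dword at + 300h, which sends the IPI) need ring 0 and are left out.

```c
#include <stdint.h>

/* Destination shorthand, ICR bits 19:18. */
enum shorthand {
    SH_NONE          = 0,  /* 00b: use the destination field */
    SH_SELF          = 1,  /* 01b: self only                  */
    SH_ALL_INCL_SELF = 2,  /* 10b: all including self         */
    SH_ALL_EXCL_SELF = 3   /* 11b: all excluding self         */
};

#define ICR_DELIVERY_FIXED (0u << 8)   /* bits 10:8 = 000b, fixed delivery */
#define ICR_DEST_PHYSICAL  (0u << 11)  /* bit 11 = 0, physical dest mode   */
#define ICR_LEVEL_ASSERT   (1u << 14)  /* bit 14 = 1, assert               */
#define ICR_TRIGGER_EDGE   (0u << 15)  /* bit 15 = 0, edge triggered       */

/* Low dword (APIC base + 300h). Writing it is what sends the IPI. */
uint32_t icr_low(uint8_t vector, enum shorthand sh)
{
    return (uint32_t)vector | ICR_DELIVERY_FIXED | ICR_DEST_PHYSICAL |
           ICR_LEVEL_ASSERT | ICR_TRIGGER_EDGE | ((uint32_t)sh << 18);
}

/* High dword (APIC base + 310h): xAPIC destination ID in bits 31:24.
   Must be written before the low dword. */
uint32_t icr_high(uint8_t dest_apic_id)
{
    return (uint32_t)dest_apic_id << 24;
}
```

The working setting described above corresponds to writing `icr_high(0xFF)` and then `icr_low(0x40, SH_NONE)`; the shorthand that misbehaved corresponds to `icr_low(0x40, SH_ALL_INCL_SELF)`.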


what I am looking at now is using that each second to
interrupt all CPUs, which then read and record their TSCs

that way the TSCs don't need to run at the same speed,

the CPUs are then recalibrated each second, so when the speed changes
the time will be wrong for at most 1 second,

to read the time you then need to do cli
and read the local copy of the time and TSC,
then sti and interpolate using the current TSC value,

I am working on this at the moment; I have to be certain
the system cannot freeze up
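The per-CPU recalibration scheme can be sketched as below. Assumptions: the broadcast handler on each CPU calls `cpu_clock_tick` with the current second and its local TSC, and readers copy the record with interrupts disabled exactly as described (cli, copy, sti), then interpolate. Because each CPU divides by its own measured rate, the TSCs need not run at the same speed. All names are hypothetical.

```c
#include <stdint.h>

/* Per-CPU record, written only by this CPU's once-a-second interrupt
   handler (names are hypothetical). */
struct cpu_clock {
    unsigned ticks;         /* number of calibration ticks seen            */
    int64_t  sec;           /* wall-clock second of the latest tick        */
    uint64_t tsc_tick;      /* this CPU's TSC at the latest tick           */
    uint64_t cycles_per_s;  /* this CPU's TSC advance over the last second */
};

/* Handler side: re-measuring cycles_per_s every tick is what lets each
   CPU run at its own (possibly changing) speed. */
void cpu_clock_tick(struct cpu_clock *c, int64_t sec, uint64_t tsc)
{
    if (c->ticks > 0)
        c->cycles_per_s = tsc - c->tsc_tick;
    c->sec = sec;
    c->tsc_tick = tsc;
    c->ticks++;
}

/* Reader side: `snap` is a copy of the record taken between cli and sti,
   so the handler cannot change it halfway; interpolation happens after
   sti.  Returns a negative value until two ticks have been seen. */
double cpu_clock_interpolate(struct cpu_clock snap, uint64_t tsc_now)
{
    if (snap.ticks < 2 || snap.cycles_per_s == 0)
        return -1.0;
    return (double)snap.sec +
           (double)(tsc_now - snap.tsc_tick) / (double)snap.cycles_per_s;
}
```

If the CPU's speed changes mid-second, the quotient is off until the next tick re-measures `cycles_per_s`, which is the "wrong for at most 1 second" window described above.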


there is no guarantee of accuracy, so e.g.

t1 < t2

with event1 at t1 and event2 at t2

does NOT mean event1 happened before event2
if they are on different CPUs,

usually the time WILL be accurate, so e.g. you could
use it for precise timing, but every now and then
it could be imprecise
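This caveat can be turned into a rule: cross-CPU timestamps only order events when they differ by more than the worst-case disagreement between the CPUs' clocks. A minimal sketch; `order_events` is a hypothetical helper, and `err` must cover both the measured offset uncertainty (for example the 2a0h window) and any speed change since the last recalibration.

```c
#include <stdint.h>

/* Compare two timestamps taken on (possibly) different CPUs.
   err = largest possible disagreement between the CPUs' clocks.
   Returns -1 if event1 is definitely first, 1 if event2 is definitely
   first, and 0 if the two are too close to call. */
int order_events(int64_t t1, int64_t t2, int64_t err)
{
    if (t2 - t1 > err) return -1;
    if (t1 - t2 > err) return 1;
    return 0;
}
```

For timestamps taken on the same CPU, `err` can be 0 and the comparison reduces to plain `t1 < t2`.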

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
