flat assembler
Message board for the users of flat assembler.

Try of a core count with GetProcessAffinityMask

Kuemmel



Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel 23 Dec 2008, 14:37
Hi folks,

Inspired by the JuliaSSE code, and still searching for code that counts every logical CPU core, including the virtual ones provided by Hyper-Threading, I tried to do this with the proposed GetProcessAffinityMask; see the attached code.

As I'm really not into OS coding... is that the correct way to do it?

For example, on a normal Core2Quad it should display '4', on any i7 there should be '8', on any dual Core2Quad there should also be '8', and so on. I hope you get what I mean. I could only test it on a Core2Duo.

I also commented out a way I think it could be done with CPUID, but revolution didn't recommend that... so any comments/corrections are welcome...


Filename: logical_cores.asm
Filesize: 3.93 KB

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20519
Location: In your JS exploiting you and your system
revolution 23 Dec 2008, 14:55
Kuemmel wrote:
... still on the search for code that counts any logical cpu core including the virtual ones by hyper threading, I tried to do this with the proposed GetProcessAffinityMask, see the code attached.

As I'm really not into OS-coding...is that the way to do it correctly ?
Looks fine to me. In Windows that is the proper way to determine how many CPUs the OS is allowing your program to run on. Usually this is the same as the SystemAffinityMask, but not always. It is also usually the total number of logical CPUs in the system, but again, not always.
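The approach described here comes down to counting the set bits in the process affinity mask. Below is a minimal C sketch of that idea; it is not the attached logical_cores.asm. The helper names `count_mask_bits` and `usable_cpu_count` are made up for this illustration, and the fallback value of 1 outside Windows exists only so the sketch compiles elsewhere. The mask is pointer-sized, so this counts at most 32 or 64 logical CPUs.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: count the set bits in an affinity mask.
   Each set bit stands for one logical CPU. */
static unsigned count_mask_bits(uint64_t mask)
{
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Hypothetical helper: number of logical CPUs the OS allows this
   process to run on. Returns 1 if the query fails or when built
   for a non-Windows target. */
static unsigned usable_cpu_count(void)
{
#ifdef _WIN32
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (GetProcessAffinityMask(GetCurrentProcess(),
                               &process_mask, &system_mask)) {
        unsigned n = count_mask_bits((uint64_t)process_mask);
        return n ? n : 1;
    }
#endif
    return 1;
}
```

For the Core2Quad case from the first post, `usable_cpu_count()` would be expected to return 4 as long as the process runs with its default affinity; a restricted process would get a smaller number, which is the "not always" case mentioned above.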
bitRAKE



Joined: 21 Jul 2003
Posts: 4162
Location: vpcmpistri
bitRAKE 24 Dec 2008, 07:20
For the sake of obtaining the highest speed, I've looked into the NUMA support on Windows [AMD]. NUMA can provide more detailed information about the relationships between cores and (cache) memory.
Code:
#   Constituent CPUs  Relationship               Parameters
0   CPU-0             Processor
1   CPU-0             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
2   CPU-0             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
3   CPU-1             Processor
4   CPU-1             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
5   CPU-1             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
6   CPU-0 CPU-1       Level 2 Unified Cache      Associativity 24  LineSize 64  Size 6MB
7   CPU-2             Processor
8   CPU-2             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
9   CPU-2             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
10  CPU-0 - CPU-3     Shared Physical Package
11  CPU-3             Processor
12  CPU-3             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
13  CPU-3             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
14  CPU-2 CPU-3       Level 2 Unified Cache      Associativity 24  LineSize 64  Size 6MB
15  CPU-4             Processor
16  CPU-4             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
17  CPU-4             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
18  CPU-5             Processor
19  CPU-5             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
20  CPU-5             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
21  CPU-4 CPU-5       Level 2 Unified Cache      Associativity 24  LineSize 64  Size 6MB
22  CPU-6             Processor
23  CPU-6             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
24  CPU-6             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
25  CPU-4 - CPU-7     Shared Physical Package
26  CPU-7             Processor
27  CPU-7             Level 1 Data Cache         Associativity  8  LineSize 64  Size 32KB
28  CPU-7             Level 1 Instruction Cache  Associativity  8  LineSize 64  Size 32KB
29  CPU-6 CPU-7       Level 2 Unified Cache      Associativity 24  LineSize 64  Size 6MB
30  CPU-0 - CPU-7     NUMA Node 0                Free Memory 10,217MB    
...notice how pairs of cores share 6MB of cache, and each of the two packages contains four processors. This information comes from the GetLogicalProcessorInformation function. Going forward, I expect the complexity in this area to only increase. There was also a post on Slashdot on this very topic recently, Not All Cores Are Created Equal (links to paper).
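A listing like the one above can be reduced to a few counts. The C sketch below tallies cores, logical CPUs, packages and NUMA nodes from GetLogicalProcessorInformation records. The `topology_summary` type, the `rel_kind` enum and all helper names are made up for this illustration; only the Windows API call, its structure fields and the `Relation...` constants come from the Windows SDK. The tallying part is kept portable so it can be tried without Windows.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>
#endif

/* Hypothetical types for this sketch (not part of the Windows API). */
typedef enum { REL_CORE, REL_PACKAGE, REL_NUMA_NODE, REL_OTHER } rel_kind;

typedef struct {
    unsigned cores;       /* "Processor" records in the listing    */
    unsigned logical;     /* set bits summed over all core masks   */
    unsigned packages;    /* "Shared Physical Package" records     */
    unsigned numa_nodes;  /* "NUMA Node" records                   */
} topology_summary;

static unsigned count_mask_bits(uint64_t mask)
{
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Add one record to the running summary. */
static void tally_record(topology_summary *s, rel_kind kind, uint64_t mask)
{
    switch (kind) {
    case REL_CORE:
        s->cores++;
        s->logical += count_mask_bits(mask);
        break;
    case REL_PACKAGE:   s->packages++;   break;
    case REL_NUMA_NODE: s->numa_nodes++; break;
    default:            break;           /* caches etc. are ignored */
    }
}

#ifdef _WIN32
/* Query Windows and fill *out. Returns 1 on success, 0 on failure. */
static int query_topology(topology_summary *out)
{
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *buf;
    DWORD len = 0, i, n;

    out->cores = out->logical = out->packages = out->numa_nodes = 0;

    /* The first call only reports the required buffer size. */
    if (GetLogicalProcessorInformation(NULL, &len) ||
        GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        return 0;
    buf = malloc(len);
    if (buf == NULL)
        return 0;
    if (!GetLogicalProcessorInformation(buf, &len)) {
        free(buf);
        return 0;
    }
    n = len / sizeof(*buf);
    for (i = 0; i < n; ++i) {
        rel_kind kind = REL_OTHER;
        switch (buf[i].Relationship) {
        case RelationProcessorCore:    kind = REL_CORE;      break;
        case RelationProcessorPackage: kind = REL_PACKAGE;   break;
        case RelationNumaNode:         kind = REL_NUMA_NODE; break;
        default:                                             break;
        }
        tally_record(out, kind, (uint64_t)buf[i].ProcessorMask);
    }
    free(buf);
    return 1;
}
#endif
```

Feeding the records from the listing above through `tally_record` should give 8 cores, 8 logical CPUs, 2 packages and 1 NUMA node. On a Hyper-Threading CPU each core mask would carry more than one bit, so `logical` would exceed `cores`.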
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 28 Dec 2008, 04:30
You either have to use some specific OS routines, or do CPUID (if you don't care about non-x86). GetProcessAffinityMask() is not the way to go.

Also, don't hardcode stuff too much; give users configurability. P4 HT != core i7 HT != 1g multicore != multi-cpu != 2g multicore.
revolution 28 Dec 2008, 04:41
f0dder wrote:
GetProcessAffinityMask() is not the way to go.
Why not?

What do you suggest is "the way to go"?
f0dder 28 Dec 2008, 05:00
GetProcessAffinityMask() only tells you what Windows has decided. I haven't owned an HT CPU, but my guess is you can do better with a combo of CPUID topology querying and user config...

unless all Windows versions have really well-working topology detection. And keep in mind that all current Windows versions predate the Core i7, whose HT implementation is supposed to be quite different from the P4's. And older Windows versions can't differentiate HT from MP from MC... :)
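As a rough illustration of what a basic CPUID query yields, here is a C sketch that decodes CPUID leaf 1. It is not the commented-out code from logical_cores.asm, and the helper names are made up for this example. EDX bit 28 is the HTT flag; when it is set, EBX[23:16] gives the maximum number of addressable logical processor IDs in the package, which is an upper bound and not necessarily the number of enabled logical CPUs. The `<cpuid.h>` wrapper assumes GCC or Clang on x86.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* Hypothetical helper: decode the CPUID leaf 1 registers.
   Returns EBX[23:16] when the HTT flag (EDX bit 28) is set,
   otherwise 1. This is an upper bound per package, not a count
   of the logical CPUs that are actually enabled. */
static unsigned logical_ids_per_package(uint32_t ebx, uint32_t edx)
{
    if ((edx >> 28) & 1u) {
        unsigned n = (ebx >> 16) & 0xFFu;
        return n ? n : 1;
    }
    return 1;
}

/* Hypothetical helper: run CPUID leaf 1 on the current CPU.
   Returns 0 when CPUID is not available in this build. */
static unsigned cpuid_logical_ids_per_package(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return logical_ids_per_package(ebx, edx);
#endif
    return 0;
}
```

Note that this value says nothing about how many packages are installed or how many CPUs the OS will schedule on, which is the gap the rest of the thread argues about.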
revolution 28 Dec 2008, 05:20
f0dder wrote:
GetProcessAffinityMask() only tells you what Windows has decided.
Yes, that is precisely the reason that I suggest using it. In another thread I gave the example of Win98. Win98 only supports one CPU and has no method of running tasks on more than one core. Using CPUID is pointless in Win98 because even if you find 4 cores, trying to run 4 threads will achieve nothing.

You also said:
f0dder wrote:
You either have to use some specific OS routines ...
And I completely agree.
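The two positions in this exchange can be combined in a few lines: let an explicit user setting win, and otherwise cap whatever hardware detection reports by what the OS allows. The C sketch below is one possible reading of that, with a made-up function name and parameters; nobody in the thread posted it.

```c
/* Hypothetical helper: pick a worker-thread count.
   hw_count     - logical CPUs found by hardware detection such as
                  CPUID (0 if unknown)
   os_allowed   - logical CPUs the OS lets the process use, e.g. the
                  bit count of the process affinity mask (0 if unknown)
   user_setting - explicit user configuration (0 means "auto") */
static unsigned choose_thread_count(unsigned hw_count,
                                    unsigned os_allowed,
                                    unsigned user_setting)
{
    unsigned n;

    if (user_setting != 0)
        return user_setting;        /* the user knows best */

    n = hw_count;
    if (os_allowed != 0 && (n == 0 || os_allowed < n))
        n = os_allowed;             /* never exceed what the OS allows */

    return n ? n : 1;               /* always run at least one thread */
}
```

With the Win98 example above, `choose_thread_count(4, 1, 0)` yields 1: CPUID may find four cores, but only one is usable.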
Copyright © 1999-2025, Tomasz Grysztar.