flat assembler
Message board for the users of flat assembler.

Programming for multicore CPUs

SokilOff
The question is very simple.

There is a program that does some time-consuming calculations and fills a table with the results. To speed up the calculations, the job can be split across more than one thread on a multicore CPU. All threads are independent, don't share any data, and don't even require any synchronization.

So which way would be the simplest: call GetProcessAffinityMask to find out how many cores are potentially available, create that many threads, and let the OS task scheduler decide which thread runs on which core? Or does it make sense to do it manually using SetThreadAffinityMask?
Post 06 May 2014, 15:25
JohnFound
IMHO, the first approach. The OS will probably dispatch the tasks better than you can by hand. Or, even better (if possible), you can think of an algorithm that does the calculations faster.
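A minimal sketch of the first approach in C, assuming one worker thread per bit set in the process affinity mask. The helper names count_allowed_cpus and process_affinity_mask are made up for illustration. Off Windows the mask is a hard-coded placeholder so the bit counting can still be tried. Note that the mask only covers the process's own processor group, so this is not a complete answer on machines with more than 64 logical processors.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Count the set bits in an affinity mask. Each set bit is one logical
   processor the process is allowed to run on. */
static unsigned count_allowed_cpus(uint64_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: fetch the process affinity mask.
   On Windows it asks the OS; elsewhere it returns a made-up value. */
static uint64_t process_affinity_mask(void)
{
#ifdef _WIN32
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return (uint64_t)process_mask;
    return 1;       /* assumption: fall back to a single CPU on failure */
#else
    return 0x0F;    /* assumption: pretend there are 4 logical processors */
#endif
}
```

The worker count would then be count_allowed_cpus(process_affinity_mask()), with thread placement left entirely to the scheduler.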
Post 06 May 2014, 16:00
tthsqe
Why don't you try both and see which is faster?
I usually use SetThreadAffinityMask when the tasks for each core are the same and the cache is important.
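For the manual variant, here is a rough C sketch of pinning worker number i to the i-th CPU allowed by the process mask. nth_cpu_mask and pin_current_thread are illustrative names, not part of any API. Only the SetThreadAffinityMask call is the real Windows function, and on other systems the pinning step is a stub.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Return a mask containing only the index-th (0-based) set bit of
   process_mask, or 0 if the mask has fewer set bits than that. */
static uint64_t nth_cpu_mask(uint64_t process_mask, unsigned index)
{
    while (process_mask) {
        uint64_t lowest = process_mask & (~process_mask + 1); /* isolate lowest set bit */
        if (index == 0)
            return lowest;
        process_mask ^= lowest;   /* drop that bit and keep looking */
        index--;
    }
    return 0;
}

/* Hypothetical helper: restrict the calling thread to the CPUs in mask.
   Returns nonzero on success. */
static int pin_current_thread(uint64_t mask)
{
    if (mask == 0)
        return 0;                 /* an empty mask is never valid */
#ifdef _WIN32
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) != 0;
#else
    return 1;                     /* stub: nothing is pinned off Windows */
#endif
}
```

Each worker would call pin_current_thread(nth_cpu_mask(process_mask, i)) once at startup, before it begins its share of the table.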
Post 06 May 2014, 16:32
revolution
There is no general answer that is right for all situations. Just try them and see which works for your particular case. But there is one overriding rule: make sure you don't run more threads than you have cores. A more minor factor to consider is hyperthreading (SMT). There are some cases where using all logical cores can hurt performance, and using only as many threads as you have physical cores (with affinity settings to lock each thread to a physical core) can boost performance.

ETA: I hope your cooling system is in good condition. Your system will likely get hot.
Post 06 May 2014, 16:51
SokilOff
Thanks, guys. I'll try both variants and see which gives better results.

tthsqe wrote:
I usually use SetThreadAffinityMask when the tasks for each core are the same and the cache is important.

The problem here is that you can assign a thread to an already heavily loaded core.

revolution wrote:

A more minor factor to consider is hyperthreading (SMT). There are some cases where using all logical cores can hurt performance, and using only as many threads as you have physical cores (with affinity settings to lock each thread to a physical core) can boost performance.

Also a good point. Thank you.
Post 09 May 2014, 12:53
sinsi
SokilOff wrote:
The problem here is that you can assign a thread to an already heavily loaded core.

Maybe a hint using SetThreadIdealProcessor might be better than limiting to one core with SetThreadAffinityMask.
Post 09 May 2014, 13:28
SokilOff
sinsi wrote:
Maybe a hint using SetThreadIdealProcessor might be better than limiting to one core with SetThreadAffinityMask.
Indeed, this could be a good solution. Thanks!

One more question. If all threads have one common counter, is it enough to do something like

lock inc [CommonCounter]

in all these threads to be sure they don't modify the counter at the same time, without knowing that the value has already been changed by another thread (for example, in another core's cache but not yet written back to RAM)?

And the same story with something like:

mov eax, [valueToCompare]
mov ecx, [newValue]
mov esi, requiredAddress
lock cmpxchg [esi], ecx ; <---- this one

Will it be enough to use the 'lock' prefix in all threads to be sure that the content of [requiredAddress] won't be changed more than once?

In theory it should be (atomic operations are used in both cases). But maybe I missed something...


edit: typo
Post 10 May 2014, 13:33
revolution
SokilOff wrote:
If all threads have one common counter, is it enough to do something like

lock inc [CommonCounter]

in all these threads to be sure they don't modify the counter at the same time, without knowing that the value has already been changed by another thread (for example, in another core's cache but not yet written back to RAM)?
This will work as you would expect, without a problem. The cache coherency protocol will handle it properly. However, there is a small caveat when dealing with multi-socket motherboards (not multi-core CPUs): each counter should be placed in a separate cache line from other counters.
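The same idea can be sketched in C11, where atomic_fetch_add on x86 typically compiles to a lock-prefixed instruction much like the lock inc above. Everything here is illustrative: the 64-byte cache line is an assumption that is common on x86 but not guaranteed, the names padded_counter and count_with_threads are made up, and POSIX threads stand in for whatever thread API the real program uses.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>

#define CACHE_LINE  64   /* assumption: typical x86 cache line size */
#define MAX_THREADS 16   /* arbitrary cap for this sketch */

/* The alignment pads each counter out to a full cache line, so two
   counters never share a line. */
struct padded_counter {
    alignas(CACHE_LINE) atomic_long value;
};

static struct padded_counter counters[2];   /* counters[0] is the common counter */

struct job { long iterations; };

static void *worker(void *arg)
{
    const struct job *j = arg;
    for (long i = 0; i < j->iterations; i++)
        atomic_fetch_add(&counters[0].value, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Start nthreads workers, wait for them, and return the final count. */
static long count_with_threads(int nthreads, long iterations)
{
    pthread_t tid[MAX_THREADS];
    struct job j = { iterations };
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    atomic_store(&counters[0].value, 0);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, worker, &j) != 0)
            break;                 /* stop starting threads on failure */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&counters[0].value);
}
```

If every thread starts, the result is exactly nthreads * iterations. With a plain non-atomic increment, some updates could be lost.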
SokilOff wrote:
And the same story with something like:

mov eax, [valueToCompare]
mov ecx, [newValue]
mov esi, requiredAddress
lock cmpxchg [esi], ecx ; <---- this one

Will it be enough to use the 'lock' prefix in all threads to be sure that the content of [requiredAddress] won't be changed more than once?

In theory it should be (atomic operations are used in both cases). But maybe I missed something...
With the lock prefix, cmpxchg is atomic, so the same applies: it will work as you suggest.
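As a rough C11 equivalent, atomic_compare_exchange_strong plays the role of lock cmpxchg. The variable slot stands in for [requiredAddress] and claim_slot is a made-up name. The sketch assumes that newValue differs from valueToCompare and that nothing ever stores valueToCompare back into the slot. If either assumption fails, a later compare could succeed again.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int slot = 0;   /* stands in for [requiredAddress] */

/* Try to change slot from value_to_compare to new_value.
   Returns true only for the caller whose exchange succeeded
   (the case where cmpxchg would set ZF). */
static bool claim_slot(int value_to_compare, int new_value)
{
    int expected = value_to_compare;   /* plays the role of eax */
    /* On failure, expected is overwritten with the current value of
       slot, just as cmpxchg loads the destination into eax. */
    return atomic_compare_exchange_strong(&slot, &expected, new_value);
}
```

When several threads race on claim_slot(0, x), exactly one gets true and all the others get false, so the slot is written once.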
Post 11 May 2014, 11:37
Copyright © 1999-2020, Tomasz Grysztar.