flat assembler
Message board for the users of flat assembler.

Disable SMT for my process?

Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
I think it can benefit from multiple cores.. but it sounds like SMT will make it slower
Quote:
SMT looks for instruction parallelism in two threads instead of just one, with the goal of leaving as few units unused as possible. This approach can be extremely effective when the two threads are executing tasks that are highly separate. On the other hand, two threads involving intensive calculation, for example, will only increase the pressure on the same calculating units, putting them in competition with each other for access to the cache. It goes without saying that SMT is of no interest in this type of situation, and can even negatively impact performance.


Is there a way to disable it just for the threads in my process, so that it will only run one thread per core? Or can it only be adjusted system-wide (in the BIOS)?
Post 31 Aug 2009, 18:14
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Just disable HT in the BIOS. But note that only some Pentium 4 models and the Core i7 support it. If I remember right, no Core/Core 2 supports HT.
Post 31 Aug 2009, 18:25
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Aww.. so there is no way to do it per process? I've been thinking of getting an i7 but I think HT is going to be bad for my program.. but I don't want to disable it for everything.. :/


P.S. HT and SMT are the same right? Just different names?
Post 31 Aug 2009, 18:29
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
http://en.wikipedia.org/wiki/Simultaneous_multithreading

However, perhaps more generally speaking, the term also covers implementing this with physical cores, in which case you could set the affinity mask of the process to just one of them.

I don't know if it is possible to set the affinity mask to only one logical processor per core.
Post 31 Aug 2009, 18:41
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Would I just set it using the affinity mask thing in Task Manager? Like uncheck every other one? Or do you mean something else?
Post 31 Aug 2009, 18:47
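Unchecking every other box in Task Manager corresponds to the bitmask sketched below. This is only an illustration in Python: `every_other_mask` is a hypothetical helper, and the sketch assumes that the two logical processors of each core are numbered adjacently (0/1, 2/3, ...), which nothing in this thread confirms.

```python
def every_other_mask(logical_count, start=0):
    """Return an affinity bitmask selecting every second logical processor.

    Assumption (not confirmed in the thread): SMT siblings are numbered
    adjacently, so CPUs 0 and 1 share a core, 2 and 3 share a core, etc.
    """
    mask = 0
    for cpu in range(start, logical_count, 2):
        mask |= 1 << cpu  # set the bit for this logical processor
    return mask


# 8 logical processors: keep CPUs 0, 2, 4, 6
print(hex(every_other_mask(8)))           # 0x55
# Same idea, but keeping the other sibling of each pair
print(hex(every_other_mask(8, start=1)))  # 0xaa
```

Whether such a mask really leaves one thread per core depends entirely on the numbering assumption.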
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
http://msdn.microsoft.com/en-us/library/ms686223%28VS.85%29.aspx (hey, now firefox automatically converted (VS.85) into %28VS.85%29)

It is very likely you'll have to ignore the first community content.
Post 31 Aug 2009, 18:52
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Thanks :)
Post 31 Aug 2009, 18:56
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Benchmark.

There's supposedly a big difference between P4 and i7 SMT performance - and even on P4 you could get a speed boost if you knew what you were doing.

Also, if you benchmark and find that i7 SMT slows you down, you can use CPUID to query the CPU topology and use an affinity mask to run only on non-SMT cores (dunno if there are any API calls to do the querying).
Post 02 Sep 2009, 05:23
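The CPUID approach can be sketched as follows, assuming the x2APIC ID of every logical processor has already been read (on CPUs that implement CPUID leaf 0Bh, it is reported in EDX, and the SMT sub-leaf gives the width of the thread field in EAX[4:0]). Reading those values means executing CPUID while pinned to each logical processor in turn, which is not shown here. `group_by_core`, `one_per_core_mask` and the sample IDs are made up for illustration, and the list index is assumed to be the Windows CPU number.

```python
from collections import defaultdict


def group_by_core(apic_ids, smt_shift):
    """Group logical processors by physical core.

    apic_ids:  list where index = OS CPU number (assumption) and
               value = x2APIC ID read on that CPU.
    smt_shift: number of low APIC ID bits that identify the SMT thread.
    """
    cores = defaultdict(list)
    for cpu, apic_id in enumerate(apic_ids):
        # Dropping the SMT bits leaves an ID shared by sibling threads.
        cores[apic_id >> smt_shift].append(cpu)
    return dict(cores)


def one_per_core_mask(apic_ids, smt_shift):
    """Build an affinity mask with the lowest-numbered CPU of each core."""
    mask = 0
    for cpus in group_by_core(apic_ids, smt_shift).values():
        mask |= 1 << min(cpus)
    return mask


# Hypothetical 4-core, 8-thread layout where siblings are NOT adjacent.
sample_ids = [0, 2, 4, 6, 1, 3, 5, 7]
print(group_by_core(sample_ids, 1))
print(hex(one_per_core_mask(sample_ids, 1)))  # 0xf
```

Grouping by the shifted ID avoids guessing how the OS numbers sibling threads.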
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
f0dder, but how could you get "non-SMT cores"? The best you can do would be selecting the "left" or "right" logical core from each physical core.
Post 02 Sep 2009, 07:21
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
LocoDelAssembly wrote:
f0dder, but how could you get "non-SMT cores"? The best you can do would be selecting the "left" or "right" logical core from each physical core.
Well, tbh I haven't read up on the CPUID topology information retrieval, but afaik it tells you whether a logical core is SMT or not. It should be possible to map this information to a Windows CPU number (this might be a bit hacky, though?)

If you're willing to limit the OSes you run on (or OSes where you can detect SMT), check out GetLogicalProcessorInformation(). Vista/Server2003/XP64 and later only.

_________________
carpe noctem
Post 02 Sep 2009, 07:33
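A minimal sketch of the step that would follow GetLogicalProcessorInformation(): each RelationProcessorCore entry it returns carries a ProcessorMask with one bit per logical processor sharing that core. Assuming those masks have been collected into a list (the sample values below are invented), picking the "left" or "right" logical processor of each core is plain bit arithmetic. `pick_one_per_core` is a hypothetical helper, and Python is used only to show the arithmetic, not the API call itself.

```python
def pick_one_per_core(core_masks, use_highest=False):
    """Combine one bit from each per-core mask into a single affinity mask.

    core_masks:  one bitmask per physical core, each bit marking a logical
                 processor that belongs to that core.
    use_highest: False picks the lowest set bit ("left" sibling),
                 True picks the highest set bit ("right" sibling).
    """
    result = 0
    for core_mask in core_masks:
        if core_mask == 0:
            continue  # ignore empty entries
        if use_highest:
            bit = 1 << (core_mask.bit_length() - 1)
        else:
            bit = core_mask & -core_mask  # isolates the lowest set bit
        result |= bit
    return result


# Invented example: 4 cores, 2 logical processors each.
masks = [0x03, 0x0C, 0x30, 0xC0]
print(hex(pick_one_per_core(masks)))                    # 0x55
print(hex(pick_one_per_core(masks, use_highest=True)))  # 0xaa
```

The combined value is the kind of mask that SetProcessAffinityMask() takes for a process.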
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
BTW I don't recommend getting an i7 until it moves to 32nm...
Post 02 Sep 2009, 14:54
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Ya that's what I'm waiting for ^^ I like the looks of AES-NI.
Post 02 Sep 2009, 14:58
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
1) I've assembled many i7 platforms (many for sale) and they're all grateful for it. HT or not, it's a nice CPU and DDR3 doesn't hurt.
2) HT means there are double the registers on a current CPU. Both logical ones are actually the same CPU. There is NO SUCH thing as an SMT CPU. Like LocoDelAssembly said, you can go for left or right and they perform the same.

32nm is of course better, but then you'd have to wait for the "bugfixes" (tick-tock the Intel goes).
There's actually a fresh i7 as well on 45nm, which is more power-efficient. I believe the very first models in the 32nm world aren't THAT great and you'll have to wait just a bit more.


(PS. Very pleased with my 8 threads and SSE4.2 PCMPxSTRx instructions)

EDIT: [sarcasm] Some say... i7 w/ SSE4.2 can make your O(N) algorithms O(1)! [/sarcasm]
Post 02 Sep 2009, 16:25
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
If N is a low enough number, then that statement could be true..




Personally though I'm looking forward to the AES and AVX instructions more.
Post 02 Sep 2009, 20:26
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Azu wrote:
If N is a low enough number, then that statement could be true..
No, it wouldn't. Math works differently than intuition, even if it appears that way.

O(N)=O(1) has only one solution: N=1. But N isn't always 1, and in math, it has to apply ALWAYS. ;)

And i7s are too hot, 130 W at max load. Intel did release "green" versions of the Core 2 Quad, but they aren't better than an underclocked Core 2 Quad, and they cost more.

Madis731 wrote:
1) I've assembled many i7 platforms (many for sale) and they're all grateful for it. HT or not, it's a nice CPU and DDR3 doesn't hurt
Price does. Or should I say, performance/price. ;)

DDR3 also means a more expensive mobo. And for me at least, there's no DDR3 with ECC, except for server mobos with Xeon processors, and I won't go into that territory ($$$$).

_________________
Previously known as The_Grey_Beast
Post 03 Sep 2009, 16:38
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Borsuc wrote:
Azu wrote:
If N is a low enough number, then that statement could be true..
No, it wouldn't. Math works differently than intuition, even if it appears that way.

O(N)=O(1) has only one solution: N=1. But N isn't always 1, and in math, it has to apply ALWAYS. ;)
I thought SIMD instructions took the same amount of time to run whether you just give them 1 element or the max they support?

Borsuc wrote:
i7s are too hot, 130W at max load.
Shouldn't Westmere fix that?
If not, ouch.
Post 03 Sep 2009, 16:46
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Quote:

I thought SIMD instructions took the same amount of time to run whether you just give them 1 element or the max they support?

But after replacing that SIMD instruction with several non-SIMD instructions, the algorithm will remain O(1) (if it previously was).
Post 03 Sep 2009, 17:18
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
LocoDelAssembly wrote:
Quote:

I thought SIMD instructions took the same amount of time to run whether you just give them 1 element or the max they support?

But after replacing that SIMD instruction with several non-SIMD instructions, the algorithm will remain O(1) (if it previously was).
Huh? We're talking about taking O(N) algorithms using non-SIMD and turning them into O(1) with SIMD, where N is low enough.
Post 03 Sep 2009, 17:23
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17476
Location: In your JS exploiting you and your system
revolution
Azu wrote:
Huh? We're talking about taking O(N) algorithms using non-SIMD and turning them into O(1) with SIMD, where N is low enough.
You can't change the big-O notation with a CPU instruction set. SIMD can at best give you a linear speed-up; this has no effect whatsoever on the algorithmic complexity measurement.
Post 03 Sep 2009, 17:27
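The "linear speed-up" point can be checked with a simple operation count. This sketch assumes a made-up SIMD width of 16 elements per instruction; `scalar_ops` and `simd_ops` are illustrative names, not measurements of any real CPU.

```python
def scalar_ops(n):
    """One operation per element."""
    return n


def simd_ops(n, width=16):
    """One operation per started group of `width` elements (assumed width)."""
    return (n + width - 1) // width  # integer ceiling of n / width


for n in (16, 160, 1600):
    ratio = scalar_ops(n) / simd_ops(n)
    print(n, scalar_ops(n), simd_ops(n), ratio)

# The speed-up never exceeds the vector width, whatever N is.
assert all(scalar_ops(n) / simd_ops(n) <= 16 for n in range(1, 10000))
```

The SIMD count is smaller by a constant factor of at most 16, but it still grows in proportion to N, which is what the O(N) classification describes.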
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
revolution wrote:
Azu wrote:
Huh? We're talking about taking O(N) algorithms using non-SIMD and turning them into O(1) with SIMD, where N is low enough.
You can't change the big-O notation with a CPU instruction set. SIMD can at best give you a linear speed-up; this has no effect whatsoever on the algorithmic complexity measurement.
It can if N is small enough.


Here's an example:

Let's say you have a non-SIMD algorithm that takes 1 clock cycle per addition, and then you have a SIMD algorithm that does 16 additions at once in 2 clock cycles.

The non-SIMD one is O(N), but if N<17 then the SIMD one is O(1).
Post 03 Sep 2009, 17:37
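The numbers in this example can be written down as a small cost model. The cycle counts are taken from the post above; the function names are hypothetical, and the extension beyond 16 elements (one more SIMD operation per started group of 16) is an assumption.

```python
def scalar_cycles(n):
    """1 clock cycle per addition, as in the example."""
    return n * 1


def simd_cycles(n, width=16, cycles_per_op=2):
    """2 clock cycles per SIMD operation covering up to 16 additions.

    Assumption: more than 16 elements simply need more SIMD operations.
    """
    groups = (n + width - 1) // width  # integer ceiling of n / width
    return cycles_per_op * groups


for n in (1, 8, 16, 17, 32, 1600):
    print(n, scalar_cycles(n), simd_cycles(n))
```

Under the restriction N<17 the SIMD cost is a flat 2 cycles, but the scalar cost is then also bounded by a constant (16 cycles), so both would count as O(1) there. Once N is allowed to grow, both costs grow in proportion to N.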
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.
