flat assembler
Message board for the users of flat assembler.

Index > Windows > Disable SMT for my process?

Goto page Previous  1, 2, 3, 4, 5, 6, 7  Next
Author
Thread Post new topic Reply to topic
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
But that is how I mean..
Post 03 Sep 2009, 19:45
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Azu, in math, when studying asymptotic functions, you use a partial derivative, which is a measure of how a function changes if you change a parameter.

So let's say we have 3 functions, one has 1 add, second has 10000 adds, and the third has 1000000000000 adds. (doesn't matter, just a really big number). All of these, though, do not have a parameter: the number of additions is always constant, in this example.

Function 3 will always have 1000000000000 because it has no parameter to adjust the function. The derivative is 0. Hence the function is constant, hence its complexity is O(1). Yes, function 1 and function 3 are both O(1) as is function 2, not O(1), O(10000), and O(1000000000000) as you might think.

If you have a function that has a parameter, like let's say, the "number of items", then it's not constant anymore. You analyze the derivative to see how the function's timing changes if you vary the parameter.

If it varies linearly, that is, doubling the "number of items" for any number of items makes the execution 2 times slower, then it is O(N) because it has a linear relationship.

If it actually needs 4 times the amount of time for twice the "number of items" (again, for any number of items, not "exceptions"), then it is O(N^2) because it is a quadratic relationship: you see, N^2 is the time required for any N number of items, mathematically speaking.

It has nothing to do with execution time on a given CPU, it has everything to do with measuring the derivative and the number of steps it has to perform compared to the previous parameter value (that's what the derivative does). If it's the same, the difference is 0, the derivative is 0, and the function is O(1).

This is why ALL constant functions are O(1), no matter how many operations they have.

_________________
Previously known as The_Grey_Beast
Post 03 Sep 2009, 19:46
View user's profile Send private message Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
In my example the parameter is how many additions you are doing..
Post 03 Sep 2009, 19:48
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Then it is O(N) Wink

O(1) means that it is O(1), no matter the parameter (number of additions in this case), which is false, because it isn't constant.

Bubble Sort is O(N^2), guess why.

_________________
Previously known as The_Grey_Beast
Post 03 Sep 2009, 19:49
View user's profile Send private message Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Because that algorithm must run N*N times!


If I have a SIMD instruction that can do up to 16 additions at once, then it is running 1 time, as long as 16 or less additions need done, where as non-SIMD way would need ran 16 times to do 16 additions, or 2 times to do 2 additions, or whatever.....
Post 03 Sep 2009, 19:51
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
revolution
Azu wrote:
In my example the parameter is how many additions you are doing..
Okay, so SIMD still does the same number of additions as non-SIMD. Doing them in parallel is an implementation thing, it has nothing to do with mathematical complexity. In my simple example on the previous page the complexity is still 4 whether in parallel or series, it makes no difference to the complexity value.
Post 03 Sep 2009, 19:52
View user's profile Send private message Visit poster's website Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
@Azu: More correct would be to say that, because as you increase N, the amount of "steps" (let's keep it at abstract mathematical level) required increases by N^2. Or in other words, doubling N will quadruple the number of steps required.

Azu, it is about the number of "steps" needed. Doing them in parallel doesn't change anything, there's still 2 steps whether you do both at the same time, or whether you do them twice as fast.

_________________
Previously known as The_Grey_Beast
Post 03 Sep 2009, 19:52
View user's profile Send private message Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
No no no

The SIMD instruction will do the same amount of work whether you utilize the max amount of inputs it can work on at once or just give it 1

It's just that the excess will be wasted


If it can do up to 16 at once and you give it say 3, it will be the same as if you give it 16.. it will take same amount of time, use same amount of resources, same complexity and everything. Different input but the complexity is constant.
Post 03 Sep 2009, 19:56
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
O(N) is theoretical mathematical representation -- there is no such thing as "waste" in math! Wink

But that's irrelevant anyway. O(N) is asymptotical analyzation. This means as N tends to infinity you take the limit (and derivative). It's the mathematical term for how it "tends".

You know, with limits (lim x->oo, stuff like that), which is x^2 for N^2, and 1 for O(1), not even using x! (no parameter change after all)

Asymptotes
Post 03 Sep 2009, 23:13
View user's profile Send private message Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
What I mean is, SIMD instructions do the same amount of work whether you are making full use of how much they can do or not. It doesn't vary based on how much work you need done. It can either handle however many operations you need or not. Hence O(1) whenever one of them is enough.
Post 03 Sep 2009, 23:17
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Yes you are correct, but that has nothing to do with Big-O notation. See my post above why Smile

_________________
Previously known as The_Grey_Beast
Post 04 Sep 2009, 01:17
View user's profile Send private message Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Okay.. so what notation should I use instead?

I just want to express how many times an algorithm needs ran based on how much stuff it works on.


Like this;

loopy:
mov al,byte[edi]
add al,byte[esi]
sub ecx,1
mov byte[edi],al
lea edi,[edi+1]
lea esi,[esi+1]
je loopy

Needs ran ecx times where as this;

movdqa xmm0,dqword[esi]
movdqa xmm1,dqword[edi]
paddb xmm0,xmm1
movdqa dqword[edi],xmm0

Needs ran 1 time..


What is the correct notation that I should use to compare these?
Post 04 Sep 2009, 01:21
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
revolution
AFAIK there is no notation for such a thing. Mathematicians are generally not interested in linear speed-ups so I doubt they have developed any special notation for it (if someone knows otherwise I would be happy to learn). Only computer programmers seem interested in linear speed-ups, so perhaps the measure of runtime, or clock counts, or instruction count, or byte count, or something, is the best you can get.
Post 04 Sep 2009, 05:04
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Linear speadups? I'm talking about how many times you have to run the algorithm when you increase the number of inputs to it. That has absolutely nothing to do with runtime, clock counts, or instruction count. Look at my last example (before you ask, it's in the post right above yours).
Post 04 Sep 2009, 07:24
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
revolution
Azu wrote:
Linear speadups? I'm talking about how many times you have to run the algorithm when you increase the number of inputs to it. That has absolutely nothing to do with runtime, clock counts, or instruction count. Look at my last example (before you ask, it's in the post right above yours).
Your example is purely a linear speed relation. Double the number of inputs and you double the number of required operations. For specific cases of a specific input size then just divide by the loop size and round up.
Post 04 Sep 2009, 07:33
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
No.. the required operations is constant as long as less than 17 inputs. Actually, it won't even work for >16 inputs, since I didn't put in any logic to handle that... so it's going to be constant no matter how many, lol. Where as the number of times the non-SIMD one will run is exactly how many inputs.

What is the notation for this? I think that, like I said, it should be "O(n)" for the non-SIMD one, and O(1) for the SIMD one if less than 17 inputs. Or
x=16;how many it can do at once
O(ceil(n/x))
to handle cases all the way up to infinitely, like I said a few pages back. (okay actually I changed it a little just now to make it more compact, but same result)

You all keep saying that I am using the wrong notation so please tell me what is the right one to use for this.
Post 04 Sep 2009, 07:36
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
revolution
Okay, so you have 16 operations per loop. That means:

0=1 loops
1-16=1 loop
17-32=2 loops
33-48=3 loops
etc.

That is how linear works.
Post 04 Sep 2009, 07:46
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
For 0 through 16 it is 1 "loop" (not really a loop when it is just ran once though), like I said. Hence the "when N<17" qualifier I added to all my posts.


Last edited by Azu on 04 Sep 2009, 07:48; edited 1 time in total
Post 04 Sep 2009, 07:47
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
revolution
Azu wrote:
You all keep saying that I am using the wrong notation so please tell me what is the right one to use for this.
Azu, please don't keep asking the same question over and over. I have already answered you above. If someone else has a better answer then I am sure they will see your question and will answer if they feel inclined.
Post 04 Sep 2009, 07:48
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
revolution wrote:
Azu wrote:
You all keep saying that I am using the wrong notation so please tell me what is the right one to use for this.
Azu, please don't keep asking the same question over and over. I have already answered you above. If someone else has a better answer then I am sure they will see your question and will answer if they feel inclined.
You answered a question I never asked. I'm asking for the one for how many times the whole thing needs ran/looped/whatever based on the number of inputs. The "linear speed-up" you mentioned would be a comparison between these
Code:
;func 1
loopy:
loop loopy
;func 2
loopy:
sub ecx,1
je loopy    
Post 04 Sep 2009, 07:49
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2, 3, 4, 5, 6, 7  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.