flat assembler
Message board for the users of flat assembler.

Index > Windows > Disable SMT for my process?

Goto page Previous  1, 2, 3, 4, 5, 6, 7  Next
Author
Thread Post new topic Reply to topic
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Azu wrote:
You answered a question I never asked.
Well sorry, I must have misunderstood your question. I was just trying to help.
Azu wrote:
I'm asking for the one for how many times it needs ran/looped/whatever as the number of inputs increases, not linear speedups.
Doesn't this answer your Q?
revolution wrote:
For specific cases of a specific input size then just divide by the loop size and round up.
If not, then I have no idea what you are asking for Sad
Post 04 Sep 2009, 07:53
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
Let me make a simpler example



Code:
;func 1
loopy:
lea ecx,[ecx]
lea ecx,[ecx]
lea ecx,[ecx]
lea ecx,[ecx]
lea ecx,[ecx]
lea ecx,[ecx]
lea ecx,[ecx]
lea ecx,[ecx]
loop loopy
;func 2
loopy:
sub ecx,1
je loopy    
I think that these are both O(n), even though the first one is a lot slower.




Code:
;func 1
loopy:
nop
loop loopy
;func 2
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
;func 3
mov eax,ecx
loopyA:
loopyB:
mov ecx,eax
nop
loop loopyB
sub eax,1
je loopyA    
I think that the first one is O(n) and the second one is O(1), and the third one is O(log(n)).. what I am asking is, since you all assert I am using the wrong notation for this, it must be true. But.. my question is.. what then is the right notation to use for this stuff?


Last edited by Azu on 04 Sep 2009, 08:02; edited 1 time in total
Post 04 Sep 2009, 07:55
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Well you asked the same question again (3rd time now), and rather than try to answer in yet another different way I just want to know if this is even remotely an answer to what you want to know:
revolution wrote:
AFAIK there is no notation for such a thing. Mathematicians are generally not interested in linear speed-ups so I doubt they have developed any special notation for it (if someone knows otherwise I would be happy to learn). Only computer programmers seem interested in linear speed-ups, so perhaps the measure of runtime, or clock counts, or instruction count, or byte count, or something, is the best you can get.
Post 04 Sep 2009, 08:01
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
I'm not asking about linear speed-ups. I'm asking about how many times you need to run the whole algorithm as n (ecx) changes. Look at the examples I gave.. how can you say their differences are linear?


P.S. here is an article that uses big-O the same way I do! http://en.wikipedia.org/wiki/Bubble_sort

You should fix it, then less people will be confused. Because that is how I learned the big-O notation. From that page, and a bunch of other pages they have about different sorting algorithms. Confused
Post 04 Sep 2009, 08:03
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Sorry, Azu your question is very vague and seems to change each time. You first wanted to argue that SIMD changes O(n) into O(1) in certain cases. It was shown to you that this was wrong and you agreed. Now you seem to be asking what is the notation for the algorithm (not the implementation) in which case the O(something) is the proper notation. But you cannot confuse all this with SIMD or 64 vs 32 bit registers or something. Your previous example with a 16-way SIMD vs 1-way loop showed linear speed up, so I answered based upon that. But using SIMD is not an algorithmic change.
Post 04 Sep 2009, 08:12
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
???

Yes it was

If you gave it 0 or 1 inputs, it would have been slower in relation to the non-SIMD way posted. 2 to 16 would have been faster in relation to it. All 0-16 would run the exact same amount of stuff, once. No loop or anything. How is that linear speedup? Why isn't it O(1)? And when did I agree to any kind of wrongness about what I said in this thread? I made a different example because you asked for another one, not because I disagreed with the SIMD vs non-SIMD ones I posted. (damn, you're going to quote those last 10 words out of context, aren't you? Well I don't feel like re-wording it sorry, so I'll just leave this note here to call you on it if you do)
Post 04 Sep 2009, 08:16
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
The big-O notation is not a measure of speed. O(1) is not necessarily faster than O(N), the speed depends upon the implementation details. SIMD has nothing to do with O(anything), it only confuses the issue.

1. Big-O measures algorithmic performance.
2. SIMD give you linear speed-up.

Those two points above are completely unrelated to each other and neither affects the other.
Post 04 Sep 2009, 08:21
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
I am not wanting to measure speed. I have mentioned earlier, and just now, that the SIMD way is slower in certain situations. But algebraically.. it only is ran one time as long as n<17.. so in that situation I think it should be O(1), where as the non-SIMD way is always O(n).. since it has to run as many times as there are inputs no matter what.. regardless of which is faster, of how much RAM they take, of how many clock cycles they complete it in, or any of that other linear stuff you keep bringing up.

I just don't understand how changing the algorithm from a loop to a single constant block of code ran once, is a linear speedup and not an algebraic one. Can you please explain to me why this is instead of just saying it is?
Post 04 Sep 2009, 08:24
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Azu: I don't know how to word any differently than the previous messages. So I will try to ask some basic questions to see where your understanding is going wrong.

Do you understand that SIMD does not change the algorithm?
Do you understand that big-O does not measure implementation speed?
Post 04 Sep 2009, 08:32
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
No and yes, respectively.



I think that this algorithm;
Azu wrote:
Code:
loopy: 
mov al,byte[edi] 
add al,byte[esi] 
sub ecx,1 
mov byte[edi],al 
lea edi,[edi+1] 
lea esi,[esi+1] 
je loopy     
Needs ran n times (okay, I should have put a jecxz in there, but that's not the point..) where as this one

Azu wrote:
Code:
movdqa xmm0,dqword[esi] 
movdqa xmm1,dqword[edi] 
paddb xmm0,xmm1 
movdqa dqword[edi],xmm0    
Only needs ran once (as long as n<17)




I think that algorithmic efficiency = how many times it has to run based on how many inputs you give it and linear efficiency = how many clock cycles (or whatever) it takes to run
Post 04 Sep 2009, 08:35
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
In your example the algorithm is the same. The algorithm is to take a byte, add it to another byte, store the result. But the SIMD has not changed the algorithm, it merely parallelised it. It still takes a byte, add it to another byte, store the result, but doing 16 elements at a times. The algorithm describes what to do (adding bytes) - not how to do it, or what order to do it in.
Post 04 Sep 2009, 08:40
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
So these should all have the same big-O notation then? Since they "all the same thing just in different ways"???
Azu wrote:
Code:
;func 1 
loopy: 
nop 
loop loopy 
;func 2 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
;func 3 
mov eax,ecx 
loopyA: 
loopyB: 
mov ecx,eax 
nop 
loop loopyB 
sub eax,1 
je loopyA    





And these should have different ones? Since even though they run the same amount of times, they do different things?
Code:
;func 1
loopy:
inc eax
loop loopy
;func 2
loopy:
xor eax,eax
loop loopy    




I don't like this.. Sad
I want to have a notation that I can use for saying how many iterations something must make to handle how many inputs..
I thought big-O was the one but you have proved me wrong.. Sad
Post 04 Sep 2009, 08:42
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Azu wrote:
So these should all have the same big-O notation then? Since they "all the same thing just in different ways"???
Azu wrote:
;func 1
loopy:
nop
loop loopy
;func 2
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
;func 3
mov eax,ecx
loopyA:
loopyB:
mov ecx,eax
nop
loop loopyB
sub eax,1
je loopyA
Well to be properly mathematically correct they ARE all the same because they are all O(0), i.e. they have no complexity at all because they do no work Razz

But you have shown three different algorithms. 1) Loop O(N), 2) Lookup (i assume?) O(1), 3) loop within loop O(N^2). But can you see that using SIMD in any of those three function would not change the algorithms.
Post 04 Sep 2009, 08:45
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
Oh and also, all sorting algorithms do the same thing in the end, just using different number of iterations, so.. they should all have the same big-O notation?


And jump tables should have the same big-O as looping through a list of numbers, comparing them one by one, and jumping when finding one that matches? Since it does the same thing in the end just with different number of loops?


I really really hate big-O now.. it is completely useless if it is how you describe...
Post 04 Sep 2009, 08:48
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Azu wrote:
Oh and also, all sorting algorithms do the same thing in the end, just using different number of iterations, so.. they should all have the same big-O notation?
Is it not about the end result, it is about what operations you do to get to the end result. Some sorting algorithms are very bad, some are very good. The end result is the same, but getting there might mean a lot of detours if you choose the wrong algorithm.
Post 04 Sep 2009, 08:52
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
But my example of SIMD vs non-SIMD uses different operations. And different algorithms (one loops n times, one runs once).. but you said they have the same big-O notation since in the end they both do the same thing.. Confused
Post 04 Sep 2009, 08:54
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Azu wrote:
But my example of SIMD vs non-SIMD uses different operations. And different algorithms (one loops n times, one runs once).. but you said they have the same big-O notation since in the end they both do the same thing.. Confused
They do the same thing, they both add bytes. But conceivably there could be another algorithm that achieves the same end results but instead of adding bytes it does a table lookup of all possible x+y combinations. Lookups are O(1) but clearly a 128bitx128bit table would not be practical and might even be slower if you had to store it on HDD.
Post 04 Sep 2009, 09:01
View user's profile Send private message Visit poster's website Reply with quote
Azu



Joined: 16 Dec 2008
Posts: 1160
Azu
So shouldn't the SIMD way be O(1) until n>16 ( at which point it would be O(ceiling(n/16)) )?
Post 04 Sep 2009, 09:04
View user's profile Send private message Send e-mail AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7725
Location: Kraków, Poland
Tomasz Grysztar
revolution wrote:
Azu wrote:
But my example of SIMD vs non-SIMD uses different operations. And different algorithms (one loops n times, one runs once).. but you said they have the same big-O notation since in the end they both do the same thing.. Confused
They do the same thing, they both add bytes. But conceivably there could be another algorithm that achieves the same end results but instead of adding bytes it does a table lookup of all possible x+y combinations. Lookups are O(1) but clearly a 128bitx128bit table would not be practical and might even be slower if you had to store it on HDD.

You cannot talk about O() notation when clearly your data size is limited. O() notation is the asimptotic one, and therefore requires algorithms working on data of arbitrary size. And even if you use lookup tables of 128x128 bit size, you'd have to do this for every 128-bit part of the input and thus get O(N) in the end, anyway.

The same mistake Azu made, when he said "only needs ran once (as long as n<17)". Well, the point is, when you take assumption like "n<17", you can not say anything about O() complexity. The O() notation applies only to the case, when your algorithm is able to handle input of ANY size. And clearly, to process the hundred megabytes of data, you'd still have to run your SIMD instruction in a loop.
Post 04 Sep 2009, 09:06
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
Azu wrote:
So shouldn't the SIMD way be O(1) until n>16 ( at which point it would be O(ceiling(n/how many it does at once)) )?
No, SIMD is merely an implementation detail. Please don't make us go through this all again. I encourage you to do some independent reading.
Post 04 Sep 2009, 09:06
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2, 3, 4, 5, 6, 7  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.