flat assembler
Message board for the users of flat assembler.
 Home   FAQ   Search   Register 
 Profile   Log in to check your private messages   Log in 
flat assembler > Main > Finding max clique size and C++ is better than asm.

Goto page 1, 2, 3, 4  Next
Author
Thread Post new topic Reply to topic
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Finding max clique size and C++ is better than asm.
So it comes up a lot here that there's no good reason to write in HLLs because you can just write it in assembly anyway, so I thought I drop this mindfuck and see if any of you guys could translate it to assembly. (Or feel free to write an equivalent algorithm from scratch if you wish.)

https://ideone.com/Os2Gwk

The algorithm's functionality is well documented in the code. I'll send .15 BTC (~60USD) to author of the fastest implementation posted within two weeks (by 11:59PM UTC on May 1), subject to the following rules:

* You must beat my C++ in terms of time by at least 5%. My time is about 3.5s (edit: I read the number wrong and was off by an OOM. It's actually .35s) for the sample data posted.

* You will be tested against my code compiled with Clang++ 3.6.2-3 with the options: "-O3 -flto -march=native -std=c++11."

* The code will be tested on my computer, which has a Core i7 3610QM. Feel free to super optimize with some AVX and stuff. (I don't think it's possible to vectorize by much.)

* The test apparatus is here: https://ideone.com/j9jh8J

* You cannot reduce the complexity of the problem. I know part of the benefit of asm is avoiding HLL constructs, so I'll be flexible if you ask. To give an example of a solution that would be rejected because it "reduced the complexity," you can not just print the results from hard coded constants. I have other test data that I will test your code with and it must give the same output as mine.

* Your "max_clique" function must be use System V AMD64 ABI. It must be of the same signature as mine, except: you can accept a pointer to "connects" for the last argument, in place of "adj," if you want.

* The only symbol you may (and must) export is "max_clique." It should be exported in a way that can link with C (not C++, that's too hard) (under AMD64 ABI, again).

* Run time data will be collected with the Unix time command. 21 samples will be collected for each submission and the median will be used for comparison against others.

* Submissions which yield invalid results will be disqualified.

* If you're serious about wanting to win the prize, I suggest you PM your solution so as to avoid being one upped by an insignificant change. (I'm not going to play intellectual property police and pick which of two submitters of almost the same code actually owns the code. Whoever wins by the time and by the rules wins, regardless of "theft.")

* The rules are subject to change for the next 4 days. This is to leave room for changes you guys request and I find reasonable.

I will announce rankings when I get around to running them. I have finals next week, so it may be the week after the deadline. I'm PMing revolution my email to contact me if I totally forget to come back (which is unlikely, I've been visiting more frequently lately). EDIT: I sent it to Tomasz instead, since I doubt revolution would want to deal with trying to email me without exposing his identity or something.

May the force be with you.


Last edited by Tyler on 18 Apr 2016, 13:35; edited 1 time in total
Post 17 Apr 2016, 16:07
View user's profile Send private message Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
BTW, "my code" refers to the C++ implementation, not an assembly translation that I wrote (I wrote no such translation). I wrote the C++ a few years ago for a few math professors doing research in Ramsey Theory (subfield of graph theory, basically).
Post 17 Apr 2016, 23:38
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 262
Location: Australia
Just a quick tip for anyone who bothers Wink The C++ compiler will happily turn the max_clique function into its constant property variants. What this means is that the initial link to the source w/ the main function is not a valid timing reference. The compiler doesn't get to use such trickery if it ALSO must remain ABI compliant (which it isn't in that case).

_________________
2 Ton Digital - https://2ton.com.au/
Post 18 Apr 2016, 03:54
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15096
Location: The Unicomplex
What is a clique in this context? What specifically does the code look for? I am hard of reading when it comes to C++, I couldn't see any description of an algorithm, instead I saw lots of comments about the code execution and internal structuring of some data, but nothing to say what it is trying to achieve.

Tyler wrote:
EDIT: I sent [the email] to Tomasz instead, since I doubt revolution would want to deal with trying to email me without exposing his identity or something.

Actually no problem. My email doesn't reveal anything about me. But thanks for thinking of me. Smile
Post 18 Apr 2016, 10:10
View user's profile Send private message Visit poster's website Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
revolution, a clique is a subgraph which is "complete." "Complete" meaning that each node in the subgraph is connected to each other node. The purpose of the code is to find the size of the largest clique.

Those 2d arrays are known as adjacency matrices. connects[I][j] is true iff node i is adjacent (connected) to node j. Your goal is to look for the largest subset of nodes which are all connected to each other.

It's an NP complete problem. One of the original 21, actually.

redsock, would it be more fair if I compiled the max_clique C++ as another module, another .o, and then linked it to main? This would replicate the process which I will use with submissions. And, of course, I'll turn LTO off.
Post 18 Apr 2016, 11:05
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15096
Location: The Unicomplex
Further on the email thing: You can tick the box "Notify me when a reply is posted". That way instead of waiting for someone to remember to email you, you can have the board automatically email you if someone replies. That would also reduce the workload of the mods here, they could easily forget about the arbitrary deadlines that were stated. Razz
Post 18 Apr 2016, 13:37
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 6835
Location: ˛                              ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣ Posts: 6699
this looks interesting, but too bad, i don't even got the slight idea how it works,
would love to understand if it doesn't requires heavy maths.
Post 18 Apr 2016, 16:06
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 2624
Location: dank orb
Checking each subset is NOT efficient, nor is the data storage method. Just thinking in my head, but imagine an algorithm which is O(n ln n ln n) on average (assuming a random graph). Some reference(s) for others:

https://en.wikipedia.org/wiki/Clique_problem

The C++ code seems to be a textbook description of the problem - with no optimization. Since we aren't listing all the clique's a polynomial solution is possible?
Post 18 Apr 2016, 17:27
View user's profile Send private message Visit poster's website Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
sleepsleep, the concepts of the math are pretty easy. Graphs, in this area of math, are not the kind of graph you're probably thinking of. It's not a plot of a function. It's a set of "nodes" and "edges." Nodes are like computers on a network and edges are like network connections between them. The problem the algorithm is solving is to find the largest group of computers that all have a direct connection to each other.
Post 18 Apr 2016, 18:11
View user's profile Send private message Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA

bitRAKE wrote:
Checking each subset is NOT efficient, nor is the data storage method. Just thinking in my head, but imagine an algorithm which is O(n ln n ln n) on average (assuming a random graph). Some reference(s) for others:

https://en.wikipedia.org/wiki/Clique_problem

The C++ code seems to be a textbook description of the problem - with no optimization. Since we aren't listing all the clique's a polynomial solution is possible?

The optimization is only in how the C++ code is written, expressing a recursive algorithm with iterative control structures and manually managing a stack. A truely textbook implementation would have you copy a vector/set/subgraph-representation with every recursive call.

And yeah, I could bit pack the adjacency matrix. Not a bad idea. Thanks.

I'll allow bit packing if anyone wants to use it to their solution, but I will also modify the C++ to use it. (And in general, I will reserve the right to change the C++ code. The idea isn't to beat me or my C++ code, it's to beat C++ with assembly. Unless this really pisses people off, then I might be willing to commit to one version.)

There may be some theoretical optimizations, but there aren't any better than exponential. The best you could do is sort the nodes to get better average or best case. You might also be able to improve branch prediction. http://stackoverflow.com/questions/3004193/clique-number-of-a-graph
Post 18 Apr 2016, 18:34
View user's profile Send private message Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 6835
Location: ˛                              ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣ Posts: 6699

Tyler wrote:
sleepsleep, the concepts of the math are pretty easy. Graphs, in this area of math, are not the kind of graph you're probably thinking of. It's not a plot of a function. It's a set of "nodes" and "edges." Nodes are like computers on a network and edges are like network connections between them. The problem the algorithm is solving is to find the largest group of computers that all have a direct connection to each other.



please bear with my questions,

i am trying to relate this problem with my understanding,

assume each node is pc with webserver and webclient, can receive connections, and can open connections to other node, (am i correct?)

so,
pc1 listen 5 conns, open A conns,

pc2 listen 8 conns, open B conns,

pc3 listen 9 conns, open C conns,

pc4 listen ? conns, open D conns to complete the whole connections.

the direct connection to each other seems kinda blur,

or you mean, pc4 is most connected to everybody if,
open 3 conns to pc1,
open 5 conns to pc2,
open 5 conns to pc3

so, even if pc2 open 20 conns, but let say pc2 open 10 conns to pc1, 10 conns to pc2, 0 conns to pc3 and pc4,

pc4 still the most connected pc to everybody?

or each pc only can open 1 conns?
Post 18 Apr 2016, 19:15
View user's profile Send private message Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 6835
Location: ˛                              ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣ Posts: 6699
or it is like facebook circles,

assume in a group of 100 person,
some friend with 10, some 20, some 30,

the one that everybody friend would be 100 friends, am i right?
so we want to find who got 100 friends? looping each human one by one to get their friend counter?
Post 18 Apr 2016, 19:29
View user's profile Send private message Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Say you have people A,B,C,D,E, and F.

A is friends with B, C, and D.
B is friends with C, D, and F.
C is friends with D and E.
D is friends with F.
E is friends with F.

Just a note: Being friends goes both ways. Friendship is what we call a "symmetric relation." If A is friends with B, then B is also friends with A. And vice versa.

In this case, the "maximum clique" is A, B, C, and D. They are all friends with each other.

An example of a group that is not a clique is B, C, D, and E. D is not fiends with E. For it to be a clique, every member of the group has to be friends with every other member of the group.

You can use the code below to visualize this graph here: http://sandbox.kidstrythisathome.com/erdos/


Code:

graph {
// A is friends with BCand D.
A--B;
A--C;
A--D;

// B is friends with CDand F.
B--C;
B--D
B--F;

// C is friends with D and E.
C--D;
C--E;

//D is friends with F
D--F;

//E is friends with F
E--F;
}


Post 18 Apr 2016, 20:04
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 262
Location: Australia

Tyler wrote:
...The idea isn't to beat me or my C++ code, it's to beat C++ with assembly...

This isn't difficult to achieve as a general rule, even for highly optimised C/C++. I have "hand compiled" vast amounts of code over the years, and without employing any insane optimisation techniques, the average is about a 20% performance gain. In my experience Clang vs. gcc is a coin toss. See https://2ton.com.au/library_as_html/zlib_deflate.inc.html as one of my public efforts in this regard that fits the general rule.

0.15BTC isn't worth the effort however, but for academic purposes I think blanket sentiments like yours re: C++ being better, it is a valuable one nonetheless.

I too have written a lot of C++ Wink

Tyler wrote:
redsock, would it be more fair if I compiled the max_clique C++ as another module, another .o, and then linked it to main? This would replicate the process which I will use with submissions. And, of course, I'll turn LTO off.

Amusingly, I did just that (turned your max_clique into a shared library along with fPIC), and g++ with LTO got it entirely wrong (optimization-wise). The shared object version actually performs faster than its LTO. The same thing can be witnessed with -O versus {2,3,etc}. Optimisation is not one-size-fits-all Wink

My point was only that the C++ code in your stated objective needs to be drop-in compatible with the comparative assembly language variants, and as you had/have it, it is not and thus isn't a valid timing reference.

Cheers

_________________
2 Ton Digital - https://2ton.com.au/
Post 18 Apr 2016, 20:41
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 262
Location: Australia
An additional $0.02 for you to consider:

Ideally, your test data/runtime should be long enough to completely eliminate power management issues. On my multicore machines for example, a "execute this process 20 times in a row" ends up being scheduled across 20 different CPU cores, none of which run at 100% long enough to kick the processors out of power management.

On my desktop machine here for example, your example only runs for 450ms, and even watching the "perf stat" output shows different CPU frequencies.

These sorts of things cause havoc if the only metric you are watching is the wall clock Smile

$0.02 thusly.

_________________
2 Ton Digital - https://2ton.com.au/
Post 18 Apr 2016, 21:03
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 2624
Location: dank orb
Bit arrays are native in x86. So, bool adj(int n1, int n2) becomes:

Code:
bt [n1+adj.array],n2

Of course, all the addressing mode arithmetic allow various size arrays. The bit arrays also reduce adj_all to a couple logic operations? I'd loop through nodes finding the longest clique including that node (using logical operations).

https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation

...is useful to test permutations of a particular size.
Post 18 Apr 2016, 21:13
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 6835
Location: ˛                              ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣ Posts: 6699
thanks Tyler, much appreciate the example you gave,
i like the visual graph too!

Image
if we know the counter of relationship lines, maybe easier to check their relationship?

idk is the problem is to find the group that contained "maximum" member, that connect each other?

4 lines, B,C,D,
check
BC, ok
BD, ok
CD, ok

then test those who got (B,C,D)-1 lines, A, F
A,B ok
A,C ok
A,D ok

F,B x
F,C x
F,D ok

then we no need to test the rest,
Post 18 Apr 2016, 21:36
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 262
Location: Australia
Was just perusing your C++ code while discussing this with a friend of mine, and noticed lines 90/91 are invalid references when the index drops below zero. Thanks to your using signed integers, it isn't causing a segfault, but it is unsafe code anyway.

Might be better to add something like if (!index) return max_len and change the outer while loop to for (;Wink and remove the unreached return max_len at the bottom Wink

_________________
2 Ton Digital - https://2ton.com.au/
Post 18 Apr 2016, 23:41
View user's profile Send private message Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
redsock, thanks, but I already caught that and forgot to update the online version.


Code:

                while(!complete && index >= 0) {
                        while(!complete && ++stack[index] < num_nodes) {
                                // If we get hereno suitable neighbor could be found for the top node on the 
                                // stack.
                                // So we will permute the last neighbor on the stack.
                                complete = adj_all(stack[index], stackindexadj);
                        }

                        if(stack[index] >= num_nodes) {
                                // If we get herewe've tried all possibilities for top node on the stack and 
                                // none of them were complete with the rest of the stackso we need to pop the 
                                // node off the stack and start permuting the node below it.
                                index--; // Pop the stack.
                        }
                }




After index becomes negative, it will fall out of the loop the if/decrement-index is in and out of the outermost loop and return.

The loop index in the "down the tree" for loop is also suboptimal. It would be better to start with "i=start[index] + 1" and remove all the references to "start[index]+offset" is other places.

This was a rewrite that missed a few things. I had to rewrite it because the old version was completely incomprehensible.

What do you and your friend think? Am I an alright C/C++ guy? I think I'm above average, but I know average ain't so good either. I passed a technical interview on the subject, at least. Razz
Post 18 Apr 2016, 23:52
View user's profile Send private message Reply with quote
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
redsock,

Also, sorry about the insignificant prize. I'm about to graduate and move across the country. Money ain't cheap right now. Razz

I would pay more if the winning submission was going to help me in some way, but it pretty much isn't. I'm not doing this research anymore. It ended 2 years ago. And the C++ itself is more than enough for my resume. (Most new grads cant do recursion, let alone emulate it with loops and an array-stack.)
Post 19 Apr 2016, 00:43
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2, 3, 4  Next

< Last Thread | Next Thread >

Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Powered by phpBB © 2001-2005 phpBB Group.

Main index   Download   Documentation   Examples   Message board
Copyright © 2004-2016, Tomasz Grysztar.