flat assembler
Message board for the users of flat assembler.

Index > Main > Expertise needed to convert C to assembly

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
I need help creating a very fast 'FASM' variant of this code (in x86) that tests whether points are inside a polygon or not. I basically need a very fast variant of this code in FASM so I can store it in my library, my precious library.

Here is the code:

Quote:

// Globals which should be set before calling these functions:
//
// int polyCorners = how many corners the polygon has (no repeats)
// float polyX[] = horizontal coordinates of corners
// float polyY[] = vertical coordinates of corners
// float x, y = point to be tested
//
// The following global arrays should be allocated before calling these functions:
//
// float constant[] = storage for precalculated constants (same size as polyX)
// float multiple[] = storage for precalculated multipliers (same size as polyX)
//
// (Globals are used in this example for purposes of speed. Change as
// desired.)
//
// USAGE:
// Call precalc_values() to initialize the constant[] and multiple[] arrays,
// then call pointInPolygon(x, y) to determine if the point is in the polygon.
//
// The function will return YES if the point x,y is inside the polygon, or
// NO if it is not. If the point is exactly on the edge of the polygon,
// then the function may return YES or NO.
//
// Note that division by zero is avoided because the division is protected
// by the "if" clause which surrounds it.

void precalc_values() {

int i, j=polyCorners-1 ;

for(i=0; i<polyCorners; i++) {
if(polyY[j]==polyY[i]) {
constant[i]=polyX[i];
multiple[i]=0; }
else {
constant[i]=polyX[i]-(polyY[i]*polyX[j])/(polyY[j]-polyY[i])+(polyY[i]*polyX[i])/(polyY[j]-polyY[i]);
multiple[i]=(polyX[j]-polyX[i])/(polyY[j]-polyY[i]); }
j=i; }}

bool pointInPolygon() {

int i, j=polyCorners-1 ;
bool oddNodes=NO ;

for (i=0; i<polyCorners; i++) {
if ((polyY[i]< y && polyY[j]>=y
|| polyY[j]< y && polyY[i]>=y)) {
oddNodes^=(y*multiple[i]+constant[i]<x); }
j=i; }

return oddNodes; }


I found that code over there Arrow http://alienryderflex.com/polygon/.

If there are anyone who want to help me convert this into a very fast variant for FASM (and optimize it to the best it can be optimized). Very Happy

This is some piece of code that is very useful to have around, and can benefit other people too on the forum. (Help greatly appreciated)
Post 29 Jul 2015, 14:43
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
PeExecutable wrote:
Quote:
// Note that division by zero is avoided because the division is protected
// by the "if" clause which surrounds it.
But that doesn't mean it never overflows.
Post 29 Jul 2015, 15:58
View user's profile Send private message Visit poster's website Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
It's basically a simple algorithm, it looks messed up because I picked the one which is optimized. The algorithm is quite simple in its original form. Help would be appreciated. (Not just for people who understand C but for the optimization guru know-how to get it running fast, optimization isn't my strongest side) It's one of those functions that is part of a performance critical path Shocked

I could have translated it into fasm first and then take it from there but, based on experience, it's better to let the guru figure out the basic assembly structure before converting it. (That is why I put the C in there), or to put it in a different way, chances are very high he have to toss the asm code and create his own from scratch. In optimization terms, you often have to get things right from the start.

I found a different example and they seem to be the same kind, at least the function names are the same:

Code:
 //check the intersections
if (((polyK.Y > currentPoint.Y) != (polyJ.Y > currentPoint.Y)) &&
     (currentPoint.X < (polyJ.X - polyK.X) * (currentPoint.Y - polyK.Y) / (polyJ.Y - polyK.Y) + polyK.X)) 
    


That was taken from here http://www.codeproject.com/Tips/626992/Check-if-a-Point-is-Inside-the-Polygon-Using-Ray-C
Post 29 Jul 2015, 16:34
View user's profile Send private message Reply with quote
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
Hahahahahahahahahahahahaha.

What of it when the God Of C and Assembly seeketh help from mortals?

Divine hilarity.

------------

But seriously, ever heard of the phrase "crawl, walk, run"?

Show us what you have first.
Post 29 Jul 2015, 20:05
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
typedef wrote:
What of it when the God Of C and Assembly seeketh help from mortals?
Optimisation is hard so I am not surprised here. But I do think the OP needs to state the required metrics before anyone can take on the task. The CPU/mobo system specs would be required before we can know how to optimise. Plus, just putting this single routine is isolation won't be enough. We would need to know the input data patterns, cache utilisation and the other surrounding code also. Without proper performance monitoring of the whole program it could only be a best guess at the most. And since it is using global variables I assume this can only be a single threaded task so already the code is hindered by an arbitrary restriction that instead of giving it "speed" is actually slowing it down.
Post 30 Jul 2015, 06:32
View user's profile Send private message Visit poster's website Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
It's hard indeed Very Happy ( typedef, i'm still walking in terms of optimization, If I ask for help does not mean I've stopped walking )

I really enjoy watching the work of people who know more about optimization than myself, it's very educative too.

The specs Exclamation ( General x86 use, no special feature, the speed gains should span equally on all processors/MOBO in the modern x86 family processors, but don't hesitate to use extra features if the gain is simply too great to avoid it, I can alter it later to fit the more "general-way")

But, its for general use in the x86 family of processors. But don't feel any pressure on this, it's just a silly function I found, don't feel obliged to anything. But if you want to do it, thats a very appreciative thing Very Happy
Post 30 Jul 2015, 07:35
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
What is a general x86? They are all different and have differing clock speeds, instruction throughputs and latencies, core counts, cache sizes/speeds, memory bus differences, etc. Did you want 32-bit or 64-bit (I hope not 16-bit)?
Post 30 Jul 2015, 08:50
View user's profile Send private message Visit poster's website Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
If you can find the average in modern x86 family of processors (Not as in 'general x86', but the 'family' of x86 processors, as long as you don't go below a 386 that would be fine), and 32-bit.

Cache memory, just use an 8 way set associative cache with the assumption every cache line is 64 bytes, the total size is irrelevant when dealing with repetitive functions, as in the data that is being used in the program can't be reduced or increased, so you just have to stick with whatever he has got.

I hope this is enough info. Good luck, and thanks if you go ahead with it. Very Happy
Post 30 Jul 2015, 11:24
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
Your definition of 'average x86' is not clear. The set of x86 CPUs changes frequently. Which subset are you targeting? Everything above 80386 would mean no FPU and all your floats have to use integer routines/libraries.

BTW: This is why optimising is hard. Because each CPU/RAM/mobo combo requires its own set of rules for optimising. And each program (not an isolated function) requires its own tuning and performance tests.
Post 30 Jul 2015, 11:39
View user's profile Send private message Visit poster's website Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
If you need to use an extended feature, just use it Very Happy

Use FPU if you need to, preferably not simd instructions, unless it is SSE2 or lower.


Last edited by PeExecutable on 30 Jul 2015, 11:46; edited 1 time in total
Post 30 Jul 2015, 11:44
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 365
Location: Australia
redsock
So we are supposed to do this for sport? fun? challenge? I don't speak for the rest of the regulars around here, but I can say that I am not a pimply-faced teenager looking for ways to amuse my ego.

I very much agree with revo's prior comments that much more detailed information is necessary to form a starting baseline and that generic x86 may be enough, depending on what that baseline is.

In either case, "will you work for me for free for the fun of it" doesn't cut it in my book.

Hmmm, maybe I am freshly grumpy though, so I apologise in advance Smile

Controlled environment, concise parameters, and a reward and I think you'd get countless lurkers here to sort you out with what you want.

My $0.02 thusly Smile

Cheers
Post 30 Jul 2015, 11:46
View user's profile Send private message Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
The thread continued in an inquisitive way so I assumed there were an interest in helping. I don't know if that's "free work", to me it looks like volunteers. There is a lot of skepticism, but some of it crosses a separation line into what seems to be paranoia. The choice is easy, is it interesting, take part, if not, don't comment. Laughing
Post 30 Jul 2015, 11:53
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 365
Location: Australia
redsock
PeExecutable wrote:
I need help creating a very fast 'FASM' variant of [snip] (Help greatly appreciated)
Need is an interesting choice of words here. It would be awesome if all of you volunteered your time for free to help me make my own code faster variants of my own extensive C/C++ libraries.

Perhaps what you missed is the why you need it.

I do not mean to be condescending, but your original sentiment of needing help without specifying the reason why is a rather important distinction IMO.

I would much rather ask the question to you: how bad do you need it?

Smile

Cheers as usual
Post 30 Jul 2015, 12:00
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
I think at a bare minimum the OP should provide a test-bed where potential contributors can plug in their uber-optimised function and test the performance on their system. At least then everyone has some way to do basic comparisons. Then they can get a feel for the data set sizes and tune caching and data streaming from the given test set.

But single threading is old-school now and perhaps you could allow the data to be local and make use of the extra cores. You could get a fourfold or more increase almost for free with just that one change.
Post 30 Jul 2015, 12:00
View user's profile Send private message Visit poster's website Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
Just trust me, it's not for evil, it's for good. Prove your worth, it's good sport too.
Post 30 Jul 2015, 12:12
View user's profile Send private message Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 365
Location: Australia
redsock
PeExecutable wrote:
Just trust me, it's not for evil, it's for good. Prove your worth, it's good sport too.
Revolution: you willing to moderate it? For sport I'll happily participate with a "generic x64" variant against any C/C++ compiler, but we'll need some kind of moderated dataset, input parameters, and baseline, and I have no intention of sharing the code with anyone other than the moderator, who can then share results publicly and without bias.

Hmmmphf I say. PeExecutable: what you are asking for is a part of my own business model, as well as countless other machine-level optimiser experts. I'll have a go for public sport, but for public benefit not yours.

Smile Challenge accepted, with a few provisos.
Post 30 Jul 2015, 12:17
View user's profile Send private message Reply with quote
PeExecutable



Joined: 26 Jun 2015
Posts: 181
PeExecutable
I appreciate that, but I would also appreciate if you made it in 32-bit so I can use the code. (But if you have been too much into 64-bit and have lost some of the skills in the 32-bit world, I understand it if thats the case)
Post 30 Jul 2015, 12:20
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
redsock: I am willing to act as an intermediary. But for test results we would have to limit our 'average CPU' to the AMD E350 I am currently travelling with. I can run 32-bit and 64-bit Windows code. But I would prefer not to receive bare binaries, fasm source code is best. Naturally I would provide complete confidentiality for anyone that wants it.
Post 30 Jul 2015, 12:29
View user's profile Send private message Visit poster's website Reply with quote
redsock



Joined: 09 Oct 2009
Posts: 365
Location: Australia
redsock
LOL (haha, is that permitted here? it feels so... wrong...)

Ok, Mr. PeExecutable, some input data and test parameters are required. Whatever techniques I employ, I will be sure to make runnable on our awesome moderator's hardware and we can go from there.

Disuse of my 32bit register starvation nightmare aside, I will take your input code in C/C++ on my native platform (x64), and happily write fasm code to circle around our C/C++ compilers.

WITHIN REASON. If the test/saddle code is ridiculous, then I should say our moderator will set me free.

Otherwise, game on, and perhaps more entertaining for all of us if more than just me participates. I am quite certain there are other coders here lurking that are better than I.
Post 30 Jul 2015, 12:42
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17714
Location: In your JS exploiting you and your system
revolution
I should probably add that there is an instance of MingW on this machine. I am not expert at operating it but I'm sure with some experimentation I could also compile some C source and link with an assembly file.
Post 30 Jul 2015, 12:54
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.