flat assembler
Message board for the users of flat assembler.
Index
> Main > Expertise needed to convert C to assembly Goto page 1, 2 Next |
Author |
|
PeExecutable 29 Jul 2015, 14:43
I need help creating a very fast 'FASM' variant of this code (in x86) that tests whether points are inside a polygon or not. I basically need a very fast variant of this code in FASM so I can store it in my library, my precious library.
Here is the code: Quote:
I found that code over there http://alienryderflex.com/polygon/. If there are anyone who want to help me convert this into a very fast variant for FASM (and optimize it to the best it can be optimized). This is some piece of code that is very useful to have around, and can benefit other people too on the forum. (Help greatly appreciated) |
|||
29 Jul 2015, 14:43 |
|
PeExecutable 29 Jul 2015, 16:34
It's basically a simple algorithm, it looks messed up because I picked the one which is optimized. The algorithm is quite simple in its original form. Help would be appreciated. (Not just for people who understand C but for the optimization guru know-how to get it running fast, optimization isn't my strongest side) It's one of those functions that is part of a performance critical path
I could have translated it into fasm first and then take it from there but, based on experience, it's better to let the guru figure out the basic assembly structure before converting it. (That is why I put the C in there), or to put it in a different way, chances are very high he have to toss the asm code and create his own from scratch. In optimization terms, you often have to get things right from the start. I found a different example and they seem to be the same kind, at least the function names are the same: Code: //check the intersections if (((polyK.Y > currentPoint.Y) != (polyJ.Y > currentPoint.Y)) && (currentPoint.X < (polyJ.X - polyK.X) * (currentPoint.Y - polyK.Y) / (polyJ.Y - polyK.Y) + polyK.X)) That was taken from here http://www.codeproject.com/Tips/626992/Check-if-a-Point-is-Inside-the-Polygon-Using-Ray-C |
|||
29 Jul 2015, 16:34 |
|
typedef 29 Jul 2015, 20:05
Hahahahahahahahahahahahaha.
What of it when the God Of C and Assembly seeketh help from mortals? Divine hilarity. ------------ But seriously, ever heard of the phrase "crawl, walk, run"? Show us what you have first. |
|||
29 Jul 2015, 20:05 |
|
revolution 30 Jul 2015, 06:32
typedef wrote: What of it when the God Of C and Assembly seeketh help from mortals? |
|||
30 Jul 2015, 06:32 |
|
PeExecutable 30 Jul 2015, 07:35
It's hard indeed ( typedef, i'm still walking in terms of optimization, If I ask for help does not mean I've stopped walking )
I really enjoy watching the work of people who know more about optimization than myself, it's very educative too. The specs ( General x86 use, no special feature, the speed gains should span equally on all processors/MOBO in the modern x86 family processors, but don't hesitate to use extra features if the gain is simply too great to avoid it, I can alter it later to fit the more "general-way") But, its for general use in the x86 family of processors. But don't feel any pressure on this, it's just a silly function I found, don't feel obliged to anything. But if you want to do it, thats a very appreciative thing |
|||
30 Jul 2015, 07:35 |
|
revolution 30 Jul 2015, 08:50
What is a general x86? They are all different and have differing clock speeds, instruction throughputs and latencies, core counts, cache sizes/speeds, memory bus differences, etc. Did you want 32-bit or 64-bit (I hope not 16-bit)?
|
|||
30 Jul 2015, 08:50 |
|
PeExecutable 30 Jul 2015, 11:24
If you can find the average in modern x86 family of processors (Not as in 'general x86', but the 'family' of x86 processors, as long as you don't go below a 386 that would be fine), and 32-bit.
Cache memory, just use an 8 way set associative cache with the assumption every cache line is 64 bytes, the total size is irrelevant when dealing with repetitive functions, as in the data that is being used in the program can't be reduced or increased, so you just have to stick with whatever he has got. I hope this is enough info. Good luck, and thanks if you go ahead with it. |
|||
30 Jul 2015, 11:24 |
|
revolution 30 Jul 2015, 11:39
Your definition of 'average x86' is not clear. The set of x86 CPUs changes frequently. Which subset are you targeting? Everything above 80386 would mean no FPU and all your floats have to use integer routines/libraries.
BTW: This is why optimising is hard. Because each CPU/RAM/mobo combo requires its own set of rules for optimising. And each program (not an isolated function) requires its own tuning and performance tests. |
|||
30 Jul 2015, 11:39 |
|
PeExecutable 30 Jul 2015, 11:44
If you need to use an extended feature, just use it
Use FPU if you need to, preferably not simd instructions, unless it is SSE2 or lower. Last edited by PeExecutable on 30 Jul 2015, 11:46; edited 1 time in total |
|||
30 Jul 2015, 11:44 |
|
redsock 30 Jul 2015, 11:46
So we are supposed to do this for sport? fun? challenge? I don't speak for the rest of the regulars around here, but I can say that I am not a pimply-faced teenager looking for ways to amuse my ego.
I very much agree with revo's prior comments that much more detailed information is necessary to form a starting baseline and that generic x86 may be enough, depending on what that baseline is. In either case, "will you work for me for free for the fun of it" doesn't cut it in my book. Hmmm, maybe I am freshly grumpy though, so I apologise in advance Controlled environment, concise parameters, and a reward and I think you'd get countless lurkers here to sort you out with what you want. My $0.02 thusly Cheers |
|||
30 Jul 2015, 11:46 |
|
PeExecutable 30 Jul 2015, 11:53
The thread continued in an inquisitive way so I assumed there were an interest in helping. I don't know if that's "free work", to me it looks like volunteers. There is a lot of skepticism, but some of it crosses a separation line into what seems to be paranoia. The choice is easy, is it interesting, take part, if not, don't comment.
|
|||
30 Jul 2015, 11:53 |
|
redsock 30 Jul 2015, 12:00
PeExecutable wrote: I need help creating a very fast 'FASM' variant of [snip] (Help greatly appreciated) Perhaps what you missed is the why you need it. I do not mean to be condescending, but your original sentiment of needing help without specifying the reason why is a rather important distinction IMO. I would much rather ask the question to you: how bad do you need it? Cheers as usual |
|||
30 Jul 2015, 12:00 |
|
revolution 30 Jul 2015, 12:00
I think at a bare minimum the OP should provide a test-bed where potential contributors can plug in their uber-optimised function and test the performance on their system. At least then everyone has some way to do basic comparisons. Then they can get a feel for the data set sizes and tune caching and data streaming from the given test set.
But single threading is old-school now and perhaps you could allow the data to be local and make use of the extra cores. You could get a fourfold or more increase almost for free with just that one change. |
|||
30 Jul 2015, 12:00 |
|
PeExecutable 30 Jul 2015, 12:12
Just trust me, it's not for evil, it's for good. Prove your worth, it's good sport too.
|
|||
30 Jul 2015, 12:12 |
|
redsock 30 Jul 2015, 12:17
PeExecutable wrote: Just trust me, it's not for evil, it's for good. Prove your worth, it's good sport too. Hmmmphf I say. PeExecutable: what you are asking for is a part of my own business model, as well as countless other machine-level optimiser experts. I'll have a go for public sport, but for public benefit not yours. Challenge accepted, with a few provisos. |
|||
30 Jul 2015, 12:17 |
|
PeExecutable 30 Jul 2015, 12:20
I appreciate that, but I would also appreciate if you made it in 32-bit so I can use the code. (But if you have been too much into 64-bit and have lost some of the skills in the 32-bit world, I understand it if thats the case)
|
|||
30 Jul 2015, 12:20 |
|
revolution 30 Jul 2015, 12:29
redsock: I am willing to act as an intermediary. But for test results we would have to limit our 'average CPU' to the AMD E350 I am currently travelling with. I can run 32-bit and 64-bit Windows code. But I would prefer not to receive bare binaries, fasm source code is best. Naturally I would provide complete confidentiality for anyone that wants it.
|
|||
30 Jul 2015, 12:29 |
|
redsock 30 Jul 2015, 12:42
LOL (haha, is that permitted here? it feels so... wrong...)
Ok, Mr. PeExecutable, some input data and test parameters are required. Whatever techniques I employ, I will be sure to make runnable on our awesome moderator's hardware and we can go from there. Disuse of my 32bit register starvation nightmare aside, I will take your input code in C/C++ on my native platform (x64), and happily write fasm code to circle around our C/C++ compilers. WITHIN REASON. If the test/saddle code is ridiculous, then I should say our moderator will set me free. Otherwise, game on, and perhaps more entertaining for all of us if more than just me participates. I am quite certain there are other coders here lurking that are better than I. |
|||
30 Jul 2015, 12:42 |
|
revolution 30 Jul 2015, 12:54
I should probably add that there is an instance of MingW on this machine. I am not expert at operating it but I'm sure with some experimentation I could also compile some C source and link with an assembly file.
|
|||
30 Jul 2015, 12:54 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.