flat assembler
Message board for the users of flat assembler.

Multithreaded Quaternion Julia Sets renderer

comrade



Joined: 16 Jun 2003
Posts: 1137
Location: Russian Federation
randall wrote:
Yes, but it requires SSE4 support. I have an old Core2 Duo at home, so only SSSE3.


An interesting project (somebody was talking about compos on this board) is emulating extended instruction sets on older CPUs. There are several approaches to try out performance-wise (similar to VMs):

1) Dynamic code rewriting. Some of these extended instructions are so long that you can easily patch them over with a call to a subroutine.
2) Emulation via the illegal instruction exception.
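
A minimal sketch of approach 1, operating on a plain byte buffer rather than on live code. It assumes the instruction to replace is `dpps xmm0, xmm1, 0xFF` (SSE4.1, six bytes: 66 0F 3A 40 C1 FF), which leaves room for a five-byte `call rel32` plus one NOP. The load address, the address of the emulation routine, and the helper name `patch_with_call` are all hypothetical. A real rewriter would also need a disassembler to find instruction boundaries (a raw byte search can match data or the middle of another instruction) and one emulation stub per operand combination.

```python
import struct

# dpps xmm0, xmm1, 0xFF (SSE4.1): 66 0F 3A 40 /r ib, six bytes in total.
DPPS_XMM0_XMM1_FF = bytes.fromhex("660f3a40c1ff")
CALL_REL32_LEN = 5  # opcode E8 followed by a 32-bit signed displacement
NOP = 0x90


def patch_with_call(code, base, offset, length, target):
    """Overwrite `length` bytes at `offset` with `call target`, padded with NOPs.

    `base` is the (hypothetical) address at which `code` is loaded.
    """
    if length < CALL_REL32_LEN:
        raise ValueError("instruction too short to hold a call rel32")
    next_ip = base + offset + CALL_REL32_LEN  # address of the byte after the call
    rel = target - next_ip                    # displacement is relative to next_ip
    patch = b"\xe8" + struct.pack("<i", rel)
    patch += bytes([NOP]) * (length - CALL_REL32_LEN)
    code[offset:offset + length] = patch


# Toy buffer: nop; dpps xmm0, xmm1, 0xFF; ret
code = bytearray(b"\x90" + DPPS_XMM0_XMM1_FF + b"\xc3")
BASE = 0x400000          # hypothetical load address
EMULATE_DPPS = 0x401000  # hypothetical address of the emulation routine

offset = code.find(DPPS_XMM0_XMM1_FF)
patch_with_call(code, BASE, offset, len(DPPS_XMM0_XMM1_FF), EMULATE_DPPS)
print(code.hex())  # prints 90e8fa0f000090c3
```

The buffer keeps its length, so the offsets of everything after the patched instruction are unchanged.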

_________________
comrade (comrade64@live.com; http://comrade.ownz.com/)
Post 24 Mar 2013, 08:44
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17437
Location: In your JS exploiting you and your system
comrade wrote:
1) Dynamic code rewriting. Some of these extended instructions are so long that you can easily patch them over with a call to a subroutine.
2) Emulation via the illegal instruction exception.
Well, you can combine both by rewriting upon an exception. However, with AVs being particularly fussy and inaccurate, writing to your code section can trigger false positives. !@#@$% AVs should be banned anyway: they are useless, provide no protection, give users a false sense of security with bad programs and a false sense of danger with perfectly good programs, harm performance, and create havoc when they detect your own OS as malware. And many people even pay to put up with such nonsense!
Post 24 Mar 2013, 09:54
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
comrade wrote:

An interesting project (somebody was talking about compos on this board) is emulating extended instruction sets on older CPUs. There are several approaches to try out performance-wise (similar to VMs):

1) Dynamic code rewriting. Some of these extended instructions are so long that you can easily patch them over with a call to a subroutine.
2) Emulation via the illegal instruction exception.


Intel has such a tool:
http://software.intel.com/en-us/articles/intel-software-development-emulator
Post 24 Mar 2013, 11:46
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
I have updated my program. Now it is fully multithreaded. See starting post for more details.
Post 03 Apr 2013, 11:01
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1180
Location: Unknown
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:07; edited 1 time in total
Post 03 Apr 2013, 16:01
keantoken



Joined: 19 Mar 2008
Posts: 69
Well then how does one protect their computer? You have to do it somehow.

I will certainly study the multithreading code! Thanks a lot Randall!

Here are the results with my FX8350 at 4GHz with 1880MHz RAM:

EDIT: Actually, here are the results:

$ time ./qjulia -v

real 0m1.200s
user 0m8.864s
sys 0m0.016s

And here are the results with a 2560x1440 resolution like the original version:

$ time ./qjulia -v

real 0m4.671s
user 0m35.442s
sys 0m0.043s
Post 13 Apr 2013, 14:52
Turbo Lover



Joined: 22 Feb 2013
Posts: 32
This is awesome! How long did it take you to develop this? You must be very experienced in asm!
Post 14 Apr 2013, 17:06
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
Thanks! It took me three or four days for the single-threaded version and another three days for the multithreading code. keantoken motivated me to implement the multithreaded version (thanks :) ).

The code is rather simple. You just have to know the technique. This is 'standard' distance field raymarching:
http://www.iquilezles.org/www/articles/raymarchingdf/raymarchingdf.htm

I knew the algo because some time ago I wrote a similar program using C++ and OpenGL. On GPUs it can run in real time.
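
For readers who do not know the technique, here is a minimal sketch of distance field raymarching (sphere tracing) in Python. It is not taken from qjulia: the unit sphere stands in for the fractal's distance estimator, and the step limit, hit threshold and maximum ray length are assumed values chosen for illustration.

```python
import math


def sphere_distance(p, radius=1.0):
    """Signed distance from point p to a sphere of the given radius at the origin."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def raymarch(origin, direction, distance, max_steps=128, eps=1e-4, max_t=100.0):
    """March along the ray, stepping by the distance bound at each point.

    Returns the ray parameter t at the hit, or None if nothing was hit.
    `direction` is expected to be a unit vector.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = distance(p)
        if d < eps:      # close enough to the surface: report a hit
            return t
        t += d           # safe step: no surface is nearer than d
        if t > max_t:    # the ray left the scene
            break
    return None


hit = raymarch((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_distance)
miss = raymarch((0.0, 2.0, -3.0), (0.0, 0.0, 1.0), sphere_distance)
print(hit, miss)
```

The step is safe only because the distance function never overestimates the distance to the surface, which is why a lower bound is enough for the fractal case.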
Post 15 Apr 2013, 16:44
tthsqe



Joined: 20 May 2009
Posts: 730
OK, I just want to get some verification on how this works:
At a point (X,Y,Z) in three-space, you form the quaternions
Code:
z'=1+0i+0j+0k
z=0+Xi+Yj+Zk    

Then you iterate
Code:
z'=2*z'*z
z=z*z+c    

After a few iterations, you estimate the distance from the point (X,Y,Z) to the surface by
Code:
DE = 0.5* log(|z|)*|z|/|z'|    

I see that c is your g_Quat parameter, but is the choice
Code:
z=0+Xi+Yj+Zk    

arbitrary?
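
The three steps above can be sketched in Python as follows. This follows the formulas exactly as written in this post, not the qjulia source; the iteration count, the bailout radius and the zero-guard are assumptions, and quaternions are stored as (w, x, y, z) tuples.

```python
import math


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def qnorm(q):
    """Modulus |q| of a quaternion."""
    return math.sqrt(sum(v * v for v in q))


def julia_distance(point, c, iterations=10, bailout=16.0):
    """Distance estimate DE = 0.5 * log(|z|) * |z| / |z'| at a 3D point."""
    x, y, k = point
    z = (0.0, x, y, k)          # z  = 0 + Xi + Yj + Zk
    dz = (1.0, 0.0, 0.0, 0.0)   # z' = 1 + 0i + 0j + 0k
    for _ in range(iterations):
        dz = tuple(2.0 * v for v in qmul(dz, z))           # z' = 2*z'*z (uses the old z)
        z = tuple(a + b for a, b in zip(qmul(z, z), c))    # z  = z*z + c
        if qnorm(z) > bailout:  # assumed escape test, keeps the values finite
            break
    r, dr = qnorm(z), qnorm(dz)
    if r == 0.0 or dr == 0.0:   # guard against log(0) and division by zero
        return 0.0
    return 0.5 * math.log(r) * r / dr


# With c = 0 the set is the unit sphere, so the true distance from (2, 0, 0) is 1.
de = julia_distance((2.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
print(de)
```

For c = 0 the estimate works out to 0.5*|z0|*log(|z0|), which is log(2) at this point, below the true distance of 1. For points where |z| stays below 1 the logarithm is negative, so the value is only meaningful outside the set.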
Post 07 Jun 2013, 00:32
tthsqe



Joined: 20 May 2009
Posts: 730
Also, how do you know that the iteration for z' is not
Code:
z'=z'*z+z*z'    
?
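
The two candidate updates can be compared numerically. The sketch below picks z = i and z' = j purely as an illustration, because these two quaternions do not commute. It only demonstrates a single step: the symmetric form and 2*z'*z give different quaternions, while the modulus of the symmetric form cannot exceed 2*|z'|*|z| (triangle inequality plus |a*b| = |a|*|b|).

```python
import math


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def qnorm(q):
    """Modulus |q| of a quaternion."""
    return math.sqrt(sum(v * v for v in q))


z = (0.0, 1.0, 0.0, 0.0)   # i
dz = (0.0, 0.0, 1.0, 0.0)  # j

# z'*z + z*z': here j*i + i*j = -k + k = 0
symmetric = tuple(a + b for a, b in zip(qmul(dz, z), qmul(z, dz)))
# 2*z'*z: here 2*j*i = -2k
simplified = tuple(2.0 * v for v in qmul(dz, z))

print(symmetric, simplified)
print(qnorm(symmetric) <= qnorm(simplified))
```

Since the distance formula divides by |z'|, a larger |z'| can only make the estimate smaller, which is consistent with the simplified update being the more conservative of the two.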
Post 07 Jun 2013, 02:51
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
The 'c' parameter is arbitrary; different values generate different shapes, just like in standard 2D Julia sets.

It can be shown that the derivative of Z_(n+1) = Z_n*Z_n + c is simply Z'_(n+1) = 2*Z'_n*Z_n.

Theory:
http://www.iquilezles.org/www/articles/juliasets3d/juliasets3d.htm
http://paulbourke.net/fractals/quatjulia/
http://devmaster.net/forums/topic/3432-ray-tracing-quaternion-julia-sets-on-the-gpu/
Post 07 Jun 2013, 09:53
tthsqe



Joined: 20 May 2009
Posts: 730
Ah, sorry for my long post. Even though (z*z)' = 2*z*z' is not correct for quaternions, it seems that it has been shown that iterating z' = 2*z*z' does give a correct lower bound on the distance to the set. My bad.

In that case, you can probably save a lot of time by not keeping track of z', but only its modulus |z'|, or even better the square of its modulus.
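
A sketch of that idea in Python, assuming the same iteration and distance formula discussed earlier in the thread. Because the quaternion modulus is multiplicative, |2*z'*z|^2 = 4*|z'|^2*|z|^2, so a single scalar replaces the four components of z'. The function name, iteration count and bailout radius are assumptions for illustration.

```python
import math


def qsquare(z):
    """z*z for a quaternion (w, x, y, k): (w^2 - |v|^2, 2*w*v)."""
    w, x, y, k = z
    return (w * w - x * x - y * y - k * k, 2.0 * w * x, 2.0 * w * y, 2.0 * w * k)


def norm2(q):
    """Squared modulus |q|^2."""
    return sum(v * v for v in q)


def julia_distance_scalar(point, c, iterations=10, bailout=16.0):
    """Distance estimate that tracks only |z'|^2 instead of the full z'."""
    x, y, k = point
    z = (0.0, x, y, k)
    dz2 = 1.0                                  # |z'|^2, starting from z' = 1
    for _ in range(iterations):
        dz2 *= 4.0 * norm2(z)                  # |2*z'*z|^2 = 4*|z'|^2*|z|^2
        z = tuple(a + b for a, b in zip(qsquare(z), c))
        if norm2(z) > bailout * bailout:       # assumed escape test
            break
    r2 = norm2(z)
    if r2 == 0.0 or dz2 == 0.0:                # guard against log(0) and division by zero
        return 0.0
    # 0.5*log(|z|)*|z|/|z'| rewritten in terms of the squared moduli
    return 0.25 * math.log(r2) * math.sqrt(r2 / dz2)


de = julia_distance_scalar((2.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
print(de)
```

Each iteration then costs one quaternion square and one squared norm, with a single square root and logarithm at the end.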
Post 07 Jun 2013, 18:51
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
I have tested my program on a Haswell CPU (Core i7-4770K CPU @ 3.50GHz).

It takes 870 ms to generate a 1280x720 image. That is almost 10x faster than on my previous CPU (Core2 Duo @ 1.86GHz).
Post 09 Jun 2013, 12:05
Melissa



Joined: 12 Apr 2012
Posts: 71
bmaxa@maxa:~/fasm/examples/qjulia$ time ./qjulia

real 0m1.190s
user 0m4.628s
sys 0m0.000s

Wow, 50% faster than an i5 Ivy Bridge ;)
Post 09 Jun 2013, 14:29
keantoken



Joined: 19 Mar 2008
Posts: 69
AMD Ryzen 5 1500X 3.5GHz:

$ time ./qjulia

real 0m0.768s
user 0m5.970s
sys 0m0.004s
Post 02 Aug 2017, 17:07
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
keantoken wrote:
AMD Ryzen 5 1500X 3.5GHz:

$ time ./qjulia

real 0m0.768s
user 0m5.970s
sys 0m0.004s


Nice, Ryzen seems quite fast.
Post 02 Aug 2017, 18:11
Furs



Joined: 04 Mar 2016
Posts: 1488
This is pretty amazing stuff. I've always been fascinated by quaternions since I can't really understand them (I'm not much of a math guy). :)
Post 02 Aug 2017, 21:37
keantoken



Joined: 19 Mar 2008
Posts: 69
Using a tile size of 146 (default 80) gives about 30% performance gain for my CPU:
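
To see why the tile size matters, the sketch below splits an image into square tiles and hands them to a pool of worker threads. How qjulia actually distributes its tiles is not shown in this thread, so the partitioning scheme, the function names and the 1280x720 resolution (the figure mentioned earlier in the thread) are assumptions; `render_tile` is a stand-in that only counts pixels.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tiles(width, height, tile):
    """Split the image into (x, y, w, h) tiles; edge tiles may be smaller."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def render_tile(t):
    """Stand-in for the real per-tile work: returns the tile's pixel count."""
    _x, _y, w, h = t
    return w * h


def render(width, height, tile, workers=8):
    """Hand each tile to a worker thread and total the pixels processed."""
    tiles = make_tiles(width, height, tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(render_tile, tiles))


print(len(make_tiles(1280, 720, 80)))   # 16 x 9 = 144 tiles
print(len(make_tiles(1280, 720, 146)))  # 9 x 5 = 45 tiles
print(render(1280, 720, 146))           # 921600 pixels in total
```

Larger tiles mean fewer work items and less scheduling overhead, while smaller tiles balance the load better when some regions are much more expensive than others; the best value depends on the CPU, as the timings here suggest. Python threads will not speed up CPU-bound work the way native threads do, so this only illustrates the partitioning.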

$ time ./qjulia

real 0m0.668s
user 0m4.683s
sys 0m0.004s
Post 03 Aug 2017, 05:29
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
keantoken wrote:
Using a tile size of 146 (default 80) gives about 30% performance gain for my CPU:

$ time ./qjulia

real 0m0.668s
user 0m4.683s
sys 0m0.004s


Does the Ryzen 5 have 4 cores or more?
Post 03 Aug 2017, 09:29
keantoken



Joined: 19 Mar 2008
Posts: 69
It has 4 cores, 2 threads per core. I checked the core counter in qjulia and it reported 8. Oddly enough however, the CPU only reports 25% usage while qjulia is running.

EDIT: Never mind, it uses the whole CPU; my CPU usage monitor was displaying it incorrectly.
Post 04 Aug 2017, 18:03

Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.