flat assembler
Message board for the users of flat assembler.
Multithreaded Quaternion Julia Sets renderer

comrade



Joined: 16 Jun 2003
Posts: 1113
Location: Russian Federation


randall wrote:
Yes, but it requires SSE 4 support. I have old Core2 Duo at home so only SSSE3.



An interesting project (somebody was talking about compos on this board) is emulating extended instruction sets on older CPUs. There are several approaches to try out performance wise (similar to VMs):

1) Dynamic code rewriting. Some of these extended instructions are so long that you can easily patch them over with a call to a subroutine.
2) Emulation via the illegal instruction exception.
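As a rough illustration of approach 1, the sketch below patches one six-byte SSE4.1 instruction inside a byte buffer with a five-byte near call plus NOP padding. The load address, the handler address, and the function name `patch_with_call` are hypothetical. A real patcher would also have to tell the handler which operands the original instruction used, which this sketch does not attempt.

```python
import struct

# DPPS xmm0, xmm1, 0xFF (SSE4.1) encodes as 66 0F 3A 40 C1 FF: six bytes.
DPPS_XMM0_XMM1 = bytes.fromhex("660f3a40c1ff")
CALL_LEN = 5   # opcode E8 followed by a 32-bit relative displacement
NOP = 0x90


def patch_with_call(code, offset, length, base_addr, handler_addr):
    """Overwrite `length` bytes at `offset` with CALL handler_addr plus NOP padding."""
    if length < CALL_LEN:
        raise ValueError("instruction too short to hold a near call")
    # rel32 is measured from the address of the instruction following the call.
    rel32 = handler_addr - (base_addr + offset + CALL_LEN)
    code[offset:offset + CALL_LEN] = b"\xE8" + struct.pack("<i", rel32)
    code[offset + CALL_LEN:offset + length] = bytes([NOP]) * (length - CALL_LEN)


# Hypothetical layout: code loaded at 0x401000, emulation routine at 0x402000.
code = bytearray(b"\x90" + DPPS_XMM0_XMM1 + b"\xC3")
patch_with_call(code, 1, len(DPPS_XMM0_XMM1), 0x401000, 0x402000)
print(code.hex())
```

The buffer keeps its length, so the offsets of the surrounding instructions are unaffected. Instructions shorter than five bytes cannot be patched this way, which is where the illegal instruction exception from approach 2 would still be needed.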

_________________
comrade (comrade64@live.com; http://comrade.ownz.com/)
Post 24 Mar 2013, 08:44
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15324
Location: Bigweld Industries


comrade wrote:
1) Dynamic code rewriting. Some of these extended instructions are so long that you can easily patch them over with a call to a subroutine.
2) Emulation via the illegal instruction exception.

Well, you can combine both by rewriting upon an exception. However, with AVs being particularly fussy and inaccurate, writing to your code section can cause problems with false triggering. !@#@$% AVs should be banned anyway: they are useless, provide no protection, give users a false sense of security with bad programs, give users a false sense of danger for perfectly good programs, harm performance, and create havoc when they detect your own OS as malware. And many people even pay to put up with such nonsense!
Post 24 Mar 2013, 09:54
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland


comrade wrote:

An interesting project (somebody was talking about compos on this board) is emulating extended instruction sets on older CPUs. There are several approaches to try out performance wise (similar to VMs):

1) Dynamic code rewriting. Some of these extended instructions are so long that you can easily patch them over with a call to a subroutine.
2) Emulation via the illegal instruction exception.



Intel has such a tool:
http://software.intel.com/en-us/articles/intel-software-development-emulator
Post 24 Mar 2013, 11:46
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland

I have updated my program. It is now fully multithreaded. See the starting post for more details.
Post 03 Apr 2013, 11:01
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1171
Location: Unknown

Stupid post removed.


Last edited by HaHaAnonymous on 28 Feb 2015, 21:07; edited 1 time in total
Post 03 Apr 2013, 16:01
keantoken



Joined: 19 Mar 2008
Posts: 69

Well then how does one protect their computer? You have to do it somehow.

I will certainly study the multithreading code! Thanks a lot Randall!

Here's the results with my FX8350 at 4GHz with 1880MHz RAM:

EDIT: Actually here's the results:

$ time ./qjulia -v

real 0m1.200s
user 0m8.864s
sys 0m0.016s

And here's the results with a 2560x1440 resolution like the original version:

$ time ./qjulia -v

real 0m4.671s
user 0m35.442s
sys 0m0.043s
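One way to read `time` output like the above: dividing CPU time (user plus sys) by wall-clock time gives a rough figure for how many hardware threads were busy on average. The helper name `effective_parallelism` is made up for this sketch.

```python
def effective_parallelism(real_s, user_s, sys_s=0.0):
    """Average number of busy hardware threads: CPU seconds per wall-clock second."""
    if real_s <= 0.0:
        raise ValueError("wall-clock time must be positive")
    return (user_s + sys_s) / real_s


# The two runs quoted above (default resolution, then 2560x1440).
small = effective_parallelism(1.200, 8.864, 0.016)
large = effective_parallelism(4.671, 35.442, 0.043)
print(f"{small:.2f} {large:.2f}")
```

Both runs come out between 7 and 8, which suggests the renderer keeps most of the available hardware threads busy at either resolution.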
Post 13 Apr 2013, 14:52
Turbo Lover



Joined: 22 Feb 2013
Posts: 32

this is awesome! how long did it take you to develop this??? you must be very experienced in asm!
Post 14 Apr 2013, 17:06
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland

Thanks! It took me three or four days for the single-threaded version and another three days for the multithreading code. keantoken motivated me to implement the multithreaded version (thanks :)).

The code is rather simple. You just have to know the technique. This is 'standard' distance field raymarching:
http://www.iquilezles.org/www/articles/raymarchingdf/raymarchingdf.htm

I knew the algo because some time ago I wrote a similar program using C++ and OpenGL. On a GPU it can run in real time.
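For readers who do not know the technique, here is a minimal sketch of distance field raymarching (sphere tracing). It uses a plain sphere as a stand-in scene rather than the quaternion Julia set, and the function names, step limit, and thresholds are assumptions chosen for illustration, not values from qjulia.

```python
import math


def sphere_distance(p, radius=1.0):
    """Distance from point p to a sphere centered at the origin (stand-in scene)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def raymarch(origin, direction, distance_fn, max_steps=128, eps=1e-4, max_dist=100.0):
    """March along the ray, stepping by the distance estimate each time.

    Returns the distance along the ray to the hit, or None if nothing is hit.
    `direction` is assumed to be a unit vector.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = distance_fn(p)
        if d < eps:          # close enough to the surface: report a hit
            return t
        t += d               # safe step: nothing is closer than d
        if t > max_dist:     # ray left the scene
            return None
    return None


hit = raymarch((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_distance)
miss = raymarch((0.0, 5.0, -3.0), (0.0, 0.0, 1.0), sphere_distance)
print(hit, miss)
```

The step is safe as long as the distance function never overestimates the true distance, which is why a lower bound on the distance is sufficient for this kind of renderer.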
Post 15 Apr 2013, 16:44
tthsqe



Joined: 20 May 2009
Posts: 701

OK, I just want to verify how this works. At a point (X,Y,Z) in 3-space, you form the quaternions

Code:
z'=1+0i+0j+0k
z=0+Xi+Yj+Zk


Then you iterate

Code:
z'=2*z'*z
z=z*z+c


After a few iterations, you estimate the distance from the point (X,Y,Z) to the surface by

Code:
DE = 0.5*log(|z|)*|z|/|z'|


I see that c is your g_Quat parameter, but is the choice

Code:
z=0+Xi+Yj+Zk


arbitrary?
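The steps described in this post can be sketched as follows. This follows the post's formulas literally and is not taken from the qjulia source; the `c` value, iteration count, and bailout radius are placeholder assumptions, and the estimate is only meaningful for points outside the set.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qnorm(q):
    return math.sqrt(sum(x * x for x in q))


def julia_distance(point, c, max_iter=10, bailout=4.0):
    """Distance estimate DE = 0.5*log(|z|)*|z|/|z'| after iterating z and z'."""
    z = (0.0, point[0], point[1], point[2])   # z = 0 + Xi + Yj + Zk, as in the post
    dz = (1.0, 0.0, 0.0, 0.0)                 # z' = 1
    for _ in range(max_iter):
        dz = tuple(2.0 * x for x in qmul(dz, z))          # z' = 2*z'*z (uses the old z)
        z = tuple(a + b for a, b in zip(qmul(z, z), c))   # z = z*z + c
        if qnorm(z) > bailout:
            break
    r, dr = qnorm(z), qnorm(dz)
    if r == 0.0 or dr == 0.0:
        return 0.0            # degenerate case, e.g. starting exactly at the origin
    return 0.5 * math.log(r) * r / dr


c = (-0.2, 0.6, 0.2, 0.2)     # placeholder constant, not the program's g_Quat
de = julia_distance((2.0, 0.0, 0.0), c)
print(de)
```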
Post 07 Jun 2013, 00:32
tthsqe



Joined: 20 May 2009
Posts: 701

Also, how do you know that the iteration for z' is not

Code:
z'=z'*z+z*z'

?
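The question comes up because quaternion multiplication does not commute. A small check with the basis elements i and j shows that the two candidate expressions can differ, while they agree for quaternions that happen to commute. The helper names are for illustration only.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)

# Non-commuting pair: i*j = k but j*i = -k.
symmetric = qadd(qmul(i, j), qmul(j, i))       # z'*z + z*z'
doubled = tuple(2 * x for x in qmul(i, j))     # 2*z'*z

# Commuting pair: both lie in the plane spanned by 1 and i.
p, q = (1, 2, 0, 0), (3, 4, 0, 0)
symmetric_pq = qadd(qmul(p, q), qmul(q, p))
doubled_pq = tuple(2 * x for x in qmul(p, q))

print(symmetric, doubled, symmetric_pq, doubled_pq)
```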
Post 07 Jun 2013, 02:51
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland

The 'c' parameter is arbitrary; different values generate different shapes, just like in standard 2D Julia sets.

It can be shown that the derivative of Z_{n+1} = Z_n*Z_n + c is simply Z'_{n+1} = 2*Z'_n*Z_n.

Theory:
http://www.iquilezles.org/www/articles/juliasets3d/juliasets3d.htm
http://paulbourke.net/fractals/quatjulia/
http://devmaster.net/forums/topic/3432-ray-tracing-quaternion-julia-sets-on-the-gpu/
Post 07 Jun 2013, 09:53
tthsqe



Joined: 20 May 2009
Posts: 701

Ah, sorry for my long post. Even though (z*z)' = 2*z*z' is not correct for quaternions, it seems that it has been shown that iterating z' = 2*z*z' does give a correct lower bound on the distance to the set. My bad.

In that case, you can probably save a lot of time by not keeping track of z', but only its modulus |z'|, or even better the square of its modulus.
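A sketch of that optimization: since the quaternion norm is multiplicative, |2*z'*z|^2 = 4*|z'|^2*|z|^2, so a single scalar can replace the four components of z'. The function names, the `c` value, and the bailout threshold are assumptions for illustration, not values from qjulia.

```python
import math


def qsquare(q):
    """Square of a quaternion (w, x, y, z): (w^2 - x^2 - y^2 - z^2, 2wx, 2wy, 2wz)."""
    w, x, y, z = q
    return (w * w - x * x - y * y - z * z, 2.0 * w * x, 2.0 * w * y, 2.0 * w * z)


def julia_distance_scalar(point, c, max_iter=10, bailout2=16.0):
    """Distance estimate that tracks only |z'|^2 instead of the full quaternion z'."""
    z = (0.0, point[0], point[1], point[2])
    dz2 = 1.0                                   # |z'|^2, starting from z' = 1
    for _ in range(max_iter):
        r2 = sum(x * x for x in z)
        dz2 *= 4.0 * r2                         # |2*z'*z|^2 = 4*|z'|^2*|z|^2
        z = tuple(a + b for a, b in zip(qsquare(z), c))
        if sum(x * x for x in z) > bailout2:    # compare squared moduli, no sqrt
            break
    r2 = sum(x * x for x in z)
    if r2 == 0.0 or dz2 == 0.0:
        return 0.0
    # 0.5*log(|z|)*|z|/|z'| rewritten in terms of the squared moduli.
    return 0.25 * math.log(r2) * math.sqrt(r2 / dz2)


c = (-0.2, 0.6, 0.2, 0.2)     # placeholder constant
de = julia_distance_scalar((2.0, 0.0, 0.0), c)
print(de)
```

This replaces one full quaternion product per iteration with a single multiply, and leaves only one square root and one logarithm per sample at the end.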
Post 07 Jun 2013, 18:51
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland

I have tested my program on a Haswell CPU (Core i7-4770K CPU @ 3.50GHz).

It takes 870 ms to generate a 1280x720 image. That is almost 10x faster than on my previous CPU (Core2 Duo @ 1.86GHz).
Post 09 Jun 2013, 12:05
Melissa



Joined: 12 Apr 2012
Posts: 28

bmaxa@maxa:~/fasm/examples/qjulia$ time ./qjulia

real 0m1.190s
user 0m4.628s
sys 0m0.000s

Wow, 50% faster than an i5 Ivy Bridge ;)
Post 09 Jun 2013, 14:29
keantoken



Joined: 19 Mar 2008
Posts: 69

AMD Ryzen 5 1500X 3.5GHz:

$ time ./qjulia

real 0m0.768s
user 0m5.970s
sys 0m0.004s
Post 02 Aug 2017, 17:07
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland


keantoken wrote:
AMD Ryzen 5 1500X 3.5GHz:

$ time ./qjulia

real 0m0.768s
user 0m5.970s
sys 0m0.004s



Nice, Ryzen seems quite fast.

_________________
https://github.com/michal-z
Post 02 Aug 2017, 18:11
Furs



Joined: 04 Mar 2016
Posts: 899

This is pretty amazing stuff. I've always been fascinated by quaternions since I can't really understand them (I'm not much of a math guy). :)
Post 02 Aug 2017, 21:37
keantoken



Joined: 19 Mar 2008
Posts: 69

Using a tile size of 146 (default 80) gives about 30% performance gain for my CPU:

$ time ./qjulia

real 0m0.668s
user 0m4.683s
sys 0m0.004s
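To make the effect of the tile size concrete, the sketch below splits an image into square tiles and hands them to a pool of worker threads. It only illustrates the general scheme; it is not how qjulia distributes work, and the 1280x720 resolution, the function names, and the placeholder `render_tile` are assumptions. CPython threads will not speed up CPU-bound work, so the code shows the structure only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_tiles(width, height, tile_size):
    """Return (x0, y0, x1, y1) rectangles covering the image; edge tiles are clipped."""
    return [(x, y, min(x + tile_size, width), min(y + tile_size, height))
            for y in range(0, height, tile_size)
            for x in range(0, width, tile_size)]


def render_tile(tile):
    """Placeholder for the per-pixel work; returns the number of pixels covered."""
    x0, y0, x1, y1 = tile
    return (x1 - x0) * (y1 - y0)


def render(width, height, tile_size, workers=None):
    """Render all tiles on a thread pool and return the total pixel count."""
    tiles = make_tiles(width, height, tile_size)
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return sum(pool.map(render_tile, tiles))


tiles_80 = len(make_tiles(1280, 720, 80))
tiles_146 = len(make_tiles(1280, 720, 146))
total = render(1280, 720, 146)
print(tiles_80, tiles_146, total)
```

Under these assumptions a tile size of 80 gives 144 tiles and a tile size of 146 gives 45, so larger tiles mean fewer hand-offs between threads, at the cost of coarser load balancing near the end of the frame.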
Post 03 Aug 2017, 05:29
randall



Joined: 03 Dec 2011
Posts: 152
Location: Poland


keantoken wrote:
Using a tile size of 146 (default 80) gives about 30% performance gain for my CPU:

$ time ./qjulia

real 0m0.668s
user 0m4.683s
sys 0m0.004s



Does the Ryzen 5 have 4 cores or more?

_________________
https://github.com/michal-z
Post 03 Aug 2017, 09:29
keantoken



Joined: 19 Mar 2008
Posts: 69

It has 4 cores, 2 threads per core. I checked the core counter in qjulia and it reported 8. Oddly enough, however, the CPU only reports 25% usage while qjulia is running.

EDIT: Never mind, it uses the whole CPU; my CPU usage program was showing the wrong value.
Post 04 Aug 2017, 18:03
Copyright © 2004-2017, Tomasz Grysztar.