flat assembler
Message board for the users of flat assembler.

Multithreaded Quaternion Julia Sets renderer

randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 03 Dec 2011, 15:30
I have written an assembly program to render quaternion Julia sets. The program uses no external libraries (only Linux syscalls) and saves the rendered image to a TGA file. I hope it will be useful to someone. The code and an example image are included as attachments.
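
For readers who have not met the TGA format, here is a minimal C sketch of writing an uncompressed true-color TGA file. It is not taken from qjulia.asm: the helper name write_tga and the 24-bit BGR pixel layout are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not from qjulia.asm: writes an uncompressed
   24-bit true-color TGA. `pixels` holds width*height BGR triples,
   bottom row first (the default TGA origin). Returns 0 on success. */
int write_tga(const char *path, const uint8_t *pixels,
              uint16_t width, uint16_t height)
{
    uint8_t header[18] = {0};              /* unused fields stay zero */
    header[2]  = 2;                        /* image type 2: uncompressed true-color */
    header[12] = (uint8_t)(width & 0xFF);  /* width, little-endian */
    header[13] = (uint8_t)(width >> 8);
    header[14] = (uint8_t)(height & 0xFF); /* height, little-endian */
    header[15] = (uint8_t)(height >> 8);
    header[16] = 24;                       /* bits per pixel */

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    size_t count = (size_t)width * height * 3;
    int ok = fwrite(header, 1, sizeof header, f) == sizeof header
          && fwrite(pixels, 1, count, f) == count;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The whole header is 18 bytes, which is part of why the format is convenient for a program that talks to the kernel directly: a few stores and a single write are enough.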

UPDATE.

I have updated my program. It now uses all CPU cores to generate the image. Only Linux syscalls are used (no external libs).

On an Intel® Core™ i7-4770K CPU @ 3.50GHz (8 threads) it takes about 870 ms to generate a 1280x720 image.

On an Intel® Core™ i7 975 @ 3.33GHz (8 threads) it takes about 1300 ms to generate a 1280x720 image.

On an Intel® Core™ 2 Duo E6300 @ 1.86GHz (2 threads) it takes about 8000 ms to generate a 1280x720 image.

I am using a tile rendering method. The tile size is 80x80 pixels (it can be changed via the TILE_SIZE constant in the code). The program spawns as many worker threads as there are CPU cores on the machine. Each thread renders one tile at a time. When a thread completes its tile, it increments the atomic tile counter (the g_imgtile variable) and takes the next tile. Each thread terminates when there are no more tiles in the global pool.
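
The shared-counter scheme described above can be sketched in C with C11 atomics. This is only an illustration, not the code from qjulia.asm: the names claim_tile, tile_rect and g_next_tile are hypothetical, and the sketch claims a tile with one fetch-and-add, which may differ in detail from how g_imgtile is updated in the real program.

```c
#include <stdatomic.h>

#define TILE_SIZE 80   /* same value as in the post */

/* Hypothetical names, not taken from qjulia.asm. */
typedef struct { int x0, y0, x1, y1; } tile_rect;

atomic_uint g_next_tile;   /* shared by all worker threads, starts at 0 */

/* Claims the next unrendered tile. Returns 1 and fills *out, or returns 0
   when the pool is empty and the calling thread should terminate. */
int claim_tile(unsigned width, unsigned height, tile_rect *out)
{
    unsigned tiles_x = (width  + TILE_SIZE - 1) / TILE_SIZE;
    unsigned tiles_y = (height + TILE_SIZE - 1) / TILE_SIZE;

    /* fetch-and-add hands every caller a distinct tile index */
    unsigned idx = atomic_fetch_add(&g_next_tile, 1);
    if (idx >= tiles_x * tiles_y)
        return 0;

    out->x0 = (int)(idx % tiles_x) * TILE_SIZE;
    out->y0 = (int)(idx / tiles_x) * TILE_SIZE;
    /* clamp the right and bottom edges to the image size */
    out->x1 = out->x0 + TILE_SIZE < (int)width  ? out->x0 + TILE_SIZE : (int)width;
    out->y1 = out->y0 + TILE_SIZE < (int)height ? out->y0 + TILE_SIZE : (int)height;
    return 1;
}
```

Each worker would loop on claim_tile until it returns 0. Because tiles are handed out on demand, a thread that lands on cheap tiles simply takes more of them, so no static partitioning of the image is needed.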

The g_Quat variable can be changed to produce different shapes. For example, set it to: -0.2,0.4,-0.4,-0.4
If you are interested, see this: http://paulbourke.net/fractals/quatjulia/
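
To show what the constant controls, here is a small C sketch of the iteration that defines a quaternion Julia set: q is repeatedly replaced by q*q + c, and a point belongs to the set if q stays bounded. It is an illustration under assumptions, not the renderer from qjulia.asm: the names quat, quat_sq and julia_iterations are hypothetical, the escape radius of 2 is a common choice, and which of the four g_Quat numbers is the real part is not stated in the post.

```c
/* Hypothetical types and names, not taken from qjulia.asm. */
typedef struct { float w, x, y, z; } quat;   /* w is the real part */

/* Quaternion square: (w, v)^2 = (w*w - v.v, 2*w*v) */
quat quat_sq(quat q)
{
    quat r = { q.w * q.w - q.x * q.x - q.y * q.y - q.z * q.z,
               2.0f * q.w * q.x,
               2.0f * q.w * q.y,
               2.0f * q.w * q.z };
    return r;
}

/* Iterates q <- q*q + c. Returns the number of steps taken before
   |q|^2 exceeds 4, or max_iter if the orbit never escaped. */
int julia_iterations(quat q, quat c, int max_iter)
{
    for (int i = 0; i < max_iter; i++) {
        float norm2 = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
        if (norm2 > 4.0f)
            return i;
        q = quat_sq(q);
        q.w += c.w;
        q.x += c.x;
        q.y += c.y;
        q.z += c.z;
    }
    return max_iter;
}
```

Changing c changes which starting points stay bounded, which is why editing g_Quat produces a different shape.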

Thanks.


Attachment: qjulia0.png (296.54 KB, viewed 25815 times)
Attachment: qjulia.asm (35.43 KB, downloaded 1133 times)



Last edited by randall on 09 Jun 2013, 21:44; edited 12 times in total
pelaillo
Missing in inaction


Joined: 19 Jun 2003
Posts: 878
Location: Colombia
pelaillo 05 Dec 2011, 15:16
Very nice contribution. Thanks for sharing.

p.s. I think you have a very clean coding style!
TmX



Joined: 02 Mar 2006
Posts: 843
Location: Jakarta, Indonesia
TmX 07 Dec 2011, 14:55
How do you produce the graphic? I ran the executable, but apparently it didn't do anything?
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 07 Dec 2011, 15:29
You need to wait; generating the image takes some time (about 90 s on a Core 2 Duo 1.86 GHz). You can also change the resolution in the code to make it faster (the default is 2560 x 1440).
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 07 Dec 2011, 18:06
randall wrote:
You need to wait; generating the image takes some time (about 90 s on a Core 2 Duo 1.86 GHz). You can also change the resolution in the code to make it faster (the default is 2560 x 1440).


So it's a benchmark? ;)
TmX



Joined: 02 Mar 2006
Posts: 843
Location: Jakarta, Indonesia
TmX 09 Dec 2011, 15:10
I changed the dimensions to 1280x800, and it took about 10 s to finish on my Core 2 Duo 2 GHz.
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 11 Dec 2011, 15:10
You can also change the g_Quat variable to get different shapes. For example, set it to: -0.2,0.4,-0.4,-0.4
If you are interested, see this: http://paulbourke.net/fractals/quatjulia/
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 11 Dec 2011, 15:43
Here's a way to measure time accurately using the kernel's high-resolution timers (HRT) on Linux:

Code:
/*
 Compile this little piece of code with the command:
   gcc -Wall -o timertest timertest.c -lrt
 (-lrt has to come after the source file)
*/

#include <stdio.h>
#include <stdint.h> // for uintXX_t declarations
#include <math.h>   // for double_t
#include <time.h>   // clock_gettime, clock_nanosleep (HRT)
#include <unistd.h> // for usleep

#define NSEC_PER_SEC  1000000000L
#define TIMER_RELTIME 0

// t1 - t2 in nanoseconds
static inline long calcdiff_long(struct timespec t1, struct timespec t2)
{
    long diff;
    diff = NSEC_PER_SEC * ((long) t1.tv_sec - (long) t2.tv_sec);
    diff += ((long) t1.tv_nsec - (long) t2.tv_nsec);
    return diff;
}

// t1 - t2 in seconds
static inline double_t calcdiff_double(struct timespec t1, struct timespec t2)
{
    double_t diff;
    diff = (t1.tv_sec - t2.tv_sec);
    diff += ((double_t)(t1.tv_nsec - t2.tv_nsec)) / ((double_t)NSEC_PER_SEC);
    return diff;
}

// *tout = tin + delta nanoseconds, with tv_nsec kept below one second
static inline int addtime(struct timespec tin, uint64_t delta, struct timespec *tout)
{
    uint64_t ldelta;
    ldelta = (uint64_t)delta + (uint64_t)tin.tv_nsec;
    tout->tv_nsec = ldelta % (uint64_t)NSEC_PER_SEC;
    ldelta -= tout->tv_nsec;
    if (ldelta > 0) {
        tout->tv_sec = (uint64_t)tin.tv_sec + ldelta / (uint64_t)NSEC_PER_SEC;
    } else {
        tout->tv_sec = (uint64_t)tin.tv_sec;
    }
    return 0;
}

int main(void)
{
    struct timespec past, now, future, zerotime = {0, 0};
    double delay;
    uint64_t udelay, ndelay;

    delay = 0.1;
    ndelay = delay * (uint64_t)NSEC_PER_SEC;
    udelay = ndelay / 1000;

    if (clock_getres(CLOCK_MONOTONIC, &now) == 0) // or CLOCK_REALTIME
        printf("Timer resolution: %ld ns\n", now.tv_nsec);

    /* Get current time */
    clock_gettime(CLOCK_MONOTONIC, &past);
    usleep(udelay);
    /* Get current time */
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("usleep(+%.9f s) took: %.9f seconds\n", delay, calcdiff_double(now, past));

    clock_gettime(CLOCK_MONOTONIC, &past);
    addtime(zerotime, ndelay, &future);
    // relative time wait, monotonic is preferred
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_RELTIME, &future, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("relative nanosleep(+%.9f s) took: %.9f seconds\n", delay, calcdiff_double(now, past));

    clock_gettime(CLOCK_MONOTONIC, &past);
    addtime(past, ndelay, &future);
    // absolute time wait
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &future, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("absolute nanosleep(+%.9f s) took: %.9f seconds\n", delay, calcdiff_double(now, past));

    return 0;
}

    


It gives <70 us precision on a realtime-preempted kernel.
GordonK



Joined: 25 Dec 2011
Posts: 9
Location: USA
GordonK 25 Dec 2011, 12:25
Nice! Took about 29s in an Ubuntu VM on a hexacore i7. This is single threaded though, right?
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 26 Dec 2011, 00:07
GordonK wrote:
Nice! Took about 29s in an Ubuntu VM on a hexacore i7. This is single threaded though, right?


Thanks. Glad you like it. Yes, this is single threaded.
catafest



Joined: 05 Aug 2010
Posts: 129
catafest 28 Dec 2011, 09:13
I get this error:
./qjulia
bash: ./qjulia: cannot execute binary file
Also:
ll qjulia
-rwxr-xr-x. .... qjulia
I have an AMD Athlon XP ...
Very nice to code for MMX ... I need more docs about this.
Thank you. Regards.
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 28 Dec 2011, 12:34
catafest wrote:
I get this error:
./qjulia
bash: ./qjulia: cannot execute binary file
Also:
ll qjulia
-rwxr-xr-x. .... qjulia
I have an AMD Athlon XP ...
Very nice to code for MMX ... I need more docs about this.
Thank you. Regards.


Hi. The program requires a 64-bit processor with SSE3 support.
catafest



Joined: 05 Aug 2010
Posts: 129
catafest 28 Dec 2011, 14:21
randall wrote:
catafest wrote:
I get this error:
./qjulia
bash: ./qjulia: cannot execute binary file
Also:
ll qjulia
-rwxr-xr-x. .... qjulia
I have an AMD Athlon XP ...
Very nice to code for MMX ... I need more docs about this.
Thank you. Regards.


Hi. The program requires a 64-bit processor with SSE3 support.


And 32-bit source code ...
I tried to change something:
First I used format ELF executable 3
I got an error at this line: mov r10d,0x02+0x20 ; flags = MAP_PRIVATE | MAP_ANONYMOUS
I think there is a lot to change (registers and mnemonics, 32-bit versus 64-bit).
Thank you. Regards.
keantoken



Joined: 19 Mar 2008
Posts: 69
keantoken 09 Mar 2013, 18:41
This is cool!

Here's what I got on my AMD FX-8350:

Code:
$ time ./qjulia -v

real    0m32.919s
user    0m32.648s
sys     0m0.065s
    
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 20 Mar 2013, 20:15
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:22; edited 1 time in total
macgub



Joined: 11 Jan 2006
Posts: 350
Location: Poland
macgub 22 Mar 2013, 07:40
Some time ago I ported this piece of art to KolibriOS and MenuetOS64.
http://macgub.co.pl/menuet/qjulia.zip -> code and binaries for MeOS64 and KolibriOS.
http://macgub.co.pl/menuet/qjulia_big.jpg -> screenshot.


Last edited by macgub on 15 Feb 2022, 17:14; edited 2 times in total
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 22 Mar 2013, 18:29
Thanks for your comments. I am very glad you like it.
keantoken



Joined: 19 Mar 2008
Posts: 69
keantoken 23 Mar 2013, 02:08
I was thinking: there is an instruction, dpps, that I think could make this program much quicker.
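
For reference, dpps computes a dot product of packed single-precision floats in one instruction. The C sketch below shows it through the _mm_dp_ps intrinsic, with a scalar fallback when the compiler is not targeting SSE4.1 (for GCC or Clang, build with -msse4.1 to get the dpps path). The function name dot4 is hypothetical, and whether this would actually speed up qjulia is not something the sketch demonstrates.

```c
#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_dp_ps */
#endif

/* Hypothetical helper: dot product of two 4-float vectors. */
float dot4(const float a[4], const float b[4])
{
#if defined(__SSE4_1__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* mask 0xF1: multiply all four lanes, write the sum to lane 0 */
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
#else
    /* scalar fallback for builds without SSE4.1 */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```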
randall



Joined: 03 Dec 2011
Posts: 155
Location: Poland
randall 23 Mar 2013, 13:18
Yes, but it requires SSE4.1 support. I have an old Core 2 Duo at home, so only SSSE3.
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 23 Mar 2013, 16:42
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:21; edited 1 time in total

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.