flat assembler
Message board for the users of flat assembler.
Multithreaded Quaternion Julia Sets renderer
randall 03 Dec 2011, 15:30
I have written an assembly program to render quaternion Julia sets. The program uses no external libraries (only Linux syscalls) and saves the rendered image to a TGA file. I hope it will be useful to someone. The code and an example image are included in the attachment.
UPDATE: I have updated my program. It now uses all CPU cores to generate the image. Only Linux syscalls are used (no external libraries).
On an Intel® Core™ i7-4770K CPU @ 3.50GHz (8 threads) it takes about 870 ms to generate a 1280x720 image.
On an Intel® Core™ i7 975 @ 3.33GHz (8 threads) it takes about 1300 ms to generate a 1280x720 image.
On an Intel® Core™ 2 Duo E6300 @ 1.86GHz (2 threads) it takes about 8000 ms to generate a 1280x720 image.
I am using a tile rendering method. The tile size is 80x80 pixels (it can be changed via the TILE_SIZE constant in the code). The program spawns as many worker threads as there are CPU cores on the machine. Each thread renders one tile at a time. When a thread completes a tile, the atomic tile counter (the g_imgtile variable) is incremented and the next tile is taken. Each thread terminates when there are no more tiles in the global pool.
The g_Quat variable can be changed to produce different shapes. For example, set it to: -0.2,0.4,-0.4,-0.4
If you are interested, see this: http://paulbourke.net/fractals/quatjulia/
Thanks.
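Editor's sketch: the tile scheme described above can be illustrated in C roughly as follows. This is not the attached assembly source. It assumes pthreads and C11 atomics in place of raw Linux syscalls, and everything except TILE_SIZE and g_imgtile (shade_pixel, g_image, g_tile_done, render_image, MAX_THREADS) is a hypothetical name. Here a thread claims its next tile with an atomic fetch-and-add, which may differ in detail from how the original increments its counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

enum {
    IMG_W = 1280, IMG_H = 720,
    TILE_SIZE = 80,                       /* same tile size as in the post */
    TILES_X = IMG_W / TILE_SIZE,
    TILES_Y = IMG_H / TILE_SIZE,
    TILE_COUNT = TILES_X * TILES_Y,
    MAX_THREADS = 64                      /* hypothetical upper bound */
};

static atomic_int g_imgtile;              /* index of the next unclaimed tile */
static uint8_t g_image[IMG_H][IMG_W];     /* hypothetical 8-bit output buffer */
static int g_tile_done[TILE_COUNT];       /* how often each tile was rendered */

/* Placeholder for the real per-pixel fractal evaluation. */
static uint8_t shade_pixel(int x, int y)
{
    return (uint8_t)((x ^ y) & 0xff);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Atomically claim one tile; no two threads get the same index. */
        int tile = atomic_fetch_add(&g_imgtile, 1);
        if (tile >= TILE_COUNT)
            break;                        /* global pool is empty */
        int x0 = (tile % TILES_X) * TILE_SIZE;
        int y0 = (tile / TILES_X) * TILE_SIZE;
        for (int y = y0; y < y0 + TILE_SIZE; y++)
            for (int x = x0; x < x0 + TILE_SIZE; x++)
                g_image[y][x] = shade_pixel(x, y);
        g_tile_done[tile]++;              /* only the claiming thread writes this */
    }
    return NULL;
}

/* Renders the whole image with nthreads workers; returns 0 on success. */
static int render_image(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&g_imgtile, 0);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == nthreads ? 0 : -1;
}
```

To match the post, one would call render_image with the number of online cores, for example the value of sysconf(_SC_NPROCESSORS_ONLN). Because tiles are handed out on demand, a thread that lands on cheap tiles simply takes more of them, which balances the load without any per-thread partitioning.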
Last edited by randall on 09 Jun 2013, 21:44; edited 12 times in total
pelaillo 05 Dec 2011, 15:16
Very nice contribution. Thanks for sharing.
p.s. I think you have a very clean coding style!
TmX 07 Dec 2011, 14:55
How do you produce the graphic? I ran the executable, but apparently it didn't do anything?
Matrix 07 Dec 2011, 18:06
randall wrote: You need to wait; generating the image takes some time (about 90 seconds on a Core2 Duo 1.86 GHz). You can also change the resolution in the code to make it faster (by default it is 2560 x 1440).
So it's a benchmark?
TmX 09 Dec 2011, 15:10
I changed the dimensions to 1280x800, and it took about 10 seconds to finish on my Core2 Duo 2 GHz.
randall 11 Dec 2011, 15:10
You can also change the g_Quat variable to get different shapes. For example, set it to: -0.2,0.4,-0.4,-0.4
If you are interested, see this: http://paulbourke.net/fractals/quatjulia/
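Editor's sketch: to show why the four values in g_Quat change the shape, here is the textbook quaternion Julia iteration q -> q*q + c in C, with g_Quat playing the role of the constant c. The attached assembly is not reproduced here, so this is only an assumption about the general technique. The type and function names are hypothetical, the mapping of the four numbers to (w, x, y, z) is assumed, and the bailout radius of 2 is a common choice rather than a value taken from the program.

```c
#include <assert.h>

/* Quaternion w + x*i + y*j + z*k. */
typedef struct { double w, x, y, z; } quat;

static quat quat_add(quat a, quat b)
{
    quat r = { a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

/* q*q: the cross terms cancel, so squaring is cheaper than a general product. */
static quat quat_square(quat q)
{
    quat r = {
        q.w * q.w - q.x * q.x - q.y * q.y - q.z * q.z,
        2.0 * q.w * q.x,
        2.0 * q.w * q.y,
        2.0 * q.w * q.z
    };
    return r;
}

static double quat_norm2(quat q)
{
    return q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
}

/*
 * Iterates q -> q*q + c and returns the number of steps taken before
 * |q| exceeds 2, or max_iter if the point never escapes (in which case
 * it is treated as belonging to the set).
 */
static int julia_iterations(quat q, quat c, int max_iter)
{
    assert(max_iter >= 0);
    for (int i = 0; i < max_iter; i++) {
        if (quat_norm2(q) > 4.0)          /* |q| > 2, compared squared */
            return i;
        q = quat_add(quat_square(q), c);
    }
    return max_iter;
}
```

With c set to {-0.2, 0.4, -0.4, -0.4}, calling julia_iterations for a sample point tells you whether that point is inside the set for that constant. Changing c moves the boundary between escaping and bounded points, which is what produces a different shape.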
Matrix 11 Dec 2011, 15:43
Here's a way to measure time accurately using the kernel's HRT (high-resolution timers) on Linux:
Code:
/* Compile this little piece of code with the command:
   gcc -Wall -o timertest timertest.c -lrt */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>     // for uintXX_t declarations
#include <math.h>       // for double_t
#include <time.h>       // let's use HRT
#include <unistd.h>     // for usleep
#include <sys/time.h>

#define NSEC_PER_SEC 1000000000
#define TIMER_RELTIME 0

/* Difference t1 - t2 in nanoseconds. */
static inline long calcdiff_long(struct timespec t1, struct timespec t2)
{
    long diff;
    diff = NSEC_PER_SEC * ((long) t1.tv_sec - (long) t2.tv_sec);
    diff += ((long) t1.tv_nsec - (long) t2.tv_nsec);
    return diff;
}

/* Difference t1 - t2 in seconds. */
static inline double_t calcdiff_double(struct timespec t1, struct timespec t2)
{
    double_t diff;
    diff = (t1.tv_sec - t2.tv_sec);
    diff += ((double_t)(t1.tv_nsec - t2.tv_nsec)) / ((double_t)NSEC_PER_SEC);
    return diff;
}

/* tout = tin + delta nanoseconds, with tv_nsec kept below one second. */
static inline int addtime(struct timespec tin, uint64_t delta, struct timespec *tout)
{
    uint64_t ldelta;
    ldelta = (uint64_t)delta + (uint64_t)tin.tv_nsec;
    tout->tv_nsec = (uint64_t)ldelta % (uint64_t)NSEC_PER_SEC;
    ldelta -= tout->tv_nsec;
    if (ldelta > 0) {
        tout->tv_sec = (uint64_t)tin.tv_sec + (uint64_t)ldelta / (uint64_t)NSEC_PER_SEC;
    } else {
        tout->tv_sec = (uint64_t)tin.tv_sec;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    struct timespec past, now, future, zerotime = {0, 0};
    int ret;
    double delay;
    uint64_t udelay, ndelay;

    delay = 0.1;
    ndelay = delay * (uint64_t)NSEC_PER_SEC;
    udelay = ndelay / 1000;

    ret = clock_getres(CLOCK_MONOTONIC, &now); // or CLOCK_REALTIME
    if (ret != 0) {
        perror("clock_getres");
        return 1;
    }
    printf("Timer resolution: %ld ns\n", now.tv_nsec); // tv_nsec is a long

    /* Get current time */
    clock_gettime(CLOCK_MONOTONIC, &past);
    usleep(udelay);
    /* Get current time */
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("usleep(+%.9f s) took: %.9f seconds\n", delay, (double)calcdiff_double(now, past));

    clock_gettime(CLOCK_MONOTONIC, &past);
    addtime(zerotime, ndelay, &future);
    // relative time wait, monotonic is preferred
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_RELTIME, &future, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("relative nanosleep(+%.9f s) took: %.9f seconds\n", delay, (double)calcdiff_double(now, past));

    clock_gettime(CLOCK_MONOTONIC, &past);
    addtime(past, ndelay, &future);
    // absolute time wait
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &future, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("absolute nanosleep(+%.9f s) took: %.9f seconds\n", delay, (double)calcdiff_double(now, past));
    return 0;
}
It gives <70 us precision using a realtime-preempted kernel.
GordonK 25 Dec 2011, 12:25
Nice! Took about 29s in an Ubuntu VM on a hexacore i7. This is single threaded though, right?
randall 26 Dec 2011, 00:07
GordonK wrote: Nice! Took about 29s in an Ubuntu VM on a hexacore i7. This is single threaded though, right?
Thanks. Glad you like it. Yes, this is single threaded.
catafest 28 Dec 2011, 09:13
I get an error:
./qjulia
bash: ./qjulia: cannot execute binary file
Also:
ll qjulia
-rwxr-xr-x. .... qjulia
I have an AMD Athlon XP ... Very nice to code for MMX ... I need more docs about this. Thank you. Regards.
randall 28 Dec 2011, 12:34
catafest wrote: error with :
Hi. The program requires a 64-bit processor with SSE3 support.
catafest 28 Dec 2011, 14:21
randall wrote:
And 32-bit source code .... I tried to change something. First I used format ELF executable 3, and I got an error on this line:
mov r10d,0x02+0x20 ; flags = MAP_PRIVATE | MAP_ANONYMOUS
I think there is a lot to change (registers and mnemonics of 32 bits versus 64 bits). Thank you. Regards.
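Editor's note: the quoted line loads the flags argument of an mmap syscall. On x86-64 Linux the fourth syscall argument goes in r10, a register that does not exist in 32-bit mode, which is why a 32-bit ELF build rejects it (the 32-bit int 0x80 convention passes arguments in ebx, ecx, edx, esi, edi, ebp instead). As a rough illustration of what that call does, here is the same anonymous mapping requested from C. The helper names are hypothetical, and the protection flags and size are assumptions, since the attachment is not reproduced here.

```c
#define _DEFAULT_SOURCE                   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Requests zero-filled memory straight from the kernel, with no file
 * behind it. MAP_PRIVATE | MAP_ANONYMOUS is the value the assembly
 * writes as 0x02+0x20. Returns NULL on failure.
 */
static void *alloc_anonymous(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size,
                   PROT_READ | PROT_WRITE,        /* assumed protection */
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1, 0);                        /* no fd, no offset */
    return p == MAP_FAILED ? NULL : p;
}

/* Returns the mapping to the kernel; 0 on success, -1 on error. */
static int free_anonymous(void *p, size_t size)
{
    return munmap(p, size);
}
```

A program that avoids libc, as this one does, has no malloc, so an anonymous mmap like this is the usual way to get a large buffer such as the image memory.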
keantoken 09 Mar 2013, 18:41
This is cool!
Here's what I got on my AMD FX-8350:
Code:
$ time ./qjulia -v
real 0m32.919s
user 0m32.648s
sys 0m0.065s
HaHaAnonymous 20 Mar 2013, 20:15
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:22; edited 1 time in total
macgub 22 Mar 2013, 07:40
Some time ago I ported this piece of art into KolibriOS and MenuetOS64.
http://macgub.co.pl/menuet/qjulia.zip -> code and binaries for MeOS64 and KolibriOS.
http://macgub.co.pl/menuet/qjulia_big.jpg -> screenshot.
Last edited by macgub on 15 Feb 2022, 17:14; edited 2 times in total
randall 22 Mar 2013, 18:29
Thanks for your comments. I am very glad you like it.
keantoken 23 Mar 2013, 02:08
I was thinking: there is an instruction, dpps, that I think could make this program much quicker.
randall 23 Mar 2013, 13:18
Yes, but it requires SSE4 support. I have an old Core 2 Duo at home, so only SSSE3.
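Editor's sketch: dpps belongs to the SSE4.1 extension, so a build that used it would need a fallback on CPUs like the Core 2 Duo mentioned above. One way to check at run time from C is the GCC/Clang builtin __builtin_cpu_supports. The struct and function names below are hypothetical, and this is not something the posted program does. On non-x86 targets or other compilers the sketch simply reports no features.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the running CPU reports the extension, 0 otherwise. */
typedef struct {
    int sse3;
    int ssse3;
    int sse4_1;
} cpu_features;

static cpu_features detect_cpu_features(void)
{
    cpu_features f = { 0, 0, 0 };
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                 /* safe to call more than once */
    f.sse3   = __builtin_cpu_supports("sse3")   != 0;
    f.ssse3  = __builtin_cpu_supports("ssse3")  != 0;
    f.sse4_1 = __builtin_cpu_supports("sse4.1") != 0;
#endif
    return f;
}

/* Prints a short report, e.g. to decide whether a dpps code path is usable. */
static void print_cpu_features(FILE *out)
{
    assert(out != NULL);
    cpu_features f = detect_cpu_features();
    fprintf(out, "SSE3:   %s\n", f.sse3   ? "yes" : "no");
    fprintf(out, "SSSE3:  %s\n", f.ssse3  ? "yes" : "no");
    fprintf(out, "SSE4.1: %s\n", f.sse4_1 ? "yes" : "no");
}
```

A renderer could call detect_cpu_features once at startup and pick between an SSE3 path and an SSE4.1 path, keeping a single binary usable on both older and newer machines.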
HaHaAnonymous 23 Mar 2013, 16:42
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:21; edited 1 time in total
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.