flat assembler
Message board for the users of flat assembler.
randall
I have written an assembly program that renders quaternion Julia sets. The program uses no external libraries (only Linux syscalls) and saves the rendered image to a TGA file. I hope it will be useful to someone. The code and an example image are included in the attachment.

UPDATE. I have updated my program. It now uses all CPU cores to generate the image. Still only Linux syscalls are used (no external libs).

Time to generate a 1280x720 image:
On Intel® Core™ i7-4770K CPU @ 3.50GHz (8 threads): about 870 ms.
On Intel® Core™ i7 975 @ 3.33GHz (8 threads): about 1300 ms.
On Intel® Core™ 2 Duo E6300 @ 1.86GHz (2 threads): about 8000 ms.

I am using a tile rendering method. The tile size is 80x80 pixels (it can be changed via the TILE_SIZE constant in the code). The program spawns as many worker threads as there are CPU cores on the machine. Each thread renders one tile at a time. When a thread completes a tile, the atomic tile counter (the g_imgtile variable) is incremented and the next tile is taken. Each thread terminates when there are no more tiles in the global pool.

The g_Quat variable can be changed to produce different shapes. For example, set it to: -0.2,0.4,-0.4,-0.4
If you are interested, see this: http://paulbourke.net/fractals/quatjulia/
Thanks.

Last edited by randall on 09 Jun 2013, 21:44; edited 12 times in total
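The tile scheme described above can be sketched in C as follows. This is only an illustration of the idea, not randall's assembly: the `render_job` struct, the `shade` placeholder, and the one-byte-per-pixel buffer are assumptions made for the example. Only `TILE_SIZE` and the role of the atomic counter (`g_imgtile` in the post) come from the description.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TILE_SIZE 80            /* constant named in the post */

/* Hypothetical job description shared by all worker threads. */
struct render_job {
    int width, height;          /* image size in pixels */
    atomic_int next_tile;       /* plays the role of g_imgtile */
    unsigned char *pixels;      /* width * height bytes (assumed layout) */
};

static int tiles_x(const struct render_job *j)
{
    return (j->width + TILE_SIZE - 1) / TILE_SIZE;
}

static int tiles_y(const struct render_job *j)
{
    return (j->height + TILE_SIZE - 1) / TILE_SIZE;
}

/* Placeholder for the real per-pixel work; always returns a nonzero value. */
static unsigned char shade(int x, int y)
{
    return (unsigned char)(((x ^ y) & 0x7f) + 1);
}

/* Fill one tile, clipping at the right and bottom image edges. */
static void render_tile(struct render_job *job, int tile)
{
    int x0 = (tile % tiles_x(job)) * TILE_SIZE;
    int y0 = (tile / tiles_x(job)) * TILE_SIZE;

    for (int y = y0; y < y0 + TILE_SIZE && y < job->height; y++)
        for (int x = x0; x < x0 + TILE_SIZE && x < job->width; x++)
            job->pixels[(size_t)y * job->width + x] = shade(x, y);
}

/* Thread entry point: claim tiles until the global pool is empty. */
void *worker(void *arg)
{
    struct render_job *job = arg;
    int total = tiles_x(job) * tiles_y(job);

    for (;;) {
        /* Atomically take the next tile index; no two threads get the same one. */
        int tile = atomic_fetch_add(&job->next_tile, 1);
        if (tile >= total)
            break;
        render_tile(job, tile);
    }
    return NULL;
}
```

`worker` has the signature expected by `pthread_create`, so one way to use it would be to start one thread per core with `pthread_create(&tid, NULL, worker, &job)` and join them all. The original program creates its threads with raw Linux syscalls instead.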
pelaillo
Very nice contribution. Thanks for sharing.
p.s. I think you have a very clean coding style!
TmX
How do you produce the graphic? I ran the executable, but apparently it didn't do anything?
Matrix
randall wrote:
You need to wait; generating the image takes some time (about 90 sec. on a Core2 Duo 1.86 GHz). You can also change the resolution in the code to make it faster (by default it is 2560 x 1440).

So it's a benchmark?
TmX
I changed the dimensions to 1280x800, and it took about 10 seconds to finish on my 2 GHz Core2 Duo.
randall
You can also change the g_Quat variable to get different shapes. For example, set it to: -0.2,0.4,-0.4,-0.4
If you are interested, see this: http://paulbourke.net/fractals/quatjulia/
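For readers wondering what the four numbers mean: a quaternion Julia set is built by iterating z -> z*z + c over quaternions, and the shape depends on the constant c. Below is a minimal C sketch of that iteration. It assumes that g_Quat holds the four components of c and that the first component is the real part; the bailout value and the function names are also assumptions for illustration, not taken from randall's code.

```c
/* Quaternion with x as the real part and (y, z, w) as the imaginary part
   (component order is an assumption). */
typedef struct { float x, y, z, w; } quat;

/* q*q for a quaternion q = (x, v): (x*x - |v|^2, 2*x*v). */
static quat quat_sq(quat q)
{
    quat r;
    r.x = q.x * q.x - q.y * q.y - q.z * q.z - q.w * q.w;
    r.y = 2.0f * q.x * q.y;
    r.z = 2.0f * q.x * q.z;
    r.w = 2.0f * q.x * q.w;
    return r;
}

static quat quat_add(quat a, quat b)
{
    quat r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Squared length of q. */
static float quat_norm2(quat q)
{
    return q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
}

/* Number of z -> z*z + c steps before |z|^2 exceeds the bailout,
   or max_iter if the point never escapes within max_iter steps. */
int julia_iterations(quat z, quat c, int max_iter)
{
    const float bailout = 16.0f;    /* assumed squared escape radius */

    for (int i = 0; i < max_iter; i++) {
        if (quat_norm2(z) > bailout)
            return i;
        z = quat_add(quat_sq(z), c);
    }
    return max_iter;
}
```

With the values from the post, c would be `(quat){ -0.2f, 0.4f, -0.4f, -0.4f }`. Points whose iteration count reaches `max_iter` are treated as belonging to the set.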
Matrix
Here's a way to measure time accurately using the kernel's high-resolution timers (HRT) on Linux:
Code:
```c
/* Compile this little piece of code with the command:
 *   gcc -Wall -o timertest timertest.c -lrt
 */
#include <stdio.h>
#include <stdint.h>   /* uintXX_t declarations */
#include <time.h>     /* clock_gettime, clock_nanosleep */
#include <unistd.h>   /* usleep */

#define NSEC_PER_SEC 1000000000
#define TIMER_RELTIME 0

/* t1 - t2 in nanoseconds */
static inline long calcdiff_long(struct timespec t1, struct timespec t2)
{
    long diff;
    diff = NSEC_PER_SEC * ((long) t1.tv_sec - (long) t2.tv_sec);
    diff += ((long) t1.tv_nsec - (long) t2.tv_nsec);
    return diff;
}

/* t1 - t2 in seconds */
static inline double calcdiff_double(struct timespec t1, struct timespec t2)
{
    double diff;
    diff = (double)(t1.tv_sec - t2.tv_sec);
    diff += ((double)(t1.tv_nsec - t2.tv_nsec)) / ((double)NSEC_PER_SEC);
    return diff;
}

/* *tout = tin + delta nanoseconds, keeping tv_nsec below one second */
static inline int addtime(struct timespec tin, uint64_t delta, struct timespec *tout)
{
    uint64_t ldelta = delta + (uint64_t)tin.tv_nsec;
    tout->tv_nsec = ldelta % (uint64_t)NSEC_PER_SEC;
    tout->tv_sec = tin.tv_sec + (time_t)(ldelta / (uint64_t)NSEC_PER_SEC);
    return 0;
}

int main(void)
{
    struct timespec past, now, future, zerotime = {0, 0};
    double delay = 0.1;                                 /* seconds */
    uint64_t ndelay = delay * (uint64_t)NSEC_PER_SEC;   /* nanoseconds */
    uint64_t udelay = ndelay / 1000;                    /* microseconds */

    if (clock_getres(CLOCK_MONOTONIC, &now) != 0) {     /* or CLOCK_REALTIME */
        perror("clock_getres");
        return 1;
    }
    printf("Timer resolution: %ld ns\n", now.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC, &past);
    usleep(udelay);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("usleep(+%.9f s) took: %.9f seconds\n", delay, calcdiff_double(now, past));

    clock_gettime(CLOCK_MONOTONIC, &past);
    addtime(zerotime, ndelay, &future);
    /* relative time wait, monotonic is preferred */
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_RELTIME, &future, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("relative nanosleep(+%.9f s) took: %.9f seconds\n", delay, calcdiff_double(now, past));

    clock_gettime(CLOCK_MONOTONIC, &past);
    addtime(past, ndelay, &future);
    /* absolute time wait */
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &future, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("absolute nanosleep(+%.9f s) took: %.9f seconds\n", delay, calcdiff_double(now, past));

    return 0;
}
```
It gives <70 us precision using a realtime-preempted kernel.
GordonK
Nice! Took about 29s in an Ubuntu VM on a hexacore i7. This is single threaded though, right?
randall
GordonK wrote:
Nice! Took about 29s in an Ubuntu VM on a hexacore i7. This is single threaded though, right?

Thanks. Glad you like it. Yes, this is single threaded.
catafest
Error with:
./qjulia
bash: ./qjulia: cannot execute binary file
Also:
ll qjulia
-rwxr-xr-x. .... qjulia
I have an AMD Athlon XP ... Very nice to code for MMX ... I need more docs about this. Thank you. Regards.
randall
catafest wrote:
error with :

Hi. The program requires a 64-bit processor with SSE3 support.
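One way to check this requirement before running the binary is to look at the `flags` line in `/proc/cpuinfo`, where Linux lists 64-bit long mode as `lm` and SSE3 as `pni`. The C sketch below does that; the function names are made up for this example and are not part of qjulia.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` appears in `line` as a whole whitespace-separated word. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/* Returns 1 if the first CPU reports both "lm" (64-bit) and "pni" (SSE3),
   0 if it does not, and -1 if /proc/cpuinfo cannot be read. */
int cpu_can_run_qjulia(void)
{
    char line[8192];
    int ok = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            ok = line_has_flag(line, "lm") && line_has_flag(line, "pni");
            break;
        }
    }
    fclose(f);
    return ok;
}
```

The whole-word match matters here: `ssse3` contains the substring `sse3`, so a plain `strstr` would give false positives for similar flag names. An Athlon XP is a 32-bit CPU, so it would report no `lm` flag.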
catafest
randall wrote:
And 32-bit source code .... I tried to change something. First I used format ELF executable 3, and I got an error on this line:
mov r10d,0x02+0x20 ; flags = MAP_PRIVATE | MAP_ANONYMOUS
I think there is a lot to change (32-bit versus 64-bit registers and mnemonics). Thank you. Regards.
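Some background on the failing line: r10 is one of the registers that exist only in 64-bit mode, and on x86-64 Linux it carries the fourth syscall argument, which for mmap is the flags value. The comment on that line shows the program is requesting private anonymous memory. A rough C equivalent of such a call is sketched below; the function name and the read/write protection are assumptions, since the rest of the original call is not shown in the thread.

```c
#define _DEFAULT_SOURCE     /* exposes MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/* Returns `size` bytes of zero-filled private anonymous memory,
   or NULL if the mapping fails. */
void *alloc_anon(size_t size)
{
    void *p = mmap(NULL, size,
                   PROT_READ | PROT_WRITE,          /* assumed protection */
                   MAP_PRIVATE | MAP_ANONYMOUS,     /* 0x02 + 0x20 on x86 Linux */
                   -1, 0);                          /* no file behind the mapping */
    return p == MAP_FAILED ? NULL : p;
}
```

A 32-bit port would have to pass the same flags through the 32-bit syscall convention, which uses different registers and different syscall numbers, so changing only the `format` line is not enough.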
keantoken
This is cool!
Here's what I got on my AMD FX-8350:
Code:
```
$ time ./qjulia -v
real 0m32.919s
user 0m32.648s
sys 0m0.065s
```
HaHaAnonymous
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:22; edited 1 time in total
macgub
Some time ago I ported this piece of art to KolibriOS and MenuetOS64.
http://macgub.co.pl/menuet/qjulia.zip -> code and binaries for MeOS64 and KolibriOS.
http://macgub.co.pl/menuet/qjulia_big.jpg -> screenshot.
Last edited by macgub on 15 Feb 2022, 17:14; edited 2 times in total
randall
Thanks for your comments. I am very glad you like it.
keantoken
I was thinking: there is an instruction, dpps, that I think could make this program much quicker.
randall
Yes, but it requires SSE4.1 support. I have an old Core2 Duo at home, so only SSSE3.
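For reference, dpps computes a dot product of packed single-precision floats in one instruction and is exposed in C as the `_mm_dp_ps` intrinsic. The sketch below shows a four-element dot product that uses it when the CPU reports SSE4.1 and falls back to scalar code otherwise, which is one way to keep older machines such as a Core2 Duo working. The function names are illustrative, the code assumes GCC or Clang, and whether dpps would actually speed up this program would have to be measured.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>      /* SSE4.1 intrinsics, including _mm_dp_ps */
#define DOT4_HAVE_X86 1
#endif

/* Plain scalar dot product; works on any CPU. */
float dot4_scalar(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#ifdef DOT4_HAVE_X86
/* Compiled for SSE4.1 regardless of the global compiler flags. */
__attribute__((target("sse4.1")))
static float dot4_dpps(const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* Mask 0xF1: multiply all four lanes (high nibble 0xF),
       write the sum to lane 0 only (low nibble 0x1). */
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}
#endif

/* Picks the dpps version at run time when the CPU supports SSE4.1. */
float dot4(const float a[4], const float b[4])
{
#ifdef DOT4_HAVE_X86
    if (__builtin_cpu_supports("sse4.1"))
        return dot4_dpps(a, b);
#endif
    return dot4_scalar(a, b);
}
```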
HaHaAnonymous
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:21; edited 1 time in total
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.