flat assembler
Message board for the users of flat assembler.
Linux > Mandelbrot renderer
randall 20 Mar 2012, 15:55
I have uploaded a new version. It has a faster compare and faster sign changing (was mulps, now it is just xorps).
real 0m1.764s
user 0m1.750s
sys 0m0.010s
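Editor's sketch of why xorps can replace mulps for a sign change: an IEEE-754 float keeps its sign in bit 31, so XOR-ing with 0x80000000 negates the value without a multiply. This is a scalar C analogue, not randall's code. The function name `flip_sign` is made up for illustration, and the exact mask stored in his sign constant is not shown in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Negate a float by toggling its IEEE-754 sign bit (bit 31).
   Scalar stand-in for xorps with a 0x80000000 lane mask;
   flip_sign is a hypothetical name used only in this sketch. */
float flip_sign(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* read the raw bit pattern safely */
    bits ^= 0x80000000u;             /* toggle only the sign bit */
    memcpy(&x, &bits, sizeof x);     /* write the pattern back as a float */
    return x;
}
```

With a packed mask the same trick can negate only selected lanes, by putting 0x80000000 in those lanes and 0 in the others.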
Madis731 21 Mar 2012, 06:45
How much is the compiler faster? Twice as fast or 10% faster? Depending on that, we can start to search for reasons.
randall 21 Mar 2012, 09:13
Madis731 wrote: How much is the compiler faster? Twice as fast or 10% faster? Depending on that, we can start to search for reasons.
Now the results are (Core2 Duo 6300 @ 1.86 GHz):
ASM version:
real 0m1.764s
user 0m1.830s
sys 0m0.010s
C++ version (with -O3 flag):
real 0m1.133s
user 0m1.120s
sys 0m0.010s
tthsqe 22 Mar 2012, 17:21
randall, thanks for the info on the gpu.
I feel I should help you a little with your CPU version. First I should mention http://board.flatassembler.net/topic.php?t=12722 where Kuemmel and I have created a super fast Mandelbrot renderer for Windows. For example, the image shown on that page is 1080x1920 and took just 0.05 sec. If you want to increase the speed you have three independent options:
- multithread (use multiple cores)
- vectorize (use mulps instead of mulss)
- parallelize (unroll loops and handle multiple points per loop)
The downside is that this is going to increase the complexity of your code...
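Editor's sketch of the "multiple points per loop" idea, written in portable C rather than fasm. The inner lane loop stands in for one packed instruction working on four points, and a hand-written SSE version would likely replace the per-lane branch with a compare mask. The names `mandel4` and `LANES`, and the escape radius of 2, are assumptions made for this example and do not come from either renderer in the thread.

```c
#define LANES 4  /* four floats fit in one SSE register */

/* Iterate z = z*z + c for LANES points at once.
   iters[i] receives the number of steps point i took before |z|^2 > 4,
   or max_iter if it never escaped. */
void mandel4(const float cre[LANES], const float cim[LANES],
             int max_iter, int iters[LANES])
{
    float zre[LANES] = {0}, zim[LANES] = {0};
    for (int i = 0; i < LANES; i++)
        iters[i] = 0;

    for (int n = 0; n < max_iter; n++) {
        int active = 0;                        /* lanes still iterating */
        for (int i = 0; i < LANES; i++) {
            float re2 = zre[i] * zre[i];
            float im2 = zim[i] * zim[i];
            if (re2 + im2 > 4.0f)
                continue;                      /* this lane has escaped */
            zim[i] = 2.0f * zre[i] * zim[i] + cim[i];
            zre[i] = re2 - im2 + cre[i];
            iters[i]++;
            active++;
        }
        if (!active)
            break;                             /* all lanes escaped early */
    }
}
```

The loop only exits early once every lane has escaped, so grouping points that escape at similar iteration counts (for example, neighbouring pixels) keeps wasted work low.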
tthsqe 22 Mar 2012, 17:26
For example, this looks esp. slow:
Code:
; dz = 2.0 * z * dz + (1.0,0.0)
movaps xmm0,xmm14
movaps xmm1,xmm15
shufps xmm0,xmm0,01000100b
shufps xmm1,xmm1,00010100b
mulps xmm0,xmm1
xorps xmm0,dqword [g_inv_y_sign]
movaps xmm1,xmm0
shufps xmm0,xmm0,00001000b
shufps xmm1,xmm1,00001101b
addps xmm0,xmm1
addps xmm0,xmm0
addss xmm0,[g_1_0]
movaps xmm15,xmm0
; z = z * z + c
movaps xmm0,xmm14
movaps xmm1,xmm0
shufps xmm0,xmm0,00000100b
shufps xmm1,xmm1,01010100b
mulps xmm0,xmm1
xorps xmm0,dqword [g_inv_y_sign]
movaps xmm1,xmm0
shufps xmm0,xmm0,00001000b
shufps xmm1,xmm1,00001101b
addps xmm0,xmm1
addps xmm0,xmm13
movaps xmm14,xmm0
I would NOT store the real and imaginary parts in the same vector, as then a shufps is needed to move things around.
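Editor's sketch of the layout suggested above: real and imaginary parts kept in separate arrays, one array per would-be register, so the same two updates need only multiplies, adds and subtracts across lanes and no shuffles. This is portable C standing in for packed SSE code. The names `step4` and `LANES` are hypothetical, and the sketch is not taken from either author's renderer.

```c
#define LANES 4  /* one SSE register holds four floats */

/* One iteration for LANES independent points, with real and imaginary
   parts in separate arrays:
     dz = 2*z*dz + (1,0)   (uses the old z)
     z  = z*z + c                                   */
void step4(float zre[LANES], float zim[LANES],
           float dre[LANES], float dim[LANES],
           const float cre[LANES], const float cim[LANES])
{
    for (int i = 0; i < LANES; i++) {
        float zr = zre[i], zi = zim[i];   /* old z */
        float dr = dre[i], di = dim[i];   /* old dz */

        /* complex product z*dz, doubled, plus 1 on the real part */
        dre[i] = 2.0f * (zr * dr - zi * di) + 1.0f;
        dim[i] = 2.0f * (zr * di + zi * dr);

        /* complex square z*z plus c */
        zre[i] = zr * zr - zi * zi + cre[i];
        zim[i] = 2.0f * zr * zi + cim[i];
    }
}
```

Every statement in the loop body applies the same operation to all lanes, which is the shape that maps directly onto mulps, addps and subps with four points per register.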
randall 22 Mar 2012, 19:43
tthsqe wrote: randall, thanks for the info on the gpu.
Impressive work. And thanks for the tips.
Copyright © 1999-2025, Tomasz Grysztar.