flat assembler
Message board for the users of flat assembler.
Perlin noise, the making of
Madis731 31 Jul 2008, 22:26
Let's start off easy: this sample is a Java implementation of what I call "Perlin in a square" because of its formatting. It has some weird stuff that I don't understand, but I get the gist.
The problem is that there just isn't a good way to solve it in ASM, or maybe in any language. It mixes integer and float arithmetic, and it uses a LUT that you can't SIMD properly. It really is fast (compared to previous versions), but all this code is for ONE PIXEL! Okay, so what else can we find on the Internet (Google...)? There's a FAQ that describes somewhat what's behind it. This guy is in the middle of making a really impressive game (billions of purely procedurally generated planets), and he has optimized it a bit (though SSE4.1 has rounding instructions that make the conversions a bit easier). It wasn't until this code that I started to ponder an SSE implementation (more functions here).

To get at the true nature of this noise, here is one sentence I came up with: pick some pseudo-random values, space them evenly on a pixel grid, and interpolate the other pixels using a weighted average. For the sake of clarity, let's deal only with 2D for now! Let's meet the words:
1) random - we're playing with a range of 0..255 (black to white)
2) evenly - one random pixel every 2^n pixels (where n < N), for a generated image of 2^N x 2^N pixels
3) weighted - if the interpolation were linear, the image would look like pyramids, so we'll use what the FAQ calls the "ease curve"

I'm not very good at C, but the tests I made with Intel's otherwise perfect vectorizing compiler concluded that Perlin's noise function is too big a nut to crack. So I figure I need to do it manually. Here's Intel's version of the hand-made intrinsics code (robot-generated). It's all inlined because you can't put SSE in a function (i.e. it's really hard). Ugly, heh?
Code:
;438 lines - prologue and epilogue stripped
movaps xmm2, XMMWORD PTR _2il0floatpacket$3(rip)
movaps xmm1, xmm0
subps xmm1, XMMWORD PTR _2il0floatpacket$1(rip)
movdqa xmm6, XMMWORD PTR [rsp+192]
movaps xmm11, XMMWORD PTR [rsp+304]
movaps xmm8, XMMWORD PTR [rsp+256]
cvtps2dq xmm3, xmm1
movdqa xmm13, XMMWORD PTR _2il0floatpacket$8(rip)
cvtdq2ps xmm1, xmm3
pand xmm3, XMMWORD PTR _2il0floatpacket$2(rip)
paddd xmm6, xmm3
paddd xmm3, XMMWORD PTR [rsp+272]
movaps xmm9, XMMWORD PTR _2il0floatpacket$9(rip)
subps xmm0, xmm1
movaps xmm12, XMMWORD PTR _2il0floatpacket$10(rip)
movaps xmm10, XMMWORD PTR _2il0floatpacket$12(rip)
mulps xmm2, xmm0
subps xmm2, XMMWORD PTR _2il0floatpacket$4(rip)
movaps XMMWORD PTR [rsp+336], xmm0
movaps xmm1, xmm0
mulps xmm1, xmm0
mulps xmm2, xmm0
addps xmm2, XMMWORD PTR _2il0floatpacket$5(rip)
mulps xmm1, xmm0
mulps xmm2, xmm1
movdqa xmm1, XMMWORD PTR [rsp+240]
movdqa xmm5, xmm1
movaps XMMWORD PTR [rsp+352], xmm2
paddd xmm5, xmm6
paddd xmm6, xmm14
movdqa xmm4, xmm1
paddd xmm6, xmm1
paddd xmm4, xmm3
paddd xmm3, xmm14
paddd xmm3, xmm1
movdqa XMMWORD PTR [rsp+368], xmm3
movaps xmm1, xmm0
subps xmm1, XMMWORD PTR _2il0floatpacket$7(rip)
movaps XMMWORD PTR [rsp+384], xmm1
movdqa xmm1, xmm14
paddd xmm1, xmm5
pand xmm5, xmm13
cvtdq2ps xmm7, xmm5
movaps xmm2, xmm7
movdqa XMMWORD PTR [rsp+400], xmm1
movdqa xmm1, xmm14
paddd xmm1, xmm4
cmpltps xmm2, xmm9
pand xmm4, xmm13
movdqa XMMWORD PTR [rsp+416], xmm1
movdqa xmm1, xmm14
paddd xmm1, xmm6
pand xmm6, xmm13
movdqa XMMWORD PTR [rsp+432], xmm1
movdqa xmm1, xmm14
paddd xmm1, xmm3
movaps xmm3, xmm7
cmpltps xmm3, xmm12
movaps xmm15, xmm3
movdqa XMMWORD PTR [rsp+448], xmm1
movaps xmm1, xmm11
andps xmm1, xmm2
andnps xmm2, xmm0
orps xmm1, xmm2
movaps xmm2, XMMWORD PTR _2il0floatpacket$11(rip)
andps xmm15, xmm0
cmpeqps xmm2, xmm7
cmpeqps xmm7, xmm10
orps xmm2, xmm7
movaps xmm7, xmm11
andps xmm7, xmm2
andnps xmm2, xmm8
orps xmm7, xmm2
movdqa xmm2, XMMWORD PTR _2il0floatpacket$13(rip)
andnps xmm3, xmm7
orps xmm15, xmm3
pand xmm2, xmm5
cvtdq2ps xmm3, xmm2
pand xmm5, xmm14
cvtdq2ps xmm2, xmm5
xorps xmm5, xmm5
xorps xmm7, xmm7
cmpeqps xmm3, xmm5
movaps xmm5, xmm1
andps xmm5, xmm3
subps xmm7, xmm1
andnps xmm3, xmm7
movaps xmm7, XMMWORD PTR [rsp+320]
orps xmm5, xmm3
xorps xmm1, xmm1
cmpeqps xmm2, xmm1
movaps xmm3, xmm15
andps xmm3, xmm2
xorps xmm1, xmm1
subps xmm1, xmm15
andnps xmm2, xmm1
orps xmm3, xmm2
addps xmm5, xmm3
movaps xmm3, xmm7
cvtdq2ps xmm1, xmm4
movaps xmm15, xmm1
movaps xmm2, xmm1
cmpltps xmm15, xmm9
andps xmm3, xmm15
andnps xmm15, xmm0
orps xmm3, xmm15
cmpltps xmm2, xmm12
movaps xmm15, xmm2
andps xmm15, xmm0
movaps xmm0, XMMWORD PTR _2il0floatpacket$11(rip)
cmpeqps xmm0, xmm1
cmpeqps xmm1, xmm10
orps xmm0, xmm1
movaps xmm1, xmm7
andps xmm1, xmm0
andnps xmm0, xmm8
orps xmm1, xmm0
movdqa xmm0, XMMWORD PTR _2il0floatpacket$13(rip)
andnps xmm2, xmm1
orps xmm15, xmm2
pand xmm0, xmm4
cvtdq2ps xmm2, xmm0
pand xmm4, xmm14
cvtdq2ps xmm1, xmm4
xorps xmm0, xmm0
xorps xmm4, xmm4
cmpeqps xmm2, xmm0
movaps xmm0, xmm3
andps xmm0, xmm2
subps xmm4, xmm3
andnps xmm2, xmm4
orps xmm0, xmm2
xorps xmm2, xmm2
cmpeqps xmm1, xmm2
movaps xmm3, xmm15
andps xmm3, xmm1
xorps xmm2, xmm2
subps xmm2, xmm15
andnps xmm1, xmm2
movaps xmm2, XMMWORD PTR [rsp+384]
orps xmm3, xmm1
movaps xmm4, xmm11
movaps xmm1, XMMWORD PTR _2il0floatpacket$11(rip)
addps xmm0, xmm3
movaps xmm3, XMMWORD PTR [rsp+288]
subps xmm0, xmm5
mulps xmm0, xmm3
addps xmm5, xmm0
movaps XMMWORD PTR [rsp+464], xmm5
cvtdq2ps xmm5, xmm6
movaps xmm0, xmm5
movaps xmm15, xmm5
cmpltps xmm0, xmm9
andps xmm4, xmm0
andnps xmm0, xmm2
orps xmm4, xmm0
cmpltps xmm15, xmm12
movaps xmm0, xmm15
andps xmm0, xmm2
cmpeqps xmm1, xmm5
cmpeqps xmm5, xmm10
orps xmm1, xmm5
movaps xmm5, xmm11
andps xmm5, xmm1
andnps xmm1, xmm8
orps xmm5, xmm1
movdqa xmm1, XMMWORD PTR _2il0floatpacket$13(rip)
andnps xmm15, xmm5
orps xmm0, xmm15
pand xmm1, xmm6
cvtdq2ps xmm15, xmm1
pand xmm6, xmm14
cvtdq2ps xmm5, xmm6
xorps xmm1, xmm1
xorps xmm6, xmm6
cmpeqps xmm15, xmm1
movaps xmm1, xmm4
andps xmm1, xmm15
subps xmm6, xmm4
andnps xmm15, xmm6
orps xmm1, xmm15
xorps xmm4, xmm4
cmpeqps xmm5, xmm4
movaps xmm6, xmm0
andps xmm6, xmm5
xorps xmm4, xmm4
subps xmm4, xmm0
andnps xmm5, xmm4
orps xmm6, xmm5
movdqa xmm5, XMMWORD PTR [rsp+368]
pand xmm5, xmm13
addps xmm1, xmm6
movaps xmm6, xmm7
cvtdq2ps xmm0, xmm5
movaps xmm15, xmm0
movaps xmm4, xmm0
cmpltps xmm15, xmm9
andps xmm6, xmm15
andnps xmm15, xmm2
orps xmm6, xmm15
cmpltps xmm4, xmm12
movaps xmm15, xmm4
andps xmm15, xmm2
movaps xmm2, XMMWORD PTR _2il0floatpacket$11(rip)
cmpeqps xmm2, xmm0
cmpeqps xmm0, xmm10
orps xmm2, xmm0
movaps xmm0, xmm7
andps xmm0, xmm2
andnps xmm2, xmm8
orps xmm0, xmm2
andnps xmm4, xmm0
movdqa xmm0, XMMWORD PTR _2il0floatpacket$13(rip)
orps xmm15, xmm4
pand xmm0, xmm5
cvtdq2ps xmm4, xmm0
pand xmm5, xmm14
cvtdq2ps xmm2, xmm5
xorps xmm0, xmm0
xorps xmm5, xmm5
cmpeqps xmm4, xmm0
movaps xmm0, xmm6
andps xmm0, xmm4
subps xmm5, xmm6
andnps xmm4, xmm5
orps xmm0, xmm4
xorps xmm4, xmm4
cmpeqps xmm2, xmm4
movaps xmm5, xmm15
andps xmm5, xmm2
xorps xmm4, xmm4
subps xmm4, xmm15
movdqa xmm15, XMMWORD PTR [rsp+400]
andnps xmm2, xmm4
movaps xmm4, XMMWORD PTR [rsp+336]
orps xmm5, xmm2
addps xmm0, xmm5
pand xmm15, xmm13
movaps xmm2, xmm11
movaps xmm5, XMMWORD PTR _2il0floatpacket$11(rip)
subps xmm0, xmm1
mulps xmm0, xmm3
cvtdq2ps xmm8, xmm15
movaps xmm6, xmm8
addps xmm1, xmm0
movaps xmm0, XMMWORD PTR [rsp+464]
subps xmm1, xmm0
mulps xmm1, XMMWORD PTR [rsp+352]
cmpltps xmm6, xmm12
cmpeqps xmm5, xmm8
addps xmm0, xmm1
movaps XMMWORD PTR [rsp+464], xmm0
movaps xmm0, xmm8
movaps xmm1, xmm6
andps xmm1, xmm4
cmpltps xmm0, xmm9
andps xmm2, xmm0
andnps xmm0, xmm4
orps xmm2, xmm0
movaps xmm0, XMMWORD PTR [rsp+208]
cmpeqps xmm8, xmm10
orps xmm5, xmm8
movaps xmm8, xmm11
andps xmm8, xmm5
andnps xmm5, xmm0
orps xmm8, xmm5
movdqa xmm5, XMMWORD PTR _2il0floatpacket$13(rip)
andnps xmm6, xmm8
orps xmm1, xmm6
pand xmm5, xmm15
cvtdq2ps xmm8, xmm5
pand xmm15, xmm14
cvtdq2ps xmm6, xmm15
xorps xmm5, xmm5
xorps xmm15, xmm15
cmpeqps xmm8, xmm5
movaps xmm5, xmm2
andps xmm5, xmm8
subps xmm15, xmm2
andnps xmm8, xmm15
orps xmm5, xmm8
xorps xmm2, xmm2
cmpeqps xmm6, xmm2
movaps xmm8, xmm1
andps xmm8, xmm6
xorps xmm2, xmm2
subps xmm2, xmm1
movdqa xmm1, XMMWORD PTR [rsp+416]
andnps xmm6, xmm2
orps xmm8, xmm6
pand xmm1, xmm13
addps xmm5, xmm8
movaps xmm2, xmm7
cvtdq2ps xmm15, xmm1
movaps xmm6, xmm15
movaps xmm8, xmm15
cmpltps xmm6, xmm9
andps xmm2, xmm6
andnps xmm6, xmm4
orps xmm2, xmm6
movaps xmm6, XMMWORD PTR _2il0floatpacket$11(rip)
cmpltps xmm8, xmm12
andps xmm4, xmm8
cmpeqps xmm6, xmm15
cmpeqps xmm15, xmm10
orps xmm6, xmm15
movaps xmm15, xmm7
andps xmm15, xmm6
andnps xmm6, xmm0
orps xmm15, xmm6
movdqa xmm6, XMMWORD PTR _2il0floatpacket$13(rip)
andnps xmm8, xmm15
orps xmm4, xmm8
pand xmm6, xmm1
cvtdq2ps xmm8, xmm6
pand xmm1, xmm14
cvtdq2ps xmm6, xmm1
xorps xmm1, xmm1
xorps xmm15, xmm15
cmpeqps xmm8, xmm1
movaps xmm1, xmm2
andps xmm1, xmm8
subps xmm15, xmm2
andnps xmm8, xmm15
orps xmm1, xmm8
xorps xmm2, xmm2
cmpeqps xmm6, xmm2
movaps xmm8, xmm4
andps xmm8, xmm6
xorps xmm2, xmm2
subps xmm2, xmm4
andnps xmm6, xmm2
orps xmm8, xmm6
movaps xmm6, XMMWORD PTR _2il0floatpacket$11(rip)
movaps xmm4, xmm11
addps xmm1, xmm8
subps xmm1, xmm5
mulps xmm1, xmm3
addps xmm5, xmm1
movdqa xmm1, XMMWORD PTR [rsp+432]
movaps XMMWORD PTR [rsp+400], xmm5
pand xmm1, xmm13
movaps xmm5, XMMWORD PTR [rsp+384]
cvtdq2ps xmm8, xmm1
movaps xmm2, xmm8
movaps xmm15, xmm8
cmpltps xmm2, xmm9
andps xmm4, xmm2
andnps xmm2, xmm5
orps xmm4, xmm2
cmpltps xmm15, xmm12
movaps xmm2, xmm15
andps xmm2, xmm5
cmpeqps xmm6, xmm8
cmpeqps xmm8, xmm10
orps xmm6, xmm8
andps xmm11, xmm6
andnps xmm6, xmm0
orps xmm11, xmm6
movdqa xmm6, XMMWORD PTR _2il0floatpacket$13(rip)
andnps xmm15, xmm11
orps xmm2, xmm15
pand xmm6, xmm1
cvtdq2ps xmm6, xmm6
pand xmm1, xmm14
xorps xmm8, xmm8
cvtdq2ps xmm1, xmm1
xorps xmm11, xmm11
cmpeqps xmm6, xmm8
movaps xmm8, xmm4
andps xmm8, xmm6
subps xmm11, xmm4
andnps xmm6, xmm11
orps xmm8, xmm6
xorps xmm4, xmm4
cmpeqps xmm1, xmm4
movaps xmm6, xmm2
andps xmm6, xmm1
xorps xmm4, xmm4
subps xmm4, xmm2
movdqa xmm2, XMMWORD PTR [rsp+448]
andnps xmm1, xmm4
orps xmm6, xmm1
pand xmm2, xmm13
addps xmm8, xmm6
movaps xmm1, xmm7
cvtdq2ps xmm6, xmm2
movaps xmm4, xmm6
cmpltps xmm4, xmm9
movaps xmm9, xmm6
andps xmm1, xmm4
andnps xmm4, xmm5
cmpltps xmm9, xmm12
orps xmm1, xmm4
movaps xmm4, XMMWORD PTR _2il0floatpacket$11(rip)
andps xmm5, xmm9
cmpeqps xmm4, xmm6
cmpeqps xmm6, xmm10
orps xmm4, xmm6
andps xmm7, xmm4
andnps xmm4, xmm0
movdqa xmm0, XMMWORD PTR _2il0floatpacket$13(rip)
orps xmm7, xmm4
andnps xmm9, xmm7
orps xmm5, xmm9
pand xmm0, xmm2
cvtdq2ps xmm4, xmm0
pand xmm2, xmm14
xorps xmm0, xmm0
cvtdq2ps xmm2, xmm2
movaps xmm6, xmm1
cmpeqps xmm4, xmm0
andps xmm6, xmm4
xorps xmm0, xmm0
subps xmm0, xmm1
andnps xmm4, xmm0
orps xmm6, xmm4
xorps xmm0, xmm0
cmpeqps xmm2, xmm0
movaps xmm1, xmm5
andps xmm1, xmm2
xorps xmm0, xmm0
subps xmm0, xmm5
movaps xmm5, XMMWORD PTR [rsp+464]
andnps xmm2, xmm0
movaps xmm0, XMMWORD PTR [rsp+400]
orps xmm1, xmm2
addps xmm6, xmm1
subps xmm6, xmm8
mulps xmm6, xmm3
addps xmm8, xmm6
subps xmm8, xmm0
mulps xmm8, XMMWORD PTR [rsp+352]
addps xmm0, xmm8
subps xmm0, xmm5
mulps xmm0, XMMWORD PTR [rsp+224]
addps xmm5, xmm0
divps xmm5, XMMWORD PTR _2il0floatpacket$14(rip)

So, has anyone got ideas or half-baked code to start from? It might even be in C. Can anyone tell me why this is different, or is it?
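As a possible starting point, here is a minimal scalar C sketch of the one-sentence recipe from the first post: random values (0..255) placed on a grid every 2^n pixels, with the pixels in between blended by a weighted average. Note that this is plain value noise (interpolated random values), not Perlin's gradient noise. The names `lattice_value`, `ease` and `value_noise_octave` are hypothetical, the lattice hash is made up for illustration, and the cubic 3t^2 - 2t^3 is assumed to be the FAQ's "ease curve".

```c
#include <assert.h>
#include <stdint.h>

/* Made-up lattice hash (hypothetical): mixes the grid coordinates into
   a pseudo-random byte, i.e. the "random" 0..255 value of the recipe. */
static uint8_t lattice_value(uint32_t ix, uint32_t iy) {
    uint32_t h = ix * 73856093u ^ iy * 19349663u;
    h = (h << 13) ^ h;
    h = h * (h * h * 15731u + 789221u) + 1376312589u;
    return (uint8_t)(h >> 16);
}

/* Cubic ease curve 3t^2 - 2t^3: 0 at t = 0, 1 at t = 1, zero slope at both ends. */
static float ease(float t) {
    return t * t * (3.0f - 2.0f * t);
}

/* Weighted average: w = 0 gives a, w = 1 gives b. */
static float lerp(float a, float b, float w) {
    return a + (b - a) * w;
}

/* One octave at pixel (x, y): random values sit on a grid every 2^n
   pixels ("evenly"); the pixels in between are a weighted average of
   the four surrounding grid values ("weighted"). Result is in 0..255. */
static float value_noise_octave(uint32_t x, uint32_t y, unsigned n) {
    assert(n < 32);
    uint32_t step = 1u << n;
    uint32_t ix = x >> n, iy = y >> n;                  /* grid cell */
    float fx = (float)(x & (step - 1u)) / (float)step;  /* 0..1 inside the cell */
    float fy = (float)(y & (step - 1u)) / (float)step;
    float wx = ease(fx), wy = ease(fy);
    float top = lerp(lattice_value(ix, iy), lattice_value(ix + 1u, iy), wx);
    float bottom = lerp(lattice_value(ix, iy + 1u), lattice_value(ix + 1u, iy + 1u), wx);
    return lerp(top, bottom, wy);
}
```

On grid points the result is exactly the random value stored there; in between, the ease curve replaces the "pyramids" a linear blend would give. Summing `value_noise_octave` over several n and dividing by the total weight gives the multi-octave image.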
tom tobias 01 Aug 2008, 09:00
Thank you, Madis, for teaching us about Perlin noise, a complete novelty for me, at least. I was impressed with the last link's list of potential applications in 1, 2, 3 and 4 dimensions. I don't know if one could say the code above is "ugly", but it is certainly unintuitive. As for your question about whether Hugo Elias' applications of Perlin noise are distinct from Perlin's own, why not send them both an inquiry? Since Perlin teaches at NYU, and Elias' web page has not been updated for the past five years, it could well be that these guys won't be offended by your inquiry; they may even be willing to share their code or offer some assistance on one of your many interesting projects...
Madis731 01 Aug 2008, 10:35
tom tobias wrote: send them both an inquiry?

Yeah, I was planning on doing exactly that, but I thought people on this board would be less offended and might come up with an asm implementation of the code quicker. I'm just fishing here for an easy way out, but if there really aren't any helpful leads soon, I will contact them about this optimization challenge. Actually, I'm quite amazed that there is so little info about this great algorithm. Its variants are used here and here, and look at the astonishing images a few bytes are capable of producing here. Perlin is behind all of these!
Madis731 15 Aug 2008, 07:27
About octaves, I've calculated that the maximum worth using is:
Octaves = log2(image side in pixels)
so 12 octaves would be overkill even for a 4096x4096 px image, since at that count the finest octave already places a new random value on every pixel. I've used a 256x256 image and 6 or 7 octaves. Okay, but at least we're getting somewhere. With floats (as Perlin himself used) you have all the precision you need, and I think that much is not always needed. When you go to integers, you start experiencing rounding errors (unless you are really careful). Fixed point would be the way to go, and then you can even change the precision on the fly without paying the penalty of SSE float-to-int and int-to-float conversions. Ideally I'd like to see the same number format used all through the algorithm. If that's not possible, then I have to hide all the conversion latency between faster, independent instructions.
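A minimal sketch of the "same number format all through" idea, assuming a Q8 fixed-point format (256 represents 1.0) chosen only for illustration: the octave limit from the formula above, the ease curve 3t^2 - 2t^3, and the weighted average all stay in integers, so no float/int conversions are needed. `max_octaves`, `ease_q8` and `lerp_q8` are hypothetical helpers, not code from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Q8 fixed point: 256 represents 1.0 */
#define ONE_Q8 256u

/* Octaves = log2(side) for a power-of-two side length. */
static unsigned max_octaves(uint32_t side) {
    assert(side != 0 && (side & (side - 1u)) == 0);  /* power of two */
    unsigned n = 0;
    while (side > 1u) { side >>= 1; n++; }
    return n;
}

/* Ease curve 3t^2 - 2t^3 in integers: t is Q8 (0..256), result is Q8.
   3*t*t*256 - 2*t*t*t is scaled by 256^3, so >> 16 brings it back to Q8;
   the largest intermediate (t = 256) still fits in 32 bits. */
static uint32_t ease_q8(uint32_t t) {
    assert(t <= ONE_Q8);
    return (3u * t * t * ONE_Q8 - 2u * t * t * t) >> 16;
}

/* Weighted average of two 0..255 samples with a Q8 weight w (0..256).
   Both terms are non-negative, so no signed shifts are involved. */
static uint32_t lerp_q8(uint32_t a, uint32_t b, uint32_t w) {
    return (a * (ONE_Q8 - w) + b * w) >> 8;
}
```

Both shifts truncate rather than round, which is exactly the kind of rounding error mentioned above; adding 128 before the final `>> 8` in `lerp_q8` would round to nearest instead.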
Madis731 15 Aug 2008, 11:25
Okay, something solid for you. Too much theory has been spoken already :S
Code:
format binary as "BMP"

;BMP header
db 0x42, 0x4D, 0x36, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x36, 0x00
db 0x00, 0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01
db 0x00, 0x00, 0x01, 0x00, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
db 0x03, 0x00, 0x13, 0x0B, 0x00, 0x00, 0x13, 0x0B, 0x00, 0x00, 0x00, 0x00
db 0x00, 0x00, 0x00, 0x00, 0x00, 0x00

B32 equ and 0FFFFFFFFh ;clip to 32-bit space!!! FASM uses 64-bit

macro RND a ;integer hash, same constants as Hugo Elias' noise function
{
  x=(a shl 13 xor a) B32
  x=((((x*x) B32)*15731) B32+789221)*x+1376312589
  a=(x shr 16) and 255
}

repeat 256*256
  z=0
  rept 7 octave ;FASM doesn't allow 7 to be var instead
  {
    ;clear the low 'octave' bits of the pixel counter's low byte (x) and
    ;high byte (y), so a whole 2^octave x 2^octave block shares one value
    a=% and (not ((1 shl octave -1)*0101h))
    RND a
    z=z+a
  }
  a=z/7
  db a,a,a ;R=G=B
end repeat

The above code is a sample I'm playing with. To change the number of octaves, you need to manually change the 7 in both places where it appears (the rept count and the divisor).
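For anyone who prefers C, here is a sketch of what the FASM sample computes per pixel, assuming `%` is the 1-based counter of the outer `repeat`. Only bits 16..23 of the hash are kept, and those depend only on the low 32 bits of the arithmetic, so plain 32-bit unsigned wraparound should reproduce the macro's byte. `rnd` and `blocky_pixel` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* C version of the RND macro. uint32_t wraparound plays the role of the
   B32 masks; bits 16..23 of the result are the same as in FASM's wider
   arithmetic because they only depend on the low 32 bits. */
static uint8_t rnd(uint32_t a) {
    uint32_t x = (a << 13) ^ a;
    x = (x * x * 15731u + 789221u) * x + 1376312589u;
    return (uint8_t)((x >> 16) & 255u);
}

/* One pixel of the blocky generator: i is the pixel counter (x in the
   low byte, y in the high byte). Octave k clears the low k bits of both
   bytes, so every 2^k x 2^k block gets the same random value. */
static uint8_t blocky_pixel(uint32_t i, unsigned octaves) {
    assert(octaves >= 1 && octaves <= 8);
    uint32_t z = 0;
    for (unsigned k = 1; k <= octaves; k++) {
        uint32_t mask = ((1u << k) - 1u) * 0x0101u;
        z += rnd(i & ~mask);
    }
    return (uint8_t)(z / octaves);  /* average of the octaves, 0..255 */
}
```

There is no interpolation between block values here, which is why the output is blocky.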
|
bitRAKE 15 Aug 2008, 15:08
A little less blocky:
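Code:
repeat 256*256
  z=0
  rept 7 octave
  {
    a=% and (not ((1 shl octave -1)*0101h))
    RND a
    a=a shr ((octave-1)/3) ;octaves 1-3 full weight, 4-6 halved, 7 quartered
    z=z+a
  }
  a=z/5 ;weights sum to 3 + 3/2 + 1/4 = 4.75, so z/5 stays within 0..255
  db a,a,a ;R=G=B
end repeat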
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Madis731 15 Aug 2008, 18:11
Actually, the blocks should be removed with interpolation, which I haven't done yet in the FASM-macro-assisted generator.
If my measurements are correct, this less blocky code adds ~20% to the time, but since the whole run only takes about 0.1 seconds, I can't measure it accurately. I will make some more tests and smooth the output. PS. Sorry for the long delays on this topic. I meant to research this theme a lot, but work and vacation got in the way. I will get back to this...
tom tobias 15 Aug 2008, 22:14
Madis wrote: I will get back to this...
bitRAKE 17 Sep 2008, 19:02
I had an idea for some Perlin-type noise, but it is not so good.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.