flat assembler
Message board for the users of flat assembler.

Perlin noise, the making of

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Let's start off easy: this sample is a Java implementation of what I call "Perlin in a square" 'cuz of its formatting. It has some weird stuff that I don't understand, but I get the gist.

The problem is that there just isn't a good way to solve it in ASM, or maybe in any language. It mixes integer and float math with a LUT that you can't SIMD properly. Well, it really is fast (compared to previous versions), but all this code is for ONE PIXEL!

Okay, so what else can we find on the Internet (Google...)? There's a FAQ that goes some way toward describing what's behind it. This guy is in the middle of making a really impressive game (billions of purely generated planets) and he has optimized it a bit (though SSE4.1 has got rounding that makes the conversions a bit easier).

It wasn't until this code that I started to ponder an SSE implementation (more functions here).

To get at the true nature of this noise, there's one sentence that I came up with:
Pick some pseudo-random values, space them evenly on a pixel grid and interpolate the other pixels using a weighted average.
For the sake of clarity, let's deal only with 2D for now!
Let's go through the words:
1) random - we're playing with a range of 0..255 (black to white)
2) evenly - one random pixel every 2^n pixels (where n<N), for a generated image of 2^N*2^N pixels
3) weighted - if it were linear, the image would look like pyramids, so we'll use what the FAQ calls the "ease curve".
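
The sentence and the three words above can be sketched in a few lines. This is only a minimal sketch in Python (chosen because it is easy to run; the structure carries over to C or asm). It assumes the classic 3t^2 - 2t^3 curve stands in for the "ease curve" and uses Python's `random` module as the pseudo-random source; `ease`, `lerp` and `value_noise_layer` are made-up names. Interpolating random values like this is usually called value noise, whereas Perlin's own function interpolates gradients.

```python
import random

def ease(t):
    # Assumed ease curve 3t^2 - 2t^3: slope is zero at t=0 and t=1,
    # so neighbouring cells join without the "pyramid" creases.
    return t * t * (3.0 - 2.0 * t)

def lerp(a, b, t):
    # Weighted average of a and b; t=0 gives a, t=1 gives b.
    return a + (b - a) * t

def value_noise_layer(size, step, seed=0):
    """One layer: a random 0..255 value every `step` pixels, eased in between."""
    rng = random.Random(seed)
    cells = size // step
    # (cells+1) x (cells+1) lattice so every pixel has four surrounding corners.
    lattice = [[rng.randrange(256) for _ in range(cells + 1)]
               for _ in range(cells + 1)]
    image = []
    for y in range(size):
        gy, ty = divmod(y, step)
        wy = ease(ty / step)
        row = []
        for x in range(size):
            gx, tx = divmod(x, step)
            wx = ease(tx / step)
            top = lerp(lattice[gy][gx], lattice[gy][gx + 1], wx)
            bottom = lerp(lattice[gy + 1][gx], lattice[gy + 1][gx + 1], wx)
            row.append(lerp(top, bottom, wy))
        image.append(row)
    return image, lattice
```

Calling `value_noise_layer(256, 32)` gives one octave for a 256x256 image (n=5, N=8); summing several layers with different `step` values would give the multi-octave picture.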

I'm not very good at C, but the tests that I made with Intel's otherwise perfect SIMDer compiler showed that Perlin's noise function is too big a nut for it to crack. So I figure I need to do it manually. Here's Intel's output for the hand-made intrinsics code (robotbastards code). It's all inlined because you can't put SSE in a function (i.e. it's really hard). Ugly, heh?
Code:
```
;438 lines - prologue and epilogue stripped
movaps    xmm2, XMMWORD PTR _2il0floatpacket$3(rip)     ;137.10
movaps    xmm1, xmm0                                    ;137.10
subps     xmm1, XMMWORD PTR _2il0floatpacket$1(rip)     ;137.10
movdqa    xmm6, XMMWORD PTR [rsp+192]                   ;137.10
movaps    xmm11, XMMWORD PTR [rsp+304]                  ;137.10
movaps    xmm8, XMMWORD PTR [rsp+256]                   ;137.10
cvtps2dq  xmm3, xmm1                                    ;137.10
movdqa    xmm13, XMMWORD PTR _2il0floatpacket$8(rip)    ;137.10
cvtdq2ps  xmm1, xmm3                                    ;137.10
pand      xmm3, XMMWORD PTR _2il0floatpacket$2(rip)     ;137.10
paddd     xmm3, XMMWORD PTR [rsp+272]                   ;137.10
movaps    xmm9, XMMWORD PTR _2il0floatpacket$9(rip)     ;137.10
subps     xmm0, xmm1                                    ;137.10
movaps    xmm12, XMMWORD PTR _2il0floatpacket$10(rip)   ;137.10
movaps    xmm10, XMMWORD PTR _2il0floatpacket$12(rip)   ;137.10
mulps     xmm2, xmm0                                    ;137.10
subps     xmm2, XMMWORD PTR _2il0floatpacket$4(rip)     ;137.10
movaps    XMMWORD PTR [rsp+336], xmm0                   ;137.10
movaps    xmm1, xmm0                                    ;137.10
mulps     xmm1, xmm0                                    ;137.10
mulps     xmm2, xmm0                                    ;137.10
addps     xmm2, XMMWORD PTR _2il0floatpacket$5(rip)     ;137.10
mulps     xmm1, xmm0                                    ;137.10
mulps     xmm2, xmm1                                    ;137.10
movdqa    xmm1, XMMWORD PTR [rsp+240]                   ;137.10
movdqa    xmm5, xmm1                                    ;137.10
movaps    XMMWORD PTR [rsp+352], xmm2                   ;137.10
movdqa    xmm4, xmm1                                    ;137.10
movdqa    XMMWORD PTR [rsp+368], xmm3                   ;137.10
movaps    xmm1, xmm0                                    ;137.10
subps     xmm1, XMMWORD PTR _2il0floatpacket$7(rip)     ;137.10
movaps    XMMWORD PTR [rsp+384], xmm1                   ;137.10
movdqa    xmm1, xmm14                                   ;137.10
pand      xmm5, xmm13                                   ;137.10
cvtdq2ps  xmm7, xmm5                                    ;137.10
movaps    xmm2, xmm7                                    ;137.10
movdqa    XMMWORD PTR [rsp+400], xmm1                   ;137.10
movdqa    xmm1, xmm14                                   ;137.10
cmpltps   xmm2, xmm9                                    ;137.10
pand      xmm4, xmm13                                   ;137.10
movdqa    XMMWORD PTR [rsp+416], xmm1                   ;137.10
movdqa    xmm1, xmm14                                   ;137.10
pand      xmm6, xmm13                                   ;137.10
movdqa    XMMWORD PTR [rsp+432], xmm1                   ;137.10
movdqa    xmm1, xmm14                                   ;137.10
movaps    xmm3, xmm7                                    ;137.10
cmpltps   xmm3, xmm12                                   ;137.10
movaps    xmm15, xmm3                                   ;137.10
movdqa    XMMWORD PTR [rsp+448], xmm1                   ;137.10
movaps    xmm1, xmm11                                   ;137.10
andps     xmm1, xmm2                                    ;137.10
andnps    xmm2, xmm0                                    ;137.10
orps      xmm1, xmm2                                    ;137.10
movaps    xmm2, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
andps     xmm15, xmm0                                   ;137.10
cmpeqps   xmm2, xmm7                                    ;137.10
cmpeqps   xmm7, xmm10                                   ;137.10
orps      xmm2, xmm7                                    ;137.10
movaps    xmm7, xmm11                                   ;137.10
andps     xmm7, xmm2                                    ;137.10
andnps    xmm2, xmm8                                    ;137.10
orps      xmm7, xmm2                                    ;137.10
movdqa    xmm2, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
andnps    xmm3, xmm7                                    ;137.10
orps      xmm15, xmm3                                   ;137.10
pand      xmm2, xmm5                                    ;137.10
cvtdq2ps  xmm3, xmm2                                    ;137.10
pand      xmm5, xmm14                                   ;137.10
cvtdq2ps  xmm2, xmm5                                    ;137.10
xorps     xmm5, xmm5                                    ;137.10
xorps     xmm7, xmm7                                    ;137.10
cmpeqps   xmm3, xmm5                                    ;137.10
movaps    xmm5, xmm1                                    ;137.10
andps     xmm5, xmm3                                    ;137.10
subps     xmm7, xmm1                                    ;137.10
andnps    xmm3, xmm7                                    ;137.10
movaps    xmm7, XMMWORD PTR [rsp+320]                   ;137.10
orps      xmm5, xmm3                                    ;137.10
xorps     xmm1, xmm1                                    ;137.10
cmpeqps   xmm2, xmm1                                    ;137.10
movaps    xmm3, xmm15                                   ;137.10
andps     xmm3, xmm2                                    ;137.10
xorps     xmm1, xmm1                                    ;137.10
subps     xmm1, xmm15                                   ;137.10
andnps    xmm2, xmm1                                    ;137.10
orps      xmm3, xmm2                                    ;137.10
movaps    xmm3, xmm7                                    ;137.10
cvtdq2ps  xmm1, xmm4                                    ;137.10
movaps    xmm15, xmm1                                   ;137.10
movaps    xmm2, xmm1                                    ;137.10
cmpltps   xmm15, xmm9                                   ;137.10
andps     xmm3, xmm15                                   ;137.10
andnps    xmm15, xmm0                                   ;137.10
orps      xmm3, xmm15                                   ;137.10
cmpltps   xmm2, xmm12                                   ;137.10
movaps    xmm15, xmm2                                   ;137.10
andps     xmm15, xmm0                                   ;137.10
movaps    xmm0, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
cmpeqps   xmm0, xmm1                                    ;137.10
cmpeqps   xmm1, xmm10                                   ;137.10
orps      xmm0, xmm1                                    ;137.10
movaps    xmm1, xmm7                                    ;137.10
andps     xmm1, xmm0                                    ;137.10
andnps    xmm0, xmm8                                    ;137.10
orps      xmm1, xmm0                                    ;137.10
movdqa    xmm0, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
andnps    xmm2, xmm1                                    ;137.10
orps      xmm15, xmm2                                   ;137.10
pand      xmm0, xmm4                                    ;137.10
cvtdq2ps  xmm2, xmm0                                    ;137.10
pand      xmm4, xmm14                                   ;137.10
cvtdq2ps  xmm1, xmm4                                    ;137.10
xorps     xmm0, xmm0                                    ;137.10
xorps     xmm4, xmm4                                    ;137.10
cmpeqps   xmm2, xmm0                                    ;137.10
movaps    xmm0, xmm3                                    ;137.10
andps     xmm0, xmm2                                    ;137.10
subps     xmm4, xmm3                                    ;137.10
andnps    xmm2, xmm4                                    ;137.10
orps      xmm0, xmm2                                    ;137.10
xorps     xmm2, xmm2                                    ;137.10
cmpeqps   xmm1, xmm2                                    ;137.10
movaps    xmm3, xmm15                                   ;137.10
andps     xmm3, xmm1                                    ;137.10
xorps     xmm2, xmm2                                    ;137.10
subps     xmm2, xmm15                                   ;137.10
andnps    xmm1, xmm2                                    ;137.10
movaps    xmm2, XMMWORD PTR [rsp+384]                   ;137.10
orps      xmm3, xmm1                                    ;137.10
movaps    xmm4, xmm11                                   ;137.10
movaps    xmm1, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
movaps    xmm3, XMMWORD PTR [rsp+288]                   ;137.10
subps     xmm0, xmm5                                    ;137.10
mulps     xmm0, xmm3                                    ;137.10
movaps    XMMWORD PTR [rsp+464], xmm5                   ;137.10
cvtdq2ps  xmm5, xmm6                                    ;137.10
movaps    xmm0, xmm5                                    ;137.10
movaps    xmm15, xmm5                                   ;137.10
cmpltps   xmm0, xmm9                                    ;137.10
andps     xmm4, xmm0                                    ;137.10
andnps    xmm0, xmm2                                    ;137.10
orps      xmm4, xmm0                                    ;137.10
cmpltps   xmm15, xmm12                                  ;137.10
movaps    xmm0, xmm15                                   ;137.10
andps     xmm0, xmm2                                    ;137.10
cmpeqps   xmm1, xmm5                                    ;137.10
cmpeqps   xmm5, xmm10                                   ;137.10
orps      xmm1, xmm5                                    ;137.10
movaps    xmm5, xmm11                                   ;137.10
andps     xmm5, xmm1                                    ;137.10
andnps    xmm1, xmm8                                    ;137.10
orps      xmm5, xmm1                                    ;137.10
movdqa    xmm1, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
andnps    xmm15, xmm5                                   ;137.10
orps      xmm0, xmm15                                   ;137.10
pand      xmm1, xmm6                                    ;137.10
cvtdq2ps  xmm15, xmm1                                   ;137.10
pand      xmm6, xmm14                                   ;137.10
cvtdq2ps  xmm5, xmm6                                    ;137.10
xorps     xmm1, xmm1                                    ;137.10
xorps     xmm6, xmm6                                    ;137.10
cmpeqps   xmm15, xmm1                                   ;137.10
movaps    xmm1, xmm4                                    ;137.10
andps     xmm1, xmm15                                   ;137.10
subps     xmm6, xmm4                                    ;137.10
andnps    xmm15, xmm6                                   ;137.10
orps      xmm1, xmm15                                   ;137.10
xorps     xmm4, xmm4                                    ;137.10
cmpeqps   xmm5, xmm4                                    ;137.10
movaps    xmm6, xmm0                                    ;137.10
andps     xmm6, xmm5                                    ;137.10
xorps     xmm4, xmm4                                    ;137.10
subps     xmm4, xmm0                                    ;137.10
andnps    xmm5, xmm4                                    ;137.10
orps      xmm6, xmm5                                    ;137.10
movdqa    xmm5, XMMWORD PTR [rsp+368]                   ;137.10
pand      xmm5, xmm13                                   ;137.10
movaps    xmm6, xmm7                                    ;137.10
cvtdq2ps  xmm0, xmm5                                    ;137.10
movaps    xmm15, xmm0                                   ;137.10
movaps    xmm4, xmm0                                    ;137.10
cmpltps   xmm15, xmm9                                   ;137.10
andps     xmm6, xmm15                                   ;137.10
andnps    xmm15, xmm2                                   ;137.10
orps      xmm6, xmm15                                   ;137.10
cmpltps   xmm4, xmm12                                   ;137.10
movaps    xmm15, xmm4                                   ;137.10
andps     xmm15, xmm2                                   ;137.10
movaps    xmm2, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
cmpeqps   xmm2, xmm0                                    ;137.10
cmpeqps   xmm0, xmm10                                   ;137.10
orps      xmm2, xmm0                                    ;137.10
movaps    xmm0, xmm7                                    ;137.10
andps     xmm0, xmm2                                    ;137.10
andnps    xmm2, xmm8                                    ;137.10
orps      xmm0, xmm2                                    ;137.10
andnps    xmm4, xmm0                                    ;137.10
movdqa    xmm0, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
orps      xmm15, xmm4                                   ;137.10
pand      xmm0, xmm5                                    ;137.10
cvtdq2ps  xmm4, xmm0                                    ;137.10
pand      xmm5, xmm14                                   ;137.10
cvtdq2ps  xmm2, xmm5                                    ;137.10
xorps     xmm0, xmm0                                    ;137.10
xorps     xmm5, xmm5                                    ;137.10
cmpeqps   xmm4, xmm0                                    ;137.10
movaps    xmm0, xmm6                                    ;137.10
andps     xmm0, xmm4                                    ;137.10
subps     xmm5, xmm6                                    ;137.10
andnps    xmm4, xmm5                                    ;137.10
orps      xmm0, xmm4                                    ;137.10
xorps     xmm4, xmm4                                    ;137.10
cmpeqps   xmm2, xmm4                                    ;137.10
movaps    xmm5, xmm15                                   ;137.10
andps     xmm5, xmm2                                    ;137.10
xorps     xmm4, xmm4                                    ;137.10
subps     xmm4, xmm15                                   ;137.10
movdqa    xmm15, XMMWORD PTR [rsp+400]                  ;137.10
andnps    xmm2, xmm4                                    ;137.10
movaps    xmm4, XMMWORD PTR [rsp+336]                   ;137.10
orps      xmm5, xmm2                                    ;137.10
pand      xmm15, xmm13                                  ;137.10
movaps    xmm2, xmm11                                   ;137.10
movaps    xmm5, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
subps     xmm0, xmm1                                    ;137.10
mulps     xmm0, xmm3                                    ;137.10
cvtdq2ps  xmm8, xmm15                                   ;137.10
movaps    xmm6, xmm8                                    ;137.10
movaps    xmm0, XMMWORD PTR [rsp+464]                   ;137.10
subps     xmm1, xmm0                                    ;137.10
mulps     xmm1, XMMWORD PTR [rsp+352]                   ;137.10
cmpltps   xmm6, xmm12                                   ;137.10
cmpeqps   xmm5, xmm8                                    ;137.10
movaps    XMMWORD PTR [rsp+464], xmm0                   ;137.10
movaps    xmm0, xmm8                                    ;137.10
movaps    xmm1, xmm6                                    ;137.10
andps     xmm1, xmm4                                    ;137.10
cmpltps   xmm0, xmm9                                    ;137.10
andps     xmm2, xmm0                                    ;137.10
andnps    xmm0, xmm4                                    ;137.10
orps      xmm2, xmm0                                    ;137.10
movaps    xmm0, XMMWORD PTR [rsp+208]                   ;137.10
cmpeqps   xmm8, xmm10                                   ;137.10
orps      xmm5, xmm8                                    ;137.10
movaps    xmm8, xmm11                                   ;137.10
andps     xmm8, xmm5                                    ;137.10
andnps    xmm5, xmm0                                    ;137.10
orps      xmm8, xmm5                                    ;137.10
movdqa    xmm5, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
andnps    xmm6, xmm8                                    ;137.10
orps      xmm1, xmm6                                    ;137.10
pand      xmm5, xmm15                                   ;137.10
cvtdq2ps  xmm8, xmm5                                    ;137.10
pand      xmm15, xmm14                                  ;137.10
cvtdq2ps  xmm6, xmm15                                   ;137.10
xorps     xmm5, xmm5                                    ;137.10
xorps     xmm15, xmm15                                  ;137.10
cmpeqps   xmm8, xmm5                                    ;137.10
movaps    xmm5, xmm2                                    ;137.10
andps     xmm5, xmm8                                    ;137.10
subps     xmm15, xmm2                                   ;137.10
andnps    xmm8, xmm15                                   ;137.10
orps      xmm5, xmm8                                    ;137.10
xorps     xmm2, xmm2                                    ;137.10
cmpeqps   xmm6, xmm2                                    ;137.10
movaps    xmm8, xmm1                                    ;137.10
andps     xmm8, xmm6                                    ;137.10
xorps     xmm2, xmm2                                    ;137.10
subps     xmm2, xmm1                                    ;137.10
movdqa    xmm1, XMMWORD PTR [rsp+416]                   ;137.10
andnps    xmm6, xmm2                                    ;137.10
orps      xmm8, xmm6                                    ;137.10
pand      xmm1, xmm13                                   ;137.10
movaps    xmm2, xmm7                                    ;137.10
cvtdq2ps  xmm15, xmm1                                   ;137.10
movaps    xmm6, xmm15                                   ;137.10
movaps    xmm8, xmm15                                   ;137.10
cmpltps   xmm6, xmm9                                    ;137.10
andps     xmm2, xmm6                                    ;137.10
andnps    xmm6, xmm4                                    ;137.10
orps      xmm2, xmm6                                    ;137.10
movaps    xmm6, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
cmpltps   xmm8, xmm12                                   ;137.10
andps     xmm4, xmm8                                    ;137.10
cmpeqps   xmm6, xmm15                                   ;137.10
cmpeqps   xmm15, xmm10                                  ;137.10
orps      xmm6, xmm15                                   ;137.10
movaps    xmm15, xmm7                                   ;137.10
andps     xmm15, xmm6                                   ;137.10
andnps    xmm6, xmm0                                    ;137.10
orps      xmm15, xmm6                                   ;137.10
movdqa    xmm6, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
andnps    xmm8, xmm15                                   ;137.10
orps      xmm4, xmm8                                    ;137.10
pand      xmm6, xmm1                                    ;137.10
cvtdq2ps  xmm8, xmm6                                    ;137.10
pand      xmm1, xmm14                                   ;137.10
cvtdq2ps  xmm6, xmm1                                    ;137.10
xorps     xmm1, xmm1                                    ;137.10
xorps     xmm15, xmm15                                  ;137.10
cmpeqps   xmm8, xmm1                                    ;137.10
movaps    xmm1, xmm2                                    ;137.10
andps     xmm1, xmm8                                    ;137.10
subps     xmm15, xmm2                                   ;137.10
andnps    xmm8, xmm15                                   ;137.10
orps      xmm1, xmm8                                    ;137.10
xorps     xmm2, xmm2                                    ;137.10
cmpeqps   xmm6, xmm2                                    ;137.10
movaps    xmm8, xmm4                                    ;137.10
andps     xmm8, xmm6                                    ;137.10
xorps     xmm2, xmm2                                    ;137.10
subps     xmm2, xmm4                                    ;137.10
andnps    xmm6, xmm2                                    ;137.10
orps      xmm8, xmm6                                    ;137.10
movaps    xmm6, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
movaps    xmm4, xmm11                                   ;137.10
subps     xmm1, xmm5                                    ;137.10
mulps     xmm1, xmm3                                    ;137.10
movdqa    xmm1, XMMWORD PTR [rsp+432]                   ;137.10
movaps    XMMWORD PTR [rsp+400], xmm5                   ;137.10
pand      xmm1, xmm13                                   ;137.10
movaps    xmm5, XMMWORD PTR [rsp+384]                   ;137.10
cvtdq2ps  xmm8, xmm1                                    ;137.10
movaps    xmm2, xmm8                                    ;137.10
movaps    xmm15, xmm8                                   ;137.10
cmpltps   xmm2, xmm9                                    ;137.10
andps     xmm4, xmm2                                    ;137.10
andnps    xmm2, xmm5                                    ;137.10
orps      xmm4, xmm2                                    ;137.10
cmpltps   xmm15, xmm12                                  ;137.10
movaps    xmm2, xmm15                                   ;137.10
andps     xmm2, xmm5                                    ;137.10
cmpeqps   xmm6, xmm8                                    ;137.10
cmpeqps   xmm8, xmm10                                   ;137.10
orps      xmm6, xmm8                                    ;137.10
andps     xmm11, xmm6                                   ;137.10
andnps    xmm6, xmm0                                    ;137.10
orps      xmm11, xmm6                                   ;137.10
movdqa    xmm6, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
andnps    xmm15, xmm11                                  ;137.10
orps      xmm2, xmm15                                   ;137.10
pand      xmm6, xmm1                                    ;137.10
cvtdq2ps  xmm6, xmm6                                    ;137.10
pand      xmm1, xmm14                                   ;137.10
xorps     xmm8, xmm8                                    ;137.10
cvtdq2ps  xmm1, xmm1                                    ;137.10
xorps     xmm11, xmm11                                  ;137.10
cmpeqps   xmm6, xmm8                                    ;137.10
movaps    xmm8, xmm4                                    ;137.10
andps     xmm8, xmm6                                    ;137.10
subps     xmm11, xmm4                                   ;137.10
andnps    xmm6, xmm11                                   ;137.10
orps      xmm8, xmm6                                    ;137.10
xorps     xmm4, xmm4                                    ;137.10
cmpeqps   xmm1, xmm4                                    ;137.10
movaps    xmm6, xmm2                                    ;137.10
andps     xmm6, xmm1                                    ;137.10
xorps     xmm4, xmm4                                    ;137.10
subps     xmm4, xmm2                                    ;137.10
movdqa    xmm2, XMMWORD PTR [rsp+448]                   ;137.10
andnps    xmm1, xmm4                                    ;137.10
orps      xmm6, xmm1                                    ;137.10
pand      xmm2, xmm13                                   ;137.10
movaps    xmm1, xmm7                                    ;137.10
cvtdq2ps  xmm6, xmm2                                    ;137.10
movaps    xmm4, xmm6                                    ;137.10
cmpltps   xmm4, xmm9                                    ;137.10
movaps    xmm9, xmm6                                    ;137.10
andps     xmm1, xmm4                                    ;137.10
andnps    xmm4, xmm5                                    ;137.10
cmpltps   xmm9, xmm12                                   ;137.10
orps      xmm1, xmm4                                    ;137.10
movaps    xmm4, XMMWORD PTR _2il0floatpacket$11(rip)    ;137.10
andps     xmm5, xmm9                                    ;137.10
cmpeqps   xmm4, xmm6                                    ;137.10
cmpeqps   xmm6, xmm10                                   ;137.10
orps      xmm4, xmm6                                    ;137.10
andps     xmm7, xmm4                                    ;137.10
andnps    xmm4, xmm0                                    ;137.10
movdqa    xmm0, XMMWORD PTR _2il0floatpacket$13(rip)    ;137.10
orps      xmm7, xmm4                                    ;137.10
andnps    xmm9, xmm7                                    ;137.10
orps      xmm5, xmm9                                    ;137.10
pand      xmm0, xmm2                                    ;137.10
cvtdq2ps  xmm4, xmm0                                    ;137.10
pand      xmm2, xmm14                                   ;137.10
xorps     xmm0, xmm0                                    ;137.10
cvtdq2ps  xmm2, xmm2                                    ;137.10
movaps    xmm6, xmm1                                    ;137.10
cmpeqps   xmm4, xmm0                                    ;137.10
andps     xmm6, xmm4                                    ;137.10
xorps     xmm0, xmm0                                    ;137.10
subps     xmm0, xmm1                                    ;137.10
andnps    xmm4, xmm0                                    ;137.10
orps      xmm6, xmm4                                    ;137.10
xorps     xmm0, xmm0                                    ;137.10
cmpeqps   xmm2, xmm0                                    ;137.10
movaps    xmm1, xmm5                                    ;137.10
andps     xmm1, xmm2                                    ;137.10
xorps     xmm0, xmm0                                    ;137.10
subps     xmm0, xmm5                                    ;137.10
movaps    xmm5, XMMWORD PTR [rsp+464]                   ;137.10
andnps    xmm2, xmm0                                    ;137.10
movaps    xmm0, XMMWORD PTR [rsp+400]                   ;137.10
orps      xmm1, xmm2                                    ;137.10
subps     xmm6, xmm8                                    ;137.10
mulps     xmm6, xmm3                                    ;137.10
subps     xmm8, xmm0                                    ;137.10
mulps     xmm8, XMMWORD PTR [rsp+352]                   ;137.10
subps     xmm0, xmm5                                    ;137.10
mulps     xmm0, XMMWORD PTR [rsp+224]                   ;137.10
divps     xmm5, XMMWORD PTR _2il0floatpacket$14(rip)    ;137.10
```

So has anyone got ideas or half-baked code to start from? It might even be in C.
Can anyone tell me why this is different, or whether it is?

_________________
My updated idol http://www.agner.org/optimize/
31 Jul 2008, 22:26
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 01 Aug 2008, 09:00
Thank you, Madis, for teaching us about Perlin Noise, a complete novelty for me, at least. I was impressed with the last link's list of potential applications in 1, 2, 3 & 4 dimensions. I don't know if one could say the code above is "ugly", but it is certainly unintuitive. To address your question about whether or not Hugo Elias' applications of Perlin Noise are distinct from Perlin's own, why not send them both an inquiry? Since Perlin teaches at NYU, and Elias' web page has not been updated for the past five years, it could well be the case that these guys are not going to be offended by your inquiry; they may even be willing to share their code, or offer some assistance to you, on one of your many interesting projects....

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
tom tobias wrote:
send them both an inquiry?

Yeah, I was planning on doing that exact thing, but I thought people on this board would be less offended and might come to an asm realization of the code quicker. I'm just fishing here for an easy way out, but if there really aren't any helpful leads soon, I will contact them about this optimization challenge.

Actually, I'm quite amazed that there is so little info about this great algorithm. Its variants are used here and here, and look at the astonishing images a few bytes are capable of producing here. Perlin is behind all of these!
01 Aug 2008, 10:35
bitRAKE

Joined: 21 Jul 2003
Posts: 3971
Location: vpcmipstrm
bitRAKE 15 Aug 2008, 03:43
I'd do the whole thing with fixed-point fractions, mapping [-1,1) onto signed words $8000..$7FFF. Or DWORDs, but I don't think they are needed for a few octaves. Something like:
Code:
```
        mov ebx,WIDTH - 1       ; EBX = X coordinate, counting down
        mov edx,0               ; EDX = X from the previous iteration
.0:
        mov ecx,OCTAVES - 1
        xor edx,ebx             ; changed bits from last iteration
.4:     bt edx,ecx
        jnc .5

        ; select new random number for octave[ecx]
        ; reset interpolation for octave[ecx]

.5:     dec ecx
        jns .4

        ; interpolate octaves in parallel

        ; accumulate points (weighted?)

        ; output value

        mov edx,ebx
        dec ebx
        jns .0
```
...for the 1D Perlin Noise. The octaves would be in XMM registers. Type of interpolation would determine needed registers - 12 DWORD octaves would look really smooth. EBX is the (X) coordinate and the output is (Y). WIDTH and OCTAVES should be a power of two.
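
To see what the bit test in that loop buys, here is a rough Python model of just the control flow (no random numbers, no interpolation). The function name `refresh_counts` is made up for illustration; it only counts how often the "select new random number" branch would be taken for each octave.

```python
def refresh_counts(width, octaves):
    """Count, per octave, how often the `jnc .5` branch is NOT taken."""
    counts = [0] * octaves
    prev = 0                                  # mirrors `mov edx,0`
    for x in range(width - 1, -1, -1):        # EBX runs from WIDTH-1 down to 0
        changed = prev ^ x                    # `xor edx,ebx`
        for k in range(octaves - 1, -1, -1):  # ECX runs from OCTAVES-1 down to 0
            if (changed >> k) & 1:            # `bt edx,ecx` sets carry
                counts[k] += 1                # a new random value would be picked here
        prev = x                              # `mov edx,ebx`
    return counts
```

With `width=16` and `octaves=4` the counts come out as 16, 8, 4 and 2, so octave k gets a fresh value every 2^k pixels without any division or modulo in the inner loop.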

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
About octaves, the maximum I've calculated is:
Octaves = log2(Image_side_in_pixels)
so 12 octaves is the ceiling for a 4096x4096 px image and overkill for anything smaller.
I've used 256x256 as the image and 6 or 7 octaves.
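
As a quick check of that formula, a tiny Python helper (the name `max_octaves` is made up here, and it assumes the image side is a power of two):

```python
def max_octaves(side):
    """Integer log2 of a power-of-two image side."""
    if side <= 0 or side & (side - 1):
        raise ValueError("side must be a power of two")
    # For a power of two, bit_length() - 1 is the exact base-2 logarithm.
    return side.bit_length() - 1
```

`max_octaves(256)` is 8 and `max_octaves(4096)` is 12, so 6 or 7 octaves on a 256x256 image stays under the limit.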

Okay, but at least we're getting somewhere. With float (as Perlin himself did) you have all the precision you need, and I think it's not always needed. When you go to integers, you start experiencing rounding errors (unless you are really careful). Fixed point would be the way to go, and then you can even change the precision on the fly without paying the penalty of SSE float=>int and int=>float conversions.

The ideal I'd like to see is the same number format all the way through the algorithm. If that's not possible, then I have to hide the conversion latency between faster, non-dependent instructions.
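
A minimal sketch of what "one number format all the way through" could look like, assuming the signed 16-bit word format with 15 fractional bits suggested earlier in the thread. It is written in Python for easy checking; the names `to_q15`, `q15_mul` and `q15_lerp` are hypothetical, and a real version would keep everything in packed words.

```python
Q = 15              # fractional bits
ONE = 1 << Q        # 1.0 in this format (itself just out of range)

def to_q15(v):
    """Float in [-1, 1) to a signed 16-bit fixed-point value, clamped."""
    return max(-ONE, min(ONE - 1, int(round(v * ONE))))

def q15_mul(a, b):
    # The product carries 30 fractional bits; shift back down to 15.
    # The arithmetic shift rounds toward minus infinity.
    return (a * b) >> Q

def q15_lerp(a, b, t):
    """Interpolate between a and b with weight t, all in Q15 integers."""
    return a + q15_mul(b - a, t)
```

Here `to_q15(-1.0) & 0xFFFF` is 0x8000 and the largest representable value is 0x7FFF. The only rounding happens in the shift inside `q15_mul`, which is the "really careful" part: the error is at most one unit in the last place per multiply.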
15 Aug 2008, 07:27

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Okay something solid for you. Too much theory already spoken :S
Code:
```
format binary as "BMP"
; 54-byte BMP header: 256x256 pixels, 24 bits per pixel, uncompressed
db 0x42, 0x4D, 0x36, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x36, 0x00
db 0x00, 0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01
db 0x00, 0x00, 0x01, 0x00, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
db 0x03, 0x00, 0x13, 0x0B, 0x00, 0x00, 0x13, 0x0B, 0x00, 0x00, 0x00, 0x00
db 0x00, 0x00, 0x00, 0x00, 0x00, 0x00

B32 equ and 0FFFFFFFFh ;clip to 32-bit space!!! FASM uses 64-bit
; hash a into a pseudo-random byte 0..255
macro RND a
{
  x=(a shl 13 xor a) B32
  x=((((x*x) B32)*15731) B32+789221)*x+1376312589
  a=(x shr 16) and 255
}

repeat 256*256
  z=0
  rept 7 octave ;FASM doesn't allow 7 to be var instead
  {
    ; clear the low `octave` bits of the x byte and the y byte of the counter
    a=% and (not ((1 shl octave -1)*0101h))
    RND a
    z=z+a
  }
  a=z/7 ;average of the 7 octaves
  db a,a,a ;R=G=B
end repeat
```

The above code is a sample I'm playing with. You need to manually change the 7 in both places it appears (the rept count and the z/7 divisor) to change the behaviour.
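
For anyone who wants to poke at the numbers without assembling, here is a rough Python port of the macro and the pixel loop above. It is a sketch of my reading of the FASM operator priorities (`shl` binds tighter than `xor`, `and` tighter than `+`), not a verified bit-exact replacement; `rnd`, `pixel` and `MASK32` are names invented for this port.

```python
MASK32 = 0xFFFFFFFF

def rnd(a):
    """Port of the RND macro: hash an integer into a byte 0..255."""
    x = ((a << 13) ^ a) & MASK32
    t = (x * x) & MASK32
    t = (t * 15731) & MASK32
    x = (t + 789221) * x + 1376312589   # left unmasked, as in the macro
    return (x >> 16) & 255              # only bits 16..23 are kept

def pixel(counter, octaves=7):
    """Gray level for one value of FASM's repeat counter % (which starts at 1)."""
    z = 0
    for octave in range(1, octaves + 1):
        # Clear the low `octave` bits of both the x byte and the y byte,
        # so every pixel in a 2^octave block hashes the same coordinate.
        block_mask = ((1 << octave) - 1) * 0x0101
        z += rnd(counter & ~block_mask)
    return z // octaves
```

Because even the first octave clears one bit of x and one of y, the finest detail is a 2x2 block, which is part of why the output looks blocky until interpolation is added.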

[Attachment: the sample output and histogram]

15 Aug 2008, 11:25
bitRAKE

Joined: 21 Jul 2003
Posts: 3971
Location: vpcmipstrm
bitRAKE 15 Aug 2008, 15:08
A little less blocky:
Code:
```
repeat 256*256
  z=0
  rept 7 octave {
    a=% and (not ((1 shl octave -1)*0101h))
    RND a
    a=a shr ((octave-1)/3) ;halve octaves 4-6, quarter octave 7
    z=z+a
  }
  a=z/5
  db a,a,a ;R=G=B
end repeat
```
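
The effect of the extra shift and of the changed divisor can be checked with a little arithmetic. A small Python sketch, with the helper names `octave_shift` and `max_sum` invented here:

```python
def octave_shift(octave):
    # Mirrors `(octave-1)/3` with integer division.
    return (octave - 1) // 3

def max_sum(octaves=7):
    """Largest possible z when every RND result is 255."""
    return sum(255 >> octave_shift(o) for o in range(1, octaves + 1))
```

The shifts for octaves 1 to 7 are 0, 0, 0, 1, 1, 1, 2, so the coarse 128-pixel blocks of octave 7 contribute at most 63. The worst-case sum is 1209, and 1209 divided by 5 is 241, which still fits in a byte.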



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Actually, the blocks should be removed by interpolation, which I haven't done in the FASM-macro-assisted generator.
If my calcs are correct, this less blocky code adds ~20% to the time, but since I'm dealing with 0.1 seconds, I cannot measure accurately enough. I will make some more tests and smooth the output.

PS. Sorry for the long delays on this topic. I meant to research this theme a lot, but work and vacation got in the way. I will get back to this...
15 Aug 2008, 18:11
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 15 Aug 2008, 22:14
I will get back to this...
bitRAKE

Joined: 21 Jul 2003
Posts: 3971
Location: vpcmipstrm
bitRAKE 17 Sep 2008, 19:02
I had an idea for some Perlin type noise, but it is not so good.
