flat assembler
Message board for the users of flat assembler.

3D Simplex Noise

randall



Joined: 03 Dec 2011
Posts: 153
Location: Poland
I have implemented 3D simplex noise, which is an improved version of the Perlin noise algorithm. My implementation is almost two times faster than the C++ version from the GLM library (http://glm.g-truc.net/). Maybe it will be useful for someone. I will use this function for procedural terrain generation and rendering.

Performance: about 3.2 Msamples/sec on Core2 CPU 6300 @ 1.86GHz
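
For a rough sense of scale, the figure above can be turned into clock cycles per sample. The sketch below is only arithmetic on the two quoted numbers and assumes a single core running at its nominal clock; the function name is made up for illustration.

```c
/* Clock cycles spent per noise sample, given the clock rate in Hz and the
   measured throughput in samples per second. Hypothetical helper, not part
   of snoise.asm. */
static double cycles_per_sample(double clock_hz, double samples_per_sec)
{
    return clock_hz / samples_per_sec;
}
```

With the quoted 1.86 GHz and 3.2 Msamples/sec, `cycles_per_sample(1.86e9, 3.2e6)` comes out at roughly 581 cycles per sample.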


Description:
Filesize: 26.91 KB
Viewed: 2311 Time(s)

snoise.jpg


Description:
Download
Filename: snoise.asm
Filesize: 28.38 KB
Downloaded: 152 Time(s)


_________________
https://github.com/michal-z
Post 11 May 2012, 11:40
gunblade



Joined: 19 Feb 2004
Posts: 209
Very neat..

I assume you've probably come across this before, but there's a program called Terragen that does just that.

It's been a while since I looked at it or used it, but since I saw this post I looked it up again, and it looks like they've gone all commercial (although they have also improved it greatly; it's gone well beyond just a land generator):

http://www.planetside.co.uk

However, the "classic" version is still available, and so is a limited free version of version 2 (limited resolution/quality). It's just a shame it's not open source; it would have been a good codebase to compare to.

Still, add some kind of raytracer (depending on the speed requirements, I guess), and you could shift that from 2D noise to 3D noise/landscapes.

Nice work. My only issue is that white pixel just above the bottom dark blob: what's introducing that? It will cause a weird-looking glitch if/when that is rendered (especially if you use light = higher land).
Post 12 May 2012, 02:18
randall



Joined: 03 Dec 2011
Posts: 153
Location: Poland
Yes, I know Terragen. Nice program.

I will be implementing a simple ray marcher to render landscapes. Something like this: http://www.iquilezles.org/www/articles/terrainmarching/terrainmarching.htm

Yes, these white pixels are odd (there is another one in the middle left), but they are also present in the C++ version, so I will leave them for now.

Thanks.

_________________
https://github.com/michal-z


Last edited by randall on 12 May 2012, 12:31; edited 1 time in total
Post 12 May 2012, 02:39
randall



Joined: 03 Dec 2011
Posts: 153
Location: Poland
I have done some tests and it seems that these odd pixels are present only when z = 0.0 (the third component of the input vector to the noise function). This is a 3D noise function and for heightmap generation I only need 2D, so I can fix the third component at some other value. These pixels shouldn't be a problem for me, then. I am going to continue my work ignoring them for now, but thanks for reporting this problem.

_________________
https://github.com/michal-z
Post 12 May 2012, 12:19
Madis731



Joined: 25 Sep 2003
Posts: 2146
Location: Estonia
I just noticed that ABS() macro. Three instructions is a bit much I think.
Code:
macro ABS {
        andps xmm0,dqword[absps]  ; clear sign-bit
}

align 16  ; andps with a memory operand needs a 16-byte aligned address
absps dd 7FFFFFFFh,7FFFFFFFh,7FFFFFFFh,7FFFFFFFh


Are you afraid the memory access (even if in cache) is slower than three single-clock instructions (two of which are dependent)? There's a solution for that: cache the absps value in a register (xmm15, or xmm7 in 32-bit land) and use
Code:
andps xmm0,xmm15 ; xmm7 in 32-bit land
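
To make the two ABS variants easier to compare, here is a scalar sketch in C (not code from the thread). `abs_mask` mirrors the single `andps` with a 7FFFFFFFh mask, and `abs_max` mirrors the original xorps/subps/maxps sequence; both function names are made up for illustration, and each handles one float where the SSE code handles four lanes.

```c
#include <stdint.h>
#include <string.h>

/* abs by clearing the IEEE-754 sign bit: scalar analogue of
   andps xmm0,[absps] with absps = 7FFFFFFFh in every lane. */
static float abs_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as raw bits */
    bits &= 0x7FFFFFFFu;              /* drop bit 31, the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* abs as max(x, 0 - x): scalar analogue of xorps / subps / maxps. */
static float abs_max(float x)
{
    float neg = 0.0f - x;             /* subps from a zeroed register */
    return x > neg ? x : neg;         /* maxps */
}
```

Both return the same value for ordinary inputs, for example `abs_mask(-1.5f)` and `abs_max(-1.5f)` both give 1.5.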



Then I noticed the DOT4() macro. I used to struggle with it myself, wanting to use PSHUFD. At first I thought that it's meant for integers and would make things slower, but you can actually get rid of some instructions that way:
Code:
mulps xmm0,xmm1
pshufd xmm1,xmm0,0x55  ; All dwords are taken from the source register
pshufd xmm2,xmm0,0xaa
pshufd xmm3,xmm0,0xff
pshufd xmm0,xmm0,0x00
addps xmm0,xmm1
addps xmm2,xmm3
addps xmm0,xmm2


Tell me if they made any difference to this number:
randall wrote:
Performance: about 3.2 Msamples/sec on Core2 CPU 6300 @ 1.86GHz
:)
Post 16 May 2012, 10:41
r22



Joined: 27 Dec 2004
Posts: 805
Code:
;; DOT4 if you have SSE3 instructions available
MULPS xmm0, xmm1
HADDPS xmm0, xmm0
HADDPS xmm0, xmm0
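
Why two HADDPS steps are enough can be seen in a small scalar model. This is a sketch in C, not code from the thread: `haddps` imitates what the instruction does to four lanes, and `dot4` chains it the same way as the snippet above. Both function names are made up for illustration.

```c
/* Scalar model of HADDPS dst,src on 4-lane vectors:
   dst = { d0+d1, d2+d3, s0+s1, s2+s3 }.
   The result is built in a temporary so dst and src may be the same array. */
static void haddps(float dst[4], const float src[4])
{
    float r[4] = { dst[0] + dst[1], dst[2] + dst[3],
                   src[0] + src[1], src[2] + src[3] };
    for (int i = 0; i < 4; ++i)
        dst[i] = r[i];
}

/* MULPS followed by two HADDPS: every lane of a ends up holding dot(a, b). */
static void dot4(float a[4], const float b[4])
{
    for (int i = 0; i < 4; ++i)
        a[i] *= b[i];     /* mulps: lane-wise products */
    haddps(a, a);         /* { p0+p1, p2+p3, p0+p1, p2+p3 } */
    haddps(a, a);         /* all four lanes = p0+p1+p2+p3 */
}
```

For a = (1, 2, 3, 4) and b = (5, 6, 7, 8) every lane of a ends up as 70, which is the same "s s s s" layout the DOT4 macro produces.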
Post 16 May 2012, 16:28
macgub



Joined: 11 Jan 2006
Posts: 205
Location: Poland
I ported randall's code to KolibriOS and MenuetOS64.
http://macgub.vxm.pl/menuet/snoise.zip


Last edited by macgub on 13 May 2018, 09:19; edited 1 time in total
Post 17 May 2012, 11:35
CandyMan



Joined: 04 Sep 2009
Posts: 269
Location: film "CandyMan" directed through Bernard Rose OR Candy Shop
Code:
format Flat on "vitamin.exe"
entry Start32
stack 8k
use64
;
Code16 = 90h  ; 16-bit compatibility mode code selector
Code32 = 38h  ; 32-bit compatibility mode code selector
Data32 = 30h  ; 32-bit compatibility mode data selector
Code64 = 28h  ; 64-bit code selector
BufferSize = 32k  ; transfer buffer
;
macro int No
{
        int No+80h
}
;
struc RMCS  ;real mode call structure
{
  .rEDI dd ?  ;+0
  virtual at .rEDI
  .rDI dw ?
  end virtual
  .rESI dd ?  ;+4
  virtual at .rESI
  .rSI dw ?
  end virtual
  .rEBP dd ?  ;+8
  virtual at .rEBP
  .rBP dw ?
  end virtual
  .Reserve dd ?  ;+12
  .rEBX dd ?  ;+16
  virtual at .rEBX
  .rBX dw ?
  end virtual
  virtual at .rBX
  .rBL db ?
  .rBH db ?
  end virtual
  .rEDX dd ?  ;+20
  virtual at .rEDX
  .rDX dw ?
  end virtual
  virtual at .rDX
  .rDL db ?
  .rDH db ?
  end virtual
  .rECX dd ?  ;+24
  virtual at .rECX
  .rCX dw ?
  end virtual
  virtual at .rCX
  .rCL db ?
  .rCH db ?
  end virtual
  .rEAX dd ?  ;+28
  virtual at .rEAX
  .rAX dw ?
  end virtual
  virtual at .rAX
  .rAL db ?
  .rAH db ?
  end virtual
  .rFL dw ?  ;+32
  .rES dw ?  ;+34
  .rDS dw ?  ;+36
  .rFS dw ?  ;+38
  .rGS dw ?  ;+40
  .rCSIP dd ?  ;+42
  virtual at .rCSIP
  .rIP dw ?
  .rCS dw ?
  end virtual
  .rSSSP dd ?  ;+46
  virtual at .rSSSP
  .rSP dw ?
  .rSS dw ?
  end virtual
}
;
virtual at 0
  RMCS RMCS
end virtual
;
;I: rcx,rdx
;O: rax
WriteToFile:
        push r8 rcx rdx rsi rdi rbp
        xor r8d,r8d
        jrcxz .End
        mov rbp,rcx
        mov rsi,rdx
.Loop:
        mov ecx,BufferSize
        sub rbp,rcx
        jnc .Write
        add rbp,rcx
        mov ecx,ebp
        xor ebp,ebp
.Write:
        push rcx
        mov edi,[LinBuff]
        shr rcx,3
        rep movsq
        mov cl,[rsp]
        and cl,111b
        rep movsb
        pop rcx
        mov ah,40h
        call DosIntWithBufferZero
        add r8,rax
        or rbp,rbp
        jnz .Loop
.End:
        mov rax,r8
        pop rbp rdi rsi rdx rcx r8
        ret

DosIntWithBufferZero:
        xor edx,edx
DosIntWithBuffer:
        push rbx rcx rdi
        lea edi,[Regs]
        mov [rdi+RMCS.rAH],ah
        mov eax,[SegBuff]
        mov [rdi+RMCS.rDS],ax
        mov [rdi+RMCS.rES],ax
        mov [rdi+RMCS.rEBX],ebx
        mov [rdi+RMCS.rECX],ecx
        mov [rdi+RMCS.rEDX],edx
        call DosInt
        movzx eax,[rdi+RMCS.rAX]
        bt dword [rdi+RMCS.rFL],0
        pop rdi rcx rbx
        ret

DosInt:
        mov bl,21h
        mov ax,0300h
        xor bh,bh
        xor ecx,ecx
        mov [rdi+RMCS.rSSSP],ecx
        mov [rdi+RMCS.rFL],1
        int 31h
        ret

Start32:
        use32
        jmp Code64:Start64
        use64
;-------------------------------------------------------------------------------
; NAME: DOT3
; IN: xmm0 | ? z0 y0 x0 |
; IN: xmm1 | ? z1 y1 x1 |
; OUT: xmm0 | s s s s | s = x0*x1+y0*y1+z0*z1
;-------------------------------------------------------------------------------
macro DOT3 {
        mulps xmm0,xmm1
        movaps xmm1,xmm0
        movaps xmm2,xmm0
        shufps xmm0,xmm0,0x00
        shufps xmm1,xmm1,0x55
        shufps xmm2,xmm2,0xAA
        addps xmm0,xmm1
        addps xmm0,xmm2
}
;-------------------------------------------------------------------------------
; NAME: DOT4
; IN: xmm0 | w0 z0 y0 x0 |
; IN: xmm1 | w1 z1 y1 x1 |
; OUT: xmm0 | s s s s | s = x0*x1+y0*y1+z0*z1+w0*w1
;-------------------------------------------------------------------------------
macro DOT4 {
        mulps xmm0,xmm1
        movaps xmm1,xmm0
        movaps xmm2,xmm0
        movaps xmm3,xmm0
        shufps xmm0,xmm0,0x00
        shufps xmm1,xmm1,0x55
        shufps xmm2,xmm2,0xAA
        shufps xmm3,xmm3,0xFF
        addps xmm0,xmm1
        addps xmm2,xmm3
        addps xmm0,xmm2
}
;-------------------------------------------------------------------------------
; NAME: FLOOR
; IN: xmm0 | w z y x |
; OUT: xmm0 | floor(w) floor(z) floor(y) floor(x) |
;-------------------------------------------------------------------------------
macro FLOOR {
        cvttps2dq xmm1,xmm0
        psrld xmm0,31
        psubd xmm1,xmm0
        cvtdq2ps xmm0,xmm1
}
;-------------------------------------------------------------------------------
; NAME: STEP
; IN: xmm0 | ew ez ey ex | edge vector
; IN: xmm1 | w z y x | value vector
; OUT: xmm0 | step(ew,w) step(ez,z) step(ey,y) step(ex,x) |
;-------------------------------------------------------------------------------
macro STEP {
        cmpltps xmm1,xmm0
        andnps xmm1,dqword [g_1_0]
        movaps xmm0,xmm1
}
;-------------------------------------------------------------------------------
; NAME: MOD289
; IN: xmm0 | w z y x |
; OUT: xmm0 | mod289(w) mod289(z) mod289(y) mod289(x) |
; mod289(s) = s - floor(s * (1.0/289.0)) * 289.0
;-------------------------------------------------------------------------------
macro MOD289 {
        movaps xmm2,xmm0
        mulps xmm0,dqword [g_1_div_289]
        FLOOR
        mulps xmm0,dqword [g_289_0]
        subps xmm2,xmm0
        movaps xmm0,xmm2
}
;-------------------------------------------------------------------------------
; NAME: PERMUTE
; IN: xmm0 | w z y x |
; OUT: xmm0 | perm(w) perm(z) perm(y) perm(x) |
; perm(s) = mod289(((s*34.0)+1.0)*s)
;-------------------------------------------------------------------------------
macro PERMUTE {
        movaps xmm1,xmm0
        mulps xmm0,dqword [g_34_0]
        addps xmm0,dqword [g_1_0]
        mulps xmm0,xmm1
        MOD289
}
;-------------------------------------------------------------------------------
; NAME: ABS
; IN: xmm0 | w z y x |
; OUT: xmm0 | abs(w) abs(z) abs(y) abs(x) |
;-------------------------------------------------------------------------------
macro ABS {
        xorps xmm1,xmm1  ; xmm1 = | 0 0 0 0 |
        subps xmm1,xmm0  ; xmm1 = neg(x)
        maxps xmm0,xmm1  ; xmm0 = abs(x)
}
;-------------------------------------------------------------------------------
; NAME: SNoise3
; DESC: 3D Simplex noise (https://github.com/ashima/webgl-noise)
; IN: xmm0 | ? z y x |
; OUT: xmm0 | s s s s | s is noise value [-1.0,1.0]
;-------------------------------------------------------------------------------
even 16
SNoise3:
v equ rbp-16
i equ rbp-32
x0 equ rbp-48
x1 equ rbp-64
x2 equ rbp-80
x3 equ rbp-96
i1 equ rbp-112
i2 equ rbp-128
        push rbp
        mov rbp,rsp
        sub rsp,256
        movaps [v],xmm0  ; save input on the stack
        ;
        ; Compute corners (x0, x1, x2, x3)
        ;
        ; i = floor(v + dot(v, C.yyy))
        movaps xmm1,dqword [g_snoise_C]
        shufps xmm1,xmm1,0x55  ; xmm1 = C.yyy
        DOT3  ; xmm0 = dot(xmm0,xmm1)
        addps xmm0,[v]
        FLOOR
        movaps [i],xmm0
        ; x0 = v - i + dot(i, C.xxx)
        movaps xmm3,[v]
        subps xmm3,xmm0
        movaps xmm1,dqword [g_snoise_C]
        shufps xmm1,xmm1,0x00  ; xmm1 = C.xxx
        DOT3  ; xmm0 = dot(xmm0,xmm1)
        addps xmm3,xmm0
        movaps [x0],xmm3
        ; compute i1 and i2
        movaps xmm0,xmm3
        shufps xmm0,xmm0,11001001b  ; xmm0 = | w x z y |
        movaps xmm1,xmm3
        STEP
        movaps xmm7,xmm0  ; xmm7 = g
        movaps xmm6,dqword [g_1_0]
        subps xmm6,xmm7  ; xmm6 = 1.0 - g = l
        shufps xmm6,xmm6,11010010b  ; xmm6 = | w y x z |
        movaps xmm0,xmm7
        movaps xmm1,xmm7
        minps xmm0,xmm6  ; xmm0 = min(g.xyz, l.zxy)
        maxps xmm1,xmm6  ; xmm1 = max(g.xyz, l.zxy)
        movaps [i1],xmm0  ; xmm0 = i1
        movaps [i2],xmm1  ; xmm1 = i2
        ; compute x1, x2 and x3
        movaps xmm7,[x0]  ; xmm7 = x0
        movaps xmm6,xmm7  ; xmm6 = x0
        movaps xmm5,xmm7  ; xmm5 = x0
        movaps xmm4,dqword [g_snoise_C]
        movaps xmm3,xmm4
        movaps xmm2,dqword [g_snoise_D]
        shufps xmm4,xmm4,0x00  ; xmm4 = C.xxx
        shufps xmm3,xmm3,0x55  ; xmm3 = C.yyy
        shufps xmm2,xmm2,0x55  ; xmm2 = D.yyy
        subps xmm5,xmm0  ; xmm5 = x0 - i1
        subps xmm6,xmm1  ; xmm6 = x0 - i2
        subps xmm7,xmm2  ; xmm7 = x0 - D.yyy
        addps xmm5,xmm4  ; xmm5 = x0 - i1 + C.xxx
        addps xmm6,xmm3  ; xmm6 = x0 - i2 + C.yyy
        movaps [x1],xmm5
        movaps [x2],xmm6
        movaps [x3],xmm7
        ;
        ; Compute permutations (p)
        ;
        movaps xmm0,[i]
        MOD289
        movaps xmm7,xmm0
        movaps xmm6,xmm0
        movaps xmm5,xmm0
        shufps xmm7,xmm7,10101010b  ; xmm7 = i.zzzz
        shufps xmm6,xmm6,01010101b  ; xmm6 = i.yyyy
        shufps xmm5,xmm5,00000000b  ; xmm5 = i.xxxx
        movaps xmm4,[i1]  ; xmm4 = i1
        movaps xmm3,[i2]  ; xmm3 = i2
        ;
        movaps xmm0,xmm4  ; xmm0 = i1
        movaps xmm1,xmm3  ; xmm1 = i2
        shufps xmm0,xmm0,10101010b  ; xmm0 = | i1.z i1.z i1.z i1.z |
        shufps xmm1,xmm1,10101010b  ; xmm1 = | i2.z i2.z i2.z i2.z |
        andps xmm0,dqword [g_mask_0010]  ; xmm0 = | 0 0 i1.z 0 |
        andps xmm1,dqword [g_mask_0100]  ; xmm1 = | 0 i2.z 0 0 |
        orps xmm0,xmm1  ; xmm0 = | 0 i2.z i1.z 0 |
        orps xmm0,dqword [g_1_0_w]  ; xmm0 = | 1 i2.z i1.z 0 |
        addps xmm0,xmm7  ; xmm0 = i.zzzz + | 1 i2.z i1.z 0 |
        PERMUTE
        movaps xmm8,xmm0  ; xmm8 = p
        movaps xmm0,xmm4  ; xmm0 = i1
        movaps xmm1,xmm3  ; xmm1 = i2
        shufps xmm0,xmm0,01010101b  ; xmm0 = | i1.y i1.y i1.y i1.y |
        shufps xmm1,xmm1,01010101b  ; xmm1 = | i2.y i2.y i2.y i2.y |
        andps xmm0,dqword [g_mask_0010]  ; xmm0 = | 0 0 i1.y 0 |
        andps xmm1,dqword [g_mask_0100]  ; xmm1 = | 0 i2.y 0 0 |
        orps xmm0,xmm1  ; xmm0 = | 0 i2.y i1.y 0 |
        orps xmm0,dqword [g_1_0_w]  ; xmm0 = | 1 i2.y i1.y 0 |
        addps xmm0,xmm6  ; xmm0 = i.yyyy + | 1 i2.y i1.y 0 |
        addps xmm0,xmm8
        PERMUTE
        movaps xmm8,xmm0  ; xmm8 = p
        movaps xmm0,xmm4  ; xmm0 = i1
        movaps xmm1,xmm3  ; xmm1 = i2
        shufps xmm0,xmm0,00000000b  ; xmm0 = | i1.x i1.x i1.x i1.x |
        shufps xmm1,xmm1,00000000b  ; xmm1 = | i2.x i2.x i2.x i2.x |
        andps xmm0,dqword [g_mask_0010]  ; xmm0 = | 0 0 i1.x 0 |
        andps xmm1,dqword [g_mask_0100]  ; xmm1 = | 0 i2.x 0 0 |
        orps xmm0,xmm1  ; xmm0 = | 0 i2.x i1.x 0 |
        orps xmm0,dqword [g_1_0_w]  ; xmm0 = | 1 i2.x i1.x 0 |
        addps xmm0,xmm5  ; xmm0 = i.xxxx + | 1 i2.x i1.x 0 |
        addps xmm0,xmm8
        PERMUTE
        movaps xmm8,xmm0  ; xmm8 = p
        ;
        ; Compute gradients
        ;
        movaps xmm0,dqword [g_snoise_D]
        movaps xmm1,xmm0
        shufps xmm0,xmm0,11100111b  ; xmm0 = | D.w D.z D.y D.w |
        shufps xmm1,xmm1,11001000b  ; xmm1 = | D.w D.x D.z D.x |
        mulps xmm0,dqword [g_1_div_7]
        subps xmm0,xmm1
        movaps xmm7,xmm0  ; xmm7 = ns
        ; xmm8 = j = p - 49.0 * floor(p * ns.z * ns.z)
        shufps xmm0,xmm0,10101010b  ; xmm0 = ns.zzzz
        mulps xmm0,xmm0
        mulps xmm0,xmm8
        FLOOR
        mulps xmm0,dqword [g_49_0]
        subps xmm8,xmm0  ; xmm8 = j
        ; x_ = floor(j * ns.zzzz)
        movaps xmm0,xmm7
        shufps xmm0,xmm0,10101010b  ; xmm0 = ns.zzzz
        mulps xmm0,xmm8
        FLOOR  ; xmm0 = x_
        movaps xmm6,xmm0  ; xmm6 = x_
        ; y_ = floor(j - 7.0 * x_)
        mulps xmm0,dqword [g_7_0]
        movaps xmm1,xmm8
        subps xmm1,xmm0
        movaps xmm0,xmm1
        FLOOR
        movaps xmm5,xmm0  ; xmm5 = y_
        ; x = x_ * ns.xxxx + ns.yyyy
        ; y = y_ * ns.xxxx + ns.yyyy
        movaps xmm0,xmm7  ; xmm0 = ns
        movaps xmm1,xmm7  ; xmm1 = ns
        shufps xmm0,xmm0,00000000b  ; xmm0 = ns.xxxx
        shufps xmm1,xmm1,01010101b  ; xmm1 = ns.yyyy
        mulps xmm6,xmm0  ; xmm6 = x_ * ns.xxxx
        mulps xmm5,xmm0  ; xmm5 = y_ * ns.xxxx
        addps xmm6,xmm1  ; xmm6 = x = x_ * ns.xxxx + ns.yyyy
        addps xmm5,xmm1  ; xmm5 = y = y_ * ns.xxxx + ns.yyyy
        ; h = 1.0 - abs(x) - abs(y)
        movaps xmm4,dqword [g_1_0]  ; xmm4 = h = | 1 1 1 1 |
        movaps xmm0,xmm6  ; xmm0 = x
        ABS  ; xmm0 = abs(x)
        movaps xmm3,xmm0  ; xmm3 = abs(x)
        movaps xmm0,xmm5  ; xmm0 = y
        ABS  ; xmm0 = abs(y)
        subps xmm4,xmm3  ; xmm4 = h = 1.0 - abs(x)
        subps xmm4,xmm0  ; xmm4 = h = 1.0 - abs(x) - abs(y)
        ; b0 = vec4(x.xy, y.xy)
        movaps xmm0,xmm6  ; xmm0 = x
        movaps xmm1,xmm5  ; xmm1 = y
        unpcklps xmm0,xmm1  ; xmm0 = | y.y x.y y.x x.x |
        shufps xmm0,xmm0,11011000b  ; xmm0 = | y.y y.x x.y x.x |
        movaps xmm7,xmm0  ; xmm7 = b0
        ; b1 = vec4(x.zw, y.zw)
        movaps xmm0,xmm6  ; xmm0 = x
        movaps xmm1,xmm5  ; xmm1 = y
        unpckhps xmm0,xmm1  ; xmm0 = | y.w x.w y.z x.z |
        shufps xmm0,xmm0,11011000b  ; xmm0 = | y.w y.z x.w x.z |
        movaps xmm3,xmm0  ; xmm3 = b1
        ; s0 = floor(b0) * 2.0 + 1.0
        movaps xmm0,xmm7
        FLOOR
        addps xmm0,xmm0
        addps xmm0,dqword [g_1_0]
        movaps xmm15,xmm0  ; xmm15 = s0
        ; s1 = floor(b1) * 2.0 + 1.0
        movaps xmm0,xmm3
        FLOOR
        addps xmm0,xmm0
        addps xmm0,dqword [g_1_0]
        movaps xmm14,xmm0  ; xmm14 = s1
        ; sh = -step(h, vec4(0.0))
        movaps xmm0,xmm4  ; xmm0 = h
        xorps xmm1,xmm1  ; xmm1 = | 0 0 0 0 |
        STEP
        xorps xmm1,xmm1
        subps xmm1,xmm0
        movaps xmm13,xmm1  ; xmm13 = sh
        ; a0 = b0.xzyw + s0.xzyw * sh.xxyy
        shufps xmm7,xmm7,11011000b  ; xmm7 = b0 = | w y z x |
        shufps xmm15,xmm15,11011000b  ; xmm15 = s0 = | w y z x |
        movaps xmm0,xmm13
        shufps xmm0,xmm0,01010000b  ; xmm0 = | y y x x |
        mulps xmm0,xmm15
        addps xmm0,xmm7
        movaps xmm7,xmm0  ; xmm7 = a0
        ; a1 = b1.xzyw + s1.xzyw * sh.zzww
        shufps xmm3,xmm3,11011000b  ; xmm3 = b1 = | w y z x |
        shufps xmm14,xmm14,11011000b  ; xmm14 = s1 = | w y z x |
        shufps xmm13,xmm13,11111010b  ; xmm13 = sh = | w w z z |
        mulps xmm14,xmm13
        addps xmm14,xmm3
        movaps xmm6,xmm14  ; xmm6 = a1
        ; p0 = vec3(a0.xy, h.x)
        movaps xmm0,xmm7  ; xmm0 = a0
        shufps xmm0,xmm4,00000100b  ; | h.x h.x a0.y a0.x |
        movaps xmm5,xmm0  ; xmm5 = p0
        ; p1 = vec3(a0.zw, h.y)
        shufps xmm7,xmm4,01011110b  ; xmm7 = p1 = | h.y h.y a0.w a0.z |
        ; p2 = vec3(a1.xy, h.z)
        movaps xmm0,xmm6  ; xmm0 = a1
        shufps xmm0,xmm4,10100100b  ; | h.z h.z a1.y a1.x |
        movaps xmm3,xmm0  ; xmm3 = p2
        ; p3 = vec3(a1.zw, h.w)
        shufps xmm6,xmm4,11111110b  ; xmm6 = p3 = | h.w h.w a1.w a1.z |
        ;
        movaps xmm4,xmm3  ; xmm4 = p2
        ;
        ; Normalize gradients
        ;
        ; xmm5 = p0, xmm7 = p1, xmm4 = p2, xmm6 = p3
        ;
        ; xmm15 = dot(p0, p0)
        movaps xmm0,xmm5  ; xmm0 = p0
        movaps xmm1,xmm5  ; xmm1 = p0
        DOT3
        movaps xmm15,xmm0  ; xmm15 = dot(p0, p0)
        ; xmm14 = dot(p1, p1)
        movaps xmm0,xmm7  ; xmm0 = p1
        movaps xmm1,xmm7  ; xmm1 = p1
        DOT3
        movaps xmm14,xmm0  ; xmm14 = dot(p1, p1)
        ; xmm13 = dot(p2, p2)
        movaps xmm0,xmm4  ; xmm0 = p2
        movaps xmm1,xmm4  ; xmm1 = p2
        DOT3
        movaps xmm13,xmm0  ; xmm13 = dot(p2, p2)
        ; xmm12 = dot(p3, p3)
        movaps xmm0,xmm6  ; xmm0 = p3
        movaps xmm1,xmm6  ; xmm1 = p3
        DOT3
        movaps xmm12,xmm0  ; xmm12 = dot(p3, p3)
        ;
        movaps xmm0,dqword [g_taylor_scale]
        movaps xmm1,dqword [g_taylor_bias]
        mulps xmm15,xmm0
        mulps xmm14,xmm0
        mulps xmm13,xmm0
        mulps xmm12,xmm0
        addps xmm15,xmm1
        addps xmm14,xmm1
        addps xmm13,xmm1
        addps xmm12,xmm1
        ; normalize
        mulps xmm5,xmm15  ; xmm5 = p0
        mulps xmm7,xmm14  ; xmm7 = p1
        mulps xmm4,xmm13  ; xmm4 = p2
        mulps xmm6,xmm12  ; xmm6 = p3
        ;
        ; Mix final noise value
        ;
        ; xmm15 = dot(x0, x0)
        movaps xmm0,[x0]  ; xmm0 = x0
        movaps xmm1,xmm0  ; xmm1 = x0
        DOT3
        movaps xmm15,xmm0  ; xmm15 = dot(x0, x0)
        ; xmm14 = dot(x1, x1)
        movaps xmm0,[x1]  ; xmm0 = x1
        movaps xmm1,xmm0  ; xmm1 = x1
        DOT3
        movaps xmm14,xmm0  ; xmm14 = dot(x1, x1)
        ; xmm13 = dot(x2, x2)
        movaps xmm0,[x2]  ; xmm0 = x2
        movaps xmm1,xmm0  ; xmm1 = x2
        DOT3
        movaps xmm13,xmm0  ; xmm13 = dot(x2, x2)
        ; xmm12 = dot(x3, x3)
        movaps xmm0,[x3]  ; xmm0 = x3
        movaps xmm1,xmm0  ; xmm1 = x3
        DOT3
        movaps xmm12,xmm0  ; xmm12 = dot(x3, x3)
        ;
        andps xmm15,dqword [g_mask_0001]
        andps xmm14,dqword [g_mask_0010]
        andps xmm13,dqword [g_mask_0100]
        andps xmm12,dqword [g_mask_1000]
        orps xmm15,xmm14
        orps xmm13,xmm12
        orps xmm15,xmm13
        movaps xmm0,dqword [g_0_6]
        subps xmm0,xmm15
        maxps xmm0,dqword [g_0_0]  ; xmm0 = m
        mulps xmm0,xmm0
        mulps xmm0,xmm0
        movaps xmm10,xmm0  ; xmm10 = m^4
        ;
        ; xmm15 = dot(x0, p0)
        movaps xmm0,[x0]  ; xmm0 = x0
        movaps xmm1,xmm5  ; xmm1 = p0
        DOT3
        movaps xmm15,xmm0  ; xmm15 = dot(x0, p0)
        ; xmm14 = dot(x1, p1)
        movaps xmm0,[x1]  ; xmm0 = x1
        movaps xmm1,xmm7  ; xmm1 = p1
        DOT3
        movaps xmm14,xmm0  ; xmm14 = dot(x1, p1)
        ; xmm13 = dot(x2, p2)
        movaps xmm0,[x2]  ; xmm0 = x2
        movaps xmm1,xmm4  ; xmm1 = p2
        DOT3
        movaps xmm13,xmm0  ; xmm13 = dot(x2, p2)
        ; xmm12 = dot(x3, p3)
        movaps xmm0,[x3]  ; xmm0 = x3
        movaps xmm1,xmm6  ; xmm1 = p3
        DOT3
        movaps xmm12,xmm0  ; xmm12 = dot(x3, p3)
        ; put all above dots into xmm15
        andps xmm15,dqword [g_mask_0001]
        andps xmm14,dqword [g_mask_0010]
        andps xmm13,dqword [g_mask_0100]
        andps xmm12,dqword [g_mask_1000]
        orps xmm15,xmm14
        orps xmm13,xmm12
        orps xmm15,xmm13
        ;
        movaps xmm0,xmm10  ; xmm0 = m^4
        movaps xmm1,xmm15
        DOT4
        mulps xmm0,dqword [g_42_0]
        mov rsp,rbp
        pop rbp
        ret
restore v,i,x0,x1,x2,x3,i1,i2
;-------------------------------------------------------------------------------
; NAME: Main
; DESC: Program main function.
;-------------------------------------------------------------------------------
even 16
Main:
ImgPtr equ rbp-8
        push rbp
        mov rbp,rsp
        sub rsp,128
        ; alloc memory for the image
        lea eax,[MemStrt]
        mov [ImgPtr],rax
        mov ebx,eax
        ; begin loops
        xor r13d,r13d  ; .LoopY index
.LoopY:
        xor r12d,r12d  ; .LoopX index
.LoopX:
        ; compute
        xorps xmm0,xmm0
        xorps xmm1,xmm1
        cvtsi2ss xmm0,r12d  ; xmm0 = | 0 0 0 x |
        cvtsi2ss xmm1,r13d  ; xmm1 = | 0 0 0 y |
        unpcklps xmm0,xmm1  ; xmm0 = | 0 0 y x |
        divps xmm0,dqword [g_size]
        addps xmm0,xmm0
        addps xmm0,xmm0
        call SNoise3
        mulps xmm0,dqword [g_0_5]
        addps xmm0,dqword [g_0_5]
        ; clamp to [0.0,1.0]
        minps xmm0,dqword [g_1_0]
        maxps xmm0,dqword [g_0_0]
        ; convert from [0.0,1.0] to [0,255]
        mulps xmm0,dqword [g_255_0]
        cvttps2dq xmm0,xmm0
        movd eax,xmm0
        mov [rbx+2],al  ; red
        pshufd xmm1,xmm0,00000001b
        movd eax,xmm1
        mov [rbx+1],al  ; green
        pshufd xmm1,xmm0,00000010b
        movd eax,xmm1
        mov [rbx+0],al  ; blue
        mov byte [rbx+3],255  ; alpha
        ; advance pixel pointer
        add rbx,4-1
        ; continue .LoopX
        inc r12d
        cmp r12d,SIZE
        jne .LoopX
        ; continue .LoopY
        inc r13d
        cmp r13d,SIZE
        jne .LoopY
        ; create TGA file
        mov ah,3Ch
        lea edx,[g_tga_name]
        xor ecx,ecx
        int 21h
        xchg ebx,eax
        ; write header
        lea edx,[g_tga_head]
        mov ecx,18
        call WriteToFile
        ; write pixel data
        mov edx,[ImgPtr]
        mov ecx,SIZE*SIZE*3
        call WriteToFile
        ; close file
        mov ah,3Eh
        int 21h
        mov rsp,rbp
        pop rbp
        ret
restore ImgPtr
;-------------------------------------------------------------------------------
; NAME: Debug
;-------------------------------------------------------------------------------
even 16
Debug:
v equ rbp-16
        push rbp
        mov rbp,rsp
        sub rsp,128
        mov dword [v+0],1.2
        mov dword [v+4],2.4
        mov dword [v+8],3.5
        mov dword [v+12],0.0
        movaps xmm0,[v]
        call SNoise3
        mov rsp,rbp
        pop rbp
        ret
restore v
;-------------------------------------------------------------------------------
; NAME: Start64
; DESC: Program entry point.
;-------------------------------------------------------------------------------
Start64:
        push 0
        syscall
        mov [ExitAddr],r8
        mov [BufferVar],rcx
        call Main
        xor al,al
        jmp [ExitAddr]
;-------------------------------------------------------------------------------
g_tga_name db 'snoise.tga',0
g_tga_head db 0,0,2,9 dup 0
           db (SIZE and 0x00FF),(SIZE and 0xFF00) shr 8
           db (SIZE and 0x00FF),(SIZE and 0xFF00) shr 8,32-8,0
even 16
SIZE = 600  ;800
g_size dd 4 dup 600.0  ;800.0
g_snoise_C dd 0.166666667,0.333333333,0.0,0.0
g_snoise_D dd 0.0,0.5,1.0,2.0
g_0_0 dd 4 dup 0.0
g_0_5 dd 4 dup 0.5
g_0_6 dd 4 dup 0.6
g_1_0 dd 4 dup 1.0
g_7_0 dd 4 dup 7.0
g_34_0 dd 4 dup 34.0
g_42_0 dd 4 dup 42.0
g_49_0 dd 4 dup 49.0
g_255_0 dd 4 dup 255.0
g_289_0 dd 4 dup 289.0
g_1_div_7 dd 4 dup 0.142857142857
g_1_div_289 dd 4 dup 0.003460208
g_mask_0001 dd 0xFFFFFFFF,0x00000000,0x00000000,0x00000000
g_mask_0010 dd 0x00000000,0xFFFFFFFF,0x00000000,0x00000000
g_mask_0100 dd 0x00000000,0x00000000,0xFFFFFFFF,0x00000000
g_mask_1000 dd 0x00000000,0x00000000,0x00000000,0xFFFFFFFF
g_1_0_w dd 0.0,0.0,0.0,1.0
g_taylor_bias dd 4 dup 1.79284291400159
g_taylor_scale dd 4 dup -0.85373472095314
;-------------------------------------------------------------------------------
ExitAddr dq ?
;
BufferVar:
LinBuff dd ?  ;transfer buffer *linear address
SegBuff dd ?  ;transfer buffer segment address
;
Regs RMCS
;-------------------------------------------------------------------------------
even 16
MemStrt rb SIZE*SIZE*3
;

You can download fasm from:
http://board.flatassembler.net/topic.php?t=12811
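
For readers who want to check the FLOOR, MOD289 and PERMUTE macros from the listing without running the assembly, here is a one-lane scalar model in C. It is a sketch, not part of the posted program: the function names are made up, and it follows the instruction sequences and comment formulas in the listing (truncate, subtract the sign bit, then the mod289 and perm formulas).

```c
#include <stdint.h>
#include <string.h>

/* One-lane model of the FLOOR macro:
   cvttps2dq (truncate toward zero), psrld 31 (isolate the sign bit),
   psubd (subtract it), cvtdq2ps (back to float). */
static float floor_macro(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int32_t truncated = (int32_t)x;          /* cvttps2dq */
    int32_t sign = (int32_t)(bits >> 31);    /* 1 when the sign bit is set */
    return (float)(truncated - sign);        /* psubd + cvtdq2ps */
}

/* mod289(s) = s - floor(s * (1.0/289.0)) * 289.0 */
static float mod289(float s)
{
    return s - floor_macro(s * (1.0f / 289.0f)) * 289.0f;
}

/* perm(s) = mod289(((s*34.0)+1.0)*s) */
static float permute(float s)
{
    return mod289((s * 34.0f + 1.0f) * s);
}
```

For example, `floor_macro(-1.5f)` gives -2 and `permute(3.0f)` gives 20 (309 mod 289). One thing the model makes visible: because the sign bit is subtracted unconditionally, a negative input that is already a whole number comes out one lower than the true floor, so `floor_macro(-2.0f)` gives -3, and -0.0 gives -1.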

_________________
smaller is better
Post 17 May 2012, 16:08
Madis731



Joined: 25 Sep 2003
Posts: 2146
Location: Estonia
I just thought about it today. Don't you only need 3D noise for clouds and really complex terrain? For simple heightmaps a 2D noise will do fine, which I guess is faster in implementation.

An example where 3D noise is used: http://dl.dropbox.com/u/12637402/tutorial1/scale64.png
(from this http://forums.bukkit.org/threads/intermediate-wgen-more-interesting-terrain-using-3d-simplex-noise.71813/)
Post 13 Jul 2012, 05:18
randall



Joined: 03 Dec 2011
Posts: 153
Location: Poland
Madis731 wrote:
I just thought about it today. Don't you only need 3D noise for clouds and really complex terrain? For simple heightmaps a 2D noise will do fine, which I guess is faster in implementation.

An example where 3D noise is used: http://dl.dropbox.com/u/12637402/tutorial1/scale64.png
(from this http://forums.bukkit.org/threads/intermediate-wgen-more-interesting-terrain-using-3d-simplex-noise.71813/)


Yes, you are right. 2D noise is enough. I have started to work on 2D simplex noise (which is much simpler and faster) but haven't finished it yet. Currently I am more in the C++ and OpenGL world...

Anyway thanks for the tip.

_________________
https://github.com/michal-z
Post 13 Jul 2012, 17:58
Madis731



Joined: 25 Sep 2003
Posts: 2146
Location: Estonia
I dug into http://webstaff.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf today and thought that the explanation for 2D simplex was really nice and that an SSE implementation would be feasible and extendable (to AVX, for example). Usually textures are power-of-2 sized squares (512, 4096), and dealing with SSE/AVX under these constraints is great because any such size above 32 guarantees you don't have problems with alignment or edges.
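
The point about power-of-two sizes can be checked with a couple of one-liners. This is a sketch in C with made-up helper names, assuming 4-byte pixels and a buffer that itself starts on a vector-size boundary; it is not code from the thread.

```c
#include <stddef.h>

/* Pixels left over at the end of a row when `lanes` pixels are handled per
   step (4 for SSE, 8 for AVX with 32-bit values). Zero means no tail loop. */
static size_t tail_pixels(size_t width, size_t lanes)
{
    return width % lanes;
}

/* Nonzero if every row of 4-byte pixels starts on a `vec_bytes` boundary,
   assuming the buffer start is aligned to `vec_bytes` as well. */
static int rows_aligned(size_t width, size_t vec_bytes)
{
    return (width * 4u) % vec_bytes == 0;
}
```

With width 512 both checks pass for SSE (4 lanes, 16 bytes) and AVX (8 lanes, 32 bytes); an odd width such as 601 leaves a one-pixel tail that needs separate handling.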
Post 13 Jul 2012, 19:27
Madis731



Joined: 25 Sep 2003
Posts: 2146
Location: Estonia
Here's my poke at 2D. I don't like the lookup table very much and that mod 12 would be totally unnecessary if I were to replace the LUT with some bit wizardry. It takes about 220 clocks per iteration right now so making it fully vectorized could potentially mean 55 or 28 clk/px with SSE or AVX respectively (that is ideally - of course).
Code:
snoise2:; xmm0 : ?, ?, y, x
        ; xmm13: F2
        ; xmm14: G2
        ; xmm15: G2b2m1
        push rbx
        movd xmm13,[F2]
        pshufd xmm3,xmm0,01010101b
        pshufd xmm4,xmm0,00000000b
        movaps xmm6,xmm3
        movaps xmm7,xmm4
        movaps xmm2,xmm3
        addps xmm2,xmm4
        mulps xmm2,xmm13  ; s = (xin+yin)*F2
        addps xmm3,xmm2  ; xin+s
        cvttps2dq xmm0,xmm3  ; TODO: Optimize
        psrld xmm3,31
        psubd xmm0,xmm3
        cvtdq2ps xmm3,xmm0  ; i = floor(xin+s)
        addps xmm4,xmm2  ; yin+s
        cvttps2dq xmm0,xmm4  ; TODO: Optimize
        psrld xmm4,31
        psubd xmm0,xmm4
        cvtdq2ps xmm4,xmm0  ; j = floor(yin+s)
        movaps xmm5,xmm3
        addps xmm5,xmm4
        mulps xmm5,xmm14  ; t = (i+j)*G2
        subps xmm6,xmm3  ; x0 = xin-i+t
        addps xmm6,xmm5
        subps xmm7,xmm4  ; y0 = yin-j+t
        addps xmm7,xmm5
        movaps xmm12,xmm7
        cmpltps xmm12,xmm6
        movaps xmm13,xmm6
        cmpleps xmm13,xmm7
        psrld xmm12,31
        cvtdq2ps xmm12,xmm12  ; i1 = y0<x0 ? 1 : 0
        psrld xmm13,31
        cvtdq2ps xmm13,xmm13  ; j1 = y0<x0 ? 0 : 1
        ; movaps xmm13,xmm6
        ; cmpleps xmm13,xmm7
        ; psrld xmm13,31
        ; cvtdq2ps xmm13,xmm13  ; j1 = x0<y0 ? 1 : 0
        movaps xmm8,xmm6
        subps xmm8,xmm12
        addps xmm8,xmm14  ; x1 = x0-i1+G2
        movaps xmm9,xmm7
        subps xmm9,xmm13
        addps xmm9,xmm14  ; y1 = y0-j1+G2
        movaps xmm10,xmm6
        addps xmm10,xmm15  ; x2 = x0 + (-1+2*G2)
        movaps xmm11,xmm7
        addps xmm11,xmm15  ; y2 = y0 + (-1+2*G2)
        cvtps2dq xmm3,xmm3
        cvtps2dq xmm4,xmm4
        cvtps2dq xmm12,xmm12
        cvtps2dq xmm13,xmm13
        movd ebx,xmm3
        and ebx,0FFh  ; ii
        movd ecx,xmm4
        and ecx,0FFh  ; jj
        movd r8d,xmm12
        and r8d,1  ; i1
        movd r9d,xmm13
        and r9d,1  ; j1
        movzx eax,[perm+rcx]
        movzx eax,[perm+rbx+rax]
        call mod12  ; gi0
        mov r10d,eax
        movzx eax,[perm+rcx+r9]
        add r8,rax
        movzx eax,[perm+rbx+r8]
        call mod12  ; gi1
        mov r11d,eax
        movzx eax,[perm+rcx+1]
        movzx eax,[perm+rbx+rax+1]
        call mod12  ; gi2
        mov r12d,eax
        movaps xmm0,dqword[g_0_5]
        movaps xmm1,xmm0
        movaps xmm2,xmm0
        movaps xmm12,xmm6
        mulps xmm12,xmm12  ; x0*x0
        movaps xmm13,xmm7
        mulps xmm13,xmm13  ; y0*y0
        subps xmm0,xmm12
        subps xmm0,xmm13  ; t0 = 0.5 - x0*x0 - y0*y0
        movaps xmm12,xmm0
        cmpltps xmm12,dqword[g_0_0]
        movd eax,xmm12
        cmp eax,0
        jne .t0lt0
        mulps xmm0,xmm0  ; t0 *= t0
        mulps xmm0,xmm0  ; t0 *= t0
        mulss xmm6,[grad3+r10*8+0]
        mulss xmm7,[grad3+r10*8+4]
        addps xmm6,xmm7  ; g[0]*x+g[1]*y
        mulps xmm0,xmm6  ; dot(g3[gi0],x0,y0)
        jmp .t0ge0
.t0lt0:
        xorps xmm0,xmm0
.t0ge0:
        movaps xmm12,xmm8
        mulps xmm12,xmm12
        movaps xmm13,xmm9
        mulps xmm13,xmm13
        subps xmm1,xmm12
        subps xmm1,xmm13
        movaps xmm12,xmm1
        cmpltps xmm12,dqword[g_0_0]
        movd eax,xmm12
        cmp eax,0
        jne .t1lt0
        mulps xmm1,xmm1
        mulps xmm1,xmm1
        mulss xmm8,[grad3+r11*8+0]
        mulss xmm9,[grad3+r11*8+4]
        addps xmm8,xmm9
        mulps xmm1,xmm8  ;dot(g3[gi1],x1,y1)
        jmp .t1ge0
.t1lt0:
        xorps xmm1,xmm1
.t1ge0:
        movaps xmm12,xmm10
        mulps xmm12,xmm12
        movaps xmm13,xmm11
        mulps xmm13,xmm13
        subps xmm2,xmm12
        subps xmm2,xmm13
        movaps xmm12,xmm2
        cmpltps xmm12,dqword[g_0_0]
        movd eax,xmm12
        cmp eax,0
        jne .t2lt0
        mulps xmm2,xmm2
        mulps xmm2,xmm2
        mulss xmm10,[grad3+r12*8+0]
        mulss xmm11,[grad3+r12*8+4]
        addps xmm10,xmm11
        mulps xmm2,xmm10  ;dot(g3[gi2],x2,y2)
        jmp .t2ge0
.t2lt0:
        xorps xmm2,xmm2
.t2ge0:
        addps xmm0,xmm1
        addps xmm0,xmm2
        mulps xmm0,dqword[g_70_0]
        pop rbx
        ret

mod12:;Snippet MOD12 (find largest integer and subtract)
        push rbx
        mov edx,055555556h
        mov ebx,eax
        mul edx  ; divide by 3
        and edx,0FFFFFFFCh
        lea edx,[edx*3]
        sub ebx,edx
        mov eax,ebx
        pop rbx
        ret
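
The mod12 routine above avoids a DIV by multiplying with 055555556h and keeping the high half of the product. Here is the same arithmetic as a C sketch (my own transcription, with a made-up function name), assuming inputs in the 0..255 range that the perm table produces.

```c
#include <stdint.h>

/* x mod 12 without a divide, mirroring the mod12 snippet:
   high 32 bits of x * 0x55555556 is x / 3 for small x,
   clearing the low two bits gives 4 * (x / 12),
   multiplying by 3 gives 12 * (x / 12), which is subtracted from x. */
static uint32_t mod12(uint32_t x)
{
    uint32_t div3 = (uint32_t)(((uint64_t)x * 0x55555556u) >> 32); /* mul edx */
    uint32_t four_q = div3 & 0xFFFFFFFCu;   /* and edx,0FFFFFFFCh */
    return x - four_q * 3u;                 /* lea edx,[edx*3] / sub ebx,edx */
}
```

For example, `mod12(25)` gives 1 and `mod12(255)` gives 3, matching `x % 12`.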

Code:
main:
        mov ebx,screen3
        movd xmm13,[F2]
        movd xmm14,[G2]
        movd xmm15,[G2b2m1]
        ; begin loops
        xor edx,edx  ; .LoopY index
.LoopY:
        xor ecx,ecx  ; .LoopX index
.LoopX:
        ; compute
        push rcx rdx
        shl rdx,32
        add rdx,rcx
        movq xmm0,rdx
        cvtdq2ps xmm0,xmm0
        mulps xmm0,dqword[g_size]
        call snoise2
        pop rdx rcx
        mulps xmm0,dqword [g_0_5]
        addps xmm0,dqword [g_0_5]
        ; clamp to [0.0,1.0]
        minps xmm0,dqword [g_1_0]
        maxps xmm0,dqword [g_0_0]
        ; convert from [0.0,1.0] to [0,255]
        mulps xmm0,dqword [g_255_0]
        cvttps2dq xmm0,xmm0
        movd eax,xmm0
        imul eax,010101h
        mov [rbx],eax
        ; advance pixel pointer
        add ebx,3
        ; continue .LoopX
        inc ecx
        cmp ecx,SIZE
        jne .LoopX
        ; continue .LoopY
        inc edx
        cmp edx,SIZE
        jne .LoopY
        ret

Defines:
Code:
SIZE = 512
SIZE_D equ 0.0078125  ; 4/512
align 4
F2 dd 4 dup 0.3660254
G2 dd 4 dup 0.2113249
G2b2m1 dd 4 dup -0.5773503
align 16
g_size dd 4 dup SIZE_D
g_0_0 dd 4 dup 0.0
g_0_5 dd 4 dup 0.5
g_1_0 dd 4 dup 1.0
g_70_0 dd 4 dup 70.0
g_255_0 dd 4 dup 255.0
grad3 dd 1.0,1.0, -1.0,1.0, 1.0,-1.0,-1.0,-1.0
      dd 1.0,0.0, -1.0,0.0, 1.0,0.0, -1.0,0.0
      dd 0.0,1.0, 0.0,-1.0, 0.0,1.0, 0.0,-1.0
perm db 151,160,137,91,90,15,131,13,201,95,96,53,194,233,7,225
     db 140,36,103,30,69,142,8,99,37,240,21,10,23,190,6,148
     db 247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32
     db 57,177,33,88,237,149,56,87,174,20,125,136,171,168,68,175
     db 74,165,71,134,139,48,27,166,77,146,158,231,83,111,229,122
     db 60,211,133,230,220,105,92,41,55,46,245,40,244,102,143,54
     db 65,25,63,161,1,216,80,73,209,76,132,187,208,89,18,169
     db 200,196,135,130,116,188,159,86,164,100,109,198,173,186,3,64
     db 52,217,226,250,124,123,5,202,38,147,118,126,255,82,85,212
     db 207,206,59,227,47,16,58,17,182,189,28,42,223,183,170,213
     db 119,248,152,2,44,154,163,70,221,153,101,155,167,43,172,9
     db 129,22,39,253,19,98,108,110,79,113,224,232,178,185,112,104
     db 218,246,97,228,251,34,242,193,238,210,144,12,191,179,162,241
     db 81,51,145,235,249,14,239,107,49,192,214,31,181,199,106,157
     db 184,84,204,176,115,121,50,45,127,4,150,254,138,236,205,93
     db 222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
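
The F2, G2 and G2b2m1 values in the defines are the usual 2D simplex skew constants: F2 = (sqrt(3) - 1) / 2, G2 = (3 - sqrt(3)) / 6 and G2b2m1 = 2*G2 - 1. The C sketch below (not from the thread; function names are made up) shows how they are used: skewing with F2 and unskewing with G2 should hand back the original coordinates, up to float rounding.

```c
/* Constants as posted in the defines above (single precision). */
static const float F2 = 0.3660254f;   /* (sqrt(3) - 1) / 2 */
static const float G2 = 0.2113249f;   /* (3 - sqrt(3)) / 6 */

/* Skew (x, y) onto the simplex grid: s = (x + y) * F2. */
static void skew2(float x, float y, float *xs, float *ys)
{
    float s = (x + y) * F2;
    *xs = x + s;
    *ys = y + s;
}

/* Unskew back to input space: t = (xs + ys) * G2. */
static void unskew2(float xs, float ys, float *x, float *y)
{
    float t = (xs + ys) * G2;
    *x = xs - t;
    *y = ys - t;
}
```

In snoise2 the skew is applied to the input point and the unskew to the integer cell corner (i, j); the round trip here is only meant as a sanity check on the constants.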


Description: Sample
Filesize: 157.88 KB
Viewed: 2059 Time(s)

512.png



_________________
My updated idol :D http://www.agner.org/optimize/
Post 18 Jul 2012, 16:57
randall



Joined: 03 Dec 2011
Posts: 153
Location: Poland
Very nice! Thanks for sharing.

_________________
https://github.com/michal-z
Post 20 Jul 2012, 14:57
kalambong



Joined: 08 Nov 2008
Posts: 165
gunblade wrote:
Very neat..

I assume you've probably come across this before, but there's a program called Terragen that does just that.

It's been a while since I looked at it or used it, but since I saw this post I looked it up again, and it looks like they've gone all commercial (although they have also improved it greatly; it's gone well beyond just a land generator):

http://www.planetside.co.uk

However, the "classic" version is still available, and so is a limited free version of version 2 (limited resolution/quality). It's just a shame it's not open source; it would have been a good codebase to compare to.



There is a closely related program called Terramaker: http://www.terraproject.de/terramaker/

It has a lot of different kinds of Perlin noise to choose from.
Post 06 Sep 2012, 06:30
catafest



Joined: 05 Aug 2010
Posts: 100
Can you make a 32-bit ELF? I don't have a 64-bit processor. Thanks.
Post 07 Dec 2012, 17:11

Copyright © 2004-2018, Tomasz Grysztar.

Powered by rwasa.