flat assembler
Message board for the users of flat assembler.

How to access 3D attribute data in a nice way?
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
I'm thinking of writing some software 3D stuff with FASM, so I would need to access, for example, some coordinate attribute data in a form like p = DWORD[x,y,z] or p = BYTE[x,y,z]. In other words, like:

MOV eax, DWORD / BYTE p[x,y,z]
and
MOV DWORD / BYTE p[x,y,z], eax

Over some space like x:0...100, y:0...100, z:0...100 or something...

How can such a thing be set up properly, and how should the data be laid out most efficiently (alignment, etc.)? Are there some macros for that?
06 Jun 2006, 16:22
Quantum

Joined: 24 Jun 2005
Posts: 122
Quantum
Quote:

p = DWORD[x,y,z]

That would be:
p = [x + y*100 + z*100*100]

Computing those scales (x100 and x10000) would require mul instructions, which are slow compared to shifts. I suggest rounding the array dimensions up to a power of 2 (i.e. 128).

x in [0,127], y in [0,127], z in [0,127]

So the above expression becomes:
p = [x + (y<<7) + (z<<14)]

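Say x is eax, y is edx and z is ecx, and p is the base address of the array:
shl ecx,14 ; z*128*128
shl edx,7 ; y*128
add eax,edx
add eax,ecx ; eax = x + (y<<7) + (z<<14)
mov eax,[p+eax] ; here we get that p DWORD.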

And you can make a macro from this if you like.
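
A minimal sketch of the power-of-two indexing above, written in C rather than FASM so the arithmetic is easy to check. The names GRID_BITS, grid_index, grid_get and grid_set are made up for illustration, and a byte-sized attribute is assumed; for DWORD attributes the index would additionally be scaled by 4.
Code:
```c
#include <stddef.h>
#include <stdint.h>

#define GRID_BITS  7                          /* 128 cells per axis (assumed) */
#define GRID_CELLS (1u << (3 * GRID_BITS))    /* 128*128*128 = 2097152 cells  */

/* Flat index for x, y, z in [0,127]: x + (y << 7) + (z << 14). */
static size_t grid_index(unsigned x, unsigned y, unsigned z)
{
    return (size_t)x | ((size_t)y << GRID_BITS) | ((size_t)z << (2 * GRID_BITS));
}

/* One byte per cell, 2 MiB in total. Static storage starts out zeroed. */
static uint8_t grid[GRID_CELLS];

static uint8_t grid_get(unsigned x, unsigned y, unsigned z)
{
    return grid[grid_index(x, y, z)];
}

static void grid_set(unsigned x, unsigned y, unsigned z, uint8_t value)
{
    grid[grid_index(x, y, z)] = value;
}
```

Because each coordinate fits in 7 bits, the three shifted fields never overlap, so OR and ADD give the same index here.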
06 Jun 2006, 18:30
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7848
Location: Kraków, Poland
Tomasz Grysztar
That's very efficient for speed but quite inefficient for the used memory if we are going to access only few actually important points in such 3D space. Thus to make a good solution we need first to know:
1) How many "attributed" point with what ranges of coordinates are we going to use;
2) What is more important for us: speed or memory usage (usually you can get more efficiency in one of those areas at the cost of the other one).
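
To put rough numbers on that trade-off, here is a small C sketch. The sparse record layout (three 1-byte coordinates plus the attribute) is purely a hypothetical example, not something proposed in this thread.
Code:
```c
#include <stddef.h>

/* Dense grid: one attribute per cell, whether the cell is used or not. */
static size_t dense_bytes(size_t cells_per_axis, size_t attr_size)
{
    return cells_per_axis * cells_per_axis * cells_per_axis * attr_size;
}

/* Hypothetical sparse list: each used point stores x, y, z (1 byte each)
   followed by its attribute. */
static size_t sparse_bytes(size_t used_points, size_t attr_size)
{
    return used_points * (3 + attr_size);
}
```

A dense 128*128*128 grid of DWORDs takes 8 MiB regardless of how many points matter, while 1000 sparse points with DWORD attributes would take 7000 bytes in this layout. The price is that looking up a point in the sparse list needs a search instead of a single indexed load.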
06 Jun 2006, 18:53
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
Quantum wrote:

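Say x is eax, y is edx and z is ecx, and p is the base address of the array:
shl ecx,14 ; z*128*128
shl edx,7 ; y*128
add eax,edx
add eax,ecx ; eax = x + (y<<7) + (z<<14)
mov eax,[p+eax] ; here we get that p DWORD.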

Thanks Quantum, the hint about power-of-2 sizes is nice. In your example, shouldn't I also do a shl eax,2 before the mov, as p would be a DWORD and be aligned to a DWORD address?

If I want to save some memory and my p(x,y,z) values are just the size of a byte, then in your example I would need
shl ecx,14
shl edx,7
add eax,edx
add eax,ecx
movsx eax,byte[p+eax]

Is that correct? I'm still relatively new to x86 asm, so if I learned it right, I can't access bytes with 'mov'; I have to use 'movsx'!?
07 Jun 2006, 17:54
Quantum

Joined: 24 Jun 2005
Posts: 122
Quantum
Quote:

In your example shouldn't I also do a shl eax,2 before the mov, as p would be a DWORD and be aligned to a DWORD address ?

Yes, I missed that detail.

Quote:

Is that correct ?

Yes. The zx/sx suffixes perform zero/sign extension (for unsigned/signed values).

Quote:

so if I learned it right I can't access bytes with 'mov'

You can. Reading a byte at the address given in eax and storing it in al:
mov al,[eax]
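
The difference between the two extensions can be sketched in C. The helper names load_zx and load_sx are invented for this example.
Code:
```c
#include <stdint.h>

/* Like movzx eax, byte [mem]: the upper 24 bits are cleared. */
static uint32_t load_zx(const uint8_t *p)
{
    return (uint32_t)*p;
}

/* Like movsx eax, byte [mem]: the upper 24 bits copy the byte's sign bit. */
static int32_t load_sx(const uint8_t *p)
{
    return (int32_t)*(const int8_t *)p;
}
```

For the byte 0xF0 the first gives 240 and the second gives -16. If the bytes are intensities in the range 0..255, the zero-extending form is probably the one wanted.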
07 Jun 2006, 19:39

Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
But why don't you interleave X,Y,Z,X,Y,... because you are going to need the appropriate XYZ together anyway!?

Maybe you could describe the intended application more precisely? I know that the 3DS file format, for example, has them interleaved. If you are using DWORDs you can add a dummy there, and this would also be efficient when using BYTEs. Example:
Code:
```
align 4
db x1,y1,z1,dummy1
db x2,y2,z2,...
; Here you have the X-coordinate always aligned
```

...another example:
Code:
```
align 16
dd x1,y1,z1,dummy1
dd x2,y2,z2,...
; This is aligned by 16 so you can access it with SSE (128-bit)
```

My theory is that separate x-, y- and z-buffers are not good for performance: however you access them, the three values belonging to one point are ALWAYS far apart in memory, so the CPU cache cannot be used effectively.
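
For comparison, the two interleaved layouts above could be written as C structs (C11 for the alignment keywords). The struct names point8 and point32 are made up for this sketch.
Code:
```c
#include <stddef.h>
#include <stdint.h>

/* Byte version: x, y, z plus one dummy byte, so each record is 4 bytes. */
struct point8 {
    uint8_t x, y, z, dummy;
};

/* DWORD version: four 32-bit fields, 16 bytes per record, aligned to 16
   so that one record can be loaded with a single aligned 128-bit access. */
struct point32 {
    _Alignas(16) uint32_t x;
    uint32_t y, z, dummy;
};
```

In both cases the dummy field keeps every record at a power-of-two size, so the address of record i is just the base plus i shifted left by 2 or by 4.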
08 Jun 2006, 09:16
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
Quote:

But why don't you interleave X,Y,Z,X,Y,... because you are going to need the appropriate XYZ together anyway!?

Maybe you could describe the intended application more precisely? I know that the 3DS file format, for example, has them interleaved. If you are using DWORDs you can add a dummy there, and this would also be efficient when using BYTEs.

Yeah, of course your solution is the way to do it if I had to access 3D coordinates. I was thinking of another application, though.

I remembered an algorithm I implemented to create a '2D fire'. In the end it's a matrix of x*y bytes: for each pixel you sum the 8 surrounding pixels plus the pixel itself, calculate the average and place it one pixel higher, and you do that for the whole matrix once per frame.

So I want to port this to 3D, as a real 'voluminous' fire, as an x * y * z matrix, with each coordinate represented by one byte. Then I have to sum 3*9 pixels, calculate the average and place the result.

As the coordinates of the pixels themselves are fixed in distance, like a perfect grid, I don't need to store them; they are just like constants.
I'll post some pseudo-algorithm when I've finished it.
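
A minimal C sketch of the 2D step as described above. The tiny grid size, the name fire_step, taking row 0 as the top, and skipping the border cells are all assumptions of this example, not details from the original implementation.
Code:
```c
#include <stdint.h>

enum { W = 4, H = 4 };   /* tiny hypothetical size, just for illustration */

/* One frame: for each interior cell, average its 3x3 neighbourhood
   (8 neighbours plus the cell itself) and store the result one row
   higher. Cells of dst that are not written keep their old value. */
static void fire_step(const uint8_t *src, uint8_t *dst)
{
    for (int y = 1; y < H - 1; y++) {
        for (int x = 1; x < W - 1; x++) {
            unsigned sum = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    sum += src[(y + dy) * W + (x + dx)];
            dst[(y - 1) * W + x] = (uint8_t)(sum / 9);
        }
    }
}
```

The 3D version would add a third loop over dz and divide by 27 instead of 9.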
08 Jun 2006, 16:36

Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
OK, I understand better now. You should really use the x,x,x...,y,y,y...,z,z,z... way and make three passes. First go through the array linearly and sum every 3 consecutive pixels. Then go through the array once again, but now along the Y coordinate: the neighbours are X elements apart (X being the length of one row), and again you sum 3 consecutive pixels. Finally, you do it the Z way, where the neighbours are X*Y elements apart.
Example:
Code:
```
;array
+--+--+--+
|9 |4 |3 |
+--+--+--+
|2 |5 |6 |
+--+--+--+
|7 |1 |8 |
+--+--+--+

First pass:
+--+--+--+
|- |16|- |
+--+--+--+
|- |13|- |
+--+--+--+
|- |16|- |
+--+--+--+

Second pass:
+--+--+--+
|- |- |- |
+--+--+--+
|- |45|- |
+--+--+--+
|- |- |- |
+--+--+--+
```

The third pass would take the third dimension into account. I hope it won't be hard, because it's very easy in my head right now.
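
The three passes can be sketched in C as follows. The 3*3*3 grid size, the names sum3_axis and box27_center, the 16-bit intermediate sums, and leaving border cells at 0 are assumptions made for this example.
Code:
```c
#include <stdint.h>
#include <string.h>

enum { N = 3 };   /* hypothetical edge length, small enough to check by hand */

/* One pass along one axis: dst[i] = src[i-stride] + src[i] + src[i+stride].
   axis 0 = X (stride 1), 1 = Y (stride N), 2 = Z (stride N*N).
   Cells on the border of that axis are left at 0. */
static void sum3_axis(const uint16_t *src, uint16_t *dst, int axis)
{
    int stride = (axis == 0) ? 1 : (axis == 1) ? N : N * N;
    memset(dst, 0, sizeof(uint16_t) * N * N * N);
    for (int z = 0; z < N; z++)
        for (int y = 0; y < N; y++)
            for (int x = 0; x < N; x++) {
                int c = (axis == 0) ? x : (axis == 1) ? y : z;
                if (c == 0 || c == N - 1)
                    continue;
                int i = x + y * N + z * N * N;
                dst[i] = (uint16_t)(src[i - stride] + src[i] + src[i + stride]);
            }
}

/* X pass, then Y pass, then Z pass; returns the 27-cell sum at the centre. */
static unsigned box27_center(const uint16_t *grid)
{
    uint16_t a[N * N * N], b[N * N * N];
    sum3_axis(grid, a, 0);
    sum3_axis(a, b, 1);
    sum3_axis(b, a, 2);
    return a[1 + 1 * N + 1 * N * N];
}
```

Putting the 3x3 array from the example into the middle layer of an otherwise empty grid gives 45 at the centre, the same as the second pass above. The sums are kept in 16 bits because 27 * 255 does not fit in a byte; the division by 27 for the average would follow the last pass.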
09 Jun 2006, 08:13
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
Finally I came up with something working...it's by no means finished, I'm just playing around with it, and the code needs huge optimizing and cleaning...

EDIT: The file is attached...

Anyway, I followed your hints, Madis, regarding the basic algorithm.

So basically it's now a 128*128*128 3D space that is 'blurred' to achieve some nice graphical effect and mapped in an easy way to the screen...I just found out that all the memory access costs quite a lot of speed and a 256*256*256 space would be too much.

Using it as a fire algorithm didn't look nice...hard to map a 3D fire, I think...

Overall I want to turn it into a kind of nice-looking memory benchmark, and I can already see a clear effect on my 2 systems:

- Sempron 1.8 GHz: 3.2 frames/sec
- Old Athlon 1.0 GHz: 0.8 frames/sec

So it no longer scales purely with CPU power; it has something to do with the memory speed, I guess.

My next steps would be to

1) Enhance the visual appearance, like put in some moving spiral dots or something, change colours, etc...

2) Make use of multi-core CPUs, like splitting the grid into 8 sections, i.e. 64*64*64*8, for the calculations

3) Optimize, correct and clean all code...
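
For step 2, the split into eight 64*64*64 blocks could be sketched in C like this. The struct origin and the function octant_origin are names invented for this example.
Code:
```c
/* Origin of block k (0..7) when a 128^3 grid is cut into eight 64^3 blocks.
   Bit 0 of k selects the X half, bit 1 the Y half, bit 2 the Z half. */
struct origin { int x, y, z; };

static struct origin octant_origin(int k)
{
    struct origin o;
    o.x = (k & 1) * 64;
    o.y = ((k >> 1) & 1) * 64;
    o.z = ((k >> 2) & 1) * 64;
    return o;
}
```

Note that a blur reads neighbours across block boundaries, so each worker would presumably have to read one cell beyond its own block while writing only inside it.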

I'm open for any comments, results on other systems from you guys out there...

Oh, and a question...how can I get rid of the hourglass during full screen !?

Also, how can I find out the theoretical maximum memory bandwidth of a system? Is there some decent hardware info tool for this?
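
As a rough rule of thumb, the theoretical peak is the bus width in bytes times the transfer rate times the number of channels. A small C sketch, where the function name peak_mb_per_s is made up; the memory configuration of the two systems above is not given in this thread, so the numbers in the note below are only a generic example.
Code:
```c
/* Peak rate in MB/s = bytes per transfer * million transfers/s * channels. */
static unsigned long peak_mb_per_s(unsigned bus_bits,
                                   unsigned mega_transfers,
                                   unsigned channels)
{
    return (unsigned long)(bus_bits / 8) * mega_transfers * channels;
}
```

For example, single-channel DDR-400 on a 64-bit bus gives 8 * 400 = 3200 MB/s, which is where the name PC3200 comes from. Sustained throughput in a real program is lower than this figure.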

Last edited by Kuemmel on 30 Oct 2006, 06:56; edited 2 times in total
29 Oct 2006, 23:44
vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
30 Oct 2006, 01:02
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
attached it to my entry now...strange, the link was working from here...
30 Oct 2006, 06:57
MichaelH

Joined: 03 May 2005
Posts: 402
MichaelH
Quote:

Oh, and a question...how can I get rid of the hourglass during full screen !?

cominvk DDraw, SetCooperativeLevel, [mainhwnd], DDSCL_FULLSCREEN or DDSCL_NORMAL

Platform SDK says DDSCL_NORMAL - Application will function as a regular Windows application.
30 Oct 2006, 07:51

Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
Kuemmel:
You can use "invoke ShowCursor, FALSE" to hide the mouse cursor, and "invoke ShowCursor, TRUE" to get it back.
30 Oct 2006, 09:45

Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Crashes on Server 2003 Enterprise x86-64. It's a laptop with 1400x1050 res. and it doesn't go to fullscreen, but hides in the corner...
Runs successfully on 2000 SP4, though

Second thoughts...might the problem be that it finishes too fast!?
31 Oct 2006, 11:06
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
Quote:

Crashes on Server 2003 Enterprise x86-64. It's a laptop with 1400x1050 res. and it doesn't go to fullscreen, but hides in the corner...
Runs successfully on 2000 SP4, though

Second thoughts...might the problem be that it finishes too fast!?

Hm, basically I use the same screen code as in my Fractal Benchmark, just the resolution is not 800x600, now it's 640x480...could that be a problem?

Hm, finishes too fast, could be. What's your frame rate result?

I limit it because I only have slow machines at hand...of course I'll change it later.
31 Oct 2006, 17:43

Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
1) Sorry, not Enterprise, but Standard. I don't know where I got the idea, that it was Enterprise :S

2) Sorry, I had weird drivers (32?) and now I've put in the correct drivers and restarted.

Now the only thing is that I can't get it to compile. Btw, the app runs at 9.9FPS or near that. I included the ddraw.inc and stuff, but it doesn't like the LockSurface thingy. Any help?
08 Nov 2006, 10:24
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel

The problem with compiling could be the same one some people had when I did the Mandelbrot benchmark. I still use the same FASM setup. I provide a link on my page to the include files used:

http://www.mikusite.de/x86/KMB_INCLUDE.zip

...maybe that helps...
08 Nov 2006, 17:39

Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
yipee I got it compiling, but I couldn't find one calculation bug. You have the box correctly drawn only with the 1024x768. Others have ghost boxes on the same height ±1 pixel.

This runs at 8.333FPS@1024x768 and 9.276FPS@640x480 resolution. Strangely, it uses only the "left core" of my T7200 :S
09 Nov 2006, 13:52
Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
Quote:

yipee I got it compiling, but I couldn't find one calculation bug. You have the box correctly drawn only with the 1024x768. Others have ghost boxes on the same height ±1 pixel.

This runs at 8.333FPS@1024x768 and 9.276FPS@640x480 resolution. Strangely, it uses only the "left core" of my T7200 :S

That's my fault. At the moment the whole code is locked to one core. Anyway, the code is currently set up so that a second CPU wouldn't help much. So don't worry about this, it's forced at the moment.

Later on I want to split the calculation of the 128*128*128 blur field into, let's say, 8 cubes of 64*64*64 and lock each one to a core...the other stuff isn't that time critical, I think, so that should help multi-core machines.

Long way to go still...not so happy with the visual look...we'll see...Christmas holiday coming soon
09 Nov 2006, 22:09

Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Hehee - make the fireworks more colourful and fly some Santas and deers around
10 Nov 2006, 07:56