flat assembler
Message board for the users of flat assembler.
How to access 3D attribute data in a nice way?
Quantum 06 Jun 2006, 18:30
Quote:
That would be: p = [x + y*100 + z*100*100]. Computing those scales (x100 and x10000) would require using a mul instruction. That's very inefficient. I suggest rounding the array dimensions up to a power of 2 (e.g. 128): x ∈ [0,127], y ∈ [0,127], z ∈ [0,127]. The expression above then becomes: p = [x + (y<<7) + (z<<14)]. Say x is eax, y is edx and z is ecx:
Code:
shl ecx,14
shl edx,7
add eax,edx
add eax,ecx
mov eax,[eax] ; here we get that p DWORD
And you can make a macro from this if you like.
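For readers who want to check the index arithmetic outside the assembler, here is a minimal C sketch of the power-of-two addressing described above. The names `cell_index` and `read_cell` and the fixed 128-cell dimension are assumptions made for illustration; they do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DIM_BITS = 7, DIM = 1 << DIM_BITS };  /* 128 cells per axis (assumed) */

/* Offset of cell (x, y, z), counted in elements; x varies fastest.
   The shifts replace the multiplications by 128 and 128*128. */
static size_t cell_index(unsigned x, unsigned y, unsigned z)
{
    assert(x < DIM && y < DIM && z < DIM);
    return (size_t)x + ((size_t)y << DIM_BITS) + ((size_t)z << (2 * DIM_BITS));
}

/* Read the DWORD attribute of one cell. Indexing a uint32_t array scales
   the element offset by 4 and adds the array's base address. */
static uint32_t read_cell(const uint32_t *grid, unsigned x, unsigned y, unsigned z)
{
    return grid[cell_index(x, y, z)];
}
```

Note that the C array access adds the base address and the element size implicitly; in assembly both have to be written out explicitly.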
Tomasz Grysztar 06 Jun 2006, 18:53
That's very efficient for speed but quite inefficient in memory use if we are going to access only a few actually important points in such a 3D space. Thus, to find a good solution we first need to know:
1) How many "attributed" points, and with what ranges of coordinates, we are going to use;
2) What is more important for us: speed or memory usage (usually you can gain efficiency in one of those areas at the cost of the other).
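To make the speed/memory trade-off concrete, here is one possible memory-saving layout, sketched in C: only the "attributed" points are stored, in a small hash table keyed by the packed coordinates. This is an illustration under assumptions, not something proposed in the thread; the table capacity, the hash constant, and all names (`SparseGrid`, `sparse_set`, `sparse_get`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { TABLE_BITS = 10, TABLE_SIZE = 1 << TABLE_BITS };  /* assumed capacity */
#define EMPTY_KEY UINT32_MAX  /* packed keys use only 21 bits, so this value is free */

typedef struct {
    uint32_t key[TABLE_SIZE];
    uint8_t  value[TABLE_SIZE];
} SparseGrid;

/* Pack three 7-bit coordinates into one key. */
static uint32_t pack_key(uint32_t x, uint32_t y, uint32_t z)
{
    assert(x < 128 && y < 128 && z < 128);
    return x | (y << 7) | (z << 14);
}

static void sparse_init(SparseGrid *g)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++)
        g->key[i] = EMPTY_KEY;
}

/* Linear probing: returns the slot holding `key`, or the empty slot where it
   would go. The caller must keep the table from filling up completely. */
static uint32_t find_slot(const SparseGrid *g, uint32_t key)
{
    uint32_t slot = (key * 2654435761u) >> (32 - TABLE_BITS);
    while (g->key[slot] != EMPTY_KEY && g->key[slot] != key)
        slot = (slot + 1) & (TABLE_SIZE - 1);
    return slot;
}

static void sparse_set(SparseGrid *g, uint32_t x, uint32_t y, uint32_t z, uint8_t v)
{
    uint32_t key = pack_key(x, y, z);
    uint32_t slot = find_slot(g, key);
    g->key[slot] = key;
    g->value[slot] = v;
}

/* Points that were never stored read back as 0. */
static uint8_t sparse_get(const SparseGrid *g, uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t key = pack_key(x, y, z);
    uint32_t slot = find_slot(g, key);
    return g->key[slot] == key ? g->value[slot] : 0;
}
```

With these assumed sizes the table takes 5 KB instead of the 2 MB of a dense 128*128*128 byte grid, but every access pays for a hash and a probe loop. If most cells are in use, as in a blur over the whole volume, the dense array is the better fit.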
Kuemmel 07 Jun 2006, 17:54
Quantum wrote:
Thanks Quantum, the hint about the power of 2 addresses is nice. In your example, shouldn't I also do a shl eax,2 before the mov, as p would be a DWORD and be aligned to a DWORD address? If I want to save some memory and my p(x,y,z) value is just the size of a byte, in your example I would need:
Code:
shl ecx,14
shl edx,7
add eax,edx
add eax,ecx
movsx eax,byte[eax]
Is that correct? I'm still relatively new to x86 asm, so if I learned it right I can't access bytes with 'mov', I have to use 'movsx'!?
Quantum 07 Jun 2006, 19:39
Quote:
Yes, I missed that detail.
Quote:
Yes. The zx/sx suffixes (movzx/movsx) perform zero/sign extension.
Quote:
Reading a byte at the address given in eax and storing it in al:
Code:
mov al,[eax]
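The difference between the two extensions can be sketched in C as follows. The function names are made up for this example, and the cast to `int8_t` assumes the usual two's-complement behaviour of mainstream compilers.

```c
#include <stdint.h>

/* Like movzx eax, byte [p]: the upper 24 bits are cleared. */
static uint32_t load_byte_zero_extended(const uint8_t *p)
{
    return *p;
}

/* Like movsx eax, byte [p]: the upper 24 bits are copies of bit 7.
   Assumption: converting an out-of-range value to int8_t wraps around,
   as it does on common two's-complement compilers. */
static int32_t load_byte_sign_extended(const uint8_t *p)
{
    return (int8_t)*p;
}
```

For byte values used as brightness or intensity (0 to 255), the zero-extended form is the one that keeps values above 127 positive.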
Madis731 08 Jun 2006, 09:16
But why don't you interleave X,Y,Z,X,Y,... since you are going to need the matching XYZ together anyway!?
Maybe you could describe more precisely what the application is going to be? I know that the 3DS file format, for example, has them interleaved. If you are using DWORDs you can add a dummy there, and this would also be efficient when using BYTEs. Example:
Code:
align 4
db x1,y1,z1,dummy1
db x2,y2,z2,... ; Here you have the X-coordinate always aligned
...another example:
Code:
align 16
dd x1,y1,z1,dummy1
dd x2,y2,z2,... ; This is aligned by 16 so you can access it with SSE (128-bit)
My theory is that when you use an x-buffer, y-buffer and z-buffer separately, it does not perform well, because no matter how you access them they are ALWAYS far apart and the CPU cannot use the cache effectively.
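The two interleaved layouts can be written down as C structs, which makes the sizes and alignment easy to check. This is a sketch only; the type names `PointB` and `PointD` and the helper `sum_xyz` are assumptions, and `_Alignas` requires a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte coordinates: the dummy byte pads each point to 4 bytes. */
typedef struct { uint8_t x, y, z, dummy; } PointB;

/* DWORD coordinates: the dummy pads each point to 16 bytes, and the
   alignment request puts every point on a 16-byte boundary. */
typedef struct { _Alignas(16) int32_t x; int32_t y, z, dummy; } PointD;

_Static_assert(sizeof(PointB) == 4, "one point per DWORD");
_Static_assert(sizeof(PointD) == 16 && _Alignof(PointD) == 16,
               "one point per 128-bit vector");

/* All three coordinates of a point sit next to each other in memory,
   so one access pattern touches them together. */
static int32_t sum_xyz(const PointD *p)
{
    return p->x + p->y + p->z;
}
```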
Kuemmel 08 Jun 2006, 16:36
Madis731 wrote: But why don't you interleave X,Y,Z,X,Y,... because you are going to need the appropriate XYZ together anyway!?
Yeah, of course your solution is the way to do it if I had to access 3D coordinates. I was thinking of another application, though. I remembered an algorithm I was implementing to create a '2D fire'. In the end it's a matrix of x*y bytes, and you have to calculate sums of pixels (the 8 pixels surrounding one, plus that one), calculate the average and place it one pixel higher, basically for the whole matrix for each frame. So I want to port this to 3D, as a real 'voluminous' fire, as an x * y * z matrix, each coordinate represented by one byte. Then I'll have to sum 3*9 pixels, calculate an average and place the result. As the coordinates of the pixels themselves are fixed in distance, like a perfect grid, I don't need to store them; they are just like constants. I'll post some pseudo-algorithm when I've finished it.
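One frame of the 2D version described above might look like this in C. It is a sketch under assumptions: row 0 is taken to be the top of the image (so "one pixel higher" means row y-1), border pixels are simply skipped, and the name `fire_step` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* For every interior pixel, average the 3x3 block around it (the pixel
   plus its 8 neighbours) and store the result one row higher in dst.
   src and dst must be separate w*h byte buffers. */
static void fire_step(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    for (size_t y = 1; y + 1 < h; y++) {
        for (size_t x = 1; x + 1 < w; x++) {
            unsigned sum = 0;
            for (size_t dy = 0; dy < 3; dy++)
                for (size_t dx = 0; dx < 3; dx++)
                    sum += src[(y - 1 + dy) * w + (x - 1 + dx)];
            dst[(y - 1) * w + x] = (uint8_t)(sum / 9);
        }
    }
}
```

The 3D variant would sum 27 cells instead of 9 and divide by 27; the nested neighbour loops get expensive there, which is why a cheaper formulation is worth looking for.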
Madis731 09 Jun 2006, 08:13
Ok, I understand better now. You should really use the x,x,x...,y,y,y...,z,z,z... way and make three passes. First go through the array linearly and sum every 3 consecutive pixels, then go through the array once again, but now along the Y-coordinate: you need to step by X elements each time, again summing 3 consecutive pixels. Finally, you do it the Z-way by stepping X*Y elements.
Example:
Code:
;array
+--+--+--+
|9 |4 |3 |
+--+--+--+
|2 |5 |6 |
+--+--+--+
|7 |1 |8 |
+--+--+--+
First pass:
+--+--+--+
|- |16|- |
+--+--+--+
|- |13|- |
+--+--+--+
|- |16|- |
+--+--+--+
Second pass:
+--+--+--+
|- |- |- |
+--+--+--+
|- |45|- |
+--+--+--+
|- |- |- |
+--+--+--+
The third pass would take into account the third dimension. I hope it won't be hard, because it's very easy in my head right now.
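The three-pass idea can be sketched in C as below. Each pass sums three neighbours along one axis, using a stride of 1, X or X*Y elements. Several details are assumptions for the sake of a runnable example: 16-bit sums, cells outside the grid counted as zero, a separate scratch buffer, and the names `sum3_along` and `box_sum_3x3x3`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { AXIS_X, AXIS_Y, AXIS_Z };

/* dst[i] = src[i - stride] + src[i] + src[i + stride] along one axis;
   neighbours outside the grid contribute nothing. */
static void sum3_along(const uint16_t *src, uint16_t *dst,
                       size_t nx, size_t ny, size_t nz, int axis)
{
    const size_t stride = axis == AXIS_X ? 1 : axis == AXIS_Y ? nx : nx * ny;
    const size_t len    = axis == AXIS_X ? nx : axis == AXIS_Y ? ny : nz;
    for (size_t z = 0; z < nz; z++)
        for (size_t y = 0; y < ny; y++)
            for (size_t x = 0; x < nx; x++) {
                size_t i = x + y * nx + z * nx * ny;
                size_t c = axis == AXIS_X ? x : axis == AXIS_Y ? y : z;
                unsigned s = src[i];
                if (c > 0)       s += src[i - stride];
                if (c + 1 < len) s += src[i + stride];
                dst[i] = (uint16_t)s;
            }
}

/* Three passes give the 3x3x3 sum: 9 additions' worth of work per cell
   instead of 27. With byte inputs the sums stay below 27*255. */
static void box_sum_3x3x3(uint16_t *data, uint16_t *tmp,
                          size_t nx, size_t ny, size_t nz)
{
    sum3_along(data, tmp, nx, ny, nz, AXIS_X);
    sum3_along(tmp, data, nx, ny, nz, AXIS_Y);
    sum3_along(data, tmp, nx, ny, nz, AXIS_Z);
    memcpy(data, tmp, nx * ny * nz * sizeof *data);
}
```

Run on the 3x3 example from the post (with a depth of 1), the centre cell ends up as 45, matching the second-pass table. Dividing each result by 27 would turn the sums into the average the blur needs.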
Kuemmel 29 Oct 2006, 23:44
Finally I came up with something working... it's by no means finished, I'm just playing around with it, and the code needs a lot of optimizing and cleaning...
EDIT: The file is attached...
Anyway, I followed your hints, Madis, regarding the basic algorithm. So basically it's now a 128*128*128 3D space that is 'blurred' to achieve some nice graphical effect and mapped in an easy way to the screen... I just found out that all the memory access costs quite a lot of speed, and a 256*256*256 space would be too much. Using it as a fire algorithm didn't look nice... it's hard to map a 3D fire, I think...
Overall I want to turn it into a kind of nice-looking memory benchmark, and I already see a quite clear effect on my 2 systems:
- Sempron 1.8 GHz: 3.2 frames/sec
- Old Athlon 1.0 GHz: 0.8 frames/sec
So it no longer scales with pure CPU power; it has something to do with the memory speed, I guess.
My next steps would be to
1) Enhance the visual appearance, like putting in some moving spiral dots or something, changing colours, etc.
2) Make use of multi-core CPUs, e.g. by splitting the grid into 8 sections of 64*64*64 for the calculations
3) Optimize, correct and clean all the code...
I'm open to any comments and results on other systems from you guys out there...
Oh, and a question... how can I get rid of the hourglass during full screen!? Also, how can I find out the theoretical maximum memory bandwidth of a system? Is there some decent hardware info tool for this?
Last edited by Kuemmel on 30 Oct 2006, 06:56; edited 2 times in total
vid 30 Oct 2006, 01:02
your link doesn't work for me
Kuemmel 30 Oct 2006, 06:57
attached it to my entry now...strange, the link was working from here...
MichaelH 30 Oct 2006, 07:51
Quote:
Code:
cominvk DDraw, SetCooperativeLevel, [mainhwnd], DDSCL_FULLSCREEN or DDSCL_NORMAL
Platform SDK says: DDSCL_NORMAL - Application will function as a regular Windows application.
madmatt 30 Oct 2006, 09:45
Kuemmel:
You can use "invoke ShowCursor, FALSE" to hide the mouse cursor, and "invoke ShowCursor, TRUE" to get it back.
Madis731 31 Oct 2006, 11:06
Crashes on Server 2003 Enterprise x86-64. It's a laptop with 1400x1050 res. and it doesn't go to fullscreen, but hides in the corner...
Runs successfully on 2000 SP4, though. On second thought... might the problem be that it finishes too fast!?
Kuemmel 31 Oct 2006, 17:43
Madis731 wrote: Crashes on Server 2003 Enterprise x86-64. Its a laptop with 1400x1050 res. and it doesn't go to fullscreen, but hides in the corner...
Hm, basically I use the same screen code as in my Fractal Benchmark, only the resolution is now 640x480 instead of 800x600... could that be a problem? Hm, finishing too fast could be it; what's your frame rate result? I limit it because I only have slow machines at hand... of course I'll change it later.
Madis731 08 Nov 2006, 10:24
1) Sorry, not Enterprise, but Standard. I don't know where I got the idea that it was Enterprise :S
2) Sorry, I had weird drivers (32?), and now I've installed the correct drivers and restarted. Now the only thing is that I can't get it to compile. Btw, the app runs at 9.9 FPS or near that. I included ddraw.inc and stuff, but it doesn't like the LockSurface thingy. Any help?
Kuemmel 08 Nov 2006, 17:39
Hi Madis731,
the problem with compiling could be the same one some people had with my Mandelbrot benchmark. I still use the same FASM setup. I provide a link on my page to the include files used: http://www.mikusite.de/x86/KMB_INCLUDE.zip ...maybe that helps...
Madis731 09 Nov 2006, 13:52
Yippee, I got it compiling, but I couldn't find one calculation bug. You have the box correctly drawn only at 1024x768. The other resolutions have ghost boxes at the same height ±1 pixel.
This runs at 8.333 FPS@1024x768 and 9.276 FPS@640x480 resolution. The strange thing is that it uses only the "left core" of my T7200 :S
Kuemmel 09 Nov 2006, 22:09
Madis731 wrote: yipee I got it compiling, but I couldn't find one calculation bug. You have the box correctly drawn only with the 1024x768. Others have ghost boxes on the same height ±1 pixel.
That's my fault. At the moment the whole code is locked to one core. Anyway, the code is set up in such a way that a second CPU wouldn't help much. So don't worry about this, it's forced at the moment. Later on I want to change the calculation of the 128*128*128 blur field to, let's say, 8 cubes of 64*64*64 each and lock each of them to a core... the other stuff isn't that time critical, I think, so that should help multi-core machines. Long way to go still... I'm not so happy with the visual look... we'll see... Christmas holidays are coming soon.
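The planned split into 8 cubes of 64*64*64 could be enumerated as in this C sketch. The bit assignment (bit 0 selects the x half, bit 1 the y half, bit 2 the z half) and the names `Block` and `octant_origin` are assumptions made for illustration; the thread does not say how the cubes would be numbered or scheduled.

```c
#include <assert.h>

enum { GRID = 128, HALF = GRID / 2 };

/* Lowest corner of one 64*64*64 sub-cube. */
typedef struct { unsigned x0, y0, z0; } Block;

/* Map a worker number 0..7 to the origin of its sub-cube. */
static Block octant_origin(unsigned i)
{
    assert(i < 8);
    Block b = { (i & 1u) * HALF, ((i >> 1) & 1u) * HALF, ((i >> 2) & 1u) * HALF };
    return b;
}
```

One thing to watch with such a split: blurring a cell on the face of a sub-cube reads neighbours that belong to the adjacent sub-cube, so it is probably simplest to have all workers read from one shared source grid and write into a separate destination grid.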
Madis731 10 Nov 2006, 07:56
Hehee - make the fireworks more colourful and fly some Santas and reindeer around
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.