flat assembler
Message board for the users of flat assembler.
Index
> Main > Calculating multi-dimensional array displacements, how? Goto page 1, 2, 3 Next |
Author |
|
SomeoneNew 04 Dec 2007, 06:26
Linky: http://pastebin.com/m1bd40eb9
In that URL I layed out a small text describing my problem, basically I want to understand how to calculate the displacement to emulate a 3d array of bytes. could anyone explain me how arrays in C++ or similar work?, I'd like the same output because it's for pixel data. I'm more down to the idea / concept than how to do it in ASM. _________________ Im new, sorry if I bothered with any stupid question |
|||
04 Dec 2007, 06:26 |
|
SomeoneNew 04 Dec 2007, 07:29
Thanks
I found this too: http://webster.cs.ucr.edu/AoA/Windows/HTML/Arraysa2.html I implemented their formula and it worked ok for most of my operations, however when I got to convolute my image... well... things went all wrong Needless to say it's slow!, could I perform stream operations instead?, specially for value clamping (0, 255) because its eating a lot of cpu doing just that! |
|||
04 Dec 2007, 07:29 |
|
edfed 04 Dec 2007, 08:06
if i understand right, you want to build 3D with array.
one pixel for one 3D cell? personnaly i use the xl, yl, zl labels for equates and datas don't forget that a convolution need more datas than expected. some "phatom" datas before and after to work well. one other thing is the base address of each dim, are they zero based or one based? in case of zero based, the xl, yl, zl values as ptr are "one" more than reality. accessing an array by individual element is not good for reasons. you can make faster if the elements are accessed in an elemental loops, and just increment the pointer , setting the repeats for each loop and coding the algorythm within the N loops. possible in asm and in high level languages. |
|||
04 Dec 2007, 08:06 |
|
bitRAKE 04 Dec 2007, 08:15
Tell me more about the operation. The description thus far doesn't make clear why a 3D array is needed for pixel indexing. Usually, the color component can be accessed as a point element rather than an array of bits/bytes. If it's a convolution or similar process the whole picture as a flat array of pixels, maybe.
|
|||
04 Dec 2007, 08:15 |
|
SomeoneNew 04 Dec 2007, 08:17
yes but convolution requires specific X,Y position per pixel basis, I can't avoid calculating those!
my array is 3d because [X]}[Y][Z], where X, Y are the pixel coordinates and Z is the color component (R,G,B,A) the only optimization I have in the displacement calculations is that I only calculate once per pixel, the following (for the Z component) is a simple additive operation (adds the total of an entire column per step). so instead of 3 displacements I'm doing 1 and 2 subsequent adds per "getpixel" or "setpixel" call. Last edited by SomeoneNew on 04 Dec 2007, 08:21; edited 1 time in total |
|||
04 Dec 2007, 08:17 |
|
edfed 04 Dec 2007, 08:27
ok, but this is not a three dim array that you need.
only a 2 dim array. the third is the data itself, pointed to by: (esi+)x+y*xl like a simple 2D array. i was thinking you would make a solid 3D figure, constituted of cubic elements, disposed like a rubicube... |
|||
04 Dec 2007, 08:27 |
|
bitRAKE 04 Dec 2007, 08:33
SomeoneNew wrote: my array is 3d because [X][Y][Z], where X, Y are the pixel coordinates and Z is the color component (R,G,B,A) Are all pixels being processed? Can get/set pixel be integrated into processing? If ESI is position within graphic and EBX is width, the following addressing modes can access three pixels vertically: [esi], [esi+ebx*4], [esi+ebx*8]. Combined with immediate offsets it becomes easy to access neighboring pixels. Any of this help? |
|||
04 Dec 2007, 08:33 |
|
SomeoneNew 04 Dec 2007, 08:35
doesn't it depend whether it's interleaved or planar data? you suggest an interleaved solution?
RGBA8 is the pixel format i'm using by the way this is how it would look like in an HL implementation (basic like) Code: pixels(10,10,0) = 64; //r pixels(10,10,1) = 128; //g pixels(10,10,2) = 64; //b pixels(10,10,3) = 255; //a |
|||
04 Dec 2007, 08:35 |
|
edfed 04 Dec 2007, 08:52
pseudo code
Code: dim dword (5,5) ;binary result: ; B G R B G R B G R B G R B G R dd 000000,010101,020202,030303,040404 dd 010101,020202,030303,040404,050505 dd 020202,030303,040404,050505,060606 dd 030303,040404,050505,060606,070707 dd 040404,050505,060606,070707,080808 |
|||
04 Dec 2007, 08:52 |
|
bitRAKE 04 Dec 2007, 15:24
SomeoneNew wrote: doesn't it depend whether it's interleaved or planar data? you suggest an interleaved solution? Different HLL's implement multi-dimentional arrays differently - we need to know how it is stored in memory. Must the color planes be separate? If yes, then is their size constant? If yes, then the same algorithm works with four memory accesses: Code: mov [edi+PLANE_SIZE*0],128 mov [edi+PLANE_SIZE*1],64 mov [edi+PLANE_SIZE*2],128 mov [edi+PLANE_SIZE*3],64 PLANE_SIZE doesn't change in the inner loop, but if your algorithm needs to handle variable PLANE_SIZEs then it might be advanteous to have a register for each plane...there are always ways. Maybe, each plane could be processed separately? Aren't the planes merged at some point for display/storage? If the algorithm is pixel-centric then it is always better to merge the planes and then process. Lots of speculation because of all the ambiguity. |
|||
04 Dec 2007, 15:24 |
|
bitRAKE 04 Dec 2007, 19:08
SomeoneNew wrote: this is how it would look like in an HL implementation (basic like) Code: mov [edi],0FF408040h |
|||
04 Dec 2007, 19:08 |
|
SomeoneNew 04 Dec 2007, 22:00
I'm ok in using a different approach, but now that I'm at it, what would be the fastest implementation?
bitRAKE: indeed it is. _________________ Im new, sorry if I bothered with any stupid question |
|||
04 Dec 2007, 22:00 |
|
bitRAKE 04 Dec 2007, 22:25
SomeoneNew wrote: what would be the fastest implementation? |
|||
04 Dec 2007, 22:25 |
|
edfed 04 Dec 2007, 23:06
exact.
pixel is fast, component is at least (N components) times slower. |
|||
04 Dec 2007, 23:06 |
|
SomeoneNew 05 Dec 2007, 03:20
Wouldn't it be slow anyway to manipulate the dwords instead of working with the bytes?
what if I wanted to fully average 2 pixel data sets? with components you would do: Code: d_r = ( s_r + d_r ) / 2 d_g = ( s_g + d_g ) / 2 d_b = ( s_b + d_b ) / 2 d_a = ( s_a + d_a ) / 2 _________________ Im new, sorry if I bothered with any stupid question |
|||
05 Dec 2007, 03:20 |
|
edfed 05 Dec 2007, 03:37
just make this:
Code: ;eax=pixel1 ;ebx=pixel2 pxlmean: .r: add al,bl jc @f shr al,1 ror eax,8 ror ebx,8 jmp .g @@: shr al,1 or al,80h ror eax,8 ror ebx,8 .g: add al,bl jc @f shr al,1 ror eax,8 ror ebx,8 jmp .b @@: shr al,1 or al,80h ror eax,8 ror ebx,8 .b: add al,bl jc @f shr al,1 ror eax,8 ror ebx,8 jmp .a @@: shr al,1 or al,80h ror eax,8 ror ebx,8 .a add al,bl jc @f shr al,1 ror eax,8 ror ebx,8 jmp .g @@: shr al,1 or al,80h ror eax,8 ror ebx,8 ret with this algorythm, the mean of two odd component give an odd component if it's not important, then Code: pxlmean: .r: shr al,1 shr bl,1 add al,bl ror eax,8 ror ebx,8 .g: shr al,1 shr bl,1 add al,bl ror eax,8 ror ebx,8 .b: shr al,1 shr bl,1 add al,bl ror eax,8 ror ebx,8 .a: shr al,1 shr bl,1 add al,bl ror eax,8 ;eax=pixel mean ret on IA32 it's fast to work on bytes and dword but not for words. |
|||
05 Dec 2007, 03:37 |
|
LocoDelAssembly 05 Dec 2007, 04:01
edfed, modification of your first code:
Code: ;eax=pixel1 ;ebx=pixel2 pxlmean: .r: add al,bl rcr al,1 ror eax,8 ror ebx,8 .g: add al,bl rcr al,1 ror eax,8 ror ebx,8 .b: add al,bl rcr al, 1 ror eax,8 ror ebx,8 .a: add al,bl rcr al, 1 ror eax,8 ror ebx,8 ; jmp .g ;I have supposed that this jump is wrong ret |
|||
05 Dec 2007, 04:01 |
|
edfed 05 Dec 2007, 04:10
great, i didn't think about the rcr instruction
one other method, bit magic! Code: pxlmean: mov ecx,eax mov edx,ebx and eax,not 80808080h and ebx,not 80808080h add eax,ebx mov ebx,eax and ebx,01010101h and eax,not 01010101h shr eax,1 add eax,ebx and ecx,80808080h shr ecx,1 and edx,80808080h shr edx,1 add eax,ecx ret |
|||
05 Dec 2007, 04:10 |
|
bitRAKE 05 Dec 2007, 04:44
Don't forget PAVGB - can't beat that for speed.
Basically, memory access is usually the bottleneck - even if the data is in the cache memory access takes at least another cycle. Sometimes this latency can be hidden. Each algorithm needs to be looked at separately because there are times when everything is geared towards byte loads and parrallel data operations are slower because of all the data massaging needed. Knowing which instructions are availible changes the whole perspective, though. |
|||
05 Dec 2007, 04:44 |
|
Goto page 1, 2, 3 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.