flat assembler
Message board for the users of flat assembler.

 Index > Main > Calculating multi-dimensional array displacements, how? Goto page 1, 2, 3  Next
Author
 Thread
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 06:26
Linky: http://pastebin.com/m1bd40eb9

In that URL I layed out a small text describing my problem, basically I want to understand how to calculate the displacement to emulate a 3d array of bytes.

could anyone explain me how arrays in C++ or similar work?, I'd like the same output because it's for pixel data.

I'm more down to the idea / concept than how to do it in ASM.

_________________
Im new, sorry if I bothered with any stupid question
04 Dec 2007, 06:26
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 04 Dec 2007, 06:40
Just think about it very logically in little steps: point, line, plane/grid, field.
Code:
```Size_x = 11
Size_y = 7
Size_z = 5

Size_xy = Size_x * Size_y

MyArray rb Size_x * Size_y * Size_z

; get byte at index (eax,ebx,ecx)=(x,y,z)
imul ecx,Size_xy
imul ebx,Size_x
add eax,ecx
add eax,ebx
movzx eax,byte [MyArray + eax]    ```
Note: each index is zero based.

To reach a (z) index we must increment by the size of the XY plane.
To reach a (y) index we must increment by the size of the X array.
(x) index is just value in the X array.

There are plenty of shortcuts we can use in x86. For example, sizes that are powers of two (256 is really convienent (c: ).
04 Dec 2007, 06:40
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 07:29
Thanks

I found this too: http://webster.cs.ucr.edu/AoA/Windows/HTML/Arraysa2.html

I implemented their formula and it worked ok for most of my operations, however when I got to convolute my image... well... things went all wrong

Needless to say it's slow!, could I perform stream operations instead?, specially for value clamping (0, 255) because its eating a lot of cpu doing just that!
04 Dec 2007, 07:29
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 04 Dec 2007, 08:06
if i understand right, you want to build 3D with array.
one pixel for one 3D cell?

personnaly i use the xl, yl, zl labels for equates and datas

don't forget that a convolution need more datas than expected.
some "phatom" datas before and after to work well.
one other thing is the base address of each dim, are they zero based or one based?
in case of zero based, the xl, yl, zl values as ptr are "one" more than reality.
accessing an array by individual element is not good for reasons.
you can make faster if the elements are accessed in an elemental loops, and just increment the pointer , setting the repeats for each loop and coding the algorythm within the N loops.
possible in asm and in high level languages.
04 Dec 2007, 08:06
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 04 Dec 2007, 08:15
Tell me more about the operation. The description thus far doesn't make clear why a 3D array is needed for pixel indexing. Usually, the color component can be accessed as a point element rather than an array of bits/bytes. If it's a convolution or similar process the whole picture as a flat array of pixels, maybe.
04 Dec 2007, 08:15
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 08:17
yes but convolution requires specific X,Y position per pixel basis, I can't avoid calculating those!

my array is 3d because [X]}[Y][Z], where X, Y are the pixel coordinates and Z is the color component (R,G,B,A)

the only optimization I have in the displacement calculations is that I only calculate once per pixel, the following (for the Z component) is a simple additive operation (adds the total of an entire column per step).

so instead of 3 displacements I'm doing 1 and 2 subsequent adds per "getpixel" or "setpixel" call.

Last edited by SomeoneNew on 04 Dec 2007, 08:21; edited 1 time in total
04 Dec 2007, 08:17
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 04 Dec 2007, 08:27
ok, but this is not a three dim array that you need.
only a 2 dim array. the third is the data itself, pointed to by:

(esi+)x+y*xl

like a simple 2D array.

i was thinking you would make a solid 3D figure, constituted of cubic elements, disposed like a rubicube...
04 Dec 2007, 08:27
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 04 Dec 2007, 08:33
SomeoneNew wrote:
my array is 3d because [X][Y][Z], where X, Y are the pixel coordinates and Z is the color component (R,G,B,A)
I always treat RGBA as a single dword. index = y*w+x

Are all pixels being processed? Can get/set pixel be integrated into processing?

If ESI is position within graphic and EBX is width, the following addressing modes can access three pixels vertically: [esi], [esi+ebx*4], [esi+ebx*8]. Combined with immediate offsets it becomes easy to access neighboring pixels.

Any of this help?
04 Dec 2007, 08:33
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 08:35
doesn't it depend whether it's interleaved or planar data? you suggest an interleaved solution?

RGBA8 is the pixel format i'm using by the way

this is how it would look like in an HL implementation (basic like)

Code:
```pixels(10,10,0) = 64;    //r
pixels(10,10,1) = 128;  //g
pixels(10,10,2) = 64;    //b
pixels(10,10,3) = 255;  //a
```
04 Dec 2007, 08:35
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 04 Dec 2007, 08:52
pseudo code
Code:
```dim dword (5,5)
;binary result:

;  B G R  B G R  B G R  B G R  B G R
dd 000000,010101,020202,030303,040404
dd 010101,020202,030303,040404,050505
dd 020202,030303,040404,050505,060606
dd 030303,040404,050505,060606,070707
dd 040404,050505,060606,070707,080808
```
04 Dec 2007, 08:52
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 04 Dec 2007, 15:24
SomeoneNew wrote:
doesn't it depend whether it's interleaved or planar data? you suggest an interleaved solution?
Either way the algorithm can be changed to match the data. Or the data can be changed to allow for a better algorithm. We are only constrained by external dependancies.

Different HLL's implement multi-dimentional arrays differently - we need to know how it is stored in memory. Must the color planes be separate? If yes, then is their size constant? If yes, then the same algorithm works with four memory accesses:
Code:
```mov [edi+PLANE_SIZE*0],128
mov [edi+PLANE_SIZE*1],64
mov [edi+PLANE_SIZE*2],128
mov [edi+PLANE_SIZE*3],64    ```
...colaspes the inner loop and reduces the problem to two dimensions. Additionally, this can be used with the addressing method previously described if neighboring pixel data is needed.

PLANE_SIZE doesn't change in the inner loop, but if your algorithm needs to handle variable PLANE_SIZEs then it might be advanteous to have a register for each plane...there are always ways.

Maybe, each plane could be processed separately? Aren't the planes merged at some point for display/storage? If the algorithm is pixel-centric then it is always better to merge the planes and then process.

Lots of speculation because of all the ambiguity.
04 Dec 2007, 15:24
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 04 Dec 2007, 19:08
SomeoneNew wrote:
this is how it would look like in an HL implementation (basic like)

Code:
```pixels(10,10,0) = 64;    //r
pixels(10,10,1) = 128;  //g
pixels(10,10,2) = 64;    //b
pixels(10,10,3) = 255;  //a
```
Odds are this is just a single dword:
Code:
`mov [edi],0FF408040h    `
04 Dec 2007, 19:08
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 22:00
I'm ok in using a different approach, but now that I'm at it, what would be the fastest implementation?

bitRAKE: indeed it is.

_________________
Im new, sorry if I bothered with any stupid question
04 Dec 2007, 22:00
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 04 Dec 2007, 22:25
SomeoneNew wrote:
what would be the fastest implementation?
to opperate pixel(s) and not their components.
04 Dec 2007, 22:25
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 04 Dec 2007, 23:06
exact.
pixel is fast, component is at least (N components) times slower.
04 Dec 2007, 23:06
SomeoneNew

Joined: 12 Aug 2006
Posts: 54
SomeoneNew 05 Dec 2007, 03:20
Wouldn't it be slow anyway to manipulate the dwords instead of working with the bytes?

what if I wanted to fully average 2 pixel data sets?

with components you would do:

Code:
```d_r = ( s_r + d_r ) / 2
d_g = ( s_g + d_g ) / 2
d_b = ( s_b + d_b ) / 2
d_a = ( s_a + d_a ) / 2    ```

_________________
Im new, sorry if I bothered with any stupid question
05 Dec 2007, 03:20
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 05 Dec 2007, 03:37
just make this:
Code:
```;eax=pixel1
;ebx=pixel2
pxlmean:
.r:
add al,bl
jc @f
shr al,1
ror eax,8
ror ebx,8
jmp .g
@@:
shr al,1
or al,80h
ror eax,8
ror ebx,8
.g:
add al,bl
jc @f
shr al,1
ror eax,8
ror ebx,8
jmp .b
@@:
shr al,1
or al,80h
ror eax,8
ror ebx,8
.b:
add al,bl
jc @f
shr al,1
ror eax,8
ror ebx,8
jmp .a
@@:
shr al,1
or al,80h
ror eax,8
ror ebx,8
.a
add al,bl
jc @f
shr al,1
ror eax,8
ror ebx,8
jmp .g
@@:
shr al,1
or al,80h
ror eax,8
ror ebx,8
ret
```

with this algorythm, the mean of two odd component give an odd component
if it's not important, then
Code:
```pxlmean:
.r:
shr al,1
shr bl,1
add al,bl
ror eax,8
ror ebx,8
.g:
shr al,1
shr bl,1
add al,bl
ror eax,8
ror ebx,8
.b:
shr al,1
shr bl,1
add al,bl
ror eax,8
ror ebx,8
.a:
shr al,1
shr bl,1
add al,bl
ror eax,8
;eax=pixel mean
ret
```

on IA32 it's fast to work on bytes and dword but not for words.
05 Dec 2007, 03:37
LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 05 Dec 2007, 04:01
edfed, modification of your first code:
Code:
```;eax=pixel1
;ebx=pixel2
pxlmean:
.r:
add al,bl
rcr al,1
ror eax,8
ror ebx,8

.g:
add al,bl
rcr al,1
ror eax,8
ror ebx,8

.b:
add al,bl
rcr al, 1
ror eax,8
ror ebx,8

.a:
add al,bl
rcr al, 1
ror eax,8
ror ebx,8
;          jmp .g  ;I have supposed that this jump is wrong

ret
```
05 Dec 2007, 04:01
edfed

Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 05 Dec 2007, 04:10
great, i didn't think about the rcr instruction
one other method, bit magic!
Code:
```pxlmean:
mov ecx,eax
mov edx,ebx
and eax,not 80808080h
and ebx,not 80808080h
add eax,ebx
mov ebx,eax
and ebx,01010101h
and eax,not 01010101h
shr eax,1
add eax,ebx
and ecx,80808080h
shr ecx,1
and edx,80808080h
shr edx,1
add eax,ecx
ret
```
05 Dec 2007, 04:10
bitRAKE

Joined: 21 Jul 2003
Posts: 3957
Location: vpcmipstrm
bitRAKE 05 Dec 2007, 04:44
Don't forget PAVGB - can't beat that for speed.

Basically, memory access is usually the bottleneck - even if the data is in the cache memory access takes at least another cycle. Sometimes this latency can be hidden. Each algorithm needs to be looked at separately because there are times when everything is geared towards byte loads and parrallel data operations are slower because of all the data massaging needed. Knowing which instructions are availible changes the whole perspective, though.
05 Dec 2007, 04:44
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area
Goto page 1, 2, 3  Next

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.