flat assembler
Message board for the users of flat assembler.

Index > Main > Calculating multi-dimensional array displacements, how?

Goto page 1, 2, 3  Next
Author
Thread Post new topic Reply to topic
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 06:26
Linky: http://pastebin.com/m1bd40eb9

In that URL I layed out a small text describing my problem, basically I want to understand how to calculate the displacement to emulate a 3d array of bytes.

could anyone explain me how arrays in C++ or similar work?, I'd like the same output because it's for pixel data.

I'm more down to the idea / concept than how to do it in ASM.

_________________
Im new, sorry if I bothered with any stupid question Smile
Post 04 Dec 2007, 06:26
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 04 Dec 2007, 06:40
Just think about it very logically in little steps: point, line, plane/grid, field.
Code:
Size_x = 11
Size_y = 7
Size_z = 5

Size_xy = Size_x * Size_y

MyArray rb Size_x * Size_y * Size_z

; get byte at index (eax,ebx,ecx)=(x,y,z)
imul ecx,Size_xy
imul ebx,Size_x
add eax,ecx
add eax,ebx
movzx eax,byte [MyArray + eax]    
Note: each index is zero based.

To reach a (z) index we must increment by the size of the XY plane.
To reach a (y) index we must increment by the size of the X array.
(x) index is just value in the X array.

There are plenty of shortcuts we can use in x86. For example, sizes that are powers of two (256 is really convienent (c: ).
Post 04 Dec 2007, 06:40
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 07:29
Thanks Smile

I found this too: http://webster.cs.ucr.edu/AoA/Windows/HTML/Arraysa2.html

I implemented their formula and it worked ok for most of my operations, however when I got to convolute my image... well... things went all wrong Sad

Needless to say it's slow!, could I perform stream operations instead?, specially for value clamping (0, 255) because its eating a lot of cpu doing just that!
Post 04 Dec 2007, 07:29
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 04 Dec 2007, 08:06
if i understand right, you want to build 3D with array.
one pixel for one 3D cell?

personnaly i use the xl, yl, zl labels for equates and datas

don't forget that a convolution need more datas than expected.
some "phatom" datas before and after to work well.
one other thing is the base address of each dim, are they zero based or one based?
in case of zero based, the xl, yl, zl values as ptr are "one" more than reality.
accessing an array by individual element is not good for reasons.
you can make faster if the elements are accessed in an elemental loops, and just increment the pointer , setting the repeats for each loop and coding the algorythm within the N loops.
possible in asm and in high level languages.
Post 04 Dec 2007, 08:06
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 04 Dec 2007, 08:15
Tell me more about the operation. The description thus far doesn't make clear why a 3D array is needed for pixel indexing. Usually, the color component can be accessed as a point element rather than an array of bits/bytes. If it's a convolution or similar process the whole picture as a flat array of pixels, maybe.
Post 04 Dec 2007, 08:15
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 08:17
yes but convolution requires specific X,Y position per pixel basis, I can't avoid calculating those!

my array is 3d because [X]}[Y][Z], where X, Y are the pixel coordinates and Z is the color component (R,G,B,A)

the only optimization I have in the displacement calculations is that I only calculate once per pixel, the following (for the Z component) is a simple additive operation (adds the total of an entire column per step).

so instead of 3 displacements I'm doing 1 and 2 subsequent adds per "getpixel" or "setpixel" call.


Last edited by SomeoneNew on 04 Dec 2007, 08:21; edited 1 time in total
Post 04 Dec 2007, 08:17
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 04 Dec 2007, 08:27
ok, but this is not a three dim array that you need.
only a 2 dim array. the third is the data itself, pointed to by:

(esi+)x+y*xl

like a simple 2D array.

i was thinking you would make a solid 3D figure, constituted of cubic elements, disposed like a rubicube...
Post 04 Dec 2007, 08:27
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 04 Dec 2007, 08:33
SomeoneNew wrote:
my array is 3d because [X][Y][Z], where X, Y are the pixel coordinates and Z is the color component (R,G,B,A)
I always treat RGBA as a single dword. index = y*w+x

Are all pixels being processed? Can get/set pixel be integrated into processing?

If ESI is position within graphic and EBX is width, the following addressing modes can access three pixels vertically: [esi], [esi+ebx*4], [esi+ebx*8]. Combined with immediate offsets it becomes easy to access neighboring pixels.

Any of this help?
Post 04 Dec 2007, 08:33
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 08:35
doesn't it depend whether it's interleaved or planar data? you suggest an interleaved solution?

RGBA8 is the pixel format i'm using by the way

this is how it would look like in an HL implementation (basic like)

Code:
pixels(10,10,0) = 64;    //r
pixels(10,10,1) = 128;  //g
pixels(10,10,2) = 64;    //b
pixels(10,10,3) = 255;  //a
    
Post 04 Dec 2007, 08:35
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 04 Dec 2007, 08:52
pseudo code
Code:
dim dword (5,5)
;binary result:

;  B G R  B G R  B G R  B G R  B G R
dd 000000,010101,020202,030303,040404
dd 010101,020202,030303,040404,050505
dd 020202,030303,040404,050505,060606
dd 030303,040404,050505,060606,070707
dd 040404,050505,060606,070707,080808
    
Post 04 Dec 2007, 08:52
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 04 Dec 2007, 15:24
SomeoneNew wrote:
doesn't it depend whether it's interleaved or planar data? you suggest an interleaved solution?
Either way the algorithm can be changed to match the data. Or the data can be changed to allow for a better algorithm. We are only constrained by external dependancies.

Different HLL's implement multi-dimentional arrays differently - we need to know how it is stored in memory. Must the color planes be separate? If yes, then is their size constant? If yes, then the same algorithm works with four memory accesses:
Code:
mov [edi+PLANE_SIZE*0],128
mov [edi+PLANE_SIZE*1],64
mov [edi+PLANE_SIZE*2],128
mov [edi+PLANE_SIZE*3],64    
...colaspes the inner loop and reduces the problem to two dimensions. Additionally, this can be used with the addressing method previously described if neighboring pixel data is needed.

PLANE_SIZE doesn't change in the inner loop, but if your algorithm needs to handle variable PLANE_SIZEs then it might be advanteous to have a register for each plane...there are always ways.

Maybe, each plane could be processed separately? Aren't the planes merged at some point for display/storage? If the algorithm is pixel-centric then it is always better to merge the planes and then process.

Lots of speculation because of all the ambiguity.
Post 04 Dec 2007, 15:24
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 04 Dec 2007, 19:08
SomeoneNew wrote:
this is how it would look like in an HL implementation (basic like)

Code:
pixels(10,10,0) = 64;    //r
pixels(10,10,1) = 128;  //g
pixels(10,10,2) = 64;    //b
pixels(10,10,3) = 255;  //a
    
Odds are this is just a single dword:
Code:
mov [edi],0FF408040h    
Post 04 Dec 2007, 19:08
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 04 Dec 2007, 22:00
I'm ok in using a different approach, but now that I'm at it, what would be the fastest implementation?

bitRAKE: indeed it is.

_________________
Im new, sorry if I bothered with any stupid question Smile
Post 04 Dec 2007, 22:00
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 04 Dec 2007, 22:25
SomeoneNew wrote:
what would be the fastest implementation?
to opperate pixel(s) and not their components.
Post 04 Dec 2007, 22:25
View user's profile Send private message Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 04 Dec 2007, 23:06
exact.
pixel is fast, component is at least (N components) times slower.
Post 04 Dec 2007, 23:06
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 05 Dec 2007, 03:20
Wouldn't it be slow anyway to manipulate the dwords instead of working with the bytes?

what if I wanted to fully average 2 pixel data sets?

with components you would do:

Code:
d_r = ( s_r + d_r ) / 2
d_g = ( s_g + d_g ) / 2
d_b = ( s_b + d_b ) / 2
d_a = ( s_a + d_a ) / 2    

_________________
Im new, sorry if I bothered with any stupid question Smile
Post 05 Dec 2007, 03:20
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 05 Dec 2007, 03:37
just make this:
Code:
;eax=pixel1
;ebx=pixel2
pxlmean:  
.r:
          add al,bl
          jc @f      
          shr al,1
          ror eax,8
          ror ebx,8
          jmp .g
@@:
          shr al,1
          or al,80h
          ror eax,8
          ror ebx,8
.g:
          add al,bl
          jc @f
          shr al,1
          ror eax,8
          ror ebx,8
          jmp .b
@@:
          shr al,1
          or al,80h
          ror eax,8
          ror ebx,8
.b:
          add al,bl
          jc @f
          shr al,1
          ror eax,8
          ror ebx,8
          jmp .a
@@:
          shr al,1
          or al,80h
          ror eax,8
          ror ebx,8
.a
          add al,bl
          jc @f
          shr al,1
          ror eax,8
          ror ebx,8
          jmp .g
@@:
          shr al,1
          or al,80h
          ror eax,8
          ror ebx,8
          ret
    

with this algorythm, the mean of two odd component give an odd component
if it's not important, then
Code:
pxlmean:
.r:
          shr al,1
          shr bl,1
          add al,bl
          ror eax,8
          ror ebx,8
.g:
          shr al,1
          shr bl,1
          add al,bl
          ror eax,8
          ror ebx,8
.b:
          shr al,1
          shr bl,1
          add al,bl
          ror eax,8
          ror ebx,8
.a:
          shr al,1
          shr bl,1
          add al,bl
          ror eax,8
;eax=pixel mean
          ret
    


on IA32 it's fast to work on bytes and dword but not for words.
Post 05 Dec 2007, 03:37
View user's profile Send private message Visit poster's website Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 05 Dec 2007, 04:01
edfed, modification of your first code:
Code:
;eax=pixel1 
;ebx=pixel2 
pxlmean:   
.r: 
          add al,bl 
          rcr al,1 
          ror eax,8 
          ror ebx,8 

.g:
          add al,bl 
          rcr al,1 
          ror eax,8 
          ror ebx,8 

.b: 
          add al,bl 
          rcr al, 1 
          ror eax,8 
          ror ebx,8 

.a: 
          add al,bl 
          rcr al, 1 
          ror eax,8 
          ror ebx,8 
;          jmp .g  ;I have supposed that this jump is wrong

          ret 
    
Post 05 Dec 2007, 04:01
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 05 Dec 2007, 04:10
great, i didn't think about the rcr instruction
one other method, bit magic!
Code:
pxlmean:
        mov ecx,eax
        mov edx,ebx
        and eax,not 80808080h
        and ebx,not 80808080h
        add eax,ebx
        mov ebx,eax
        and ebx,01010101h
        and eax,not 01010101h
        shr eax,1
        add eax,ebx
        and ecx,80808080h
        shr ecx,1
        and edx,80808080h
        shr edx,1
        add eax,ecx
        ret
    
Post 05 Dec 2007, 04:10
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4078
Location: vpcmpistri
bitRAKE 05 Dec 2007, 04:44
Don't forget PAVGB - can't beat that for speed. Cool

Basically, memory access is usually the bottleneck - even if the data is in the cache memory access takes at least another cycle. Sometimes this latency can be hidden. Each algorithm needs to be looked at separately because there are times when everything is geared towards byte loads and parrallel data operations are slower because of all the data massaging needed. Knowing which instructions are availible changes the whole perspective, though.
Post 05 Dec 2007, 04:44
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2, 3  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.