flat assembler
Message board for the users of flat assembler.

pool

Joined: 08 Jan 2007
Posts: 97
pool 18 Dec 2009, 12:06
..

Last edited by pool on 17 Mar 2013, 12:06; edited 1 time in total
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20215
revolution 18 Dec 2009, 12:54
Code:
`addpd xmm0,xmm1    `
Can you be more specific about what you are trying to do?

Show some code you are having trouble with and perhaps we can help you fix it.

Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
Maybe this should be moved to the 'MAIN' forum.
20 Dec 2009, 09:53
bitshifter

Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 20 Dec 2009, 17:44
I have a small 4x4 matrix library that Madis and I have been developing...
If 3D transformations with SSE interest you, then I can post some nice code.
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 20 Dec 2009, 20:23
Do you have a 3x3 matrix version instead? I would be interested in that. I prefer less overhead and doing translations in a more optimized way.
bitshifter

Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 20 Dec 2009, 22:21
Using the whole 128-bit register saves me from shuffling all the time.

Ok, here is a small example, first in C, then in SSE2:
Code:
```
void TranslateMatrix_XYZ(float m[16], const float v[3])
{
    m[12] += m[0] * v[0] + m[4] * v[1] + m[8]  * v[2];
    m[13] += m[1] * v[0] + m[5] * v[1] + m[9]  * v[2];
    m[14] += m[2] * v[0] + m[6] * v[1] + m[10] * v[2];
    m[15] += m[3] * v[0] + m[7] * v[1] + m[11] * v[2];
}
```

Code:
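```
; g_translation is vector [X,Y,Z,W=1.0]
; data aligned on 16 byte boundary

movaps  xmm0,dqword[g_translation]
movaps  xmm1,dqword[g_matrix]           ; column 0: m[0..3]
movaps  xmm2,dqword[g_matrix+16]        ; column 1: m[4..7]
movaps  xmm3,dqword[g_matrix+32]        ; column 2: m[8..11]
movaps  xmm4,dqword[g_matrix+48]        ; column 3: m[12..15]
pshufd  xmm5,xmm0,00000000b             ; broadcast X
pshufd  xmm6,xmm0,01010101b             ; broadcast Y
pshufd  xmm7,xmm0,10101010b             ; broadcast Z
mulps   xmm1,xmm5
mulps   xmm2,xmm6
mulps   xmm3,xmm7
addps   xmm4,xmm1                       ; accumulate the products into column 3
addps   xmm4,xmm2
addps   xmm4,xmm3
movaps  dqword[g_matrix+48],xmm4
```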
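
As a quick sanity check on what the routine computes: only the last column changes, so it behaves like post-multiplying the matrix by a translation, and the offset ends up expressed in the matrix's own (possibly scaled or rotated) axes. Below is a minimal sketch in C. `MakeScale` is a hypothetical helper, not one of the library routines, and the translate routine is repeated only so the snippet compiles on its own.

```c
#include <string.h>

/* Repeated from the C version above so the sketch is self-contained. */
void TranslateMatrix_XYZ(float m[16], const float v[3])
{
    m[12] += m[0] * v[0] + m[4] * v[1] + m[8]  * v[2];
    m[13] += m[1] * v[0] + m[5] * v[1] + m[9]  * v[2];
    m[14] += m[2] * v[0] + m[6] * v[1] + m[10] * v[2];
    m[15] += m[3] * v[0] + m[7] * v[1] + m[11] * v[2];
}

/* Hypothetical helper (an assumption, not from the library):
   builds a uniform scale matrix in column-major order. */
void MakeScale(float m[16], float s)
{
    memset(m, 0, 16 * sizeof(float));
    m[0] = m[5] = m[10] = s;
    m[15] = 1.0f;
}
```

Starting from `MakeScale(m, 2.0f)` and translating by (1, 2, 3) leaves m[12..15] = (2, 4, 6, 1): the offset is doubled because it is applied along the scaled axes.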

I also have these routines already implemented and tested...

CopyMatrix
IdentityMatrix
TransposeMatrix
RotateMatrix_X
RotateMatrix_Y
RotateMatrix_Z
RotateMatrix_XYZ
ScaleMatrix_X
ScaleMatrix_Y
ScaleMatrix_Z
ScaleMatrix_XYZ
TranslateMatrix_X
TranslateMatrix_Y
TranslateMatrix_Z
TranslateMatrix_XYZ
MultiplyMatrices
TransformVector
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 20 Dec 2009, 23:27
Hmm, I thought you were doing 4 transformations at a time (so any matrix size would be suitable), not the whole matrix at a time. Actually, now that I think of it, a 3x4 matrix would be even better with that approach.

(i.e. you transform 4 vectors at a time)

Thanks for the code, I'm a noob in SSE and it is proving helpful.

_________________
Previously known as The_Grey_Beast
bitshifter

Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 20 Dec 2009, 23:45
For transforming vectors I set up the transformation matrix once,
then batch process all my vertices through it; it's nice and fast that way.
Now don't confuse matrix transformations with vector transformations...
Transforming a vector would look like this...
Code:
```
void TransformVector(float v[3], const float m[16])
{
    const float x = v[0];
    const float y = v[1];
    const float z = v[2];

    v[0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
    v[1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
    v[2] = m[2] * x + m[6] * y + m[10] * z + m[14];
}
```

Code:
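```
; g_vector is dqword [X,Y,Z,W=1.0]
; All data aligned on 16 byte boundary

movaps  xmm0,dqword[g_vector]
movaps  xmm1,dqword[g_matrix]           ; column 0: m[0..3]
movaps  xmm2,dqword[g_matrix+16]        ; column 1: m[4..7]
movaps  xmm3,dqword[g_matrix+32]        ; column 2: m[8..11]
movaps  xmm4,dqword[g_matrix+48]        ; column 3: m[12..15]
pshufd  xmm5,xmm0,00000000b             ; broadcast X
pshufd  xmm6,xmm0,01010101b             ; broadcast Y
pshufd  xmm7,xmm0,10101010b             ; broadcast Z
mulps   xmm1,xmm5
mulps   xmm2,xmm6
mulps   xmm3,xmm7
addps   xmm4,xmm1                       ; column 3 + sum of the scaled columns
addps   xmm4,xmm2
addps   xmm4,xmm3
movaps  dqword[g_vector],xmm4           ; note: the W lane is overwritten too
```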

We only need 3 dot products but we get 4,
and trash vector.w in the process.
So you would need to preserve it somehow...
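
One way to handle that, sketched in plain C rather than SSE: model all four lanes, then either save and restore w, or rely on the matrix being affine (m[3] = m[7] = m[11] = 0 and m[15] = 1), in which case the fourth lane comes out as 1.0 by itself. The function names below are made up for illustration and are not part of the library.

```c
/* Scalar model of the four-lane routine: every lane, including w,
   receives column0*x + column1*y + column2*z + column3. */
void TransformVector4(float v[4], const float m[16])
{
    const float x = v[0];
    const float y = v[1];
    const float z = v[2];
    int i;

    for (i = 0; i < 4; ++i)
        v[i] = m[i] * x + m[4 + i] * y + m[8 + i] * z + m[12 + i];
}

/* Variant that keeps the caller's w by saving it before the transform
   and writing it back afterwards. */
void TransformVector4_KeepW(float v[4], const float m[16])
{
    const float w = v[3];

    TransformVector4(v, m);
    v[3] = w;
}
```

For an affine matrix both variants return w = 1.0 when the input has w = 1.0; they only differ when the bottom row of the matrix is something other than (0, 0, 0, 1).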

Also I would like to thank Madis for tutoring me with SIMD optimizations.
Sometimes I do well and produce very fast code, other times he kicks my butt.
We usually ponder a routine for a while and see who can do the best.
And of course, someone else may spot ways to do things better, as always...

And if you want to see some mind blowing 3x concatenated rotations, just ask...
Borsuc

Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 20 Dec 2009, 23:51
Yes, well, transforming vectors is the slow part: you have to do it for each vertex, whereas the matrix is usually set up once per object.

Sorry for the noob question, but can't you use SSE in this way so that nothing is wasted at all:

Put x1,x2,x3,x4 in one register (and likewise for y and z), then use the same algorithm as a normal (non-SSE) vector transformation... except that you do 4 vectors at a time.

Isn't that possible, or does it require lots of shuffling? (I'm of course also talking about storing the vertices like that in memory, not as "x1,y1,z1;x2,y2,z2", which would probably need shuffling.)
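
For illustration, the layout described here (usually called structure-of-arrays) can be sketched with SSE intrinsics in C. This is only a sketch under assumptions: `TransformVertices_SoA` is a made-up name, the vertex count is assumed to be a multiple of 4, the matrix is column-major as in the earlier examples, and unaligned loads are used so the arrays need no special alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Transforms `count` vertices stored as separate x, y and z arrays,
   four vertices per loop iteration, with no shuffles inside the loop. */
void TransformVertices_SoA(float *xs, float *ys, float *zs,
                           size_t count, const float m[16])
{
    /* Broadcast each needed matrix element once, outside the loop. */
    const __m128 m0  = _mm_set1_ps(m[0]),  m4  = _mm_set1_ps(m[4]);
    const __m128 m8  = _mm_set1_ps(m[8]),  m12 = _mm_set1_ps(m[12]);
    const __m128 m1  = _mm_set1_ps(m[1]),  m5  = _mm_set1_ps(m[5]);
    const __m128 m9  = _mm_set1_ps(m[9]),  m13 = _mm_set1_ps(m[13]);
    const __m128 m2  = _mm_set1_ps(m[2]),  m6  = _mm_set1_ps(m[6]);
    const __m128 m10 = _mm_set1_ps(m[10]), m14 = _mm_set1_ps(m[14]);
    size_t i;

    assert(count % 4 == 0);  /* assumption: no leftover vertices */

    for (i = 0; i < count; i += 4) {
        const __m128 x = _mm_loadu_ps(xs + i);  /* x1,x2,x3,x4 */
        const __m128 y = _mm_loadu_ps(ys + i);  /* y1,y2,y3,y4 */
        const __m128 z = _mm_loadu_ps(zs + i);  /* z1,z2,z3,z4 */

        const __m128 nx = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(m0, x), _mm_mul_ps(m4, y)),
            _mm_add_ps(_mm_mul_ps(m8, z), m12));
        const __m128 ny = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(m1, x), _mm_mul_ps(m5, y)),
            _mm_add_ps(_mm_mul_ps(m9, z), m13));
        const __m128 nz = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(m2, x), _mm_mul_ps(m6, y)),
            _mm_add_ps(_mm_mul_ps(m10, z), m14));

        _mm_storeu_ps(xs + i, nx);
        _mm_storeu_ps(ys + i, ny);
        _mm_storeu_ps(zs + i, nz);
    }
}
```

In this form all four lanes carry useful work and no w component is stored at all. The trade-off is the twelve broadcast matrix values: 32-bit code has only eight xmm registers, so some of them would have to be read from memory inside the loop.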

bitshifter wrote:
And if you want to see some mind blowing 3x concatenated rotations, just ask...
What do you mean by 3x concatenated rotations? Do you mean in all 3 dimensions?

_________________
Previously known as The_Grey_Beast
bitshifter

Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 21 Dec 2009, 00:01
Yes, by caching the matrix and pumping all the vertices through it.

And yes, concatenation in 3 dimensions all in one shot.

Note: As expected, the order in which multiple rotations are concatenated changes the result.
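
A small C sketch of why the order matters, using exact 90-degree rotations so that no trigonometry is needed. `Mul4x4`, `MakeRotX90` and `MakeRotY90` are hypothetical names for illustration, not the library's routines, and the matrices are assumed to be column-major like the examples earlier in the thread.

```c
#include <string.h>

/* out = a * b for column-major 4x4 matrices (out must not alias a or b). */
void Mul4x4(float out[16], const float a[16], const float b[16])
{
    int c, r, k;

    for (c = 0; c < 4; ++c) {
        for (r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}

/* Rotation by +90 degrees about X: the Y axis maps to Z, Z maps to -Y. */
void MakeRotX90(float m[16])
{
    memset(m, 0, 16 * sizeof(float));
    m[0] = 1.0f; m[6] = 1.0f; m[9] = -1.0f; m[15] = 1.0f;
}

/* Rotation by +90 degrees about Y: the X axis maps to -Z, Z maps to X. */
void MakeRotY90(float m[16])
{
    memset(m, 0, 16 * sizeof(float));
    m[2] = -1.0f; m[5] = 1.0f; m[8] = 1.0f; m[15] = 1.0f;
}
```

With these, the first column of Rx*Ry is (0, 1, 0, 0), while the first column of Ry*Rx is (0, 0, -1, 0), so the same two rotations send the X axis to different places depending on the order.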

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.