flat assembler
Message board for the users of flat assembler.

pool



Joined: 08 Jan 2007
Posts: 97
pool 18 Dec 2009, 12:06
..


Last edited by pool on 17 Mar 2013, 12:06; edited 1 time in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20524
Location: In your JS exploiting you and your system
revolution 18 Dec 2009, 12:54
Code:
addpd xmm0,xmm1    
Can you be more specific about what you are trying to do?

Show some code you are having trouble with and perhaps we can help you fix it.
madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt 20 Dec 2009, 09:53
Maybe this should be moved to the 'MAIN' forum.
bitshifter



Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 20 Dec 2009, 17:44
I have a small 4x4 matrix library that Madis and I have been developing...
If 3D transformations with SSE interest you, then I can post some nice code :)
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 20 Dec 2009, 20:23
Do you have a 3x3 matrix version instead? I would be interested in that. I prefer less overhead, and I would like to do translations in a more optimized way.
bitshifter



Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 20 Dec 2009, 22:21
Using the whole 128-bit register saves me from shuffling all the time.
More overhead, yes, but also faster.

OK, here is a small example, first in C, then in SSE2:
Code:
void TranslateMatrix_XYZ(float m[16], const float v[3])
{
   m[12] += m[0] * v[0] + m[4] * v[1] + m[8]  * v[2];
   m[13] += m[1] * v[0] + m[5] * v[1] + m[9]  * v[2];
   m[14] += m[2] * v[0] + m[6] * v[1] + m[10] * v[2];
   m[15] += m[3] * v[0] + m[7] * v[1] + m[11] * v[2];
}
    

Code:
; g_translation is the vector [X,Y,Z,W=1.0]
; all data aligned on a 16-byte boundary

        movaps  xmm0,dqword[g_translation]
        movaps  xmm1,dqword[g_matrix]
        movaps  xmm2,dqword[g_matrix+16]
        movaps  xmm3,dqword[g_matrix+32]
        movaps  xmm4,dqword[g_matrix+48]
        pshufd  xmm5,xmm0,00000000b
        pshufd  xmm6,xmm0,01010101b
        pshufd  xmm7,xmm0,10101010b
        mulps   xmm1,xmm5
        mulps   xmm2,xmm6
        mulps   xmm3,xmm7
        addps   xmm4,xmm1
        addps   xmm4,xmm2
        addps   xmm4,xmm3
        movaps  dqword[g_matrix+48],xmm4
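
For readers who prefer C, the same column-wise update can be sketched with SSE intrinsics. This is only an illustration, not code from the library discussed here: the function name is hypothetical, it uses unaligned loads and stores so the caller does not need 16-byte aligned data, and it broadcasts with `_mm_shuffle_ps` (shufps) where the assembly uses pshufd.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical name. m is a column-major 4x4 matrix (16 floats),
   v holds [x, y, z, w]; w is loaded but never used. */
void translate_matrix_xyz_sse(float m[16], const float v[4])
{
    __m128 t  = _mm_loadu_ps(v);
    __m128 c0 = _mm_loadu_ps(m + 0);   /* column 0 */
    __m128 c1 = _mm_loadu_ps(m + 4);   /* column 1 */
    __m128 c2 = _mm_loadu_ps(m + 8);   /* column 2 */
    __m128 c3 = _mm_loadu_ps(m + 12);  /* column 3 (translation) */

    /* Broadcast x, y and z across all four lanes. */
    __m128 xxxx = _mm_shuffle_ps(t, t, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 yyyy = _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 zzzz = _mm_shuffle_ps(t, t, _MM_SHUFFLE(2, 2, 2, 2));

    /* column3 += column0*x + column1*y + column2*z */
    c3 = _mm_add_ps(c3, _mm_mul_ps(c0, xxxx));
    c3 = _mm_add_ps(c3, _mm_mul_ps(c1, yyyy));
    c3 = _mm_add_ps(c3, _mm_mul_ps(c2, zzzz));

    _mm_storeu_ps(m + 12, c3);
}
```

Each matrix column occupies one full register, so the only shuffles needed are the three broadcasts of the translation components.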
    


I also have these routines already implemented and tested...

CopyMatrix
IdentityMatrix
TransposeMatrix
RotateMatrix_X
RotateMatrix_Y
RotateMatrix_Z
RotateMatrix_XYZ
ScaleMatrix_X
ScaleMatrix_Y
ScaleMatrix_Z
ScaleMatrix_XYZ
TranslateMatrix_X
TranslateMatrix_Y
TranslateMatrix_Z
TranslateMatrix_XYZ
MultiplyMatrices
TransformVector
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 20 Dec 2009, 23:27
Hmm, I thought you were doing 4 transformations at a time (so any matrix size would be suitable), not the whole matrix at a time. Actually, now that I think of it, a 3x4 matrix would be even better with that approach.

(i.e. you transform 4 vectors at a time)

Thanks for the code, I'm a noob in SSE and it proves helpful :)

_________________
Previously known as The_Grey_Beast
bitshifter



Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 20 Dec 2009, 23:45
For transforming vectors I set up the transformation matrix once,
then batch process all my vertices through it; it's nice and fast that way.
Now don't confuse matrix transformations with vector transformations...
To transform a vector would be like...
Code:
void TransformVector(float v[3], const float m[16])
{
   const float x = v[0];
   const float y = v[1];
   const float z = v[2];

   v[0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
   v[1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
   v[2] = m[2] * x + m[6] * y + m[10] * z + m[14];
}
    

Code:
; g_vector is dqword [X,Y,Z,W=1.0]
; All data aligned on 16 byte boundary

        movaps  xmm0,dqword[g_vector]
        movaps  xmm1,dqword[g_matrix]
        movaps  xmm2,dqword[g_matrix+16]
        movaps  xmm3,dqword[g_matrix+32]
        movaps  xmm4,dqword[g_matrix+48]
        pshufd  xmm5,xmm0,00000000b
        pshufd  xmm6,xmm0,01010101b
        pshufd  xmm7,xmm0,10101010b
        mulps   xmm1,xmm5
        mulps   xmm2,xmm6
        mulps   xmm3,xmm7
        addps   xmm4,xmm1
        addps   xmm4,xmm2
        addps   xmm4,xmm3
        movaps  dqword[g_vector],xmm4
    

We only need 3 dot products but we get 4,
and we trash vector.w in the process :(
So you would need to preserve it somehow...
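
One possible way to preserve w, sketched in C with SSE2 intrinsics: compute all four lanes as in the routine above, then blend the original w lane back in with a bit mask. The function name and the masking approach are assumptions made for illustration, not something taken from the library. Note also that when the matrix is affine (m[3], m[7], m[11] are 0 and m[15] is 1), the fourth lane comes out as 1.0 anyway, so the blend only matters for other matrices or when the input w is not 1.0.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE ones) */

/* Hypothetical helper: transforms v = [x, y, z, w] in place by the
   column-major matrix m, then restores the caller's original w. */
void transform_vector_keep_w(float v[4], const float m[16])
{
    __m128 in = _mm_loadu_ps(v);

    /* result = col3 + col0*x + col1*y + col2*z, all four lanes */
    __m128 r = _mm_loadu_ps(m + 12);
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 0),
            _mm_shuffle_ps(in, in, _MM_SHUFFLE(0, 0, 0, 0))));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 4),
            _mm_shuffle_ps(in, in, _MM_SHUFFLE(1, 1, 1, 1))));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 8),
            _mm_shuffle_ps(in, in, _MM_SHUFFLE(2, 2, 2, 2))));

    /* Mask with lanes 0..2 all ones and lane 3 all zeros. */
    const __m128 xyz_mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));

    /* Keep x, y, z from the result and w from the input. */
    r = _mm_or_ps(_mm_and_ps(xyz_mask, r), _mm_andnot_ps(xyz_mask, in));

    _mm_storeu_ps(v, r);
}
```

The blend costs three extra logical instructions per vector; whether that beats other ways of keeping w is something to measure rather than assume.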

Also I would like to thank Madis for tutoring me in SIMD optimizations.
Sometimes I do well and produce very fast code, other times he kicks my butt.
We usually ponder a routine for a while and see who can do the best.
And of course, someone else may spot ways to do things better, as always...

And if you want to see some mind blowing 3x concatenated rotations, just ask...
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 20 Dec 2009, 23:51
Yes, well, transforming vectors is the slow part: you have to do it for every vertex, whereas the matrix is usually set up once per object.

Sorry for the noob question, but can't you use SSE in this way so that nothing is wasted at all:

put x1,x2,x3,x4 in one register (and likewise for y and z), then use the same algorithm as a normal (non-SSE) vector transformation... but you do 4 vectors at a time, as you can see.

Isn't that possible, or does it require lots of shuffling? (I'm of course also talking about storing the vertices like that in memory, not as "x1,y1,z1;x2,y2,z2", which would probably need shuffling.)
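
A rough C sketch of this layout, assuming the vertices are stored as three separate arrays (all x values, all y values, all z values) and using SSE intrinsics. The names are hypothetical, and the sketch is only meant to show that no shuffles are needed once the data is stored this way; it makes no claim about how it compares in speed to the one-vector-per-register routine.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* One output component for 4 points at once: a*x + b*y + c*z + d. */
static inline __m128 affine_row(float a, float b, float c, float d,
                                __m128 x, __m128 y, __m128 z)
{
    __m128 r = _mm_mul_ps(_mm_set1_ps(a), x);
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(b), y));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(c), z));
    return _mm_add_ps(r, _mm_set1_ps(d));
}

/* Hypothetical structure-of-arrays transform. xs, ys and zs each hold
   count floats; m is a column-major 4x4 matrix. Points are processed
   4 at a time; a trailing remainder of fewer than 4 points is left
   untransformed in this sketch. */
void transform_points_soa(float *xs, float *ys, float *zs,
                          size_t count, const float m[16])
{
    for (size_t i = 0; i + 4 <= count; i += 4) {
        __m128 x = _mm_loadu_ps(xs + i);
        __m128 y = _mm_loadu_ps(ys + i);
        __m128 z = _mm_loadu_ps(zs + i);

        _mm_storeu_ps(xs + i, affine_row(m[0], m[4], m[8],  m[12], x, y, z));
        _mm_storeu_ps(ys + i, affine_row(m[1], m[5], m[9],  m[13], x, y, z));
        _mm_storeu_ps(zs + i, affine_row(m[2], m[6], m[10], m[14], x, y, z));
    }
}
```

Every lane does useful work here, since w is never stored and only three rows of the matrix are evaluated. The trade-off is that the twelve matrix coefficients have to be broadcast, which may put pressure on the eight xmm registers available in 32-bit code.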

bitshifter wrote:
And if you want to see some mind blowing 3x concatenated rotations, just ask...
What do you mean by 3x concatenated rotations? You mean in all 3 dimensions?

bitshifter



Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 21 Dec 2009, 00:01
Yes, by caching the matrix and pumping all the vertices through it.

And yes, concatenation in 3 dimensions all in one shot.

Note: As expected, applying multiple concatenated rotations in a different order produces different results.
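
A small C illustration of the ordering point, using two 90-degree rotations so that every matrix entry is exactly 0, 1 or -1. The helper and constant names are made up for this sketch, and the matrices use the same column-major layout as the routines earlier in the thread.

```c
/* out = a * b for column-major 4x4 matrices (hypothetical helper).
   out must not alias a or b. */
void mat4_mul(float out[16], const float a[16], const float b[16])
{
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}

/* 90-degree rotation about the X axis (cos = 0, sin = 1). */
const float ROT_X_90[16] = {
    1, 0,  0, 0,
    0, 0,  1, 0,
    0, -1, 0, 0,
    0, 0,  0, 1
};

/* 90-degree rotation about the Y axis (cos = 0, sin = 1). */
const float ROT_Y_90[16] = {
    0, 0, -1, 0,
    0, 1,  0, 0,
    1, 0,  0, 0,
    0, 0,  0, 1
};
```

Multiplying `ROT_X_90` by `ROT_Y_90` sends the X axis to (0, 1, 0), while multiplying in the opposite order sends it to (0, 0, -1), so the two concatenations are different matrices.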

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
