flat assembler
Message board for the users of flat assembler.

Windows > Structs, macros and FPU vectors

bitshifter 13 Nov 2008, 04:10
Hello, I am studying fasm structs, macros and FPU vector operations.

(This assumes win32ax.inc and msvcrt.dll are in use.)

Here is my double-precision Vector structure.
Code:
struct Vector
   x dq ?
   y dq ?
   z dq ?
ends    
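The macros below address the fields as v, v+8 and v+16, which relies on the three dq fields being packed back to back. If this structure is ever shared with C code, the C side has to use the same layout. A minimal C mirror (the C type is an assumption for illustration, not part of the original post) with compile-time checks that its offsets match the ones hard-coded in the macros:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* C mirror of the fasm `struct Vector`: three consecutive dq (double) fields. */
typedef struct Vector {
    double x;   /* qword[v]    */
    double y;   /* qword[v+8]  */
    double z;   /* qword[v+16] */
} Vector;

/* Fail the build if the C layout ever disagrees with the macro offsets. */
static_assert(offsetof(Vector, x) == 0,  "x must be at v+0");
static_assert(offsetof(Vector, y) == 8,  "y must be at v+8");
static_assert(offsetof(Vector, z) == 16, "z must be at v+16");
static_assert(sizeof(Vector) == 24,      "Vector must be 24 bytes");
```

On the usual x86 ABIs three doubles have no padding between them, so these checks pass.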


This macro calculates the length of a Vector.
Code:
macro GetVectorLength v
{
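   fld   qword[v]      ; st0 = x
   fmul  qword[v]      ; st0 = x*x
   fld   qword[v+8]    ; st0 = y, st1 = x*x
   fmul  qword[v+8]    ; st0 = y*y
   faddp st1,st0       ; st0 = x*x + y*y
   fld   qword[v+16]   ; st0 = z
   fmul  qword[v+16]   ; st0 = z*z
   faddp st1,st0       ; st0 = x*x + y*y + z*z
   fsqrt               ; st0 = length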
   ; leaves length on fpu stack
}    


This macro normalizes a Vector to unit length.
Code:
macro NormalizeVector v
{
   GetVectorLength v
   fld  qword[v]
   fdiv st0,st1 ; possible divide by zero noted.
   fstp qword[v]
   fld  qword[v+8]
   fdiv st0,st1 ; possible divide by zero noted.
   fstp qword[v+8]
   fld  qword[v+16]
   fdiv st0,st1 ; possible divide by zero noted.
   fstp qword[v+16]
   fstp st0 ; remove length from fpu stack
}    


To test these, you can do the following:
Code:
; in data section
vec Vector 1.2,3.4,5.6
buf rb 256
fmt db '%f %f %f',0

; in code section
NormalizeVector vec
cinvoke sprintf,buf,fmt,double[vec.x],double[vec.y],double[vec.z]
invoke MessageBox,0,buf,0,0    


What I would like to know is where I can find the CPU cycle counts for a given asm instruction.
I have some Intel manuals, but I couldn't find them there.

The reason is: let's say I come up with a new way to write GetVectorLength, like...
Code:
macro GetVectorLength v
{
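   fld   qword[v]      ; st0 = x (one memory read per component)
   fld   st0           ; st0 = x, st1 = x
   fmulp st1,st0       ; st0 = x*x
   fld   qword[v+8]    ; st0 = y, st1 = x*x
   fld   st0           ; duplicate y
   fmulp st1,st0       ; st0 = y*y
   faddp st1,st0       ; st0 = x*x + y*y
   fld   qword[v+16]   ; st0 = z
   fld   st0           ; duplicate z
   fmulp st1,st0       ; st0 = z*z
   faddp st1,st0       ; st0 = x*x + y*y + z*z
   fsqrt               ; st0 = length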
   ; leaves length on fpu stack
}    


What I see is that it reads the vector's fields from memory less often, but uses a few more FPU instructions, so without profiling the code I can't just guess which version is faster.
If you have any comments or constructive criticism, my ears are open.
bitRAKE 13 Nov 2008, 05:32
Check out Agner Fog's optimization manuals and tools. Basically, there are many factors affecting the actual performance of an algorithm, and as the complexity increases, the best that can be done is to test and document the conditions under which you are measuring a particular metric. Providing a test program allows others to offer results for other configurations. Cycle counts can be approximated with RDTSC, and performance counters help to see what is going on under the hood.
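A minimal timing harness along these lines, sketched in C for GCC/Clang: it reads the time-stamp counter with the `__rdtsc()` intrinsic from <x86intrin.h>, runs the code under test many times per trial, and keeps the fastest trial. The names `time_min_ticks` and `sample_workload` are hypothetical, and the clock_gettime fallback for non-x86 targets is an assumption added only so the sketch compiles elsewhere.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in the fallback path */
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>            /* __rdtsc() on GCC/Clang */
static uint64_t ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumed fallback for non-x86 targets: monotonic nanoseconds, not TSC ticks. */
static uint64_t ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical harness: call fn(arg) `iters` times per trial and return the
   fewest ticks any single trial took. The minimum is less sensitive to
   interrupts and context switches than the average. */
uint64_t time_min_ticks(void (*fn)(void *), void *arg, int iters, int trials)
{
    assert(iters > 0 && trials > 0);
    uint64_t best = UINT64_MAX;
    for (int t = 0; t < trials; t++) {
        uint64_t start = ticks();
        for (int i = 0; i < iters; i++)
            fn(arg);
        uint64_t elapsed = ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical workload used to exercise the harness: bump a counter. */
static void sample_workload(void *arg)
{
    volatile unsigned long *count = arg;
    (*count)++;
}
```

To compare the two GetVectorLength variants, each could be wrapped in a function with this signature and timed on the same machine. RDTSC is not a serializing instruction, and on many modern CPUs the TSC runs at a fixed reference rate rather than the current core clock, so treat the results as relative numbers rather than exact cycle counts.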

http://www.agner.org/optimize/

I just watched a talk I thought was decent: Machine Architecture: Things Your Programming Language Never Told You (http://www.nwcpp.org/Meetings/2007/09.html). It applies to assembly language, too.
bitshifter 14 Nov 2008, 02:01
Great set of resources!
The block tester says that my abs() is 3 times faster than C/C++ std::abs()!
One question: the instruction tables have a note in the FPU section about clock cycles overlapping if the next instruction is also an FPU instruction.
Can someone explain how this works?
f0dder 14 Nov 2008, 02:20
bitshifter wrote:
The block tester says that my abs() is 3 times faster than C/C++ std::abs()!

Which C++ implementation (and version), though? And does it involve CALLing a routine, or are compiler intrinsics used? Are you calling fabs() on float or double values? (On float, it likely involves float->double->fabs()->float on some compilers.)
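The thread doesn't show bitshifter's abs(), so as an illustration only: a C sketch of the float/double distinction raised here, plus a bit-mask version that clears the IEEE-754 sign bit (the same operation the x87 FABS instruction performs on st0). All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "bit trick assumes 64-bit double");

/* Float path: fabsf takes and returns float, no conversions. */
float abs_float(float f) { return fabsf(f); }

/* The path described above: the float is promoted to double, passed to the
   double fabs(), and the result is converted back to float. */
float abs_float_via_double(float f) { return (float)fabs((double)f); }

/* Bit-mask version: clear the IEEE-754 sign bit (bit 63). memcpy avoids
   strict-aliasing problems and typically compiles to plain register moves. */
double abs_double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    bits &= ~(UINT64_C(1) << 63);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Converting a finite float to double and back is exact, so abs_float and abs_float_via_double return the same values; any speed difference comes from the extra conversions (or from a real library call, if the compiler doesn't inline fabs).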

Not trying to bash you, just saying that there's a lot to benchmarking (different CPUs being somewhat of a pain sometimes), and there's no such thing as "C/C++" when discussing speed: you need to name the full platform, since there are quite a few differences in standard-library quality and compiler code generation.
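In the spirit of naming the full platform, a benchmark can report its own toolchain from the compilers' predefined macros. A sketch in C (the helper names are hypothetical; `__clang__`, `__GNUC__`, `_MSC_VER` and the architecture macros are the usual predefined ones of each compiler):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: one-line description of the compiler that built this
   file. Clang is checked first because it also defines __GNUC__. */
const char *compiler_description(void)
{
    static char buf[64];
    int n;
#if defined(__clang__)
    n = snprintf(buf, sizeof buf, "clang %d.%d.%d",
                 __clang_major__, __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
    n = snprintf(buf, sizeof buf, "gcc %d.%d.%d",
                 __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#elif defined(_MSC_VER)
    n = snprintf(buf, sizeof buf, "msvc %d", _MSC_VER);
#else
    n = snprintf(buf, sizeof buf, "unknown compiler");
#endif
    assert(n > 0 && (size_t)n < sizeof buf);   /* output was not truncated */
    return buf;
}

/* Hypothetical helper: target architecture, also from predefined macros. */
const char *target_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#else
    return "other";
#endif
}
```

Printing these two strings next to each timing result, together with the CPU model, makes numbers from different posters comparable.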

Sometimes I wish we were back in the days of the Pentium U and V pipes; it was easier to optimize back then. There are a lot of rules these days, and sometimes it almost feels like optimizing x86 code is nondeterministic :) (yeah yeah, I suck).



Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
