flat assembler
Message board for the users of flat assembler.

vector math with FASM

vo1d
Hello! Here is a function in C:
Code:
inline void vadd(float* a, float* b)
{
 a[0] += b[0];
 a[1] += b[1];
 a[2] += b[2];
}
    


1) How do I convert it to asm?
2) How do I call it from C?
3) Will the converted FASM function run faster than the C version?

Thanks
Post 25 Jan 2005, 08:48
Madis731
Code:
alfa dd 1.0,2.0,3.0
beta dd 4.0,5.0,6.0

;...


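proc vadd,a,b
mov eax,[a]        ; eax = pointer to a
mov ebx,[b]        ; ebx = pointer to b
fld dword[eax+0]   ; st0 = a[0]
fld dword[ebx+0]   ; st0 = b[0], st1 = a[0]
fadd st0,st1       ; st0 = a[0] + b[0]
fst dword[eax+0]   ; a[0] = st0 (fst stores without popping)
fld dword[eax+4]   ; same pattern for a[1]
fld dword[ebx+4]
fadd st0,st1
fst dword[eax+4]
fld dword[eax+8]   ; same pattern for a[2]
fld dword[ebx+8]
fadd st0,st1
fst dword[eax+8]
endp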

;and later...

stdcall vadd,alfa,beta
    
Post 25 Jan 2005, 10:59
IronFelix
Hello, guys!
Let me try to answer your questions, vo1d.
1) The code for such a function is above.
2) To call it from C you need to know how to build a DLL in FASM and everything that goes with it. Alternatively, you can use abstract classes (interfaces) in C++ with matching code in FASM, but that also requires knowledge of DLLs. It is not hard to code.
3) The function will be faster if you use a specific instruction set (SSE, 3DNow!, SSE2). The code above also contains one thing that (I suppose) reduces speed:
proc is a macro, and when the procedure takes parameters it builds a stack frame (the 'enter' and 'leave' instructions, or their equivalents 'push ebp/mov ebp,esp' and 'mov esp,ebp/pop ebp'). In such a small function you can avoid this and read your parameters through the esp register. That reduces code size and increases speed.
A few words about Asm and C++ (I like both languages, but prefer Asm): I think no language today lets you write code as fast and as small as Asm does, and I don't think one will appear, at least not in the near future.
Regards.
Post 25 Jan 2005, 11:27
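
As a rough illustration of point 2, here is what the C side might look like. The declaration inside the comment is an assumption (a 32-bit Windows build, with the routine using the stdcall convention that the `stdcall vadd,alfa,beta` call earlier in the thread suggests). `vadd_c` is a hypothetical stand-in that is not part of the thread; it has the same contract and can be used to check the results of an assembly version.

```c
#include <stddef.h>

/*
 * Assumed declaration for the FASM routine once it is available to the
 * linker (not compiled here, shown only for orientation):
 *
 *     extern void __stdcall vadd(float *a, float *b);
 */

/* Hypothetical reference version: a[i] += b[i] for the three components. */
void vadd_c(float *a, const float *b)
{
    for (size_t i = 0; i < 3; ++i)
        a[i] += b[i];
}
```

Calling `vadd_c(alfa, beta)` with the values from the earlier listing (1.0, 2.0, 3.0 and 4.0, 5.0, 6.0) leaves 5.0, 7.0, 9.0 in the first array, which is what any assembly version should produce as well.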
IronFelix
A few words about the proposed function:
I think it would be better to code it like this:

vadd:
a equ esp+4
b equ esp+8

mov eax,[a]
mov edx,[b] ; ebx would have to be saved before use, edx does not
fld dword [eax]
fadd dword [edx]
fstp dword [eax] ; FPU stack is empty again
fld dword [eax+4]
fadd dword [edx+4]
fstp dword [eax+4] ; FPU stack is empty again
fld dword [eax+8]
fadd dword [edx+8]
fstp dword [eax+8] ; FPU stack is empty again
retn 8

Madis731, if you leave the FPU stack non-empty, it will slow things down significantly (especially on a Pentium) when the function is called many times: once all the registers hold values, the next fld will be slow. My code also avoids the stack frame instructions.
Regards.
Post 25 Jan 2005, 11:39
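
The point about the FPU stack can be made concrete with a toy model. This is only a sketch: it counts how many of the eight x87 registers are occupied and ignores everything else, and the function name and parameters are made up for illustration.

```c
/* Toy model of x87 register-stack occupancy (8 registers, st0..st7). */
enum { X87_REGS = 8 };

/*
 * Simulates `calls` invocations of a 3-component vadd in which every
 * component performs `pushes` loads (fld) and then `pops` popping
 * stores (fstp). Returns the final stack depth, or -1 if a load would
 * hit a full stack or a pop would hit an empty one.
 */
int x87_depth_after(int pushes, int pops, int calls)
{
    int depth = 0;
    for (int c = 0; c < calls; ++c) {
        for (int comp = 0; comp < 3; ++comp) {
            for (int p = 0; p < pushes; ++p) {
                if (depth == X87_REGS)
                    return -1;          /* fld onto a full stack */
                ++depth;
            }
            if (pops > depth)
                return -1;              /* pop from an empty stack */
            depth -= pops;
        }
    }
    return depth;
}
```

In this model the first listing (two fld and a non-popping fst per component) corresponds to `x87_depth_after(2, 0, n)`: one call leaves six registers occupied and the second call runs out of registers. The fld/fadd/fstp version corresponds to `x87_depth_after(1, 1, n)` and stays at zero however often it is called.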
Madis731
It was just a straightforward answer to his question. I know you (and I too :D) can make it better.
It is general code; if you give me a specific program, I'll try to optimize it for the program at hand.

The FPU is generally slow. SSE FP instructions are faster and can be pipelined, parallelized, etc.
Post 25 Jan 2005, 15:36
IronFelix
Thanks for your comment, Madis731!
Sorry for my replies, but when I see code that I can optimize, I can't just sit and look at it.
Thanks again.
Regards.
Post 26 Jan 2005, 06:08
S.T.A.S.
vo1d wrote:
inline...

Well, I believe there's no way to represent this with a *standalone* assembler:
Quote:
The inline specifiers instruct the compiler to insert a copy of the function body into each place the function is called

So inline asm is probably the only choice. In MSVC (it uses the ugly MASM syntax :( ) it would be something like this (with SSE):
Code:
// operands must be aligned on a 16-byte boundary,
// otherwise use movups
inline void add_4Xfloat(float* a, float* b)
{
    __asm   mov     ecx, a
    __asm   mov     edx, b
    __asm   movaps  xmm0, xmmword ptr [edx]
    __asm   addps   xmm0, xmmword ptr [ecx]
    __asm   movaps  xmmword ptr [ecx], xmm0
} 
//
__declspec(align(16)) float a[4] = {1.0f,2.0f,3.0f,4.0f};
__declspec(align(16)) float b[4] = {1.5f,2.5f,3.5f,4.5f};
//
add_4Xfloat(a,b);    

or maybe the Intel C++ compiler's "_mm_add_ps" intrinsic...
Post 27 Jan 2005, 18:49
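
The intrinsic mentioned above could be used roughly like this. It is a sketch only: it assumes an x86 compiler that ships `<xmmintrin.h>`, it uses unaligned loads and stores so the arrays need no special alignment, and the name `add_4xfloat_intrin` is made up here. Like the inline-asm version, it processes four floats at once, so a three-component vector needs a fourth padding element.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: a[0..3] += b[0..3] using one SSE addition. */
static inline void add_4xfloat_intrin(float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b);            /* unaligned load of b[0..3] */
    _mm_storeu_ps(a, _mm_add_ps(va, vb));   /* a = a + b, element-wise   */
}
```

If both arrays are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` are the aligned counterparts, matching the movaps/movups distinction in the listing above.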
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
