flat assembler
Message board for the users of flat assembler.

vector math with FASM
vo1d

Joined: 25 Jan 2005
Posts: 1
Location: Ukraine, Kremenchug
vo1d 25 Jan 2005, 08:48
Hello! I have this function in C:
Code:
```
inline void vadd(float* a, float* b)
{
    a[0] += b[0];
    a[1] += b[1];
    a[2] += b[2];
}
```

1) How do I convert it to asm?
2) How do I call it from C?
3) Will the converted function in FASM run faster than the C version?

Thanks
Madis731

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 25 Jan 2005, 10:59
Code:
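```
alfa dd 1.0,2.0,3.0
beta dd 4.0,5.0,6.0

;...

proc vadd,a,b
mov eax,[a]          ; eax = pointer to a
mov ebx,[b]          ; ebx = pointer to b
fld dword[eax+0]     ; st0 = a[0]
fld dword[ebx+0]     ; st0 = b[0], st1 = a[0]
fadd st0,st1         ; st0 = a[0] + b[0]
fst dword[eax+0]     ; store the sum to a[0] (fst does not pop)
fld dword[eax+4]     ; same for element 1
fld dword[ebx+4]
fadd st0,st1
fst dword[eax+4]
fld dword[eax+8]     ; same for element 2
fld dword[ebx+8]
fadd st0,st1
fst dword[eax+8]
; a return (ret/return, depending on the macro version) is still needed here
endp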

;and later...

stdcall vadd,alfa,beta
```
IronFelix

Joined: 09 Dec 2004
Posts: 141
Location: Russia, Murmansk region
IronFelix 25 Jan 2005, 11:27
Hello, guys!
Let me try to answer your questions, vo1d.
1) The code for such a function is above.
2) To call it from C you need to know how to build a DLL in FASM and everything related to that. Or you can try to use abstract classes (interfaces) on the C++ side with the matching code in FASM, but that also requires knowledge of DLLs. It is not hard to code such a thing.
3) The function will be faster if you use specific instruction sets (SSE, SSE2, 3DNow!). In the code above there is one thing that (I suppose) decreases speed: proc is a macro, and if the function takes parameters it sets up a stack frame (the 'enter' and 'leave' instructions, or their equivalents 'push ebp/mov ebp,esp' and 'mov esp,ebp/pop ebp'). In such a small function you can avoid this and read your parameters through the esp register. That reduces code size and increases speed.
A few words about Asm and C++ (I like both languages, but prefer Asm): I think there is no language today that lets you write code as fast and small as Asm does, and I don't think one will appear, at least not in the near future.
Regards.
IronFelix

Joined: 09 Dec 2004
Posts: 141
Location: Russia, Murmansk region
IronFelix 25 Jan 2005, 11:39
A few words about the proposed function:
I think it would be better to code it like this:

vadd:
a equ esp+4
b equ esp+8

mov eax,[a]
mov edx,[b] ; ebx must be preserved before use (callee-saved), edx need not be
fld dword [eax]
fadd dword [edx]
fstp dword [eax] ; FPU stack is empty again
fld dword [eax+4]
fadd dword [edx+4]
fstp dword [eax+4] ; FPU stack is empty again
fld dword [eax+8]
fadd dword [edx+8]
fstp dword [eax+8] ; FPU stack is empty again
retn 8

Madis731, if you leave values on the FPU stack, it will slow things down significantly (especially on the Pentium) when the function is called many times: once all eight registers hold values, the next fld overflows the stack, which is slow and also produces an invalid result. My code also avoids the stack frame instructions.
Regards.
Madis731

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 25 Jan 2005, 15:36
It was just a straightforward answer to his question. I know you (and I too) can make it better.
It is general code; if you give me a specific program, I'll try to optimize it for the program at hand.

The FPU is generally slow; SSE FP instructions are faster and can be pipelined, parallelized, etc.
IronFelix

Joined: 09 Dec 2004
Posts: 141
Location: Russia, Murmansk region
IronFelix 26 Jan 2005, 06:08
Thanks for your comment, Madis731!
Excuse my answers, but when I see code that I can optimize, I can't just look at it.
Thanks again.
Regards.
S.T.A.S.

Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 27 Jan 2005, 18:49
vo1d wrote:
inline...

Well, I believe there's no way to represent this with *standalone* assembler:
Quote:
The inline specifiers instruct the compiler to insert a copy of the function body into each place the function is called

So inline asm is probably the only choice. In MSVC (it uses the ugly MASM syntax) it would be something like this (with SSE):
Code:
```
// operands should be aligned on a 16-byte boundary
// (or use movups instead of movaps)
inline void add_4Xfloat(float* a, float* b)
{
    __asm   mov     ecx, a
    __asm   mov     edx, b
    __asm   movaps  xmm0, xmmword ptr [edx]   // 128-bit load, so xmmword rather than qword
    __asm   addps   xmm0, xmmword ptr [ecx]
    __asm   movaps  xmmword ptr [ecx], xmm0
}
//
__declspec(align(16)) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
__declspec(align(16)) float b[4] = {1.5f, 2.5f, 3.5f, 4.5f};
//
add_4Xfloat(a,b);
```

Or maybe use the "_mm_add_ps" intrinsic of the Intel C++ compiler...

Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.