flat assembler
Message board for the users of flat assembler.
Main > vector math with FASM
Madis731 25 Jan 2005, 10:59
Code:
alfa dd 1.0,2.0,3.0
beta dd 4.0,5.0,6.0
;...
proc vadd,a,b
    mov eax,[a]
    mov ebx,[b]
    fld dword[eax+0]
    fld dword[ebx+0]
    fadd st0,st1
    fst dword[eax+0]
    fld dword[eax+4]
    fld dword[ebx+4]
    fadd st0,st1
    fst dword[eax+4]
    fld dword[eax+8]
    fld dword[ebx+8]
    fadd st0,st1
    fst dword[eax+8]
endp
;and later...
stdcall vadd,alfa,beta
IronFelix 25 Jan 2005, 11:27
Hello, guys!
Let me try to answer your questions, vo1d.
1) The code for such a function is above.
2) To call it from C you need to know how to build a DLL in FASM and everything that goes with it. Or you can try to use abstract classes (interfaces) on the C++ side with the appropriate code in FASM, but that also requires knowledge of DLLs. It is not hard to code such a thing.
3) The function will be faster if you use specific instructions (SSE, 3DNow!, SSE2).
In the above code there is one thing that (I suppose) decreases speed: proc is a macro, and if the function has parameters it builds a stack frame (the 'enter' and 'leave' instructions, or their equivalents 'push ebp/mov ebp,esp' and 'mov esp,ebp/pop ebp'). In such a small function you can avoid this and read your parameters through the esp register. It will reduce code size and increase speed.
Some words about Asm and C++ (I like both languages, but prefer Asm). I think there is no language today that lets you write code as fast and small as Asm does. And I don't think one will appear, at least not in the near future.
Regards.
IronFelix 25 Jan 2005, 11:39
Some words about the offered function:
I think it would be better to code it like this:
Code:
vadd:
    a equ esp+4
    b equ esp+8
    mov eax,[a]
    mov edx,[b]           ; ebx must be preserved if it is used, edx need not be
    fld dword [eax]
    fadd dword [edx]
    fstp dword [eax]      ; FPU is now empty
    fld dword [eax+4]
    fadd dword [edx+4]
    fstp dword [eax+4]    ; FPU is now empty
    fld dword [eax+8]
    fadd dword [edx+8]
    fstp dword [eax+8]    ; FPU is now empty
    retn 8
Madis731, if you leave the FPU stack non-empty, it will decrease speed significantly (especially on Pentium) when the function is called many times: your version leaves six values on the stack per call, and once all eight registers are occupied the next fld overflows the FPU stack, which is slow and no longer loads a valid value. In my code I also avoided the stack frame instructions.
Regards.
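For checking either routine against known values, the intended result can be written out in plain C. This is only a reference sketch: the name `vadd_ref` is made up here, and it assumes (as in both assembly versions above) that the sum is stored back into the first vector.

```c
/* Hypothetical reference (name is an assumption, not from the thread):
   adds three floats of b into a, storing the result in a. */
static void vadd_ref(float *a, const float *b)
{
    for (int i = 0; i < 3; ++i) {
        a[i] += b[i];   /* same effect as the fld / fadd / fstp triple per component */
    }
}
```

With the alfa and beta data from the first post, alfa should end up holding 5.0, 7.0, 9.0.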
Madis731 25 Jan 2005, 15:36
It was just a straightforward answer to his question. I know you (and I too) can make it better.
It is general code. If you give me a specific program, I'll try to optimize it for the program at hand. The FPU is generally slow; SSE floating-point instructions are faster, pipeline well, and work on four floats at a time.
IronFelix 26 Jan 2005, 06:08
Thanks for your comment, Madis731!
Excuse my replies, but when I see code that I can optimize, I can't just look at it. Thanks again. Regards.
S.T.A.S. 27 Jan 2005, 18:49
vo1d wrote: inline...
Well, I believe there's no way to represent this with a *standalone* assembler:
Quote: The inline specifiers instruct the compiler to insert a copy of the function body into each place the function is called
So inline asm is probably the only choice. In MSVC (it uses ugly masm syntax) it will be something like this (with SSE):
Code:
// operands should be aligned on a 16-byte boundary
// or use movups
inline void add_4Xfloat(float* a, float* b)
{
    __asm mov ecx, a
    __asm mov edx, b
    __asm movaps xmm0, xmmword ptr [edx]
    __asm addps xmm0, xmmword ptr [ecx]
    __asm movaps xmmword ptr [ecx], xmm0
}
// float a[4] = {1.0,2.0,3.0,4.0};
// float b[4] = {1.5,2.5,3.5,4.5};
// add_4Xfloat(a,b);
or maybe the Intel C++ compiler's "_mm_add_ps" intrinsic...
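The `_mm_add_ps` route mentioned at the end of this post might look roughly like this in C. It is only a sketch: the function name `add_4xfloat_sse` is made up for illustration, it assumes an x86 target with SSE enabled, and it uses the unaligned load/store intrinsics from `<xmmintrin.h>` so the arrays do not have to be 16-byte aligned.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper (name is an assumption): a[i] += b[i] for i = 0..3.
   Unaligned loads/stores are used, so no alignment is required of a or b. */
static inline void add_4xfloat_sse(float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);            /* load four floats from a */
    __m128 vb = _mm_loadu_ps(b);            /* load four floats from b */
    _mm_storeu_ps(a, _mm_add_ps(va, vb));   /* packed add, store back into a */
}
```

If both arrays are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could be used instead, matching the movaps form of the inline asm version.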
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.