flat assembler
Message board for the users of flat assembler.
Tutorials and Examples > FPU Practice Source
fasmnewbie 25 Nov 2018, 14:17
The simple source attached is a minimal version of BASELIB / CPULIB focusing on FPU programming. These functions should work out of the box. They were extracted from the CPU2.0 routines, and a few improvements were made to selected ones. It is intended for those who

1. Want to try out the FPU just for the sake of it
2. Want to learn the FPU from the basics
3. Want to refresh their FPU programming skills
4. Want a quick check on the FPU environment, floating-point binary formats and similar stuff.

This source exposes the high-level side of FASMW: the low-level functions extracted from CPULIB were transformed into high-level, self-sufficient units with help from the C library. It observes the _fastcall calling convention on Win64. Sorry, no Linux64 version. For a low-level Linux64 version, just use CPULIB. But hey, FPU is FPU no matter what the platform.

Examples and output:

Code:
format PE64 console
include 'win64axp.inc'
entry main

section '.data' data readable writeable
y dq -10.012644

section '.text' code readable executable
main:
    sub rsp,40
    finit
    fldpi

    ;Example 1: Using fpu_stack to view FPU registers
    call fpu_stack          ;or ccall fpu_stack
    call prnline

    ;Example 2: View a double-precision format using native CALL
    movq xmm0,[y]
    call fpdinfo            ;ccall fpdinfo,float [y]
    call prnline

    ;Example 3: View a single-precision format using CCALL / FASTCALL
    ccall fpfinfo,float dword 0.012345
    call prnline
    call exith

Output:

Code:
;Example 1
[FPU in REAL10]
st0: +3.141592653589793238
st1: ...
st2: ...
st3: ...
st4: ...
st5: ...
st6: ...
st7: ...

;Example 2
-10.012644 = C0240679463CFB33
1.10000000010.0100000001100111100101000110001111001111101100110011
- +1.2515805000000000
S.Expnent1023.Mantissa
.1026-1023=3

;Example 3
0.012345 = 3C4A42AF
0.01111000.10010100100001010101111
+ +1.58016
S.Expnt127.Mantissa
.120-127=-7

NOTE: The extended-precision routine does not round. I want to keep it that way to preserve the original format as per the FPU/hardware output. "prnflt", however, does round, as I wrote it using SIMD instructions.

Have fun with floating-point and vector instructions.

P.S. Some routines do not preserve the FPU state. You can save it yourself as shown in other routines.
EDIT: December 4, 2018
Added forward-view versions of all MMX, XMM and YMM register dumps in addition to the default reversed view. So, if you want to view the register dumps in forward orientation, use
1. dumpxmmf (view XMM dumps in forward view)
2. dumpmmxf (view MMX dumps in forward view)
3. dumpymmf (view YMM dumps in forward view)
Or else, for the default reversed orientation, use
1. dumpxmm (view XMM dumps in reversed view)
2. dumpmmx (view MMX dumps in reversed view)
3. dumpymm (view YMM dumps in reversed view)

Added: Binary (fplib.obj). Removed: Sources

EDIT: I transferred all the attachments to CPU2.0 just in case you need them.

Last edited by fasmnewbie on 21 Dec 2018, 10:37; edited 20 times in total
fasmnewbie 02 Dec 2018, 18:57
I updated the attachment to include a full range of floating-point helper routines dealing with MMX, SSE and AVX in addition to the FPU. For AVX/AVX2, dumping registers as bytes may require a wide console buffer. This is a complete floating-point set that you can find and use in a single source. Practice makes perfect.
I also updated BASELIB / CPULIB to Revision 4.1.6. Enjoy. P.S. Names have been changed to match the similar routines in BASELIB/CPULIB.
fasmnewbie 04 Dec 2018, 10:41
Before I forget, here's how one can use "fplib.obj" from C to view or handle float data.

Code:
//gcc -m64 this.c fplib.obj -s -o this.exe
#include <stdio.h>
extern void dumpxmmf(unsigned long long);
extern void prnflt(float);

int main()
{
    float x[] = {45.34,12.11,-16.54,14.17,90.12,11.12,54.11};
    float y[] = {45.34,12.11,16.54,14.17,90.12,11.12,54.11};
    float z=0.0;

    z = x[2]*y[0];   // -16.54 * 45.34 = -749.9236
    dumpxmmf(8);     //view result in XMM, as floats
    prnflt(z);       //Display xmm0 as float
    putchar('\n');
    prnflt(x[1]);    //Display float element
    return 0;
}

The expected output (may vary on your PC or GCC version):

Code:
PACKED SINGLES High<-Low
xmm0: 0.0| 0.0| 0.0| -749.9236|
xmm1: 0.0| 0.0| 0.0| -16.54|
xmm2: 0.0| 0.0| 0.0| 0.0|
xmm3: 0.0| 0.0| 0.0| 0.0|
xmm4: 0.0| 0.0| 0.0| 0.0|
xmm5: 0.0| 0.0| 0.0| 0.0|
xmm6: 0.0| 0.0| 0.0| 0.0|
xmm7: 0.0| 0.0| 0.0| 0.0|
xmm8: 0.0| 0.0| 0.0| 0.0|
xmm9: 0.0| 0.0| 0.0| 0.0|
xmm10: 0.0| 0.0| 0.0| 0.0|
xmm11: 0.0| 0.0| 0.0| 0.0|
xmm12: 0.0| 0.0| 0.0| 0.0|
xmm13: 0.0| 0.0| 0.0| 0.0|
xmm14: 0.0| 0.0| 0.0| 0.0|
xmm15: 0.0| 0.0| 0.0| 0.0|
-749.9236
12.11

Hope they will be useful.
fasmnewbie 06 Dec 2018, 16:05
Caveats:
While I did say that one can't call prndblx from (MinGW64) C/C++, it turns out that it works, but with slightly funny trailing digits. You need to use the long double type to feed prndblx. Something like below:

Code:
//gcc -m64 this.c fplib.obj -s -o this.exe
extern void prndblx(long double); //This is actually passed as a pointer. LOL.
extern void fpdinfo(double);
extern void fpfinfo(float);
extern void prnline();

int main()
{
    double x=-567779.25;
    long double z=2829323214.534143259;
    float y=0.982313;

    fpdinfo(x);
    prnline();
    fpfinfo(y);
    prnline();
    prndblx(z);
    return 0;
}

Output:

Code:
-567779.250000 = C12153C680000000
1.10000010010.0001010100111100011010000000000000000000000000000000
- +1.0829529762268066
S.Expnent1023.Mantissa
.1042-1023=19

0.982313 = 3F7B78DD
0.01111110.11110110111100011011101
+ +1.964626
S.Expnt127.Mantissa
.126-127=-1

+2829323214.534143447

This might be useful because it turns out that MinGW64 and MSVCRT are not fully IEEE-754 compliant. It also surprised me that MinGW64 passes a "long double" argument as a pointer to the value in memory rather than by value, which happens to be just what prndblx takes. No wonder your FPU precision bits are intentionally lowered by Windows by default.
fasmnewbie 06 Dec 2018, 16:33
Hmm.. ok, I decided to throw in another example of fplib. This time it shows how vector shuffling works on MMX registers using a series of PSHUFW instructions. Note that if you are more comfortable with the reversed view, use dumpmmx instead of dumpmmxf.

Code:
;------------------------------
; fasm this.asm
; gcc -m64 this.obj fplib.obj -s -o this.exe
;------------------------------
format MS64 COFF
public main
extrn dumpmmxf
extrn prnline

section '.data' data readable writeable align 32
x dw 10,20,30,40,50,60,70,80

section '.text' code readable executable
main:
    sub rsp,40
    movq mm0,qword[x]   ;feed the first four words to MM0
    mov ecx,2           ;view as unsigned WORDS
    call dumpmmxf       ;view in forward direction
    call prnline

    ;word shuffles (PSHUFW selects 16-bit words, not bytes)
    pshufw mm4,mm0,0
    pshufw mm5,mm0,1
    pshufw mm6,mm0,2
    pshufw mm7,mm0,3
    mov ecx,2           ;view as unsigned WORDS
    call dumpmmxf       ;view in forward direction

    add rsp,40
    ret

Output:

Code:
PACKED UNSIGNED WORDS High<-Low
mm0: 40| 30| 20| 10|
mm1: 0| 0| 0| 0|
mm2: 0| 0| 0| 0|
mm3: 0| 0| 0| 0|
mm4: 0| 0| 0| 0|
mm5: 0| 0| 0| 0|
mm6: 0| 0| 0| 0|
mm7: 0| 0| 0| 0|

PACKED UNSIGNED WORDS High<-Low
mm0: 40| 30| 20| 10|
mm1: 0| 0| 0| 0|
mm2: 0| 0| 0| 0|
mm3: 0| 0| 0| 0|
mm4: 10| 10| 10| 10|
mm5: 10| 10| 10| 20|
mm6: 10| 10| 10| 30|
mm7: 10| 10| 10| 40|

Edit: Removed the single attachment "fplib.lib". Just use the attachment in the first post.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.