FPU Practice Source

fasmnewbie 25 Nov 2018, 14:17
The attached source is a minimal version of BASELIB / CPULIB that focuses on FPU programming. The functions should work out of the box. They were extracted from the CPU2.0 routines, with a few improvements made to selected ones. It is intended for those who

1. Want to try out the FPU just for the sake of it
2. Want to learn the FPU from the basics
3. Want to refresh their FPU programming skills
4. Want a quick check of the FPU environment, floating-point binary formats and similar stuff.

This source shows the high-level side of FASMW: the low-level functions extracted from CPULIB were turned into high-level, self-sufficient units with help from the C library.

This source follows the Win64 calling convention (what fasm's macros call fastcall). Sorry, no Linux64 version. For a low-level Linux64 version, just use CPULIB. But hey, the FPU is the FPU no matter the platform.
Examples and output:
Code:
format PE64 console
include 'win64axp.inc'
entry main

section '.data' data readable writeable
y dq -10.012644

section '.text' code readable executable
main:
        sub     rsp,40
        finit
        fldpi

;Example 1: Using fpu_stack to view FPU registers
        call    fpu_stack   ;or ccall fpu_stack
        call    prnline

;Example 2: View a double-precision format using native CALL
        movq    xmm0,[y]
        call    fpdinfo ;ccall fpdinfo,float [y]
        call    prnline

;Example 3: View a single-precision format using CCALL / FASTCALL
        ccall   fpfinfo,float dword 0.012345
        call    prnline

        call    exith    

Output:
Code:
;Example 1
[FPU in REAL10]
st0: +3.141592653589793238
st1: ...
st2: ...
st3: ...
st4: ...
st5: ...
st6: ...
st7: ...

;Example 2
-10.012644 = C0240679463CFB33
1.10000000010.0100000001100111100101000110001111001111101100110011
-          +1.2515805000000000
S.Expnent1023.Mantissa
 .1026-1023=3

;Example 3
0.012345 = 3C4A42AF
0.01111000.10010100100001010101111
+       +1.58016
S.Expnt127.Mantissa
 .120-127=-7    
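
To make the dumps above easier to follow, here is a minimal C sketch of the same field split (sign, biased exponent, stored mantissa). It is not how fpdinfo or fpfinfo are implemented. The names fp_fields, split_double and split_float are made up for illustration, and the code assumes that double and float are IEEE-754 binary64 and binary32. Zero, denormals, infinities and NaNs are not special-cased.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the fields shown in the dumps above. */
typedef struct {
    unsigned sign;        /* 1 bit */
    unsigned biased_exp;  /* 11 bits for double, 8 bits for float */
    int      exponent;    /* biased_exp minus the bias (1023 or 127) */
    uint64_t fraction;    /* stored mantissa bits: 52 or 23 */
    uint64_t bits;        /* raw encoding */
} fp_fields;

/* Split a double, assuming IEEE-754 binary64. */
fp_fields split_double(double d)
{
    fp_fields f;
    uint64_t u;
    assert(sizeof u == sizeof d);
    memcpy(&u, &d, sizeof u);               /* well-defined way to read the bits */
    f.bits       = u;
    f.sign       = (unsigned)(u >> 63);
    f.biased_exp = (unsigned)((u >> 52) & 0x7FF);
    f.exponent   = (int)f.biased_exp - 1023;
    f.fraction   = u & 0xFFFFFFFFFFFFFULL;  /* low 52 bits */
    return f;
}

/* Split a float, assuming IEEE-754 binary32. */
fp_fields split_float(float x)
{
    fp_fields f;
    uint32_t u;
    assert(sizeof u == sizeof x);
    memcpy(&u, &x, sizeof u);
    f.bits       = u;
    f.sign       = (unsigned)(u >> 31);
    f.biased_exp = (unsigned)((u >> 23) & 0xFF);
    f.exponent   = (int)f.biased_exp - 127;
    f.fraction   = u & 0x7FFFFFu;           /* low 23 bits */
    return f;
}
```

For -10.012644, split_double gives sign 1, biased exponent 1026 and exponent 3, which matches the dump in Example 2.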


NOTE: The extended-precision routine does not round its output. I want to keep it that way to preserve the original format as produced by the FPU hardware. "prnflt", however, does round, as I wrote it using SIMD instructions.

Have fun with floating-point and vector instructions.

P.S. Some routines do not preserve the FPU state. You can save it yourself, as shown in other routines.

EDIT: December 4,2018
Added forward-view versions of the MMX, XMM and YMM register dumps in addition to the default reversed view. So, if you want to view the register dumps in forward orientation, use

1. dumpxmmf (view XMM dumps in forward)
2. dumpmmxf (view MMX dumps in forward)
3. dumpymmf (view YMM dumps in forward)

Otherwise, for the default reversed orientation, use

1. dumpxmm (view XMM dumps in reversed view)
2. dumpmmx (view MMX dumps in reversed view)
3. dumpymm (view YMM dumps in reversed view)
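
The difference between the two views is only the order in which the lanes are printed. Here is a rough C sketch of that idea. The function format_lanes and its high_first flag are invented for this illustration and are not part of fplib. It also makes no assumption about which of the two orders the library calls "forward".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* strcmp, for callers that compare the two views */

/* Hypothetical helper: print n lanes into buf, separated by '|'.
   lanes[0] is the lowest lane of the register.
   high_first != 0 prints High<-Low, otherwise Low->High.
   Returns the number of characters written, or -1 if buf is too small. */
int format_lanes(char *buf, size_t size, const float *lanes, int n, int high_first)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && size > 0 && lanes != NULL);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int idx = high_first ? n - 1 - i : i;
        int w = snprintf(buf + used, size - used, "%g|", lanes[idx]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;                      /* output would be truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

With lanes {1, 2, 3, 4}, the high_first view gives "4|3|2|1|" and the other view gives "1|2|3|4|".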

Added: Binary (fplib.obj).
Removed: Sources

EDIT: I transferred all the attachments to CPU2.0 just in case you need them.


Last edited by fasmnewbie on 21 Dec 2018, 10:37; edited 20 times in total
fasmnewbie 25 Nov 2018, 15:02
The reason FPU arithmetic output differs between Windows and Linux for the same code is the default FPU precision mode each OS sets up. Many coders fail to appreciate the importance of setting the precision mode (PC bits) to the highest, focusing only on the rounding mode.

For example, using fpu_cflag, one can see that Windows defaults to double precision instead of extended precision. The FPU itself, on the contrary, defaults to extended precision (FINIT loads 0x37F). That means Windows intentionally lowers the FPU precision for your PC, for reasons that I still don't understand. Example:

Code:
format PE64 console
include 'win64axp.inc'
entry main

section '.data' data readable writeable
y dq 10.0
x dq 3.0

section '.text' code readable executable
main:
        sub     rsp,40

;See Windows 10 default Precision Mode (PC bits)
;Watch the two PC bits
        call    fpu_cflag
        call    prnline

;Change the Precision Mode to Extended Precision so all
;your FPU arithmetic (FDIV,FMUL,FADD) is more precise.
        ccall   fpu_precision,1
        call    fpu_cflag
        call    prnline

        call    exith    

Output
Code:
        IC RC RC PC PC IEM   PM UM OM ZM DM IM
0  0  0  0  0  0  1  0  0  1  1  1  1  1  1  1 [0x27F]

        IC RC RC PC PC IEM   PM UM OM ZM DM IM
0  0  0  0  0  0  1  1  0  1  1  1  1  1  1  1 [0x37F]    
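
The two control words above differ only in the PC field (bits 9-8). Here is a small C sketch that pulls the fields out of a control word value. It works on plain integers, so it does not touch the real FPU. The helper names (cw_pc, cw_rc, cw_masks, cw_with_pc, pc_mantissa_bits) are made up for illustration and have nothing to do with the fpu_cflag or fpu_precision routines.

```c
#include <assert.h>

/* x87 precision-control encodings (bits 9-8 of the control word). */
enum { PC_SINGLE = 0, PC_RESERVED = 1, PC_DOUBLE = 2, PC_EXTENDED = 3 };

/* Precision control: bits 9-8. */
unsigned cw_pc(unsigned cw)    { return (cw >> 8) & 3u; }

/* Rounding control: bits 11-10 (0 = nearest, 1 = down, 2 = up, 3 = truncate). */
unsigned cw_rc(unsigned cw)    { return (cw >> 10) & 3u; }

/* Exception masks IM, DM, ZM, OM, UM, PM: bits 5-0. */
unsigned cw_masks(unsigned cw) { return cw & 0x3Fu; }

/* Return cw with its PC field replaced by pc; the other bits are kept. */
unsigned cw_with_pc(unsigned cw, unsigned pc)
{
    assert(pc <= 3u);
    return (cw & ~0x300u) | (pc << 8);
}

/* Significand width selected by a PC value, or -1 for the reserved encoding. */
int pc_mantissa_bits(unsigned pc)
{
    switch (pc) {
    case PC_SINGLE:   return 24;
    case PC_DOUBLE:   return 53;
    case PC_EXTENDED: return 64;
    default:          return -1;
    }
}
```

So 0x27F decodes to PC = 2 (53-bit significand) and 0x37F to PC = 3 (64-bit significand), with the same rounding mode and the same exception masks in both.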


You can prove it with real FPU code and the supplied routines. The FDIV below computes st0/st1 = 3.0/10.0:
Code:
        ;ccall   fpu_precision,1 ;Enable this, you get 0.300xxx
        call    fpu_cflag
        fld     [y] ;10.0
        fld     [x] ;3.0
        fdiv    st0,st1          ;if not, you get 0.2999...888
        call    fpu_stack    

Output
Code:
        IC RC RC PC PC IEM   PM UM OM ZM DM IM
0  0  0  0  0  0  1  0  0  1  1  1  1  1  1  1 [0x27F]
[FPU in REAL10]
st0: +0.2999999999999999888
st1: +10.00000000000000000
st2: ...
st3: ...
st4: ...
st5: ...
st6: ...
st7: ...    
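
The same effect can be seen from C by doing the division once in double and once in long double. This is only a sketch under assumptions: the function names are invented, and it assumes a compiler where long double is wider than double (GCC on x86-64, for example) and an FPU control word that allows extended precision. If the PC bits are set to double precision, as in the 0x27F case above, or if long double is the same as double, both quotients come out identical.

```c
#include <assert.h>
#include <float.h>

/* 3.0 / 10.0 rounded to a 53-bit significand. */
double quotient_double(void)
{
    volatile double a = 3.0, b = 10.0;   /* volatile: keep the division at run time */
    assert(DBL_MANT_DIG == 53);          /* IEEE-754 double assumed */
    return a / b;
}

/* 3.0 / 10.0 in long double (64-bit significand with x87 extended precision). */
long double quotient_long_double(void)
{
    volatile long double a = 3.0L, b = 10.0L;
    return a / b;
}

/* Returns 1 if the extra precision changed the result, 0 otherwise. */
int precision_changes_result(void)
{
    long double wide = quotient_long_double();
    double narrow = quotient_double();
    /* Widening a double is exact, so any difference comes from the division. */
    return wide != (long double)narrow;
}
```

Where the wider type is really in effect, the double quotient is the smaller of the two, which is the 0.2999... versus 0.300... difference shown in the output above.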


Last edited by fasmnewbie on 07 Dec 2018, 00:24; edited 1 time in total
fasmnewbie 02 Dec 2018, 18:57
I updated the attachment to include a full range of floating-point helper routines dealing with MMX, SSE and AVX in addition to the FPU. For AVX/AVX2, dumping registers as bytes may require a wide console buffer. It is a complete floating-point set that you can use from a single source. Practice makes perfect.

I also updated BASELIB / CPULIB to Revision 4.1.6.

Enjoy.

P.S. Names have been changed to match the similar routines in BASELIB/CPULIB.
fasmnewbie 04 Dec 2018, 10:41
Before I forget, here's how one can use "fplib.obj" from C to view or handle the float data type.
Code:
//gcc -m64 this.c fplib.obj -s -o this.exe
#include <stdio.h>

extern void dumpxmmf(unsigned long long);
extern void prnflt(float);

int main()
{
        float x[] = {45.34,12.11,-16.54,14.17,90.12,11.12,54.11};
        float y[] = {45.34,12.11,16.54,14.17,90.12,11.12,54.11};
        float z=0.0;
        
        z = x[2]*y[0];  // -16.54 * 45.34 = -749.9236
        dumpxmmf(8);    //view result in XMM, as floats
        prnflt(z);      //Display xmm0 as float
        putchar('\n');
        prnflt(x[1]);   //Display float element
        return 0;
}    


The expected output (may vary with your PC or GCC version):
Code:
PACKED SINGLES  High<-Low
 xmm0: 0.0| 0.0| 0.0| -749.9236|
 xmm1: 0.0| 0.0| 0.0| -16.54|
 xmm2: 0.0| 0.0| 0.0| 0.0|
 xmm3: 0.0| 0.0| 0.0| 0.0|
 xmm4: 0.0| 0.0| 0.0| 0.0|
 xmm5: 0.0| 0.0| 0.0| 0.0|
 xmm6: 0.0| 0.0| 0.0| 0.0|
 xmm7: 0.0| 0.0| 0.0| 0.0|
 xmm8: 0.0| 0.0| 0.0| 0.0|
 xmm9: 0.0| 0.0| 0.0| 0.0|
xmm10: 0.0| 0.0| 0.0| 0.0|
xmm11: 0.0| 0.0| 0.0| 0.0|
xmm12: 0.0| 0.0| 0.0| 0.0|
xmm13: 0.0| 0.0| 0.0| 0.0|
xmm14: 0.0| 0.0| 0.0| 0.0|
xmm15: 0.0| 0.0| 0.0| 0.0|
-749.9236
12.11    

Hope they will be useful.
fasmnewbie 06 Dec 2018, 16:05
Caveats:

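While I did say that one can't call prndblx from (MinGW64) C/C++, it turns out that it works, but with slightly funny trailing digits. You need to feed prndblx a long double. The funny digits in the example below come from the literal: without an L suffix, 2829323214.534143259 is first rounded to a double, and that rounded value is what gets printed. Something like this: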

Code:
//gcc -m64 this.c fplib.obj -s -o this.exe
extern void prndblx(long double);   //Win64 passes long double by reference (a pointer to the 80-bit value)
extern void fpdinfo(double);
extern void fpfinfo(float);
extern void prnline(void);

int main()
{
        double x=-567779.25;
        long double z=2829323214.534143259;
        float y=0.982313;

        fpdinfo(x);
        prnline();
        fpfinfo(y);
        prnline();
        prndblx(z);

}    


output:
Code:
-567779.250000 = C12153C680000000
1.10000010010.0001010100111100011010000000000000000000000000000000
-          +1.0829529762268066
S.Expnent1023.Mantissa
 .1042-1023=19

0.982313 = 3F7B78DD
0.01111110.11110110111100011011101
+       +1.964626
S.Expnt127.Mantissa
 .126-127=-1

+2829323214.534143447    


This might be useful because MinGW64 on top of MSVCRT does not fully support the extended-precision type; MSVCRT's printf, for instance, treats long double as a plain double. Note that "long double" is passed by reference here: under the Win64 calling convention an argument wider than 8 bytes is replaced by a pointer to it, so MinGW64 hands prndblx a pointer to the 80-bit value rather than the value itself. Windows lowering the FPU precision bits by default (0x27F, as shown earlier) is a separate matter.
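
The trailing digits in the prndblx output (…534143447 instead of …534143259) can be reproduced without fplib. Here is a minimal sketch with invented function names, assuming a compiler where long double is wider than double (where it is the same as double, both functions return the same value).

```c
#include <assert.h>
#include <float.h>

/* No suffix: the literal is a double, so it is rounded to a 53-bit
   significand before being widened to long double. */
long double from_double_literal(void)
{
    return 2829323214.534143259;
}

/* L suffix: the literal is rounded directly to long double. */
long double from_long_double_literal(void)
{
    return 2829323214.534143259L;
}

/* How far the double literal lands from the long double literal. */
long double literal_rounding_error(void)
{
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    return from_double_literal() - from_long_double_literal();
}
```

At this magnitude a double has a spacing of 2^-21 (about 4.8e-7), so the unsuffixed literal lands about 1.9e-7 above the intended value, which is exactly the difference visible in the output above.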
fasmnewbie 06 Dec 2018, 16:33
Hmm.. ok, I decided to throw in another example of fplib, this time to see how vector shuffling works on MMX registers using a series of PSHUFW instructions. Note: if you are more comfortable with the reversed view, use dumpmmx instead of dumpmmxf.

Code:
;------------------------------
; fasm this.asm
; gcc -m64 this.obj fplib.obj -s -o this.exe
;------------------------------
        format MS64 COFF
        public main

        extrn dumpmmxf
        extrn prnline

        section '.data' data readable writeable align 32
        x dw 10,20,30,40,50,60,70,80

        section '.text' code readable executable
main:
        sub     rsp,40

        movq    mm0,qword[x]   ;feed to MM0
        mov     ecx,2          ;view as unsigned WORDS
        call    dumpmmxf       ;view in forward direction
        call    prnline

        ;word permutations (PSHUFW shuffles 16-bit words)
        pshufw  mm4,mm0,0
        pshufw  mm5,mm0,1
        pshufw  mm6,mm0,2
        pshufw  mm7,mm0,3

        mov     ecx,2          ;view as unsigned WORDS
        call    dumpmmxf       ;view in forward direction
        add     rsp,40
        ret    


Output
Code:
PACKED UNSIGNED WORDS  High<-Low
mm0:    40|   30|   20|   10|
mm1:     0|    0|    0|    0|
mm2:     0|    0|    0|    0|
mm3:     0|    0|    0|    0|
mm4:     0|    0|    0|    0|
mm5:     0|    0|    0|    0|
mm6:     0|    0|    0|    0|
mm7:     0|    0|    0|    0|

PACKED UNSIGNED WORDS  High<-Low
mm0:    40|   30|   20|   10|
mm1:     0|    0|    0|    0|
mm2:     0|    0|    0|    0|
mm3:     0|    0|    0|    0|
mm4:    10|   10|   10|   10|
mm5:    10|   10|   10|   20|
mm6:    10|   10|   10|   30|
mm7:    10|   10|   10|   40|    
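
To see where those rows come from, here is a plain C emulation of what PSHUFW does with its immediate: each 2-bit field of the immediate picks the source word for one destination word. The function name pshufw_emul is made up for this sketch, and index 0 is the lowest word, which is the rightmost column in the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates PSHUFW dst, src, imm on four 16-bit words.
   Destination word i receives the source word selected by
   bits [2i+1:2i] of imm. Index 0 is the lowest word. */
void pshufw_emul(uint16_t dst[4], const uint16_t src[4], uint8_t imm)
{
    uint16_t tmp[4];
    int i;

    assert(dst != 0 && src != 0);
    for (i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (i = 0; i < 4; i++)
        dst[i] = tmp[i];            /* going through tmp allows dst == src */
}
```

With src = {10, 20, 30, 40}, an immediate of 1 selects word 1 for the lowest destination word and word 0 for the other three, giving 10|10|10|20 in High<-Low order, as in the mm5 row. An immediate of 0x1B reverses the four words, and 0xE4 leaves them unchanged.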


Edit: Removed the single attachment "fplib.lib". Just use the attachment in the first post.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
