IEEE 754

Ali.Z

Joined: 08 Jan 2018
Posts: 822

Ali.Z 23 Nov 2018, 18:17

I'm only interested in:
- half precision
- single precision
- double precision

I'm pretty sure the algorithm (*) is the same for each. What's driving me nuts is:

how to convert (for example) 1.25 from text form to an IEEE-754 single precision floating point value.

1 is 1 in hex
25 is 19 in hex

Sure, I can use cvtsi2ss, but what about the fractional part, the 25 after the decimal point? cvtsi2ss only converts integer values.

* I read a couple of posts about different algorithms and watched some video tutorials too ... but they didn't explain much, or maybe they left out some useful info.

Also:
I found copies of both IEEE 754-1985 and IEEE 754-2008,
but I didn't find any algorithm there, so I don't know what I'm missing, but I want to learn this stuff.

_________________
Asm For Wise Humans

Furs

Joined: 04 Mar 2016
Posts: 2652

Furs 23 Nov 2018, 18:59

Fractional digits are "different", and you need to think of them differently. 0.5 is half of 1, so in binary it is 0.1. With one fractional binary digit you can write 0.0, 0.1, 1.0, 1.1 and so on, which is basically steps of 0.5 in decimal. So the "5" in 0.5 is a different "5" than the 5 in 5.0 (left of the decimal point).
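
To make the "different 5" concrete, here is a minimal Python sketch that writes out the binary digits of a fraction by repeated doubling. The helper name `fraction_to_binary` and the `max_bits` cutoff are made up for this illustration and are not from any post in the thread.

```python
def fraction_to_binary(frac: float, max_bits: int = 24) -> str:
    """Return the binary digits of 0 <= frac < 1, found by repeated doubling."""
    if not 0.0 <= frac < 1.0:
        raise ValueError("expected a fraction in [0, 1)")
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2                # shift one binary place to the left
        bit = int(frac)          # the digit that crossed the binary point
        bits.append(str(bit))
        frac -= bit              # keep only what is still right of the point
    return "0." + ("".join(bits) or "0")


print(fraction_to_binary(0.5))      # 0.1
print(fraction_to_binary(0.25))     # 0.01
print(fraction_to_binary(0.1, 12))  # 0.000110011001 (never terminates, cut at 12 bits)
```

0.5 and 0.25 terminate after one and two binary digits, while 0.1 repeats forever, which is why most decimal fractions can only be stored approximately.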

A simple way, though not a very good one (because of overflow), is to convert the fractional digits as an integer as before (e.g. via cvtsi2ss as you mentioned), and then divide by 10^digits. For example, say you have 5 fractional digits:

Code:

42.12345

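So you'd take 12345 as an integer, convert it to float, and divide by 10^5, or 100000 (5 zeros). You can multiply by the inverse instead, but 1/100000 is not exactly representable in binary, so that adds a little extra rounding error. That gets you the fraction. Then add the integer part (42) to get the number.

It's simple, but the integer conversion overflows once there are too many fractional digits, so it's only good if you have a limited number of them.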
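
The steps above can be sketched in a few lines of Python. This is only an illustration of the structure: the function name `parse_simple` is hypothetical, and Python integers do not overflow, so the overflow problem that a 32-bit cvtsi2ss would run into is not reproduced here.

```python
def parse_simple(text: str) -> float:
    """Parse text like "-42.12345" as: integer part + fraction digits / 10^digits."""
    sign = -1.0 if text.startswith("-") else 1.0
    digits = text.lstrip("+-")
    int_part, _, frac_part = digits.partition(".")
    value = float(int(int_part or "0"))      # integer part, e.g. 42
    if frac_part:
        # fraction digits as an integer (12345) divided by 10^5
        value += int(frac_part) / 10 ** len(frac_part)
    return sign * value


print(parse_simple("1.25"))        # 1.25
print(parse_simple("42.12345"))
```

For 1.25 every step is exact. For inputs like 42.12345 the division and the addition each round once, so the result can differ from a correctly rounded parser in the last bit.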

redsock

Joined: 09 Oct 2009
Posts: 438
Location: Australia

redsock 23 Nov 2018, 19:20

This is a hard (IMO) problem space... Some very good reading (and how I chose to implement it in my own library) that also explains the binary format and helped me grok it: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf ... as well as searching for string$to_double and string$from_double in my implementation: https://2ton.com.au/library_as_html/string32.inc.html ...

With the 3/10 problem (as explained in the above paper), my implementation round-trips: 0.3 goes through to_double and from_double and comes back as 0.3, with the same binary representation each time. With naive/faster implementations, 0.3 becomes 0.29999... etc. There is no simple way that I am aware of to deal with all of that, hahah, messy :)
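
As a small illustration of the round-trip property, here is a Python sketch. It uses Python's built-in `float()` and `repr()` as stand-ins for a correctly rounding parser and a shortest-digits printer. It is not the library linked above, and `round_trips` is a name invented for this example.

```python
def round_trips(x: float) -> bool:
    """True if printing x and parsing the text again gives back the same double."""
    return float(repr(x)) == x


exact = float("0.3")   # correctly rounded parse of the text "0.3"
naive = 3 * 0.1        # "3 times one tenth": 0.1 is already rounded, then the product rounds again

print(repr(exact))                              # 0.3
print(repr(naive))                              # 0.30000000000000004
print(round_trips(exact), round_trips(naive))   # True True
print(3 / 10 == exact)                          # True: a single correctly rounded division
```

Both values survive the text round trip, but only the correctly rounded one prints back as 0.3. The shortcut of multiplying by a stored 0.1 lands on a neighbouring double.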

_________________
2 Ton Digital - https://2ton.com.au/

Mikl___

Joined: 30 Dec 2014
Posts: 143
Location: Russian Federation, Irkutsk

Mikl___ 23 Nov 2018, 19:28

Hi, Ali.Z! Did you want to find an algorithm?
Your sample:
+1.25 = (-1)^{Sign} x 2^{exponent} x 1.mantissa
+1.25 > 0 --> Sign=0
log2(1.25)=0.321928095 --> exponent = floor(0.321928095) = 0
exponent in "offset code" (biased) = 0 + 127 = 127 = 0x7F = 0111.1111b
shift = 23 (bits in the mantissa) - 0 (exponent) = 23
mantissa = 2^{23} x (1.25 - 2^{0}) = 2^{23} x 0.25 = 2097152 = 0x200000 = 010.0000.0000.0000.0000.0000b
1 bit (Sign) + 8 bits (exponent) + 23 bits (mantissa) = 32 bits in a single precision floating point value

Code:

Sign|  exponent |mantissa
   0|011 1.111 1|010.0000.0000.0000.0000.0000
      3    F     A    0    0    0    0    0

+1.25=0x3FA00000

Furs's sample (with a minus sign):
-42.12345
-42.12345 < 0 --> Sign=1
log2(42.12345)=5.396551696 --> exponent = floor(5.396551696) = 5
exponent in "offset code" (biased) = 5 + 127 = 132 = 0x84 = 1000.0100b
shift = 23 (bits in the mantissa) - 5 (exponent) = 18
mantissa = 2^{18} x (42.12345 - 2^{5}) = 2^{18} x 10.12345 = 2653801.6768 ~ 2653802 = 0x287E6A = 010.1000.0111.1110.0110.1010b

Code:

Sign|  exponent |mantissa
   1|100 0.010 0|010.1000.0111.1110.0110.1010
      C     2     2    8    7    E    6    A

-42.12345=0xC2287E6A
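
The two worked examples above follow the same recipe, and it carries over to half and double precision by changing only the field widths. Below is a minimal Python sketch under some assumptions: the names `FORMATS` and `encode` are hypothetical, only normal (non-zero, finite, non-subnormal) values are handled, `math.frexp` is used instead of `floor(log2(x))` to avoid edge cases next to powers of two, and the mantissa is rounded to nearest with ties to even.

```python
import math
import struct

# (exponent bits, mantissa bits) for the three formats asked about
FORMATS = {"half": (5, 10), "single": (8, 23), "double": (11, 52)}


def encode(value: float, fmt: str = "single") -> int:
    """Return the IEEE 754 bit pattern of a normal value as an integer."""
    exp_bits, frac_bits = FORMATS[fmt]
    bias = (1 << (exp_bits - 1)) - 1              # 15, 127 or 1023
    mag = abs(value)
    if mag == 0 or not math.isfinite(mag):
        raise ValueError("zero, infinity and NaN are not handled in this sketch")
    sign = 1 if value < 0 else 0
    exponent = math.frexp(mag)[1] - 1             # same as floor(log2(mag))
    # Scale so the leading 1 lands on bit `frac_bits`, then round to an integer.
    scaled = round(math.ldexp(mag, frac_bits - exponent))
    if scaled >> (frac_bits + 1):                 # rounding carried up to the next power of two
        scaled >>= 1
        exponent += 1
    if not 1 <= exponent + bias < (1 << exp_bits) - 1:
        raise ValueError("outside the normal range of this format")
    mantissa = scaled - (1 << frac_bits)          # drop the implicit leading 1
    return (sign << (exp_bits + frac_bits)) | ((exponent + bias) << frac_bits) | mantissa


print(hex(encode(1.25)))        # 0x3fa00000
print(hex(encode(-42.12345)))   # 0xc2287e6a

# Cross-check against the bit pattern Python's struct module produces
assert encode(1.25) == struct.unpack("<I", struct.pack("<f", 1.25))[0]
```

The two printed values match the hand-computed results above. Zero, subnormals, infinities and NaN need separate cases that are left out here.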

IEEE-754 Floating Point Converter

Last edited by Mikl___ on 25 Nov 2018, 02:37; edited 7 times in total

fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 24 Nov 2018, 04:07

redsock wrote:

This is a hard (IMO) problem space... Some very good reading (and how I chose to implement it in my own library) that also explains the binary format and helped me grok it: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf ... as well as searching for string$to_double and string$from_double in my implementation: https://2ton.com.au/library_as_html/string32.inc.html ...

The 3/10 problem (as explained in the above paper) in my implementation goes <-> 0.3 (from_double and to_double again produces the same binary format). Using naive/faster implementations, 0.3 becomes 0.29999... etc. There is no simple way that I am aware of to deal with all of that hahah, messy

IMHO, they should not have used those two test cases as the opening examples in the first article. The FPU (which is IEEE 754 compliant) does round 1/3 and 3/10 correctly if a higher precision mode is selected before using the FPU's arithmetic instructions (FDIV etc.). The particular 'garbage digits' they mention were due to the lower precision mode (53-bit instead of 64-bit) being in effect before the FDIV.

Mikl___

Joined: 30 Dec 2014
Posts: 143
Location: Russian Federation, Irkutsk

Mikl___ 25 Nov 2018, 05:02

Ali.Z, this one is in MASM:

Code:

;                             FpuAtoFL
  ; -----------------------------------------------------------------------
  ; This procedure was written by Raymond Filiatreault, December 2002
  ; Modified January, 2004, to eliminate .data section and remove some
  ; redundant code.
  ; Modified March 2004 to avoid any potential data loss from the FPU
  ; Revised January 2005 to free the FPU st7 register if necessary.
  ; Revised December 2006 to avoid a minuscule error when processing strings
  ; which do not have any decimal digit.
  ;
  ; This FpuAtoFL function converts a decimal number from a zero terminated
  ; alphanumeric string format (Src) to an 80-bit REAL number and returns
  ; the result as an 80-bit REAL number at the specified destination (the
  ; FPU itself or a memory location), unless an invalid operation is
  ; reported by the FPU or the definition of the parameters (with uID) is
  ; invalid.
  ;
  ; The source can be a string in regular numeric format or in scientific
  ; notation. The number of digits (excluding all leading 0's and trailing
  ; decimal 0's) must not exceed 18. If in scientific format, the exponent
  ; must be within +/-4931
  ;
  ; The source is checked for validity. The procedure returns an error if
  ; a character other than those acceptable is detected prior to the
  ; terminating zero or the above limits are exceeded.
  ;
  ; This procedure is based on converting the digits into a specific packed
  ; decimal format which can be used by the FPU and then adjusted for an
  ; exponent of 10.
  ;
  ; Only EAX is used to return error or success. All other CPU registers
  ; are preserved.
  ;
  ; IF the FPU is specified as the destination for the result,
  ;       the st7 data register will become the st0 data register where the
  ;       result will be returned (any valid data in that register would
  ;       have been trashed).
  ;
  ; -----------------------------------------------------------------------
    .386
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive
    include Fpu.inc
    .code
FpuAtoFL proc public lpSrc:DWORD, lpDest:DWORD, uID:DWORD

LOCAL content[108] :BYTE
LOCAL tempst       :TBYTE
LOCAL bcdstrf      :TBYTE
LOCAL bcdstri      :TBYTE

      fsave content
      push  ebx
      push  ecx
      push  edx
      push  esi
      push  edi
      xor   eax,eax
      xor   ebx,ebx
      xor   edx,edx
      lea   edi,bcdstri
      stosd
      stosd
      stosd
      stosd
      stosd
      lea   edi,bcdstri+8
      mov   esi,lpSrc
      mov   ecx,18
@@:      lodsb
      cmp   al," "
      jz    @B                ;eliminate leading spaces
      or    al,al             ;is string empty?
      jnz   @F
atoflerr:      frstor content
atoflerr1:      xor   eax,eax
      pop   edi
      pop   esi
      pop   edx
      pop   ecx
      pop   ebx
      ret
;check for leading sign
@@:      cmp   al,"+"
      jz    @F
      cmp   al,"-"
      jnz   integer
      mov   ah,80h
@@:      mov   [edi+1],ah        ;put sign byte in bcd strings
      mov   [edi+11],ah
      xor   eax,eax
      lodsb
;--------------------------------------------
;convert the integer digits to packed decimal
;--------------------------------------------
integer:      cmp   al,"."
      jnz   @F
      lea   edi,bcdstri
      call  load_integer
      lodsb
      lea   edi,bcdstrf+8
      mov   cl,18
      and   bh,4
      jmp   decimals
@@:      cmp   al,"e"
      jnz   @F
      .if   cl == 18
            jmp   atoflerr    ;error if no digit other than 0 before e
      .endif
      lea   edi,bcdstri
      call  load_integer
      jmp   scient
@@:      cmp   al,"E"
      jnz   @F
      .if   cl == 18
            jmp   atoflerr    ;error if no digit other than 0 before E
      .endif
      lea   edi,bcdstri
      call  load_integer
      jmp   scient
@@:      or    al,al
      jnz   @F
      test  bh,4
      jz    atoflerr          ;error if no numerical digit before terminating 0
      lea   edi,bcdstri
      call  load_integer
      jmp   laststep
@@:      sub   al,"0"
      jc    atoflerr          ;unacceptable character
      jnz   @F
      test  bh,2
      jnz   @F
      or    bh,4              ;at least 1 numerical character
      lodsb
      jmp   integer     
@@:      cmp   al,9
      ja    atoflerr          ;unacceptable character
      or    bh,6              ;at least 1 non-zero numerical character
      sub   ecx,1
      jc    atoflerr          ;more than 18 integer digits
      mov   ah,al
      lodsb
      cmp   al,"."
      jnz   @F
      mov   al,0
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstri
      call  load_integer
      lea   edi,bcdstrf+8
      mov   cl,18
      and   bh,4
      lodsb
      jmp   decimals
@@:      cmp   al,"e"
      jnz   @F
      mov   al,0
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstri
      call  load_integer
      jmp   scient
@@:      cmp   al,"E"
      jnz   @F
      mov   al,0
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstri
      call  load_integer
      jmp   scient
@@:      or    al,al
      jnz   @F
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstri
      call  load_integer
      jmp   laststep
@@:      sub   al,"0"
      jc    atoflerr          ;unacceptable character
      cmp   al,9
      ja    atoflerr          ;unacceptable character
      dec   ecx
      rol   al,4
      ror   ax,4
      mov   [edi],al
      dec   edi
      lodsb
      jmp   integer
;--------------------------------------------
;convert the decimal digits to packed decimal
;--------------------------------------------
decimals:      cmp   al,"e"
      jnz   @F
      lea   edi,bcdstrf
      call  load_decimal
      jmp   scient
@@:      cmp   al,"E"
      jnz   @F
      lea   edi,bcdstrf
      call  load_decimal
      jmp   scient
@@:      or    al,al
      jnz   @F
      test  bh,4
      jz    atoflerr          ;error if no numerical digit before terminating 0
      lea   edi,bcdstrf
      call  load_decimal
      jmp   laststep
@@:      sub   al,"0"
      jc    atoflerr          ;unacceptable character
      cmp   al,9
      ja    atoflerr          ;unacceptable character
      or    bh,4              ;at least 1 numerical character
      .if   al != 0
            or    bh,2
      .endif
      sub   ecx,1
      jnc   @F
      .if   al == 0           ;if trailing decimal 0
            inc   ecx
            lodsb
            jmp   decimals
      .endif
      jmp   atoflerr
@@:      mov   ah,al
decimal1:
      lodsb
      cmp   al,"e"
      jnz   @F
      mov   al,0
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstrf
      call  load_decimal
      jmp   scient
@@:      cmp   al,"E"
      jnz   @F
      mov   al,0
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstrf
      call  load_decimal
      jmp   scient
@@:      or    al,al
      jnz   @F
      test  bh,4
      jz    atoflerr          ;error if no numerical digit before terminating 0
      mov   al,0
      ror   ax,4
      mov   [edi],al
      lea   edi,bcdstrf
      call  load_decimal
      jmp   laststep
@@:      sub   al,"0"
      jc    atoflerr          ;unacceptable character
      cmp   al,9
      ja    atoflerr          ;unacceptable character
      .if   al != 0
            or    bh,2        ;at least one non-zero decimal digit
      .endif
      dec   ecx
      rol   al,4
      ror   ax,4
      mov   [edi],al
      dec   edi
      lodsb
      jmp   decimals
laststep:      fstsw ax                ;retrieve exception flags from FPU
      fwait
      shr   al,1              ;test for invalid operation
      jc    atoflerr          ;clean-up and return error
laststep2:      test  uID,DEST_FPU      ;check where result should be stored
      jnz   @F                ;destination is the FPU
      mov   eax,lpDest
      fstp  tbyte ptr[eax]    ;store result at specified address
      jmp   restore
@@:      fstp  tempst            ;store result temporarily
restore:      frstor  content         ;restore all previous FPU registers
      jz    @F
      ffree st(7)             ;free it if not already empty
      fld   tempst
@@:      or    al,1              ;to ensure EAX!=0
@@:      pop   edi
      pop   esi
      pop   edx
      pop   ecx
      pop   ebx
      ret
scient:      xor   eax,eax
      xor   edx,edx
      lodsb
      cmp   al,"+"
      jz    @F
      cmp   al,"-"
      jnz   scient1
      stc
      rcr   eax,1     ;keep sign of exponent in most significant bit of EAX
@@:      lodsb                   ;get next digit after sign
scient1:      push  eax
      and   eax,0ffh
      jnz   @F        ;continue if 1st byte of exponent is not terminating 0
scienterr:      pop   eax
      jmp   atoflerr          ;no exponent
@@:      sub   al,30h
      jc    scienterr         ;unacceptable character
      cmp   al,9
      ja    scienterr         ;unacceptable character
      add   edx,edx           ;x2
      lea   edx,[edx+edx*4]   ;x2x5=x10
      add   edx,eax
      cmp   edx,4931
      ja    scienterr         ;exponent too large
      lodsb
      or    al,al
      jnz   @B
      pop   eax               ;retrieve exponent sign flag
      rcl   eax,1             ;is most significant bit set?
      jnc   @F
      neg   edx
@@:      call  XexpY
      fmul
      jmp   laststep
FpuAtoFL endp
; 
;put 10 to the proper exponent (value in EDX) on the FPU
XexpY:      push  edx
      fild  dword ptr[esp]    ;load the exponent
      fldl2t                  ;load log2(10)
      fmul                    ;->log2(10)*exponent
      pop   edx
;at this point, only the log base 2 of the 10^exponent is on the FPU
;the FPU can compute the antilog only with the mantissa
;the characteristic of the logarithm must thus be removed
      fld   st(0)             ;copy the logarithm
      frndint                 ;keep only the characteristic
      fsub  st(1),st          ;keeps only the mantissa
      fxch                    ;get the mantissa on top
      f2xm1                   ;->2^(mantissa)-1
      fld1
      fadd                    ;add 1 back
;the number must now be readjusted for the characteristic of the logarithm
      fscale                  ;scale it with the characteristic
;the characteristic is still on the FPU and must be removed
      fstp  st(1)             ;clean-up the register
      ret
;shifts the packed BCD string of the integers to the integer position
;EDI points to the BCD string
;ECX = count of positions for shifting the BCD string
load_integer:
      push  esi
      .if   cl == 18
            fldz
      .else
            mov   esi,edi
            sub   ecx,18
            neg   ecx
            shr   ecx,1
            push  edi
            .if   !CARRY?     ;even number of integer digits
                  mov   edx,9
                  sub   edx,ecx
                  add   esi,edx
                  rep   movsb
            .else             ;odd number of integer digits
                  mov   edx,8
                  sub   edx,ecx
                  add   esi,edx
                  xor   eax,eax
                  lodsb
                  rol   ax,4
                  test  ecx,ecx
                  .if   !ZERO?
                     @@:
                        rol   ah,4
                        lodsb
                        rol   ax,4
                        stosb
                        dec   ecx
                        jnz   @B
                  .endif
                  mov   [edi],ah
                  inc   edi
            .endif
            mov   ecx,edx
            xor   eax,eax
            rep   stosb
            pop   edi
            fbld  tbyte ptr[edi]
      .endif
      pop   esi
      ret
;converts the decimal portion in the packed BCD string to binary
;EDI points to the BCD string
load_decimal:      test  bh,2
      jnz   @F
      ret
@@:      .if   cl == 18
            fldz
      .else
            fbld  tbyte ptr[edi]
            mov   edx,-18
            call  XexpY
            fmul
      .endif
      fadd
      ret
end
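
The XexpY helper in the listing above computes 10^n from the identity 10^n = 2^(n x log2(10)), splitting the logarithm into an integer part (the "characteristic", applied with fscale) and a small fractional part (the "mantissa", fed to f2xm1). Here is a rough Python sketch of the same idea. The name `pow10` is made up, and it works in double precision rather than the FPU's 80-bit format, so it cannot reach exponents as large as +/-4931 and will not match the FPU bit for bit.

```python
import math


def pow10(n: int) -> float:
    """Approximate 10**n as 2**(n*log2(10)), split the way XexpY does it."""
    log = n * math.log2(10)             # fild / fldl2t / fmul
    characteristic = round(log)         # frndint: nearest integer
    mantissa = log - characteristic     # fsub: leftover in [-0.5, 0.5]
    fractional_power = 2.0 ** mantissa  # f2xm1, then add 1 back
    return math.ldexp(fractional_power, characteristic)  # fscale


print(pow10(5))      # close to 100000.0
print(pow10(-18))    # close to 1e-18, the scale load_decimal applies to the fraction digits
```

The split is needed because f2xm1 only accepts a small argument, so the integer part of the logarithm has to be handled separately by fscale.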

Ali.Z

Joined: 08 Jan 2018
Posts: 822

Ali.Z 27 Nov 2018, 17:55

Mikl___ wrote:

algorithm

I'm bad at math, math names and symbols ... but thank you for spending so much time on this. (much appreciated)

_________________
Asm For Wise Humans

donn

Joined: 05 Mar 2010
Posts: 321

donn 27 Nov 2018, 19:38

Conceptually, it's like binary scientific notation.

https://en.wikipedia.org/wiki/Single-precision_floating-point_format

There are only 3 parts, as the others have mentioned: the sign, the exponent, and the fraction.

If you want an exact view of the format:
- For normal numbers there is also an implicit leading 1 bit that is not stored, which provides one extra bit of precision.
- The stored exponent is biased, so it does not start at 0. You have to subtract 127 from it in the case of single precision.

Just to recap: with scientific notation, you write a number as a single leading digit plus a fraction, multiplied by the base (10 or 2) raised to an exponent. The exponent lets the point "float", which is where the name floating point comes from, and it allows really big and really small numbers to fit into only a few bits at the cost of precision.

So, with binary floating point numbers, the fraction uses base 2 instead of base 10, and as was already mentioned, the fractional places are not tenths, hundredths, and so on, but halves, quarters, eighths, and so on.
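
To see the three parts in practice, here is a short Python sketch that pulls a single precision bit pattern apart and rebuilds its value. The function names `split_single` and `value_of` are invented for this example, and infinities and NaN are simply rejected.

```python
def split_single(bits: int):
    """Split a 32-bit pattern into (sign, stored exponent, fraction)."""
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, stored_exponent, fraction


def value_of(bits: int) -> float:
    """Rebuild the value: (-1)^sign x significand x 2^exponent."""
    sign, stored_exponent, fraction = split_single(bits)
    if stored_exponent == 0xFF:
        raise ValueError("infinity or NaN")
    if stored_exponent == 0:
        # subnormal: no implicit leading 1, exponent fixed at -126
        significand, exponent = fraction / 2**23, -126
    else:
        # normal: implicit leading 1, remove the bias of 127
        significand, exponent = 1 + fraction / 2**23, stored_exponent - 127
    return (-1) ** sign * significand * 2.0 ** exponent


print(split_single(0x3FA00000))   # (0, 127, 2097152)
print(value_of(0x3FA00000))       # 1.25
print(value_of(0xC2287E6A))       # about -42.12345
```

The fraction 2097152 out of 2^23 is 0.25, so the significand is 1.25 and the exponent is 127 - 127 = 0.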

Historically, I think Konrad Zuse came up with the first binary floating point implementation. I find the format intriguing too, and I tried parsing it a while ago in 32-bit. It can be frustrating not having any indication of whether the parsing is correct until the very end. If you still have any interest, I'll post my 32-bit and 64-bit versions this evening. They were only for learning purposes, so I might try improving them with some of redsock's methods.
