flat assembler
Message board for the users of flat assembler.
fpu Store instruction

Zoltanmatey31 16 Jan 2023, 20:02
silly me: "fstp accepts the same operands as the fst instruction and can also store value in the 80-bit memory." WRITTEN RIGHT THERE.


AsmGuru62 16 Jan 2023, 23:15
I was surprised to know that FST cannot store into 80-bit memory.
Only FSTP can do it. So, imagine you have all 8 FPU registers busy with values and you just need to dump ST0 (all 80 bits) into memory. You cannot do it without popping the value.
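One workaround when the stack is full, as a minimal sketch: pop ST0 into a 10-byte variable with FSTP and reload it right away with FLD. The 80-bit memory format is the same as the register format, so the round trip should not change the value, and the other seven registers stay where they were. The name tmp80 is hypothetical, and the cost is one extra memory load.

```asm
; Sketch: store ST0 as 80 bits and leave the FPU stack contents unchanged.
; tmp80 is a hypothetical scratch variable, not something from the thread.
        fstp    tword [tmp80]   ; pop ST0 into memory, all 80 bits
        fld     tword [tmp80]   ; push the same value back as ST0

; somewhere in a writable data area:
tmp80   dt ?                    ; reserves 10 bytes
```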


revolution 16 Jan 2023, 23:22
Use FSAVE if you want to examine FPU registers.



Furs 17 Jan 2023, 14:08
AsmGuru62 wrote: I was surprised to know that FST cannot store into 80-bit memory.

If you just use unary operators that replace the top of the stack and push back a new value, then why do you need to fill the other 7 registers? What are they filled for? I guess that's what Intel thought when designing the instruction encodings. Can't blame them.


macomics 17 Jan 2023, 15:14
Furs wrote: If you just use unary operators that replace the top of the stack and push back a new value, then why do you need to fill the other 7 registers? What are they filled for?

In order to reduce the number of loads and stores of values, you can use more than one top register. An old example:

Code:
; Cx = Az * By - Ay * Bz
; Cy = Ax * Bz - Az * Bx
; Cz = Ay * Bx - Ax * By
if used fpuVectorMultVA
fpuVectorMultVA:
; Ax = st0, Ay = st1, Az = st2, Bx = st3, By = st4, Bz = st5
        fld     st1             ; push Ay
        fmul    st0, st4        ; st0 = Ay * Bx
        fld     st1             ; push Ax
        fmul    st0, st6        ; st0 = Ax * By
        fsubp   st1, st0        ; st0 = Ay * Bx - Ax * By = Cz
        fxch    st3             ; st0 = Az, st3 = Cz
        fld     st0             ; push Az
        fmul    st0, st6        ; st0 = Az * By
        fxch    st2             ; st0 = Ax, st2 = Az * By
        fmul    st0, st7        ; st0 = Ax * Bz
        fxch    st3             ; st0 = Ay, st3 = Ax * Bz
        fmul    st0, st7        ; st0 = Ay * Bz
        fsubp   st2, st0        ; st1 = Az * By - Ay * Bz = Cx (after pop)
        fmul    st0, st4        ; st0 = Az * Bx
        fsubp   st2, st0        ; st1 = Ax * Bz - Az * Bx = Cy (after pop)
; Cx = st0, Cy = st1, Cz = st2, Bx = st3, By = st4, Bz = st5
        retn
end if ; used fpuVectorMultVA

Since the loading algorithm is implemented separately, you do not need to write several identical algorithms for integers or for single or double precision real numbers. Also, the data may stay in the coprocessor for longer than a single operation. For example, immediately after the multiplication, a normalization function can be called for the first three numbers loaded in the coprocessor.


AsmGuru62 17 Jan 2023, 18:40
I was doing some formulas for Gravity Corrections and they are huge!
So, I was calculating some intermediate values and keeping them in the FPU until needed. I could have saved them, but I needed all 80 bits of precision. An example of a routine in FORTRAN I had to port:

Code:
      subroutine innerzone(m,n,u,v,z,f)
      implicit double precision (a-h,o-z)
      real m,n
      h1=m+sqrt(1+m**2)
      g1=n+sqrt(1+n**2)
      p=1/(1+v**2)
      g2=n+u*v*p
      h2=m+u*v*p
      t=1/sqrt(1+v**2)
      g3=sqrt((n+u*v*p)**2+(1+u**2)*p-(u*v*p)**2)
      h3=sqrt((m+u*v*p)**2+(1+u**2)*p-(u*v*p)**2)
      f=z*(log(g1/h1)-t*log((g2+g3)/(h2+h3)))
      return
      end



Copyright © 1999-2023, Tomasz Grysztar.