flat assembler
Message board for the users of flat assembler.

[solved] FDIV is not assembling if 1st operand is not ST0

AsmGuru62



Joined: 28 Jan 2004
Posts: 1671
Location: Toronto, Canada
AsmGuru62 02 May 2020, 15:51
Code:
    fdiv    st0, st2
    fdiv    st1, st3       ; <-- cannot assemble this one
    

The Intel Manual says:
DC F8+i ---> FDIV ST(i), ST(0)
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 02 May 2020, 16:22
As the excerpt you quoted shows, one of the operands needs to be ST(0). If the first one is not, the second one must be:
Code:
    fdiv    st0, st2
    fdiv    st1, st0    
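
To illustrate the rule above, here is a minimal C sketch of the two register-register FDIV encodings listed in the Intel manual (D8 F0+i for FDIV ST(0), ST(i) and DC F8+i for FDIV ST(i), ST(0)). The helper name encode_fdiv is made up for this example and is not part of fasm; it only models why a pair such as st1, st3 has no encoding.

```c
/* Hypothetical helper, not fasm code: encode "fdiv dst, src" for two
   x87 stack registers. Returns 1 and writes two opcode bytes on
   success, 0 when the pair cannot be encoded. */
static int encode_fdiv(unsigned dst, unsigned src, unsigned char out[2])
{
    if (dst > 7 || src > 7)
        return 0;                               /* only ST(0)..ST(7) exist */
    if (dst == 0) {                             /* FDIV ST(0), ST(i): D8 F0+i */
        out[0] = 0xD8;
        out[1] = (unsigned char)(0xF0 + src);
        return 1;
    }
    if (src == 0) {                             /* FDIV ST(i), ST(0): DC F8+i */
        out[0] = 0xDC;
        out[1] = (unsigned char)(0xF8 + dst);
        return 1;
    }
    return 0;                                   /* neither operand is ST(0) */
}
```

With this sketch, encode_fdiv(0, 2, b) yields D8 F2 and encode_fdiv(1, 0, b) yields DC F9, while encode_fdiv(1, 3, b) returns 0 because there is nowhere to put the second register index.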



AsmGuru62 02 May 2020, 17:36
Wow! I did not see this.
Thank you!
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 02 May 2020, 22:47
All of the legacy FP instructions can only encode a single register.

If you wanted to have Fop st(i),st(j) then you would need a way to encode two registers, i & j.

MMX and the later instruction sets can encode a second register.
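
A rough C sketch of the difference, assuming the usual register-to-register forms: SSE instructions carry a ModRM byte with two 3-bit register fields, while the x87 register forms only add one stack index to the second opcode byte. The function names are invented for this illustration, and the sketch covers xmm0..xmm7 only (no REX prefix).

```c
#include <stdint.h>

/* ModRM byte for a register-to-register form: mod = 11b, then two
   3-bit fields, reg and rm, one for each register operand. */
static uint8_t modrm_reg_reg(unsigned reg, unsigned rm)
{
    return (uint8_t)(0xC0 | ((reg & 7u) << 3) | (rm & 7u));
}

/* divsd xmm<dst>, xmm<src> is F2 0F 5E /r: dst goes in the reg field,
   src in the rm field, so both registers fit in the encoding. */
static void encode_divsd(unsigned dst, unsigned src, uint8_t out[4])
{
    out[0] = 0xF2;
    out[1] = 0x0F;
    out[2] = 0x5E;
    out[3] = modrm_reg_reg(dst, src);
}

/* x87 register form: the second opcode byte is base + i, which leaves
   room for a single stack index; the other operand is always ST(0). */
static uint8_t x87_second_byte(uint8_t base, unsigned i)
{
    return (uint8_t)(base + (i & 7u));
}
```

For example, encode_divsd(1, 3, out) gives F2 0F 5E CB for divsd xmm1, xmm3, whereas x87_second_byte(0xF8, 1) gives F9, the second byte of fdiv st1, st0.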



AsmGuru62 02 May 2020, 23:47
You mean the SSE2 instruction set for packed doubles and floats. I wanted to use those, but they do not have ATAN2 or LOG operations. I suppose I can use a mixed set.


revolution 03 May 2020, 00:18
Yes. None of MMX, SSE, SSE2, ..., AVX512 is stack based, and all of them have at least two registers in the encoding, a source and a destination.

But there are no native trigonometric instructions outside of the legacy FP. If you are doing many trig ops then it might be more efficient overall to use an SSE+ based algorithm. It depends upon what you need them for.



Tomasz Grysztar 03 May 2020, 09:13
revolution wrote:
But there are no native trigonometric instructions outside of the legacy FP. If you are doing many trig ops then it might be more efficient overall to use an SSE+ based algorithm. It depends upon what you need them for.
Also, computing trigonometric functions with the FPU is actually riddled with traps.



AsmGuru62 03 May 2020, 12:24
Interesting link, Tomasz. Thanks.
I have a client who uses a legacy C++ app to scan a ~20 GB file of floating-point values and compute some statistics, and it is slow.
I'll compare the results from my new FPU code with those from their legacy app and see if they match.
So, where can I find some resources on how to make an FPATAN equivalent using SSE+?
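
On the comparison step mentioned above: the x87 FPU can keep intermediates in extended precision, so the last digits may legitimately differ from what the compiled C++ produces. A tolerance-based check is one way to compare the two outputs. This is only a sketch; nearly_equal and its tolerance parameters are hypothetical and would need tuning for the actual data.

```c
#include <float.h>

/* Absolute value without pulling in libm. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Returns 1 when a and b agree within a relative tolerance (scaled by
   the larger magnitude) or within an absolute tolerance for values
   near zero. NaN never matches; infinities match only each other. */
static int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    if (a != a || b != b)
        return 0;                       /* NaN compares unequal to itself */
    if (a == b)
        return 1;                       /* exact match, same-sign infinities */

    double fa = abs_d(a), fb = abs_d(b);
    double scale = fa > fb ? fa : fb;
    if (scale > DBL_MAX)
        return 0;                       /* one side is infinite, the other is not */

    double limit = rel_tol * scale;
    if (limit < abs_tol)
        limit = abs_tol;
    return abs_d(a - b) <= limit;
}
```

For instance, nearly_equal(x87_result, legacy_result, 1e-12, 1e-300) would accept last-digit differences while still flagging real mismatches; both variable names and tolerances here are placeholders.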
donn



Joined: 05 Mar 2010
Posts: 321
donn 04 May 2020, 03:58
Had a few old transcendental function implementations sitting around. Just posted them; they use double-precision radian values, and I think accuracy is up to 4 places. They use Gregory's series for the arctangent and Maclaurin series for the sine/cosine:

https://github.com/hpdporg/datap/blob/bfbe7544ba6e8aadbfa2b1d8b8cc021677458860/src/main/asm/Numeric/Transcendental.inc#L228

And tests/usage:

https://github.com/hpdporg/datap/blob/master/src/test/cpp/Numeric/Transcendental.cpp

The range of input values may be limited to small values, but the code shows some of the SSE instructions, and the values I've tested in that range seem correct when checked against a calculator.
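
For readers who have not met the two series named above, here is a minimal C sketch of each. It is not the code from the linked repository; the function names and the terms parameter are assumptions made for this illustration, and there is no range reduction, so it is only sensible for small inputs.

```c
#include <stddef.h>

/* Gregory's series: atan(x) = x - x^3/3 + x^5/5 - ...
   Valid for |x| <= 1 and converges quickly only for small |x|. */
static double atan_gregory(double x, size_t terms)
{
    double power = x;                   /* (-1)^k * x^(2k+1) */
    double sum = 0.0;
    for (size_t k = 0; k < terms; k++) {
        sum += power / (double)(2 * k + 1);
        power *= -x * x;
    }
    return sum;
}

/* Maclaurin series: sin(x) = x - x^3/3! + x^5/5! - ... */
static double sin_maclaurin(double x, size_t terms)
{
    double term = x;                    /* current term, sign included */
    double sum = 0.0;
    for (size_t k = 0; k < terms; k++) {
        sum += term;
        /* next term = term * (-x^2) / ((2k+2) * (2k+3)) */
        term *= -x * x / (double)((2 * k + 2) * (2 * k + 3));
    }
    return sum;
}
```

Each loop body is just multiplies, adds and a divide, which is why this kind of series maps onto scalar or packed SSE2 arithmetic without needing FPATAN or FSIN.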



AsmGuru62 04 May 2020, 14:39
Thank you.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.