flat assembler
Message board for the users of flat assembler.

FPU documentation / manuals
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
Hi there,

does anybody know of good documentation on using floating-point instructions in assembler?
I did not find many useful documents with Google: a few sample programs, but no real manual that explains how to use the instructions and the registers (st0 to st7). I think FPU programming is not as easy as it is described.

flat assembler does not allow fist [qword], only fist [dword] or fistp [qword].
I don't want the latter because I don't want to pop the value from the FPU stack.
Shocked
Post 08 Apr 2010, 22:25
windwakr



Joined: 30 Jun 2004
Posts: 827
Location: Michigan, USA
Best FPU tutorial/documentation is here:
http://www.website.masmforum.com/tutorials/fptute/

It gives good explanations for everything (registers, instructions, the control/status words), and it provides usage examples for all of the instructions (which is nice for some of the more confusing ones).



fist stores 16- or 32-bit values only. You must use fistp for a 64-bit value.
fist wrote:

This instruction rounds the value of the TOP data register ST(0) to the nearest integer according to the rounding mode of the RC field in the Control Word, and stores it at the specified destination (Dest). The destination can be the memory address of a 16-bit WORD or of a 32-bit DWORD.

This instruction cannot store the value of ST(0) as a 64-bit QWORD integer; see the FISTP instruction for such an operation.


To store a 64-bit value and still keep the value in the FPU, you need to copy it first. This means you need one extra free register on the FPU stack.
Code:
fld st0            ; push a copy of st0
fistp [memoryloc]  ; store the copy as an integer and pop it; memoryloc must be a qword
    


Last edited by windwakr on 08 Apr 2010, 22:55; edited 3 times in total
Post 08 Apr 2010, 22:32
edemko



Joined: 18 Jul 2009
Posts: 549
Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture.
http://www.intel.com/Assets/PDF/manual/253665.pdf
Chapter 8, "Programming with the x87 FPU", is the one for you.

That chapter also touches on the FPU instructions, which are described in full in:
Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M
http://www.intel.com/Assets/PDF/manual/253666.pdf

The rest of the instructions:
Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2B: Instruction Set Reference, N-Z
http://www.intel.com/Assets/PDF/manual/253667.pdf
Post 08 Apr 2010, 22:46
edfed



Joined: 20 Feb 2006
Posts: 4242
Location: 2018
one question about FPU.. what about performance?
how many fmul/s for example?
and mainly, what happens to a continuous instruction flow that uses the FPU?

does interleaving with GP instructions hide the FPU instructions from a timing point of view?
the same question applies to SSE, MMX and other extensions.
Post 09 Apr 2010, 11:36
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17664
Location: In your JS exploiting you and your system
edfed: Every CPU generation is different. Check out the docs for your CPU to see what it does and how it pipelines the FPU instructions.
Post 09 Apr 2010, 11:46
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
edfed wrote:
one question about FPU.. what about performance?
how many fmul/s for example?


Performance depends on what you need.
I process only integer values, but they are 64-bit values.
Addition and subtraction are not the problem, but division takes a lot of code, and I don't think doing it with general-purpose registers is faster than using the FPU.

Why not use the built-in hardware?
I also want the code to run on all Pentium processors, so I cannot rely on 64-bit instructions or registers.
Wink

@rest
Thanks for the information, I will check it out.
I have the Intel manuals on my hard disk, but I think they do not give much information on the basics of the FPU. I will check the chapters again.
Post 09 Apr 2010, 19:34
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
windwakr wrote:

fist wrote:

This instruction rounds the value of the TOP data register ST(0) to the nearest integer according to the rounding mode of the RC field in the Control Word, and stores it at the specified destination (Dest). The destination can be the memory address of a 16-bit WORD or of a 32-bit DWORD.

This instruction cannot store the value of ST(0) as a 64-bit QWORD integer; see the FISTP instruction for such an operation.


To store 64bit value and save the value in the FPU, you will need to copy it. But this means you will need to use an extra stack space in the FPU.
Code:
fld st0
fistp [memoryloc]
    


Yes, you are right, but I don't find it logical that fist can only store 16- or 32-bit values while fistp can also store 64-bit values. These are quite different instructions, because one pops the value from the stack and the other does not. I suspected an error in the documentation or in the Intel chip itself. By the way, I found out that Intel can update the processor microcode. That's funny.
Laughing

Maybe I can get Intel to provide a microcode update for this feature.
Razz
Post 09 Apr 2010, 19:40
baldr



Joined: 19 Mar 2008
Posts: 1651
shutdownall,

Think opcode, not mnemonic.
Post 09 Apr 2010, 21:05
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
@baldr

?!


Can anyone review the following code?
Does it work as expected?
Is there anything to optimize?

It is about time measurement using the rdtsc instruction for short timers (about 100 ns or below). This section just updates the calculated values for measured CPU cycles per second and per microsecond.

Code:
; data section
        timetickps dq 0 ; cycles per second
        timetickpu dd 0 ; cycles per microsecond
        timetickph dq 0 ; average of cycles during one timer tick
        timetickphn dq -1 ; min value of cycles (last measured period)
        timetickphx dq 0 ; max value of cycles (last measured period)
        timeintsz dt 0.0549255 ; correction value due to exact timer frequency

; code section
        fninit ; do I really need this?
        fild [timetickphn]
        fild [timetickphx]
        faddp st1,st ; add and pop to clean up the stack
        fidiv word ptr 2 ; take the average of the min and max values
        fst st1 ; make a copy because the following store pops, which I don't need
        fistp [timetickph]
        fld [timeintsz]
        fdiv st,st1 ; calculate cycles per second
        fst st1 ; same again
        fistp [timetickps]
        fdiv dword ptr 1000000 ; calculate cycles per microsecond
        fistp [timetickpu] ; the pop also cleans the stack
    


I think there is still one value on the FPU stack after this code.
What is the best way to clean it up?

Thanks for any comments.
I am not so familiar with FPU instructions yet.
Wink
Post 09 Apr 2010, 22:42
bitshifter



Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
ERROR: The stack is empty during last two instructions!
Code:
fdiv dword ptr 1000000
fistp [timetickpu]    

I can help, but to make my life easier, you must pm/post a working formula...

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
Post 09 Apr 2010, 23:14
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
I'm absolutely surprised that fasm supports "dword ptr". Anyway, keep in mind that it is equivalent to "fdiv dword [1000000]", which clearly is not what you meant. You need to store 1000000 in a variable and then use "fdiv [some_dword_var]".
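
A minimal sketch of that fix, assuming the value to be divided is already in st0. The x87 arithmetic instructions take no immediate operands, so the constant has to live in memory. The names c_million_f and c_million_i are made up for this example.
Code:
; data section
        c_million_f dd 1000000.0 ; single-precision float (exactly representable)
        c_million_i dd 1000000   ; 32-bit integer

; code section, use only one of the two
        fdiv  [c_million_f]      ; st0 = st0 / 1000000.0, memory operand read as a float
        fidiv [c_million_i]      ; st0 = st0 / 1000000, memory operand read as an integer

Mixing them up (fdiv on the integer constant) would assemble fine but interpret the integer bit pattern as a float.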
Post 09 Apr 2010, 23:24
edemko



Joined: 18 Jul 2009
Posts: 549
shutdownall, you really have to give a formula.
fidiv word ptr 2 <- you divide not by 2 but by the value at [ds:2]
finit is useful in a prologue to ensure that everything starts in a known state
Post 10 Apr 2010, 03:52
baldr



Joined: 19 Mar 2008
Posts: 1651
shutdownall wrote:
?!
I mean, which opcode would you allocate for fist m64int? DF /6 is already taken by fbstp. Why don't you complain about the absence of fiadd m64int, for example?
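
For reference, the integer and packed-BCD load/store forms map onto the opcode extensions roughly as sketched below. The encodings are the ones listed in the Intel instruction set reference and are worth checking against the manual; buf is a hypothetical 10-byte scratch buffer used only so that the lines assemble.
Code:
        buf dt 0                 ; hypothetical scratch buffer

        fild  word  [buf]        ; DF /0
        fist  word  [buf]        ; DF /2
        fistp word  [buf]        ; DF /3
        fbld  tword [buf]        ; DF /4  packed BCD load
        fild  qword [buf]        ; DF /5
        fbstp tword [buf]        ; DF /6  packed BCD store and pop
        fistp qword [buf]        ; DF /7
        fist  dword [buf]        ; DB /2
        fistp dword [buf]        ; DB /3

In the DF group, /4 and /6 handle packed BCD and /5 and /7 handle 64-bit integers, each as one load plus one store-and-pop, which leaves no extension there for a non-popping 64-bit store.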

In the context of shown code, there is no need to save calculation results immediately after the calculation. If you need the result for the following calculation, leave it on stack for a while (thus providing processor with some time to do actual calculations).

The calculations themselves are confusing, to say the least. 0.0549255 [s] is an approximation of 1/(14.31818 [MHz]/(12*65536)), the IRQ0 period? Then why do you divide it (rather than multiply it) by the mean tick count? fdiv st, st1 does st = st/st1, AFAIK. Such a division will yield a result <1, and fistp will happily store 0 at its destination.
Post 10 Apr 2010, 14:55
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
bitshifter wrote:
ERROR: The stack is empty during last two instructions!
Code:
fdiv dword ptr 1000000
fistp [timetickpu]    

I can help, but to make my life easier, you must pm/post a working formula...


Yes, I think this is not correct.
In fact I wanted to use an immediate operand in the fdiv instruction.

Here is what I want to do:

Code:
        timetickps dq 0
        timetickpu dd 0
        timetickph dq 0
        timetickphn dq -1
        timetickphx dq 0
        timeintsz dt 0.0549255
    


First I need the average of timetickphn (min value) and timetickphx (max value), which as a formula is
timetickph=(timetickphn+timetickphx)/2

timetickph contains the number of CPU cycles or micro-ops during one BIOS system timer period. I have a hook on interrupt 1Ch (the BIOS timer tick, used by my TSR), and via rdtsc the number of micro-ops since the last interrupt is stored. Because the interrupt can be masked by applications or system code, the measured values differ. So I want to take an average.

This interrupt is generated by the PC system timer, which runs at 1.193180 MHz. Dividing 10000h by that frequency gives a period of 0.0549255 seconds (usually taken as 55 ms, or 18.2 times per second).

timetickps=timetickph/0.0549255
which is the same as
timetickps=timetickph*18.20648
to determine the micro-ops for one second.

In most cases this should be the clock frequency of the CPU: for a 2.0 GHz clock the value should be 2,000,000,000. But SpeedStep technology could reduce this value. That is why the value is recalculated on every timer interrupt.

timetickpu=timetickps/1,000,000
will give the micro-ops per microsecond.

So what has to be calculated is:
timetickph=(timetickphn+timetickphx)/2
timetickps=timetickph/0.0549255
timetickpu=timetickps/1,000,000

@bitshifter
Why is the stack empty during the last two instructions?

@baldr
Why should the result of fdiv st,st1 be below 1?
I have a 1.4 GHz CPU here, so st is about 04950000h, and dividing by 0.0549255 is the same as multiplying by 18.2..., which will surely not result in zero. Or did I miss something in my math lessons?
Shocked


Last edited by shutdownall on 10 Apr 2010, 23:18; edited 1 time in total
Post 10 Apr 2010, 23:05
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
baldr wrote:
shutdownall wrote:
?!
I mean, which opcode would you allocate for fist m64int? DF /6 is already taken by fbstp. Why don't you complain about the absence of fiadd m64int, for example?

In the context of shown code, there is no need to save calculation results immediately after the calculation. If you need the result for the following calculation, leave it on stack for a while (thus providing processor with some time to do actual calculations).

The calculations themselves are confusing, to say the least. 0.0549255 [s] is an approximation of 1/(14.31818 [MHz]/(12*65536)), the IRQ0 period? Then why do you divide it (rather than multiply it) by the mean tick count? fdiv st, st1 does st = st/st1, AFAIK. Such a division will yield a result <1, and fistp will happily store 0 at its destination.


I know that I do not need to store intermediate results, but I want to keep these two variables.
By the way, the initial values of timetickphn and timetickphx are overwritten with runtime data.
Maybe that is why you think the result will be below 1.
Wink
Post 10 Apr 2010, 23:11
baldr



Joined: 19 Mar 2008
Posts: 1651
shutdownall,

Look here:
Code:
        fld [timeintsz]
        fdiv st,st1 ; calculation of cycles per second    
Then look here:
Intel 64 and IA-32 SDM vol. 2A: Instruction Set Reference A-M wrote:
FDIV/FDIVP/FIDIV—Divide

D8 F0+i    FDIV    ST(0), ST(i)     Divide ST(0) by ST(i) and store result in ST(0).
So the fdiv in your code divides timeintsz (i.e. 0.0549255) by timetickph. The following fistp rounds to nearest (the default after fninit), so if timetickph > 2*0.0549255 you will get 0 stored.
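
If the value wanted is timetickph / timeintsz and the operands are loaded in this order, one option is the reversed divide. A small sketch, assuming timetickph is already in st0 before the load:
Code:
        fld   [timeintsz]       ; st0 = 0.0549255, st1 = timetickph
        fdivr st0,st1           ; st0 = st1 / st0 = timetickph / 0.0549255

fdivr with two register operands stores the result in the first operand, just like fdiv, and only swaps the roles of dividend and divisor.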

I erroneously supposed that the timetick* variables contain a number of timer ticks (int 1C, e.g.). If they're TSC values (integers), why do you need floating-point calculations?

By the way, TSC has nothing to do with µops, you need PMC for that.
Post 11 Apr 2010, 00:53
bitshifter



Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
Maybe something like this is what you need?
Code:
        timetickps dq 0
        timetickpu dq 0
        timetickph dq 0
        timetickphn dq -1
        timetickphx dq 0

        cd_2 dd 2
        cf_0_0549255 dd 0.0549255
        cf_1000000_0 dd 1000000.0

;timetickph=(timetickphn+timetickphx)/2

        fild    [timetickphn]
        fild    [timetickphx]
        faddp                   ; st0 = min + max, one register popped
        fidiv   [cd_2]          ; st0 = sum / 2 (a plain fdivp with 2 in st1 would give 2/sum)
        fst     [timetickph]    ; Store as decimal? (this writes a 64-bit float)

;timetickps=timetickph/0.0549255

        fdiv    [cf_0_0549255]
        fst     [timetickps]    ; Store as decimal?

;timetickpu=timetickps/1,000,000

        fdiv    [cf_1000000_0]
        fstp    [timetickpu]    ; Store as decimal? The pop leaves the stack empty.
    

Not sure exactly what needs to be integer or decimal here so...

As for balancing the stack, every f(i)ld must be matched by an instruction that pops: an f(i)stp, or a popping arithmetic form such as faddp.
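
In case the results have to be stored as integers, the three formulas from this thread could be sketched as below. This assumes the default rounding mode (round to nearest) and at least two free FPU registers on entry; c_two and c_million are names invented for this example, the other names are taken from the thread.
Code:
; data section
        timetickps  dq 0
        timetickpu  dd 0
        timetickph  dq 0
        timetickphn dq -1
        timetickphx dq 0
        timeintsz   dt 0.0549255
        c_two       dd 2         ; hypothetical integer constants
        c_million   dd 1000000

; code section
        fild  qword [timetickphn]  ; st0 = min
        fild  qword [timetickphx]  ; st0 = max, st1 = min
        faddp st1,st0              ; st0 = min + max
        fidiv dword [c_two]        ; st0 = average
        fld   st0                  ; duplicate, because fistp pops
        fistp qword [timetickph]   ; store the copy, st0 = average again
        fld   tword [timeintsz]    ; st0 = 0.0549255, st1 = average
        fdivp st1,st0              ; st1 = average / 0.0549255, then pop
        fld   st0                  ; duplicate cycles per second
        fistp qword [timetickps]
        fidiv dword [c_million]    ; st0 = cycles per microsecond
        fistp dword [timetickpu]   ; final pop leaves the stack as it was on entry

Counting pushes and pops: five loads (two fild, three fld) against five pops (faddp, fdivp and three fistp).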

Post 11 Apr 2010, 15:46
shutdownall



Joined: 02 Apr 2010
Posts: 518
Location: Munich
@bitshifter

Thank you, I will try the code. I need to store the results as integers.

@baldr
You might be right that there are mistakes in my FPU code, but that was not the intent. As you can see in the formula, timetickph has to be divided by timeintsz.

I do not need floating-point calculations as such, but I have to calculate with 64-bit integers (qwords) and perform a 64-bit division. So my idea was to use the FPU for this (what else is it for?), so that this part of my machine does not get bored.
Wink
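
For comparison, dividing an unsigned 64-bit integer by a 32-bit constant needs only two div instructions on a 32-bit CPU. A sketch, assuming 32-bit code, the variable names from this thread and a made-up constant c_million; the result is truncated rather than rounded.
Code:
        c_million dd 1000000             ; hypothetical constant

        mov     ecx,[c_million]
        xor     edx,edx
        mov     eax,dword [timetickps+4] ; high dword of the dividend
        div     ecx                      ; eax = high dword of quotient, edx = remainder
        mov     ebx,eax                  ; keep the high dword of the quotient
        mov     eax,dword [timetickps]   ; edx:eax = remainder:low dword
        div     ecx                      ; eax = low dword of quotient
        ; ebx:eax = timetickps / 1000000, edx = final remainder
        mov     [timetickpu],eax         ; assumes the quotient fits in 32 bits

The second div cannot overflow because the remainder left in edx by the first one is always smaller than the divisor.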

Quote:
By the way, TSC has nothing to do with µops, you need PMC for that.


I don't need PMC for this. The TSC (time stamp counter) is incremented periodically by the processor itself. In earlier CPU versions this was identical to CPU cycles or micro-ops.

(Refer to INTEL docs)

Quote:
The processor monotonically increments the time-stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. See “Time Stamp Counter”
in Chapter 16 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 3B, for specific details of the time stamp counter behavior.


It seems that Intel has changed the behaviour, so the count can differ from actual CPU cycles, but it is guaranteed that it can be used as a wall clock that counts up periodically. I think the idea was to protect the timer from SpeedStep technology. In fact I did not go very deep into the docs; I just found a hint in a Microsoft article.

Anyway, for my purpose (implementing a very fast timer) this solution is suitable, I think. If there is no longer any dependence on SpeedStep, I could stop updating this value on every interrupt. On the other hand, it does no harm.
Wink
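
Measuring a short interval with the counter could look like the sketch below. It assumes 32-bit code on a CPU that supports rdtsc; tsc_start is a name made up for this example.
Code:
        tsc_start dd 0,0                 ; hypothetical 64-bit storage as two dwords

        rdtsc                            ; edx:eax = time-stamp counter
        mov     [tsc_start],eax
        mov     [tsc_start+4],edx

        ; ... code being timed ...

        rdtsc
        sub     eax,[tsc_start]          ; low dword of the difference
        sbb     edx,[tsc_start+4]        ; high dword, including the borrow
        ; edx:eax = elapsed TSC counts

rdtsc is not a serializing instruction, so on out-of-order CPUs very short intervals can be skewed by instruction reordering.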
Post 11 Apr 2010, 21:50
Picnic



Joined: 05 May 2007
Posts: 1288
Location: Paradise Falls
windwakr wrote:
Best FPU tutorial/documentation is here:
http://www.website.masmforum.com/tutorials/fptute/


A CHM version is also available. I found it on the asmcommunity pages.


Attachment: SimplyFPU1-4.zip (320.86 KB)
Post 13 Apr 2010, 23:57
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
Everything you always wanted to know about math coprocessors (copro16a.txt, Norbert Juffa)
Pascal Floating-Point (also has some generic info)
Post 14 Apr 2010, 17:19
Copyright © 1999-2020, Tomasz Grysztar.