flat assembler
Message board for the users of flat assembler.
windwakr 08 Apr 2010, 22:32
Best FPU tutorial/documentation is here:
http://www.website.masmforum.com/tutorials/fptute/
It gives good explanations for everything (registers, instructions, the control/status words), and it provides usage examples for all of the instructions (which is nice for some of the more confusing ones).
fist is 16 or 32 bit only. You must use fistp for a 64 bit value.
fist wrote:
To store a 64 bit value and keep the value in the FPU, you will need to copy it first. But this means you will need to use an extra stack slot in the FPU.
Code:
fld st0
fistp [memoryloc]
Last edited by windwakr on 08 Apr 2010, 22:55; edited 3 times in total
edemko 08 Apr 2010, 22:46
Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture.
http://www.intel.com/Assets/PDF/manual/253665.pdf
"Chapter 8: Programming with the x87 FPU" is yours.
That chapter touches on the FPU instructions too; they are described in detail in:
Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M
http://www.intel.com/Assets/PDF/manual/253666.pdf
Rest of instructions:
Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2B: Instruction Set Reference, N-Z
http://www.intel.com/Assets/PDF/manual/253667.pdf
edfed 09 Apr 2010, 11:36
One question about the FPU: what about performance?
How many fmul per second, for example? And mainly, what happens to a continuous instruction flow with the FPU? Does interleaving with GP instructions make FPU instructions transparent from a timing point of view? The same question applies to SSE, MMX and other extra features.
revolution 09 Apr 2010, 11:46
edfed: Every CPU generation is different. Check out the docs for your CPU to see what it does and how it pipelines the FPU instructions.
shutdownall 09 Apr 2010, 19:34
edfed wrote: one question about FPU.. what about performance?
Performance depends on what you need. I process only integer values, but they are 64 bit values. Addition and subtraction are not the problem, but division takes a lot of programming, and I don't think it is faster via register code than using the FPU. Why not use the built-in technique? And I want the code to run on all Pentium processors, so I cannot expect 64 bit instructions/registers.
@rest: thanks for the information, I will check it out. I have the INTEL manuals on hard disc, but I think they do not give much information on the basics of the FPU. But I will check the chapters again.
shutdownall 09 Apr 2010, 19:40
windwakr wrote:
Yes, you are right, but I don't find it logical that the fist instruction can only process 16 or 32 bit values while fistp can also handle 64 bit values. These are quite different commands, because one pops the value from the stack and the other does not. I thought about errors in the documentation or in the programming of the INTEL chip. By the way, I found that INTEL can do updates to the processor firmware. That's funny. Maybe I can get INTEL to provide a microcode update for this feature.
baldr 09 Apr 2010, 21:05
shutdownall,
Think opcode, not mnemonic.
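To spell the hint out: the two mnemonics look like a matched pair, but every operand size is its own opcode encoding, and the 64 bit integer store only exists in a popping form. A small listing as a sketch, assuming fasm syntax in 32 bit mode with ebx as an arbitrary pointer register. The encodings in the comments are given to the best of my knowledge of the Intel instruction set reference and should be checked against the manual:

```asm
use32
        fist    word  [ebx]     ; DF /2  store 16 bit integer, keep st0
        fist    dword [ebx]     ; DB /2  store 32 bit integer, keep st0
        fistp   word  [ebx]     ; DF /3  store 16 bit integer, pop
        fistp   dword [ebx]     ; DB /3  store 32 bit integer, pop
        fistp   qword [ebx]     ; DF /7  store 64 bit integer, pop (no non-popping twin)
```

Seen this way there is no "fist with a missing size", only a set of separate encodings that happen to share two names.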
shutdownall 09 Apr 2010, 22:42
@baldr
?!
Can anyone review the following code? Is it working as expected? Anything to optimize? It's about time measurement using the rdtsc instruction for small timers (about 100 ns or below). This section just updates the calculations for measured cpu cycles per second and per microsecond.
Code:
; data section
timetickps dq 0 ; cycles per second
timetickpu dd 0 ; cycles per microsecond
timetickph dq 0 ; average of cycles during one timer tick
timetickphn dq -1 ; min value of cycles (last measured period)
timetickphx dq 0 ; max value of cycles (last measured period)
timeintsz dt 0.0549255 ; correction value due to exact timer frequency
; code section
fninit ; do I really need this ?
fild [timetickphn]
fild [timetickphx]
faddp st1,st ; for cleanup stack after operation
fidiv word ptr 2 ; takes average of a min and max value
fst st1 ; make a copy because of following not needed pop grrrrr
I think there is still one value on the FPU stack after this code. What's the best way to clean it up? Thanks for comments. I am not so familiar with FPU instructions yet.
bitshifter 09 Apr 2010, 23:14
ERROR: The stack is empty during last two instructions!
Code:
fdiv dword ptr 1000000
fistp [timetickpu]
I can help, but to make my life easier, you must pm/post a working formula...
_________________
Coding a 3D game engine with fasm is like trying to eat an elephant, you just have to keep focused and take it one 'byte' at a time.
LocoDelAssembly 09 Apr 2010, 23:24
I'm absolutely surprised that fasm supports "dword ptr". Anyway, keep in mind that it is equivalent to "fdiv dword [1000000]", which clearly is not what you meant. You need to store 1000000 in some variable and then use "fdiv [some_dword_var]".
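A minimal sketch of that fix, assuming fasm syntax in 32 bit mode. The labels cycles, million and result and the sample value are made up for illustration, and fidiv is used here so that the constant can stay an integer in memory:

```asm
use32
        fild    qword [cycles]      ; st0 = value to scale (64 bit integer load)
        fidiv   dword [million]     ; st0 = st0 / 1000000, divisor read from memory
        fistp   dword [result]      ; store rounded integer and pop, FPU stack is empty again
        ret

cycles  dq 2000000000               ; hypothetical sample input
million dd 1000000                  ; the "immediate" has to live in a variable
result  dd 0
```

The x87 arithmetic instructions take no immediate operands, so a constant always has to come from memory or from another FPU register.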
edemko 10 Apr 2010, 03:52
shutdownall, you'd really have to give a formula.
fidiv word ptr 2 <- you divide not by 2 but by the value at [ds:2]
finit is useful in a prologue to ensure all is fine
baldr 10 Apr 2010, 14:55
shutdownall wrote: ?!
In the context of the shown code, there is no need to save calculation results immediately after the calculation. If you need the result for the following calculation, leave it on the stack for a while (thus giving the processor some time to do the actual calculations).
The calculations themselves are confusing, to say the least. 0.0549255 [s] is an approximation of 65536/(14.31818 [MHz]/12), the IRQ0 period? Then why do you divide (not multiply) it by the mean tick count? fdiv st, st1 does st = st/st1, AFAIK. Such a division will yield a result <1, and fistp will happily store 0 at its destination.
shutdownall 10 Apr 2010, 23:05
bitshifter wrote: ERROR: The stack is empty during last two instructions!
Yes, I think this is not correct. In fact I wanted to use an immediate in the fdiv instruction. Here is what I want to do:
Code:
timetickps dq 0
timetickpu dd 0
timetickph dq 0
timetickphn dq -1
timetickphx dq 0
timeintsz dt 0.0549255
First I need the average of timetickphn (min value) and timetickphx (max value), which is built by the formula
timetickph=(timetickphn+timetickphx)/2
timetickph contains the number of cpu cycles or micro-ops during one BIOS system timer period. I have a hook on interrupt 1ch (TSR or timer interrupt) from the BIOS, and via rdtsc the number of micro-ops since the last interrupt is stored. Because the interrupt can be masked by applications or system code, there will be differences, so I want to take an average. This interrupt is generated by the PC system timer, which runs at 1.193182 MHz divided by 10000h, giving a period of 0.0549255 seconds (usually taken as 55 ms, or 18.2 times per second).
timetickps=timetickph/0.0549255
which is the same as
timetickps=timetickph*18.20648
to determine the micro-ops for one second. In most cases this should be the frequency the CPU runs at, so for a 2.0 GHz clock the value should be 2,000,000,000. But speedstep technology could reduce this value. That's why the value is updated and recalculated on every timer interrupt.
timetickpu=timetickps/1000000
will give the micro-ops per microsecond.
So this has to be calculated:
timetickph=(timetickphn+timetickphx)/2
timetickps=timetickph/0.0549255
timetickpu=timetickps/1000000
@bitshifter
Why is the stack empty during the last two instructions?
@baldr
Why should the result of fdiv st,st1 be below 1? I have a 1.4 GHz CPU here, so st is about 04950000h, and dividing by 0.0549255 is the same as multiplying by 18.2..., which will surely not result in zero. Or did I miss something in my math lessons?
Last edited by shutdownall on 10 Apr 2010, 23:18; edited 1 time in total
shutdownall 10 Apr 2010, 23:11
baldr wrote:
I know that I do not need to store intermediate calculation results, but I want to store these two variables. By the way, the initial values of timetickphn and timetickphx are overwritten with runtime data. Maybe that's why you think the result will be below 1.
baldr 11 Apr 2010, 00:53
shutdownall,
Look here:
Code:
fld [timeintsz]
fdiv st,st1 ; calculation of cycles per second
Intel 64 and IA-32 SDM vol. 2A: Instruction Set Reference A-M wrote: FDIV/FDIVP/FIDIV - Divide
I had erroneously supposed that the timetick* variables contain the number of timer ticks (int 1C, e.g.). If they're TSC values (integers), why do you need floating-point calculations? By the way, TSC has nothing to do with µops, you need PMC for that.
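For comparison, the integer-only route that this question points at could look roughly like the following. It is a sketch under stated assumptions, not the poster's code: fasm syntax in 32 bit mode, the averaged cycle count of one timer period fits in 32 bits, and 1193182 Hz is assumed as the timer input clock, so that dividing by 0.0549255 becomes "multiply by 1193182, divide by 65536". The sample value is made up.

```asm
use32
        ; cycles per second = cycles per period * 1193182 / 65536
        mov     eax, dword [timetickph]     ; low dword only (assumed to hold the whole value)
        mov     ecx, 1193182                ; assumed timer input clock in Hz
        mul     ecx                         ; edx:eax = 64 bit product
        shrd    eax, edx, 16                ; 64 bit shift right by 16 (divide by 65536):
        shr     edx, 16                     ;   edx:eax = cycles per second
        mov     dword [timetickps], eax
        mov     dword [timetickps+4], edx
        mov     ecx, 1000000
        div     ecx                         ; eax = cycles per microsecond
                                            ; edx < 65536 < ecx here, so div cannot overflow
        mov     [timetickpu], eax
        ret

timetickph dq 76900000                      ; hypothetical sample: about 1.4 GHz * 55 ms
timetickps dq 0
timetickpu dd 0
```

This uses only 386-level instructions, so it does not depend on 64 bit registers. The 32 bit assumption on timetickph holds as long as fewer than 2^32 cycles elapse per timer period, which is far beyond the clock rates discussed in this thread.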
bitshifter 11 Apr 2010, 15:46
Maybe something like this is what you need?
Code:
timetickps dq 0
timetickpu dq 0
timetickph dq 0
timetickphn dq -1
timetickphx dq 0
cd_2 dd 2
cf_0_0549255 dd 0.0549255
cf_1000000_0 dd 1000000.0
;timetickph=(timetickphn+timetickphx)/2
fild [cd_2]
fild [timetickphn]
fild [timetickphx]
faddp ; st0 = min+max, st1 = 2
fdivrp ; st0 = (min+max)/2 (plain fdivp would compute 2/(min+max))
fst [timetickph] ; Store as decimal?
;timetickps=timetickph/0.0549255
fdiv [cf_0_0549255]
fst [timetickps] ; Store as decimal?
;timetickpu=timetickps/1000000
fdiv [cf_1000000_0]
fstp [timetickpu] ; Store as decimal?
Not sure exactly what needs to be integer or decimal here so... As for balancing the stack, every value pushed with f(i)ld must eventually be popped again, either by an f(i)stp or by a popping arithmetic instruction such as faddp.
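If the results are to be stored as integers, the same three formulas can be combined with the "fld st0 / fistp" trick from earlier in the thread. This is a sketch only, assuming fasm syntax in 32 bit mode and the FPU in its default state after fninit. The labels one_half, period and million and the two sample values are made up for illustration:

```asm
use32
        fninit                              ; known state: round to nearest, empty stack
        fild    qword [timetickphn]         ; st0 = min
        fild    qword [timetickphx]         ; st0 = max, st1 = min
        faddp   st1, st0                    ; st0 = min + max (one value popped)
        fmul    dword [one_half]            ; st0 = average cycles per timer period
        fld     st0                         ; duplicate, because fistp pops
        fistp   qword [timetickph]          ; store as 64 bit integer, average stays in st0
        fdiv    qword [period]              ; st0 = cycles per second
        fld     st0
        fistp   qword [timetickps]
        fidiv   dword [million]             ; st0 = cycles per microsecond
        fistp   dword [timetickpu]          ; last store pops the last value: stack is empty
        ret

one_half    dd 0.5
period      dq 0.0549255                    ; fdiv takes 32 or 64 bit reals, not dt
million     dd 1000000
timetickphn dq 76000000                     ; hypothetical sample values
timetickphx dq 77800000
timetickph  dq 0
timetickps  dq 0
timetickpu  dd 0
```

At most two FPU registers are in use at any point, and every load is matched by a pop, so nothing is left behind on the stack.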
shutdownall 11 Apr 2010, 21:50
@bitshifter
Thank you, I will try the code. I need to store the values as integers.
@baldr
You might be right that I have mistakes in my FPU code. But that was not the goal. As you can see in the formula, timetickph has to be divided by timeintsz. I do not need floating point calculations as such, but I have to calculate with 64 bit integers (qwords) and have to do a 64 bit division. So my idea was to use the FPU for this (what else is it for?), so that this part of my machine won't get bored.
Quote: By the way, TSC has nothing to do with µops, you need PMC for that.
I don't need PMC for this. The TSC (time stamp counter) is incremented internally and periodically. In earlier CPU versions this was identical to cpu cycles or micro-ops. (Refer to the INTEL docs.)
Quote: The processor monotonically increments the time-stamp counter MSR every clock
It seems that INTEL has changed the behaviour, so there can be a difference from cpu cycles, but it is guaranteed that the TSC can be used as a wall clock which is counted up periodically. I think the idea was to protect the timer from speed step technology. In fact I did not go very deeply into the docs; I just found a hint in a Microsoft article. Anyway, for my purpose (realizing a very fast timer) this solution is suitable, I think. If there is no dependence on speed step technology anymore, I could stop updating this timer on every interrupt. But on the other hand, it does not hurt.
Picnic 13 Apr 2010, 23:57
windwakr wrote: Best FPU tutorial/documentation is here:
A CHM version is also available. I found it on the asmcommunity pages.
rugxulo 14 Apr 2010, 17:19
Everything you always wanted to know about math coprocessors (copro16a.txt, Norbert Juffa)
Pascal Floating-Point (also has some generic info)
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.