flat assembler
Message board for the users of flat assembler.

double vs single precision

fasmFUN
Joined: 25 May 2019
Posts: 15
Hey all, I'd like to hear your opinion.
Which is better to use for performance, double or single precision numbers,
when writing a 64-bit application that does a lot of math calculations?
Do new CPUs work better with doubles?
Thanks in advance.
Post 13 Jan 2022, 12:37
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 18486
Location: In your JS exploiting you and your system
I suppose, in theory, if single precision (SP) is enough for the application then it might be more efficient with regard to memory accesses, since it only uses 4 bytes per value instead of the 8 bytes of double precision (DP).

But it might also make no difference. It depends upon what the CPU is doing, how it does it, and when it does it. Plus, each CPU/system has different performance characteristics anyway.

The only way to really know is to test the code in both configurations to see if there is a noticeable difference.
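
As a rough illustration of what testing both configurations could look like, here is a minimal C sketch. The kernel (a dot product) and the names `dot_sp`, `dot_dp`, `bytes_touched`, `time_sp` and `time_dp` are assumptions made up for this example, not anything from the thread; a real test would time the application's own inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: multiply-accumulate loop in single precision. */
float dot_sp(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* The same loop in double precision. */
double dot_dp(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Bytes read in one pass over both input arrays of n elements. */
size_t bytes_touched(size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    return 2 * n * elem_size;
}

/* CPU seconds for one call of dot_sp; the result is stored through
   `result` so the computation cannot be discarded as unused. */
double time_sp(const float *a, const float *b, size_t n, float *result)
{
    assert(result != NULL);
    clock_t t0 = clock();
    *result = dot_sp(a, b, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* CPU seconds for one call of dot_dp. */
double time_dp(const double *a, const double *b, size_t n, double *result)
{
    assert(result != NULL);
    clock_t t0 = clock();
    *result = dot_dp(a, b, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Run both timers on the same element count, once with arrays that fit in cache and once with arrays far larger than it, and repeat each measurement several times. `clock()` is coarse, so small inputs will often report zero.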
Post 13 Jan 2022, 13:31
sts-q
Joined: 29 Nov 2018
Posts: 49
I _*believe*_ mixing 32 and 64 bit values slows things down, when accessing RAM.

( I _*know*_ this needs a lot more testing... )

( on i3-550 )

Agner wrote about that, but where?
Post 13 Jan 2022, 14:21
revolution
sts-q wrote:
I _*believe*_ mixing 32 and 64 bit values slows things down, when accessing RAM.
Maybe.

There is no single answer that applies to all systems, or to all the code they run.
Post 13 Jan 2022, 14:35
Hrstka
Joined: 05 May 2008
Posts: 33
Location: Czech republic
If you are using x87 instructions, the numbers are always converted by the CPU to "long double" (80-bit) floating point format, so I think it makes no difference. When using SSE/AVX instructions, you can do more calculations in parallel with single precision, because twice as many single precision values fit in a register.
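
The packing arithmetic behind that can be sketched in C. The helper name `simd_lanes` is hypothetical; the register widths in the comment (128 bits for xmm, 256 bits for ymm) are the standard SSE and AVX sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of packed elements that fit in one SIMD register.
   reg_bits: 128 for an SSE xmm register, 256 for an AVX ymm register.
   elem_bytes: sizeof(float) or sizeof(double). */
size_t simd_lanes(size_t reg_bits, size_t elem_bytes)
{
    assert(elem_bytes > 0 && reg_bits % (elem_bytes * 8) == 0);
    return reg_bits / (elem_bytes * 8);
}
```

With 4-byte floats and 8-byte doubles, `simd_lanes(128, sizeof(float))` gives 4 and `simd_lanes(128, sizeof(double))` gives 2, so one packed instruction covers twice as many single precision values.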
Post 13 Jan 2022, 15:30
revolution
Hrstka wrote:
If you are using x87 instructions, the numbers are always converted by the CPU to "long double" (80-bit) floating point format.
The values are stored in long format, but there is a precision-control field in the x87 control word that sets the working precision for internal computations.

For modern CPUs the latency probably only differs for a few of the instructions, like fdiv or fsqrt. But check your particular system to see what effect it actually has.
Post 13 Jan 2022, 16:08
donn
Joined: 05 Mar 2010
Posts: 227
In terms of raw calculations, I think single precision SSE is considered up to twice as fast.

If you look at GFLOPS metrics on GPUs, single precision peak performance is higher than double precision from what I've seen, and I suspected it's the same on CPUs.

So, if you look at instruction latencies, such as those in AMD's optimization guides, the packed floating point multiplies are listed as:

Code:
Instruction  Op1   Op2   Op3   Op4  APM Vol  Cpuid flag  Ops  Unit   Latency  Throughput
VMULPD       ymm1  ymm2  ymm3       4        AVX         1    FP0/1  3        2
VMULPS       xmm1  xmm2  xmm3       4        AVX         1    FP0/1  3        2

So they execute at the same speed, but at equal register width the single precision form (PS) handles twice as many values per instruction as the double precision form (PD).

An Intel employee seems to confirm this:

https://community.intel.com/t5/Intel-ISA-Extensions/GFLOPS-numbers-advertised-by-Intel/m-p/905807
Quote:

...But if you are interested in actual peak theoretical (and in fact achievable, unlike many of those cited by our GPU producing friends) numbers you can take a look at this my older post http://software.intel.com/en-us/forums/showpost.php?p=60696 , to give you a quick answer, for couple recent generations of Intel CPUs it is 8 SIMD SP FP operations/cycle (4 SIMD SP ADD + 4 SIMD SP MUL) and 4 SIMD DP FP ops/cycle (2 SIMD DP ADD + 2 SIMD DP MUL), and both will...
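
To turn those per-cycle figures into a peak number, here is a small worked sketch in C. The function name `peak_gflops`, the 3.0 GHz clock and the 4 cores are made-up illustration values, and it assumes the quoted per-cycle counts apply per core.

```c
#include <assert.h>

/* Theoretical peak in GFLOPS = FP operations per cycle per core
   * clock in GHz * number of cores.
   All three inputs are assumptions supplied by the caller. */
double peak_gflops(double flops_per_cycle, double clock_ghz, int cores)
{
    assert(flops_per_cycle >= 0.0 && clock_ghz >= 0.0 && cores >= 0);
    return flops_per_cycle * clock_ghz * cores;
}
```

For the hypothetical 3.0 GHz, 4-core part, 8 SP operations per cycle works out to 96 GFLOPS and 4 DP operations per cycle to 48 GFLOPS: the same 2:1 ratio as in the quote, and a ceiling rather than a measurement.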


So, this only considers the raw speed of a computation, and only if you take advantage of it. Sometimes packing twice as many floating point values into an instruction is not possible, or not what the calculation calls for, so this performance gain may not be achieved. Single precision numbers also have a smaller memory footprint, which is a plus.

Double precision is handy when more range or more precision is required in a number; single precision provides less of both (a 24-bit significand, roughly 7 decimal digits, against 53 bits, roughly 16 digits), sort of like low quality versus high quality HD images. They're encoded like scientific notation, so it's not just that bigger or smaller numbers are possible, but also the resolution within the range, the fidelity.
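
A minimal C sketch of that fidelity difference, assuming IEEE 754 `float` and `double` (the helper names `sp_bits`, `dp_bits`, `sp_absorbs` and `dp_absorbs` are hypothetical):

```c
#include <assert.h>
#include <float.h>

/* Significand width of float in bits (24 on IEEE 754 platforms). */
int sp_bits(void)
{
    assert(FLT_RADIX == 2);   /* MANT_DIG counts base-FLT_RADIX digits */
    return FLT_MANT_DIG;
}

/* Significand width of double in bits (53 on IEEE 754 platforms). */
int dp_bits(void)
{
    assert(FLT_RADIX == 2);
    return DBL_MANT_DIG;
}

/* Returns 1 if adding step to base is lost to rounding in single precision.
   The volatile store forces the sum to be rounded to float. */
int sp_absorbs(float base, float step)
{
    volatile float sum = base + step;
    return sum == base;
}

/* Same check in double precision. */
int dp_absorbs(double base, double step)
{
    volatile double sum = base + step;
    return sum == base;
}
```

At 2^24 = 16777216 a single precision value can no longer represent the next integer, so `sp_absorbs(16777216.0f, 1.0f)` reports that the increment is lost, while the double precision version keeps it until 2^53.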

When dealing with performance there are definitely unknowns and nothing is clear cut, but there are risks you can deal with and opportunities you can take advantage of, and then test.
Post 20 Jan 2022, 05:44
revolution
In a purely theoretical scenario SP >= DP in performance: SP would never be slower.

But CPUs don't follow theoretical scenarios, they are real world practical devices, with many independent moving parts, each doing their thing in their own time, and competing with each other for limited resources. And there can even be unintuitive pathological cases where SP < DP.

However, I think in general it would be quite a safe assumption to say that SP >= DP for all but the weirdest cases.

As to whether SP > DP though, that requires testing to confirm. There are many cases where SP = DP. Your particular application might fall into that category. It all depends upon what you are doing.
Post 20 Jan 2022, 06:13
donn
Sure yep, theories. There are probably theories behind scenarios that yield unexpected results too and probabilities associated with their causes.

So it definitely helps to learn the context around the theories so you know when they apply. In this example, SP can be faster than DP, but if you don't pack more SP values into the equivalent instruction, you may not gain any benefit. And if you need more SP values to model your data, you may run more computations and perform more memory moves without proper batching, and end up with slower results.

Agreed also, real world testing does not necessarily line up with theory due to so many moving parts, like background processes running that are totally unrelated.
Post 20 Jan 2022, 06:59
revolution
Perhaps to answer the OP's questions (all IMO):

1. If SP precision is sufficient for your use then use SP.
2. If SP isn't enough bits, then you have no choice, use DP.

Following that, you will probably be just fine with regard to performance: almost never slower, sometimes equal, and sometimes a gain.
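
One possible way to check the first rule on sample data is sketched below in C. The names `sp_rel_error` and `sp_is_sufficient` and the tolerance parameter are assumptions for illustration. It only measures the rounding error of storing each value as a float, not the error that accumulates through a computation, and it assumes every value lies within float range.

```c
#include <assert.h>
#include <stddef.h>

/* Relative error introduced by storing x in single precision.
   Assumes x is within the range of float. */
double sp_rel_error(double x)
{
    volatile float f = (float)x;   /* round to single precision */
    double diff, mag;
    if (x == 0.0)
        return 0.0;
    diff = (double)f - x;
    if (diff < 0.0)
        diff = -diff;
    mag = x < 0.0 ? -x : x;
    return diff / mag;
}

/* Returns 1 if every sample survives the round trip to float
   with a relative error of at most tol, 0 otherwise. */
int sp_is_sufficient(const double *v, size_t n, double tol)
{
    assert(tol >= 0.0);
    for (size_t i = 0; i < n; i++)
        if (sp_rel_error(v[i]) > tol)
            return 0;
    return 1;
}
```

If the check fails for a tolerance the application really needs, that points to the second rule: use DP.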
Post 20 Jan 2022, 07:19
Copyright © 1999-2020, Tomasz Grysztar.