Discussion:
Precision in C++17 for Double Types
Miguel TOLEDO GONZALEZ
2018-11-06 20:13:49 UTC
Hi, I'm not sure what precision corresponds to the double floating-point type.
I have read somewhere that a double is stored in memory in 8 bytes; what, then,
is the corresponding arithmetic precision for this type?
I mean precision as the maximum number of accurate digits. For example:
2.1 = 2 + 0.1 = 2 + 1e-1 has 1 decimal digit of accuracy, and so on:
2.123456789 = 2 + 1e-1 + 2e-2 + 3e-3 + 4e-4 + 5e-5 + ... + 9e-9
has 9 digits of accuracy, and likewise for any finite series, where e-x means
10^(-x).

* My question is how standard ISO C++17 defines that digit accuracy.
What, then, is the maximum digit accuracy in C++17?
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Staffan Tjernstrom
2018-11-06 20:20:58 UTC
https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon is probably
what you're looking for.
Hyman Rosen
2018-11-06 21:20:13 UTC
Post by Miguel TOLEDO GONZALEZ
2.1 = 2+0.1 = 2 + 1e-1 has 1 decimal digit of accuracy, and so on
It's not defined by C++, but most systems these days use the IEEE 754
Standard for their floating-point arithmetic.

Your question isn't clear. Binary floating-point, being (duh) binary, can
exactly represent only finite sums of (positive or negative) powers of 2
that fit within the format's significand. Values which are not such sums
are converted to the nearest value available in the binary format.

When it comes to interconversion between decimal and binary floating-point,
there are two measures of significance:
1) Given the universe of all decimal values with no more than D significant
digits, what is the largest value of D such that no two decimal numbers
convert to the same binary value?

For float, this value is 6, and for double, this value is 15. (This is
std::numeric_limits<T>::digits10.) This means that there exist two
different decimal values with D+1 significant digits that are each closest
to the same binary value. If you convert these D+1 decimal numbers to
binary, you cannot recover the original decimal values from the binary.

2) When converting a binary floating-point number to decimal, what is the
minimum number of significant digits D that must be used so that no two
binary values convert to the same decimal value?

For float this value is 9 and for double this value is 17. (This is
std::numeric_limits<T>::max_digits10.) This means that there are two
different binary values which are closest to the same decimal value with
D-1 significant digits. If you convert these binary values to D-1 decimal
numbers, you cannot recover the original binary values from the decimal.
Hyman Rosen
2018-11-06 21:36:21 UTC
Post by Hyman Rosen
When it comes to interconversion between decimal and binary
1) Given the universe of all decimal values with no more than D
significant digits, what is the largest value of D such that no two decimal
numbers convert to the same binary value?
For float, this value is 6, and for double, this value is 15.
Some people find this 6 to be especially distasteful. Those people can use
7 provided that the decimal values are in the range [ .0009999995 ..
8589972000 ]. That range covers most "business values," so those people
become a little happier.
Hyman Rosen
2018-11-06 22:11:28 UTC
Post by Hyman Rosen
2) When converting a binary floating-point number to decimal, what is the
minimum number of significant digits D that must be used so that no two
binary values convert to the same decimal value?
For float this value is 9 and for double this value is 17.
Note that for some binary floating-point values, conversion using fewer
digits still gives a decimal value that converts back to the original
value. (For example, the exact float value .100000001490116119384765625,
converted using 9 significant digits, is .100000001, but .1 converts back to
that same
binary value.) For every binary floating-point value, there is a shortest
decimal representation that converts back to the original. From C++17
onward, this shortest representation is obtainable via std::to_chars.
Edward Catmur
2018-11-07 19:24:31 UTC
For every binary floating-point value, there is a shortest decimal representation that converts back to the original.
It's worth noting that there may be more than one such representation, and two of them may be equally close to the original, if the original is exactly halfway between two such shortest representations.
From C++17 onward, this shortest representation is obtainable via std::to_chars.
In addition, if there is a sole closest representation this is required to be that representation; otherwise it must be one of the two closest representations but it is not specified which of those two it is.
Miguel TOLEDO GONZALEZ
2018-11-07 20:54:16 UTC
Thanks, all, for the detailed and precise answers. Yes, my question was not
very clear, but you got the essence of it. I'm working with different
C++ compilers, and naturally I use the standard C++ header for numeric
limits. But I noticed some "small" differences in ranges and/or digit
precision when compiling and running short test programs with
different compilers. I must say that I didn't know about IEEE 754; I just
run relatively "primitive" short tests of arithmetic calculations. This is
important to me because I need high-precision calculations in real time,
or better said, at run time, and "small" deviations at one point lead to
larger and larger deviations as time goes on (mathematical time series).
Thank you for the clarification and help.
Hyman Rosen
2018-11-07 21:50:50 UTC
If you're looking for any sort of exactness, dealing with floating-point is
very hard, and you really have to know what you and your compiler are doing.
Compilers sometimes take shortcuts when it comes to following the standard
(gcc with x87 floating-point is particularly egregious) because they have
been hijacked by