# Significant digits in a float?

Discussion in 'Python' started by Roy Smith, Apr 28, 2014.

1. ### Roy SmithGuest

I'm using Python 2.7

I have a bunch of floating point values. For example, here's a few (printed as reprs):

38.0
41.2586
40.75280000000001
49.25
33.795199999999994
36.837199999999996
34.1489
45.5

Fundamentally, these numbers have between 0 and 4 decimal digits of precision, and I want to be able to intuit how many each has, ignoring the obvious floating point roundoff problems. Thus, I want to map:

38.0 ==> 0
41.2586 ==> 4
40.75280000000001 ==> 4
49.25 ==> 2
33.795199999999994 ==> 4
36.837199999999996 ==> 4
34.1489 ==> 4
45.5 ==> 1

Is there any clean way to do that? The best I've come up with so far is to str() them and parse the remaining string to see how many digits it put after the decimal point.

The numbers are given to me as Python floats; I have no control over that. I'm willing to accept the fact that I won't be able to differentiate between float("38.0") and float("38.0000"). Both of those map to 1, which is OK for my purposes.
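A sketch of that str()-and-parse approach (my own illustration; the roundoff cases in the list above are exactly where it falls down):

```python
def naive_decimal_digits(x):
    """Count the digits after the decimal point in str(x)."""
    text = str(x)
    return len(text.partition(".")[2])

# works for clean values...
assert naive_decimal_digits(41.2586) == 4
assert naive_decimal_digits(45.5) == 1
assert naive_decimal_digits(38.0) == 1   # str() gives "38.0", not 0 digits
# ...but roundoff noise makes this value look far more precise than it is
assert naive_decimal_digits(40.75280000000001) == 14
```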

---
Roy Smith

Roy Smith, Apr 28, 2014

2. ### Steven D'ApranoGuest

On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:

[...]
> Fundamentally, these numbers have between 0 and 4 decimal digits of
> precision,

I'm surprised that you have a source of data with variable precision,
especially one that varies by a factor of TEN THOUSAND. The difference
between 0 and 4 decimal digits is equivalent to measuring some lengths to
the nearest metre, some to the nearest centimetre, and some to the
nearest 0.1 of a millimetre. That's very unusual and I don't know what
justification you have for combining such a mix of data sources.

One possible interpretation of your post is that you have a source of
floats, where all the numbers are actually measured to the same
precision, and you've simply misinterpreted the fact that some of them
look like they have less precision. Since you indicate that 4 decimal
digits is the maximum, I'm going with 4 decimal digits. So if your data
includes the float 23.5, that's 23.5 measured to a precision of four
decimal places (that is, it's 23.5000, not 23.5001 or 23.4999).

On the other hand, if you're getting your values as *strings*, that's
another story. If you can trust the strings, they'll tell you how many
decimal places: "23.5" is only one decimal place, "23.5000" is four.

But then what to make of your later example?

> 40.75280000000001 ==> 4

Python floats (C doubles) are quite capable of distinguishing between
40.7528 and 40.75280000000001. They are distinct numbers:

py> 40.75280000000001 - 40.7528
7.105427357601002e-15

so if a number is recorded as 40.75280000000001 presumably it is because
it was measured as 40.75280000000001. (How that precision can be
justified, I don't know! Does it come from the Large Hadron Collider?) If
it were intended to be 40.7528, I expect it would have be recorded as
40.7528. What reason do you have to think that something recorded to 14
decimal places was only intended to have been recorded to 4?

I don't mean to criticise too much, but the whole scenario as you have described it makes me think that
*somebody* is doing something wrong. Perhaps you need to explain why
you're doing this, as it seems numerically broken.

> Is there any clean way to do that? The best I've come up with so far is
> to str() them and parse the remaining string to see how many digits it
> put after the decimal point.

I really think you need to go back to the source. Trying to infer the
precision of the measurements from the accident of the string formatting
seems pretty dubious to me.

But I suppose if you wanted to infer the number of digits after the
decimal place, excluding trailing zeroes (why, I do not understand), up
to a maximum of four digits, then you could do:

s = "%.4f" % number # rounds to four decimal places
s = s.rstrip("0") # ignore trailing zeroes, whether significant or not
count = len(s.split(".")[1])

Assuming all the numbers fit in the range where they are shown in non-
exponential format. If you have to handle numbers like 1.23e19 as well,
you'll have to parse the string more carefully. (Keep in mind that most
floats above a certain size are all integer-valued.)

> The numbers are given to me as Python floats; I have no control over
> that.

If that's the case, what makes you think that two floats from the same
data set were measured to different precision? Given that you don't see
strings, only floats, I would say that your problem is unsolvable.
Whether I measure something to one decimal place and get 23.5, or four
decimal places and get 23.5000, the float you see will be the same.

Perhaps you ought to be using Decimal rather than float. Floats have a
fixed precision, while Decimals can be configured. Then the right way to
count decimal places is to inspect the Decimal directly:

py> from decimal import Decimal as D
py> x = D("23.5000")
py> x.as_tuple()
DecimalTuple(sign=0, digits=(2, 3, 5, 0, 0, 0), exponent=-4)

The number of decimal digits of precision is -exponent.
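So if the values ever do arrive as strings, counting places via Decimal is direct (a small sketch; the function name is mine):

```python
from decimal import Decimal

def decimal_places(text):
    """Digits after the decimal point in a numeric string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)

assert decimal_places("23.5000") == 4
assert decimal_places("23.5") == 1
assert decimal_places("38") == 0
```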

> I'm willing to accept that fact that I won't be able to differentiate
> between float("38.0") and float("38.0000"). Both of those map to 1,
> which is OK for my purposes.

That seems... well, "bizarre and wrong" are the only words that come to
mind. If I were recording data as "38.0000" and you told me I had
measured it to only one decimal place accuracy, I wouldn't be too
pleased. Maybe if I understood the context better?

By the way, you contradict yourself here. Earlier, you described 38.0 as
having zero decimal places (which is wrong). Here you describe it as
having one, which is correct, and then in a later post you describe it as
having zero decimal places again.

--
Steven D'Aprano
http://import-that.dreamwidth.org/

Steven D'Aprano, Apr 29, 2014

3. ### Steven D'ApranoGuest

On Tue, 29 Apr 2014 13:23:07 +1000, Ben Finney wrote:

> Steven D'Aprano <> writes:
>
>> By the way, you contradict yourself here. Earlier, you described 38.0
>> as having zero decimal places (which is wrong). Here you describe it as
>> having one, which is correct, and then in a later post you describe it
>> as having zero decimal places again.

>
> I get the impression that this is at the core of the misunderstanding.
> Having a number's representation ending in "….0" does not mean zero
> decimal places; it has exactly one. The value's representation contains
> the digit "0" after the decimal point, but that digit is significant to
> the precision of the representation.
>
> If the problem could be stated such that "38.0" and "38" and "38.000"
> are consistently described with the correct number of decimal digits of
> precision (in those examples: one, zero, and three), maybe the
> discussion would make more sense.

It's actually trickier than that. Digits of precision can refer to
measurement error, or to the underlying storage type. Python floats are C
doubles, with a 53-bit significand (approximately 15 to 17 significant
decimal digits) regardless of the precision of the
measurement. The OP (Roy) is, I think, trying to guess the measurement
precision after the fact, given a float. If the measurement error really
does differ from value to value, I don't think he'll have much luck:
given a float like 23.0, all we can say is that it has *at least* zero
significant decimal places. 23.1 has at least one, 23.1111 has at least
four.

If you can put an upper bound on the precision, as Roy indicates he can,
then perhaps a reasonable approach is to convert to a string rounded to
four decimal places, then strip trailing zeroes:

py> x = 1234.1 # actual internal is closer to 1234.099999999999909
py> ("%.4f" % x).rstrip('0')
'1234.1'

then count the number of digits after the dot. (This assumes that the
string formatting routines are correctly rounded, which they should be on
*most* platforms.) But again, this only gives a lower bound to the number
of significant digits -- it's at least one, but might be more.

--
Steven

Steven D'Aprano, Apr 29, 2014
4. ### Roy SmithGuest

In article <535f0f9f$0$29965$c3e8da3$>,
Steven D'Aprano <> wrote:

> On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:
>
> [...]
> > Fundamentally, these numbers have between 0 and 4 decimal digits of
> > precision,

>
> I'm surprised that you have a source of data with variable precision,
> especially one that varies by a factor of TEN THOUSAND.

OK, you're surprised.

> I don't know what justification you have for combining such a
> mix of data sources.

Because that's the data that was given to me. Real life data is messy.

> One possible interpretation of your post is that you have a source of
> floats, where all the numbers are actually measured to the same
> precision, and you've simply misinterpreted the fact that some of them
> look like they have less precision.

Another possibility is that they're latitude/longitude coordinates, some
of which are given to the whole degree, some of which are given to
greater precision, all the way down to the ten-thousandth of a degree.

> What reason do you have to think that something recorded to 14
> decimal places was only intended to have been recorded to 4?

Because I understand the physical measurement these numbers represent.
Sometimes, Steve, you have to assume that when somebody asks a question,
they have a good reason for asking it.

> Perhaps you need to explain why you're doing this, as it seems
> numerically broken.

These are latitude and longitude coordinates of locations. Some
locations are known to a specific street address. Some are known to a
city. Some are only known to the country. So, for example, the 38.0
value represents the latitude, to the nearest whole degree, of the
geographic center of the contiguous United States.

> I really think you need to go back to the source. Trying to infer the
> precision of the measurements from the accident of the string formatting
> seems pretty dubious to me.

Sure it is. But, like I said, real-life data is messy. You can wring
your hands and say, "this data sucks, I can't use it", or you can figure
out some way to deal with it. Which is the whole point of my post. The
best I've come up with is inferring something from the string formatting
and I'm hoping there might be something better I might do.

> But I suppose if you wanted to infer the number of digits after the
> decimal place, excluding trailing zeroes (why, I do not understand), up
> to a maximum of four digits, then you could do:
>
> s = "%.4f" % number # rounds to four decimal places
> s = s.rstrip("0") # ignore trailing zeroes, whether significant or not
> count = len(s.split(".")[1])

This at least seems a little more robust than just calling str(). Thank
you.

> Assuming all the numbers fit in the range where they are shown in non-
> exponential format.

They're latitude/longitude, so they all fall into [-180, 180].
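For a sense of scale (my own back-of-envelope figures, not from the thread): a degree of latitude is roughly 111 km, so each decimal place of precision buys a factor of ten in ground resolution:

```python
KM_PER_DEGREE_LAT = 111.0  # rough; longitude degrees shrink with cos(latitude)

def resolution_km(decimal_places):
    """Approximate ground resolution implied by N decimal places of latitude."""
    return KM_PER_DEGREE_LAT / 10 ** decimal_places

assert abs(resolution_km(0) - 111.0) < 1e-9   # whole degrees: country-sized
assert abs(resolution_km(2) - 1.11) < 1e-9    # two places: about a kilometre
assert abs(resolution_km(4) - 0.0111) < 1e-9  # four places: about 11 metres
```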

> Perhaps you ought to be using Decimal rather than float.

Like I said, "The numbers are given to me as Python floats; I have no
control over that".

> > I'm willing to accept that fact that I won't be able to differentiate
> > between float("38.0") and float("38.0000"). Both of those map to 1,
> > which is OK for my purposes.

>
> That seems... well, "bizarre and wrong" are the only words that come to
> mind.

I'm trying to intuit, from the values I've been given, which coordinates
are likely to be accurate to within a few miles. I'm willing to accept
a few false negatives. If the number is float("38"), I'm willing to
accept that it might actually be float("38.0000"), and I might be
throwing out a good data point that I don't need to.

For the purpose I'm using the data for, excluding the occasional good
data point won't hurt me. Including the occasional bad one, will.

> By the way, you contradict yourself here. Earlier, you described 38.0 as
> having zero decimal places (which is wrong). Here you describe it as
> having one, which is correct, and then in a later post you describe it as
> having zero decimal places again.

I was sloppy there. I was copy-pasting data from my program output.
Observe:

>>> print float("38")

38.0

In standard engineering parlance, the string "38" represents a number
with a precision of +/- 1 unit. Unfortunately, Python's default str()
representation turns this into "38.0", which implies +/- 0.1 unit.

Floats represented as strings (at least in some disciplines, such as
engineering) carry extra information: through the number of trailing
zeros, they also include information about the precision of the
measurement. That information is lost when the string is converted to an
IEEE float. I'm trying to intuit that information
back, and as I mentioned earlier, am willing to accept that the
intuiting process will be imperfect. There is real-life value in
imperfect processes.
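That loss is easy to demonstrate (shown under Python 2.7's shortest-repr behaviour, which later versions share):

```python
# three strings with three different implied precisions collapse to one float
assert float("38") == float("38.0") == float("38.0000")
# and converting back gives the shortest form, whatever the input looked like
assert str(float("38.0000")) == "38.0"
```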

Roy Smith, Apr 29, 2014
5. ### Chris AngelicoGuest

On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
> I'm trying to intuit, from the values I've been given, which coordinates
> are likely to be accurate to within a few miles. I'm willing to accept
> a few false negatives. If the number is float("38"), I'm willing to
> accept that it might actually be float("38.0000"), and I might be
> throwing out a good data point that I don't need to.

You have one chance in ten, repeatably, of losing a digit. That is,
roughly 10% of your four-decimal figures will appear to be
three-decimal, and 1% of them will appear to be two-decimal, and so
on. Is that "a few" false negatives? It feels like a lot IMO. But
then, there's no alternative - the information's already gone.
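Chris's percentages are easy to check by enumeration (a quick sketch, not from the thread): of the 10,000 possible four-decimal fractions, exactly one in ten ends in a zero and so appears to have fewer digits:

```python
# count surviving digits for every four-decimal fraction .0000 .. .9999
counts = {}
for n in range(10000):
    stripped = ("%.4f" % (n / 10000.0)).rstrip("0")
    digits = len(stripped.split(".")[1])
    counts[digits] = counts.get(digits, 0) + 1

assert counts[4] == 9000  # 90% keep all four digits
assert counts[3] == 900   # 9% masquerade as three-decimal values
assert counts[2] == 90    # 0.9% as two-decimal, and so on
assert counts[1] == 9
assert counts[0] == 1     # .0000 alone looks like a whole number
```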

ChrisA

Chris Angelico, Apr 29, 2014
6. ### Ned BatchelderGuest

On 4/29/14 12:30 PM, Chris Angelico wrote:
> On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
>> I'm trying to intuit, from the values I've been given, which coordinates
>> are likely to be accurate to within a few miles. I'm willing to accept
>> a few false negatives. If the number is float("38"), I'm willing to
>> accept that it might actually be float("38.0000"), and I might be
>> throwing out a good data point that I don't need to.

>
> You have one chance in ten, repeatably, of losing a digit. That is,
> roughly 10% of your four-decimal figures will appear to be
> three-decimal, and 1% of them will appear to be two-decimal, and so
> on. Is that "a few" false negatives? It feels like a lot IMO. But
> then, there's no alternative - the information's already gone.
>

Reminds me of the story that the first survey of Mt. Everest resulted in
a height of exactly 29,000 feet, but to avoid the appearance of an
estimate, they reported it as 29,002: http://www.jstor.org/stable/2684102

--
Ned Batchelder, http://nedbatchelder.com

Ned Batchelder, Apr 29, 2014

7. ### Adam FunkGuest

On 2014-04-29, Roy Smith wrote:

> Another possibility is that they're latitude/longitude coordinates, some
> of which are given to the whole degree, some of which are given to
> greater precision, all the way down to the ten-thousandth of a degree.

That makes sense. 1° of longitude is about 111 km at the equator,
78 km at 45°N or S, & 0 km at the poles.
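Those figures are just the cosine of the latitude (my own sketch of the arithmetic, using the round figure of 111 km per degree at the equator):

```python
import math

def km_per_degree_longitude(latitude_deg, equator_km=111.0):
    """Length of one degree of longitude at the given latitude."""
    return equator_km * math.cos(math.radians(latitude_deg))

assert round(km_per_degree_longitude(0)) == 111  # equator
assert round(km_per_degree_longitude(45)) == 78  # mid-latitudes
assert round(km_per_degree_longitude(90)) == 0   # the poles
```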

"A man pitches his tent, walks 1 km south, walks 1 km east, kills a
bear, & walks 1 km north, where he's back at his tent. What color is
the bear?" ;-)

--
War is God's way of teaching Americans geography.
[Ambrose Bierce]

Adam Funk, Apr 29, 2014

8. ### Mark H HarrisGuest

On 4/29/14 3:16 PM, Adam Funk wrote:
> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
> bear, & walks 1 km north, where he's back at his tent. What color is
> the bear?" ;-)
>

Who manufactured the tent?

marcus

Mark H Harris, Apr 29, 2014
9. ### Ryan HiebertGuest

On Tue, Apr 29, 2014 at 3:16 PM, Adam Funk <> wrote:

>
> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
> bear, & walks 1 km north, where he's back at his tent. What color is
> the bear?" ;-)

Skin or Fur?

Ryan Hiebert, Apr 29, 2014
10. ### Chris AngelicoGuest

On Wed, Apr 30, 2014 at 6:39 AM, Mark H Harris <> wrote:
> On 4/29/14 3:16 PM, Adam Funk wrote:
>>
>> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
>> bear, & walks 1 km north, where he's back at his tent. What color is
>> the bear?" ;-)
>>

>
> Who manufactured the tent?

A man pitches his tent 1 km south and kills a bear with it. Clearly
that wasn't a tent, it was a cricket ball.

ChrisA

Chris Angelico, Apr 29, 2014
11. ### Gregory EwingGuest

Ned Batchelder wrote:
> Reminds me of the story that the first survey of Mt. Everest resulted in
> a height of exactly 29,000 feet, but to avoid the appearance of an
> estimate, they reported it as 29,002: http://www.jstor.org/stable/2684102

They could have said it was 29.000 kilofeet.

--
Greg

Gregory Ewing, Apr 29, 2014
12. ### emileGuest

On 04/29/2014 01:16 PM, Adam Funk wrote:

> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
> bear, & walks 1 km north, where he's back at his tent. What color is
> the bear?" ;-)

From how many locations on Earth can someone walk one mile south, one
mile east, and one mile north and end up at their starting point?

Emile

emile, Apr 29, 2014
13. ### Mark LawrenceGuest

On 29/04/2014 23:42, emile wrote:
> On 04/29/2014 01:16 PM, Adam Funk wrote:
>
>> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
>> bear, & walks 1 km north, where he's back at his tent. What color is
>> the bear?" ;-)

>
> From how many locations on Earth can someone walk one mile south, one
> mile east, and one mile north and end up at their starting point?
>
> Emile
>

Haven't you heard of The Triangular Earth Society?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


Mark Lawrence, Apr 30, 2014
14. ### Roy SmithGuest

In article <>,
Chris Angelico <> wrote:

> On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
> > I'm trying to intuit, from the values I've been given, which coordinates
> > are likely to be accurate to within a few miles. I'm willing to accept
> > a few false negatives. If the number is float("38"), I'm willing to
> > accept that it might actually be float("38.0000"), and I might be
> > throwing out a good data point that I don't need to.

>
> You have one chance in ten, repeatably, of losing a digit. That is,
> roughly 10% of your four-decimal figures will appear to be
> three-decimal, and 1% of them will appear to be two-decimal, and so
> on. Is that "a few" false negatives?

You're looking at it the wrong way. It's not that the glass is 10%
empty, it's that it's 90% full, and 90% is a lot of good data.

Roy Smith, Apr 30, 2014
15. ### Chris AngelicoGuest

On Wed, Apr 30, 2014 at 9:53 AM, Roy Smith <> wrote:
> In article <>,
> Chris Angelico <> wrote:
>
>> On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
>> > I'm trying to intuit, from the values I've been given, which coordinates
>> > are likely to be accurate to within a few miles. I'm willing to accept
>> > a few false negatives. If the number is float("38"), I'm willing to
>> > accept that it might actually be float("38.0000"), and I might be
>> > throwing out a good data point that I don't need to.

>>
>> You have one chance in ten, repeatably, of losing a digit. That is,
>> roughly 10% of your four-decimal figures will appear to be
>> three-decimal, and 1% of them will appear to be two-decimal, and so
>> on. Is that "a few" false negatives?

>
> You're looking at it the wrong way. It's not that the glass is 10%
> empty, it's that it's 90% full, and 90% is a lot of good data

Hah! That's one way of looking at it.

At least you don't have to worry about junk digits getting in. The
greatest precision you're working with is three digits before the
decimal and four after, and a Python float can handle that easily.
(Which is what I was concerned about when I first queried your
terminology - four digits to the right of the decimal and, say, 10-12
to the left, and you're starting to see problems.)

ChrisA

Chris Angelico, Apr 30, 2014
16. ### Chris AngelicoGuest

On Wed, Apr 30, 2014 at 10:13 AM, Ben Finney <> wrote:
> The problem is you won't know *which* 90% is accurate, and which 10% is
> inaccurate. This is very different from the glass, where it's evident
> which part is good.
>
> So, I can't see that you have any choice but to say that *any* of the
> precision predictions should expect, on average, to be (10 + 1 + â€¦)
> percent inaccurate. And you can't know which ones. Is that an acceptable
> error rate?

But they're all going to be *at least* as accurate as the algorithm
says. A figure of 31.4 will be treated as 1 decimal, even though it
might really have been accurate to 4; but a figure of 27.1828 won't be
incorrectly reported as having only 2 decimals.

ChrisA

Chris Angelico, Apr 30, 2014
17. ### Dennis Lee BieberGuest

On 29 Apr 2014 05:43:19 GMT, Steven D'Aprano <>
declaimed the following:

>does differ from value to value, I don't think he'll have much luck:
>given a float like 23.0, all we can say is that it has *at least* zero
>significant decimal places. 23.1 has at least one, 23.1111 has at least
>four.
>

I wouldn't even give it that... Since internally they (ignoring binary
conversion) translate into

2.30E1, 2.31E1, and 2.31111E1

I'd claim 3-significant digits, 3-significant digits, and 6-significant
digits. (Heck, as I recall classical FORTRAN, they would be 0.230E2...)

>If you can put an upper bound on the precision, as Roy indicates he can,
>then perhaps a reasonable approach is to convert to a string rounded to
>four decimal places, then strip trailing zeroes:
>

That I'd agree with... once the data has been converted to binary
float, all knowledge of the source significant digits has been lost.

Then confuse matters with the facet that in a math class

1.1 * 2.2 => 2.42

but in a physics or chemistry class the recommended result is

1.1 * 2.2 => 2.4

(one reason slide-rules were acceptable for so long -- and even my high
school trig course only required slide-rule significance even though half
the class had scientific calculators [costing >$100, when a Sterling
slide-rule could still be had for <$10]) <G>
--
Wulfraed Dennis Lee Bieber AF6VN
HTTP://wlfraed.home.netcom.com/

Dennis Lee Bieber, Apr 30, 2014
18. ### Dennis Lee BieberGuest

On Wed, 30 Apr 2014 08:51:32 +1000, Chris Angelico <>
declaimed the following:

>
>Any point where the mile east takes you an exact number of times
>around the globe. So, anywhere exactly one mile north of that, which
>is a number of circles not far from the south pole.
>

Yeah, but he'd have had to bring his own bear...

Bears and Penguins don't mix. Seals, OTOH, are food to the bears, and
eat the penguins.

--
Wulfraed Dennis Lee Bieber AF6VN
HTTP://wlfraed.home.netcom.com/

Dennis Lee Bieber, Apr 30, 2014
19. ### Roy SmithGuest

In article <>,
Dennis Lee Bieber <> wrote:

> in a physics or chemistry class the recommended result is
>
> 1.1 * 2.2 => 2.4

More than recommended. In my physics class, if you put down more
significant digits than the input data justified, you got the problem
marked wrong.

> (one reason slide-rules were acceptable for so long -- and even my high
> school trig course only required slide-rule significance even though half
> the class had scientific calculators [costing >$100, when a Sterling
> slide-rule could still be had for <$10]) <G>

Sterling? Snort. K&E was the way to go.

Roy Smith, Apr 30, 2014
20. ### Roy SmithGuest

In article <>,

> On 2014-04-29, Roy Smith wrote:
>
> > Another possibility is that they're latitude/longitude coordinates, some
> > of which are given to the whole degree, some of which are given to
> > greater precision, all the way down to the ten-thousandth of a degree.

>
> That makes sense. 1° of longitude is about 111 km at the equator,
> 78 km at 45°N or S, & 0 km at the poles.
>
>
> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
> bear, & walks 1 km north, where he's back at his tent. What color is
> the bear?" ;-)

Assuming he shot the bear, red.

Roy Smith, Apr 30, 2014