Significant digits in a float?

Discussion in 'Python' started by Roy Smith, Apr 28, 2014.

  1. Roy Smith

    Roy Smith Guest

    I'm using Python 2.7

    I have a bunch of floating point values. For example, here's a few (printed as reprs):

    38.0
    41.2586
    40.75280000000001
    49.25
    33.795199999999994
    36.837199999999996
    34.1489
    45.5

    Fundamentally, these numbers have between 0 and 4 decimal digits of precision, and I want to be able to intuit how many each has, ignoring the obvious floating point roundoff problems. Thus, I want to map:

    38.0 ==> 0
    41.2586 ==> 4
    40.75280000000001 ==> 4
    49.25 ==> 2
    33.795199999999994 ==> 4
    36.837199999999996 ==> 4
    34.1489 ==> 4
    45.5 ==> 1

    Is there any clean way to do that? The best I've come up with so far is to str() them and parse the remaining string to see how many digits it put after the decimal point.

    The numbers are given to me as Python floats; I have no control over that. I'm willing to accept the fact that I won't be able to differentiate between float("38.0") and float("38.0000"). Both of those map to 1, which is OK for my purposes.

    ---
    Roy Smith
     
    Roy Smith, Apr 28, 2014
    #1

  2. On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:

    [...]
    > Fundamentally, these numbers have between 0 and 4 decimal digits of
    > precision,


    I'm surprised that you have a source of data with variable precision,
    especially one that varies by a factor of TEN THOUSAND. The difference
    between 0 and 4 decimal digits is equivalent to measuring some lengths to
    the nearest metre, some to the nearest centimetre, and some to the
    nearest 0.1 of a millimetre. That's very unusual and I don't know what
    justification you have for combining such a mix of data sources.

    One possible interpretation of your post is that you have a source of
    floats, where all the numbers are actually measured to the same
    precision, and you've simply misinterpreted the fact that some of them
    look like they have less precision. Since you indicate that 4 decimal
    digits is the maximum, I'm going with 4 decimal digits. So if your data
    includes the float 23.5, that's 23.5 measured to a precision of four
    decimal places (that is, it's 23.5000, not 23.5001 or 23.4999).

    On the other hand, if you're getting your values as *strings*, that's
    another story. If you can trust the strings, they'll tell you how many
    decimal places: "23.5" is only one decimal place, "23.5000" is four.

    But then what to make of your later example?

    > 40.75280000000001 ==> 4


    Python floats (C doubles) are quite capable of distinguishing between
    40.7528 and 40.75280000000001. They are distinct numbers:

    py> 40.75280000000001 - 40.7528
    7.105427357601002e-15

    so if a number is recorded as 40.75280000000001 presumably it is because
    it was measured as 40.75280000000001. (How that precision can be
    justified, I don't know! Does it come from the Large Hadron Collider?) If
    it were intended to be 40.7528, I expect it would have been recorded as
    40.7528. What reason do you have to think that something recorded to 14
    decimal places was only intended to have been recorded to 4?

    Without knowing more about how your data is generated, I can't advise you
    much, but the whole scenario as you have described it makes me think that
    *somebody* is doing something wrong. Perhaps you need to explain why
    you're doing this, as it seems numerically broken.


    > Is there any clean way to do that? The best I've come up with so far is
    > to str() them and parse the remaining string to see how many digits it
    > put after the decimal point.


    I really think you need to go back to the source. Trying to infer the
    precision of the measurements from the accident of the string formatting
    seems pretty dubious to me.

    But I suppose if you wanted to infer the number of digits after the
    decimal place, excluding trailing zeroes (why, I do not understand), up
    to a maximum of four digits, then you could do:

    s = "%.4f" % number # rounds to four decimal places
    s = s.rstrip("0") # ignore trailing zeroes, whether significant or not
    count = len(s.split(".")[1])


    Assuming all the numbers fit in the range where they are shown in non-
    exponential format. If you have to handle numbers like 1.23e19 as well,
    you'll have to parse the string more carefully. (Keep in mind that most
    floats above a certain size are all integer-valued.)
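As a rough sketch of the approach above, the three lines can be wrapped in a function and checked against the values from the original post. The name `decimal_places` and the `max_places` parameter are illustrative and do not come from the thread; the code assumes every value is small enough to print in non-exponential format.

```python
def decimal_places(number, max_places=4):
    """Count digits after the decimal point, ignoring trailing zeroes.

    Hypothetical helper: it rounds to max_places first, so float noise
    such as 40.75280000000001 is treated as 40.7528.
    """
    s = "%.*f" % (max_places, number)        # e.g. "38.0000"
    if "." not in s:                         # max_places == 0 gives "38"
        return 0
    return len(s.rstrip("0").split(".")[1])  # digits left after the dot


samples = [38.0, 41.2586, 40.75280000000001, 49.25,
           33.795199999999994, 36.837199999999996, 34.1489, 45.5]
for value in samples:
    print("%r ==> %d" % (value, decimal_places(value)))
```

The result is only a lower bound: a value recorded as 45.5000 still reports one place.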


    > The numbers are given to me as Python floats; I have no control over
    > that.


    If that's the case, what makes you think that two floats from the same
    data set were measured to different precision? Given that you don't see
    strings, only floats, I would say that your problem is unsolvable.
    Whether I measure something to one decimal place and get 23.5, or four
    decimal places and get 23.5000, the float you see will be the same.
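A quick check of that point, as a minimal sketch (the variable names are illustrative): two strings written to different precision parse to the same float, so nothing about the trailing zeroes survives the conversion.

```python
# Two different strings, one float: the trailing zeroes do not survive.
one_place = float("23.5")
four_places = float("23.5000")

print(one_place == four_places)                           # same value
print("%r %r" % (one_place, four_places))                 # same repr
```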

    Perhaps you ought to be using Decimal rather than float. Floats have a
    fixed precision, while Decimals can be configured. Then the right way to
    answer your question is to inspect the number:

    py> from decimal import Decimal as D
    py> x = D("23.5000")
    py> x.as_tuple()
    DecimalTuple(sign=0, digits=(2, 3, 5, 0, 0, 0), exponent=-4)

    The number of decimal digits precision is -exponent.
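If the values were available as strings, the exponent could be read off directly. The sketch below assumes finite inputs (for NaN and infinities the exponent is a string, not an int), and `places_from_string` is a hypothetical name.

```python
from decimal import Decimal


def places_from_string(text):
    """Decimal places as written in the source string (hypothetical helper)."""
    exponent = Decimal(text).as_tuple().exponent
    # The exponent is negative when there are fractional digits; clamp
    # inputs such as "1E+2", whose exponent is positive, to zero.
    return max(0, -exponent)


for text in ["38", "38.0", "23.5000"]:
    print("%s -> %d" % (text, places_from_string(text)))
```

Unlike the float-based approach, this distinguishes "38", "38.0" and "38.0000", but only because the string still carries that information.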


    > I'm willing to accept the fact that I won't be able to differentiate
    > between float("38.0") and float("38.0000"). Both of those map to 1,
    > which is OK for my purposes.


    That seems... well, "bizarre and wrong" are the only words that come to
    mind. If I were recording data as "38.0000" and you told me I had
    measured it to only one decimal place accuracy, I wouldn't be too
    pleased. Maybe if I understood the context better?

    How about 38.12 and 38.1200?

    By the way, you contradict yourself here. Earlier, you described 38.0 as
    having zero decimal places (which is wrong). Here you describe it as
    having one, which is correct, and then in a later post you describe it as
    having zero decimal places again.



    --
    Steven D'Aprano
    http://import-that.dreamwidth.org/
     
    Steven D'Aprano, Apr 29, 2014
    #2

  3. On Tue, 29 Apr 2014 13:23:07 +1000, Ben Finney wrote:

    > Steven D'Aprano <> writes:
    >
    >> By the way, you contradict yourself here. Earlier, you described 38.0
    >> as having zero decimal places (which is wrong). Here you describe it as
    >> having one, which is correct, and then in a later post you describe it
    >> as having zero decimal places again.

    >
    > I get the impression that this is at the core of the misunderstanding.
    > Having a number's representation ending in “….0” does not mean zero
    > decimal places; it has exactly one. The value's representation contains
    > the digit “0” after the decimal point, but that digit is significant to
    > the precision of the representation.
    >
    > If the problem could be stated such that “38.0” and “38” and “38.000”
    > are consistently described with the correct number of decimal digits of
    > precision (in those examples: one, zero, and three), maybe the
    > discussion would make more sense.



    It's actually trickier than that. Digits of precision can refer to
    measurement error, or to the underlying storage type. Python floats are C
    doubles, so they carry 53 bits of precision in a 64-bit value (roughly 15
    to 17 significant decimal digits) regardless of the precision of the
    measurement. The OP (Roy) is, I think, trying to guess the measurement
    precision after the fact, given a float. If the measurement error really
    does differ from value to value, I don't think he'll have much luck:
    given a float like 23.0, all we can say is that it has *at least* zero
    significant decimal places. 23.1 has at least one, 23.1111 has at least
    four.

    If you can put an upper bound on the precision, as Roy indicates he can,
    then perhaps a reasonable approach is to convert to a string rounded to
    four decimal places, then strip trailing zeroes:

    py> x = 1234.1 # actual internal is closer to 1234.099999999999909
    py> ("%.4f" % x).rstrip('0')
    '1234.1'

    then count the number of digits after the dot. (This assumes that the
    string formatting routines are correctly rounded, which they should be on
    *most* platforms.) But again, this only gives a lower bound to the number
    of significant digits -- it's at least one, but might be more.


    --
    Steven
     
    Steven D'Aprano, Apr 29, 2014
    #3
  4. Roy Smith

    Roy Smith Guest

    In article <535f0f9f$0$29965$c3e8da3$>,
    Steven D'Aprano <> wrote:

    > On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:
    >
    > [...]
    > > Fundamentally, these numbers have between 0 and 4 decimal digits of
    > > precision,

    >
    > I'm surprised that you have a source of data with variable precision,
    > especially one that varies by a factor of TEN THOUSAND.


    OK, you're surprised.

    > I don't know what justification you have for combining such a
    > mix of data sources.


    Because that's the data that was given to me. Real life data is messy.

    > One possible interpretation of your post is that you have a source of
    > floats, where all the numbers are actually measured to the same
    > precision, and you've simply misinterpreted the fact that some of them
    > look like they have less precision.


    Another possibility is that they're latitude/longitude coordinates, some
    of which are given to the whole degree, some of which are given to
    greater precision, all the way down to the ten-thousandth of a degree.

    > What reason do you have to think that something recorded to 14
    > decimal places was only intended to have been recorded to 4?


    Because I understand the physical measurement these numbers represent.
    Sometimes, Steve, you have to assume that when somebody asks a question,
    they have actually asked the question they intended to ask.

    > Perhaps you need to explain why you're doing this, as it seems
    > numerically broken.


    These are latitude and longitude coordinates of locations. Some
    locations are known to a specific street address. Some are known to a
    city. Some are only known to the country. So, for example, the 38.0
    value represents the latitude, to the nearest whole degree, of the
    geographic center of the contiguous United States.

    > I really think you need to go back to the source. Trying to infer the
    > precision of the measurements from the accident of the string formatting
    > seems pretty dubious to me.


    Sure it is. But, like I said, real-life data is messy. You can wring
    your hands and say, "this data sucks, I can't use it", or you can figure
    out some way to deal with it. Which is the whole point of my post. The
    best I've come up with is inferring something from the string formatting
    and I'm hoping there might be something better I might do.

    > But I suppose if you wanted to infer the number of digits after the
    > decimal place, excluding trailing zeroes (why, I do not understand), up
    > to a maximum of four digits, then you could do:
    >
    > s = "%.4f" % number # rounds to four decimal places
    > s = s.rstrip("0") # ignore trailing zeroes, whether significant or not
    > count = len(s.split(".")[1])


    This at least seems a little more robust than just calling str(). Thank
    you :)

    > Assuming all the numbers fit in the range where they are shown in non-
    > exponential format.


    They're latitude/longitude, so they all fall into [-180, 180].

    > Perhaps you ought to be using Decimal rather than float.


    Like I said, "The numbers are given to me as Python floats; I have no
    control over that".

    > > I'm willing to accept the fact that I won't be able to differentiate
    > > between float("38.0") and float("38.0000"). Both of those map to 1,
    > > which is OK for my purposes.

    >
    > That seems... well, "bizarre and wrong" are the only words that come to
    > mind.


    I'm trying to intuit, from the values I've been given, which coordinates
    are likely to be accurate to within a few miles. I'm willing to accept
    a few false negatives. If the number is float("38"), I'm willing to
    accept that it might actually be float("38.0000"), and I might be
    throwing out a good data point that I don't need to.

    For the purpose I'm using the data for, excluding the occasional good
    data point won't hurt me. Including the occasional bad one, will.

    > By the way, you contradict yourself here. Earlier, you described 38.0 as
    > having zero decimal places (which is wrong). Here you describe it as
    > having one, which is correct, and then in a later post you describe it as
    > having zero decimal places again.


    I was sloppy there. I was copy-pasting data from my program output.
    Observe:

    >>> print float("38")
    38.0

    In standard engineering parlance, the string "38" represents a number
    with a precision of +/- 1 unit. Unfortunately, Python's default str()
    representation turns this into "38.0", which implies +/- 0.1 unit.

    Floats represented as strings (at least in some disciplines, such as
    engineering) include more information than just the value. By the
    number of trailing zeros, they also include information about the
    precision of the measurement. That information is lost when the string
    is converted to an IEEE float. I'm trying to intuit that information
    back, and as I mentioned earlier, am willing to accept that the
    intuiting process will be imperfect. There is real-life value in
    imperfect processes.
     
    Roy Smith, Apr 29, 2014
    #4
  5. On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
    > I'm trying to intuit, from the values I've been given, which coordinates
    > are likely to be accurate to within a few miles. I'm willing to accept
    > a few false negatives. If the number is float("38"), I'm willing to
    > accept that it might actually be float("38.0000"), and I might be
    > throwing out a good data point that I don't need to.


    You have one chance in ten, repeatably, of losing a digit. That is,
    roughly 10% of your four-decimal figures will appear to be
    three-decimal, and 1% of them will appear to be two-decimal, and so
    on. Is that "a few" false negatives? It feels like a lot IMO. But
    then, there's no alternative - the information's already gone.
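The proportions can be checked by enumerating every four-decimal fractional part, under the assumption (not stated in the thread) that each one is equally likely. `decimal_places` here is the hypothetical format-and-strip helper, repeated so the block stands alone.

```python
from collections import Counter


def decimal_places(number, max_places=4):
    """Digits after the decimal point once trailing zeroes are stripped."""
    s = "%.*f" % (max_places, number)
    return len(s.rstrip("0").split(".")[1])


# Every possible fractional part 0.0000 .. 0.9999, each counted once.
counts = Counter(decimal_places(n / 10000.0) for n in range(10000))
for places in sorted(counts, reverse=True):
    print("%d apparent places: %5.2f%%" % (places, counts[places] / 100.0))
```

Under that assumption, 10% of values lose at least one digit, split into 9% that look three-decimal, 0.9% that look two-decimal, and so on.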

    ChrisA
     
    Chris Angelico, Apr 29, 2014
    #5
  6. On 4/29/14 12:30 PM, Chris Angelico wrote:
    > On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
    >> I'm trying to intuit, from the values I've been given, which coordinates
    >> are likely to be accurate to within a few miles. I'm willing to accept
    >> a few false negatives. If the number is float("38"), I'm willing to
    >> accept that it might actually be float("38.0000"), and I might be
    >> throwing out a good data point that I don't need to.

    >
    > You have one chance in ten, repeatably, of losing a digit. That is,
    > roughly 10% of your four-decimal figures will appear to be
    > three-decimal, and 1% of them will appear to be two-decimal, and so
    > on. Is that "a few" false negatives? It feels like a lot IMO. But
    > then, there's no alternative - the information's already gone.
    >


    Reminds me of the story that the first survey of Mt. Everest resulted in
    a height of exactly 29,000 feet, but to avoid the appearance of an
    estimate, they reported it as 29,002: http://www.jstor.org/stable/2684102

    --
    Ned Batchelder, http://nedbatchelder.com
     
    Ned Batchelder, Apr 29, 2014
    #6
  7. Adam Funk

    Adam Funk Guest

    On 2014-04-29, Roy Smith wrote:

    > Another possibility is that they're latitude/longitude coordinates, some
    > of which are given to the whole degree, some of which are given to
    > greater precision, all the way down to the ten-thousandth of a degree.


    That makes sense. 1° of longitude is about 111 km at the equator,
    78 km at 45°N or S, & 0 km at the poles.
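Those figures follow from scaling the equatorial value by the cosine of the latitude. The sketch below assumes a spherical Earth and an equatorial degree of about 111.32 km; the function name is illustrative.

```python
import math

KM_PER_DEGREE_AT_EQUATOR = 111.32  # approximate, spherical-Earth assumption


def km_per_degree_longitude(latitude_deg):
    """Approximate east-west length of one degree of longitude, in km."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))


for lat in (0, 45, 90):
    print("%2d deg: %6.1f km" % (lat, km_per_degree_longitude(lat)))

# Ground size of the last decimal place of a latitude value, taking a
# degree of latitude as roughly 111 km everywhere.
for places in range(5):
    print("%d places: about %.3f km" % (places, 111.0 / 10 ** places))
```

On this estimate, a whole-degree coordinate is good to about 111 km and a four-decimal one to about 11 m.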


    "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    bear, & walks 1 km north, where he's back at his tent. What color is
    the bear?" ;-)


    --
    War is God's way of teaching Americans geography.
    [Ambrose Bierce]
     
    Adam Funk, Apr 29, 2014
    #7
  8. On 4/29/14 3:16 PM, Adam Funk wrote:
    > "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    > bear, & walks 1 km north, where he's back at his tent. What color is
    > the bear?" ;-)
    >


    Who manufactured the tent?


    marcus
     
    Mark H Harris, Apr 29, 2014
    #8
  9. Ryan Hiebert

    Ryan Hiebert Guest

    On Tue, Apr 29, 2014 at 3:16 PM, Adam Funk <> wrote:

    >
    > "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    > bear, & walks 1 km north, where he's back at his tent. What color is
    > the bear?" ;-)



    Skin or Fur?
     
    Ryan Hiebert, Apr 29, 2014
    #9
  10. On Wed, Apr 30, 2014 at 6:39 AM, Mark H Harris <> wrote:
    > On 4/29/14 3:16 PM, Adam Funk wrote:
    >>
    >> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    >> bear, & walks 1 km north, where he's back at his tent. What color is
    >> the bear?" ;-)
    >>

    >
    > Who manufactured the tent?


    A man pitches his tent 1 km south and kills a bear with it. Clearly
    that wasn't a tent, it was a cricket ball.

    ChrisA
     
    Chris Angelico, Apr 29, 2014
    #10
  11. Ned Batchelder wrote:
    > Reminds me of the story that the first survey of Mt. Everest resulted in
    > a height of exactly 29,000 feet, but to avoid the appearance of an
    > estimate, they reported it as 29,002: http://www.jstor.org/stable/2684102


    They could have said it was 29.000 kilofeet.

    --
    Greg
     
    Gregory Ewing, Apr 29, 2014
    #11
  12. emile

    emile Guest

    On 04/29/2014 01:16 PM, Adam Funk wrote:

    > "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    > bear, & walks 1 km north, where he's back at his tent. What color is
    > the bear?" ;-)


    From how many locations on Earth can someone walk one mile south, one
    mile east, and one mile north and end up at their starting point?

    Emile
     
    emile, Apr 29, 2014
    #12
  13. On 29/04/2014 23:42, emile wrote:
    > On 04/29/2014 01:16 PM, Adam Funk wrote:
    >
    >> "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    >> bear, & walks 1 km north, where he's back at his tent. What color is
    >> the bear?" ;-)

    >
    > From how many locations on Earth can someone walk one mile south, one
    > mile east, and one mile north and end up at their starting point?
    >
    > Emile
    >


    Haven't you heard of The Triangular Earth Society?

    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.

    Mark Lawrence

     
    Mark Lawrence, Apr 30, 2014
    #13
  14. Roy Smith

    Roy Smith Guest

    In article <>,
    Chris Angelico <> wrote:

    > On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
    > > I'm trying to intuit, from the values I've been given, which coordinates
    > > are likely to be accurate to within a few miles. I'm willing to accept
    > > a few false negatives. If the number is float("38"), I'm willing to
    > > accept that it might actually be float("38.0000"), and I might be
    > > throwing out a good data point that I don't need to.

    >
    > You have one chance in ten, repeatably, of losing a digit. That is,
    > roughly 10% of your four-decimal figures will appear to be
    > three-decimal, and 1% of them will appear to be two-decimal, and so
    > on. Is that "a few" false negatives?


    You're looking at it the wrong way. It's not that the glass is 10%
    empty, it's that it's 90% full, and 90% is a lot of good data :)
     
    Roy Smith, Apr 30, 2014
    #14
  15. On Wed, Apr 30, 2014 at 9:53 AM, Roy Smith <> wrote:
    > In article <>,
    > Chris Angelico <> wrote:
    >
    >> On Tue, Apr 29, 2014 at 11:38 PM, Roy Smith <> wrote:
    >> > I'm trying to intuit, from the values I've been given, which coordinates
    >> > are likely to be accurate to within a few miles. I'm willing to accept
    >> > a few false negatives. If the number is float("38"), I'm willing to
    >> > accept that it might actually be float("38.0000"), and I might be
    >> > throwing out a good data point that I don't need to.

    >>
    >> You have one chance in ten, repeatably, of losing a digit. That is,
    >> roughly 10% of your four-decimal figures will appear to be
    >> three-decimal, and 1% of them will appear to be two-decimal, and so
    >> on. Is that "a few" false negatives?

    >
    > You're looking at it the wrong way. It's not that the glass is 10%
    > empty, it's that it's 90% full, and 90% is a lot of good data :)


    Hah! That's one way of looking at it.

    At least you don't have to worry about junk digits getting in. The
    greatest precision you're working with is three digits before the
    decimal and four after, and a Python float can handle that easily.
    (Which is what I was concerned about when I first queried your
    terminology - four digits to the right of the decimal and, say, 10-12
    to the left, and you're starting to see problems.)
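One way to see where the trouble starts is to look at the gap between adjacent doubles at different magnitudes. This is a sketch assuming IEEE 754 doubles with a 53-bit significand and finite, nonzero inputs in the normal range; `ulp` is a hypothetical helper name.

```python
import math


def ulp(x):
    """Gap between x and the next larger double (see assumptions above)."""
    _, exponent = math.frexp(abs(x))       # abs(x) == m * 2**exponent
    return math.ldexp(1.0, exponent - 53)  # 2 ** (exponent - 53)


print(ulp(180.0))  # spacing near the largest longitude
print(ulp(9e11))   # spacing with twelve digits before the decimal point
```

Near 180 the spacing is far below 0.0001, so four decimal places are safe; near 9e11 the spacing exceeds 0.0001, so adjacent four-decimal values can no longer all be told apart.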

    ChrisA
     
    Chris Angelico, Apr 30, 2014
    #15
  16. On Wed, Apr 30, 2014 at 10:13 AM, Ben Finney <> wrote:
    > The problem is you won't know *which* 90% is accurate, and which 10% is
    > inaccurate. This is very different from the glass, where it's evident
    > which part is good.
    >
    > So, I can't see that you have any choice but to say that *any* of the
    > precision predictions should expect, on average, to be (10 + 1 + …)
    > percent inaccurate. And you can't know which ones. Is that an acceptable
    > error rate?


    But they're all going to be *at least* as accurate as the algorithm
    says. A figure of 31.4 will be treated as 1 decimal, even though it
    might really have been accurate to 4; but a figure of 27.1828 won't be
    incorrectly reported as having only 2 decimals.

    ChrisA
     
    Chris Angelico, Apr 30, 2014
    #16
  17. On 29 Apr 2014 05:43:19 GMT, Steven D'Aprano <>
    declaimed the following:

    >does differ from value to value, I don't think he'll have much luck:
    >given a float like 23.0, all we can say is that it has *at least* zero
    >significant decimal places. 23.1 has at least one, 23.1111 has at least
    >four.
    >

    I wouldn't even give it that... Since internally they (ignore binary
    conversion) translate into

    2.30E1, 2.31E1, and 2.31111E1

    I'd claim 3-significant digits, 3-significant digits, and 6-significant
    digits. (Heck, as I recall classical FORTRAN, they would be 0.230E2...)

    >If you can put an upper bound on the precision, as Roy indicates he can,
    >then perhaps a reasonable approach is to convert to a string rounded to
    >four decimal places, then strip trailing zeroes:
    >

    That I'd agree with... once the data has been converted to binary
    float, all knowledge of the source significant digits has been lost.

    Then confuse matters with the fact that in a math class

    1.1 * 2.2 => 2.42

    but in a physics or chemistry class the recommended result is

    1.1 * 2.2 => 2.4

    (one reason slide-rules were acceptable for so long -- and even my high
    school trig course only required slide-rule significance even though half
    the class had scientific calculators [costing >$100, when a Sterling
    slide-rule could still be had for <$10]) <G>
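The physics-class convention can be imitated by rounding to a fixed number of significant digits. This is a minimal sketch using the `%g` format; `round_to_sig_figs` is a hypothetical helper, and the number of digits to keep still has to be chosen by hand.

```python
def round_to_sig_figs(value, sig_figs):
    """Round value to sig_figs significant digits (hypothetical helper)."""
    return float("%.*g" % (sig_figs, value))


product = 1.1 * 2.2
print(repr(product))                  # the raw float product
print(round_to_sig_figs(product, 2))  # limited to two significant digits
```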
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Apr 30, 2014
    #17
  18. On Wed, 30 Apr 2014 08:51:32 +1000, Chris Angelico <>
    declaimed the following:

    >
    >Any point where the mile east takes you an exact number of times
    >around the globe. So, anywhere exactly one mile north of that, which
    >is a number of circles not far from the south pole.
    >

    Yeah, but he'd have had to bring his own bear...

    Bears and Penguins don't mix. Seals, OTOH, are food to the bears, and
    eat the penguins.

    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Apr 30, 2014
    #18
  19. Roy Smith

    Roy Smith Guest

    In article <>,
    Dennis Lee Bieber <> wrote:

    > in a physics or chemistry class the recommended result is
    >
    > 1.1 * 2.2 => 2.4


    More than recommended. In my physics class, if you put down more
    significant digits than the input data justified, you got the problem
    marked wrong.

    > (one reason slide-rules were acceptable for so long -- and even my high
    > school trig course only required slide-rule significance even though half
    > the class had scientific calculators [costing >$100, when a Sterling
    > slide-rule could still be had for <$10]) <G>


    Sterling? Snort. K&E was the way to go.
     
    Roy Smith, Apr 30, 2014
    #19
  20. Roy Smith

    Roy Smith Guest

    In article <>,
    Adam Funk <> wrote:

    > On 2014-04-29, Roy Smith wrote:
    >
    > > Another possibility is that they're latitude/longitude coordinates, some
    > > of which are given to the whole degree, some of which are given to
    > > greater precision, all the way down to the ten-thousandth of a degree.

    >
    > That makes sense. 1° of longitude is about 111 km at the equator,
    > 78 km at 45°N or S, & 0 km at the poles.
    >
    >
    > "A man pitches his tent, walks 1 km south, walks 1 km east, kills a
    > bear, & walks 1 km north, where he's back at his tent. What color is
    > the bear?" ;-)


    Assuming he shot the bear, red.
     
    Roy Smith, Apr 30, 2014
    #20
