On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:

[...]

> Fundamentally, these numbers have between 0 and 4 decimal digits of precision,

I'm surprised that you have a source of data with variable precision, especially one that varies by a factor of TEN THOUSAND. The difference between 0 and 4 decimal digits is equivalent to measuring some lengths to the nearest metre, some to the nearest centimetre, and some to the nearest 0.1 of a millimetre. That's very unusual and I don't know what justification you have for combining such a mix of data sources.

One possible interpretation of your post is that you have a source of floats, where all the numbers are actually measured to the same precision, and you've simply misinterpreted the fact that some of them look like they have less precision. Since you indicate that 4 decimal digits is the maximum, I'm going with 4 decimal digits. So if your data includes the float 23.5, that's 23.5 measured to a precision of four decimal places (that is, it's 23.5000, not 23.5001 or 23.4999).

On the other hand, if you're getting your values as *strings*, that's another story. If you can trust the strings, they'll tell you how many decimal places: "23.5" is only one decimal place, "23.5000" is four.
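If the values do arrive as strings, counting the digits is simple; a minimal sketch (the function name is my own invention):

```python
def places_from_string(s):
    """Count the digits after the decimal point in a numeric string."""
    if "." not in s:
        return 0
    return len(s.split(".", 1)[1])

print(places_from_string("23.5"))     # 1
print(places_from_string("23.5000"))  # 4
```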

But then what to make of your later example?

Python floats (C doubles) are quite capable of distinguishing between 40.7528 and 40.75280000000001. They are distinct numbers:

py> 40.75280000000001 - 40.7528
7.105427357601002e-15

so if a number is recorded as 40.75280000000001, presumably it is because it was measured as 40.75280000000001. (How that precision can be justified, I don't know! Does it come from the Large Hadron Collider?) If it were intended to be 40.7528, I expect it would have been recorded as 40.7528. What reason do you have to think that something recorded to 14 decimal places was only intended to have been recorded to 4?

Without knowing more about how your data is generated, I can't advise you much, but the whole scenario as you have described it makes me think that *somebody* is doing something wrong. Perhaps you need to explain why you're doing this, as it seems numerically broken.

> Is there any clean way to do that? The best I've come up with so far is to str() them and parse the remaining string to see how many digits it put after the decimal point.

I really think you need to go back to the source. Trying to infer the precision of the measurements from the accident of the string formatting seems pretty dubious to me.

But I suppose if you wanted to infer the number of digits after the decimal place, excluding trailing zeroes (why, I do not understand), up to a maximum of four digits, then you could do:

s = "%.4f" % number  # rounds to four decimal places
s = s.rstrip("0")    # ignore trailing zeroes, whether significant or not
count = len(s.split(".")[1])
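Wrapped up as a function (the name and signature are my invention), that recipe might look like this:

```python
def inferred_places(number, max_places=4):
    """Digits after the decimal point, ignoring trailing zeroes."""
    # Round to the maximum precision first; the %f format never
    # produces exponential notation, so the parse below is safe.
    s = ("%.*f" % (max_places, number)).rstrip("0")
    # Whatever survives after the decimal point is counted.
    return len(s.split(".")[1])

print(inferred_places(40.7528))            # 4
print(inferred_places(38.0))               # 0
print(inferred_places(40.75280000000001))  # 4 (the tail rounds away)
```

Note that 38.0 comes out as zero decimal places, since the single zero after the point is stripped along with the padding.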

Assuming all the numbers fit in the range where they are shown in non-exponential format. If you have to handle numbers like 1.23e19 as well, you'll have to parse the string more carefully. (Keep in mind that most floats above a certain size are all integer-valued.)

> The numbers are given to me as Python floats; I have no control over that.

If that's the case, what makes you think that two floats from the same data set were measured to different precision? Given that you don't see strings, only floats, I would say that your problem is unsolvable. Whether I measure something to one decimal place and get 23.5, or four decimal places and get 23.5000, the float you see will be the same.
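That collapse is easy to confirm at the interpreter:

```python
# Both strings parse to the very same double; the trailing
# zeroes leave no trace in the float.
print(float("23.5") == float("23.5000"))  # True
print(repr(float("23.5000")))             # '23.5'
```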

Perhaps you ought to be using Decimal rather than float. Floats have a fixed precision, while Decimals can be configured. Then the right way to answer your question is to inspect the number:

py> from decimal import Decimal as D
py> x = D("23.5000")
py> x.as_tuple()
DecimalTuple(sign=0, digits=(2, 3, 5, 0, 0, 0), exponent=-4)

The number of decimal digits of precision is -exponent.
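Packaged as a small helper (the name is mine), that inspection looks like:

```python
from decimal import Decimal

def decimal_places(s):
    """Decimal digits of precision recorded in a numeric string."""
    exponent = Decimal(s).as_tuple().exponent
    # A negative exponent counts digits after the decimal point;
    # integers have exponent 0 (or positive, e.g. Decimal("1E+2")).
    return -exponent if exponent < 0 else 0

print(decimal_places("23.5000"))  # 4
print(decimal_places("23.5"))     # 1
print(decimal_places("38"))       # 0
```

Unlike float, Decimal preserves the trailing zeroes of the original string, so "23.5" and "23.5000" remain distinguishable.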

> I'm willing to accept the fact that I won't be able to differentiate between float("38.0") and float("38.0000"). Both of those map to 1, which is OK for my purposes.

That seems... well, "bizarre and wrong" are the only words that come to mind. If I were recording data as "38.0000" and you told me I had measured it to only one decimal place accuracy, I wouldn't be too pleased. Maybe if I understood the context better?

How about 38.12 and 38.1200?

By the way, you contradict yourself here. Earlier, you described 38.0 as having zero decimal places (which is wrong). Here you describe it as having one, which is correct, and then in a later post you describe it as having zero decimal places again.