Floating point bug?

Paul Rubin · Feb 26, 2008

Mark Dickinson said:
def mean(number_list):
    return sum(number_list)/len(number_list)

If you pass a list of floats, complex numbers, Fractions, or Decimal
instances to mean() then it'll work just fine. But if you pass
a list of ints or longs, it'll silently return the wrong result.
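
A minimal sketch of the difference being described, assuming Python 3, where `/` is always true division and `//` is floor division. The name `floor_mean` is hypothetical; it mimics what the plain `/` version returned for ints under classic Python 2 division.

```python
def mean(number_list):
    # True division: the result keeps its fractional part.
    return sum(number_list) / len(number_list)


def floor_mean(number_list):
    # Floor division: mimics classic int/int division.
    return sum(number_list) // len(number_list)


print(mean([1, 2]))        # 1.5
print(floor_mean([1, 2]))  # 1
```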

So use: return sum(number_list) / float(len(number_list))
That makes it somewhat more explicit what you want. Otherwise
I wouldn't be so sure the integer result is "wrong".

Mark Dickinson · Feb 27, 2008

So use: return sum(number_list) / float(len(number_list))
That makes it somewhat more explicit what you want. Otherwise

But that fails for a list of Decimals...

Mark
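
A minimal sketch of that failure, assuming Python 3 and the standard `decimal` module. The name `mean_via_float` is hypothetical and stands for the `float(len(...))` variant suggested above.

```python
from decimal import Decimal


def mean(number_list):
    # Divide by the integer length.
    return sum(number_list) / len(number_list)


def mean_via_float(number_list):
    # Force a float divisor, as suggested above.
    return sum(number_list) / float(len(number_list))


prices = [Decimal("1.10"), Decimal("2.20"), Decimal("3.30")]

print(mean(prices))  # Decimal divided by int stays a Decimal

try:
    mean_via_float(prices)
except TypeError as exc:
    # Decimal refuses to mix with float in arithmetic.
    print("TypeError:", exc)
```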

Gabriel Genellina · Feb 27, 2008

En Wed said:
And smaller numbers are problematic too:

9999999999.9999981

This despite the fact that the quotient *is* exactly representable
as a float...

But:
py> 10**60//10**50
10000000000L
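
A minimal sketch of the contrast, assuming Python 3 (which prints no `L` suffix on large integers). The expression that produced the float above is not preserved here, so the float division below is only an assumed stand-in: it rounds each operand to a float before dividing, which can lose low-order digits.

```python
big = 10**60
small = 10**50

# Integer floor division is exact, whatever the size of the operands.
exact = big // small

# Assumed stand-in: round each operand to a float first, then divide.
approx = float(big) / float(small)

print(exact)
print(approx)
print(approx == exact)  # may be False, depending on rounding
```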

Nobody has mentioned yet the -Q command line option: -Qwarn will issue a
warning when 3/4 is executed.
And there is a helper script, Tools\scripts\fixdiv.py, that helps with
locating and replacing / operators. It works by analyzing the warnings
issued when the target program is actually executed with -Qwarnall.
There is also a companion script, finddiv.py, that just scans the source
looking for / and /= operators.
They have existed since this semantic change was introduced *seven* *years*
ago, in 2001, so it's not as if the Python world is suddenly going to be
turned upside down... I can't believe how long this thread is by now...

Paul Rubin · Feb 27, 2008

Mark Dickinson said:
But that fails for a list of Decimals...

Again, that depends on what your application considers to be failure.
Heck, int/int = float instead of decimal might be a failure.

FWIW, I just checked Haskell: int/int is not allowed (compile time
type error). There is an integer division function `div`, like
Python's //, that you can use if you want an integer quotient. If
you want a floating or rational quotient, you have to coerce the
operands manually. Explicit is better than implicit.

Paul Rubin · Feb 27, 2008

Gabriel Genellina said:
They have existed since this semantic change was introduced *seven* *years*
ago, in 2001, so it's not as if the Python world is suddenly going to be
turned upside down... I can't believe how long this thread is by now...

I don't think it's a sudden uproar about int/int being float, it's
just one of the periodic discussions about introducing a rational
type, like the decimal type we got not that long ago.

Dennis Lee Bieber · Feb 27, 2008

So, when you have five children over for a birthday party, and one cake,
do you say "Sorry kids, no cake for you: one cake divided by five is
zero"?

Ah, but one cake is composed of n-pieces, where n is a sufficiently
large number. So the problem becomes dividing n-pieces among m-guests...

And next we will feed the multitudes from a handful of fish <G>
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Ross Ridge · Feb 27, 2008

Mark Dickinson said:
True division and floor division are different operations. It doesn't
seem ridiculous to use different operators for them.

I don't have a problem with there being different operators for integer
and floating-point division. I have a problem with the behaviour of the
slash (/) operator changing.

Ross Ridge

Steven D'Aprano · Feb 27, 2008

So use: return sum(number_list) / float(len(number_list)) That makes it
somewhat more explicit what you want. Otherwise I wouldn't be so sure
the integer result is "wrong".

Oh come on. With a function named "mean" that calculates the sum of a
list of numbers and then divides by the number of items, what else could
it be?

You can always imagine corner cases where some programmer, somewhere, has
some bizarre need for a mean() function that truncates when given a list
of integers but not when given a list of floats. Making that the default
makes life easy for the 0.1% corner cases and life harder for the 99.9%
of regular cases, which is far from the Python philosophy.

It's better to reverse the burden, as Python does. Not that it is much
harder to write truncating_mean_for_ints_but_not_floats(): just use
the // int operator instead of /. At which point somebody will chime in
that int division gives the wrong result with one negative operand,
because in *their* opinion it should truncate rather than round to floor:
-1
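
The expression that printed -1 above is not preserved here. As a minimal sketch with hypothetical operands, assuming Python 3, floor division and truncation differ like this when one operand is negative:

```python
import math

a, b = 1, -2  # hypothetical operands, one of them negative

floored = a // b               # rounds toward negative infinity
truncated = math.trunc(a / b)  # rounds toward zero

print(floored)    # -1
print(truncated)  # 0
```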

Paul Rubin · Feb 27, 2008

Steven D'Aprano said:
Oh come on. With a function named "mean" that calculates the sum of a
list of numbers and then divides by the number of items, what else could
it be?

You have a bunch of marbles you want to put into bins. The division
tells you how many marbles to put into each bin. That would be
an integer since you cannot cut up individual marbles.

You can always imagine corner cases where some programmer, somewhere, has
some bizarre need for a mean() function that truncates when given a list
of integers but not when given a list of floats. Making that the default
makes life easy for the 0.1% corner cases and life harder for the 99.9%
of regular cases, which is far from the Python philosophy.

I think it's more important that a program never give a wrong answer,
than save a few keystrokes. So, that polymorphic mean function is
a bit scary. It might be best to throw an error if the args are
all integers. There is no definitely correct way to handle it so
it's better to require explicit directions.
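
A minimal sketch of what such a strict function might look like, assuming Python 3. The name `strict_mean` and the error message are hypothetical, not anything from an existing library.

```python
def strict_mean(number_list):
    # Hypothetical variant: refuse to guess when every item is an int.
    if all(isinstance(n, int) for n in number_list):
        raise TypeError("all-integer input: divide with / or // explicitly")
    return sum(number_list) / len(number_list)


print(strict_mean([1.0, 2]))  # 1.5

try:
    strict_mean([1, 2])
except TypeError as exc:
    print("TypeError:", exc)
```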

Steven D'Aprano · Feb 27, 2008

Again, that depends on what your application considers to be failure.
Heck, int/int = float instead of decimal might be a failure.

FWIW, I just checked Haskell: int/int is not allowed (compile time type
error). There is an integer division function `div`, like Python's //,
that you can use if you want an integer quotient. If you want a
floating or rational quotient, you have to coerce the operands manually.
Explicit is better than implicit.

Argument by platitude now? I can play that game too. You forget that
practicality beats purity.

When it comes to mixed arithmetic, it's just too darn inconvenient to
forbid automatic conversions. Otherwise you end up either forbidding
things like 1 + 1.0 on the basis that it isn't clear whether the
programmer wants an int result or a float result, or else even more
complex rules ("if the left operand is an int, and the result of the
addition has a zero floating-point part, then the result is an int,
otherwise it's an error, but if the left operand is a float, the result
is always a float"). Or a proliferation of operators, with integer and
floating point versions of everything.

Paul Rubin · Feb 27, 2008

Steven D'Aprano said:
When it comes to mixed arithmetic, it's just too darn inconvenient to
forbid automatic conversions. Otherwise you end up either forbidding
things like 1 + 1.0 on the basis that it isn't clear whether the
programmer wants an int result or a float result,

You can parse 1 as either an integer or a floating 1, so 1 + 1.0 can
be correctly typed as a float. However (for example), len(x) is
always an int so len(x) + 1.0 would be forbidden.

or else even more complex rules ("if the left operand is an int,
and the result of the addition has a zero floating-point part, then
the result is an int,

That is ugly and unnecessary.

Steven D'Aprano · Feb 27, 2008

You have a bunch of marbles you want to put into bins. The division
tells you how many marbles to put into each bin. That would be an
integer since you cannot cut up individual marbles.

(Actually you can. As a small child, one of my most precious possessions
was a marble which had cracked into two halves.)

No, that doesn't follow, because you don't get the result you want if the
number of marbles is entered as Decimals or floats. Maybe the data came
from a marble-counting device that always returns floats.

You're expecting the function to magically know what you want to do with
the result and return the right kind of answer, which is the wrong way to
go about it. For example, there are situations where your data is given
in integers, but the number you want is a float.

# number of 20kg bags of flour per order

data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
weights = [20*n for n in data]
mean(weights)

195.55555555555554

If I was using a library that arbitrarily decided to round the mean
weight per order to 195kg, I'd report that as a bug. Maybe I want the
next highest integer, not lowest. Maybe I do care about that extra 5/9th
of a kilo. It simply isn't acceptable for the function to try to guess
what I'm going to do with the result.
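
A minimal sketch of leaving the rounding decision to the caller, assuming Python 3 and the standard `math` module. It reuses the flour data from above; `mean` is the one-line function under discussion.

```python
import math


def mean(data):
    return sum(data) / len(data)


# number of 20kg bags of flour per order
data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
weights = [20 * n for n in data]

m = mean(weights)

# The caller chooses how to restrict the result, if at all.
print(math.floor(m))  # 195
print(math.ceil(m))   # 196
print(round(m, 1))    # 195.6
```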

I think it's more important that a program never give a wrong answer,
than save a few keystrokes. So, that polymorphic mean function is a bit
scary. It might be best to throw an error if the args are all integers.
There is no definitely correct way to handle it so it's better to
require explicit directions.

Of course there's a correct way to handle it. You write a function that
returns the mathematical mean. And then, if you need special processing
of that mean, (say) truncating if the numbers are all ints, or on
Tuesdays, you do so afterwards:

x = mean(data)
if all(isinstance(n, int) for n in data) or today() == Tuesday:
    x = int(x)

I suppose that if your application is always going to truncate the mean
you might be justified in writing an optimized function that does that.
But don't call it "truncated_mean", because that has a specific meaning
to statisticians that is not the same as what you're talking about.

Paul, I'm pretty sure you've publicly defended duck typing before. Now
you're all scared of some imagined type non-safety that results from
numeric coercions. I can't imagine why you think that this should be
allowed:

class Float(float): pass
x = Float(1.0)
mean([x, 2.0, 3.0, 5.0])

but this gives you the heebie-jeebies:

mean([1, 2.0, 3.0, 5.0])

As a general principle, I'd agree that arbitrarily coercing any old type
into any other type is a bad idea. But in the specific case of numeric
coercions, 99% of the time the Right Way is to treat all numbers
identically, and then restrict the result if you want a restricted
result, so the language should make that the easy case, and leave the 1%
to the developer to write special code:

def pmean(data): # Paul Rubin's mean
    """Returns the arithmetic mean of data, unless data is all
    ints, in which case returns the mean rounded to the nearest
    integer less than the arithmetic mean."""
    s = sum(data)
    if isinstance(s, int): return s//len(data)
    else: return s/len(data)
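
A short usage sketch of that function, assuming Python 3 (where `/` is true division). The sample lists are hypothetical.

```python
def pmean(data):
    """Floor-divided mean for all-int data, true mean otherwise."""
    s = sum(data)
    if isinstance(s, int):
        return s // len(data)
    return s / len(data)


print(pmean([1, 2, 4]))     # 2: the ints are floor-divided
print(pmean([-1, -2, -4]))  # -3: floor division rounds downward
print(pmean([1.0, 2, 4]))   # a float: one float switches to true division
```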

Paul Rubin · Feb 27, 2008

Steven D'Aprano said:
def pmean(data): # Paul Rubin's mean
    """Returns the arithmetic mean of data, unless data is all
    ints, in which case returns the mean rounded to the nearest
    integer less than the arithmetic mean."""
    s = sum(data)
    if isinstance(s, int): return s//len(data)
    else: return s/len(data)

Scheme and Common Lisp do automatic conversion and they thought out
the semantics rather carefully, and I think both of them return
exact rationals in this situation (int/int division). I agree
with you that using // as above is pretty weird and it may be
preferable to raise TypeError on any use of int/int (require
either an explicit conversion, or use of //).
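
For comparison, a minimal sketch of an exact rational result in Python, assuming Python 3 and the standard `fractions` module. The helper name `rational_mean` is hypothetical.

```python
from fractions import Fraction


def rational_mean(number_list):
    # Hypothetical helper: exact rational mean of integer input.
    return Fraction(sum(number_list), len(number_list))


print(rational_mean([1, 2, 4]))  # 7/3
print(rational_mean([2, 4]))     # 3
```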

Dan Bishop · Feb 27, 2008

But that fails for a list of Decimals...

Mark

Or complex. Or for rationals if you aren't expecting a conversion to
float.

Steven D'Aprano · Feb 27, 2008

You can parse 1 as either an integer or a floating 1, so 1 + 1.0 can be
correctly typed as a float. However (for example), len(x) is always an
int so len(x) + 1.0 would be forbidden.

Okay, that's just insane, making distinctions between literals and
variables like that.

1 + 1.0 # okay

x = 1
x + 1.0 # is this okay or not? who knows?

len('s') + 1.0 # forbidden

I am so glad you're not the designer of Python.

That is ugly and unnecessary.

Which was my point.

Paul Rubin · Feb 27, 2008

Steven D'Aprano said:
Okay, that's just insane, making distinctions between literals and
variables like that.

1 + 1.0 # okay

=> Yes

x = 1
x + 1.0 # is this okay or not? who knows?

=> Yes, ok

len('s') + 1.0 # forbidden

Yes, forbidden.

More examples:

x = 1
y = len(s) + x

=> ok, decides that x is an int

x = 1
y = x + 3.0

=> ok, decides that x is a float

x = 1
y = x + 3.0
z = len(s) + x

=> forbidden, x cannot be an int and float at the same time.

I am so glad you're not the designer of Python.

This is how Haskell works and I don't notice many complaints about it.

Steven D'Aprano · Feb 28, 2008

Scheme and Common Lisp do automatic conversion and they thought out the
semantics rather carefully, and I think both of them return exact
rationals in this situation (int/int division).

Potentially better than returning a float, but equally objectionable to

I agree with you that
using // as above is pretty weird and it may be preferable to raise
TypeError on any use of int/int (require either an explicit conversion,
or use of //).

Then you have completely misunderstood my objection.

It's not that // is weird, but that the semantics that you want by
default:

"return the mean for arbitrary numeric data, except for all ints, in
which case return the mean rounded to the next smaller integer"

is weird. Weird or not, if you want those semantics, Python gives you the
tools to create it, as above. It's not even very much more work.

But the normal semantics:

"return the mean for arbitrary numeric data"

should be easier, and with Python it is:

def mean(data): return sum(data)/len(data)

That does the right thing for data, no matter what it consists of:
floats, ints, Decimals, rationals, complex numbers, or a mix of all of
the above.
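
A minimal sketch exercising that one-liner on several numeric types, assuming Python 3 and the standard `decimal` and `fractions` modules. One caveat worth labelling: Decimal deliberately refuses to mix with float, so that particular combination raises TypeError.

```python
from decimal import Decimal
from fractions import Fraction


def mean(data):
    return sum(data) / len(data)


print(mean([1, 2]))                            # 1.5
print(mean([1.0, 2.5]))                        # 1.75
print(mean([Fraction(1, 2), Fraction(1, 3)]))  # 5/12
print(mean([Decimal("0.1"), Decimal("0.2")]))  # 0.15
print(mean([1 + 2j, 3 + 4j]))                  # (2+3j)
print(mean([1, 2.0, Fraction(1, 2)]))          # mixed input gives a float

# Caveat: Decimal and float do not combine.
try:
    mean([Decimal("0.1"), 0.2])
except TypeError as exc:
    print("TypeError:", exc)
```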

You want the pmean() case to be easy, and the mean() case to be hard, and
that's what boggles my brain.

Marc 'BlackJack' Rintsch · Feb 28, 2008

=> Yes

=> Yes, ok

Yes, forbidden.

More examples:

x = 1
y = len(s) + x

=> ok, decides that x is an int

x = 1
y = x + 3.0

=> ok, decides that x is a float

x = 1
y = x + 3.0
z = len(s) + x

=> forbidden, x cannot be an int and float at the same time.

This is how Haskell works and I don't notice many complaints about it.

Complain!

For implementing this in Python you have to carry an "is allowed to be
coerced to float" flag with every integer object to decide at run time if
it is an error to add it to a float or not. Or you make Python into a
statically typed language like Haskell. But then it's not Python anymore
IMHO.
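
A rough sketch of what carrying such a flag at run time could look like, assuming Python 3. The class `TaggedInt` and its `coercible` attribute are hypothetical, and only addition is handled.

```python
class TaggedInt:
    """Hypothetical integer wrapper carrying a 'may become float' flag."""

    def __init__(self, value, coercible):
        self.value = int(value)
        self.coercible = coercible

    def __add__(self, other):
        if isinstance(other, TaggedInt):
            # The sum is coercible only if both inputs were.
            return TaggedInt(self.value + other.value,
                             self.coercible and other.coercible)
        if isinstance(other, int):
            return TaggedInt(self.value + other, self.coercible)
        if isinstance(other, float):
            if not self.coercible:
                raise TypeError("this integer may not be coerced to float")
            return self.value + other
        return NotImplemented

    # Addition is symmetric, so the reflected case reuses the same logic.
    __radd__ = __add__


literal_one = TaggedInt(1, coercible=True)        # like the literal 1
length = TaggedInt(len("s"), coercible=False)     # like len('s')

print(literal_one + 3.0)  # 4.0

try:
    length + 1.0
except TypeError as exc:
    print("TypeError:", exc)
```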

Ciao,
Marc 'BlackJack' Rintsch

Paul Rubin · Feb 28, 2008

Marc 'BlackJack' Rintsch said:
For implementing this in Python you have to carry an "is allowed to be
coerced to float" flag with every integer object to decide at run time if
it is an error to add it to a float or not.

Yeah, I guess it's not workable in a dynamic language. Hmm. Well I
could think of some crazy ways to do it.

Or you make Python into a statically typed language like Haskell.
But then it's not Python anymore IMHO.

There are some languages like Boo, that are sort of halfway between
Python and Haskell, so maybe that kind of idea could be used in them.

Dennis Lee Bieber · Feb 28, 2008

When it comes to mixed arithmetic, it's just too darn inconvenient to
forbid automatic conversions. Otherwise you end up either forbidding
things like 1 + 1.0 on the basis that it isn't clear whether the
programmer wants an int result or a float result, or else even more
complex rules ("if the left operand is an int, and the result of the
addition has a zero floating-point part, then the result is an int,
otherwise it's an error, but if the left operand is a float, the result
is always a float"). Or a proliferation of operators, with integer and
floating point versions of everything.

Automatic conversions, okay... but converting a result when all
inputs are of one type, NO...

The only rule needed is very simple: promote simpler types to the
more complex type involved in the current expression (with expression
defined as "value operator value" -- so (1/2) * 3.0 is INTEGER 1/2,
resultant 0 then promoted to float 0.0 to be compatible with 3.0).

Very simple rule, used by very many traditional programming
languages.
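
A minimal sketch contrasting the two rules, assuming Python 3. Here `//` stands in for the classic integer-only division described above, while `/` is Python 3's true division.

```python
# Classic rule: 1/2 is evaluated as integer division, then promoted.
classic = (1 // 2) * 3.0   # 0 * 3.0

# True division: / on two ints already yields a float.
true_div = (1 / 2) * 3.0   # 0.5 * 3.0

print(classic)   # 0.0
print(true_div)  # 1.5
```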
