Floating point bug?


Lie

Well I think the idea is to have a hierarchy of nested numeric types,
not a single type.

You hit the right note, but what I meant is the numeric type
unification would make it _appear_ to consist of a single numeric type
(yeah, I know it isn't actually, but what appears from outside isn't
always what's inside).
Try with a=7, b=25

They should still compare true, but they don't. The reason they don't
is float's finite precision, which is not exactly what we're talking
about here, since it doesn't change the fact that multiplication and
division are inverses of each other. One way to handle this situation
is to do an epsilon-aware comparison (as should be done with any
comparison involving floats), but I don't do it because my intention
is to clarify the real point, that multiplication is indeed the
inverse of division, and I want to avoid obscuring that with the
epsilon comparison.
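
For reference, a minimal sketch of the kind of epsilon-aware comparison
being talked about (the helper name and the tolerance are illustrative,
not anything from the thread):

def almost_equal(x, y, eps=1e-9):
    # treat x and y as equal if they differ by less than eps,
    # scaled by the magnitude of the larger operand
    return abs(x - y) <= eps * max(abs(x), abs(y), 1.0)

a, b = 7.0, 25.0
print((a / b) * b == a)               # False on IEEE 754 doubles: the a=7, b=25 case above
print(almost_equal((a / b) * b, a))   # True: equal to within the tolerance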
 

Paul Rubin

Lie said:
You hit the right note, but what I meant is the numeric type
unification would make it _appear_ to consist of a single numeric type
(yeah, I know it isn't actually, but what appears from outside isn't
always what's inside).

That is clearly not intended; floats and decimals and integers are
really different from each other and Python has to treat them distinctly.
They should still compare true, but they don't. The reason they don't
is float's finite precision, which is not exactly what we're talking
about here, since it doesn't change the fact that multiplication and
division are inverses of each other.

What? Obviously they are not exact inverses for floats, as that test
shows. They would be inverses for mathematical reals or rationals,
but Python does not have those.
One way to handle this situation is to do an epsilon-aware comparison
(as should be done with any comparison involving floats), but I don't
do it because my intention is to clarify the real point, that
multiplication is indeed the inverse of division, and I want to avoid
obscuring that with the epsilon comparison.

I think you are a bit confused. That epsilon aware comparison thing
acknowledges that floats only approximate the behavior of mathematical
reals. When we do float arithmetic, we accept that "equal" often
really only means "approximately equal". But when we do integer
arithmetic, we do not expect or accept equality as being approximate.
Integer equality means equal, not approximately equal. That is why
int and float arithmetic cannot work the same way.
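
(A small illustration of that contrast, not from the original post:)

print(1 + 2 == 3)                     # int arithmetic: equality is exact
print(0.1 + 0.2 == 0.3)               # float arithmetic: False, only approximately equal
print(abs((0.1 + 0.2) - 0.3) < 1e-9)  # the epsilon-style check passes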
 

Jeff Schwab

Paul said:
I can live with int/int=float but
find it sloppy and would be happier if int/int always threw an error
(convert explicitly if you want a particular type result).

Better yet, how hard would it be to define an otherwise int-like type
that did not define a non-flooring division operator? Are there any
real use cases for such a type? Maybe a division operator could be
defined to perform a run-time check that, for an operation n/d==q,
n==q*d; else, throw an exception. Code written to support duck-typed
integers should work with such a UDT "out of the box."
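
For what it's worth, here is a rough sketch of such a type (the class
name matches the DivisionSafeInteger mentioned later in the thread; the
choice of exception and the other details are mine, purely illustrative):

class DivisionSafeInteger(int):
    """An int-like type whose division either divides exactly or raises."""
    def __truediv__(self, other):
        # the run-time check described above: allow n/d == q only if n == q*d
        q, r = divmod(self, other)
        if r != 0:
            raise ValueError("%r is not exactly divisible by %r" % (int(self), other))
        return DivisionSafeInteger(q)
    __div__ = __truediv__   # Python 2 spelling of the same operator

print(DivisionSafeInteger(10) / 5)   # 2
print(DivisionSafeInteger(10) / 4)   # raises ValueError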
 

Mensanator

Not sure if this is common knowledge yet, but Sympy, http://code.google.com/p/sympy, has a rational type.

I hadn't heard of this before, thanks for the link.

Very nifty, lots of goodies not found in gmpy (although it seems to
lack a modular inverse function and the linear congruence solver that
can be derived from it, making it of no value to me).

Alas, it's written in Python. Who writes a math library
in Python?

Nevertheless, I thought I would try out the Rational numbers.

Once I figured out how to use them, I converted my Polynomial
Finder by Newton's Forward Difference Method program to use
sympy instead of gmpy.

I have a test case where I create a 66-degree polynomial whose
coefficients are large rationals. The polynomial was calculated
flawlessly.

<partial output>
## sympy
## Term0: [66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66,
-66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66,
66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66,
-66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, -66,
66, -66, 66, -66, 66, -66, 66, -66, 66, -66, 66, 0]
##
## Seq: [66, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 66]
##
## The Polynomial:
##
##                                              1
## ------------------------------------------------------------------------------------------- * n**66
## 8247650592082470666723170306785496252186258551345437492922123134388955774976000000000000000
##
##                                             -67
## ------------------------------------------------------------------------------------------ * n**65
## 249928805820680929294641524448045340975341168222589014937034034375422902272000000000000000
##
##                                              67
## --------------------------------------------------------------------------------------- * n**64
## 230703513065243934733515253336657237823391847590082167634185262500390371328000000000000

But because they are calculated using Python,
it took 175 seconds compared to 0.2 seconds
for gmpy to do the same polynomial.

So, I'll keep it around for its neat features
that gmpy doesn't have, but it won't replace gmpy
for any serious work.

In [2]: from sympy import *

In [3]: Rational(21,4)
Out[3]: 21/4

In [4]: Rational(21,4)+Rational(3,4)
Out[4]: 6
 

Paul Rubin

Jeff Schwab said:
Better yet, how hard would it be to define an otherwise int-like type
that did not define a non-flooring division operator? Are there any
real use cases for such a type?

User defined types in python are fairly heavyweight compared with the
built-in types, and a type like that is just another thing for the
user to have to remember.

The C library has a bunch of different types like off_t (offset in a
file) and size_t, so if you pass an off_t to a function that expects a
size_t as that arg, the compiler notices the error. But they are
really just integers and they compile with no runtime overhead.

So, I think Python won't easily support lots of different types of
integers, and we've got what we've got.

There's an interesting talk linked from LTU about future languages:

http://lambda-the-ultimate.org/node/1277
 

Jeff Schwab

Paul said:
User defined types in python are fairly heavyweight compared with the
built-in types,

Yet they continue to form the basis of almost all non-trivial Python
programs. Anyway, it's a bit soon to be optimizing. :)

and a type like that is just another thing for the
user to have to remember.

How so? A well-written function generally shouldn't depend on the
exact types of its arguments, anyway. If someone has written a function
to find (e.g.) the median of a collection of numbers, their code should
already be prepared to accept values of user-defined numeric types. If
I want to call such a function with my hand-rolled DivisionSafeInteger
type, it should just work, unless specifically documented to work only
with a particular subset of Python data types.

The C library has a bunch of different types like off_t (offset in a

off_t is a vendor-specific extension, not part of the standard C library.
In gcc, it's a typedef for __off_t, which is a macro for _G_off_t,
which is in turn a macro for a compiler-specific type called _Io_Off_t.
Elsewhere, it may just be a typedef of long int.

file) and size_t, so if you pass an off_t to a function that expects a
size_t as that arg, the compiler notices the error.

On what compiler? I've never seen a C compiler that would mind any kind
of calculation involving two native, unsigned types.

$ cat main.c
#include <sys/types.h>
int main() { off_t ot = 0; long li = 3L; ot = li; }
$ make
cc -ansi -pedantic -Wall -std=c99 main.c -o main
$

But they are
really just integers and they compile with no runtime overhead.

They do indeed have run-time overhead, as opposed to (e.g.) meta-types
whose operations are performed at compile-time. If you mean they have
less overhead than types whose operations perform run-time checks, then
yes, of course that's true. You specifically stated (then snipped) that
you "would be happier if int/int always threw an error." The beauty of
a language with such extensive support for user-defined types that can
be used like built-in types is that you are free to define types that
meet your needs. The weight of such hand-rolled solutions may lead to
performance problems at first, but C-linkable extensions go a long way
to help; hence numpy et al.

So, I think Python won't easily support lots of different types of
integers, and we've got what we've got.

My understanding is that Python will easily support lots of different
types of just about anything. That's the point. In theory at least, it
supports programming in a way that lets the translator (compiler +
interpreter) keep track of the exact types being used, so that the
programmer doesn't have to. The fact that the tracking is done
dynamically, rather than statically, is a fundamental design decision
that was made early in the language's development.

There's an interesting talk linked from LTU about future languages:

http://lambda-the-ultimate.org/node/1277

Thanks, but that just seems to have links to the slides. Is there a
written article, or a video of Mr. Sweeney's talk?
 

Paul Rubin

Jeff Schwab said:
Yet they continue to form the basis of almost all non-trivial Python
programs. Anyway, it's a bit soon to be optimizing. :)

Large python programs usually have some classes for complex data
structures, but it's not typical Pythonic practice to define new
classes for things as small as integers.
How so? A well-written function generally shouldn't depend on the
exact types of its arguments, anyway.

By "another thing to remember" I mean doing the right thing should
happen with the normal integers that result from writing literals like
1 and 2, without resorting to a nonstandard user defined type.
If someone has written a
function to find (e.g.) the median of a collection of numbers, their
code should already be prepared to accept values of user-defined
numeric types.

It's important to be able to write such generic or polymorphic
functions, but most typical functions are monomorphic.
If I want to call such a function with my hand-rolled
DivisionSafeInteger type, it should just work,

Sure, however, the Pythonic approach is to make the defaults do the
right thing without requiring such user-written workarounds. Of course
there is occasional unclarity about what the right thing is.
On what compiler? I've never seen a C compiler that would mind any
kind of calculation involving two native, unsigned types.

You are right, C is even worse than I remembered.
They do indeed have run-time overhead, as opposed to (e.g.) meta-types
whose operations are performed at compile-time.

Not sure what you mean; by "no runtime overhead" I just mean they
compile to the same code as regular ints, no runtime checks. OK, it
turns out that for all intents and purposes it looks like they ARE
regular ints even at compile time, but in other languages it's not
like that.
If you mean they have less overhead than types whose operations
perform run-time checks, then yes, of course that's true. You
specifically stated (then snipped) that you "would be happier if
int/int always threw an error." The beauty of a language with such
extensive support for user-defined types that can be used like
built-in types is that you are free to define types that meet your
needs.

But those are not ints then. We're discussing an issue of language
design, which is what the behavior of the ordinary, standard, default
ints should be. My reason for suggesting int/int->error is that I
think it would increase program reliability in general. But that is
only if it applies to all the ints by default, with int/int=float
being a possible result of a nonstandard user-defined type. From the
zen list: "In the face of ambiguity, refuse the temptation to guess."
My understanding is that Python will easily support lots of different
types of just about anything. That's the point.

No I don't think so. Also from the zen list: "There should be one--
and preferably only one --obvious way to do it."
Thanks, but that just seems to have links to the slides. Is there a
written article, or a video of Mr. Sweeney's talk?

I don't think there's an article. There might be video somewhere. I
thought the slides were enough to get the ideas across so I didn't
have much interest in sitting through a video.
 

Lie

That is clearly not intended; floats and decimals and integers are
really different from each other and Python has to treat them distinctly.

In certain operations it would, such as:
a = Decimal('32.324')
b = 90.3453
c = 43
d = a + b + c <<< this should work without manual type casting

This behavior is what is intended in numeric type unification: floats
and decimals and integers should work together flawlessly without the
need for manual type casting. This gives the _impression_ of a single
numeric type (although programmers should still be aware that the
underlying types still exist). It's true Python has to treat them
differently, but programmers would be able to treat them all the same
(at least for the most part).
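
For contrast, what actually happens today is that the float operand
needs explicit help (a quick sketch of current behaviour, not anything
new in this thread):

from decimal import Decimal

a = Decimal('32.324')
b = 90.3453
c = 43

# d = a + b + c               # TypeError: Decimal and float don't mix implicitly
d = a + Decimal(str(b)) + c   # works once the float is converted by hand
print(d)                      # 165.6693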
What?  Obviously they are not exact inverses for floats, as that test
shows.  They would be inverses for mathematical reals or rationals,
but Python does not have those.

When I said multiplication and division are inverses, I was pointing
out that even though float's inexactness makes them imperfect
inverses, mult & div are still inverses of each other. In practice,
the inversing behavior is impossible unless we have a way to represent
real numbers (which we don't), and the *workaround* to make them work
is to do an epsilon comparison.

When I'm talking about things I usually talk in purely theoretical
terms first and treat practical implementations that don't work that
way as workarounds within their limitations. In this case, the
theoretical condition is that multiplication and division are inverses
of each other. The practical consideration is that floats are inexact
and true reals are impossible, and thus an epsilon comparison is
necessary to work around float's limitations so that multiplication
and division can still be inverses.

Aside: Python will have rationals
I think you are a bit confused.  That epsilon aware comparison thing
acknowledges that floats only approximate the behavior of mathematical
reals.  

Yes, I realized that floats aren't the same as reals.
When we do float arithmetic, we accept that "equal" often
really only means "approximately equal".  But when we do integer
arithmetic, we do not expect or accept equality as being approximate.
Integer equality means equal, not approximately equal.  That is why
int and float arithmetic cannot work the same way.

No, no, they don't work the same way, but they should appear to work
the same way, as reals in pure mathematics do. Again, I'm talking in
theory first: ints and floats should work the same way, but since
practical considerations make that impossible, they should at least
appear to work the same way (or they would have become completely
different things; remember duck typing?).
 

NickC

When I said multiplication and division are inverses, I was pointing
out that even though float's inexactness makes them imperfect
inverses, mult & div are still inverses of each other. In practice,
the inversing behavior is impossible unless we have a way to represent
real numbers (which we don't), and the *workaround* to make them work
is to do an epsilon comparison.

A mildly interesting Py3k experiment:

Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information..... wrong = 0
.... for x in range(1, max_val):
.... for y in range(1, max_val):
.... wrong += (x / num_type(y)) * y != x
.... return wrong
....0


The conclusions I came to based on running that experiment are:
- Decimal actually appears to suffer more rounding problems than float
for rational arithmetic
- Decimal appears to be significantly slower than Fraction for small
denominator rational arithmetic
- both Decimal and Fraction are significantly slower than builtin
floats

The increased number of inaccurate answers with Decimal (31% vs 10%)
is probably due to the fact that it is actually more precise than
float: for the builtin floats, the rounding error in the division step
may be cancelled out by a further rounding error in the multiplication
step. This can be seen happening in the case of ((1 / 3.0) * 3) == 1.0,
where the result of the multiplication ends up being exactly 1.0
despite the rounding error on division, because the exact product
rounds to 1.0 rather than to the next float below it,
0.99999999999999989.
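
That cancellation is easy to see at the prompt (a sketch; the Decimal
line just repeats the same computation at Decimal's default precision):

from decimal import Decimal

print((1 / 3.0) * 3 == 1.0)        # True: the float rounding errors cancel
print((1 / Decimal(3)) * 3 == 1)   # False: Decimal's division error survives
print(repr(0.99999999999999989))   # the float immediately below 1.0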

The speed difference between Decimal and Fraction is likely due to the
fact that Fraction can avoid actually doing any division most of the
time - it does addition and multiplication instead. The main reason
behind the overall speed advantage of builtin floats should hopefully
be obvious ;)

Regardless, the result of integer division is going to be a binary
floating point value in Py3k. For cases where that isn't adequate or
acceptable, the application should really be tightly controlling its
numeric types anyway and probably using a high performance math
library like numpy or gmpy instead of the standard numeric types (as
others have already noted in this thread).
 

Mark Dickinson

The increased number of inaccurate answers with Decimal (31% vs 10%)
is probably due to the fact that it is actually more precise than
float

I suspect it has more to do with the fact that 10 is bigger than 2,
though I'm not sure I could precisely articulate the reasons why
this matters. (A bigger base means a bigger 'wobble', the wobble
being the variation in the relationship between an error of 1ulp and
a relative error of base**-precision.)

Rerunning your example after a

getcontext().prec = 16

(i.e. with precision comparable to that of float) gives
310176
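
Spelled out, the rerun is something like this (check_accuracy repeated
from NickC's post so the snippet stands alone):

from decimal import Decimal, getcontext

def check_accuracy(num_type, max_val=1000):
    # NickC's test from earlier in the thread
    wrong = 0
    for x in range(1, max_val):
        for y in range(1, max_val):
            wrong += (x / num_type(y)) * y != x
    return wrong

getcontext().prec = 16            # comparable to the precision of a binary double
print(check_accuracy(Decimal))    # 310176, as quoted above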

Mark
 

Gabriel Genellina

A mildly interesting Py3k experiment:

Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.... wrong = 0
... for x in range(1, max_val):
... for y in range(1, max_val):
... wrong += (x / num_type(y)) * y != x
... return wrong
...0


The conclusions I came to based on running that experiment are:
- Decimal actually appears to suffer more rounding problems than float
for rational arithmetic

Mmm, but I doubt that counting how many times the results are equal is
the right way to evaluate "accuracy".
A stopped clock shows the right time twice a day; a clock that loses one
minute per day shows the right time once every two years. Clearly the
stopped clock is much better!
http://mybanyantree.wordpress.com/category/lewis-carrol/
 

Mark Dickinson

I suspect it has more to do with the fact that 10 is bigger than 2,
though I'm not sure I could precisely articulate the reasons why
this matters.

Indeed, a quick lunchtime back-of-the-envelope calculation suggests
that precision is largely irrelevant: it's the base that matters.
For randomly chosen(*) base B floats x and y, the probability that
(x/y)*y == x is approximately given by

1/2 + 1/log(B) - 1/log(B)**2 + 1/B/log(B)**2

For Decimal, this gives a probability of:
0.76454395459279922

while for randomly chosen floats it's the same thing with all 10s
replaced by 2s:
0.90201055038615952

(*) Here, randomly chosen means I'm assuming that log(x) and log(y)
are independent and uniformly distributed over some largish
interval. Maybe not the most obvious definition of random, but
not unreasonable. (And it makes the analysis easier!)

A quick check, for floats:

#####
from __future__ import division
from random import random
from math import exp

def random_float():
    return exp(random()*400.-200.)

def test():
    x = random_float()
    y = random_float()
    return (x/y)*y == x

print sum(test() for i in xrange(10**8))/10**8
#####

produces (eventually):

0.90199129

Mark
 

Paul Rubin

Mark Dickinson said:
For randomly chosen(*) base B floats x and y, the probability that
(x/y)*y == x is approximately given by

I remember hearing that IEEE 754 was carefully designed to make this
identity hold as often as possible when y is a small integer.
 

Piet van Oostrum

Gabriel Genellina said:
A mildly interesting Py3k experiment:

Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from fractions import Fraction
>>> from decimal import Decimal
>>> def check_accuracy(num_type, max_val=1000):
...     wrong = 0
...     for x in range(1, max_val):
...         for y in range(1, max_val):
...             wrong += (x / num_type(y)) * y != x
...     return wrong
...
>>> check_accuracy(float)
101502
>>> check_accuracy(Decimal)
310013
>>> check_accuracy(Fraction)
0


The conclusions I came to based on running that experiment are:
- Decimal actually appears to suffer more rounding problems than float
for rational arithmetic
GG> Mmm, but I doubt that counting how many times the results are equal, is
GG> the right way to evaluate "accuracy".
GG> A stopped clock shows the right time twice a day; a clock that loses one
GG> minute per day shows the right time once every two years. Clearly the
GG> stopped clock is much better!

But if the answer is incorrect (in the float calculation) the error is
limited. IEEE 754 prescribes that the error should be at most 1 LSB, IIRC.
And then the number of errors is the proper measure.
 

Mark Dickinson

But if the answer is incorrect (in the float calculation) the error is
limited. IEEE 754 prescribes that the error should be at most 1 LSB, IIRC.
And then the number of errors is the proper measure.

There are two operations here, both of which can introduce error of
up to 0.5 ulp (ulp = unit in the last place). Moreover, the second
operation (the multiplication) can magnify the error introduced by
the first (the division).

You're correct that for IEEE 754 binary floating-point arithmetic,
(x/y)*y and x will either be equal or differ by exactly 1ulp (except
perhaps in rarely-occurring corner cases). But for decimal numbers,
(x/y)*y and x could be as much as 5 ulps apart.

Mark

 
