Maths error

  • Thread starter Rory Campbell-Lange

Rory Campbell-Lange

>>> (1.0/10.0) + (2.0/10.0) + (3.0/10.0)
0.60000000000000009
>>> 6.0/10.0
0.59999999999999998

Is using the decimal module the best way around this? (I'm expecting the first
sum to match the second.) It seems anachronistic that decimal takes strings as
input, though.

Help much appreciated;
Rory
 

Bjoern Schliessmann

Rory said:
Is using the decimal module the best way around this? (I'm
expecting the first sum to match the second.) It seems
anachronistic that decimal takes strings as input, though.

What's your problem with the result, or what's your goal? Such
precision errors are normal with floating point numbers, because their
precision is finite.

For floats a and b, you'd seldom say "if a == b:" (because it's
often false, as in your case) but rather
"if abs(a - b) < threshold:" for a reasonable threshold value, which
depends on your application.
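
For instance, a minimal sketch of that test (the threshold here is only
illustrative; pick one that matches the scale of your data):

def nearly_equal(a, b, threshold=1e-9):
    # Absolute-difference test; for values of wildly varying magnitude
    # a relative test is usually more appropriate.
    return abs(a - b) < threshold

>>> nearly_equal((1.0/10.0) + (2.0/10.0) + (3.0/10.0), 6.0/10.0)
True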

Also check the recent thread "bizarre floating point output".

Regards,


Björn
 

Gabriel Genellina

Rory said:
Is using the decimal module the best way around this? (I'm
expecting the first sum to match the second.) It seems
anachronistic that decimal takes strings as input, though.
[...]
Also check the recent thread "bizarre floating point output".

And the last section of the Python Tutorial, "Floating Point
Arithmetic: Issues and Limitations".


--
Gabriel Genellina
Softlab SRL
 

Dan Bishop

Rory said:
>>> (1.0/10.0) + (2.0/10.0) + (3.0/10.0)
0.60000000000000009
>>> 6.0/10.0
0.59999999999999998

Is using the decimal module the best way around this? (I'm expecting the first
sum to match the second.)

Probably not. Decimal arithmetic is NOT a cure-all for floating-point
arithmetic errors:

>>> from decimal import Decimal
>>> Decimal(2).sqrt() ** 2
Decimal("1.999999999999999999999999999")

It seems anachronistic that decimal takes strings as
input, though.

How else would you distinguish Decimal('0.1') from
Decimal('0.1000000000000000055511151231257827021181583404541015625')?
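
For instance (Decimal.from_float is Python 2.7+; shown here just to make
the distinction visible):

>>> from decimal import Decimal
>>> Decimal.from_float(0.1)   # the binary float 0.1, converted exactly
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal('0.1')            # the string keeps exactly what you typed
Decimal('0.1')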
 

Nick Maclaren

|> Rory Campbell-Lange wrote:
|>
|> > Is using the decimal module the best way around this? (I'm
|> > expecting the first sum to match the second.) It seems
|> > anachronistic that decimal takes strings as input, though.

As Dan Bishop says, probably not. The introduction to the decimal
module makes exaggerated claims of accuracy, amounting to propaganda.
It is numerically no better than binary, and has some advantages
and some disadvantages.

|> Also check the recent thread "bizarre floating point output".

No, don't. That is about another matter entirely, and will merely
confuse you. I have a course on computer arithmetic, and am just
now writing one on Python numerics, and confused people may contact
me - though I don't guarantee to help.


Regards,
Nick Maclaren.
 

Carsten Haese

|> Rory Campbell-Lange wrote:
|>
|> > Is using the decimal module the best way around this? (I'm
|> > expecting the first sum to match the second.) It seems
|> > anachronistic that decimal takes strings as input, though.

Nick Maclaren wrote:
| As Dan Bishop says, probably not. The introduction to the decimal
| module makes exaggerated claims of accuracy, amounting to propaganda.
| It is numerically no better than binary, and has some advantages
| and some disadvantages.

Please elaborate. Which exaggerated claims are made, and how is decimal
no better than binary?

-Carsten
 

Tim Peters

[Carsten Haese]
Please elaborate. Which exaggerated claims are made,

Well, just about any technical statement can be misleading if not qualified
to such an extent that the only people who can still understand it knew it
to begin with <0.8 wink>. The most dubious statement here to my eyes is
the intro's "exactness carries over into arithmetic". It takes a world of
additional words to explain exactly what it is about the example given (0.1
+ 0.1 + 0.1 - 0.3 = 0 exactly in decimal fp, but not in binary fp) that
does, and does not, generalize. Roughly, it does generalize to one
important real-life use-case: adding and subtracting any number of decimal
quantities delivers the exact decimal result, /provided/ that precision is
set high enough that no rounding occurs.
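
For example, at the default 28-digit precision:

>>> from decimal import Decimal
>>> Decimal("0.1") + Decimal("0.1") + Decimal("0.1") - Decimal("0.3")
Decimal("0.0")
>>> 0.1 + 0.1 + 0.1 - 0.3
5.5511151231257827e-17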
and how is decimal no better than binary?

Basically, they both lose info when rounding does occur. For example,

>>> from decimal import Decimal
>>> Decimal(1) / Decimal(3) * Decimal(3)
Decimal("0.9999999999999999999999999999")

That is, (1/3)*3 != 1 in decimal. The reason why is obvious "by eyeball",
but only because you have a lifetime of experience working in base 10. A
bit ironically, the rounding in binary just happens to be such that (1/3)*3
does equal 1:

>>> (1.0 / 3.0) * 3.0
1.0

It's not just * and /. The real thing at work in the 0.1 + 0.1 + 0.1 - 0.3
example is representation error, not sloppy +/-: 0.1 and 0.3 can't be
/represented/ exactly as binary floats to begin with. Much the same can
happen if you instead use inputs exactly representable in base 2 but
not in base 10 (and while there are none such if precision is infinite,
precision isn't infinite):

>>> x = Decimal(1) / 2**90
>>> print x + x + x - 3*x
1E-54

The same in binary f.p. is exact, because 1./2**90 is exactly representable
in binary fp:

>>> x = 1.0 / 2**90
>>> print x + x + x - 3*x
0.0

If you boost decimal's precision high enough, then this specific example is
also exact using decimal; but with the default precision of 28, 1./2**90
can't be represented exactly in decimal to begin with; e.g.,

>>> Decimal(1) / 2**90 * 2**90
Decimal("0.9999999999999999999999999999")

All forms of fp are subject to representation and rounding errors. The
biggest practical difference here is that the `decimal` module is not
subject to representation error for "natural" decimal quantities, provided
precision is set high enough to retain all the input digits. That's worth
something to many apps, and is the whole ball of wax for some apps -- but
leaves a world of possible "surprises" nevertheless.
 

Nick Maclaren

|>
|> Well, just about any technical statement can be misleading if not qualified
|> to such an extent that the only people who can still understand it knew it
|> to begin with <0.8 wink>. The most dubious statement here to my eyes is
|> the intro's "exactness carries over into arithmetic". It takes a world of
|> additional words to explain exactly what it is about the example given (0.1
|> + 0.1 + 0.1 - 0.3 = 0 exactly in decimal fp, but not in binary fp) that
|> does, and does not, generalize. Roughly, it does generalize to one
|> important real-life use-case: adding and subtracting any number of decimal
|> quantities delivers the exact decimal result, /provided/ that precision is
|> set high enough that no rounding occurs.

Precisely. There is one other such statement, too: "Decimal numbers can
be represented exactly." What it MEANS is that numbers with a short
representation in decimal can be represented exactly in decimal, which
is tautologous, but many people READ it to say that numbers that they
are interested in can be represented exactly in decimal. Such as pi,
sqrt(2), 1/3 and so on ....
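
For instance, with the default 28-digit context:

>>> from decimal import Decimal
>>> Decimal(1) / Decimal(3)   # 1/3 has no exact decimal representation
Decimal("0.3333333333333333333333333333")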

|> > and how is decimal no better than binary?
|>
|> Basically, they both lose info when rounding does occur. For example,

Yes, but there are two ways in which binary is superior. Let's skip
the superior 'smoothness', as being too arcane an issue for this group,
and deal with the other. In binary, calculating the mid-point of two
numbers (a very common operation) is guaranteed to be within the range
defined by those numbers, or to over/under-flow.

Neither (x+y)/2.0 nor (x/2.0+y/2.0) are necessarily within the range
(x,y) in decimal, even for the most respectable values of x and y.
This was a MAJOR "gotcha" in the days before binary became standard,
and will clearly return with decimal.



Regards,
Nick Maclaren.
 

Terry Reedy

| On Tue, 2007-01-09 at 11:38 +0000, Nick Maclaren wrote:
| > As Dan Bishop says, probably not. The introduction to the decimal
| > module makes exaggerated claims of accuracy, amounting to propaganda.
| > It is numerically no better than binary, and has some advantages
| > and some disadvantages.
|
| Please elaborate. Which exaggerated claims are made, and how is decimal
| no better than binary?

As to the latter question: calculating with decimals instead of binaries
eliminates conversion errors introduced when one has *exact* decimal
inputs, such as in financial calculations (which were the motivating use
case for the decimal module). But it does not eliminate errors inherent in
approximating reals with (a limited set of) rationals. Nor does it
eliminate errors inherent in approximation algorithms (such as using a
finite number of terms of an infinite series).
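
A small sketch of that distinction (values are illustrative):

>>> from decimal import Decimal
>>> Decimal("0.1") + Decimal("0.2")   # exact decimal inputs: no conversion error
Decimal("0.3")
>>> 0.1 + 0.2                         # binary floats convert first, then round
0.30000000000000004
>>> Decimal(1)/3 + Decimal(1)/3 + Decimal(1)/3   # rationals still get approximated
Decimal("0.9999999999999999999999999999")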

Terry Jan Reedy
 

Simon Brunning

Well, just about any technical statement can be misleading if not qualified
to such an extent that the only people who can still understand it knew it
to begin with <0.8 wink>.

+1 QTOW
 

Robert Kern

Bjoern said:
>> No, don't. That is about another matter entirely,
> It isn't.

Actually it really is. That thread is about the difference between
str(some_float) and repr(some_float) and why str(some_tuple) uses the repr() of
its elements.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Nick Maclaren

|> >
|> >> No, don't. That is about another matter entirely,
|> >
|> > It isn't.
|>
|> Actually it really is. That thread is about the difference between
|> str(some_float) and repr(some_float) and why str(some_tuple) uses the repr() of
|> its elements.

Precisely. And it also applies to strings, which I had failed to
notice:

>>> print "1", "2"
1 2
>>> print ("1", "2")
('1', '2')


Regards,
Nick Maclaren.
 

Tim Peters

[Tim Peters]
....
|> Well, just about any technical statement can be misleading if not
|> qualified to such an extent that the only people who can still
|> understand it knew it to begin with <0.8 wink>. The most dubious
|> statement here to my eyes is the intro's "exactness carries over
|> into arithmetic". It takes a world of additional words to explain
|> exactly what it is about the example given (0.1 + 0.1 + 0.1 - 0.3 =
|> 0 exactly in decimal fp, but not in binary fp) that does, and does
|> not, generalize. Roughly, it does generalize to one important
|> real-life use-case: adding and subtracting any number of decimal
|> quantities delivers the exact decimal result, /provided/ that
|> precision is set high enough that no rounding occurs.

[Nick Maclaren]
Precisely. There is one other such statement, too: "Decimal numbers
can be represented exactly." What it MEANS is that numbers with a
short representation in decimal can be represented exactly in decimal,
which is tautologous, but many people READ it to say that numbers that
they are interested in can be represented exactly in decimal. Such as
pi, sqrt(2), 1/3 and so on ....

Huh. I don't read it that way. If it said "numbers can be ..." I
might, but reading that way seems to require effort to overlook the
"decimal" in "decimal numbers can be ...".

[attribution lost]
|>> and how is decimal no better than binary?
|> Basically, they both lose info when rounding does occur. For
|> example,
Yes, but there are two ways in which binary is superior. Let's skip
the superior 'smoothness', as being too arcane an issue for this
group,

With 28 decimal digits used by default, few apps would care about this
anyway.

[Nick]
and deal with the other. In binary, calculating the mid-point
of two numbers (a very common operation) is guaranteed to be within
the range defined by those numbers, or to over/under-flow.

Neither (x+y)/2.0 nor (x/2.0+y/2.0) are necessarily within the range
(x,y) in decimal, even for the most respectable values of x and y.
This was a MAJOR "gotcha" in the days before binary became standard,
and will clearly return with decimal.

I view this as being an instance of "lose info when rounding does
occur". For example,

>>> import decimal
>>> x = y = decimal.Decimal("0.9999999999999999999999999999")
>>> (x + y) / 2
Decimal("1.000000000000000000000000000")

"The problems" there are due to rounding error:

>>> x / 2
Decimal("0.5000000000000000000000000000")
>>> x + y
Decimal("2.000000000000000000000000000")
It's always something ;-)
 

Nick Maclaren

|>
|> Huh. I don't read it that way. If it said "numbers can be ..." I
|> might, but reading that way seems to require effort to overlook the
|> "decimal" in "decimal numbers can be ...".

I wouldn't expect YOU to read it that way, but I can assure you from
experience that many people do. What it MEANS is "Numbers with a
short representation in decimal can be represented exactly in decimal
arithmetic", which is tautologous. What they READ it to mean is
"One advantage of representing numbers in decimal is that they can be
represented exactly", and they then assume that also applies to pi,
sqrt(2), 1/3 ....

The point is that the "decimal" could apply equally well to the external
or internal representation and, if you aren't fairly clued-up in this
area, it is easy to choose the wrong one.

|> >|>> and how is decimal no better than binary?
|>
|> >|> Basically, they both lose info when rounding does occur. For
|> >|> example,
|>
|> > Yes, but there are two ways in which binary is superior. Let's skip
|> > the superior 'smoothness', as being too arcane an issue for this
|> > group,
|>
|> With 28 decimal digits used by default, few apps would care about this
|> anyway.

Were you in the computer arithmetic area during the "base wars" of the
1960s and 1970s that culminated with binary winning out? A lot of very
well-respected numerical analysts said that larger bases led to a
faster build-up of error (independent of the precision). My limited
investigations indicated that there was SOME truth in that, but it
wasn't a major matter; I never saw the matter settled definitively.

|> > and deal with the other. In binary, calculating the mid-point
|> > of two numbers (a very common operation) is guaranteed to be within
|> > the range defined by those numbers, or to over/under-flow.
|> >
|> > Neither (x+y)/2.0 nor (x/2.0+y/2.0) are necessarily within the range
|> > (x,y) in decimal, even for the most respectable values of x and y.
|> > This was a MAJOR "gotcha" in the days before binary became standard,
|> > and will clearly return with decimal.
|>
|> I view this as being an instance of "lose info when rounding does
|> occur". For example,

No, absolutely NOT! This is an orthogonal matter, and is about the
loss of an important invariant when using any base above 2.

Back in the days when there were multiple bases, virtually every
programmer who wrote large numerical code got caught by it at least
once, and many got caught several times (it has multiple guises).
For example, take the following algorithm for binary chop:

while 1:
    c = (a + b) / 2
    if f(c) < y:
        if c == b:
            break
        b = c
    else:
        if c == a:
            break
        a = c

That works in binary, but in no base above 2 (assuming that I haven't
made a stupid error writing it down). In THAT case, it is easy to fix
for decimal, but there are ways that it can show up that can be quite
tricky to fix.


Regards,
Nick Maclaren.
 

Tim Peters

[Tim Peters]
....
[Nick Maclaren]
I wouldn't expect YOU to read it that way,

Of course I meant "putting myself in others' shoes, I don't ...".
but I can assure you from experience that many people do.

Sure. Possibly even most. Short of writing a long & gentle tutorial,
can that be improved? Alas, most people wouldn't read that either <0.5
wink>.
What it MEANS is "Numbers with a short representation in decimal

"short" is a red herring here: Python's Decimal constructor ignores the
precision setting, retaining all the digits you give. For example, if
you pass a string with a million decimal digits, you'll end up with a
very fat Decimal instance -- no info is lost.
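
A sketch of that (40 digits instead of a million, for brevity):

>>> import decimal
>>> d = decimal.Decimal("0." + "7" * 40)
>>> len(d.as_tuple()[1])   # all 40 digits retained by the constructor
40
>>> +d                     # unary plus applies the context: rounds to 28
Decimal("0.7777777777777777777777777778")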
can be represented exactly in decimal arithmetic", which is
tautologous. What they READ it to mean is "One advantage of
representing numbers in decimal is that they can be represented
exactly", and they then assume that also applies to pi, sqrt(2),
1/3 ....

The point is that the "decimal" could apply equally well to the
external or internal representation and, if you aren't fairly
clued-up in this area, it is easy to choose the wrong one.

Worse, I expect most people have no real idea of that there's a possible
difference between internal and external representations. This is often
given as a selling point for decimal arithmetic: it's WYSIWYG in ways
binary fp can't be (short of inventing power-of-2 fp representations for
I/O, which few people would use).

[Tim]
With 28 decimal digits used by default, few apps would care about
this anyway.

[Nick]
Were you in the computer arithmetic area during the "base wars" of the
1960s and 1970s that culminated with binary winning out?

Yes, although I came in on the tail end of that and never actually used
a non-binary machine.
A lot of very well-respected numerical analysts said that larger bases
led to a faster build-up of error (independent of the precision). My
limited investigations indicated that there was SOME truth in that,
but it wasn't a major matter; I never saw the matter settled
definitively.

My point was that 28 decimal digits of precision is far greater than
supplied even by 64-bit binary floats today (let alone the smaller sizes
in most-common use back in the 60s and 70s). "Pollution" of low-order
bits is far less of a real concern when there are some number of low-
order bits you don't care about at all.
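
(For scale: 28 decimal digits correspond to about 93 bits, against the
53-bit significand of a 64-bit IEEE double:)

>>> import math
>>> round(math.log(10**28, 2))   # bits of precision in 28 decimal digits
93.0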
No, absolutely NOT!

Of course it is. If there were no rounding errors, the computed result
would be exactly right -- that's darned near tautological too. You
snipped the examples I gave showing exactly where and how rounding error
created the problems in (x+y)/2 and x/2+y/2 for some specific values of
x and y using decimal arithmetic. If you don't like those examples,
supply your own, and if you get a similarly surprising result you'll
find rounding error(s) occur(s) in yours too.

It so happens that rounding errors in binary fp can't lead to the same
counterintuitive /outcome/, essentially because x+x == y+y implies x ==
y in base 2 fp, which is indeed a bit of magic specific to base 2. The
fact that there /do/ exist fp x and y such that x != y yet x+x == y+y in
bases > 2 is entirely due to fp rounding error losing info.
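
A concrete instance at the default 28-digit precision:

>>> from decimal import Decimal
>>> x = Decimal("0.9999999999999999999999999999")
>>> y = Decimal("1.000000000000000000000000000")
>>> x == y, x + x == y + y
(False, True)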
This is an orthogonal matter,
Disagree.

and is about the loss of an important invariant when using any base
above 2.

It is that.
Back in the days when there were multiple bases, virtually every
programmer who wrote large numerical code got caught by it at least
once, and many got caught several times (it has multiple guises).
For example, take the following algorithm for binary chop:

while 1:
    c = (a + b) / 2
    if f(c) < y:
        if c == b:
            break
        b = c
    else:
        if c == a:
            break
        a = c

That works in binary, but in no base above 2 (assuming that I haven't
made a stupid error writing it down). In THAT case, it is easy to fix
for decimal, but there are ways that it can show up that can be quite
tricky to fix.

If you know a < b, doing

c = a + (b-a)/2

instead of

c = (a+b)/2

at least guarantees (ignoring possible overflow) a <= c <= b. As shown
last time, it's not even always the case that (x+x)/2 == x in decimal fp
(or in any fp base > 2, for that matter).
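
Continuing the same decimal example, a quick check of both forms:

>>> from decimal import Decimal
>>> a = b = Decimal("0.9999999999999999999999999999")
>>> (a + b) / 2 <= b       # the naive midpoint escapes [a, b]
False
>>> a + (b - a) / 2 <= b   # the rewritten form stays inside
True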
 

Nick Maclaren

|>
|> Sure. Possibly even most. Short of writing a long & gentle tutorial,
|> can that be improved? Alas, most people wouldn't read that either <0.5
|> wink>.

Yes. Improved wording would be only slightly longer, and it is never
appropriate to omit all negative aspects. The truth, the whole truth
and nothing but the truth :)

|> Worse, I expect most people have no real idea of that there's a possible
|> difference between internal and external representations. This is often
|> given as a selling point for decimal arithmetic: it's WYSIWYG in ways
|> binary fp can't be (short of inventing power-of-2 fp representations for
|> I/O, which few people would use).

Right. Another case when none of the problems show up on dinky little
examples but do in real code :-(

|> > A lot of very well-respected numerical analysts said that larger bases
|> > led to a faster build-up of error (independent of the precision). My
|> > limited investigations indicated that there was SOME truth in that,
|> > but it wasn't a major matter; I never saw the matter settled
|> > definitively.
|>
|> My point was that 28 decimal digits of precision is far greater than
|> supplied even by 64-bit binary floats today (let alone the smaller sizes
|> in most-common use back in the 60s and 70s). "Pollution" of low-order
|> bits is far less of a real concern when there are some number of low-
|> order bits you don't care about at all.

Yes, but that wasn't their point. It was that in (say) iterative
algorithms, the error builds up by a factor of the base at every step.
If it wasn't for the fact that errors build up, almost all programs
could ignore numerical analysis and still get reliable answers!

Actually, my (limited) investigations indicated that such an error
build-up was extremely rare - I could achieve it only in VERY artificial
programs. But I did find that the errors built up faster for higher
bases, so that a reasonable rule of thumb is that 28 digits with a decimal
base was comparable to (say) 80 bits with a binary base.

And, IN GENERAL, programs won't be using 128-bit IEEE representations.
Given Python's overheads, there is no reason not to use them, unless the
hardware is catastrophically slower (which is plausible).

|> If you know a < b, doing
|>
|> c = a + (b-a)/2
|>
|> instead of
|>
|> c = (a+b)/2
|>
|> at least guarantees (ignoring possible overflow) a <= c <= b. As shown
|> last time, it's not even always the case that (x+x)/2 == x in decimal fp
|> (or in any fp base > 2, for that matter).

Yes. Back in the days before binary floating-point started to dominate,
we taught that as a matter of routine, but it has not been taught to
all users of floating-point for a couple of decades. Indeed, a lot of
modern programmers regard having to distort simple expressions in that
way as anathema.

It isn't a major issue, because our experience from then is that it is
both teachable and practical, but it IS a way in which any base above
2 is significantly worse than base 2.


Regards,
Nick Maclaren.
 

Hendrik van Rooyen

Nick Maclaren said:
Yes, but that wasn't their point. It was that in (say) iterative
algorithms, the error builds up by a factor of the base at every step.
If it wasn't for the fact that errors build up, almost all programs
could ignore numerical analysis and still get reliable answers!

Actually, my (limited) investigations indicated that such an error
build-up was extremely rare - I could achieve it only in VERY artificial
programs. But I did find that the errors built up faster for higher
bases, so that a reasonable rule of thumb is that 28 digits with a decimal
base was comparable to (say) 80 bits with a binary base.

I would have thought that this sort of thing was a natural consequence
of rounding errors - if I round (or worse truncate) a binary, I can be off
by at most one, with an expectation of a half of a least significant digit,
while if I use hex digits, my expectation is around eight, and for decimal
around five...

So it would seem natural that errors would propagate
faster on big base systems, AOTBE, but this may be
a naive view..

- Hendrik
 

Nick Maclaren

|>
|> I would have thought that this sort of thing was a natural consequence
|> of rounding errors - if I round (or worse truncate) a binary, I can be off
|> by at most one, with an expectation of a half of a least significant digit,
|> while if I use hex digits, my expectation is around eight, and for decimal
|> around five...
|>
|> So it would seem natural that errors would propagate
|> faster on big base systems, AOTBE, but this may be
|> a naive view..

Yes, indeed, and that is precisely why the "we must use binary" camp won
out. The problem was that computers of the early 1970s were not quite
powerful enough to run real applications with simulated floating-point
arithmetic. I am one of the half-dozen people who did ANY actual tests
on real numerical code, but there may have been some work since!

Nowadays, it would be easy, and it would make quite a good PhD. The
points to look at would be the base and the rounding rules (including
IEEE rounding versus probabilistic versus last bit forced[*]). We know
that the use or not of denormalised numbers and the exact details of
true rounding make essentially no difference.

In a world ruled by reason rather than spin, this investigation
would have been done before claiming that decimal floating-point is an
adequate replacement for binary for numerical work, but we don't live
in such a world. No matter. Almost everyone in the area agrees that
decimal floating-point isn't MUCH worse than binary, from a numerical
point of view :)


[*] Assuming signed magnitude, calculate the answer truncated towards
zero but keep track of whether it is exact. If not, force the last
bit to 1. An old, cheap approximation to rounding.
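
A toy sketch of that rule on integer significands (the function name and
framing are mine, not from any library; int.bit_length needs Python 2.7+):

def last_bit_forced(mantissa, bits):
    # Keep `bits` significant bits of a positive integer mantissa:
    # truncate toward zero and, if any nonzero bits were discarded,
    # force the last kept bit to 1.
    excess = mantissa.bit_length() - bits
    if excess <= 0:
        return mantissa              # already fits: result is exact
    kept = mantissa >> excess
    if mantissa & ((1 << excess) - 1):
        kept |= 1                    # inexact: force the last bit on
    return kept

>>> bin(last_bit_forced(0b1010001, 4))   # inexact: last bit forced on
'0b1011'
>>> bin(last_bit_forced(0b1010000, 4))   # exact: unchanged
'0b1010'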


Regards,
Nick Maclaren.
 

Hendrik van Rooyen

Nick Maclaren said:
|>
|> I would have thought that this sort of thing was a natural consequence
|> of rounding errors - if I round (or worse truncate) a binary, I can be off
|> by at most one, with an expectation of a half of a least significant digit,
|> while if I use hex digits, my expectation is around eight, and for decimal
|> around five...
|>
|> So it would seem natural that errors would propagate
|> faster on big base systems, AOTBE, but this may be
|> a naive view..

Yes, indeed, and that is precisely why the "we must use binary" camp won
out. The problem was that computers of the early 1970s were not quite
powerful enough to run real applications with simulated floating-point
arithmetic. I am one of the half-dozen people who did ANY actual tests
on real numerical code, but there may have been some work since!

*grin* - I was around at that time, and some of the inappropriate habits
almost forced by the lack of processing power still linger in my mind,
like - "Don't use division if you can possibly avoid it - it's EXPENSIVE!"
- it seems so silly nowadays.
Nowadays, it would be easy, and it would make quite a good PhD. The
points to look at would be the base and the rounding rules (including
IEEE rounding versus probabilistic versus last bit forced[*]). We know
that the use or not of denormalised numbers and the exact details of
true rounding make essentially no difference.

In a world ruled by reason rather than spin, this investigation
would have been done before claiming that decimal floating-point is an
adequate replacement for binary for numerical work, but we don't live
in such a world. No matter. Almost everyone in the area agrees that
decimal floating-point isn't MUCH worse than binary, from a numerical
point of view :)

As an old slide rule user - I can agree with this - if you know the order
of the answer, and maybe two points after the decimal, it will tell you
if the bridge will fall down or not. Having an additional fifty decimal
places of accuracy does not really add any real information in these
cases. It's nice of course if it's free, like it has almost become - but
I think people get mesmerized by the numbers, without giving any
thought to what they mean - which is probably why we often see
threads complaining about the "error" in the fifteenth decimal place..
[*] Assuming signed magnitude, calculate the answer truncated towards
zero but keep track of whether it is exact. If not, force the last
bit to 1. An old, cheap approximation to rounding.
This is not so cheap - it's good solid reasoning in my book -
after all, "something" is a lot more than "nothing" and should
not be thrown away...

- Hendrik
 
