# Bug in Decimal??

G

#### Guest

Hello,
I believe I found a bug in the Decimal library. The natural logarithm results seem to be off in a certain range, as compared with Wolfram Alpha.

Here's an example:

from decimal import *
getcontext().prec=2016
one=Decimal(1)
number=Decimal('1e-1007')

partial=(one+number)/(one-number)

final.ln()

The result should be 2.00000... with all zeroes and 7 at the end. Instead, I'm getting 1.999999.. with 2 digits different at the end.

I checked that this happens for any 1e-N with N relatively large (above about 300 it starts showing bad digits at the end.
Set the precision even higher and you'll see more digits off, it seems for number=10^-N, the error begins around 10^(-2N). The partial result (1+x)/(1-x) is correct to the number of digits specified, so it seems ln() is having accuracy problems.

Can other people confirm this or is it just my imagination? I tested with python 2.7.5 in XUbuntu, and also with my own build of mpdecimal 2.4.0 (latest, built from source). I compared the results with wolfram Alpha, and alsowith an open source arbitrary precision calculator, which matches Alpha results.

S

#### Steven D'Aprano

from decimal import *
getcontext().prec=2016
one=Decimal(1)
number=Decimal('1e-1007')
partial=(one+number)/(one-number)
final.ln()

What's final? Did you mean partial?

When I try it in Python 3.3, I get:

py> from decimal import *
py> getcontext().prec=2016
py> one = Decimal(1)
py> number = Decimal('1e-1007')
py> partial = (one+number)/(one-number)
py> partial.ln()
Decimal('1.9999[...]9999987E-1007')

with a lot of 9s deleted for brevity.
The result should be 2.00000... with all zeroes and 7 at the end.

Can you demonstrate exactly how you tested it on Wolfram-Alpha, because I
cannot reproduce that. When I try it with ln((1+1e-1007)/(1-1e-1007)),
the decimal expansion shown starts off as:

2.00000000000000000000000000000000000000000000000000000... Ã— 10^-1007

By repeatedly clicking "More digits", I get:

2.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...
Ã— 10^-1007

For brevity, I will truncate some repeated digits from this point on.

1.99999999999999999[...]999999999999999999999999999999999999999999999...
Ã— 10^-1007

as above, except with even more nines (477 of them if I have calculated
correctly)

2.00000000000000000[...]0000000000000000000000000000000000000000000000...
Ã— 10^-1007

as above, with even more zeroes

and finally the last available approximation is

2.000000[...]000000666666[...]666666... Ã— 10^-1007

with a lot of zeroes and sixes not shown. But not ending in 7.

Instead, I'm getting 1.999999.. with 2 digits different at the end.

Well, it's hard to say exactly what's going on, since Wolfram Alpha
doesn't show the context, but it looks like Python's version may agree
with two out of the seven decimal approximations Alpha makes available.
This suggests to me that it's a representation issue.

[...]
Can other people confirm this or is it just my imagination? I tested
with python 2.7.5 in XUbuntu, and also with my own build of mpdecimal
2.4.0 (latest, built from source). I compared the results with wolfram
Alpha, and also with an open source arbitrary precision calculator,
which matches Alpha results.

I think it's worth raising a ticket for it on the bug tracker.

I think it would help if you can provide a link or instructions to how to
replicate this in Alpha, also give the name and version number of the
arbitrary precision calculator you used. Anything that can be used to

G

#### Gregory Ewing

I compared the results with wolfram Alpha, and
also with an open source arbitrary precision calculator, which matches Alpha
results.

Decimal is *not* an arbitrary precision data type, so you
can't expect exact results from it. You can set the precision
to be very large, but it's still essentially a floating point
type, and as such you can expect the last few digits to
depend heavily on the details of the algorithm used.

G

#### Guest

What's final? Did you mean partial?

Final is my final result: ln((1+x)/(1-x))
I called the quotient partial because I wanted to investigate if perhaps the result of the division (that's my partial result) had a rounding error, which would throw off the ln().

When I try it in Python 3.3, I get:

py> from decimal import *

py> getcontext().prec=2016

py> one = Decimal(1)

py> number = Decimal('1e-1007')

py> partial = (one+number)/(one-number)

py> partial.ln()

Decimal('1.9999[...]9999987E-1007')

with a lot of 9s deleted for brevity.
The result should be 2.00000... with all zeroes and 7 at the end.

Can you demonstrate exactly how you tested it on Wolfram-Alpha, because I

cannot reproduce that. When I try it with ln((1+1e-1007)/(1-1e-1007)),

That's exactly how you test it.
The result should be 2.0000... 7 (at 2016 digits precision, at higher precisions it will continue adding another 2014 sixes until the digits become something else.

This is from the definition of the yperbolic arc tangent:

atanh(x) = 1/2 * ln ( (1+x)/(1-x) )

As it turns out to be the ln() part of 1e-N is the integer 2, followed by 2N zeroes, then followed by 2N-1 sixes, then a 0, a seven, then another group of around 2N sixes (I didn't actually count them), then other interestingdigit patterns.
Try with 1e-1007 using 4000 digits or more and you'll see python giving a better result... but there's a 4 in the middle of the sixes! That 4 shouldn't be there.
getcontext().prec=4000, then do the rest the same (you need to recalculate the partial result at the higher precision too).
the decimal expansion shown starts off as:

2.00000000000000000000000000000000000000000000000000000... × 10^-1007

By repeatedly clicking "More digits", I get:

2.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...

× 10^-1007

For brevity, I will truncate some repeated digits from this point on.

1.99999999999999999[...]999999999999999999999999999999999999999999999...

× 10^-1007

as above, except with even more nines (477 of them if I have calculated

correctly)

You're correct! It seems Alpha has its own problems too. Depending on the precision, it's giving the correct or incorrect result, changing as you hit 'more digits'.
I can imagine this could happen because the (1+x)/(1-x) ratio becomes 1. followed by 2N zeroes, then a 2, then 2N zeroes, then another 2, etc.
If the precision is such that your last digit just includes the next 2 or not, it can throw the result off a little bit. But this should self-correct as the precision goes higher, as it does in Wolfram Alpha.
But python is giving bad data even with very high precision, as I mentionedabove with 4000 digits, there's a bad digit in the middle of the sixes, atmuch less precision than the required by the system (hundreds of digits less).

2.00000000000000000[...]0000000000000000000000000000000000000000000000...

× 10^-1007

as above, with even more zeroes

and finally the last available approximation is

2.000000[...]000000666666[...]666666... × 10^-1007

with a lot of zeroes and sixes not shown. But not ending in 7.

Ending in 7 is because I had exactly 2016 digits precision, and 2016 is 2*1007+1, so exactly the last digit was the first 6 of the 6666666... which becomes 7 because it was correctly rounded.

Well, it's hard to say exactly what's going on, since Wolfram Alpha

doesn't show the context, but it looks like Python's version may agree

with two out of the seven decimal approximations Alpha makes available.

This suggests to me that it's a representation issue.

Definitely not a representation issue. I'm working on a high precision calculator, and was using mpdecimal to generate tables for the CORDIC method, (the tables have atanh(1e-N) for N=1..Precision) and that's how I stumbledupon this issue, the last 2/3 of my tables were consistently showing the wrong results. And I'm not printing the numbers, I'm accessing the digits directly from the data structures, so what you see is exactly what gets calculated, this is not a problem with printing.
It might be a printing issue in Alpha, since they probably use binary numbers (GMP and the like), as opposed to decimal like python.
[...]
Can other people confirm this or is it just my imagination? I tested
with python 2.7.5 in XUbuntu, and also with my own build of mpdecimal
2.4.0 (latest, built from source). I compared the results with wolfram
Alpha, and also with an open source arbitrary precision calculator,
which matches Alpha results.

I think it's worth raising a ticket for it on the bug tracker.

I think it would help if you can provide a link or instructions to how to

replicate this in Alpha,

ln((1+1e-1007)/(1-1e-1007))

also give the name and version number of the

arbitrary precision calculator you used. Anything that can be used to

I used this to double check:
http://preccalc.sourceforge.net

I think you replicated my results perfectly and confirmed something is going on. I'm now especially baffled at the number 4 that appeared in the middle of the sixes, using 4000 for the precision.

I just checked this with python 2.7.6 running on PC-BSD 10, and it's confirmed. So it's not a problem with the compiler used for the build either (PC-BSD/FreeBSD used Clang 3.3, XUbuntu used GCC).

G

#### Guest

What's final? Did you mean partial?

I see what you mean now. Sorry, I meant

final=partial.ln()

M

#### Mark Dickinson

Hello, I believe I found a bug in the Decimal library. The natural logarithm
results seem to be off in a certain range, as compared with Wolfram Alpha.

I had a quick look: this isn't a bug - it's just the result of propagation of
the error in "partial" to "final".

In more detail: we've got a working precision of 2016 significant figures. For
any small x, we have (1 + x) / (1 - x) = 1 + 2x + 2x^2 + 2x^3 + .... For your
value of x, `Decimal('1e-1007'), we've got enough precision to store
1 + 2x + 2x^2 exactly, but that's all.

So partial has an absolute error of around 2x^3, or 2e-3021.

And since the derivative of the natural log function is almost exactly 1 at the
point we're interested in, we expect the absolute error in the output to be
close to 2e-3021, too.

And that's *precisely* what you're seeing: the Decimal module is giving you a
result that's exactly `Decimal('2e-1007') - Decimal('1.3e-3021')`, while the
result you were expecting is `Decimal('2e-1007') + Decimal('0.7e-3021')`. A
difference of exactly `Decimal('2e-3021')`, as expected.

G

#### Guest

I had a quick look: this isn't a bug - it's just the result of propagation of

the error in "partial" to "final".

In more detail: we've got a working precision of 2016 significant figures.. For

any small x, we have (1 + x) / (1 - x) = 1 + 2x + 2x^2 + 2x^3 + .... For your

value of x, `Decimal('1e-1007'), we've got enough precision to store

1 + 2x + 2x^2 exactly, but that's all.

So partial has an absolute error of around 2x^3, or 2e-3021.

And since the derivative of the natural log function is almost exactly 1 at the

point we're interested in, we expect the absolute error in the output to be

close to 2e-3021, too.

Please take a deeper look at my second post. Try the same but this time setthe precision to 4000, 8000 or whatever you need to convince yourself there's no error propagation, yet there's a 4 in the middle that shouldn't be there. See for yourself!

I've tested on all platforms I know of and confirmed it. The wrong digit occurs in the middle of the number. Propagation error would have a bad digit near the end, and garbage after that. Here there's a perfect sequence of numbers, but with one single digit changed in the middle of the number.
No error propagation in a series expansion can do that.

I generated a table of 1000 numbers, one correctly generated and one with mpdecimal. Then I did a diff of both files and it's horrible. The differenceis all over the place, sometimes as high as digit 500 (out of 2000). Almost every result has the bad digit somewhere. The bad digit moves around a lot, from about position 500 to 2000. All other digits are correct, and in a 2000 digit sequence is hard to spot the difference (I used the visual diff tool that comes with TortoiseSVN or TortoiseGit). I think there's a bad pointer or something that's rounding the wrong digit.
You cannot possibly have 1999 correct digits and only 1 changed on every number if it was propagation error.

I'll follow up directly with the author of mpdecimal, as this is somewhat serious on a language that's so widely used as python.

But please test it and confirm, am I seeing ghost digits?

M

#### Mark H Harris

Please take a deeper look at my second post. Try the same but this time
set the precision to 4000, 8000 or whatever you need to convince yourself
there's no error propagation, yet there's a 4 in the middle that shouldn't
be there. See for yourself!
I've tested on all platforms I know of and confirmed it. The wrong digit
occurs in the middle of the number. Propagation error would have a bad digit
near the end, and garbage after that. Here there's a perfect sequence of
numbers, but with one single digit changed in the middle of the number.
No error propagation in a series expansion can do that.

I generated a table of 1000 numbers, one correctly generated and one with
mpdecimal. Then I did a diff of both files and it's horrible. The difference
is all over the place, sometimes as high as digit 500 (out of 2000). Almost
every result has the bad digit somewhere. The bad digit moves around a lot,
from about position 500 to 2000. All other digits are correct, and in a 2000
digit sequence is hard to spot the difference (I used the visual diff tool
that comes with TortoiseSVN or TortoiseGit). I think there's a bad pointer
or something that's rounding the wrong digit.
You cannot possibly have 1999 correct digits and only 1 changed on every
number if it was propagation error.
I'll follow up directly with the author of mpdecimal, as this is somewhat
serious on a language that's so widely used as python.

But please test it and confirm, am I seeing ghost digits?

I don't believe it. I've been running converging series to hudreds of
thousands of decimal places (not only on different platforms, but on
different BigFloat systmes, Jullia, BC, Decimal with pdeclib) and I'm
not seeing the 'middle of the number' error you are talking about. I ran
Ï€ to over 100,000+ digits and compared between BC and Python Decimal
with pilib pdeclib and both compared correct; also compared correct to
the on-line standard. Soooo... not sure ...

I can work on this some this weekend (comparing between Julia BigFloat
and Python Decimal with pdeclib and see. I'm very skeptical at this point.

marcus

M

#### Mark Dickinson

I've tested on all platforms I know of and confirmed it. The wrong digit
occurs in the middle of the number. Propagation error would have a bad digit
near the end, and garbage after that. Here there's a perfect sequence of
numbers, but with one single digit changed in the middle of the number. No
error propagation in a series expansion can do that.

I can see how it might be surprising if you don't think about it too hard, but
I'm afraid that you're wrong here: error propagation is *exactly* what's
causing the effects you're seeing.

Here's another way of looking at it: if you truncate the Taylor series about 0
for (1 + x) / (1 - x) to k (>= 1) terms, you get the polynomial (1 + x - 2x^k)
/ (1 - x). For example, taking k to be 3, we're getting (1 + x - 2x^3) / (1 -
x). Given that the particular value of x you're testing with has the form
10**<negative>, rounding your intermediate result to the working precision has
exactly the effect of truncating the series at some k.

Now you can compute and compare (by hand, via Wolfram alpha, or however you
like) the Taylor series expansions for log((1 + x) / (1 - x)) and log((1 + x -
2x^3) / (1 - x)). For the first you'll see:

2x + 2/3 x^3 + 2/5 x^5 - 2/7 x^7 + 2/9 x^9 - ...

and for the second you'll get:

2x - 4/3 x^3 + 2 x^4 - 8/5 x^5 + 16/7 x^7 - ...

The difference between the two series is:

-2x^3 + 2x^4 - 2x^5 + 2x^7 - 4x^8 + ...

So again with x a small power of 10, you're going to see a single-digit error
from the -2x^3 term, and another single-digit error further along from
the 2x^3 term, and so on.

Here's a simpler example of the same phenomenon. Note how the error propagation
leads to a single incorrect digit in the *middle* of the digit string.

Python 3.4.0 (default, Mar 25 2014, 11:07:05)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin