Bug in Decimal??

Discussion in 'Python' started by pleasedontspam@isp.com, Apr 30, 2014.

  1. Guest

    Hello,
    I believe I found a bug in the Decimal library. The natural logarithm results seem to be off in a certain range, as compared with Wolfram Alpha.

    Here's an example:

    from decimal import *
    getcontext().prec=2016
    one=Decimal(1)
    number=Decimal('1e-1007')

    partial=(one+number)/(one-number)

    final.ln()


    The result should be 2.00000... with all zeroes and 7 at the end. Instead, I'm getting 1.999999.. with 2 digits different at the end.

    I checked that this happens for any 1e-N with N relatively large (above about 300 it starts showing bad digits at the end.
    Set the precision even higher and you'll see more digits off, it seems for number=10^-N, the error begins around 10^(-2N). The partial result (1+x)/(1-x) is correct to the number of digits specified, so it seems ln() is having accuracy problems.

    Can other people confirm this or is it just my imagination? I tested with python 2.7.5 in XUbuntu, and also with my own build of mpdecimal 2.4.0 (latest, built from source). I compared the results with wolfram Alpha, and alsowith an open source arbitrary precision calculator, which matches Alpha results.
    , Apr 30, 2014
    #1
    1. Advertising

  2. On Tue, 29 Apr 2014 19:37:17 -0700, pleasedontspam wrote:

    > from decimal import *
    > getcontext().prec=2016
    > one=Decimal(1)
    > number=Decimal('1e-1007')
    > partial=(one+number)/(one-number)
    > final.ln()


    What's final? Did you mean partial?

    When I try it in Python 3.3, I get:

    py> from decimal import *
    py> getcontext().prec=2016
    py> one = Decimal(1)
    py> number = Decimal('1e-1007')
    py> partial = (one+number)/(one-number)
    py> partial.ln()
    Decimal('1.9999[...]9999987E-1007')

    with a lot of 9s deleted for brevity.

    > The result should be 2.00000... with all zeroes and 7 at the end.


    Can you demonstrate exactly how you tested it on Wolfram-Alpha, because I
    cannot reproduce that. When I try it with ln((1+1e-1007)/(1-1e-1007)),
    the decimal expansion shown starts off as:

    2.00000000000000000000000000000000000000000000000000000... × 10^-1007

    By repeatedly clicking "More digits", I get:

    2.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...
    × 10^-1007

    For brevity, I will truncate some repeated digits from this point on.


    1.99999999999999999[...]999999999999999999999999999999999999999999999...
    × 10^-1007

    as above, except with even more nines (477 of them if I have calculated
    correctly)

    2.00000000000000000[...]0000000000000000000000000000000000000000000000...
    × 10^-1007

    as above, with even more zeroes

    and finally the last available approximation is

    2.000000[...]000000666666[...]666666... × 10^-1007

    with a lot of zeroes and sixes not shown. But not ending in 7.


    > Instead, I'm getting 1.999999.. with 2 digits different at the end.


    Well, it's hard to say exactly what's going on, since Wolfram Alpha
    doesn't show the context, but it looks like Python's version may agree
    with two out of the seven decimal approximations Alpha makes available.
    This suggests to me that it's a representation issue.


    [...]
    > Can other people confirm this or is it just my imagination? I tested
    > with python 2.7.5 in XUbuntu, and also with my own build of mpdecimal
    > 2.4.0 (latest, built from source). I compared the results with wolfram
    > Alpha, and also with an open source arbitrary precision calculator,
    > which matches Alpha results.


    I think it's worth raising a ticket for it on the bug tracker.

    I think it would help if you can provide a link or instructions to how to
    replicate this in Alpha, also give the name and version number of the
    arbitrary precision calculator you used. Anything that can be used to
    replicate your results.



    --
    Steven D'Aprano
    http://import-that.dreamwidth.org/
    Steven D'Aprano, Apr 30, 2014
    #2
    1. Advertising

  3. wrote:
    > I compared the results with wolfram Alpha, and
    > also with an open source arbitrary precision calculator, which matches Alpha
    > results.


    Decimal is *not* an arbitrary precision data type, so you
    can't expect exact results from it. You can set the precision
    to be very large, but it's still essentially a floating point
    type, and as such you can expect the last few digits to
    depend heavily on the details of the algorithm used.

    --
    Greg
    Gregory Ewing, Apr 30, 2014
    #3
  4. Guest

    On Tuesday, April 29, 2014 11:39:12 PM UTC-4, Steven D'Aprano wrote:
    > On Tue, 29 Apr 2014 19:37:17 -0700, pleasedontspam wrote:
    >
    >
    >
    > > from decimal import *

    >
    > > getcontext().prec=2016

    >
    > > one=Decimal(1)

    >
    > > number=Decimal('1e-1007')

    >
    > > partial=(one+number)/(one-number)

    >
    > > final.ln()

    >
    >
    >
    > What's final? Did you mean partial?


    Final is my final result: ln((1+x)/(1-x))
    I called the quotient partial because I wanted to investigate if perhaps the result of the division (that's my partial result) had a rounding error, which would throw off the ln().



    >
    >
    >
    > When I try it in Python 3.3, I get:
    >
    >
    >
    > py> from decimal import *
    >
    > py> getcontext().prec=2016
    >
    > py> one = Decimal(1)
    >
    > py> number = Decimal('1e-1007')
    >
    > py> partial = (one+number)/(one-number)
    >
    > py> partial.ln()
    >
    > Decimal('1.9999[...]9999987E-1007')
    >
    >
    >
    > with a lot of 9s deleted for brevity.


    >
    >
    >
    > > The result should be 2.00000... with all zeroes and 7 at the end.

    >
    >
    >
    > Can you demonstrate exactly how you tested it on Wolfram-Alpha, because I
    >
    > cannot reproduce that. When I try it with ln((1+1e-1007)/(1-1e-1007)),


    That's exactly how you test it.
    The result should be 2.0000... 7 (at 2016 digits precision, at higher precisions it will continue adding another 2014 sixes until the digits become something else.

    This is from the definition of the yperbolic arc tangent:

    atanh(x) = 1/2 * ln ( (1+x)/(1-x) )

    As it turns out to be the ln() part of 1e-N is the integer 2, followed by 2N zeroes, then followed by 2N-1 sixes, then a 0, a seven, then another group of around 2N sixes (I didn't actually count them), then other interestingdigit patterns.
    Try with 1e-1007 using 4000 digits or more and you'll see python giving a better result... but there's a 4 in the middle of the sixes! That 4 shouldn't be there.
    getcontext().prec=4000, then do the rest the same (you need to recalculate the partial result at the higher precision too).

    >
    > the decimal expansion shown starts off as:
    >
    >
    >
    > 2.00000000000000000000000000000000000000000000000000000... × 10^-1007
    >
    >
    >
    > By repeatedly clicking "More digits", I get:
    >
    >
    >
    > 2.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...
    >
    > × 10^-1007
    >
    >
    >
    > For brevity, I will truncate some repeated digits from this point on.
    >
    >
    >
    >
    >
    > 1.99999999999999999[...]999999999999999999999999999999999999999999999...
    >
    > × 10^-1007
    >
    >
    >
    > as above, except with even more nines (477 of them if I have calculated
    >
    > correctly)
    >


    You're correct! It seems Alpha has its own problems too. Depending on the precision, it's giving the correct or incorrect result, changing as you hit 'more digits'.
    I can imagine this could happen because the (1+x)/(1-x) ratio becomes 1. followed by 2N zeroes, then a 2, then 2N zeroes, then another 2, etc.
    If the precision is such that your last digit just includes the next 2 or not, it can throw the result off a little bit. But this should self-correct as the precision goes higher, as it does in Wolfram Alpha.
    But python is giving bad data even with very high precision, as I mentionedabove with 4000 digits, there's a bad digit in the middle of the sixes, atmuch less precision than the required by the system (hundreds of digits less).


    >
    >
    > 2.00000000000000000[...]0000000000000000000000000000000000000000000000...
    >
    > × 10^-1007
    >
    >
    >
    > as above, with even more zeroes
    >
    >
    >
    > and finally the last available approximation is
    >
    >
    >
    > 2.000000[...]000000666666[...]666666... × 10^-1007
    >
    >
    >
    > with a lot of zeroes and sixes not shown. But not ending in 7.


    Ending in 7 is because I had exactly 2016 digits precision, and 2016 is 2*1007+1, so exactly the last digit was the first 6 of the 6666666... which becomes 7 because it was correctly rounded.


    >
    >
    >
    >
    >
    > > Instead, I'm getting 1.999999.. with 2 digits different at the end.

    >
    >
    >
    > Well, it's hard to say exactly what's going on, since Wolfram Alpha
    >
    > doesn't show the context, but it looks like Python's version may agree
    >
    > with two out of the seven decimal approximations Alpha makes available.
    >
    > This suggests to me that it's a representation issue.
    >


    Definitely not a representation issue. I'm working on a high precision calculator, and was using mpdecimal to generate tables for the CORDIC method, (the tables have atanh(1e-N) for N=1..Precision) and that's how I stumbledupon this issue, the last 2/3 of my tables were consistently showing the wrong results. And I'm not printing the numbers, I'm accessing the digits directly from the data structures, so what you see is exactly what gets calculated, this is not a problem with printing.
    It might be a printing issue in Alpha, since they probably use binary numbers (GMP and the like), as opposed to decimal like python.

    >
    >
    >
    >
    > [...]
    >
    > > Can other people confirm this or is it just my imagination? I tested

    >
    > > with python 2.7.5 in XUbuntu, and also with my own build of mpdecimal

    >
    > > 2.4.0 (latest, built from source). I compared the results with wolfram

    >
    > > Alpha, and also with an open source arbitrary precision calculator,

    >
    > > which matches Alpha results.

    >
    >
    >
    > I think it's worth raising a ticket for it on the bug tracker.
    >
    >
    >
    > I think it would help if you can provide a link or instructions to how to
    >
    > replicate this in Alpha,


    You already did it right!

    ln((1+1e-1007)/(1-1e-1007))


    > also give the name and version number of the
    >
    > arbitrary precision calculator you used. Anything that can be used to
    >
    > replicate your results.


    I used this to double check:
    http://preccalc.sourceforge.net

    I think you replicated my results perfectly and confirmed something is going on. I'm now especially baffled at the number 4 that appeared in the middle of the sixes, using 4000 for the precision.

    I just checked this with python 2.7.6 running on PC-BSD 10, and it's confirmed. So it's not a problem with the compiler used for the build either (PC-BSD/FreeBSD used Clang 3.3, XUbuntu used GCC).


    >
    >
    >
    >
    >
    >
    >
    > --
    >
    > Steven D'Aprano
    >
    > http://import-that.dreamwidth.org/
    , Apr 30, 2014
    #4
  5. Guest

    On Tuesday, April 29, 2014 11:39:12 PM UTC-4, Steven D'Aprano wrote:
    > On Tue, 29 Apr 2014 19:37:17 -0700, pleasedontspam wrote:
    >
    >
    >
    > > from decimal import *

    >
    > > getcontext().prec=2016

    >
    > > one=Decimal(1)

    >
    > > number=Decimal('1e-1007')

    >
    > > partial=(one+number)/(one-number)

    >
    > > final.ln()

    >
    >
    >
    > What's final? Did you mean partial?


    I see what you mean now. Sorry, I meant

    final=partial.ln()
    , Apr 30, 2014
    #5
  6. On Tue, 29 Apr 2014 19:37:17 -0700, pleasedontspam wrote:

    > Hello, I believe I found a bug in the Decimal library. The natural logarithm
    > results seem to be off in a certain range, as compared with Wolfram Alpha.


    I had a quick look: this isn't a bug - it's just the result of propagation of
    the error in "partial" to "final".

    In more detail: we've got a working precision of 2016 significant figures. For
    any small x, we have (1 + x) / (1 - x) = 1 + 2x + 2x^2 + 2x^3 + .... For your
    value of x, `Decimal('1e-1007'), we've got enough precision to store
    1 + 2x + 2x^2 exactly, but that's all.

    So partial has an absolute error of around 2x^3, or 2e-3021.

    And since the derivative of the natural log function is almost exactly 1 at the
    point we're interested in, we expect the absolute error in the output to be
    close to 2e-3021, too.

    And that's *precisely* what you're seeing: the Decimal module is giving you a
    result that's exactly `Decimal('2e-1007') - Decimal('1.3e-3021')`, while the
    result you were expecting is `Decimal('2e-1007') + Decimal('0.7e-3021')`. A
    difference of exactly `Decimal('2e-3021')`, as expected.

    --
    Mark
    Mark Dickinson, May 4, 2014
    #6
  7. Guest

    On Sunday, May 4, 2014 6:53:06 AM UTC-4, Mark Dickinson wrote:

    > I had a quick look: this isn't a bug - it's just the result of propagation of
    >
    > the error in "partial" to "final".
    >
    >
    >
    > In more detail: we've got a working precision of 2016 significant figures.. For
    >
    > any small x, we have (1 + x) / (1 - x) = 1 + 2x + 2x^2 + 2x^3 + .... For your
    >
    > value of x, `Decimal('1e-1007'), we've got enough precision to store
    >
    > 1 + 2x + 2x^2 exactly, but that's all.
    >
    >
    >
    > So partial has an absolute error of around 2x^3, or 2e-3021.
    >
    >
    >
    > And since the derivative of the natural log function is almost exactly 1 at the
    >
    > point we're interested in, we expect the absolute error in the output to be
    >
    > close to 2e-3021, too.
    >


    Please take a deeper look at my second post. Try the same but this time setthe precision to 4000, 8000 or whatever you need to convince yourself there's no error propagation, yet there's a 4 in the middle that shouldn't be there. See for yourself!

    I've tested on all platforms I know of and confirmed it. The wrong digit occurs in the middle of the number. Propagation error would have a bad digit near the end, and garbage after that. Here there's a perfect sequence of numbers, but with one single digit changed in the middle of the number.
    No error propagation in a series expansion can do that.

    I generated a table of 1000 numbers, one correctly generated and one with mpdecimal. Then I did a diff of both files and it's horrible. The differenceis all over the place, sometimes as high as digit 500 (out of 2000). Almost every result has the bad digit somewhere. The bad digit moves around a lot, from about position 500 to 2000. All other digits are correct, and in a 2000 digit sequence is hard to spot the difference (I used the visual diff tool that comes with TortoiseSVN or TortoiseGit). I think there's a bad pointer or something that's rounding the wrong digit.
    You cannot possibly have 1999 correct digits and only 1 changed on every number if it was propagation error.

    I'll follow up directly with the author of mpdecimal, as this is somewhat serious on a language that's so widely used as python.

    But please test it and confirm, am I seeing ghost digits?
    , May 15, 2014
    #7
  8. On 5/15/14 3:45 PM, wrote:
    > Please take a deeper look at my second post. Try the same but this time
    > set the precision to 4000, 8000 or whatever you need to convince yourself
    > there's no error propagation, yet there's a 4 in the middle that shouldn't


    > be there. See for yourself!
    > I've tested on all platforms I know of and confirmed it. The wrong digit
    > occurs in the middle of the number. Propagation error would have a bad digit
    > near the end, and garbage after that. Here there's a perfect sequence of
    > numbers, but with one single digit changed in the middle of the number.
    > No error propagation in a series expansion can do that.
    >
    > I generated a table of 1000 numbers, one correctly generated and one with
    > mpdecimal. Then I did a diff of both files and it's horrible. The difference
    > is all over the place, sometimes as high as digit 500 (out of 2000). Almost
    > every result has the bad digit somewhere. The bad digit moves around a lot,
    > from about position 500 to 2000. All other digits are correct, and in a 2000
    > digit sequence is hard to spot the difference (I used the visual diff tool
    > that comes with TortoiseSVN or TortoiseGit). I think there's a bad pointer
    > or something that's rounding the wrong digit.
    > You cannot possibly have 1999 correct digits and only 1 changed on every
    > number if it was propagation error.


    > I'll follow up directly with the author of mpdecimal, as this is somewhat
    > serious on a language that's so widely used as python.
    >
    > But please test it and confirm, am I seeing ghost digits?
    >


    I don't believe it. I've been running converging series to hudreds of
    thousands of decimal places (not only on different platforms, but on
    different BigFloat systmes, Jullia, BC, Decimal with pdeclib) and I'm
    not seeing the 'middle of the number' error you are talking about. I ran
    π to over 100,000+ digits and compared between BC and Python Decimal
    with pilib pdeclib and both compared correct; also compared correct to
    the on-line standard. Soooo... not sure ...

    I can work on this some this weekend (comparing between Julia BigFloat
    and Python Decimal with pdeclib and see. I'm very skeptical at this point.

    marcus
    Mark H Harris, May 15, 2014
    #8
  9. <pleasedontspam <at> isp.com> writes:

    > I've tested on all platforms I know of and confirmed it. The wrong digit
    > occurs in the middle of the number. Propagation error would have a bad digit
    > near the end, and garbage after that. Here there's a perfect sequence of
    > numbers, but with one single digit changed in the middle of the number. No
    > error propagation in a series expansion can do that.


    I can see how it might be surprising if you don't think about it too hard, but
    I'm afraid that you're wrong here: error propagation is *exactly* what's
    causing the effects you're seeing.

    Here's another way of looking at it: if you truncate the Taylor series about 0
    for (1 + x) / (1 - x) to k (>= 1) terms, you get the polynomial (1 + x - 2x^k)
    / (1 - x). For example, taking k to be 3, we're getting (1 + x - 2x^3) / (1 -
    x). Given that the particular value of x you're testing with has the form
    10**<negative>, rounding your intermediate result to the working precision has
    exactly the effect of truncating the series at some k.

    Now you can compute and compare (by hand, via Wolfram alpha, or however you
    like) the Taylor series expansions for log((1 + x) / (1 - x)) and log((1 + x -
    2x^3) / (1 - x)). For the first you'll see:

    2x + 2/3 x^3 + 2/5 x^5 - 2/7 x^7 + 2/9 x^9 - ...

    and for the second you'll get:

    2x - 4/3 x^3 + 2 x^4 - 8/5 x^5 + 16/7 x^7 - ...

    The difference between the two series is:

    -2x^3 + 2x^4 - 2x^5 + 2x^7 - 4x^8 + ...

    So again with x a small power of 10, you're going to see a single-digit error
    from the -2x^3 term, and another single-digit error further along from
    the 2x^3 term, and so on.

    Here's a simpler example of the same phenomenon. Note how the error propagation
    leads to a single incorrect digit in the *middle* of the digit string.

    Python 3.4.0 (default, Mar 25 2014, 11:07:05)
    [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal
    >>> x = Decimal('1e-15')
    >>> y = (1 - 2 * x) / (1 - x)
    >>> 2 * x + (y - 1) * (1 - x) # Mathematically, expect to get 'x' back.

    Decimal('1.000000000000001000000000000E-15')
    >>> x

    Decimal('1E-15')

    --
    Mark
    Mark Dickinson, May 19, 2014
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ven
    Replies:
    3
    Views:
    1,306
  2. Gilbert Fine
    Replies:
    8
    Views:
    888
    Zentrader
    Aug 1, 2007
  3. Vitaliy
    Replies:
    1
    Views:
    464
    Peter Otten
    May 29, 2008
  4. valpa
    Replies:
    11
    Views:
    1,491
    Steven D'Aprano
    Mar 24, 2009
  5. Replies:
    0
    Views:
    275
Loading...

Share This Page