RE: PEP 327: Decimal Data Type

Discussion in 'Python' started by Batista, Facundo, Feb 2, 2004.

  1. danb_83 wrote:

    #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    #- that human height comes in discrete packets of 0.01 m. It
    #- means that
    #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    #- posture and the time of day, and "1.80" is just a convenient
    #- approximation. And it wouldn't be inaccurate to express my height as
    #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    #- these are within the tolerance of the measurement. So number base
    #- doesn't matter here.

    Are you saying that it's ok to store your number imprecisely because you
    don't take good measurements?


    #- But even if the number base of a measurement doesn't matter,
    #- precision
    #- and speed of calculations often does. And on digital computers,
    #- non-binary arithmetic is inherently imprecise and slow. Imprecise
    #- because register bits are limited and decimal storage wastes them.
    #- (For example, representing the integer 999 999 999 requires
    #- 36 bits in
    #- BCD but only 30 bits in binary. Also, for floating point,
    #- only binary
    #- allows the precision-gaining "hidden bit" trick.) Slow because
    #- decimal requires more complex hardware. (For example, a BCD
    #- adder has
    #- more than twice as many gates as a binary adder.)

    In my dreams, speed and storage are both infinite, :p
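    The bit counts quoted above are easy to verify; a quick illustrative
    check (not part of the original posts), using the fact that BCD spends
    4 bits per decimal digit:

    ```python
    # 999 999 999 has 9 decimal digits, so BCD needs 9 * 4 = 36 bits,
    # while plain binary needs only bit_length() bits.
    n = 999_999_999
    bcd_bits = len(str(n)) * 4      # 4 bits per decimal digit in BCD
    binary_bits = n.bit_length()    # minimal bits in pure binary
    print(bcd_bits, binary_bits)    # 36 30
    ```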

    .. Facundo
    Batista, Facundo, Feb 2, 2004
    #1

  2. At some point, "Batista, Facundo" <> wrote:

    > danb_83 wrote:
    >
    > #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    > #- that human height comes in discrete packets of 0.01 m. It
    > #- means that
    > #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    > #- posture and the time of day, and "1.80" is just a convenient
    > #- approximation. And it wouldn't be inaccurate to express my height as
    > #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    > #- these are within the tolerance of the measurement. So number base
    > #- doesn't matter here.
    >
    > Are you saying that it's ok to store your number imprecisely because you
    > don't take good measurements?


    What we need for this is an interval type. 1.80 m shouldn't be stored
    as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    and multiplication should propagate the intervals.

    How to do that is another question: for addition, do you add the
    magnitudes of the intervals, or use the square root of the sums of the
    squares, or something else? It greatly depends on what _type_ of error
    0.005 measures (is it the width of a Gaussian distribution? a uniform
    distribution? something skewed that's not representable by one
    number?).
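    As a minimal sketch of the idea (not a real library, and assuming the
    errors are independent and roughly Gaussian, so that adding in
    quadrature is the appropriate rule):

    ```python
    import math

    # A value carrying a symmetric uncertainty; addition propagates the
    # errors in quadrature (sqrt of the sum of squares).
    class Measurement:
        def __init__(self, value, error):
            self.value = value
            self.error = error

        def __add__(self, other):
            return Measurement(self.value + other.value,
                               math.hypot(self.error, other.error))

        def __repr__(self):
            return f"{self.value} +/- {self.error:.4g}"

    height = Measurement(1.80, 0.005)
    step = Measurement(0.30, 0.005)
    print(height + step)   # error is sqrt(0.005**2 + 0.005**2) ~ 0.00707
    ```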

    My 0.0438126 Argentina pesos [1]

    [1] $0.02 Canadian, which highlights the other problem with any
    representation of a number without units -- decimal or otherwise.

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
    David M. Cooke, Feb 2, 2004
    #2

  3. On Mon, 02 Feb 2004 17:07:52 -0500,
    (David M. Cooke) wrote:

    >At some point, "Batista, Facundo" <> wrote:
    >
    >> danb_83 wrote:
    >>
    >> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    >> #- that human height comes in discrete packets of 0.01 m. It
    >> #- means that
    >> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    >> #- posture and the time of day, and "1.80" is just a convenient
    >> #- approximation. And it wouldn't be inaccurate to express my height as
    >> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    >> #- these are within the tolerance of the measurement. So number base
    >> #- doesn't matter here.
    >>
    >> Are you saying that it's ok to store your number imprecisely because you
    >> don't take good measurements?

    >
    >What we need for this is an interval type. 1.80 m shouldn't be stored
    >as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    >and multiplication should propagate the intervals.


    I disagree with this, not because it is a bad idea to keep track of
    precision, but because this should not be a part of the float type or
    of basic arithmetic operations.

    When you write a value with its precision specified in the form of an
    interval, that interval is a second number. The value with the
    precision is a compound representation, built up using simpler
    components. It doesn't mean that the components no longer have uses
    outside of the compound. In Python, the same should apply - a numeric
    type that can track precision sounds useful, but it shouldn't replace
    the existing float.

    One good reason is simply that knowledge of the precision is only
    sometimes useful. As an obvious example, what would be the point of
    tracking the precision of the calculations in a 3D game? There is
    none, as the precision information has no bearing on the rendering
    of the image.

    Besides this, there is a much more fundamental problem.

    The whole point of using an imprecise representation is because
    manipulating a perfect representation is impractical - mainly slow.

    It is true that in general the source is inherently approximate too,
    meaning that floats are quite a good match for the physical
    measurements they are often used to represent. Still, if it were
    practical to do perfect arithmetic on those approximate values, it
    would give slightly more precise answers, as the arithmetic would
    not introduce additional sources of error.

    Having an approximate representation with an interval sounds good, but
    remember that one error source is the arithmetic itself - e.g. 1.0 /
    3.0 cannot be finitely represented in either binary or decimal without
    error (except as a rational, of course).
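    The 1/3 example is easy to see with the standard library: at any
    finite precision, decimal arithmetic loses something that a rational
    keeps exactly.

    ```python
    from decimal import Decimal, getcontext
    from fractions import Fraction

    getcontext().prec = 28            # 28 significant decimal digits

    third_dec = Decimal(1) / Decimal(3)
    print(third_dec * 3)              # 0.9999999999999999999999999999

    print(Fraction(1, 3) * 3)         # 1  (exact as a rational)
    ```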

    So therefore, in answer to your question...

    >How to do that is another question: for addition, do you add the
    >magnitudes of the intervals, or use the square root of the sums of the
    >squares, or something else? It greatly depends on what _type_ of error
    >0.005 measures (is it the width of a Gaussian distribution? a uniform
    >distribution? something skewed that's not representable by one
    >number?).


    None of these is sufficient - they may track the errors resulting
    from measurement issues (if you choose the appropriate method for
    your application), but none takes into account errors resulting from
    the imprecision of the arithmetic. Furthermore, to track such
    imprecision precisely you would need an infinitely precise numeric
    representation for your interval - and if that were practical, it
    would be far better to just use that representation for the value
    itself.

    This doesn't mean that tracking precision is a bad idea. It just means
    that when it is done, the error interval itself should be imprecise.
    You should have the guarantee that the real value is never going to be
    outside of the given bounds, but not the guarantee that the bounds are
    as close together as possible - the bounds should be allowed to get a
    little further apart to allow for imprecision in the calculation of
    the interval.
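    One way to sketch this "bounds may only drift outward" rule: nudge
    each computed endpoint one float step outward with `math.nextafter`
    (available since Python 3.9), so rounding error in the bound
    arithmetic can never shrink the guaranteed enclosure. Real interval
    libraries use directed rounding modes instead; this is only an
    illustration.

    ```python
    import math

    def interval_add(a, b):
        """Add two intervals (lo, hi), widening outward by one ulp."""
        lo = math.nextafter(a[0] + b[0], -math.inf)  # round lower bound down
        hi = math.nextafter(a[1] + b[1], math.inf)   # round upper bound up
        return (lo, hi)

    x = (1.795, 1.805)   # 1.80 +/- 0.005
    y = (0.295, 0.305)   # 0.30 +/- 0.005
    print(interval_add(x, y))
    ```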

    And if the error interval is itself an approximation, why track it on
    every single arithmetic operation? Unless you have a specific good
    reason to do so, it makes much more sense to handle the precision
    tracking at a higher level. And as those higher level operations are
    often going to be application specific, having a single library for it
    (i.e. not tailored to some particular type of task) is IMO unlikely to
    work.

    For instance, consider calculating and applying a 3D rotation matrix
    to a vector. If you track errors on every float value, that is 9
    values in the matrix with error values (due to limited precision trig
    functions etc) and 3 values in the vector, a dozen for the
    intermediate results in the matrix multiplication, and 3 error
    intervals for the 3 dimensions of the output vector. But the odds are
    that all you want is a single float value - the maximum distance
    between the real point and the point represented by the output vector,
    and you can probably get a good value for that by multiplying the
    length of the input vector by some 'potential error from rotation'
    constant.
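    That shortcut can be sketched in a few lines. Here `EPS_ROT` is a
    hypothetical worst-case constant for the whole rotation pipeline (its
    name and value are invented for illustration, not taken from any real
    library):

    ```python
    import math

    EPS_ROT = 1e-6   # assumed worst-case relative error of one rotation

    def rotation_error_bound(v):
        """Bound the positional error of rotating vector v: |v| * EPS_ROT."""
        return math.sqrt(sum(c * c for c in v)) * EPS_ROT

    # One float instead of 12+ tracked intervals:
    print(rotation_error_bound((3.0, 4.0, 0.0)))
    ```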

    Incidentally, it would not always be appropriate to include arithmetic
    errors in error intervals. For instance, some statistical interval
    types do not guarantee that all values are within the interval range.
    They may guarantee that 95% of values are within the interval, for
    instance - _and_ that 5% of values are outside the interval. The 5%
    outside is as important as the 95% inside, so there is no acceptable
    direction to move the bounds a little 'just to be safe'.

    In some cases, you might even want to track the error interval (from
    arithmetic error) for your error interval value. I can certainly
    imagine a result with the form...

    The average widginess of a blodgit is 9.5 +/- 0.2
    95% differ from the average by less than 2.7 +/- 0.03

    Thus I can say that this randomly chosen blodgit has a
    widginess of (9.5 +/- 0.2) +/- (2.7 +/- 0.03) with 95% confidence.

    You might even get results like that if you had estimated the
    average and distribution of widginess from a sample of the blodgits
    - in which case, you may still need to account for the arithmetic
    error, which requires potentially another four values ;-)


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
    Stephen Horne, Feb 4, 2004
    #3
  4. At some point, Stephen Horne <> wrote:

    > On Mon, 02 Feb 2004 17:07:52 -0500,
    > (David M. Cooke) wrote:
    >
    >>At some point, "Batista, Facundo" <> wrote:
    >>
    >>> danb_83 wrote:
    >>>
    >>> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    >>> #- that human height comes in discrete packets of 0.01 m. It
    >>> #- means that
    >>> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    >>> #- posture and the time of day, and "1.80" is just a convenient
    >>> #- approximation. And it wouldn't be inaccurate to express my height as
    >>> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    >>> #- these are within the tolerance of the measurement. So number base
    >>> #- doesn't matter here.
    >>>
    >>> Are you saying that it's ok to store your number imprecisely because you
    >>> don't take good measurements?

    >>
    >>What we need for this is an interval type. 1.80 m shouldn't be stored
    >>as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    >>and multiplication should propagate the intervals.

    >
    > I disagree with this, not because it is a bad idea to keep track of
    > precision, but because this should not be a part of the float type or
    > of basic arithmetic operations.
    >


    I was being a bit facetious :) This is certainly something that can
    be done without being builtin, like this:
    http://pedro.dnp.fmph.uniba.sk/~stanys/Uncertainities.py

    > Having an approximate representation with an interval sounds good, but
    > remember that one error source is the arithmetic itself - e.g. 1.0 /
    > 3.0 cannot be finitely represented in either binary or decimal without
    > error (except as a rational, of course).


    Hey, if my measurement error is so small that arithmetic error becomes
    significant, I'm happy.

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
    David M. Cooke, Feb 4, 2004
    #4
  5. On Wed, 04 Feb 2004 14:52:42 -0500,
    (David M. Cooke) wrote:

    >I was being a bit facetious :)


    Ah - sorry for taking it the wrong way.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
    Stephen Horne, Feb 4, 2004
    #5
  6. On Wed, 04 Feb 2004 01:59:41 +0000, Stephen Horne <> wrote:
    [...]
    A bunch of stuff including stuff about intervals which probably could
    benefit from revision in the light of, e.g.,

    http://www.americanscientist.org/template/AssetDetail/assetid/28331;jsessionid=aaa41kNy_Uu1-c

    or the whole in "printer-friendly" format

    http://www.americanscientist.org/template/AssetDetail/assetid/28331/page/3?&print=yes

    or the .pdf (nicer) at

    http://www.americanscientist.org/template/PDFDetail/assetid/28315;jsessionid=aaa41kNy_Uu1-c

    see also

    http://www.cs.utep.edu/interval-comp/

    Google is your friend ;-)

    Regards,
    Bengt Richter
    Bengt Richter, Feb 6, 2004
    #6
