# RE: PEP 327: Decimal Data Type

Discussion in 'Python' started by Batista, Facundo, Feb 2, 2004.

1. ### Batista, FacundoGuest

danb_83 wrote:

#- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
#- that human height comes in discrete packets of 0.01 m. It
#- means that
#- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
#- posture and the time of day, and "1.80" is just a convenient
#- approximation. And it wouldn't be inaccurate to express my height as
#- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
#- these are within the tolerance of the measurement. So number base
#- doesn't matter here.

Are you saying that it's ok to store your number imprecisely because you
don't measure carefully?

#- But even if the number base of a measurement doesn't matter,
#- precision
#- and speed of calculations often does. And on digital computers,
#- non-binary arithmetic is inherently imprecise and slow. Imprecise
#- because register bits are limited and decimal storage wastes them.
#- (For example, representing the integer 999 999 999 requires
#- 36 bits in
#- BCD but only 30 bits in binary. Also, for floating point,
#- only binary
#- allows the precision-gaining "hidden bit" trick.) Slow because
#- decimal requires more complex hardware. (For example, a BCD adder
#- requires more than twice as many gates as a binary adder.)
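The bit counts in the quoted example are easy to verify; a quick Python check (assuming the usual 4 bits per decimal digit in BCD):

```python
n = 999_999_999

binary_bits = n.bit_length()   # bits needed in plain binary: 30
bcd_bits = len(str(n)) * 4     # 4 bits per decimal digit in BCD: 36

print(binary_bits, bcd_bits)   # 30 36
```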

In my dreams, speed and storage are both infinite.

.. Facundo

Batista, Facundo, Feb 2, 2004

2. ### David M. CookeGuest

At some point, "Batista, Facundo" <> wrote:

> danb_83 wrote:
>
> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
> #- that human height comes in discrete packets of 0.01 m. It
> #- means that
> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
> #- posture and the time of day, and "1.80" is just a convenient
> #- approximation. And it wouldn't be inaccurate to express my height as
> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
> #- these are within the tolerance of the measurement. So number base
> #- doesn't matter here.
>
> Are you saying that it's ok to store your number imprecisely because you
> don't measure carefully?

What we need for this is an interval type. 1.80 m shouldn't be stored
as '1.80', but as '1.80 +/- 0.005', and operations such as addition
and multiplication should propagate the intervals.

How to do that is another question: for addition, do you add the
magnitudes of the intervals, or use the square root of the sums of the
squares, or something else? It greatly depends on what _type_ of error
0.005 measures (is it the width of a Gaussian distribution? a uniform
distribution? something skewed that's not representable by one
number?).
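For what it's worth, both propagation rules are only a few lines of Python. A minimal sketch (the function name and the `mode` flag are invented for illustration):

```python
import math

def add_intervals(a, da, b, db, mode="worst"):
    """Add measurements a +/- da and b +/- db.

    mode="worst": add the half-widths (a guaranteed enclosure).
    mode="gauss": root-sum-of-squares, appropriate when da and db are
    standard deviations of independent Gaussian errors.
    """
    if mode == "worst":
        err = da + db
    else:
        err = math.sqrt(da * da + db * db)
    return a + b, err

# 1.80 +/- 0.005 plus 0.75 +/- 0.005:
# worst case gives +/- 0.01; independent Gaussian errors give a
# smaller half-width of sqrt(2) * 0.005.
```

As the post says, which rule is right depends entirely on what kind of error the 0.005 is measuring.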

My 0.0438126 Argentine pesos [1]

[1] \$0.02 Canadian, which highlights the other problem with any
representation of a number without units -- decimal or otherwise.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

David M. Cooke, Feb 2, 2004

3. ### Stephen HorneGuest

On Mon, 02 Feb 2004 17:07:52 -0500,
(David M. Cooke) wrote:

>At some point, "Batista, Facundo" <> wrote:
>
>> danb_83 wrote:
>>
>> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
>> #- that human height comes in discrete packets of 0.01 m. It
>> #- means that
>> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
>> #- posture and the time of day, and "1.80" is just a convenient
>> #- approximation. And it wouldn't be inaccurate to express my height as
>> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
>> #- these are within the tolerance of the measurement. So number base
>> #- doesn't matter here.
>>
>> Are you saying that it's ok to store your number imprecisely because you
>> don't measure carefully?

>
>What we need for this is an interval type. 1.80 m shouldn't be stored
>as '1.80', but as '1.80 +/- 0.005', and operations such as addition
>and multiplication should propagate the intervals.

I disagree with this, not because it is a bad idea to keep track of
precision, but because this should not be a part of the float type or
of basic arithmetic operations.

When you write a value with its precision specified in the form of an
interval, that interval is a second number. The value with the
precision is a compound representation, built up using simpler
components. It doesn't mean that the components no longer have uses
outside of the compound. In Python, the same should apply - a numeric
type that can track precision sounds useful, but it shouldn't replace
the existing float.

One good reason is simply that knowledge of the precision is only
sometimes useful. As an obvious example, what would be the point of
tracking the precision of the calculations in a 3D game? There is
none, as the information about precision has no bearing on the
rendering of the image.

Besides this, there is a much more fundamental problem.

The whole point of using an imprecise representation is because
manipulating a perfect representation is impractical - mainly slow.

It is true that in general the source is inherently approximate too,
meaning that floats are quite a good match for the physical
measurements they are often used to represent. Still, if it were
practical to do perfect arithmetic on those approximate values it
would give slightly more precise answers, as the arithmetic would not
introduce any additional error.
Having an approximate representation with an interval sounds good, but
remember that one error source is the arithmetic itself - e.g. 1.0 /
3.0 cannot be finitely represented in either binary or decimal without
error (except as a rational, of course).
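Using Python's fractions module and the decimal arithmetic that PEP 327 proposes, the point is easy to demonstrate: 1/3 is truncated in base 2 and in base 10 alike, and only a rational type carries it exactly:

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# Binary float: stored as the nearest 53-bit binary fraction.
print(1.0 / 3.0)                # 0.3333333333333333

# Decimal: still truncated, just in base 10 (28 digits by default).
getcontext().prec = 28
third = Decimal(1) / Decimal(3)
print(third * 3)                # 0.9999999999999999999999999999

# Only a rational representation has no error at all.
print(Fraction(1, 3) * 3 == 1)  # True
```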

>How to do that is another question: for addition, do you add the
>magnitudes of the intervals, or use the square root of the sums of the
>squares, or something else? It greatly depends on what _type_ of error
>0.005 measures (is it the width of a Gaussian distribution? a uniform
>distribution? something skewed that's not representable by one
>number?).

None of these is sufficient - they may track the errors resulting from
measurement issues (if you choose the appropriate method for your
application), but none takes into account errors resulting from the
imprecision of the arithmetic. Furthermore, to keep track of such
imprecision precisely means you need an infinitely precise numeric
representation for your interval - and if it was practical to do that,
it would be far better to just use that representation for the value
itself.

This doesn't mean that tracking precision is a bad idea. It just means
that when it is done, the error interval itself should be imprecise.
You should have the guarantee that the real value is never going to be
outside of the given bounds, but not the guarantee that the bounds are
as close together as possible - the bounds should be allowed to get a
little further apart to allow for imprecision in the calculation of
the interval.
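One way to get that guarantee is directed (outward) rounding: after computing each bound, nudge it one ulp away from the interval, so the enclosure stays valid even though the bound computation itself rounds. A sketch using math.nextafter (available in Python 3.9+); the helper name is invented:

```python
import math

def mul_outward(lo1, hi1, lo2, hi2):
    """Multiply intervals [lo1, hi1] * [lo2, hi2], widening the result
    by one ulp in each direction so the true product is still enclosed
    despite float rounding in the bound computation."""
    products = [lo1 * lo2, lo1 * hi2, hi1 * lo2, hi1 * hi2]
    lo = math.nextafter(min(products), -math.inf)  # round lower bound down
    hi = math.nextafter(max(products), math.inf)   # round upper bound up
    return lo, hi

# Squaring the height interval from the thread:
lo, hi = mul_outward(1.795, 1.805, 1.795, 1.805)
```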

And if the error interval is itself an approximation, why track it on
every single arithmetic operation? Unless you have a specific good
reason to do so, it makes much more sense to handle the precision
tracking at a higher level. And as those higher level operations are
often going to be application specific, having a single library for it
(ie not tailored to some particular type of task) is IMO unlikely to
work.

For instance, consider calculating and applying a 3D rotation matrix
to a vector. If you track errors on every float value, that is 9
values in the matrix with error intervals (due to limited-precision
trig functions and so on), 3 values in the vector, a dozen for the
intermediate results in the matrix multiplication, and 3 error
intervals for the 3 dimensions of the output vector. But the odds are
that all you want is a single float value - the maximum distance
between the real point and the point represented by the output vector,
and you can probably get a good value for that by multiplying the
length of the input vector by some 'potential error from rotation'
constant.
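That single-number bound might look like this; note that EPS is a hypothetical "potential error from rotation" constant chosen for illustration, not a derived figure:

```python
import math

EPS = 1e-7  # hypothetical worst-case relative error of one rotation

def rotation_error_bound(vec, eps=EPS):
    """Maximum distance between the true rotated point and the computed
    one, modelled simply as |v| * eps instead of tracking a dozen or
    more separate error intervals through the matrix multiply."""
    return math.sqrt(sum(c * c for c in vec)) * eps
```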

Incidentally, it would not always be appropriate to include arithmetic
errors in error intervals. For instance, some statistical interval
types do not guarantee that all values are within the interval range.
They may guarantee that 95% of values are within the interval, for
instance - _and_ that 5% of values are outside the interval. The 5%
outside is as important as the 95% inside, so there is no acceptable
direction to move the bounds a little 'just to be safe'.

In some cases, you might even want to track the error interval (from
arithmetic error) for your error interval value. I can certainly
imagine a result with the form...

The average widginess of a blodgit is 9.5 +/- 0.2
95% differ from the average by less than 2.7 +/- 0.03

Thus I can say that this randomly chosen blodgit has a
widginess of (9.5 +/- 0.2) +/- (2.7 +/- 0.03) with 95% confidence.

You might even get results like that if you had estimated the average
and distribution of widginess from a sample of the blodgits - in which
case, you may still need to account for the arithmetic error, which
potentially requires another four values ;-)

--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk

Stephen Horne, Feb 4, 2004
4. ### David M. CookeGuest

At some point, Stephen Horne <> wrote:

> On Mon, 02 Feb 2004 17:07:52 -0500,
> (David M. Cooke) wrote:
>
>>At some point, "Batista, Facundo" <> wrote:
>>
>>> danb_83 wrote:
>>>
>>> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
>>> #- that human height comes in discrete packets of 0.01 m. It
>>> #- means that
>>> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
>>> #- posture and the time of day, and "1.80" is just a convenient
>>> #- approximation. And it wouldn't be inaccurate to express my height as
>>> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
>>> #- these are within the tolerance of the measurement. So number base
>>> #- doesn't matter here.
>>>
>>> Are you saying that it's ok to store your number imprecisely because you
>>> don't measure carefully?

>>
>>What we need for this is an interval type. 1.80 m shouldn't be stored
>>as '1.80', but as '1.80 +/- 0.005', and operations such as addition
>>and multiplication should propagate the intervals.

>
> I disagree with this, not because it is a bad idea to keep track of
> precision, but because this should not be a part of the float type or
> of basic arithmetic operations.
>

I was being a bit facetious. This is certainly something that can
be done without being built in, like this:
http://pedro.dnp.fmph.uniba.sk/~stanys/Uncertainities.py

> Having an approximate representation with an interval sounds good, but
> remember that one error source is the arithmetic itself - e.g. 1.0 /
> 3.0 cannot be finitely represented in either binary or decimal without
> error (except as a rational, of course).

Hey, if my measurement error is so small that arithmetic error becomes
significant, I'm happy.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

David M. Cooke, Feb 4, 2004
5. ### Stephen HorneGuest

On Wed, 04 Feb 2004 14:52:42 -0500,
(David M. Cooke) wrote:

>I was being a bit facetious

Ah - sorry for taking it the wrong way.

--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk

Stephen Horne, Feb 4, 2004
6. ### Bengt RichterGuest

On Wed, 04 Feb 2004 01:59:41 +0000, Stephen Horne <> wrote:
[...]
A bunch of stuff about intervals which could probably benefit
from revision in the light of, e.g.,

http://www.americanscientist.org/template/AssetDetail/assetid/28331;jsessionid=aaa41kNy_Uu1-c

or the whole article in "printer-friendly" format

http://www.americanscientist.org/template/AssetDetail/assetid/28331/page/3?&print=yes

or the .pdf (nicer) at

http://www.americanscientist.org/template/PDFDetail/assetid/28315;jsessionid=aaa41kNy_Uu1-c

http://www.cs.utep.edu/interval-comp/