Numerics, NaNs, IEEE 754 and C99


Nick Maclaren

The numerical robustness of Python is very poor - this is not its fault,
but that of IEEE 754 and (even more) C99. In particular, erroneous
numerical operations often create apparently valid numbers, and the
NaN state can be lost without an exception being raised. For example,
try int(float("nan")).
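
To see the state being lost (a minimal sketch; the exact behaviour of
int() on a NaN varies by Python version - interpreters of this era
quietly returned a meaningless integer, while modern CPython raises
ValueError):

```python
import math

x = float("nan")
assert math.isnan(x)   # x really is a NaN
assert x != x          # a NaN is not even equal to itself

# Whether the NaN "taints" an integer conversion is version-dependent:
try:
    n = int(x)                      # older CPython: silently returns garbage
    print("NaN quietly became", n)  # the error state is lost
except ValueError:
    print("conversion refused")     # modern CPython raises instead
```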

Don't even ASK about complex, unless you know FAR more about numerical
programming than 99.99% of programmers :-(

Now, I should like to improve this, but there are two problems. The
first is political, and is whether it would be acceptable in Python to
restore the semantics that were standard up until about 1980 in the
numerical programming area. I.e. one where anything that is numerically
undefined or at a singularity which can deliver more than one value is
an error state (e.g. raises an exception or returns a NaN). This
is heresy in the C99 and Java camps, and is none too acceptable in the
IEEE 754R one.

My question here is would such an attempt be opposed tooth and nail
in the Python context, the way it was in C99?

The second is technical. I can trivially provide options to select
between a restricted range of behaviours, but the question is how.
Adding a method to a built-in class doesn't look easy, from my
investigations of floatobject.c, and it is very doubtful that is the
best way, anyway - one of the great problems with "object orientation"
is how it handles issues that occur at class conversions. A run-time
option or reading an environment variable has considerable merit from
a sanity point of view, but calling a global function is also possible.
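
For comparison, the decimal module (new in 2.4) already offers exactly
this kind of selectable behaviour via per-context traps - a sketch of
the two modes, not a proposal for the float API:

```python
from decimal import Decimal, localcontext, DivisionByZero, InvalidOperation

# Trapping mode: division by zero raises, as Python's binary floats do.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = True
    try:
        Decimal(1) / Decimal(0)
    except DivisionByZero:
        print("trapped")

# Non-stop mode: the same operations quietly return Infinity / NaN.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = False
    ctx.traps[InvalidOperation] = False
    assert str(Decimal(1) / Decimal(0)) == "Infinity"
    assert str(Decimal(0) / Decimal(0)) == "NaN"
```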

Any ideas?


Regards,
Nick Maclaren.
 

Grant Edwards

> The numerical robustness of Python is very poor - this is not its fault,
> but that of IEEE 754 and (even more) C99. In particular, erroneous
> numerical operations often create apparently valid numbers, and the
> NaN state can be lost without an exception being raised. For example,
> try int(float("nan")).
>
> Don't even ASK about complex, unless you know FAR more about numerical
> programming than 99.99% of programmers :-(
>
> Now, I should like to improve this, but there are two problems. The
> first is political, and is whether it would be acceptable in Python to
> restore the semantics that were standard up until about 1980 in the
> numerical programming area. I.e. one where anything that is numerically
> undefined or at a singularity which can deliver more than one value is
> an error state (e.g. raises an exception or returns a NaN).

That's fine as long as the behavior is selectable. I almost
always want a quiet NaN.

While you're at it, the pickle modules need to be fixed so they
support NaN and Inf. ;)
 

Nick Maclaren

|> >
|> > Now, I should like to improve this, but there are two problems. The
|> > first is political, and is whether it would be acceptable in Python to
|> > restore the semantics that were standard up until about 1980 in the
|> > numerical programming area. I.e. one where anything that is numerically
|> > undefined or at a singularity which can deliver more than one value is
|> > an error state (e.g. raises an exception or returns a NaN).
|>
|> That's fine as long as the behavior is selectable. I almost
|> always want a quiet NaN.

That is one of the two modes that I regard as respectable. However,
because integer arithmetic doesn't have a NaN value (which could be
fixed, in Python), anything that returns an integer has to raise an
exception. On that matter, division by zero and several other currently
trapped numeric errors could be modified to return NaN for people like
you (if the option were selected, of course).

I will take a look at adding NaN to integers, but that is a much
hairier hack - and it STILL doesn't deal with comparisons (which can
be done only in a purely functional programming language).
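
The comparison problem is easy to demonstrate:

```python
nan = float("nan")

# Every ordered comparison involving a NaN is false, so code that
# assumes a total order (sorting, max, binary search) silently misbehaves:
assert not (nan < 1.0)
assert not (nan > 1.0)
assert not (nan == nan)

# sort() relies on "<", so where the NaN lands is essentially arbitrary:
print(sorted([3.0, nan, 1.0, 2.0]))
```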

|> While you're at it, the pickle modules need to be fixed so they
|> support NaN and Inf. ;)

Yup. On my list :)


Regards,
Nick Maclaren.
 

Grant Edwards

|>> Now, I should like to improve this, but there are two problems. The
|>> first is political, and is whether it would be acceptable in Python to
|>> restore the semantics that were standard up until about 1980 in the
|>> numerical programming area. I.e. one where anything that is numerically
|>> undefined or at a singularity which can deliver more than one value is
|>> an error state (e.g. raises an exception or returns a NaN).
|>
|> That's fine as long as the behavior is selectable. I almost
|> always want a quiet NaN.

> That is one of the two modes that I regard as respectable. However,
> because integer arithmetic doesn't have a NaN value (which could be
> fixed, in Python), anything that returns an integer has to raise an
> exception. On that matter, division by zero and several other currently
> trapped numeric errors could be modified to return NaN for people like
> you (if the option were selected, of course).

The division by zero trap is really annoying. In my world the
right thing to do is to return Inf.
 

Christophe

Grant Edwards a écrit :
> The division by zero trap is really annoying. In my world the
> right thing to do is to return Inf.

Your world is flawed then, this is a big mistake. NaN is the only
acceptable return value for a division by zero.
 

Scott David Daniels

Grant said:
> While you're at it, the pickle modules need to be fixed so they
> support NaN and Inf. ;)

The NaN problem is portability -- NaN values are not standard, and
pretending they are won't help. There are many possible NaNs, several
of which have desirable behaviors, and different processors (and
Floating Point settings) choose different bit representations for
those NaNs. There are at least: Inf, -Inf, NaN, Ind (Indeterminant).

Being able to pickle some of these will produce values that "don't
behave right" on a different machine. Up until now, I think you can
send a pickle of a data structure and unpickle it on a different
processor to get equivalent data.
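
For what it's worth, the binary pickle protocols serialize a float as
its 8 raw IEEE 754 bytes, so between 754 hosts infinities and NaNs
could round-trip exactly - a sketch using a modern pickle module, where
this was eventually made to work:

```python
import math
import pickle
import struct

vals = [float("inf"), float("-inf"), float("nan")]
out = pickle.loads(pickle.dumps(vals, protocol=2))

assert out[0] == math.inf and out[1] == -math.inf
assert math.isnan(out[2])

# The binary protocols carry the raw IEEE 754 double bytes, so even the
# exact NaN bit pattern survives the round trip:
assert struct.pack(">d", out[2]) == struct.pack(">d", vals[2])
```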

--Scott David Daniels
(e-mail address removed)
 

Grant Edwards

Christophe wrote:
> Your world is flawed then, this is a big mistake. NaN is the
> only acceptable return value for a division by zero.

You're probably right if you're talking about math, but I'm not
doing math. I'm doing engineering. In all of the situations
I've ever encountered, Inf was a much better choice.

Aside from Python, every FP library or processor I've ever used
returned Inf for divide by zero (which is the behavior required
by IEEE 754).

I need my Python programs to work the same way as everything
else.


http://standards.ieee.org/reading/ieee/interp/754-1985.html

In IEEE Std 754-1985, subclause 7.2- Division by Zero, it says:

"If the divisor is zero and the dividend is a finite nonzero
number, then the division by zero shall be signaled. The
result, when no trap occurs, shall be a correctly signed
(infinity symbol)(6.3)."

While this is apparently the convention decided on by the
committee, it is mathematically incorrect and it seems as if
it should have been designated as Not-a-Number, since
division by zero is mathematically undefined and implies that
0*infinity=1, which is patently absurd.

Why was this convention chosen instead of NaN, since it leads
to a further degradation of our children's math abilities,
given that the IEEE floating-point standard would be considered
to be authoritative on this subject, yet produces erroneous
results?

Interpretation for IEEE Std 754-1985

When a non-zero number is divided by a zero number, that is a
divide by zero. It is interpreted as an attempt to
take a limit of the ratio of two numbers as the denominator
becomes too small to be represented in the number system
while the numerator remains representable. Such a limit is best
represented by an infinity of the appropriate sign.

When zero is divided by zero, no such extrapolation can be
made. If it is caused by an attempt to take the limit of
the ratio of two numbers when both become too small to be
represented, then the limit cannot be determined. If it
is caused by some mistake in the programming, then no limit
exists. Thus, this case is thought to be invalid and a NaN
of appropriate sign is returned. (The sign is the only bit of
information that can be determined.)

While counter examples to the mathematical interpretation of
both of these results can be constructed they tend to be either
the result of extreme scaling or an attempt to evaluate a
non-analytic function. The former can be resolved by
rescaling. But, as the latter involve functions that cannot
(formally) be evaluated on a computer (without extreme effort
anyway) in the region of their non-analyticity, usually no good
solution exists.
 

Grant Edwards

> The NaN problem is portability -- NaN values are not standard,

My copy of IEEE 754 defines them quite precisely. :)

> and pretending they are won't help. There are many possible
> NaNs, several of which have desirable behaviors, and different
> processors (and Floating Point settings) choose different bit
> representations for those NaNs. There are at least: Inf,
> -Inf, NaN, Ind (Indeterminant).

I don't think +Inf and -Inf are NaNs (in IEEE 754
terminology). I think Pickle ought to handle them as well.

> Being able to pickle some of these will produce values that
> "don't behave right" on a different machine.

The values "don't behave right" now. Having them "behave
right" on 99.9% of the hosts in the world would be a vast
improvement.

> Up until now, I think you can send a pickle of a data
> structure and unpickle it on a different processor to get
> equivalent data.

No, you can't. NaN, Inf, and Ind floating point values don't
work.
 

Gary Herron

Christophe said:
> Your world is flawed then, this is a big mistake. NaN is the only
> acceptable return value for a division by zero.

Sorry, but this is not true.

The IEEE standard specifies (plus or minus) infinity as the result of
division by zero. This makes sense since such is the limit of division
by a quantity that goes to zero. The IEEE standard then goes on to
define reasonable results for arithmetic between infinities and real
values. The production of, and arithmetic on, infinities is a choice
that any application may want allow or not.

Gary Herron
 

Christophe

Grant Edwards a écrit :
> You're probably right if you're talking about math, but I'm not
> doing math. I'm doing engineering. In all of the situations
> I've ever encountered, Inf was a much better choice.

You should have been more precise then: "In my ideal world, when
dividing a non-zero value by a zero value, the result should be +Inf or
-Inf according to the sign rules"

On that point, you should also note that +0 and -0 are sometimes
considered two different floating point numbers in Python :)
 

Nick Maclaren

|> > Grant Edwards a écrit :
|> >
|> >> The division by zero trap is really annoying. In my world the
|> >> right thing to do is to return Inf.
|> >
|> > Your world is flawed then, this is a big mistake. NaN is the only
|> > acceptable return value for a division by zero.
|> >
|> Sorry, but this is not true.
|>
|> The IEEE standard specifies (plus or minus) infinity as the result of
|> division by zero. This makes sense since such is the limit of division
|> by a quantity that goes to zero. The IEEE standard then goes on to
|> define reasonable results for arithmetic between infinities and real
|> values. The production of, and arithmetic on, infinities is a choice
|> that any application may want allow or not.

That is true, and it is numerical nonsense. Christophe is right. Despite
Kahan's eminence as a numerical analyst, he is no software engineer. And
I am amazed at ANY engineer that can believe that inverting zero should
give infinity - while there ARE circumstances where it is correct, it is
NEVER a safe thing to do by default.

The reason is that it is correct only when you know for certain that the
sign of zero is meaningful. IEEE 754 gets this wrong by conflating true
zero, sign-unknown zero and positive zero. Inter alia, it means that
you get situations like:

A = 0.0; B = -A; C = B+0.0; A == B == C; 1/A != 1/B != 1/C;
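
In Python terms (using math.copysign to expose the hidden sign, since
Python itself traps the divisions):

```python
import math

A = 0.0
B = -A          # IEEE 754 negative zero
C = B + 0.0     # x + (+0.0) yields +0.0 for any zero x
assert A == B == C                  # all three compare equal...

# ...but B secretly carries a negative sign:
assert math.copysign(1.0, A) == 1.0
assert math.copysign(1.0, B) == -1.0
assert math.copysign(1.0, C) == 1.0

# On an IEEE system 1/A and 1/C give +inf while 1/B gives -inf;
# Python raises instead:
try:
    1.0 / B
except ZeroDivisionError:
    print("trapped")
```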


Regards,
Nick Maclaren.
 

Nick Maclaren

|>
|> > While you're at it, the pickle modules need to be fixed so they
|> > support NaN and Inf. ;)
|>
|> The NaN problem is portability -- NaN values are not standard, and
|> pretending they are won't help. There are many possible NaNs, several
|> of which have desirable behaviors, and different processors (and
|> Floating Point settings) choose different bit representations for
|> those NaNs. There are at least: Inf, -Inf, NaN, Ind (Indeterminant).

The main meaning of NaN is Indeterminate (i.e. it could be anything).
If you mean 'Missing', that is what was and is done in statistical
packages, and it follows very different rules. There are several other
meanings of NaN, but let's not go there, here - though I have a document
I could post if people are interested.

|> Being able to pickle some of these will produce values that "don't
|> behave right" on a different machine. Up until now, I think you can
|> send a pickle of a data structure and unpickle it on a different
|> processor to get equivalent data.

No, it could be done right. The unpickling would need to detect those
values and raise an exception. You have to allow for it even on the
same 'systems' because one Python might have been compiled with hard
underflow and one with soft. You really DON'T want to import denorms
into programs that don't expect them.
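
A sketch of the denormal point, assuming IEEE 754 doubles:

```python
import struct

tiny = 5e-324                       # smallest positive subnormal double
packed = struct.pack("<d", tiny)    # the 8 raw IEEE 754 bytes
assert struct.unpack("<d", packed)[0] == tiny   # survives exactly

# Such a value cannot arise at all under hard (flush-to-zero) underflow,
# yet a naive byte-for-byte unpickle would import it anyway:
assert tiny / 2 == 0.0              # halving it underflows to zero
assert tiny < 2.3e-308              # far below the smallest normal double
```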


Regards,
Nick Maclaren.
 

Grant Edwards

> You should have been more precise then: "In my ideal world, when
> dividing a non-zero value by a zero value, the result should be +Inf or
> -Inf according to the sign rules"

True. I've been dealing with IEEE 754 so long that I assume
things like that go without saying.
> On that point, you should also note that +0 and -0 are sometimes
> considered two different floating point numbers in Python :)

Different but equal.

[Don't tell the Supreme Court.]
 

Sébastien Boisgérault

Jeez, 12 posts in this IEEE 754 thread, and still
no message from uncle timmy ? ;)

Please, we need enlightenment here and *now* :)

platform-dependent accident'ly yours,

SB
 

Nick Maclaren

|>
|> The IEEE standard specifies (plus or minus) infinity as the result of
|> division by zero. This makes sense since such is the limit of division
|> by a quantity that goes to zero. The IEEE standard then goes on to
|> define reasonable results for arithmetic between infinities and real
|> values. The production of, and arithmetic on, infinities is a choice
|> that any application may want allow or not.

The mistake you have made (and it IS a mistake) is in assuming that the
denominator approaches zero from the direction indicated by its sign.
There are many reasons why it is likely to not be, but let's give only
two:

It may be a true zero - i.e. a count that is genuinely zero, or
the result of subtracting a number from itself.

It may be a negative zero that has had its sign flipped by an
artifact of the code. For example:

lim(x->0 from above) 0.001*b/(a-1.001*a)

I fully agree that infinity arithmetic is fairly well-defined for
most operations, but it most definitely is not in this case. It should
be reserved for when the operations have overflowed.


Regards,
Nick Maclaren.
 

Grant Edwards

> Jeez, 12 posts in this IEEE 754 thread, and still no message
> from uncle timmy ? ;)
>
> Please, we need enlightenment here and *now* :)

What we need is fewer people like me who do nothing but
complain about it...
 

Grant Edwards

|>
|> The IEEE standard specifies (plus or minus) infinity as the result of
|> division by zero. This makes sense since such is the limit of division
|> by a quantity that goes to zero. The IEEE standard then goes on to
|> define reasonable results for arithmetic between infinities and real
|> values. The production of, and arithmetic on, infinities is a choice
|> that any application may want allow or not.

> The mistake you have made (and it IS a mistake) is in assuming
> that the denominator approaches zero from the direction
> indicated by its sign.

I assume the "you" in that sentence refers to the IEEE FP
standards group. I just try to follow the standard, but I have
found that the behavior required by the IEEE standard is
generally what works best for my applications.

> There are many reasons why it is likely to not be, but let's give only
> two:
>
> It may be a true zero - i.e. a count that is genuinely zero, or
> the result of subtracting a number from itself.

I do real-world engineering stuff with measured physical
quantities. There generally is no such thing as "true zero".

> I fully agree that infinity arithmetic is fairly well-defined for
> most operations, but it most definitely is not in this case. It should
> be reserved for when the operations have overflowed.

All I can say is that 1/0 => Inf sure seems to work well for
me.
 

Nick Maclaren

|>
|> I assume the "you" in that sentence refers to the IEEE FP
|> standards group. I just try to follow the standard, but I have
|> found that the behavior required by the IEEE standard is
|> generally what works best for my applications.

Well, it could be, but actually it was a reference to the sentence "This
makes sense since such is the limit of division by a quantity that goes
to zero."

|> I do real-world engineering stuff with measured physical
|> quantities. There generally is no such thing as "true zero".

It is extremely unusual for even such programs to use ONLY continuous
interval scale quantities, but they might dominate your usage, I agree.
Such application areas are very rare, but do exist. For example, I can
tell that you don't use statistics in your work, and therefore do not
handle events (including the analysis of failure rates).

|> > I fully agree that infinity arithmetic is fairly well-defined for
|> > most operations, but it most definitely is not in this case. It should
|> > be reserved for when the operations have overflowed.
|>
|> All I can say is that 1/0 => Inf sure seems to work well for me.

Now, can you explain why 1/0 => -Inf wouldn't work as well? I.e. why
ALL of your zeroes, INCLUDING those that arise from subtractions,
are known to be positive?

If you can, then you have a case (and an EXTREMELY unusual application
domain). If you can't, then I am afraid that your calculations are
unreliable, at best.

The point here is that +infinity is the correct answer when the zero is
known to be a positive infinitesimal, just as -infinity is when it is
known to be a negative one. NaN is the only numerically respectable
result if the sign is not known, or it might be a true zero.
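
The decimal module, which does distinguish the sign of zero, can
replay this trichotomy (an illustration only, not Nick's proposal):

```python
from decimal import Decimal, localcontext, DivisionByZero, InvalidOperation

with localcontext() as ctx:
    ctx.traps[DivisionByZero] = False
    ctx.traps[InvalidOperation] = False
    # The sign of the zero picks the sign of the infinity...
    assert str(Decimal(1) / Decimal("0")) == "Infinity"
    assert str(Decimal(1) / Decimal("-0")) == "-Infinity"
    # ...and when not even the sign can be trusted (0/0), only NaN is left:
    assert str(Decimal(0) / Decimal(0)) == "NaN"
```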


Regards,
Nick Maclaren.
 

Christophe

Nick Maclaren a écrit :
> Now, can you explain why 1/0 => -Inf wouldn't work as well? I.e. why
> ALL of your zeroes, INCLUDING those that arise from subtractions,
> are known to be positive?

I would say that the most common reason people assume 1/0 = Inf is
probably because they do not make use of negative numbers or they forgot
they exist at all.
 
