Nick Maclaren
|> In article <[email protected]>,
|> (e-mail address removed) writes:
|>
|> > This makes no sense, as the outcome of the operation is undefined and
|> > should be NaN.
|> > max(NaN,0.) = NaN
|>
|> Why?
|>
|> > After researching, it appears the first outcome is accepted behavior,
|> > and might be included in the revised IEEE 754 standard, which affects
|> > not only Fortran. The discussion posted at
|> > www.cs.berkeley.edu/~ejr/Projects/ieee754/meeting-minutes/02-11-21.html#minmax
|> > suggests that "There is no mathematical reason to prefer one reason to
|> > another."
|>
|> Don't take this the wrong way. But, the members of the IEEE754
|> committee probably have much more experience than you (and
|> many of the people here in c.l.f) in floating point mathematics.
|> If they came to the conclusion that
|>
|> "There is no mathematical reason to prefer one reason to another."
|>
|> then you may want to pay attention to them, and guard against
|> suspect comparisons.
Well, it is unlikely that any of them have more experience than me in
practical numerical software engineering, though I am less than 1% of
the numerical analyst that Kahan is, to give just one example. Note
that I am not saying that some of them don't have comparable experience,
though I can't think of any offhand.
They are quite simply wrong. There IS a very strong mathematical
argument, and the reasons for making that statement are dogmatism, not
science. I give the reasons for the DECISION below, but that is a
consequent matter. It is the reason for making that STATEMENT I am
talking about above.
The background is that traditional design started by creating a fairly
precise mathematical model, and then deriving the operations to fit in
with that model. This maximises the possibilities of reasoning about
the behaviour of the program (e.g. validation, program proving, software
engineering, etc. etc.) at the expense of restricting flexibility.
The alternative approach is to start with the facilities, maximise
them for flexibility, and let the overall model lie where it falls.
This maximises the flexibility of the design, at the cost of making
static validation somewhere between harder and impossible.
One of Kahan's papers points out that IEEE 754 did the latter, and that
it was a deliberate deviation from the traditional approach.
Jumping aside, the need for missing values occurs almost entirely in
vector or other composite operations, and EVERY language that has
supported them needs BOTH semantics. In particular, the requirement
order for the operations in statistics is:
    Count non-missing values in vector
    Sum non-missing values in vector
    Take minimum/maximum of non-missing values in vector
    Take product of non-missing values in vector
    Derived operations and more esoteric ones
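As a rough illustration of the first four operations, here is a minimal Python sketch. It assumes, purely for the sake of the example, that a missing value is encoded as a NaN; the function names are hypothetical, and returning NaN for an all-missing minimum or maximum is one possible convention, not a standard one.

```python
import math

def non_missing(values):
    """Return the values that are not NaN (NaN stands in for 'missing' here)."""
    return [v for v in values if not math.isnan(v)]

def count_present(values):
    """Count the non-missing values."""
    return len(non_missing(values))

def sum_present(values):
    """Sum the non-missing values; an all-missing vector sums to 0.0."""
    return math.fsum(non_missing(values))

def min_present(values):
    """Minimum of the non-missing values, or NaN if there are none (assumed convention)."""
    present = non_missing(values)
    return min(present) if present else math.nan

def max_present(values):
    """Maximum of the non-missing values, or NaN if there are none (assumed convention)."""
    present = non_missing(values)
    return max(present) if present else math.nan

def prod_present(values):
    """Product of the non-missing values; an all-missing vector gives 1."""
    return math.prod(non_missing(values))

sample = [1.0, math.nan, 3.0, 2.0]
print(count_present(sample), sum_present(sample),
      min_present(sample), max_present(sample), prod_present(sample))
```

All five functions skip the missing entries rather than letting them poison the result, which is the opposite of what ordinary IEEE 754 arithmetic does with a NaN operand.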
Also, EVERY language needs BOTH semantics, according to context. For
example, in the following:
    top = max(max(vector_A,vector_B))
top should be the maximum of the elements of vector_A and vector_B
where BOTH of a pair are non-missing. Look at any decent book on
statistics or good statistical package for ample evidence.
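The two semantics in that one-liner can be sketched as follows, again in Python and again assuming NaN as a stand-in for a missing value. The helper names are made up for the illustration: the inner, elementwise max treats a pair as missing if either element is missing, while the outer reduction skips missing entries.

```python
import math

def pairwise_max_propagating(a, b):
    """Elementwise max; the result is missing (NaN) if either of the pair is."""
    return [math.nan if (math.isnan(x) or math.isnan(y)) else max(x, y)
            for x, y in zip(a, b)]

def max_skipping(values):
    """Maximum over the non-missing entries, or NaN if every entry is missing."""
    present = [v for v in values if not math.isnan(v)]
    return max(present) if present else math.nan

vector_a = [1.0, math.nan, 3.0, 9.0]
vector_b = [2.0, 7.0, 0.5, math.nan]

# Pairs (NaN, 7.0) and (9.0, NaN) are dropped, so 7.0 and 9.0 do not count.
top = max_skipping(pairwise_max_propagating(vector_a, vector_b))
print(top)  # 3.0
```

Using the skipping form for both steps would give 9.0 here, and using the propagating form for both would give a missing result, so neither semantics alone produces the intended answer.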
IEEE 754 NaNs are VERY clearly indications of 'invalid' values (though
even that has several interpretations, but the subtleties are irrelevant
to this). If they were to be treated as missing values, then it is
immediately clear that NaN+1.23 should be 1.23. No ifs or buts.
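To make the contrast concrete, the short Python sketch below compares ordinary IEEE 754 addition with a hypothetical missing-value addition. The function `add_skipping_missing` is an invention for this example, not part of any standard or library.

```python
import math

def add_skipping_missing(x, y):
    """Addition under 'missing value' semantics: a missing operand contributes nothing.

    Hypothetical helper; NaN is used as the missing-value marker.
    If both operands are missing, the result is missing as well.
    """
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x + y

ieee_result = math.nan + 1.23                             # IEEE 754: NaN propagates
missing_result = add_skipping_missing(math.nan, 1.23)     # missing-value reading

print(math.isnan(ieee_result))  # True
print(missing_result)           # 1.23
```

The first result is what the hardware actually delivers; the second is what consistency would demand if a NaN really meant "missing".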
I have a paper on this somewhere, which I have circulated but not
published, if anyone is interested.
Regards,
Nick Maclaren.