std::norm defect?


Kai-Uwe Bux

SG said:
That's why I tried to give reasons for that. I'm sorry if it sounded
condescending.


That's not what I tried to communicate. I tried to present an
explanation for it. I guess I failed.

Now, rereading what you wrote, you actually succeeded :)
I merely said that the last part of it should be obvious. The last
part of the reasoning compared
real * real * sqr( imag/real )
to
imag * imag

Oops, I see. Sorry for the noise.

Taking both parts of the argument together, it provides a pretty good a
priori reason to think that norm_c should be better than norm_b. I guess
that is what you wanted to convey and what I failed to realize.
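
For the record, a minimal sketch of the two formulations as I read them;
I'm assuming norm_b is the plain sum of squares and norm_c the variant
that factors out one component (the names and exact definitions come from
earlier in the thread, so treat this as illustrative only):

#include <algorithm>
#include <cmath>
#include <complex>

// Presumed "norm_b": the straightforward formula.  The squares can
// underflow or overflow even when the result itself is representable.
template <typename T>
T norm_b(const std::complex<T>& z)
{
    return z.real() * z.real() + z.imag() * z.imag();
}

// Presumed "norm_c": factor out the larger component, so the only
// squared quantity is (r/s) <= 1.  Mathematically s*s*(1 + (r/s)^2)
// equals s*s + r*r; the comparison quoted above is the second term
// of this form when the real part happens to be the one factored out.
template <typename T>
T norm_c(const std::complex<T>& z)
{
    const T x = std::abs(z.real());
    const T y = std::abs(z.imag());
    const T s = std::max(x, y);
    const T r = std::min(x, y);
    if (s == T())
        return s;                      // avoid 0/0 for z == 0
    return s * s * (T(1) + (r / s) * (r / s));
}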


Best

Kai-Uwe Bux
 

Zeppe

highegg wrote [02/03/09 20:53]:
That is exactly what I meant by taking the standard too literally.
std::abs does not, and can not, return the magnitude of x, because in
most cases the magnitude of x is not representable. You should
interpret the standard here as "return an approximation to..." because
nothing else makes sense. With that interpretation, of course, you
lose your reasoning.

Yes, I was referring to consistency between functions. But I agree, this
is not mandatory anyway, I guess.

I don't think so. The implementation that is turned on by ffast-math
is not violating anything.
But this is what this dispute is all about.

This is the original post in which the gcc implementers discussed the
issue: http://gcc.gnu.org/ml/gcc/2001-10/msg01593.html

You'll find out that the issue that convinced them to leave your
implementation as an option is a loss of precision in certain cases.
You can find numerous applications for sumsq. Nearest neighbour
identification, for instance. Not commonly done in complex spaces,
admittedly.
It's difficult to show real-life examples on demand, but I think most
Octave users would agree that a factor-8.3 speed-up for a core
reduction function is worth the trouble, even if they don't use it
every other day.

I partly agree, but I place a lot of value on "robust" functions that behave
consistently most of the time. I mean, consistently with common
sense rather than with the standard's clauses.
You can only expect that if it is defined that way.

Fair enough, but again: if the standard leaves me the freedom to
implement the function in two different ways, I'm not sure I'd go for
the optimised one when another one would give me a more robust result.

Best wishes,

Zeppe
 

highegg

highegg wrote [02/03/09 20:53]:
That is exactly what I meant by taking the standard too literally.
std::abs does not, and can not, return the magnitude of x, because in
most cases the magnitude of x is not representable. You should
interpret the standard here as "return an approximation to..." because
nothing else makes sense. With that interpretation, of course, you
lose your reasoning.

Yes, I was referring to consistency between functions. But I agree, this
is not mandatory anyway, I guess.
I don't think so. The implementation that is turned on by ffast-math
is not violating anything.
But this is what this dispute is all about.

This is the original post in which the gcc implementers discussed the
issue: http://gcc.gnu.org/ml/gcc/2001-10/msg01593.html

You'll find out that the issue that convinced them to leave your
implementation as an option is a loss of precision in certain cases.

I see. Good lord. Thanks a billion for that link. So there *is* a
(minuscule) numerical advantage after all. Thinking about underflow
didn't occur to me. As if I didn't know xLASSQ from LAPACK! What a
blunder.
Still, the current implementation is sub-optimal, because doing the
sqrt is really not necessary. It also makes the result less smooth.
Hmmm. But I suppose I'm free to propose a better implementation (if
only I had the time. sigh.)

I wonder what to do for Octave's sumsq, though. I don't want to lose
the factor 8.3 in performance. Doing a splitting point trick will
solve this easily in the column reduction case, but seems painful for
row reduction. hmmmmmm. It's exactly like the min/max vs. NaNs issue.

anyway, many thanks for the insight.
 

SG

This is the original post in which the gcc implementers discussed the
issue: http://gcc.gnu.org/ml/gcc/2001-10/msg01593.html

You'll find out that the issue that convinced them to leave your
implementation as an option is a loss of precision in certain cases.

Interesting. Still, I'd like to point out that the cases they tested
were dangerous corner cases anyways. Everyone should try hard to
avoid de-normalized FP numbers. They optimized these corner cases at
the cost of reduced accuracy and performance for 99.9999999% of all
the other cases. Looks like a bad choice to me.

Cheers!
SG
 

highegg

Interesting. Still, I'd like to point out that the cases they tested
were dangerous corner cases anyways.  Everyone should try hard to
avoid de-normalized FP numbers.  They optimized these corner cases at
the cost of reduced accuracy and performance for 99.9999999% of all
the other cases. Looks like a bad choice to me.

Cheers!
SG

SG,
it really depends. You see, in languages like C++ or Fortran,
encapsulated functions for seemingly simple operations typically
prefer numerical robustness, because programmers can (usually)
relatively easily plug in the straightforward formula if they're only
interested in performance, as user-coded functions carry no penalty
compared to built-in functions.

One instance is LAPACK's LASSQ, which uses a sum-and-scale approach to
calculating a sum of squares, precisely to avoid underflows, or even
overflows (in some cases it is not even necessary to form the result
explicitly).
A user not interested in these issues can easily code a
straightforward loop (or an array reduction in Fortran 90).
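
For anyone who hasn't seen it, a rough C++ sketch of the sum-and-scale
idea (the name and interface here are made up for illustration, not
LAPACK's actual routine):

#include <cmath>
#include <cstddef>

// Represent the running sum of squares as scale^2 * sumsq and divide
// every element by the current scale before squaring, so no
// intermediate square can underflow or overflow.
// Start an accumulation with scale = 0.0 and sumsq = 1.0.
void sumsq_scaled(const double* x, std::size_t n,
                  double& scale, double& sumsq)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        const double a = std::fabs(x[i]);
        if (a == 0.0)
            continue;
        if (scale < a)
        {
            const double t = scale / a;   // rescale what we have so far
            sumsq = 1.0 + sumsq * t * t;
            scale = a;
        }
        else
        {
            const double t = a / scale;
            sumsq += t * t;
        }
    }
}

The sum of squares is then scale*scale*sumsq and the 2-norm is
scale*sqrt(sumsq), which is how the possibly overflowing or underflowing
value can be left unformed when only the norm is wanted.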

However, in interpreted languages like Octave (which I'm a co-
developer of, and which brought me here), the situation is different,
because user-coded functions typically *do* carry a performance
penalty compared to built-in functions, often a significant one (and
Octave still has no JIT :-(). That's why I intend to leave the
behavior of sumsq in the current state, using the straightforward
formula instead of std::norm. Those who prefer accuracy should use
"norm" and square the result.
The funny thing is that I actually faced the same issue before when
reimplementing "norm" for Octave 3.2 (which will avoid any underflows
and overflows etc). I just failed to see it this time.

Still, the std::norm implementation in gcc *can* be improved; the
square root can indeed be avoided, while avoiding the underflows as
well. It's just not that easy.
If I find the time, I'll send them a patch for that part.

many thanks to all who responded
 

SG

I see. Good lord. Thanks a billion for that link. So there *is* a
(minuscule) numerical advantage after all. Thinking about underflow
didn't occur to me.

I see that you revoked your GCC bug report. I was just about to
support you on this. :)

The value of this "numerical advantage" is at least questionable.
This implementation only improves the accuracy in a rather small
subset of corner cases that are dangerously close to "underflow world"
anyways. In all other cases the precision is worse.

IMHO, this doesn't justify an implementation of std::norm that is also
slower.

Cheers!
SG
 

SG

SG,
it really depends. You see, in languages like C++ or Fortran,
encapsulated functions for seemingly simple operations typically
prefer numerical robustness, [...]

An instance is LAPACK's LASSQ, which uses a sum-and-scale approach to
calculating sum of squares, precisely to avoid underflows, or even
overflows (in some cases it is not even necessary to form the result
explicitly).

This is all fine and dandy. I would agree with you that it's worth
the hassle to prevent underflows/overflows for things like std::abs.
A std::abs function with overflow/underflow prevention effectively
doubles the range of cases it handles with reasonable accuracy. Instead of
only supporting exponents in the range of -511 ... 511 (rough
estimate), it will also work in cases where numbers with exponents in the
range -1022 ... 1022 are involved (assuming IEEE 754 doubles). That's
a big difference that is worth mentioning.
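
To make the exponent argument concrete, this is the kind of scaled
std::abs I have in mind (a sketch; the function name is mine, not
libstdc++'s):

#include <algorithm>
#include <cmath>
#include <complex>

// Hypot-style magnitude with overflow/underflow prevention: factor out
// the larger component so the only squared quantity is <= 1.  With
// IEEE 754 doubles, x*x + y*y already overflows once the exponents
// reach roughly +512, and loses everything to underflow near -512;
// the scaled form keeps working over nearly the full exponent range.
double abs_scaled(const std::complex<double>& z)
{
    const double x = std::fabs(z.real());
    const double y = std::fabs(z.imag());
    const double s = std::max(x, y);
    const double r = std::min(x, y);
    if (s == 0.0)
        return 0.0;
    const double q = r / s;            // q <= 1, so q*q cannot overflow
    return s * std::sqrt(1.0 + q * q);
}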

But I don't see where a complicated std::norm like that helps
anybody. If your program works with numbers/vectors where the squared
norm is almost de-normalized you have a problem either way. A
complicated version of std::norm isn't going to help here much except
that it extends "its domain of not-sucking" by only a teensy bit.
That's not even worth mentioning, IMHO.

simple std::norm version:
-------------------------
PRO: better accuracy in most cases
PRO: fast
CON: bad accuracy on a tiny fraction of cases
     that are dangerously close to values where
     even the complicated version won't work

complicated std::norm version:
------------------------------
PRO: Underflow prevention for an insignificant
     fraction of cases (not really a PRO, is it?)
CON: slow
CON: worse overall accuracy compared to simple
     version for almost every case.


Am I making sense?


Cheers!
SG
 

Martin Eisenberg

SG said:

Interesting. Still, I'd like to point out that the cases they
tested were dangerous corner cases anyways.

In approximating functions, when one makes the right thing happen at
the esoteric extremes it's not for their own sake but because it
typically constrains deviations at not quite so far-out but still
large exponents.
Everyone should try hard to avoid de-normalized FP numbers. They
optimized these corner cases at the cost of reduced accuracy and
performance for 99.9999999% of all the other cases.

It's true that formulations with a lower incidence of denormalized
intermediates are preferable, but denormals exist for when that won't
suffice. Do you feel std library providers are entitled to ignore
them?

A way to implement norm() with good accuracy everywhere and no
complicated operations would be to apply a resumming dot-product
algorithm, such as presented in the paper at
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547 .
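
A minimal sketch of that idea applied to norm(), following the spirit of
the compensated dot product in that paper rather than its exact
algorithm (and note it targets rounding error, not squares that
underflow completely):

#include <cmath>
#include <complex>
#include <initializer_list>

// Compensated sum of squares: each square is split into its rounded
// value p and the exact rounding error e (via fma), and the additions
// are done with Knuth's TwoSum so their errors are recovered as well.
double norm_compensated(const std::complex<double>& z)
{
    double s = 0.0;   // main accumulator
    double c = 0.0;   // accumulated rounding errors
    for (double v : { z.real(), z.imag() })
    {
        const double p = v * v;
        const double e = std::fma(v, v, -p);  // exact error of v*v
        // TwoSum(s, p):
        const double t  = s + p;
        const double pv = t - s;
        const double d  = (s - (t - pv)) + (p - pv);
        s = t;
        c += d + e;
    }
    return s + c;      // apply the accumulated correction once
}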


Martin
 

highegg

SG,
it really depends. You see, in languages like C++ or Fortran,
encapsulated functions for seemingly simple operations typically
prefer numerical robustness, [...]
An instance is LAPACK's LASSQ, which uses a sum-and-scale approach to
calculating sum of squares, precisely to avoid underflows, or even
overflows (in some cases it is not even necessary to form the result
explicitly).

This is all fine and dandy.  I would agree with you that it's worth
the hassle to prevent underflows/overflows for things like std::abs.
A std::abs function with overflow/underflow prevention effectively
doubles the range of cases it handles with reasonable accuracy. Instead of
only supporting exponents in the range of -511 ... 511 (rough
estimate), it will also work in cases where numbers with exponents in the
range -1022 ... 1022 are involved (assuming IEEE 754 doubles). That's
a big difference that is worth mentioning.

But I don't see where a complicated std::norm like that helps
anybody.  If your program works with numbers/vectors where the squared
norm is almost de-normalized you have a problem either way.  A
complicated version of std::norm isn't going to help here much except
that it extends "its domain of not-sucking" by only a teensy bit.
That's not even worth mentioning, IMHO.

simple std::norm version:
-------------------------
PRO: better accuracy in most cases
PRO: fast
CON: bad accuracy on a tiny fraction of cases
     that are dangerously close to values where
     even the complicated version won't work

complicated std::norm version:
------------------------------
PRO: Underflow prevention for an insignificant
     fraction of cases (not really a PRO, is it?)
CON: slow
CON: worse overall accuracy compared to simple
     version for almost every case.

Am I making sense?

Just an update: I've sent the GCC developers the following patch to
the "complex" header:

Index: libstdc++-v3/include/std/complex
===================================================================
--- libstdc++-v3/include/std/complex    (revision 144596)
+++ libstdc++-v3/include/std/complex    (working copy)
@@ -649,8 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE(std)
      template<typename _Tp>
        static inline _Tp _S_do_it(const complex<_Tp>& __z)
        {
-          _Tp __res = std::abs(__z);
-          return __res * __res;
+          const _Tp __x = abs(__z.real());
+          const _Tp __y = abs(__z.imag());
+          const _Tp __s = std::max(__x, __y);
+          const _Tp __r = std::min(__x, __y);
+          if (__s == _Tp())
+            return __s;
+          return __s * (__s + __r * (__r / __s));
        }
    };

This seems to win over the present implementation in terms of both
performance and accuracy. It's still not as fast as the
straightforward formula, but I'm now convinced that it *is* a good
idea to avoid the underflow issues.
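
For what it's worth, a tiny test along these lines shows the kind of
corner the factored form rescues (the constants are purely
illustrative):

#include <algorithm>
#include <cmath>
#include <complex>
#include <iostream>

int main()
{
    // Both components are about 1.2e-162, so each square (~1.4e-324)
    // falls below the smallest subnormal double and flushes to zero,
    // while the true norm (~2.9e-324) still rounds to a nonzero value.
    const std::complex<double> z(1.2e-162, 1.2e-162);

    const double plain = z.real() * z.real() + z.imag() * z.imag();

    const double s = std::max(std::fabs(z.real()), std::fabs(z.imag()));
    const double r = std::min(std::fabs(z.real()), std::fabs(z.imag()));
    const double factored = (s == 0.0) ? s : s * (s + r * (r / s));

    std::cout << plain << '\n'       // prints 0
              << factored << '\n';   // prints the smallest subnormal, ~4.9e-324
}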

best regards
 

SG

Just an update: I've sent the GCC developers the following patch to
the "complex" header:

Index: libstdc++-v3/include/std/complex
===================================================================
--- libstdc++-v3/include/std/complex    (revision 144596)
+++ libstdc++-v3/include/std/complex    (working copy)
@@ -649,8 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE(std)
       template<typename _Tp>
         static inline _Tp _S_do_it(const complex<_Tp>& __z)
         {
-          _Tp __res = std::abs(__z);
-          return __res * __res;
+          const _Tp __x = abs(__z.real());
+          const _Tp __y = abs(__z.imag());
+          const _Tp __s = std::max(__x, __y);
+          const _Tp __r = std::min(__x, __y);
+          if (__s == _Tp())
+            return __s;
+          return __s * (__s + __r * (__r / __s));
         }
     };

This is me just nit-picking: You used qualified names for max and min.
In generic user code this is discouraged because it prevents ADL. But
ADL is all you have left if you want min/max calls to resolve to your
own function templates:

namespace mymath
{
    template<typename Integer> class rational;

    template<typename Integer>
    rational<Integer> min(rational<Integer> const& a,
                          rational<Integer> const& b);
}

This is the only possibility because you can't partially specialize
std::min and std::max. Also, overloading them in the namespace "std"
is illegal as far as the C++ standard is concerned (it usually works,
though).

So, in generic user code (some other namespace than std) you would
write:

template<typename T>
T do_it(const complex<T>& z)
{
    using std::min;
    using std::max;
    ...
    const T s = max(x, y); // ADL still works
}

If I remember correctly this is covered in "Effective C++" by Scott
Meyers.


Cheers!
SG
 
