subnormal floating point numbers

ultimatewarrior

Hi all,
first of all I beg your pardon if this question has been asked
before, but I was unable to find anything in the past posts.

I have written a piece of code that was supposed to be quite portable,
and uses a lot of fp numbers. Everything goes well on PPC CPUs, but on
some x86 CPUs I get a dramatic loss of performance. After some
investigation I've discovered that the problem is caused by some
numbers becoming subnormal. PPC CPUs seem to treat them quietly,
without any significant loss of speed, whereas on some x86 CPUs I get
really poor results as soon as these "strange" entities begin to
pop out...

Does someone know if there is a way to disable subnormal floating
point numbers, maybe using standard c/c++ libs? I don't need them and
for me it will be just ok to have a small quantity dropped to zero -
in fact, the code is written assuming implicitly that very small
numbers will eventually become zero.

I hope that there is a simple solution for the problem, apart from
reviewing every single line of code and putting a lot of
if(issubnormal...) or whatever else...

Thanx in advance to everybody that can/will help me...
 
Walter Roberson

ultimatewarrior said:
Does someone know if there is a way to disable subnormal floating
point numbers, maybe using standard c/c++ libs?

Not using C89; I don't -think- it's part of C99 but I'd have to
check.

The manner in which floating point underflow is handled is
system-specific, and is often more complicated than just "on" or "off".
Sorry, I can't point to 'typical' OS calls for this for the x86;
in SGI IRIX, it is handle_sigfpes() but that's probably not a common
name for it.
 
Ben Pfaff

ultimatewarrior said:
Does someone know if there is a way to disable subnormal floating
point numbers, maybe using standard c/c++ libs? I don't need them and
for me it will be just ok to have a small quantity dropped to zero -
in fact, the code is written assuming implicitly that very small
numbers will eventually become zero.

Recent x86 processors do support "flush-to-zero" and
"denormals-are-zeros" modes of operations for SSE-based
arithmetic, but standard C doesn't provide an interface to them.
If you want to access these modes, you'll have to use an
implementation-specific extension. (Note that many compilers
don't use SSE for arithmetic by default anyhow.)
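
For what it's worth, on compilers that ship the SSE intrinsic headers
(gcc, MSVC and Intel's compiler all do), such an extension might look
something like the sketch below. This assumes that toolchain; it is not
standard C, and it only affects SSE arithmetic, not the x87 FPU:

#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE (SSE) */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (needs an SSE2/SSE3-era CPU) */

/* Sketch: ask the SSE unit to flush underflowed results to zero and to
   treat denormal inputs as zero. Function name and placement are
   illustrative only. */
void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}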
 
Malcolm McLean

ultimatewarrior said:
Hi all,
first of all I beg your pardon if this question has been asked
before, but I was unable to find anything in the past posts.

I have written a piece of code that was supposed to be quite portable,
and uses a lot of fp numbers. Everything goes well on PPC CPUs, but on
some x86 CPUs I get a dramatic loss of performance. After some
investigation I've discovered that the problem is caused by some
numbers becoming subnormal. PPC CPUs seem to treat them quietly,
without any significant loss of speed, whereas on some x86 CPUs I get
really poor results as soon as these "strange" entities begin to
pop out...

Does someone know if there is a way to disable subnormal floating
point numbers, maybe using standard c/c++ libs? I don't need them and
for me it will be just ok to have a small quantity dropped to zero -
in fact, the code is written assuming implicitly that very small
numbers will eventually become zero.

I hope that there is a simple solution for the problem, apart from
reviewing every single line of code and putting a lot of
if(issubnormal...) or whatever else...

Thanx in advance to everybody that can/will help me...
They are normally called denormalised numbers.
If the code is causing a lot of numbers to go very close to zero there might
well be something wrong with it numerically. That is to say, the mathematics
are accurate but demand unreasonable machine precision. However not
necessarily, and you may not want to rewrite functioning code.
Putting in a lot of "if < epsilon then set to zero" lines is a horrible
answer that you want to avoid if possible, but it might be the only way.
There might be a compiler switch. Search for "denormalise" in your
documentation.
 
ultimatewarrior

Malcolm McLean said:
They are normally called denormalised numbers.
If the code is causing a lot of numbers to go very close to zero there might
well be something wrong with it numerically. That is to say, the mathematics

Well, I don't think so. In my code there are many statements like:

value -= value * factor * DeltaTime;

where factor * DeltaTime is less than unity. Such statements control
the decay of some variables and are a simple implementation of many
natural decay processes.
I'm not worried about the fact that value will eventually become too
small; in fact, for my code it's just fine if at some point value becomes
zero (or stops decaying from a very small value because value * factor
* DeltaTime is too small and becomes zero). So I'm not demanding
"unreasonable precision", I simply need a variable that progressively
goes to zero (and when it's close to zero, is treated as such!).

Unfortunately such a piece of code is probably responsible for
generating subnormal numbers, and it seems that some x86 chips (P4
especially) are really bad at dealing with such numbers:

http://www.cygnus-software.com/papers/x86andinfinity.html

...and it seems that there is no way to disable subnormals on x87
FPUs, so no compiler switch may exist... :(
 
ultimatewarrior

I've probably found a workaround: using SSE instead of the standard
FPU, you can control (through the MXCSR register) how subnormal
numbers are dealt with, and you can instruct the CPU to treat them as
zeros (exactly what I need). Unfortunately you need SSE, because the
x87 FPU has no control over this (OK, SSE has been there since the
PIII and Athlon XP, so I think I'll drop support for older CPUs...).
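
Concretely, the two relevant MXCSR bits are FTZ (bit 15, flush results
that underflow to zero) and DAZ (bit 6, treat denormal operands as
zero). A sketch of setting them with the _mm_getcsr/_mm_setcsr
intrinsics from <xmmintrin.h> follows; this is non-standard, and DAZ
only exists on SSE2-era CPUs (setting bit 6 on a plain PIII will fault),
so on older chips set only FTZ:

#include <xmmintrin.h>

#define MXCSR_DAZ (1u << 6)    /* denormals-are-zero (SSE2-era CPUs only) */
#define MXCSR_FTZ (1u << 15)   /* flush-to-zero */

/* Illustrative helper: OR the two bits into the current MXCSR value. */
void sse_flush_denormals(void)
{
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
}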

Does anyone have experience with issues regarding the use of SSE for
floats (instead of the standard FPU)?

(BTW, thanx for your kind answers!)
 
Pierre Asselin

ultimatewarrior said:
I've probably found a workaround [ to slow denormals ]: using SSE
instead of the standard FPU [ ... ]
Does anyone have experience with issues regarding the use of SSE for
floats (instead of the standard FPU)?

Just that it slows my code down a little, compared to x87 instructions.
Also, I have the same problem: my code generates denormals and
slows down to a crawl.

In my case, I have lots of unit vectors (x,y,z), x^2+y^2+z^2 = 1.
The slowdown occurs when the vectors line up along one of the
coordinate axes, e.g. (1.0, denormal, denormal). That's completely
normal and a hard underflow to (1.0, 0.0, 0.0) would suit me just
fine. At the moment, I test for small numbers in one of my routines
and manually flush them to zero :( I consider myself lucky that
my code has a natural place to perform this test.
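
The manual flush is nothing fancy; a sketch along these lines, where the
float[3] representation, the function name and the 1e-30f cutoff are
just illustrative choices (anything comfortably above FLT_MIN but far
below real data works):

#include <math.h>

/* Zero out vector components whose magnitude is too small to matter. */
static void flush_tiny_components(float v[3])
{
    int i;
    for (i = 0; i < 3; i++)
        if (fabsf(v[i]) < 1e-30f)
            v[i] = 0.0f;
}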

For this code, I wish I could disable the FPU's denormals and just
flush to zero but I don't see how. C99 has better IEEE-754 support
than C89, but still not enough. That darned standard is pretty
hard to integrate with high-level languages.

Thanks for the cygnus url, BTW.
 
Ernie Wright

ultimatewarrior said:
Well, I don't think so. In my code there are many statements like:

value -= value * factor * DeltaTime;

That would be exactly what Malcolm was talking about. Repeated addition
and subtraction tends to accumulate roundoff error. I'd probably write
this as

value *= 1.0 - factor * DeltaTime;

Mathematically equivalent, same number of operations, but numerically
more stable, and might be slightly faster since one of the terms is a
constant. It's also a little clearer that 'value' goes to 0 when

factor * DeltaTime >= 1.0

A different 'factor' would allow you to eliminate the subtraction. If
'factor' and 'DeltaTime' aren't changing, you could precalculate a
decay_constant that eliminates a multiply,

value *= decay_constant;

But if that's the case, then mathematically, 'value' *never* reaches 0
(except trivially when decay_constant == 0) and computationally, it
*always* decays to a denormal. You might save thousands of iterations
and eliminate your denormal problem by explicitly setting 'value' to 0
when it gets "small enough."
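
A sketch of that last suggestion, with a made-up 1e-20f cutoff (anything
well below what the simulation cares about but well above FLT_MIN,
roughly 1.2e-38, should do):

#include <math.h>

/* One decay step that never lets 'value' wander into the denormal range. */
static float decay_step(float value, float decay_constant)
{
    value *= decay_constant;
    if (fabsf(value) < 1e-20f)   /* "small enough": illustrative threshold */
        value = 0.0f;
    return value;
}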

- Ernie http://home.comcast.net/~erniew
 
user923005

Ernie Wright said:
That would be exactly what Malcolm was talking about. Repeated addition
and subtraction tends to accumulate roundoff error. I'd probably write
this as

value *= 1.0 - factor * DeltaTime;

Mathematically equivalent, same number of operations, but numerically
more stable, and might be slightly faster since one of the terms is a
constant. It's also a little clearer that 'value' goes to 0 when

factor * DeltaTime >= 1.0

A different 'factor' would allow you to eliminate the subtraction. If
'factor' and 'DeltaTime' aren't changing, you could precalculate a
decay_constant that eliminates a multiply,

value *= decay_constant;

But if that's the case, then mathematically, 'value' *never* reaches 0
(except trivially when decay_constant == 0) and computationally, it
*always* decays to a denormal. You might save thousands of iterations
and eliminate your denormal problem by explicitly setting 'value' to 0
when it gets "small enough."

Many people are surprised to learn that adding a column of floating
point numbers forwards and then backwards rarely results in the same
value.

There are two good ways to add numbers to a sum to reduce roundoff:
1. Sort from smallest to largest (magnitude) and add the smallest
first.
2. Use Kahan's compensated summation.

#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <time.h>
#include <math.h>

#define A_LEN 1000

float foo[A_LEN];
double bar[A_LEN];

/* qsort comparators: order by magnitude, smallest first */
int compared(const void *a, const void *b)
{
    double aa = fabs(*(const double *) a);
    double bb = fabs(*(const double *) b);
    return (aa > bb ? 1 : aa == bb ? 0 : -1);
}

int comparef(const void *a, const void *b)
{
    float aa = fabsf(*(const float *) a);
    float bb = fabsf(*(const float *) b);
    return (aa > bb ? 1 : aa == bb ? 0 : -1);
}

/* Kahan compensated summation, float version */
float fkcsum(float *x, size_t i)
{
    float f = 0.;
    float e = 0.;        /* running compensation for lost low-order bits */
    float temp, y;
    size_t index;

    for (index = 0; index < i; index++) {
        temp = f;
        y = x[index] + e;
        f = temp + y;
        e = (temp - f) + y;
    }
    return f;
}

/* Kahan compensated summation, double version */
double dkcsum(double *x, size_t i)
{
    double d = 0.;
    double e = 0.;
    double temp, y;
    size_t index;

    for (index = 0; index < i; index++) {
        temp = d;
        y = x[index] + e;
        d = temp + y;
        e = (temp - d) + y;
    }
    return d;
}

int main(void)
{
    long i;
    double df;
    float ff;
    double db;
    float fb;
    double ds;
    float fs;
    double d;

    srand((unsigned) time(NULL));
    for (i = 0; i < A_LEN; i++) {
        d = rand() / (rand() + 1.0);
        d *= d;
        if (rand() % 2)
            d = -d;
        foo[i] = (float) d;
        bar[i] = d;
    }

    /* forward sums */
    ff = 0;
    df = 0;
    for (i = 0; i < A_LEN; i++) {
        ff += foo[i];
        df += bar[i];
    }

    /* backward sums */
    fb = 0;
    db = 0;
    for (i = A_LEN - 1; i >= 0; i--) {
        fb += foo[i];
        db += bar[i];
    }

    /* sort by magnitude, then sum smallest first */
    qsort(foo, A_LEN, sizeof foo[0], comparef);
    qsort(bar, A_LEN, sizeof bar[0], compared);
    fs = 0;
    ds = 0;
    for (i = 0; i < A_LEN; i++) {
        fs += foo[i];
        ds += bar[i];
    }

    printf("forward float sum = %.*f\n", FLT_DIG, ff);
    printf("backward float sum = %.*f\n", FLT_DIG, fb);
    printf("kahan float sum = %.*f\n", FLT_DIG, fkcsum(foo, A_LEN));
    printf("sorted float sum = %.*f\n", FLT_DIG, fs);
    printf("forward double sum = %.*f\n", DBL_DIG, df);
    printf("backward double sum = %.*f\n", DBL_DIG, db);
    printf("kahan double sum = %.*f\n", DBL_DIG, dkcsum(bar, A_LEN));
    printf("sorted double sum = %.*f\n", DBL_DIG, ds);
    return 0;
}
 
user923005

Ernie Wright said:
That would be exactly what Malcolm was talking about. Repeated addition
and subtraction tends to accumulate roundoff error. I'd probably write
this as

value *= 1.0 - factor * DeltaTime;

Mathematically equivalent, same number of operations, but numerically
more stable, and might be slightly faster since one of the terms is a
constant. It's also a little clearer that 'value' goes to 0 when

factor * DeltaTime >= 1.0

A different 'factor' would allow you to eliminate the subtraction. If
'factor' and 'DeltaTime' aren't changing, you could precalculate a
decay_constant that eliminates a multiply,

value *= decay_constant;

But if that's the case, then mathematically, 'value' *never* reaches 0
(except trivially when decay_constant == 0) and computationally, it
*always* decays to a denormal. You might save thousands of iterations
and eliminate your denormal problem by explicitly setting 'value' to 0
when it gets "small enough."

We can reduce roundoff by sorting first by magnitude or by using
compensated summation.
A previous version of this test program had an error, where I did the
sort before performing compensated summation, which gave compensated
summation an unfair advantage.

#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <time.h>
#include <math.h>

#define A_LEN 1000

float foo[A_LEN];
double bar[A_LEN];

/* qsort comparators: order by magnitude, smallest first */
int compared(const void *a, const void *b)
{
    double aa = fabs(*(const double *) a);
    double bb = fabs(*(const double *) b);
    return (aa > bb ? 1 : aa == bb ? 0 : -1);
}

int comparef(const void *a, const void *b)
{
    float aa = fabsf(*(const float *) a);
    float bb = fabsf(*(const float *) b);
    return (aa > bb ? 1 : aa == bb ? 0 : -1);
}

/* Kahan compensated summation, float version */
float fkcsum(float *x, size_t i)
{
    float f = 0.;
    float e = 0.;        /* running compensation for lost low-order bits */
    float temp, y;
    size_t index;

    for (index = 0; index < i; index++) {
        temp = f;
        y = x[index] + e;
        f = temp + y;
        e = (temp - f) + y;
    }
    return f;
}

/* Kahan compensated summation, double version */
double dkcsum(double *x, size_t i)
{
    double d = 0.;
    double e = 0.;
    double temp, y;
    size_t index;

    for (index = 0; index < i; index++) {
        temp = d;
        y = x[index] + e;
        d = temp + y;
        e = (temp - d) + y;
    }
    return d;
}

int main(void)
{
    long i;
    double df;
    float ff;
    double db;
    float fb;
    double ds;
    float fs;
    double dk;
    float fk;
    double d;

    srand((unsigned) time(NULL));
    for (i = 0; i < A_LEN; i++) {
        d = rand() / (rand() + 1.0);
        d *= d;
        if (rand() % 2)
            d = -d;
        foo[i] = (float) d;
        bar[i] = d;
    }

    /* forward sums */
    ff = 0;
    df = 0;
    for (i = 0; i < A_LEN; i++) {
        ff += foo[i];
        df += bar[i];
    }

    /* backward sums */
    fb = 0;
    db = 0;
    for (i = A_LEN - 1; i >= 0; i--) {
        fb += foo[i];
        db += bar[i];
    }

    /* Kahan sums, computed before the arrays are sorted */
    fk = fkcsum(foo, A_LEN);
    dk = dkcsum(bar, A_LEN);

    /* sort by magnitude, then sum smallest first */
    qsort(foo, A_LEN, sizeof foo[0], comparef);
    qsort(bar, A_LEN, sizeof bar[0], compared);
    fs = 0;
    ds = 0;
    for (i = 0; i < A_LEN; i++) {
        fs += foo[i];
        ds += bar[i];
    }

    printf("forward float sum = %.*f\n", FLT_DIG, ff);
    printf("backward float sum = %.*f\n", FLT_DIG, fb);
    printf("kahan float sum = %.*f\n", FLT_DIG, fk);
    printf("sorted float sum = %.*f\n", FLT_DIG, fs);
    printf("forward double sum = %.*f\n", DBL_DIG, df);
    printf("backward double sum = %.*f\n", DBL_DIG, db);
    printf("kahan double sum = %.*f\n", DBL_DIG, dk);
    printf("sorted double sum = %.*f\n", DBL_DIG, ds);
    return 0;
}
 
Klaus Bahner

Hi,

First of all, forgive me for being vague in the following, but it has been
years since I dealt with similar problems and, unfortunately, I have
forgotten most of the details.

ultimatewarrior said:
first of all I beg your pardon if this question has been asked
before, but I was unable to find anything in the past posts.

I have written a piece of code that was supposed to be quite portable,
and uses a lot of fp numbers. Everything goes well on PPC CPUs, but on


In the PPCs I knew, there were bits in the FPU configuration registers
which defined how denormalized numbers were handled. For example, the
older MacOSes actually did a rather poor job of setting these (basically
disabling the handling of denormalized numbers), because, mathematically
speaking, this caused unnecessarily imprecise calculations. On the other
hand, it was surprisingly easy to access these settings from an
application (the downside was that the setting affected all other
applications too :-( )

some x86 CPUs I get a dramatic loss of performance. After some
investigation I've discovered that the problem is caused by some
numbers becoming subnormal. PPC CPUs seem to treat them quietly,


I guess the PPC is not handling them at all :-(

without any significant loss of speed, whereas on some x86 CPUs I get
really poor results as soon as these "strange" entities begin to
pop out...



At least when I worked with this stuff (before Windows 2000 and up to
the P3) the performance loss was caused by the FPU exception handler
software, which executes a lot of code. IIRC, you are right that there
is no way (at least no easy way) to prevent the FPU from taking the
exception. Again, IIRC, the only way out of it would be to write your
own FPU exception handling routines, which is a non-trivial task. Years
ago there was a very informative document on Intel's web site titled
something like "Writing FPU exception handlers". Sorry again for not
being able to be more precise, but I don't have the document any longer
and my memory of this issue is fading away...

Does someone know if there is a way to disable subnormal floating
point numbers, maybe using standard c/c++ libs? I don't need them and


I don't think so.

for me it will be just ok to have a small quantity dropped to zero -
in fact, the code is written assuming implicitly that very small
numbers will eventually become zero.

I hope that there is a simple solution for the problem, apart from
reviewing every single line of code and putting a lot of
if(issubnormal...) or whatever else...


Well, the easiest solution, which may or may not be suitable for your
problem is to add a tiny constant to your calculations, just preventing
the results from getting denormalized. If your calculations have a
natural resolution limit, just add some noise below that resolution
limit and the problem probably disappears.
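
As a sketch of that idea, using the decay statement from earlier in the
thread and an arbitrary 1e-18f offset (anything far below the signal's
useful resolution but far above FLT_MIN should do):

/* One decay step with a tiny constant mixed in; the offset is absorbed
   by rounding while 'value' is large, and keeps the steady state well
   out of the denormal range once 'value' has decayed. The offset and
   the function name are illustrative only. */
static float decay_step_with_offset(float value, float factor, float DeltaTime)
{
    return value - value * factor * DeltaTime + 1e-18f;
}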

Finally, how do you declare your variables: floats or doubles? Floats may
indeed be a performance killer, because internally the x86 (up to the
P3) worked on doubles. If you declare your variables as floats, the
compiler issues the conversion instruction. In my case it was basically
the double-to-float conversion instruction which caused both the
performance drop and the FPU exception.



Sorry again for not being able to be more specific, but I hope it points you in the right direction anyway...


Good luck,
Klaus
 
ultimatewarrior

Klaus Bahner said:
I guess the PPC is not handling them at all :-(

No, no, the PPC IS handling them. I did some benchmarks forcing
subnormal numbers and I can assure you that the results are correct.
From my benchmarks, PPC and AMD CPUs work quietly with subnormals,
whereas the stupid Intel chips have lots of problems (see again
http://www.cygnus-software.com/papers/x86andinfinity.html). If you
like I can send you a couple of very simple test exes that simply add
together some nnn numbers; if you test them on AMD chips everything is
fine, on Intel... well, you have to see it to believe it!...

the P3) the performance loss was caused by the FPU exception handler

Well, it seems that on modern Intel chips the problem is just the
conversion from subnormals to the 80-bit internal representation...
Well, the easiest solution, which may or may not be suitable for your
problem is to add a tiny constant to your calculations, just preventing

At this point, simply checking for small quantities and forcing them to
zero is simpler. However, SSE seems to solve the problem; unfortunately
there is a global performance loss (as someone else pointed out) when
using SSE as an "FPU replacement", so I'm still investigating whether
it's better to track small quantities and flush them to zero instead of
resorting to SSE...
Finally, how do you declare your variables: floats or doubles? Floats may
indeed be a performance killer, because internally the x86 (up to the
P3) worked on doubles. If you declare your variables as floats, the

I use floats, for memory usage more than for precision. They seem to
be faster, at least in my code, for the simple reason that there are
LOTS of them and, probably, I'm saving bandwidth to/from RAM
(which nowadays is really important and probably the real
bottleneck)...
 
ultimatewarrior

Klaus Bahner said:
I guess the PPC is not handling them at all :-(

No, no, the PPC IS handling them. I've written a simple test program
that adds some fp numbers, both normal and subnormal; PPC and AMD
chips have no problem at all dealing with them and the results, at the
end, are correct - it's just the stupid Intel FPUs that have problems
with subnormal numbers (see http://www.cygnus-software.com/papers/x86andinfinity.html).
In my test, a P4 Intel FPU takes (about) 1.4 sec to complete the loop,
and 24 (yes, TWENTY-FOUR!) seconds to do the same loop with
subnormals, whereas PPCs and AMDs took only slightly longer than in the
normal case (the final results are the same on the three CPUs, at least)...
 
user923005

CBFalconer said:
ultimatewarrior wrote:

... snip ...

What is a 'subnormal' number?

He means denormal.
Some floating point implementations have gradual underflow and
overflow.
For instance, when you are at the supposed minimum value and you
multiply by 0.5, the mantissa just marches one place to the right. So
we have lost one bit of precision. This continues until all the bits
are gone and we have total loss of significance. The danger of a
denormal number is that we are losing bits of precision as the number
gets more and more denormalized.
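
A quick way to watch gradual underflow happen (assumes IEC 60559 single
precision and default round-to-nearest):

#include <stdio.h>
#include <float.h>

int main(void)
{
    float x = FLT_MIN;   /* smallest normalized float, about 1.2e-38 */
    int i;

    /* Each halving below FLT_MIN shifts the significand right, so the
       available precision shrinks by one bit per step, until the value
       finally rounds to zero (after 24 halvings for IEEE single). */
    for (i = 0; i < 30 && x != 0.0f; i++) {
        printf("step %2d: %g\n", i, x);
        x *= 0.5f;
    }
    printf("step %2d: %g\n", i, x);
    return 0;
}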

The C standard indirectly addresses this issue (for instance) in
footnote 195:
195) The term underflow here is intended to encompass both ''gradual
underflow'' as in IEC 60559 and also ''flush-to-zero'' underflow.
 
Klaus Bahner

Hi again,
ultimatewarrior said:
In my test, a P4 Intel FPU takes (about) 1.4 sec to complete the loop,
and 24 (yes, TWENTY-FOUR!) seconds to do the same loop with
subnormals, whereas PPCs and AMDs took only slightly longer than in the
normal case (the final results are the same on the three CPUs, at least)...


I'm not doubting your findings, but a word of caution: you say that the
results are the same and use that as an indication that all CPUs are
handling the denormalized numbers. This might not be the correct
conclusion, depending on whether "adding zeroes" (i.e. denormalized
floating point numbers) to your calculation changes the result at all,
especially since you say that you work with floats. The PPC and
AMD processors may just cut them to zero (i.e. not handle them) and
hence be faster than the Intel processors.

Klaus
 
Keith Thompson

user923005 said:
He means denormal.
Some floating point implementations have gradual underflow and
overflow.
For instance, when you are at the supposed minimum value and you
multiply by 0.5, the mantissa just marches one place to the right. So
we have lost one bit of precision. This continues until all the bits
are gone and we have total loss of significance. The danger of a
denormal number is that we are losing bits of precision as the number
gets more and more denormalized.

The C standard indirectly addresses this issue (for instance) in
footnote 195:
195) The term underflow here is intended to encompass both ''gradual
underflow'' as in IEC 60559 and also ''flush-to-zero'' underflow.

It also addresses it more directly in C99 7.12, describing the
classification macros. C99 uses the term "subnormal". (I'm not sure
whether "subnormal" and "denormal" refer to the same thing.)
 
lawrence.jones

Keith Thompson said:
C99 uses the term "subnormal". (I'm not sure
whether "subnormal" and "denormal" refer to the same thing.)

In general, "subnormal" is a subset of "denormal". A subnormal is a
number that is too small to represent in a normalized form whereas a
denormal is just a number that is not represented in normalized form,
although it may be possible to do so. The IEEE floating point formats
do not allow denormals other than subnormals, but other floating point
formats do.

-Larry Jones

These child psychology books we bought were such a waste of money.
-- Calvin's Mom
 
Charles Richmond

CBFalconer said:
ultimatewarrior wrote:
... snip ...

What is a 'subnormal' number?

Maybe he means unnormalized. But I thought that with IEEE
floating point, *all* the floating point numbers *had* to
be normalized so the "phantom 1" could be recognized
correctly.
 
