Basics on real floating types

Francis Moreau · Nov 10, 2009

Hello,

I have never needed to use the floating types until recently (yeah,
embedded system background mostly). And when I really had to use it,
fixed point numbers were actually used.

So now, I'd like to learn when 'double' should be used instead of
'float' for example. I know the answer is precision, but I'd like to be
able for a given application to calculate the accumulated error and to
deduce from it the best floating type to use.

BTW, what is the 'cost' of using 'double' over 'float' on an
architecture without a floating point unit? IOW, is it really worth
worrying about such questions, or can I simply always use the 'double' type?

Could anybody provide some useful pointers?

Thanks

Seebs · Nov 10, 2009

So now, I'd like to learn when 'double' should be used instead of
'float' for example. I know the answer is precision, but I'd like to be
able for a given application to calculate the accumulated error and to
deduce from it the best floating type to use.

The answer is: This depends a lot on your system.

BTW, what is the 'cost' of using 'double' over 'float' on an
architecture without a floating point unit? IOW, is it really worth
worrying about such questions, or can I simply always use the 'double' type?

It really depends HUGELY on your specific system. There's a fair bit
of information about floating point range and precision which is provided
by the system vendor (usually the compiler vendor); you should start by
looking at that. Also, you may want to consider the possibility that
float really ISN'T what you want. In many fairly common cases, you
may really want to do what is effectively fixed-point work, or
arbitrary-precision work, or something else. In short, you need to
look more closely at your problem. If you're doing financial work,
for instance, you almost certainly don't want to use floating point at
all ever. If you're doing physics, the question of what precision you
want or need varies widely... As does the question of what ranges you
need to deal with.

So basically, I don't think your question can be directly answered, but...

For starters, figure out more specific requirements for your task.
What are your performance targets? What are your accuracy requirements?
Is a 1% error tolerable or fatal? How about a 0.01% error? How many
calculations do you have to perform before you can go back to "fresh"
data? How reliable are your inputs? If you're doing a handful of
calculations on each input datum, and the inputs come from a device which
reports values with a 3% error margin, you probably don't care at all
about the floating point errors...

Combine that with information about your implementation. If you have
performance concerns, there's nothing for it but benchmarking. If you
have serious performance problems, you may want to look at non-portable
things, or at compiler options to tweak floating-point performance.

Could anybody provide some useful pointers?

void *v1, v2;

-s

Keith Thompson · Nov 10, 2009

Seebs said:
void *v1, v2;

Did we forget something? Hmmm?

Seebs · Nov 10, 2009

Did we forget something? Hmmm?

Me? Make *typos*? UNPOSSIBLE!

(And you also, I think, were the only person to spot the extra spaces
in my example shell code from the "how do you handle forward references
in pedagogical material" example.)

-s

Dann Corbit · Nov 10, 2009

Hello,

I have never needed to use the floating types until recently (yeah,
embedded system background mostly). And when I really had to use it,
fixed point numbers were actually used.

So now, I'd like to learn when 'double' should be used instead of
'float' for example. I know the answer is precision, but I'd like to be
able for a given application to calculate the accumulated error and to
deduce from it the best floating type to use.

You should use double when FLT_DIG digits of precision are not enough to
perform your calculation and produce a usable answer. To know when, read
this:
http://docs.sun.com/source/806-3568/ncg_goldberg.html

BTW, what is the 'cost' of using 'double' over 'float' on an
architecture without a floating point unit?

Benchmark it.

IOW, is it really worth worrying about
such questions, or can I simply always use the 'double' type?

You can, the question is: "Should you?"

Keith Thompson · Nov 10, 2009

pete said:
The default argument promotions make (double)
to be the more natural type than (float).

N869
6.5.2.2 Function calls
Constraints

[#6] If the expression that denotes the called function has
a type that does not include a prototype, the integer
promotions are performed on each argument, and arguments
that have type float are promoted to double. These are
called the default argument promotions.

The default argument promotions should rarely if ever affect your
choice of which type to use. Normally *all* functions should have
visible prototypes, so the promotions should occur only for variadic
functions such as printf.

Dann Corbit · Nov 10, 2009

pete said:

The default argument promotions make (double)
to be the more natural type than (float).

N869
6.5.2.2 Function calls
Constraints

[#6] If the expression that denotes the called function has
a type that does not include a prototype, the integer
promotions are performed on each argument, and arguments
that have type float are promoted to double. These are
called the default argument promotions.

The default argument promotions should rarely if ever affect your
choice of which type to use. Normally *all* functions should have
visible prototypes, so the promotions should occur only for variadic
functions such as printf.

The usual reason to use float instead of double is those very rare
instances where you are cramped for memory space. On modern systems the
answer is "almost never". There may be some other rare circumstances
where float is a good idea. I guess that if you have them, then you
know about it. (E.g. you might have to do super-fast FFTs on 100x100
matrices, and using double is not fast enough, and you only need three
significant digits of precision in the output and the matrices always
have a good condition number).

The only other reason I can think of is that existing interfaces use
float (e.g. proj.4 -- which I converted to double anyway). Or perhaps
you have to write a database routine that handles floats to store and
retrieve them from disk.

With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications.

If (for some reason) you do have to use float, be very careful about
precision loss, because it is exacerbated in low precision.

Keith Thompson · Nov 11, 2009

Dann Corbit said:
With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications.

[...]

However much memory you've got, there will still be times (probably
rarely) when being able to store twice as many numbers is worthwhile.

Nick Keighley · Nov 11, 2009

I have never needed to use the floating types until recently (yeah,
embedded system background mostly). And when I really had to use it,
fixed point numbers were actually used.

which of course have their own traps and pitfalls

So now, I'd like to learn when 'double' should be used instead of
'float' for example.

nearly always

I know the answer is precision, but I'd like to be
able for a given application to calculate the accumulated error and to
deduce from it the best floating type to use.

BTW, what is the 'cost' of using 'double' over 'float' on an
architecture without a floating point unit?

It would depend on the library provided. You'll have to check the docs
for your implementation. Most desk tops and servers will have FP
hardware these days.

IOW, is it really worth worrying about
such questions, or can I simply always use the 'double' type?

Could anybody provide some useful pointers?

docs.sun.com/source/806-3568/ncg_goldberg.html

not exactly beginners' material but it will make you think

Eric Sosman · Nov 11, 2009

Dann said:
[...]
With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications. [...]

One significant influence can be speed. Memory may be
at low, low prices but it's also at slow, slow speeds compared
to the CPU's. So you've got three or so levels of cache
between the CPU and the memory to paper over the latter's
slowness. Cache memory doesn't carry a low, low price; there's
a reason for the "$" in abbreviated notations for caches. Also,
caches are usually about three to four decimal orders of magnitude
smaller than the memories they front for.

So if you can fit twice as many `float' values as `double'
values in your cache, getting twice as much "bang for buck"
out of each memory transaction, you might be able to get through
your calculation in half the time. Usually other things get in
the way and make the speedup far less dramatic -- you'll be
multiplying two row-major matrices, say, and if one is cache-
friendly the other will be cache-hostile. Still, there's a good
deal of potential for speedup, and some of it can be realized
if memory pressure is reduced.

bartc · Nov 11, 2009

Eric Sosman said:
Dann said:

[...]
With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications. [...]

One significant influence can be speed. Memory may be

What happened to your disdain for the little tin god?

Francis Moreau · Nov 11, 2009

Dann Corbit said:
The usual reason to use float instead of double is those very rare
instances where you are cramped for memory space. On modern systems
the answer is "almost never".
There may be some other rare circumstances where float is a good idea.
I guess that if you have them, then you know about it. (E.g. you
might have to do super-fast FFTs on 100x100 matrices, and using double
is not fast enough, and you only need three significant digits of
precision in the output and the matrices always have a good condition
number).

So it sounds like you assume computations using 'float' are faster than
ones using 'double'.

This would make a good reason to use float.

The only other reason I can think of is that existing interfaces use
float (e.g. proj.4 -- which I converted to double anyway). Or perhaps
you have to write a database routine that handles floats to store and
retrieve them from disk.

With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications.

Speed... if indeed 'float' is faster (on CPUs which don't have any FPU,
you need to use routines for floating point emulation. And I can expect
these routines to be slower when dealing with larger floating types, but
it's just a guess).

By analogy, when dealing with integer types, I don't use the 'long long'
type everywhere I need an integer, because it is larger and hence
slower to use (at least on 32-bit CPUs).

If (for some reason) you do have to use float, be very careful about
precision loss, because it is exacerbated in low precision.

Yeah, and one of my initial questions was how to 'measure' the accumulated
error and thus deduce whether I need a float, a double or whatever.

Thanks

Francis Moreau · Nov 11, 2009

Dann Corbit said:
You should use double when FLT_DIG digits of precision are not enough to
perform your calculation and create a usable answer. To know when, read
this:
http://docs.sun.com/source/806-3568/ncg_goldberg.html

ok I'll read it

Thanks

Francis Moreau · Nov 11, 2009

Seebs said:
The answer is: This depends a lot on your system.

It really depends HUGELY on your specific system. There's a fair bit
of information about floating point range and precision which is provided
by the system vendor (usually the compiler vendor); you should start by
looking at that.

Assume a 32-bit RISC CPU without an FPU, the typical thing you find
in embedded systems nowadays.

Those systems rely on floating point emulation, which is nothing more
than some software routines. I would expect these routines to be much
the same across all those systems.

So the people who implement them should be able to say: "yes, using
'double' in this case is much slower than 'float'".

I took a look at the GCC documentation but couldn't find this sort of
information.

Also, you may want to consider the possibility that float really ISN'T
what you want. In many fairly common cases, you may really want to do
what is effectively fixed-point work, or arbitrary-precision work, or
something else. In short, you need to look more closely at your
problem. If you're doing financial work, for instance, you almost
certainly don't want to use floating point at all ever. If you're
doing physics, the question of what precision you want or need varies
widely... As does the question of what ranges you need to deal with.

So basically, I don't think your question can be directly answered, but...

For starters, figure out more specific requirements for your task.
What are your performance targets? What are your accuracy requirements?
Is a 1% error tolerable or fatal? How about a 0.01% error? How many
calculations do you have to perform before you can go back to "fresh"
data? How reliable are your inputs?

Those are good questions to ask, but I don't know the answers. Seeing
a practical example would help a lot, but I can't find anything
with Google.

[...]

void *v1, v2;

*v1;

warning: dereferencing ‘void *’ pointer

Thanks

bartc · Nov 11, 2009

Francis Moreau said:
So it sounds like you assume computations using 'float' are faster than
ones using 'double'.

This would make a good reason to use float.

With an FPU they might be about the same speed.

With software emulation, float should be faster if floats have their own
emulation routines. If floats are emulated using the same routines as
doubles, they might be slower (because of having to convert).

Easiest just to test a few benchmarks if you already have a C compiler for
your target machine.

Yeah, and one of my initial questions was how to 'measure' the accumulated
error and thus deduce whether I need a float, a double or whatever.

If float was all that was available, you would probably find a way of
getting around the problems of precision.

And if float is significantly faster than double, you might also find the
extra effort to use floats worthwhile, or perhaps some mix.

However, you could just try out floats and see how well your application
works.

Eric Sosman · Nov 11, 2009

bartc said:
Eric Sosman said:

Dann said:

[...]
With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications. [...]

One significant influence can be speed. Memory may be

What happened to your disdain for the little tin god?

What happened to your ability to discriminate between
a (potential) twofold speedup and the overhead of one trivial
function call? Size matters.

bartc · Nov 11, 2009

Eric Sosman said:
bartc said:

Eric Sosman said:

Dann Corbit wrote:
[...]
With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications. [...]

One significant influence can be speed. Memory may be

What happened to your disdain for the little tin god?

What happened to your ability to discriminate between
a (potential) twofold speedup and the overhead of one trivial
function call? Size matters.

There's nothing wrong with it. I've just tested the overhead of adding one
extra level of function call to a (admittedly fairly trivial) function. It
was something like 50% (and up to 100% on one test). That's not far off the
figures you're talking about.

Now an application probably won't spend all its time calling that wrapper
function, but then it might not spend all its time calculating with floats
either.

And of course that extra wrapper code will also require space in the
instruction cache. If function call overhead didn't matter, then no compiler
would ever bother with inlining.

Eric Sosman · Nov 11, 2009

bartc said:
Eric Sosman said:

bartc said:

Dann Corbit wrote:
[...]
With memory at low, low prices it seems hard to come up with
sensible reasons to prefer float over double for the vast majority
of applications. [...]

One significant influence can be speed. Memory may be

What happened to your disdain for the little tin god?

What happened to your ability to discriminate between
a (potential) twofold speedup and the overhead of one trivial
function call? Size matters.

There's nothing wrong with it. I've just tested the overhead of adding
one extra level of function call to a (admittedly fairly trivial)
function. It was something like 50% (and up to 100% on one test). That's
not far off the figures you're talking about.

The context of the trivial function call (in the "lack of
default function parameter" thread) was avoiding quote "lots
of duplication" end quote. We're not talking about a trivial
wrapper around a trivial function, but about a trivial wrapper
around a substantial function, one with "lots" of innards to
avoid duplicating.

Yes, it's quite possible that in

int foo(void) { return 0; }
int bar(void) { return foo(); }

... a call to bar() might take roughly twice as long as a
call to foo(). But that's not the case under discussion!

Now an application probably won't spend all its time calling that
wrapper function, but then it might not spend all its time calculating
with floats either.

And of course that extra wrapper code will also require space in the
instruction cache. If function call overhead didn't matter, then no
compiler would ever bother with inlining.

Function call overhead *can* matter, but that doesn't mean
it always matters, nor even that it usually matters. Let the
compilers cope with it; they're pretty well schooled in ways of
minimizing the bad effects. Worry about the overhead only *after*
you've measured the program's performance, found it lacking, and
found evidence that function linkage is a substantial contributor.

Thinking about data structure size and locality is another
matter entirely.

Dann Corbit · Nov 11, 2009

So it sounds like you assume computations using 'float' are faster than
ones using 'double'.

This would make a good reason to use float.

This is almost always true. It is also almost always true that the
double answer will be more accurate. So do you want a less reliable
answer faster or a more reliable answer more slowly? It's not always
simple.

Speed... if indeed 'float' is faster (on CPUs which don't have any FPU,
you need to use routines for floating point emulation. And I can expect
these routines to be slower when dealing with larger floating types, but
it's just a guess).

By analogy, when dealing with integer types, I don't use the 'long long'
type everywhere I need an integer, because it is larger and hence
slower to use (at least on 32-bit CPUs).

On a 64 bit system with a 64 bit compiler and operating system there is
no real penalty for 64 bit operations (other than increased size of
applications due to doubling of pointer width). But operations on 64
bit integers are ultra-fast.

Yeah, and one of my initial questions was how to 'measure' the accumulated
error and thus deduce whether I need a float, a double or whatever.

This is a complicated subject, called "Numerical Analysis".

There are a number of packages that will calculate this for you using
range (interval) arithmetic:
http://portal.acm.org/citation.cfm?id=138377

Of course, there is a large speed penalty for using range arithmetic.

For things like solving a linear system you can examine the condition
number of a matrix. There are some problems (for instance) that have an
exact answer, and yet are incredibly difficult to calculate. An example
is solving a linear system where the matrix is a Hilbert matrix:
http://en.wikipedia.org/wiki/Hilbert_matrix
The condition number of a Hilbert matrix goes "coo coo for cocoa puffs"
as the matrix gets large.
Even though there is an exact solution, even a moderately small matrix
will give absurdly wrong answers unless extended or arbitrary precision
is used for the calculation.

A general rule of thumb would be:
"If time allows, use the widest possible floating point type that can
get the job done in the appointed time."

IMO-YMMV

Phil Carmody · Nov 11, 2009

Keith Thompson said:
Dann Corbit said:

With memory at low, low prices it seems hard to come up with sensible
reasons to prefer float over double for the vast majority of
applications.

[...]

However much memory you've got, there will still be times (probably
rarely) when being able to store twice as many numbers is worthwhile.

Very much so. However, the boundary between the two types often sits
at ingress/egress: the data can be stored as float, while *every*
variable in the program can be treated as a 'temporary intermediate'
to be acted on at double precision.

Phil
