How to use single precision floating point?

Immortal Nephi · Aug 6, 2010

I want to use single precision floating point. I don’t need to use
double precision floating point because single precision floating
point is faster.
How do I tell the C++ compiler to treat all floating point numbers as
the float data type instead of the double data type? I don’t want to
use a cast.

For example:

float x = 1.0 / 3.5;

1.0 / 3.5 always uses the double data type. Writing a cast is
annoying.

float x = float( 1.0 / 3.5 ); // Too annoying

Ian Collins · Aug 6, 2010

> I want to use single precision floating point. I don’t need to use
> double precision floating point because single precision floating
> point is faster.

Are they?

> How do I tell the C++ compiler to treat all floating point numbers as
> the float data type instead of the double data type? I don’t want to
> use a cast.

That would be a compiler specific option, if it exists at all.

> For example:
>
> float x = 1.0 / 3.5;
>
> 1.0 / 3.5 always uses the double data type. Writing a cast is
> annoying.
>
> float x = float( 1.0 / 3.5 ); // Too annoying

The cast casts the result of the operation, not the operation itself.

Stefan van Kessel · Aug 6, 2010

Append f or F after a floating point constant for it to be a float, e.g. 7.123f
.2f 1.f 7e3f etc.
float x = 1.0f / 3.5f;

There are also suffixes to make integer constants unsigned (u/U) or make
them longs (l/L). For floating point literals the suffix l or L makes
them long double. In C++0x there will also be ll for long long and even
user-defined suffixes.

Immortal Nephi · Aug 7, 2010


If you define int instead of short, long, or long long, then the C++
compiler will assign long to int on a 32-bit machine or long long on a
64-bit machine.
If you use an array, a choice of either short or long is appropriate.
Why do you need to use prefix u or U? If unsigned long is already
defined, you don’t need prefix u or U.

unsigned long x = 5L; // no U is needed
signed long y = -7L;

unsigned long z = 10UL;

James Kanze · Aug 7, 2010

> I want to use single precision floating point. I don’t need
> to use double precision floating point because single
> precision floating point is faster.

Really? On what platform? Historically, the reason C++
defaults to double is that C did, and the reason C defaults to
double is that double was significantly faster on the most
important platforms of the time. I would be very surprised if
there were an important modern platform where it made a
difference, and if it did, I suspect that it is double that
would be faster.

> How do I tell the C++ compiler to treat all floating point numbers
> as the float data type instead of the double data type?

"-Ddouble=float". Doing so will, of course, mean that you can't
use any existing library (including the system libraries or the
standard C++ libraries).

> I don’t want to use a cast.
>
> For example:
>
> float x = 1.0 / 3.5;
>
> 1.0 / 3.5 always uses the double data type.

With most compilers, the arithmetic will be done at compile
time, not at run time, and the result rounded to float.

I'm not sure what you're getting at here: do you want floating
point constants to be float, and not double? If you want a
literal to have type float, just suffix it with an 'F', e.g.
3.5F. But I can't imagine why one would want to do such a
thing. (And many compilers will promote float to double in all
arithmetic expressions. On an Intel architecture, in fact,
arithmetic operations will always be done in a long double
format, 10 bytes. You can configure the processor to use less,
but it runs slower if you do.)

> Writing a cast is annoying.
>
> float x = float( 1.0 / 3.5 ); // Too annoying

That conversion is implicit anyway, so there's no point in doing
it.

James Kanze · Aug 7, 2010

> If you define int instead of short, long, or long long, then
> the C++ compiler will assign long to int on a 32-bit machine or long
> long on a 64-bit machine.

You can't define int; the compiler defines it for you. And long
is always long, never int or long long. (It may have the same
size and representation as some other type, but it is still a
different type.)

> If you use an array, a choice of either short or long is appropriate.

Almost never. In general, you use int, unless there is some
particular necessity of doing otherwise.

> Why do you need to use prefix u or U? If unsigned long is
> already defined, you don’t need prefix u or U.

What prefix? The post you were responding to spoke of suffixes.
And using U is often important, since the rules of arithmetic
(in C++) are subtly different for signed and unsigned types.

> unsigned long x = 5L; // no U is needed

Not here. Nor is the L, for that matter.

> signed long y = -7L;

But:
unsigned long y = -7;
and
unsigned long y = -7U;
give very different results. (At least on a machine where long is
larger than int.)

Most of the time, though, the difference comes into play when you
have constant expressions. Things like (-1U >> 3), for example.

tni · Aug 7, 2010

> Really? On what platform?

Intel x86 / x64, if the compiler uses scalar SSE instructions. The
difference will be a lot bigger if the compiler manages to vectorize
things and use SSE SIMD instructions.

> On an Intel architecture, in fact,
> arithmetic operations will always be done in a long double
> format, 10 bytes. You can configure the processor to use less,
> but it runs slower if you do.)

Only if the x87 FPU, and not SSE, is used. (The SSE code is typically faster.)

Juha Nieminen · Aug 8, 2010

James Kanze said:
> the reason C defaults to
> double is that double was significantly faster on the most
> important platforms of the time. I would be very surprised if
> there were an important modern platform where it made a
> difference, and if it did, I suspect that it is double that
> would be faster.

That was way before SSE was introduced, which can make a significant
difference depending on whether doubles or floats are used, if the compiler is
able to fully optimize for SSE. (If I understand correctly, SSE can be
used to do some operations on 4 floats in parallel, but only 2 doubles.)

James Kanze · Aug 9, 2010

> That was way before SSE was introduced, which can make
> a significant difference depending on whether doubles or floats are
> used, if the compiler is able to fully optimize for SSE. (If
> I understand correctly, SSE can be used to do some operations
> on 4 floats in parallel, but only 2 doubles.)

I'll admit that I'd not considered SSE. But isn't it only for
vector operations? For vector operations, even without SSE,
issues of locality could favor float over double. On the other
hand, for isolated variables, I would expect double to be
faster. Or is there something involving SSE that changes that as
well?

Juha Nieminen · Aug 10, 2010

James Kanze said:
> I'll admit that I'd not considered SSE. But isn't it only for
> vector operations? For vector operations, even without SSE,
> issues of locality could favor float over double. On the other
> hand, for isolated variables, I would expect double to be
> faster. Or is there something involving SSE that changes that as
> well?

I don't know if SSE operations on individual values are equally fast
for floats and doubles, but there's another thing which might affect
the speed of the program: caching. If you deal with lots of doubles,
the processor's data cache is going to fill up faster than if you were
dealing with floats. This might have some impact with some programs
(although obviously not with all).

My personal experience is that this is quite heavily platform-specific.
A quite math-intensive library of mine seems to be about equally fast
regardless of whether the data type is double or float when I run it on
my computer (a Pentium4), but I have had reports of measurable speed
advantages in favor of float in other platforms.

James Kanze · Aug 10, 2010

> I don't know if SSE operations on individual values are equally fast
> for floats and doubles, but there's another thing which might affect
> the speed of the program: caching. If you deal with lots of doubles,
> the processor's data cache is going to fill up faster than if you were
> dealing with floats. This might have some impact with some programs
> (although obviously not with all).

That's what I meant by issues of locality. If you have large
tables, which you're processing sequentially, then float may be
significantly faster. If the operations on each element aren't
too complicated, then it could be faster even if the operations
themselves were faster on a double; a lot of programs today are
memory bound.

> My personal experience is that this is quite heavily
> platform-specific.

And it depends a lot on what you are doing. If you're scaling
large tables of physical measurements (which are only accurate
to about 10% to begin with), then float is probably indicated.
If you're iterating over three or four values, doing successive
approximations, trying to converge on the correct answer, then
double is more likely the best choice; if nothing else, the
improved accuracy should give better mathematical stability.
