C/C++ pitfalls related to 64-bits (unsigned long & double)

BGB

Any conventional PC-type system these days is going to use IEEE FP,
and if the system claims to support IEEE FP, it has to be exact. If
it isn't, it's a bug.

I did run your program on my AMD system (phenom I), and it showed no
output. It would be interesting to see somebody with an identical CPU
to yours try it...

I don't know.

in the past, the issue seemed specific to Linux x86-64 (and SSE), but I
am also currently getting the same results from a 32-bit Windows program
(using the x87 FPU, compiled with GCC).

testing with MSVC, the results are slightly different (only 4 different
values show up), but the basic issue is still present.

I'm not sure you could call it a minor issue. A lot of software
assumes that FP arithmetic is exact for integer values within a
certain range, and isn't going to do any fudging (because it shouldn't
be necessary, and would have a severe performance impact), so such a
system where fudging is necessary would have ... problems.

potentially, yes.

however, other than this, the computer seems to work fine (nothing is
obviously acting buggy or crashing...).
 
BGB

The explanation for this, at least, seems simple. You're overflowing
the dynamic range of a double. 32767*32767*100000000 needs a bit more
than 56 bits to represent exactly. That doesn't fit in a double
(assuming an IEEE double, of course). Nor do the results of that
number plus or minus 1.0. If the ints on your system are 32 bit,
there should be no problem, as the number will be reduced to 32 bits
first; if they're 64 bits, you'll have round off error.

this is on x86 and x86-64, and in both cases "int" is 32 bits.
storing the expression into an int essentially truncates it to 32-bits.

if it were a "long long", it would be a different matter.

I've not looked through all the combinations, but the compilers could
be setting rounding differently, and even for x87 and SSE2
instructions rounding can be set differently at the same time.
Rounding can be different between the x87 store-as-int (FIST rounds
according to the selected x87 rounding mode), SSE2 convert-to-int
(CVTTSD2SI always truncates towards zero) and software convert-to-int
(which can do whatever the implementers fancied). Those different
rounding modes could to a lesser or greater degree be masking the
problem. Another variable is that on some implementations, the x87 FP
calculations are done in 80-bit format, at least some of the time, so
you might see no round off errors at all in those cases.

yeah.

dunno what exactly the issue is, but it is probably fairly minor
given the lack of any obvious problems...

as noted, when I had seen it before, I had simply assumed that it was
some basic property of floating-point behavior and fudged it. if other
people have been seeing similar behavior, it is possible maybe such
fudging is fairly common?...
 
MikeWhy

BGB said:
I don't know.

in the past, the issue seemed specific to Linux x86-64 (and SSE), but
I am also currently getting the same results from a 32-bit Windows
program (using the x87 FPU, compiled with GCC).

testing with MSVC, the results slightly are different (only 4
different values show up), but the basic issue is still present.

The MSVC warnings are rather explicit:
"... possible loss of data".
'initializing' : truncation from 'const uint_64' to 'const double'
'initializing' : conversion from 'const double' to 'const uint_64', possible
loss of data

Z:\Foo>fooFP.exe
sizeof(uint_64) ==> 8
std::numeric_limits<double>::digits = 53
std::numeric_limits<uint_64>::digits = 64
(18446744073709551615 == 9223372036854775808) ==> false
(18446744073709551615 == 9223372036854775808) ==> false

On gcc:

mikey@boatVPM-Linux:~/Foo$ g++ --version
g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

mikey@boatVPM-Linux:~/Foo$ g++ -std=c++0x -Wall -pedantic fooFP.cpp
fooFP.cpp: In function 'int main()':
fooFP.cpp:21:30: warning: overflow in implicit constant conversion
[-Woverflow]
mikey@boatVPM-Linux:~/Foo$ ./a.out
sizeof(uint_64) ==> 8
std::numeric_limits<double>::digits = 53
std::numeric_limits<uint_64>::digits = 64
(18446744073709551615 == 0) ==> false
(18446744073709551615 == 18446744073709551615) ==> true
mikey@boatVPM-Linux:~/Foo$

The first case was similar to what you had written. The second case used
'const uint_64' and const double as hints to the compiler. gcc actually
managed to get the second case right. I don't know what MSVC was thinking.
The integer value evaluated to (2^64 - 1) as expected. Converting back to
ULL from double produced 2^63. It apparently stored 1/2 * 2^64 as the FP
value, which is a surprising conversion (but still within the definition of
undefined behavior).
however, other than this, the computer seems to work fine (nothing is
obviously acting buggy or crashing...).

??!! It's a coding error. The compiler recognized the error and printed
diagnostics, warning about truncation or overflow in the conversion.
Comparing std::numeric_limits<double>::digits against
std::numeric_limits<unsigned long long>::digits makes clear the nature of
the problem.
 
Keith Thompson

Ben Bacarisse said:
In the architecture in question, yes.
[...]

In any conforming C implementation; the requirements on double are such
that it can represent any 32-bit integer value exactly.
 
James Kuyper

That's to be expected. As i gets larger, for some of the values that can
be returned by rand(),
....
The explanation for this, at least, seems simple. You're overflowing
the dynamic range of a double. 32767*32767*100000000 needs a bit more
than 56 bits to represent exactly.

That number is 32767*32767*5^8 * 2^8. Representing the first part
exactly only requires 48 bits, and the 2^8 just shifts the exponent.
However, the actual maximum value that d can have is
32767*32767*99999999, and that does require 56 bits to represent exactly.
... That doesn't fit in a double
(assuming an IEEE double, of course). Nor do the results of that
number plus or minus 1.0. ...

That part is true, even when 'i' has a large power of 2 among its factors.

There's no mystery about these results anymore - he's looking at integer
values that are too large for N and N+1 to both be exactly representable.
 
Ben Bacarisse

James Kuyper said:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    double d;
    int i, j, k;

    for(i=0; i<100000000; i++)
    {
        j=rand()*rand()*i;
        d=j;
        d=d+1.0; //(1)
        d=d-1.0; //(1)
        k=d; //(2)
        k=(d>=0?(int)(d+0.0001):(int)(d-0.0001)); //(2) [the reported errors come when the above line is commented out]
        if(j!=k)
            printf("%d %d\n", j, k);
    }
    return 0;
}
... That doesn't fit in a double
(assuming an IEEE double, of course). Nor do the results of that
number plus or minus 1.0. ...

That part is true, even when 'i' has a large power of 2 among its factors.

There's no mystery about these results anymore - he's looking at integer
values that are too large for N and N+1 to both be exactly
representable.

That's not what's happening. On the system in question, int has only
32 bits. The large and possibly overflowing value is assigned to an
int first. This conversion is implementation defined, but it can't make
j too large to be represented exactly in d. In fact, one of his reported
error cases was caused when j == 4. This *must* set d to 4.0 (and it
did). The subsequent +1.0 and -1.0 and conversion back to int produces
a value not == 4.

I agree there is no mystery, but it's not that the integers are too
large for double -- it's that the floating-point arithmetic is not exact.

[Aside: since the integer arithmetic can overflow, technically the
program has undefined behaviour. So while d may be assigned 4.0, say,
it is quite permissible (from C's point of view) for it to become 5.0 or
109.2 or any other value at any time. I'm assuming a sane
implementation.]
 
glen herrmannsfeldt

(snip, someone wrote)
Not C, but C-on-a-system-using-IEEE-FP, which is basically everything
mainstream. [In practice it's a pretty good bet that even wackier FP
hardware actually maintains the same constraint.]
Apparently you never tried to multiply or divide on any early Crays.
;-)
Multiplication could be off by a full ULP. So products that exactly
filled all 48 bits with significant bits often produced slightly odd
results.

Is that also the one with the non-commutative multiplication?

-- glen
 
glen herrmannsfeldt

(snip)
I've not looked through all the combinations, but the compilers could
be setting rounding differently, and even for x87 and SSE2
instructions rounding can be set differently at the same time.
Rounding can be different between the x87 store-as-int (FIST rounds
according to the selected x87 rounding mode), SSE2 convert-to-int
(CVTTSD2SI always truncates towards zero) and software convert-to-int
(which can do whatever the implementers fancied). Those different
rounding modes could to a lesser or greater degree be masking the
problem. Another variable is that on some implementations, the x87 FP
calculations are done in 80-bit format, at least some of the time, so
you might see no round off errors at all in those cases.

The x87 registers are always 80 bit, with 64 bit significand.

The result is that intermediate values are often kept to 64
significant bits, while those stored to memory are (usually
rounded) to 53. In higher optimization modes, compilers will
keep values in registers longer.

There are mode bits that specify the precision, but they don't
apply to all operations. The extra precision is supposed to be
good, but the variability (not knowing which are stored in
between and which aren't) can be surprising.

-- glen
 
BGB

That's to be expected. As i gets larger, for some of the values that can
be returned by rand(),

That number is 32767*32767*5^8 * 2^8. Representing the first part
exactly only requires 48 bits, and the 2^8 just shifts the exponent.
However, the actual maximum value that d can have is
32767*32767*99999999, and that does require 56 bits to represent exactly.

but, if you store it into a 32 bit integer (first), all the high-order
bits end up cut off anyways. a desktop PC doesn't have any such 64-bit
"int" type, so the above can't be the issue.

That part is true, even when 'i' has a large power of 2 among its factors.

There's no mystery about these results anymore - he's looking at integer
values that are too large for N and N+1 to both be exactly representable.

no, that is not the problem here.

I suspect results would be far more obvious if something like this was
going on (not just a few misses, but an absurdly long list of misses,
all with huge values...).
 
BGB

(snip)

The x87 registers are always 80 bit, with 64 bit significand.

yes.

the mystery here is this:
on an older core (an "Athlon 64 X2"), I was only seeing this issue when
using SSE for the arithmetic (on Linux x86-64). this was because on
64-bit targets, people said NO to x87, and switched over to using almost
entirely SSE for floating-point math.

on my newer CPU ("Athlon II X4"), it appears I am seeing the same
behaviors from both SSE and x87.

someone is not seeing any issues with an "AMD Phenom", which is more
curious as both chips use an "AMD K10" based core (in my case, the
"Propus" core).

apparently, it still has separate x87 and SSE (unlike, say, the newer
"Bulldozer" core), so it is mildly curious.

The result is that intermediate values are often kept to 64
significant bits, while those stored to memory are (usually
rounded) to 53. In higher optimization modes, compilers will
keep values in registers longer.

There are mode bits that specify the precision, but they don't
apply to all operations. The extra precision is supposed to be
good, but the variability (not knowing which are stored in
between and which aren't) can be surprising.

possibly, but theory says it "should fit" in the intermediate forms as
well (when saved out to double, say, both "16384" and "16385" should be
exactly representable).

it apparently does fit for the vast majority of values, but for whatever
reason for certain values it comes out "slightly off".


but, whatever, I am ending up wasting too much time here thinking about
it...
 
BGB

The MSVC warnings are rather explicit:
"... possible loss of data".
'initializing' : truncation from 'const uint_64' to 'const double'
'initializing' : conversion from 'const double' to 'const uint_64',
possible loss of data

you sure we are talking about the same piece of code?...
my example wasn't using any "uint_64", just a plain "int" type (which is
32 bits).

 
MikeWhy

BGB said:
you sure we are talking about the same piece of code?...
my example wasn't using any "uint_64", just a plain "int" type (which is
32 bits).

No, I'm not sure at all. At some point, the conversation turned it into
64-bit unsigned, possibly only in my head.

Both gcc and MSVC get this right:

const int_64 ifoo = -1LL;
{
    double foox = ifoo;
    uint_64 foo2 = foox;

    std::cout << '(' << (uint_64)ifoo << " == " << foo2 << ") ==> "
              << ((uint_64)ifoo == foo2 ? "true" : "false")
              << '\n';
}

... but, both get the following wrong:
{
    uint_64 foo = ifoo;
    double foox = foo;
    uint_64 foo2 = foox;

    std::cout << '(' << foo << " == " << foo2 << ") ==> "
              << (foo == foo2 ? "true" : "false")
              << '\n';
}

gcc: (18446744073709551615 == 0) ==> false
MSVC: (18446744073709551615 == 9223372036854775808) ==> false

... and gcc gets it right if the values are marked const:
{
    const uint_64 foo = ifoo;
    const double foox = foo;
    const uint_64 foo2 = foox;

    std::cout << '(' << foo << " == " << foo2 << ") ==> "
              << (foo == foo2 ? "true" : "false")
              << '\n';
}

gcc: (18446744073709551615 == 18446744073709551615) ==> true
MSVC: (18446744073709551615 == 9223372036854775808) ==> false

gcc apparently keeps enough precision for the const double initializer
value, possibly as a long double. MSVC obviously does not.

All the same, both compilers print diagnostics about signed/unsigned
mismatches; possible loss of data in conversions; truncation/overflow in
conversion.
 
BGB

No, I'm not sure at all. At some point, the conversation turned it into
64-bit unsigned, possibly only in my head.

fair enough, we were talking about different pieces of code with
different behaviors (the "64-bit unsigned" was from a different part of
the thread).
 
BGB

(snip, someone wrote)
Not C, but C-on-a-system-using-IEEE-FP, which is basically everything
mainstream. [In practice it's a pretty good bet that even wackier FP
hardware actually maintains the same constraint.]
Apparently you never tried to multiply or divide on any early Crays.
;-)
Multiplication could be off by a full ULP. So products that exactly
filled all 48 bits with significant bits often produced slightly odd
results.

Is that also the one with the non-commutative multiplication?


Yep. Reverse the operands, get a different "round off" error.

seems like fun...


also reminds me of a trick used to implement fixed-point multiplies:
if the intermediate result won't fit in the word, shift one or both of
the arguments right some to make the result smaller.

this was a way to do fast/simple fixed point math without needing a
larger register to hold the intermediate results, but would also cost
some in terms of accuracy.

makes me wonder if maybe Cray was doing something similar?... (say,
essentially just discarding low-order bits from the operands in an
uneven manner?).
 
glen herrmannsfeldt

(snip)
but, if you store it into a 32 bit integer (first), all the high-order
bits end up cut off anyways. a desktop PC doesn't have any such 64-bit
"int" type, so the above can't be the issue.

IA32 doesn't do it in one instruction, but many compilers targeting
the 32 bit IA32 have a 64 bit (long long) data type. Add and
subtract can usually be done inline, with two instructions.
Multiply and divide might be done as subroutine call, but many
will recognize the case of 32 bit operands cast to (long long),
and use a single multiply instruction with 64 bit product.

I never tested for it, but they might also recognize the case of
a 64 bit dividend and 32 bit divisor, generating a 32 bit quotient.

(But a large fraction of desktop PCs now have 64 bit processors.)

-- glen
 
glen herrmannsfeldt

(snip, I wrote)
the mystery here is this:
on an older core (an "Athlon 64 X2"), I was only seeing this
issue when using SSE for the arithmetic (on Linux x86-64).
this was because on 64-bit targets, people said NO to x87,
and switched over to using almost entirely SSE for
floating-point math.

There is another effect to watch for, which I remember became
obvious when testing for the Pentium FDIV bug. Many compilers
evaluate constant expressions at compile time using different
arithmetic than they do at run time. Specifically, they did
not use FDIV, either at compile time or run time when given
a constant expression.

Many of the expressions in this thread were constants known
at compile time, which the compiler could be evaluating.

-- glen
 
Joe keane

Still wondering here about the "up for grabs" part. It seems to imply
some edge condition that isn't accounted for.

a)

machine has internal format that is superset of 'long' and 'double'

b)

compiler notice the code does
long -> double -> long
and say 'screw that, don't do nothing'[1]

[1] in general, people don't get mad if you give -more- precision than
that required, just supply a 'yes i'm a masochist' option
 
glen herrmannsfeldt

(snip)
a) machine has internal format that is superset of 'long' and 'double'
b) compiler notice the code does: long -> double -> long
and say 'screw that, don't do nothing'[1]
[1] in general, people don't get mad if you give -more- precision than
that required, just supply a 'yes i'm a masochist' option

gcc has the -ffloat-store option, requiring it to store intermediate
values and refetch them (hopefully in cache) to avoid the excess
precision problem. I don't know if that also applies to compile time
expression evaluation, though.

There are some algorithms that require a consistent precision,
and that will fail if sometimes given more.

-- glen
 
Keith Thompson

[1] in general, people don't get mad if you give -more- precision than
that required, just supply a 'yes i'm a masochist' option

People will sometimes get quite mad if you give them *inconsistent*
precision.
 
glen herrmannsfeldt

In comp.lang.c++ Keith Thompson said:
(e-mail address removed) (Joe keane) writes:
[1] in general, people don't get mad if you give -more- precision than
that required, just supply a 'yes i'm a masochist' option
People will sometimes get quite mad if you give them *inconsistent*
precision.

Well, for the long story: when the 8087 was developed, the idea
was for an infinite register stack. When it was full, there
would be an interrupt, some values would be spilled to memory, and
execution would continue. The chip was designed and built before
anyone tried to write the software to do it, at which point it was
found not to be possible.

Why it wasn't fixed for the 80287 and 80387, I don't know.

That would, at least, have allowed for intermediate values
to always have extra precision.

Now, when you add optimizations, such as common subexpression
elimination, and keep values in registers between statements,
then again you have inconsistent precision. The infinite
stack would have allowed for more consistency.

As an example of what is possible, without actually seeing
a compiler do it, if you:

X=A*B+C*D;
Y=A*B-C*D;

The compiler might keep C*D with extra precision, but not A*B.

Now, if A=C and B=D, Y should be zero but it isn't.

With an infinite stack, one might hope that the intermediates
for A*B and C*D had the same precision, extra or not.

-- glen
 
