C/C++ pitfalls related to 64-bits (unsigned long & double)

Alex Vinokur

Hi,

unsigned long a = -1;
double b = a;
unsigned long c = b;

Model 32-bits: a == c
Model 64-bits: a != c


It is not a compiler bug for 64-bits.

Is it a design bug of the C/C++ language for 64-bits?

Alex
 
Goran

Hi,

unsigned long a = -1;
double b = a;
unsigned long c = b;

Model 32-bits:   a == c
Model 64-bits:   a != c

It is not a compiler bug for 64-bits.

Is it a design bug of the C/C++ language for 64-bits?

No. Your code snippet will produce "c" that equals "a" on both 32- and
64-bit systems.

If you think there is, post an example that shows the problem, e.g.:

#include <iostream>
int main()
{
    unsigned long a = -1;
    double b = a;
    unsigned long c = b;
    if (a != c)
    {
        std::cout << "error!";
    }
}

Goran.
 
Alex Vinokur

No.

a == c for both "32-bits" and "64-bits" on my compiler (VC9).

/Leigh

aCC: HP C/aC++ B3910B A.06.25.01 [May 16 2010]
For 64-bits:
a = 0xffffffffffffffff
c = 0x8000000000000000

Intel(R) C++ Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.0.4.191 Build 20110427
For 64-bits:
a = 0xffffffffffffffff
c = 0
 
Alex Vinokur

No. Your code snippet will produce "c" that equals "a" on both 32- and
64-bit systems.

If you think there is, post an example that shows the problem, e.g.:

#include <iostream>
int main()
{
 unsigned long a = -1;
 double b = a;
 unsigned long c = b;
 if (a != c)
 {
  std::cout << "error!";
 }

}

Goran.

#include <iostream>
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

typedef unsigned char uchar;

#define SHOW_HEX(x) std::cerr << #x << " = " << std::hex << \
    std::showbase << x << std::dec << std::endl
#define SHOW_DOUBLE(x) std::cerr << #x << " = " << x << std::endl
#define SHOW_CHAR(x) std::cerr << #x << " = " << std::hex << \
    std::showbase << std::size_t(uchar(x)) << std::dec << std::endl

int main()
{
  // -------------------------------------
  std::cout << "Model: " << sizeof(void*) * CHAR_BIT << "-bits" << std::endl;
  // -------------------------------------

  // std::size_t a = std::size_t(-1);
  // double b = a;
  std::size_t a = std::numeric_limits<std::size_t>::max();
  double b = a;
  std::size_t c = b;
  char* pa = reinterpret_cast<char*>(&a);
  char* pb = reinterpret_cast<char*>(&b);
  char* pc = reinterpret_cast<char*>(&c);

  SHOW_HEX(a);
  SHOW_DOUBLE(b);
  SHOW_HEX(c);

  std::cerr << std::endl;
  for (std::size_t i = 0; i < sizeof(std::size_t); i++)
  {
    SHOW_CHAR(pa[i]);   // bytes of a
  }
  std::cerr << std::endl;
  for (std::size_t i = 0; i < sizeof(double); i++)
  {
    SHOW_CHAR(pb[i]);   // bytes of b
  }
  std::cerr << std::endl;
  for (std::size_t i = 0; i < sizeof(std::size_t); i++)
  {
    SHOW_CHAR(pc[i]);   // bytes of c
  }

  assert (a == c);   // fails where size_t has more bits than double's mantissa

  return 0;
}
 
Eric Sosman

Hi,

unsigned long a = -1;
double b = a;
unsigned long c = b;

Model 32-bits: a == c
Model 64-bits: a != c


It is not a compiler bug for 64-bits.

Is it a design bug of the C/C++ language for 64-bits?

Whether the language design is faulty seems a matter of opinion.
However, part of the "spirit of C" is to stay fairly close to the
hardware. Since hardware that offers 64 bits of precision in the
floating-point format used for `double', some loss of precision in
`b = a' must be expected.

The language *could* have been defined to raise an exception
whenever a floating-point operation delivers an inexact result, but
that would have meant raising such exceptions for a large fraction
of all F-P calculations, perhaps many times in a single expression.
Or the language could have left the inexact conversion entirely
undefined, in which case there'd be no reason to expect `a == c'
(or even that the execution would get that far). The behavior
actually chosen (conversion yields one of the two representable
neighbors) seems fairly benign, not something I'd call a bug. But,
as I say, that's a matter of opinion.

(The language *could* have been defined to deliver exact F-P
results for all calculations, widening the representation at need.
That's the approach used on the Starship Enterprise, where Kirk
crippled the computer by asking it to calculate pi ...)
 
Ben Bacarisse

Talking from the C perspective here...
unsigned long a = -1;
double b = a;
unsigned long c = b;

Model 32-bits: a == c
Model 64-bits: a != c

It is not a compiler bug for 64-bits.

Quite. Both outcomes are permitted.

Is it a design bug of the C/C++ language for 64-bits?

No.

Your use of "64-bits" is a little confusing. Not all 64-bit systems
have 64 bit unsigned longs, which is, I think, what you are talking
about.

On systems with 64-bit longs and standard 53-bit mantissa doubles, you
cannot represent ULONG_MAX (the value of 'a' in the above code) exactly
in a double. C mandates that you get one of the two nearest
representable values, but it won't be exact. When the conversion goes
the other way the result can be undefined (if the floating point value
was rounded up to a value larger than ULONG_MAX), but, even if the
double has a value in the range of unsigned long, it will no longer
equal ULONG_MAX.

I said "no" to it being an error in the design of the language because
solving it would impose the kind of burden on implementations that C
rejects. C is designed to use native machine types wherever possible.
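
A minimal C++ sketch of the behaviour described above, assuming an LP64 system
where unsigned long is 64 bits and double is the IEEE-754 64-bit format (the
values in the comments only hold under those assumptions):

#include <iostream>
#include <iomanip>
#include <climits>
#include <cmath>

int main()
{
    unsigned long a = ULONG_MAX;   // 0xffffffffffffffff when unsigned long is 64 bits
    double b = a;                  // rounded to one of the two nearest representable doubles
    std::cout << std::setprecision(20)
              << "a              = " << a << '\n'
              << "b              = " << b << '\n'                        // typically 2^64
              << "nextafter(b,0) = " << std::nextafter(b, 0.0) << '\n';  // the other neighbour, 2^64 - 2048
    // Converting b back to unsigned long would be undefined behaviour here,
    // because after rounding up b can exceed ULONG_MAX.
    return 0;
}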
 
Eric Sosman

[...] Since hardware that offers 64 bits of precision in the
floating-point format used for `double', some loss of precision in
`b = a' must be expected.

Oh, drat. There was supposed to be an "is fairly rare" just
before the comma ...
 
Ben Bacarisse

Goran said:
No. Your code snippet will produce "c" that equals "a" on both 32- and
64-bit systems.

Not always. His (and your) use of "64-bit systems" hides the fact that
they are not all the same:

$ cat t.c
#include <stdio.h>

int main(void)
{
unsigned long a = -1;
puts(a == (unsigned long)(double)a ? "same" : "different");
}
$ gcc -o t -std=c99 -pedantic t.c
$ ./t
different

(g++ will do the same, here).

<snip>
 
Noob

Alex said:
Hi,

unsigned long a = -1;
double b = a;
unsigned long c = b;

Model 32-bits: a == c
Model 64-bits: a != c


It is not a compiler bug for 64-bits.

Is it a design bug of the C/C++ language for 64-bits?

Nicely done. You've hit all the right nails.

Remaining conspicuously vague, conflating C and C++ while cross-posting
to both groups, claiming a defect in the language, conjuring the ever
so misunderstood floating-point format, ...

You would make Kenny so proud!

If you're bored, you could read Goldberg's paper (all of it!)
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
 
Goran

No. Your code snippet will produce "c" that equals "a" on both 32- and
64-bit systems.

Here I stand ashamed. I overlooked the "unsigned" part. :-(

It's what Noob says, first and foremost: the max of size_t is likely
2^64-1. That's more digits than the number of significant digits a
"double" can carry.
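
A quick way to see that mismatch is to compare the bit counts reported by
numeric_limits (just a sketch; the numbers are implementation-specific, 64
and 53 being what a typical LP64 system with IEEE doubles reports):

#include <iostream>
#include <limits>
#include <cstddef>

int main()
{
    // Value bits in size_t vs. significand bits in double.
    std::cout << "size_t digits: " << std::numeric_limits<std::size_t>::digits << '\n'
              << "double digits: " << std::numeric_limits<double>::digits << '\n';
    return 0;
}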

Goran.
 
BGB

Talking from the C perspective here...


Quite. Both outcomes are permitted.


No.

Your use of "64-bits" is a little confusing. Not all 64-bit systems
have 64 bit unsigned longs, which is, I think, what you are talking
about.

On systems with 64-bit longs and standard 53-bit mantissa doubles, you
cannot represent ULONG_MAX (the value of 'a' in the above code) exactly
in a double. C mandates that you get one of the two nearest
representable values, but it won't be exact. When the conversion goes
the other way the result can be undefined (if the floating point value
was rounded up to a value larger than ULONG_MAX), but, even if the
double has a value in the range of unsigned long, it will no longer
equal ULONG_MAX.

I said "no" to it being an error in the design of the language because
solving it would impose the kind of burden on implementations that C
rejects. C is designed to use native machine types wherever possible.

yeah.

also, even though double has more bits than, say, an integer, that does not
mean it will reliably encode an integer's value (it can do so in theory,
and will most often do so, but whether or not it will actually always do
so is more "up for grabs").


it is much less reliable with float (since float has only about 23 bits
to hold an integer's value, vs the 52 bits or so in double).

hence, float can't reliably hold the entire integer range, and double
can't reliably hold the entire long-long range (the size of long is
target specific, even for the same CPU architecture and operating mode,
it may still vary between the OS and compiler in use).
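
For instance, a short sketch assuming the usual IEEE single- and
double-precision formats (2^24+1 and 2^53+1 are the first integers each
format cannot hold exactly, so the constants below are illustrative only):

#include <iostream>
#include <cstdint>

int main()
{
    std::int32_t i = 16777217;            // 2^24 + 1: needs 25 significand bits
    std::int64_t j = 9007199254740993LL;  // 2^53 + 1: needs 54 significand bits

    std::int32_t i2 = static_cast<std::int32_t>(static_cast<float>(i));
    std::int64_t j2 = static_cast<std::int64_t>(static_cast<double>(j));

    std::cout << i << " -> float  -> " << i2 << '\n'    // typically 16777216
              << j << " -> double -> " << j2 << '\n';   // typically 9007199254740992
    return 0;
}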


the most common behavior seems to be:
int -> float or double, may produce a value slightly below the integer;
float or double to int, will generally truncate the value, yielding the
integer representation as rounded towards 0.

the result then is a tendency for an int->double->int conversion to have
a small chance to drop the integer value towards 0 (why? I don't know
exactly, but I have observed it before).

one can counteract this by fudging the value with a small epsilon prior
to converting back into an integer.

say, for example (untested, from memory):
(v>=0)?((int)(v+0.0001)):((int)(v-0.0001));

can't say it will always work, but similar seems to work fairly well IME
(at least on generic x86 based targets).


or such...
 
James Kuyper

On 02/13/2012 01:49 PM, Richard wrote:
....
Alex Vinokur <[email protected]> spake the secret code


Isn't this undefined behavior right from the get-go?

No, why do you think so?

The behavior is defined by 6.3.1.3p2 in the C standard, which has been
quoted several times already in this thread. Do you have any reason to
doubt the accuracy or applicability of that section to this code, or
have you simply not been paying attention?
..
Since this is cross-posted to comp.lang.c++, section 4.7p2 is the
appropriate location in the C++ standard; it says essentially the same
thing, but with different language.
 
Juha Nieminen

In comp.lang.c++ James Kuyper said:
On 02/13/2012 01:49 PM, Richard wrote:
...

No, why do you think so?

Even if it were undefined, you could simply change it to:

unsigned long a = ~0UL;
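
Both spellings give ULONG_MAX, since converting -1 to an unsigned type is
defined as wrapping modulo 2^N (6.3.1.3p2 / 4.7p2); a tiny sketch to check:

#include <cassert>
#include <climits>

int main()
{
    unsigned long a = -1;     // well-defined: converts to ULONG_MAX
    unsigned long b = ~0UL;   // all value bits set
    assert(a == ULONG_MAX);
    assert(a == b);
    return 0;
}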
 
MikeWhy

BGB said:
... even though double has more bits than, say, an integer, that does not
mean it will reliably encode an integer's value (it can do so in
theory, and will most often do so, but whether or not it will
actually always do so is more "up for grabs").

What circumstances are those? Integers hold integer values. AFAIK, all
integer values encode correctly in FP of higher precision. It's a simple
matter of normalization. (On Intel, a bit-scan operation.)
 
James Kuyper

What circumstances are those? Integers hold integer values. AFAIK, all
integer values encode correctly in FP of higher precision. It's a simple
matter of normalization. (On Intel, a bit-scan operation.)

You've covered the correct point, but have apparently not realized that
it was relevant. He's talking about converting an integer value to a
floating point type when the floating point type has insufficient
precision to encode the value correctly. This is less than clear,
because he's making all kinds of implementation-specific assumptions
about the sizes of various types, and (inconsistently) using 'integer'
as if it were synonymous with 'int'.

The key point is that, for instance, a 32-bit integer type can represent
values too large to be converted without loss of precision to a 32-bit
floating point type, because the floating point type uses some of those
bits for the exponent. The same thing applies to 64 bit integer types
and 64 bit floating point types.
 
glen herrmannsfeldt

In comp.lang.c++ Eric Sosman said:
[...] Since hardware that offers 64 bits of precision in the
floating-point format used for `double', some loss of precision in
`b = a' must be expected.
Oh, drat. There was supposed to be an "is fairly rare" just
before the comma ...

x87 hardware isn't that rare. Depending on the implementation,
the compiler might do the calculation in temporary real form,
with all 64 bits.

But, yes, the usual double is 64 total bits, so fewer than 64
for the significand. As far as I know, though, there is no
restriction in C or C++ against a larger double, such as
a 64 bit float and 128 bit double.

-- glen
 
Ben Bacarisse

BGB said:
also, even though double has more bits than, say, an integer, that does not
mean it will reliably encode an integer's value (it can do so in
theory, and will most often do so, but whether or not it will actually
always do so is more "up for grabs").

It's not up for grabs in C (and C++ is essentially the same in this
regard). If the integer can be represented exactly in the floating
point type, it must be.

float or double to int, will generally truncate the value, yielding
the integer representation as rounded towards 0.

That's true if the truncated value can be represented as an int. If
not, the behaviour is undefined. For example, in the example that
triggered this thread my implementation produces zero as the result.

the result then is a tendency for an int->double->int conversion to
have a small chance to drop the integer value towards 0 (why? I don't
know exactly, but I have observed it before).

If the int is "in range", that would mean you don't have a conforming C
implementation.

one can counteract this by fudging the value with a small epsilon
prior to converting back into an integer.

say, for example (untested, from memory):
(v>=0)?((int)(v+0.0001)):((int)(v-0.0001));

I can't see how this helps. If v is representable exactly as a double,
the round trip has no effect so this code is not needed. Can you give me
a use-case?
 
MikeWhy

James said:
You've covered the correct point, but have apparently not realized
that it was relevant. He's talking about converting an integer value
to a floating point type when the floating point type has insufficient
precision to encode the value correctly. This is less than clear,
because he's making all kinds of implementation-specific assumptions
about the sizes of various types, and (inconsistently) using 'integer'
as if it were synonymous with 'int'.

The key point is that, for instance, a 32-bit integer type can
represent values too large to be converted without loss of precision
to a 32-bit floating point type, because the floating point type uses
some of those bits for the exponent. The same thing applies to 64 bit
integer types and 64 bit floating point types.

Point taken, which is the OP's point of a 64-bit ULL in a 53-bit mantissa (or a
32-bit int in a single-precision float). This is easily understood and
documented for the architecture in numeric_limits. Still wondering here
about the "up for grabs" part. It seems to imply some edge condition that
isn't accounted for.
 
James Kuyper

On 02/13/2012 04:52 PM, glen herrmannsfeldt wrote:
....
for the significand. As far as I know, though, there is no
restriction in C or C++ against a larger double, such as
a 64 bit float and 128 bit double.

True. Such a restriction applies only to C implementations that
pre-#define __STDC_IEC_559__, in which case "The double type matches the
IEC 60559 double format" (F.2p1).

It's not clear to me that the C++ standard makes any such requirement if
std::numeric_limits<double>::is_iec559 is true. It seems to me that
is_iec559 could still be true if double is an extended double IEC 559 type.
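
A small probe of those traits, as a sketch (what it prints is entirely
implementation-specific):

#include <iostream>
#include <limits>

int main()
{
    // is_iec559 reports whether the type claims IEC 60559 (IEEE 754) conformance;
    // digits is the width of its significand.
    std::cout << std::boolalpha
              << "double:      is_iec559 = " << std::numeric_limits<double>::is_iec559
              << ", digits = " << std::numeric_limits<double>::digits << '\n'
              << "long double: is_iec559 = " << std::numeric_limits<long double>::is_iec559
              << ", digits = " << std::numeric_limits<long double>::digits << '\n';
    return 0;
}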
 
