# double cast to int reliable?

Discussion in 'C Programming' started by sandeep, Jun 2, 2010.

1. ### sandeep (Guest)

In the following code:

int i,j;
double d;

i = 1; // or any integer
d = (double)i;
j = (int)d;

Is there any chance that "i" will not equal "j" due to the double
being stored inexactly?
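For the usual case the answer can be demonstrated directly. The helper below is a hedged sketch (the name `round_trips` is mine, not from the thread); it assumes a 32-bit int and an IEEE 754 double, whose 53-bit significand represents every 32-bit integer exactly:

```c
#include <stdbool.h>

/* Sketch (not from the thread): does i survive int -> double -> int?
   Assumes the common 32-bit int / IEEE 754 double combination, where
   double's 53-bit significand holds any 32-bit value exactly. */
static bool round_trips(int i)
{
    double d = (double)i; /* exact when i fits the significand (C99 6.3.1.4p2) */
    int j = (int)d;       /* truncation toward zero; exact here since d is integral */
    return i == j;
}
```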

sandeep, Jun 2, 2010

2. ### Sjouke Burry (Guest)

sandeep wrote:
> In the following code:
>
> int i,j;
> double d;
>
> i = 1; // or any integer
> d = (double)i;
> j = (int)d;
>
> Is there any chance that "i" will not equal "j" due to the double
> being stored inexactly?

Yep. Rounding while converting to double will for most integers
mean that the double is slightly smaller than the int.
Converting then to int will not give you the original.

Sjouke Burry, Jun 2, 2010

3. ### Dann Corbit (Guest)

In article <hu6gg8$lim$>, says...
>
> In the following code:
>
> int i,j;
> double d;
>
> i = 1; // or any integer
> d = (double)i;
> j = (int)d;
>
> Is there any chance that "i" will not equal "j" due to the double
> being stored inexactly?

It is possible for int to be 64 bits, and represent values as large as
(for instance) 9223372036854775807.

It is possible for double to have as little as 6-7 significant digits,
though for the most part you will see 15-16 significant digits.

It is possible (though unlikely) to see the problem you describe. I
would be very surprised to see a system with 64 bit ints and 32 bit
doubles. But even with 64 bit ints and 64 bit doubles, the int values
will have greater precision because they do not store an exponent.

Far more likely is this

double d = <some value>;
int i = d; /* undefined behavior if the integral part of d is out of int's range */

At this point, i may not be equal to floor(d) if floor(d) is not
representable as an integer.

There will always be some situations where information can be lost
because it is very unlikely that the precisions are identical.

Hence, if you want the integral value that is stored in a double, far
better is:

double integral_part = floor(some_double);

Dann Corbit, Jun 2, 2010
4. ### Dann Corbit (Guest)

In article <4c06c7db$0$14122$>,
says...
>
> sandeep wrote:
> > In the following code:
> >
> > int i,j;
> > double d;
> >
> > i = 1; // or any integer
> > d = (double)i;
> > j = (int)d;
> >
> > Is there any chance that "i" will not equal "j" due to the double
> > being stored inexactly?

> Yep. Rounding while converting to double will for most integers
> mean that the double is slightly smaller than the int.
> Converting then to int will not give you the original.

Since he specified an integer assignment:
> > i = 1; // or any integer

the difficulties are not due to rounding, as I see it.

Dann Corbit, Jun 2, 2010
5. ### Tim Streater (Guest)

In article <4c06c7db$0$14122$>,
Sjouke Burry <> wrote:

> sandeep wrote:
> > In the following code:
> >
> > int i,j;
> > double d;
> >
> > i = 1; // or any integer
> > d = (double)i;
> > j = (int)d;
> >
> > Is there any chance that "i" will not equal "j" due to the double
> > being stored inexactly?

> Yep. Rounding while converting to double will for most integers
> mean that the double is slightly smaller than the int.
> Converting then to int will not give you the original.

Won't this be exact if the integer in question occupies fewer bits than
the mantissa size in bits? On the CDC 6600 (60-bit word), all integer
arithmetic was in fact done by the floating point unit (apart from
integer addition), so integers were limited to 48 bits (mantissa length).

--
Tim

"That excessive bail ought not to be required, nor excessive fines imposed,
nor cruel and unusual punishments inflicted" -- Bill of Rights 1689

Tim Streater, Jun 2, 2010
6. ### Ben Pfaff (Guest)

Dann Corbit <> writes:

> It is possible for double to have as little as 6-7 significant digits,
> though for the most part you will see 15-16 significant digits.

The 'float' type must have at least 6 significant digits.
The 'double' and 'long double' types must have at least 10
significant digits.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Ben Pfaff, Jun 2, 2010
7. ### Ben Bacarisse (Guest)

Tim Streater <> writes:

> In article <4c06c7db$0$14122$>,
> Sjouke Burry <> wrote:
>
>> sandeep wrote:
>> > In the following code:
>> >
>> > int i,j;
>> > double d;
>> >
>> > i = 1; // or any integer
>> > d = (double)i;
>> > j = (int)d;
>> >
>> > Is there any chance that "i" will not equal "j" due to the double
>> > being stored inexactly?

>> Yep. Rounding while converting to double will for most integers
>> mean that the double is slightly smaller than the int.
>> Converting then to int will not give you the original.

>
> Won't this be exact if the integer in question occupies fewer bits than
> the mantissa size in bits?

Yes. 6.3.1.4 p2 (part of the section on conversions) starts:

When a value of integer type is converted to a real floating type, if
the value being converted can be represented exactly in the new type,
it is unchanged.

The standard does not use the term mantissa but section 5.2.4.2.2
("Characteristics of floating types") defines C's model of floating
types in such a way that the expected range of integers will be exactly
representable.
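A hedged illustration of that guarantee (my sketch, assuming an IEEE 754 double with DBL_MANT_DIG == 53): integers of up to 53 bits convert exactly, and 2^53 + 1 is the first positive integer that does not.

```c
#include <stdbool.h>

/* Sketch: v converts to double exactly iff the round trip preserves it.
   Only call with values whose double image fits back in long long,
   otherwise the conversion back is undefined. */
static bool converts_exactly(long long v)
{
    double d = (double)v; /* exact if representable (C99 6.3.1.4p2) */
    return (long long)d == v;
}
```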

<snip>
--
Ben.

Ben Bacarisse, Jun 2, 2010
8. ### kathir (Guest)

On Jun 2, 1:55 pm, sandeep <> wrote:
> In the following code:
>
> int i,j;
> double d;
>
> i = 1; // or any integer
> d = (double)i;
> j = (int)d;
>
> Is there any chance that "i" will not equal "j" due to the double
> being stored inexactly?

Floating-point numbers are stored internally in a different way, using a
mantissa portion and an exponent portion. If you do any floating-point
calculation (multiplication or division) and then convert back to
integer, you may see a minor difference between the int and double
values. To understand the bit pattern of floating-point numbers, visit
http://softwareandfinance.com/Research_Floating_Point_Ind.html

Thanks and Regards,
Kathir
http://programming.softwareandfinance.com
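The mantissa/exponent split kathir describes can be made visible with the standard `frexp` function; this sketch (the helper name is mine, for illustration only) also shows that recombining the two parts with `ldexp` is exact:

```c
#include <math.h>
#include <stdbool.h>

/* Sketch: frexp splits a double into a normalized fraction and a binary
   exponent, i.e. d == *frac * 2^*exp with 0.5 <= |*frac| < 1. */
static bool frexp_splits(double d, double *frac, int *exp)
{
    *frac = frexp(d, exp);
    return d == ldexp(*frac, *exp); /* recombining loses nothing */
}
```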

kathir, Jun 3, 2010
9. ### Nick Keighley (Guest)

On 2 June, 22:06, Sjouke Burry <>
wrote:
> sandeep wrote:
> > In the following code:

>
> > int i,j;
> > double d;

>
> > i = 1; // or any integer
> > d = (double)i;
> > j = (int)d;

>
> > Is there any chance that "i" will not equal "j" due to the double
> > being stored inexactly?

>
> Yep. Rounding while converting to double will for most integers
> mean that the double is slightly smaller than the int.
> Converting then to int will not give you the original.

really? Can you name an implementation where this is so? Is it a valid
implementation of C?

Nick Keighley, Jun 3, 2010
10. ### Nick Keighley (Guest)

On 3 June, 00:58, kathir <> wrote:
> On Jun 2, 1:55 pm, sandeep <> wrote:
>
> > In the following code:

>
> > int i,j;
> > double d;

>
> > i = 1; // or any integer
> > d = (double)i;
> > j = (int)d;

>
> > Is there any chance that "i" will not equal "j" due to the double
> > being stored inexactly?

>
> Floating-point numbers are stored internally in a different way, using
> a mantissa portion and an exponent portion. If you do any floating-point
> calculation (multiplication or division) and then convert back to
> integer, you may see a minor difference between the int and double values.

Depending on what operations you do, you might see huge differences.

> To understand the bit pattern of floating points, visit at
> http://softwareandfinance.com/Research_Floating_Point_Ind.html
>
> Thanks and Regards,
> Kathir
> http://programming.softwareandfinance.com

Nick Keighley, Jun 3, 2010
11. ### Seebs (Guest)

On 2010-06-03, Nick Keighley <> wrote:
> On 2 June, 22:06, Sjouke Burry <>
> wrote:
>> Yep. Rounding while converting to double will for most integers
>> mean that the double is slightly smaller than the int.
>> Converting then to int will not give you the original.

> really? Can you name an implementation where this is so? Is it a valid
> implementation of C?

The obvious case would be a machine where both int and double are 64-bit,
at which point, it's pretty obvious that for the vast majority of positive
integers, the conversion to double will at the very least change the
value, and I think I've seen it round down, so...

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach /
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Seebs, Jun 3, 2010
12. ### Keith Thompson (Guest)

Seebs <> writes:
> On 2010-06-03, Nick Keighley <> wrote:
>> On 2 June, 22:06, Sjouke Burry <>
>> wrote:
>>> Yep. Rounding while converting to double will for most integers
>>> mean that the double is slightly smaller than the int.
>>> Converting then to int will not give you the original.

>
>> really? Can you name an implementation where this is so? Is it a valid
>> implementation of C?

>
> The obvious case would be a machine where both int and double are 64-bit,
> at which point, it's pretty obvious that for the vast majority of positive
> integers, the conversion to double will at the very least change the
> value, and I think I've seen it round down, so...

Round down or round to zero? If the latter, then it's not the case
that "most" integers yield a slightly smaller double when converted
(unless "smaller" means closer to zero). But yes, this is just
nitpicking.

The point is that the standard requires the conversion of an integer
to a floating-point type to yield an exact result when that result
can be represented (C99 6.3.1.4), and the floating-point model
imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
values must be exactly representable. That range might not cover
the full range of any integer type (even long double might not be
able to represent CHAR_MAX if CHAR_BIT is big enough).

In particular, converting the value 1 from int to double and back
to int is guaranteed to yield 1; if it doesn't, your implementation
is non-conforming.

There's a common idea that floating-point values can never be
anything more than approximations, and that no floating-point
operation is guaranteed to yield an exact result, but the reality
of it isn't that simple. It might be safer to *assume* that all
such operations are approximate but there are things you can get
away with if you know what you're doing. The trouble is that, even
if you know what you're doing, it can be very easy to accidentally
get outside the range in which the guarantees apply; you can use
double to represent exact integers, but there's no warning when you
exceed the range where that works.
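Keith's point about the absence of a warning can be turned into an explicit guard. A hedged sketch (the helper is hypothetical, not from the post; it assumes a binary floating-point representation):

```c
#include <math.h>
#include <float.h>
#include <stdbool.h>

/* Sketch: is d inside the range where *every* integer is exactly
   representable in double, i.e. |d| <= 2^DBL_MANT_DIG (2^53 for IEEE)?
   Beyond that range adjacent representable values are more than 1 apart,
   and the implementation gives no warning. */
static bool in_exact_int_range(double d)
{
    double limit = ldexp(1.0, DBL_MANT_DIG); /* 2^DBL_MANT_DIG */
    return fabs(d) <= limit;
}
```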

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jun 3, 2010
13. ### Ersek, Laszlo (Guest)

On Thu, 3 Jun 2010, Keith Thompson wrote:

> The trouble is that, even if you know what you're doing, it can be very
> easy to accidentally get outside the range in which the guarantees
> apply; you can use double to represent exact integers, but there's no
> warning when you exceed the range where that works.

For any unsigned type that has no more bits than 612,787,565,149,966; that
is, any conceivable unsigned type, the following is a sufficient condition
to store any value of said type in a "long double":

((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
<= LDBL_DIG

For uint32_t, the left side evaluates to 10, and both DBL_DIG and LDBL_DIG
must be at least 10 on any conformant platform.

After the conversion to the chosen floating point type, eg. long double,
one must track the possible ranges in every floating point expression
involved, and make sure that any evaluation can't exceed "limit", which
can be initialized like this:

char lim_str[LDBL_DIG + 1] = "";
long double limit;

(void)sscanf(memset(lim_str, '9', LDBL_DIG), "%Lf", &limit);

(Of course not exceeding this bound may not be sufficient for converting
back to "utype", but since "(utype)-1" itself was convertible, this final
condition is only a simple comparison away.)

--o--

The number of full decimal digits needed to represent the C value
"(utype)-1" is given by the math expression

ceil(log_10(2 ** numbits - 1))

"numbits" being the number of value bits in "utype". It is safe to assume
(or rather, we have to assume) that all bits are value bits. Continuing
with math formulas, and exploiting log_10 being strictly monotonic and
ceil being monotonic,

ceil(log_10(2 ** numbits - 1))
<= ceil(log_10(2 ** numbits ))
== ceil(numbits * log_10(2))
<= ceil(numbits * (30103 / 100000))
== ceil(numbits * 30103 / 100000)

which equals the value of the math expression

floor( (numbits * 30103 + (100000 - 1)) / 100000 )

Therefore, this integer value is not less than the number of full decimal
digits needed. As "numbits" increases, this value becomes greater than the
exact number of decimal places required. The speed of divergence is
determined by the accuracy of 30103 / 100000 approximating log_10(2), but
I'm too lazy to try to calculate that relationship.

BTW, 30103 and 100000 are coprimes (30103 is a prime in its own right),
thus the smallest positive "numbits" where "numbits * 30103" is an
integral multiple of 100000 is 100000, which would still make for quite a
big integer type. Hence we can assume that the remainder of the modular
division "numbits * 30103 / 100000" is always nonzero, and the last
ceiling math expression could be rewritten as

floor(numbits * 30103 / 100000) + 1

This simplifies the initial C expression to

(long long unsigned)sizeof(utype) * CHAR_BIT * 30103 / 100000 < LDBL_DIG

Unfortunately, the entire approach falls on its face with uint64_t and an
extended precision (1 + 15 + 64 = 80 bits) "long double", even though the
significand has the required number of bits available. (As said above, the
condition is only sufficient, not necessary.)

The problem is that the method above works with entire base 10 digits. The
decimal representation of UINT64_MAX needs 20 places (19 full places and a
"fractional place", rounded up to 20), but the 64 bit significand only
provides for 19 whole decimal places, and the comparison is done in whole
decimal places. What's worse, an extended precision "long double" can only
allow for an LDBL_DIG of 18 (as my platform defines it), presumably
because (and I'm peeking at C99 5.2.4.2.2 p8) "long double" must
"accomodate" not only integers with LDBL_DIG decimal places, but also any
decimal fraction with LDBL_DIG digits. The exponent of the "long double"
stores the position of the *binary* point, not that of the *decimal*
point, and this probably sacrifices a further decimal digit.

(I gave you some material to shred, please be gentle while shredding.)

Cheers,
lacos

Ersek, Laszlo, Jun 3, 2010
14. ### Keith Thompson (Guest)

"Ersek, Laszlo" <> writes:
> On Thu, 3 Jun 2010, Keith Thompson wrote:
>
>> The trouble is that, even if you know what you're doing, it can be very
>> easy to accidentally get outside the range in which the guarantees
>> apply; you can use double to represent exact integers, but there's no
>> warning when you exceed the range where that works.

>
> For any unsigned type that has no more bits than 612,787,565,149,966; that
> is, any conceivable unsigned type, the following is a sufficient condition
> to store any value of said type in a "long double":
>
> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
> <= LDBL_DIG

612,787,565,149,966 can be represented in 50 bits.
unsigned long long is at least 64 bits.

Inconceivable? "I do not think that word means what you think it means."

[snip]


Keith Thompson, Jun 3, 2010
15. ### Ersek, Laszlo (Guest)

On Thu, 3 Jun 2010, Keith Thompson wrote:

> "Ersek, Laszlo" <> writes:
>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>
>>> The trouble is that, even if you know what you're doing, it can be very
>>> easy to accidentally get outside the range in which the guarantees
>>> apply; you can use double to represent exact integers, but there's no
>>> warning when you exceed the range where that works.

>>
>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>> is, any conceivable unsigned type, the following is a sufficient condition
>> to store any value of said type in a "long double":
>>
>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>> <= LDBL_DIG

>
> 612,787,565,149,966 can be represented in 50 bits.
> unsigned long long is at least 64 bits.
>
> Inconceivable? "I do not think that word means what you think it means."

I believe I wasn't formulating my point carefully enough. Verbatim quote,
with emphasis added:

>> For any unsigned type that has no more *bits* than 612,787,565,149,966;

The range of such an unsigned type would be

[0 .. 2 ** 612,787,565,149,966 - 1].

The limit is not arbitrary, it is (for the smallest allowed ULLONG_MAX):

(ULLONG_MAX - 99999) / 30103

expressed in C. "unsigned long long" doesn't need to cover the range of
the type in question, it must be able to represent the *number of bits* in
it.

Cheers,
lacos

Ersek, Laszlo, Jun 3, 2010
16. ### Ben Bacarisse (Guest)

Keith Thompson <> writes:

> "Ersek, Laszlo" <> writes:
>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>
>>> The trouble is that, even if you know what you're doing, it can be very
>>> easy to accidentally get outside the range in which the guarantees
>>> apply; you can use double to represent exact integers, but there's no
>>> warning when you exceed the range where that works.

>>
>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>> is, any conceivable unsigned type, the following is a sufficient condition
>> to store any value of said type in a "long double":
>>
>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>> <= LDBL_DIG

>
> 612,787,565,149,966 can be represented in 50 bits.
> unsigned long long is at least 64 bits.
>
> Inconceivable? "I do not think that word means what you think it
> means."

I'm pretty sure it's a word order confusion. I think he intended "any
unsigned type that has no more than 612,787,565,149,966 bits". That's
the maximum number of bits that won't cause the quoted expression to
fail. I.e. for more than that number of bits, long long unsigned is not
guaranteed to be able to represent the result.

Some people might still conceive of such types, but the term is not
nearly so outlandish in that context.

--
Ben.

Ben Bacarisse, Jun 3, 2010
17. ### Ersek, Laszlo (Guest)

On Thu, 3 Jun 2010, Ben Bacarisse wrote:

> Keith Thompson <> writes:
>
>> "Ersek, Laszlo" <> writes:
>>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>>
>>>> The trouble is that, even if you know what you're doing, it can be very
>>>> easy to accidentally get outside the range in which the guarantees
>>>> apply; you can use double to represent exact integers, but there's no
>>>> warning when you exceed the range where that works.
>>>
>>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>>> is, any conceivable unsigned type, the following is a sufficient condition
>>> to store any value of said type in a "long double":
>>>
>>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>>> <= LDBL_DIG

>>
>> 612,787,565,149,966 can be represented in 50 bits.
>> unsigned long long is at least 64 bits.
>>
>> Inconceivable? "I do not think that word means what you think it
>> means."

>
> I'm pretty sure it's a word order confusion. I think he intended "any
> unsigned type that has no more than 612,787,565,149,966 bits".

Yes, thank you. I guess 18 hours of sleep accumulated over four nights is
not too much.

(I don't need decaf, it's my DSPS [0] that doesn't cooperate with the
"strictly scheduled" training of this week. It's 01:04 AM in local time,
again.)

Cheers,
lacos

[0] http://en.wikipedia.org/wiki/Delayed_sleep_phase_syndrome

Ersek, Laszlo, Jun 4, 2010
18. ### Seebs (Guest)

On 2010-06-03, Keith Thompson <> wrote:
>> The obvious case would be a machine where both int and double are 64-bit,
>> at which point, it's pretty obvious that for the vast majority of positive
>> integers, the conversion to double will at the very least change the
>> value, and I think I've seen it round down, so...

> Round down or round to zero? If the latter, then it's not the case
> that "most" integers yield a slightly smaller double when converted
> (unless "smaller" means closer to zero). But yes, this is just
> nitpicking.

Which is why I put "positive" in there.

> The point is that the standard requires the conversion of an integer
> to a floating-point type to yield an exact result when that result
> can be represented (C99 6.3.1.4), and the floating-point model
> imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
> values must be exactly representable. That range might not cover
> the full range of any integer type (even long double might not be
> able to represent CHAR_MAX if CHAR_BIT is big enough).

Right.

But the obvious case would be 64-bit int and 64-bit double. Look at it
this way. Assume a typical mantissa/exponent system. Assume that there
are 58 bits of mantissa. There's 58 bits of numbers that can be represented
exactly, you can represent half of the numbers in the 59-bit range, 1/4 of
the numbers in the 60-bit range... And it turns out that this means that,
of the 63-bit range of int, a very small number of values (rough order of
1/16?) can be represented exactly in a double.

Now, as it happens, 99% of the numbers I've ever used in a C program are
in that range.

> The trouble is that, even
> if you know what you're doing, it can be very easy to accidentally
> get outside the range in which the guarantees apply; you can use
> double to represent exact integers, but there's no warning when you
> exceed the range where that works.

Yes.

For plain float, on the systems I've tried, the boundary seems to be about
2^24; 2^24+1 cannot be represented exactly in a 32-bit float. I wouldn't
be surprised to find that double came out somewhere near 2^48+1 as the first
positive integer value that couldn't be represented.
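Seebs's float boundary is easy to reproduce; a sketch (helper name mine) assuming IEEE single precision with its 24-bit significand:

```c
#include <stdbool.h>

/* Sketch: 2^24 fits float's 24-bit significand exactly, but 2^24 + 1
   needs 25 bits and rounds back down to 2^24 (ties-to-even). */
static bool float_holds(long v)
{
    float f = (float)v;
    return (long)f == v;
}
```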

-s

Seebs, Jun 4, 2010
19. ### Keith Thompson (Guest)

"Ersek, Laszlo" <> writes:
> On Thu, 3 Jun 2010, Keith Thompson wrote:
>
>> "Ersek, Laszlo" <> writes:
>>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>>
>>>> The trouble is that, even if you know what you're doing, it can be very
>>>> easy to accidentally get outside the range in which the guarantees
>>>> apply; you can use double to represent exact integers, but there's no
>>>> warning when you exceed the range where that works.
>>>
>>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>>> is, any conceivable unsigned type, the following is a sufficient condition
>>> to store any value of said type in a "long double":
>>>
>>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>>> <= LDBL_DIG

>>
>> 612,787,565,149,966 can be represented in 50 bits.
>> unsigned long long is at least 64 bits.
>>
>> Inconceivable? "I do not think that word means what you think it means."

>
> I believe I wasn't formulating my point carefully enough. Verbatim
> quote, with emphasis added:
>
>>> For any unsigned type that has no more *bits* than 612,787,565,149,966;

Ok, I see what you mean. ("no more than ... bits" would have been
clearer.)

> The range of such an unsigned type would be
>
> [0 .. 2 ** 612,787,565,149,966 - 1].
>
> The limit is not arbitrary, it is (for the smallest allowed ULLONG_MAX):
>
> (ULLONG_MAX - 99999) / 30103
>
> expressed in C. "unsigned long long" doesn't need to cover the range
> of the type in question, it must be able to represent the *number of
> bits* in it.

And the formula doesn't say "yes" for smaller types and "no" for
bigger ones; it breaks down for really huge types, right?

When I have time, I'll have to go back and re-read what you wrote.


Keith Thompson, Jun 4, 2010
20. ### Keith Thompson (Guest)

Seebs <> writes:
> On 2010-06-03, Keith Thompson <> wrote:
>>> The obvious case would be a machine where both int and double are 64-bit,
>>> at which point, it's pretty obvious that for the vast majority of positive
>>> integers, the conversion to double will at the very least change the
>>> value, and I think I've seen it round down, so...

>
>> Round down or round to zero? If the latter, then it's not the case
>> that "most" integers yield a slightly smaller double when converted
>> (unless "smaller" means closer to zero). But yes, this is just
>> nitpicking.

>
> Which is why I put "positive" in there.

Which I very cleverly failed to notice. *sigh*

>> The point is that the standard requires the conversion of an integer
>> to a floating-point type to yield an exact result when that result
>> can be represented (C99 6.3.1.4), and the floating-point model
>> imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
>> values must be exactly representable. That range might not cover
>> the full range of any integer type (even long double might not be
>> able to represent CHAR_MAX if CHAR_BIT is big enough).

>
> Right.
>
> But the obvious case would be 64-bit int and 64-bit double. Look at it
> this way. Assume a typical mantissa/exponent system. Assume that there
> are 58 bits of mantissa. There's 58 bits of numbers that can be represented
> exactly, you can represent half of the numbers in the 59-bit range, 1/4 of
> the numbers in the 60-bit range... And it turns out that this means that,
> of the 63-bit range of int, a very small number of values (rough order of
> 1/16?) can be represented exactly in a double.

Yes, that's the obvious case. My point, which I didn't express very
clearly, is that it's possible that *every* integer type has values
that can't be exactly represented in *any* floating-point type.
I know of no such systems in real life, but a system where everything
from char to long long and from float to long double is exactly 64
bits is certainly plausible (the Cray T90 I keep bringing up made
char 8 bits only for compatibility with other Unix-like systems; all
other arithmetic types were 64 bits).

An implementation could even have integer values that don't just lose
precision but *overflow* when converted to a floating-point type. On my
system, FLT_MAX is slightly less than 2**128, so (float)UINT128_MAX
would overflow if uint128_t existed.

> Now, as it happens, 99% of the numbers I've ever used in a C program are
> in that range.

You counted? :-}

>> The trouble is that, even
>> if you know what you're doing, it can be very easy to accidentally
>> get outside the range in which the guarantees apply; you can use
>> double to represent exact integers, but there's no warning when you
>> exceed the range where that works.

>
> Yes.
>
> For plain float, on the systems I've tried, the boundary seems to be about
> 2^24; 2^24+1 cannot be represented exactly in a 32-bit float. I wouldn't
> be surprised to find that double came out somewhere near 2^48+1 as the first
> positive integer value that couldn't be represented.

It's more likely to be 2^53+1, assuming IEEE floating-point; look at the
values of FLT_MANT_DIG and DBL_MANT_DIG.
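A hedged check of that boundary (my sketch; it assumes arithmetic rounds to double precision, i.e. FLT_EVAL_METHOD == 0):

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Sketch: with p = 2^DBL_MANT_DIG, p and everything below it is an exact
   integer in double, but p + 1 rounds back to p -- so the first positive
   integer double cannot hold is 2^DBL_MANT_DIG + 1 (2^53 + 1 for IEEE). */
static bool first_gap_at_mant_dig(void)
{
    volatile double p = ldexp(1.0, DBL_MANT_DIG); /* volatile: force double rounding */
    volatile double q = p + 1.0;                  /* ties-to-even: rounds down to p */
    volatile double r = (p - 1.0) + 1.0;          /* still exact below the boundary */
    return q == p && r == p;
}
```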


Keith Thompson, Jun 4, 2010