double cast to int reliable?

Discussion in 'C Programming' started by sandeep, Jun 2, 2010.

  1. sandeep

    sandeep Guest

    In the following code:

    int i,j;
    double d;

    i = 1; // or any integer
    d = (double)i;
    j = (int)d;

    Is there any chance that "i" will not equal "j" due to the double
    being stored inexactly?
     
    sandeep, Jun 2, 2010
    #1

  2. sandeep

    Sjouke Burry Guest

    sandeep wrote:
    > In the following code:
    >
    > int i,j;
    > double d;
    >
    > i = 1; // or any integer
    > d = (double)i;
    > j = (int)d;
    >
    > Is there any chance that "i" will not equal "j" due to the double
    > being stored inexactly?

    Yep. Rounding while converting to double will for most integers
    mean that the double is slightly smaller than the int.
    Converting then to int will not give you the original.
     
    Sjouke Burry, Jun 2, 2010
    #2

  3. sandeep

    Dann Corbit Guest

    In article <hu6gg8$lim$>, says...
    >
    > In the following code:
    >
    > int i,j;
    > double d;
    >
    > i = 1; // or any integer
    > d = (double)i;
    > j = (int)d;
    >
    > Is there any chance that "i" will not equal "j" due to the double
    > being stored inexactly?


    It is possible for int to be 64 bits, and represent values as large as
    (for instance) 9223372036854775807.

    It is possible for double to have as little as 6-7 significant digits,
    though for the most part you will see 15-16 significant digits.

    It is possible (though unlikely) to see the problem you describe. I
    would be very surprised to see a system with 64 bit ints and 32 bit
    doubles. But even with 64 bit ints and 64 bit doubles, the int values
    will have greater precision because they do not store an exponent.

    Far more likely is this

    double d = <some value>;
    int i = d; /* undefined behavior due to integer overflow */

    At this point, i may not be equal to floor(d) if floor(d) is not
    representable as an integer.

    There will always be some situations where information can be lost
    because it is very unlikely that the precisions are identical.

    Hence, if you want the integral value that is stored in a double, far
    better is:

    double integral_part = floor(some_double);
     
    Dann Corbit, Jun 2, 2010
    #3
  4. sandeep

    Dann Corbit Guest

    In article <4c06c7db$0$14122$>,
    says...
    >
    > sandeep wrote:
    > > In the following code:
    > >
    > > int i,j;
    > > double d;
    > >
    > > i = 1; // or any integer
    > > d = (double)i;
    > > j = (int)d;
    > >
    > > Is there any chance that "i" will not equal "j" due to the double
    > > being stored inexactly?

    > Yep. Rounding while converting to double will for most integers
    > mean that the double is slightly smaller than the int.
    > Converting then to int will not give you the original.


    Since he specified an integer assignment:
    > > i = 1; // or any integer

    the difficulties are not due to rounding, as I see it.
     
    Dann Corbit, Jun 2, 2010
    #4
  5. sandeep

    Tim Streater Guest

    In article <4c06c7db$0$14122$>,
    Sjouke Burry <> wrote:

    > sandeep wrote:
    > > In the following code:
    > >
    > > int i,j;
    > > double d;
    > >
    > > i = 1; // or any integer
    > > d = (double)i;
    > > j = (int)d;
    > >
    > > Is there any chance that "i" will not equal "j" due to the double
    > > being stored inexactly?

    > Yep. Rounding while converting to double will for most integers
    > mean that the double is slightly smaller than the int.
    > Converting then to int will not give you the original.


    Won't this be exact if the integer in question occupies fewer bits than
    the mantissa size in bits? On the CDC 6600 (60-bit word), all integer
    arithmetic was in fact done by the floating point unit (apart from
    integer addition), so integers were limited to 48 bits (mantissa length).

    --
    Tim

    "That excessive bail ought not to be required, nor excessive fines imposed,
    nor cruel and unusual punishments inflicted" -- Bill of Rights 1689
     
    Tim Streater, Jun 2, 2010
    #5
  6. sandeep

    Ben Pfaff Guest

    Dann Corbit <> writes:

    > It is possible for double to have as little as 6-7 significant digits,
    > though for the most part you will see 15-16 significant digits.


    The 'float' type must have at least 6 significant digits.
    The 'double' and 'long double' types must have at least 10
    significant digits.
    --
    char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
    ={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
    =b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
    2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
     
    Ben Pfaff, Jun 2, 2010
    #6
  7. Tim Streater <> writes:

    > In article <4c06c7db$0$14122$>,
    > Sjouke Burry <> wrote:
    >
    >> sandeep wrote:
    >> > In the following code:
    >> >
    >> > int i,j;
    >> > double d;
    >> >
    >> > i = 1; // or any integer
    >> > d = (double)i;
    >> > j = (int)d;
    >> >
    >> > Is there any chance that "i" will not equal "j" due to the double
    >> > being stored inexactly?

    >> Yep. Rounding while converting to double will for most integers
    >> mean that the double is slightly smaller than the int.
    >> Converting then to int will not give you the original.

    >
    > Won't this be exact if the integer in question occupies fewer bits than
    > the mantissa size in bits?


    Yes. 6.3.1.4 p2 (part of the section on conversions) starts:

    When a value of integer type is converted to a real floating type, if
    the value being converted can be represented exactly in the new type,
    it is unchanged.

    The standard does not use the term mantissa but section 5.2.4.2.2
    ("Characteristics of floating types") defines C's model of floating
    types in such a way that the expected range of integers will be exactly
    representable.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Jun 2, 2010
    #7
  8. sandeep

    kathir Guest

    On Jun 2, 1:55 pm, sandeep <> wrote:
    > In the following code:
    >
    > int i,j;
    > double d;
    >
    > i = 1; // or any integer
    > d = (double)i;
    > j = (int)d;
    >
    > Is there any chance that "i" will not equal "j" due to the double
    > being stored inexactly?


    Floating point numbers are stored internally in a different format,
    using a mantissa portion and an exponent portion. If you do any
    floating point calculation (multiplication or division) and convert
    back to integer, you will see a minor difference between the int and
    double values. To understand the bit pattern of floating point
    numbers, visit
    http://softwareandfinance.com/Research_Floating_Point_Ind.html

    Thanks and Regards,
    Kathir
    http://programming.softwareandfinance.com
     
    kathir, Jun 3, 2010
    #8
  9. On 2 June, 22:06, Sjouke Burry <>
    wrote:
    > sandeep wrote:
    > > In the following code:

    >
    > > int i,j;
    > > double d;

    >
    > > i = 1; // or any integer
    > > d = (double)i;
    > > j = (int)d;

    >
    > > Is there any chance that "i" will not equal "j" due to the double
    > > being stored inexactly?

    >
    > Yep. Rounding while converting to double will for most integers
    > mean that the double is slightly smaller than the int.
    > Converting then to int will not give you the original.


    really? Can you name an implementation where this is so? Is it a valid
    implementation of C?
     
    Nick Keighley, Jun 3, 2010
    #9
  10. On 3 June, 00:58, kathir <> wrote:
    > On Jun 2, 1:55 pm, sandeep <> wrote:
    >
    > > In the following code:

    >
    > > int i,j;
    > > double d;

    >
    > > i = 1; // or any integer
    > > d = (double)i;
    > > j = (int)d;

    >
    > > Is there any chance that "i" will not equal "j" due to the double
    > > being stored inexactly?

    >
    > Floating point numbers are stored internally in a different format,
    > using a mantissa portion and an exponent portion. If you do any
    > floating point calculation (multiplication or division) and convert
    > back to integer, you will see a minor difference between the int and
    > double values.


    depending what operations you do you might see huge differences


    > To understand the bit pattern of floating points, visit
    > http://softwareandfinance.com/Research_Floating_Point_Ind.html
    >
    > Thanks and Regards,
    > Kathir
    > http://programming.softwareandfinance.com
     
    Nick Keighley, Jun 3, 2010
    #10
  11. sandeep

    Seebs Guest

    On 2010-06-03, Nick Keighley <> wrote:
    > On 2 June, 22:06, Sjouke Burry <>
    > wrote:
    >> Yep. Rounding while converting to double will for most integers
    >> mean that the double is slightly smaller than the int.
    >> Converting then to int will not give you the original.


    > really? Can you name an implementation where this is so? Is it a valid
    > implementation of C?


    The obvious case would be a machine where both int and double are 64-bit,
    at which point, it's pretty obvious that for the vast majority of positive
    integers, the conversion to double will at the very least change the
    value, and I think I've seen it round down, so...

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Jun 3, 2010
    #11
  12. Seebs <> writes:
    > On 2010-06-03, Nick Keighley <> wrote:
    >> On 2 June, 22:06, Sjouke Burry <>
    >> wrote:
    >>> Yep. Rounding while converting to double will for most integers
    >>> mean that the double is slightly smaller than the int.
    >>> Converting then to int will not give you the original.

    >
    >> really? Can you name an implementation where this is so? Is it a valid
    >> implementation of C?

    >
    > The obvious case would be a machine where both int and double are 64-bit,
    > at which point, it's pretty obvious that for the vast majority of positive
    > integers, the conversion to double will at the very least change the
    > value, and I think I've seen it round down, so...


    Round down or round to zero? If the latter, then it's not the case
    that "most" integers yield a slightly smaller double when converted
    (unless "smaller" means closer to zero). But yes, this is just
    nitpicking.

    The point is that the standard requires the conversion of an integer
    to a floating-point type to yield an exact result when that result
    can be represented (C99 6.3.1.4), and the floating-point model
    imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
    values must be exactly representable. That range might not cover
    the full range of any integer type (even long double might not be
    able to represent CHAR_MAX if CHAR_BIT is big enough).

    In particular, converting the value 1 from int to double and back
    to int is guaranteed to yield 1; if it doesn't, your implementation
    is non-conforming.

    There's a common idea that floating-point values can never be
    anything more than approximations, and that no floating-point
    operation is guaranteed to yield an exact result, but the reality
    of it isn't that simple. It might be safer to *assume* that all
    such operations are approximate but there are things you can get
    away with if you know what you're doing. The trouble is that, even
    if you know what you're doing, it can be very easy to accidentally
    get outside the range in which the guarantees apply; you can use
    double to represent exact integers, but there's no warning when you
    exceed the range where that works.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 3, 2010
    #12
  13. On Thu, 3 Jun 2010, Keith Thompson wrote:

    > The trouble is that, even if you know what you're doing, it can be very
    > easy to accidentally get outside the range in which the guarantees
    > apply; you can use double to represent exact integers, but there's no
    > warning when you exceed the range where that works.


    For any unsigned type that has no more bits than 612,787,565,149,966; that
    is, any conceivable unsigned type, the following is a sufficient condition
    to store any value of said type in a "long double":

    ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
    <= LDBL_DIG

    For uint32_t, the left side evaluates to 10, and both DBL_DIG and LDBL_DIG
    must be at least 10 on any conformant platform.

    After the conversion to the chosen floating point type, eg. long double,
    one must track the possible ranges in every floating point expression
    involved, and make sure that any evaluation can't exceed "limit", which
    can be initialized like this:

    char lim_str[LDBL_DIG + 1] = "";
    long double limit;

    (void)sscanf(memset(lim_str, '9', LDBL_DIG), "%Lf", &limit);

    (Of course not exceeding this bound may not be sufficient for converting
    back to "utype", but since "(utype)-1" itself was convertible, this final
    condition is only a simple comparison away.)

    --o--

    The number of full decimal digits needed to represent the C value
    "(utype)-1" is given by the math expression

    ceil(log_10(2 ** numbits - 1))

    "numbits" being the number of value bits in "utype". It is safe to assume
    (or rather, we have to assume) that all bits are value bits. Continuing
    with math formulas, and exploiting log_10 being strictly monotonic and
    ceil being monotonic,

    ceil(log_10(2 ** numbits - 1))
    <= ceil(log_10(2 ** numbits ))
    == ceil(numbits * log_10(2))
    <= ceil(numbits * (30103 / 100000))
    == ceil(numbits * 30103 / 100000)

    which equals the value of the math expression

    floor( (numbits * 30103 + (100000 - 1)) / 100000 )

    Therefore, this integer value is not less than the number of full decimal
    digits needed. As "numbits" increases, this value becomes greater than the
    exact number of decimal places required. The speed of divergence is
    determined by the accuracy of 30103 / 100000 approximating log_10(2), but
    I'm too lazy to try to calculate that relationship.

    BTW, 30103 and 100000 are coprimes (30103 is a prime in its own right),
    thus the smallest positive "numbits" where "numbits * 30103" is an
    integral multiple of 100000 is 100000, which would still make for quite a
    big integer type. Hence we can assume that the remainder of the modular
    division "numbits * 30103 / 100000" is always nonzero, and the last
    ceiling math expression could be rewritten as

    floor(numbits * 30103 / 100000) + 1

    This simplifies the initial C expression to

    (long long unsigned)sizeof(utype) * CHAR_BIT * 30103 / 100000 < LDBL_DIG

    Unfortunately, the entire approach falls on its face with uint64_t and an
    extended precision (1 + 15 + 64 = 80 bits) "long double", even though the
    significand has the required number of bits available. (As said above, the
    condition is only sufficient, not necessary.)

    The problem is that the method above works with entire base 10 digits. The
    decimal representation of UINT64_MAX needs 20 places (19 full places and a
    "fractional place", rounded up to 20), but the 64 bit significand only
    provides for 19 whole decimal places, and the comparison is done in whole
    decimal places. What's worse, an extended precision "long double" can only
    allow for an LDBL_DIG of 18 (as my platform defines it), presumably
    because (and I'm peeking at C99 5.2.4.2.2 p8) "long double" must
    "accommodate" not only integers with LDBL_DIG decimal places, but also any
    decimal fraction with LDBL_DIG digits. The exponent of the "long double"
    stores the position of the *binary* point, not that of the *decimal*
    point, and this probably sacrifices a further decimal digit.

    (I gave you some material to shred, please be gentle while shredding.)

    Cheers,
    lacos
     
    Ersek, Laszlo, Jun 3, 2010
    #13
  14. "Ersek, Laszlo" <> writes:
    > On Thu, 3 Jun 2010, Keith Thompson wrote:
    >
    >> The trouble is that, even if you know what you're doing, it can be very
    >> easy to accidentally get outside the range in which the guarantees
    >> apply; you can use double to represent exact integers, but there's no
    >> warning when you exceed the range where that works.

    >
    > For any unsigned type that has no more bits than 612,787,565,149,966; that
    > is, any conceivable unsigned type, the following is a sufficient condition
    > to store any value of said type in a "long double":
    >
    > ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
    > <= LDBL_DIG


    612,787,565,149,966 can be represented in 50 bits.
    unsigned long long is at least 64 bits.

    Inconceivable? "I do not think that word means what you think it means."

    [snip]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 3, 2010
    #14
  15. On Thu, 3 Jun 2010, Keith Thompson wrote:

    > "Ersek, Laszlo" <> writes:
    >> On Thu, 3 Jun 2010, Keith Thompson wrote:
    >>
    >>> The trouble is that, even if you know what you're doing, it can be very
    >>> easy to accidentally get outside the range in which the guarantees
    >>> apply; you can use double to represent exact integers, but there's no
    >>> warning when you exceed the range where that works.

    >>
    >> For any unsigned type that has no more bits than 612,787,565,149,966; that
    >> is, any conceivable unsigned type, the following is a sufficient condition
    >> to store any value of said type in a "long double":
    >>
    >> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
    >> <= LDBL_DIG

    >
    > 612,787,565,149,966 can be represented in 50 bits.
    > unsigned long long is at least 64 bits.
    >
    > Inconceivable? "I do not think that word means what you think it means."


    I believe I wasn't formulating my point carefully enough. Verbatim quote,
    with emphasis added:

    >> For any unsigned type that has no more *bits* than 612,787,565,149,966;



    The range of such an unsigned type would be

    [0 .. 2 ** 612,787,565,149,966 - 1].

    The limit is not arbitrary, it is (for the smallest allowed ULLONG_MAX):

    (ULLONG_MAX - 99999) / 30103

    expressed in C. "unsigned long long" doesn't need to cover the range of
    the type in question, it must be able to represent the *number of bits* in
    it.

    Cheers,
    lacos
     
    Ersek, Laszlo, Jun 3, 2010
    #15
  16. Keith Thompson <> writes:

    > "Ersek, Laszlo" <> writes:
    >> On Thu, 3 Jun 2010, Keith Thompson wrote:
    >>
    >>> The trouble is that, even if you know what you're doing, it can be very
    >>> easy to accidentally get outside the range in which the guarantees
    >>> apply; you can use double to represent exact integers, but there's no
    >>> warning when you exceed the range where that works.

    >>
    >> For any unsigned type that has no more bits than 612,787,565,149,966; that
    >> is, any conceivable unsigned type, the following is a sufficient condition
    >> to store any value of said type in a "long double":
    >>
    >> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
    >> <= LDBL_DIG

    >
    > 612,787,565,149,966 can be represented in 50 bits.
    > unsigned long long is at least 64 bits.
    >
    > Inconceivable? "I do not think that word means what you think it
    > means."


    I'm pretty sure it's a word order confusion. I think he intended "any
    unsigned type that has no more than 612,787,565,149,966 bits". That's
    the maximum number of bits that won't cause the quoted expression to
    fail. I.e. for more than that number of bits, long long unsigned is not
    guaranteed to be able to represent the result.

    Some people might still conceive of such types, but the term is not
    nearly so outlandish in that context.

    --
    Ben.
     
    Ben Bacarisse, Jun 3, 2010
    #16
  17. On Thu, 3 Jun 2010, Ben Bacarisse wrote:

    > Keith Thompson <> writes:
    >
    >> "Ersek, Laszlo" <> writes:
    >>> On Thu, 3 Jun 2010, Keith Thompson wrote:
    >>>
    >>>> The trouble is that, even if you know what you're doing, it can be very
    >>>> easy to accidentally get outside the range in which the guarantees
    >>>> apply; you can use double to represent exact integers, but there's no
    >>>> warning when you exceed the range where that works.
    >>>
    >>> For any unsigned type that has no more bits than 612,787,565,149,966; that
    >>> is, any conceivable unsigned type, the following is a sufficient condition
    >>> to store any value of said type in a "long double":
    >>>
    >>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
    >>> <= LDBL_DIG

    >>
    >> 612,787,565,149,966 can be represented in 50 bits.
    >> unsigned long long is at least 64 bits.
    >>
    >> Inconceivable? "I do not think that word means what you think it
    >> means."

    >
    > I'm pretty sure it's a word order confusion. I think he intended "any
    > unsigned type that has no more than 612,787,565,149,966 bits".


    Yes, thank you. I guess 18 hours of sleep accumulated over four nights is
    not too much. :(

    (I don't need decaf, it's my DSPS [0] that doesn't cooperate with the
    "strictly scheduled" training of this week. It's 01:04 AM in local time,
    again.)

    Cheers,
    lacos

    [0] http://en.wikipedia.org/wiki/Delayed_sleep_phase_syndrome
     
    Ersek, Laszlo, Jun 4, 2010
    #17
  18. sandeep

    Seebs Guest

    On 2010-06-03, Keith Thompson <> wrote:
    >> The obvious case would be a machine where both int and double are 64-bit,
    >> at which point, it's pretty obvious that for the vast majority of positive
    >> integers, the conversion to double will at the very least change the
    >> value, and I think I've seen it round down, so...


    > Round down or round to zero? If the latter, then it's not the case
    > that "most" integers yield a slightly smaller double when converted
    > (unless "smaller" means closer to zero). But yes, this is just
    > nitpicking.


    Which is why I put "positive" in there.

    > The point is that the standard requires the conversion of an integer
    > to a floating-point type to yield an exact result when that result
    > can be represented (C99 6.3.1.4), and the floating-point model
    > imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
    > values must be exactly representable. That range might not cover
    > the full range of any integer type (even long double might not be
    > able to represent CHAR_MAX if CHAR_BIT is big enough).


    Right.

    But the obvious case would be 64-bit int and 64-bit double. Look at it
    this way. Assume a typical mantissa/exponent system. Assume that there
    are 58 bits of mantissa. There's 58 bits of numbers that can be represented
    exactly, you can represent half of the numbers in the 59-bit range, 1/4 of
    the numbers in the 60-bit range... And it turns out that this means that,
    of the 63-bit range of int, a very small number of values (rough order of
    1/16?) can be represented exactly in a double.

    Now, as it happens, 99% of the numbers I've ever used in a C program are
    in that range.

    > The trouble is that, even
    > if you know what you're doing, it can be very easy to accidentally
    > get outside the range in which the guarantees apply; you can use
    > double to represent exact integers, but there's no warning when you
    > exceed the range where that works.


    Yes.

    For plain float, on the systems I've tried, the boundary seems to be about
    2^24; 2^24+1 cannot be represented exactly in a 32-bit float. I wouldn't
    be surprised to find that double came out somewhere near 2^48+1 as the first
    positive integer value that couldn't be represented.

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Jun 4, 2010
    #18
  19. "Ersek, Laszlo" <> writes:
    > On Thu, 3 Jun 2010, Keith Thompson wrote:
    >
    >> "Ersek, Laszlo" <> writes:
    >>> On Thu, 3 Jun 2010, Keith Thompson wrote:
    >>>
    >>>> The trouble is that, even if you know what you're doing, it can be very
    >>>> easy to accidentally get outside the range in which the guarantees
    >>>> apply; you can use double to represent exact integers, but there's no
    >>>> warning when you exceed the range where that works.
    >>>
    >>> For any unsigned type that has no more bits than 612,787,565,149,966; that
    >>> is, any conceivable unsigned type, the following is a sufficient condition
    >>> to store any value of said type in a "long double":
    >>>
    >>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
    >>> <= LDBL_DIG

    >>
    >> 612,787,565,149,966 can be represented in 50 bits.
    >> unsigned long long is at least 64 bits.
    >>
    >> Inconceivable? "I do not think that word means what you think it means."

    >
    > I believe I wasn't formulating my point carefully enough. Verbatim
    > quote, with emphasis added:
    >
    >>> For any unsigned type that has no more *bits* than 612,787,565,149,966;


    Ok, I see what you mean. ("no more than ... bits" would have been
    clearer.)

    > The range of such an unsigned type would be
    >
    > [0 .. 2 ** 612,787,565,149,966 - 1].
    >
    > The limit is not arbitrary, it is (for the smallest allowed ULLONG_MAX):
    >
    > (ULLONG_MAX - 99999) / 30103
    >
    > expressed in C. "unsigned long long" doesn't need to cover the range
    > of the type in question, it must be able to represent the *number of
    > bits* in it.


    And the formula doesn't say "yes" for smaller types and "no" for
    bigger ones; it breaks down for really huge types, right?

    When I have time, I'll have to go back and re-read what you wrote.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 4, 2010
    #19
  20. Seebs <> writes:
    > On 2010-06-03, Keith Thompson <> wrote:
    >>> The obvious case would be a machine where both int and double are 64-bit,
    >>> at which point, it's pretty obvious that for the vast majority of positive
    >>> integers, the conversion to double will at the very least change the
    >>> value, and I think I've seen it round down, so...

    >
    >> Round down or round to zero? If the latter, then it's not the case
    >> that "most" integers yield a slightly smaller double when converted
    >> (unless "smaller" means closer to zero). But yes, this is just
    >> nitpicking.

    >
    > Which is why I put "positive" in there.


    Which I very cleverly failed to notice. *sigh*

    >> The point is that the standard requires the conversion of an integer
    >> to a floating-point type to yield an exact result when that result
    >> can be represented (C99 6.3.1.4), and the floating-point model
    >> imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
    >> values must be exactly representable. That range might not cover
    >> the full range of any integer type (even long double might not be
    >> able to represent CHAR_MAX if CHAR_BIT is big enough).

    >
    > Right.
    >
    > But the obvious case would be 64-bit int and 64-bit double. Look at it
    > this way. Assume a typical mantissa/exponent system. Assume that there
    > are 58 bits of mantissa. There's 58 bits of numbers that can be represented
    > exactly, you can represent half of the numbers in the 59-bit range, 1/4 of
    > the numbers in the 60-bit range... And it turns out that this means that,
    > of the 63-bit range of int, a very small number of values (rough order of
    > 1/16?) can be represented exactly in a double.


    Yes, that's the obvious case. My point, which I didn't express very
    clearly, is that it's possible that *every* integer type has values
    that can't be exactly represented in *any* floating-point type.
    I know of no such systems in real life, but a system where everything
    from char to long long and from float to long double is exactly 64
    bits is certainly plausible (the Cray T90 I keep bringing up made
    char 8 bits only for compatibility with other Unix-like systems; all
    other arithmetic types were 64 bits).

    An implementation could have integer values that don't just lose
    precision but *overflow* when converted to a floating-point type. On my
    system, FLT_MAX is slightly less than 2**128, so (float)UINT128_MAX
    would overflow if uint128_t existed.

    > Now, as it happens, 99% of the numbers I've ever used in a C program are
    > in that range.


    You counted? :-}

    >> The trouble is that, even
    >> if you know what you're doing, it can be very easy to accidentally
    >> get outside the range in which the guarantees apply; you can use
    >> double to represent exact integers, but there's no warning when you
    >> exceed the range where that works.

    >
    > Yes.
    >
    > For plain float, on the systems I've tried, the boundary seems to be about
    > 2^24; 2^24+1 cannot be represented exactly in a 32-bit float. I wouldn't
    > be surprised to find that double came out somewhere near 2^48+1 as the first
    > positive integer value that couldn't be represented.


    It's more likely to be 2^53+1, assuming IEEE floating-point; look at the
    values of FLT_MANT_DIG and DBL_MANT_DIG.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 4, 2010
    #20