# double precise enough for unsigned short?

Discussion in 'C Programming' started by Andy, May 31, 2010.

1. ### Andy (Guest)

Hello,

I will be storing a series of unsigned shorts (C language)
as doubles. I used gcc 3.2 to print out a bunch of
sizeof statements, and unsigned shorts are 4 bytes.
Doubles are 8 bytes, with the fractional part being 52 bits.
Everything seems to indicate that double has more than
enough precision to represent all possible values of
unsigned short. I'm coding with this assumption in mind.
Just to be sure, can anyone confirm this? My simplistic
reason for assuming this is that 32 bits of the fractional
part of the double can be made nonfractional by shifting
by 32. That's enough nonfractional bits to represent all
32 bits of an unsigned short. To indicate this 32-bit
shift, the exponent part of the double must be 6 bits. This
easily fits into the 11 bits for the exponent of a double.

Despite this justification, I just want to be sure that I'm
not missing something conceptually about the IEEE
number representation. I recall vaguely that the numbers
represented by double are not uniformly distributed, but
the above reasoning seems to indicate that they are good
enough for unsigned shorts.

Thanks.

Andy, May 31, 2010

2. ### SG (Guest)

On 31 May, 23:12, Andy wrote:
>
> I will be storing a series of unsigned shorts (C language)
> as doubles.  I used gcc 3.2 to print out a bunch of
> sizeof statements, and unsigned shorts are 4 bytes.

Interesting. What platform is this?

> Doubles are 8 bytes, with the fractional part being 52 bits.
> Everything seems to indicate that double has more than
> enough precision to represent all possible values of
> unsigned short.  I'm coding with this assumption in mind.
> Just to be sure, can anyone confirm this?

There is (as far as I know) no upper bound on what values can be
represented with unsigned shorts. Typically shorts are 16 bits wide
but could be wider.

If I remember correctly, the C standard requires the macro DBL_DIG
(see float.h) to evaluate to an int of at least 10, so you have at
least 10 significant decimal digits. For IEEE 754 doubles (52-bit
mantissas) you have even more precision. 10 digits is enough to store
16-bit integers losslessly, and it might be just enough to store
32-bit integers losslessly (not 100% sure about that). But IEEE 754
doubles certainly will do so.

> Despite this justification, I just want to be sure that I'm
> not missing something conceptually about the IEEE
> number representation.

No, I don't think that you missed something -- except maybe that short
is not restricted in size. In cases where short is 64 bits (or more)
wide, an IEEE 754 double won't be able to losslessly hold all short
values.

Cheers!
SG

SG, May 31, 2010

3. ### bart.c (Guest)

"Andy" <> wrote in message
news:hu18ns$9ho$...
> Hello,
>
> I will be storing a series of unsigned shorts (C language)
> as doubles. I used gcc 3.2 to print out a bunch of
> sizeof statements, and unsigned shorts are 4 bytes.
> Doubles are 8 bytes, with the fractional part being 52 bits.
> Everything seems to indicate that double has more than
> enough precision to represent all possible values of
> unsigned short. I'm coding with this assumption in mind.
> Just to be sure, can anyone confirm this? My simplistic
> reason for assuming this is that 32 bits of the fractional
> part of the double can be made nonfractional by shifting
> by 32. That's enough nonfractional bits to represent all
> 32 bits of an unsigned short. To indicate this 32-bit
> shift, the exponent part of the double must be 6 bits. This
> easily fits into the 11 bits for the exponent of a double.
>
> Despite this justification, I just want to be sure that I'm
> not missing something conceptually about the IEEE
> number representation. I recall vaguely that the numbers
> represented by double are not uniformly distributed, but
> the above reasoning seems to indicate that they are good
> enough for unsigned shorts.

You might just try testing every possible value:

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned short i, j;
    double x;

    i = 0;
    do {
        x = i;    /* unsigned short -> double */
        j = x;    /* and back again */
        if (i != j)
            printf("Can't represent %u as double\n", (unsigned)i);
        ++i;
    } while (i != 0);    /* unsigned wrap-around ends the loop */
    return 0;
}

(Turn off optimisation)

--
Bartc

bart.c, May 31, 2010
4. ### Peter Nilsson (Guest)

Andy <> wrote:
> I will be storing a series of unsigned shorts (C language)
> as [IEEE] doubles.

Why?

--
Peter

Peter Nilsson, Jun 1, 2010
5. ### bart.c (Guest)

"Richard Heathfield" <> wrote in message
news:...
> In <hmWMn.34334$8g7.14401@hurricane>, bart.c wrote:

>> You might just try testing every possible value:
>> unsigned short i,j;
>> double x;
>>
>> i=0;
>> do {
>> x=i;
>> j=x;
>> if (i!=j)
>> printf("Can't represent %u as double\n",i);
>> ++i;
>> } while (i!=0);
>> }

>
> If there is genuine cause for concern, unsigned short is sufficiently
> large to make your program rather tedious to run.

16-bit shorts are instant. 32-bit ones took a few minutes on my computer,
but the test is only done once.

With 64-bit shorts, you already know they can't be represented.

> It would make more
> sense to binary-search. Start at USHRT_MAX, and play the max/min
> game.

You mean, to find the point in the range where it stops working? In that
case it would be more sensible just to test 0 and USHRT_MAX.

You need to test every possible value if there is any doubt about being
able to represent each one (for example, if the conversion to and from
double is more elaborate than simple assignment).

>> (Turn off optimisation)

>
> By all means turn off machine optimisation if you think it'll make a
> difference,

It's just to stop the compiler possibly using a register floating point
value (which on my machine has more precision than double).

> but there's no need to turn off design optimisation.

--
Bartc

bart.c, Jun 1, 2010
6. ### SG (Guest)

On 1 Jun., 11:33, bart.c wrote:
>
> You need to test every possible value if there are any doubts being able to
> represent each one (for example if the conversion to and from double is more
> elaborate than simple assignment).

I think it's safe to assume that the difference between consecutive
representable numbers is a monotonically increasing function of
magnitude. In that case you only need to check USHRT_MAX and
USHRT_MAX-1.

But something of the form

DBL_EPSILON * USHRT_MAX < some_threshold

for some value of some_threshold (between 0.4 and 1.0) is probably a
better way to test this.

> >> (Turn off optimisation)

>
> > By all means turn off machine optimisation if you think it'll make a
> > difference,

>
> It's just to stop the compiler possibly using a register floating point
> value (which on my machine has more precision than double).

Aren't there rules that prohibit a compiler from using such
intermediate high-precision representations beyond the evaluation of a
full expression? I'm not sure, but I think I read something like that
somewhere... If so, it should suffice to store the double value in a
variable in one expression and read/use the variable's value in
another expression, a la

double x = ...;
...x... // new expression

(?)

Cheers!
SG

SG, Jun 1, 2010
7. ### Eric Sosman (Guest)

On 6/1/2010 4:35 AM, Richard Heathfield wrote:
> In<hmWMn.34334$8g7.14401@hurricane>, bart.c wrote:
>
> <snip>
>>
>> You might just try testing every possible value:
>>
>> #include<stdio.h>
>> #include<limits.h>
>>
>> int main(void){
>> unsigned short i,j;
>> double x;
>>
>> i=0;
>> do {
>> x=i;
>> j=x;
>> if (i!=j)
>> printf("Can't represent %u as double\n",i);
>> ++i;
>> } while (i!=0);
>> }

>
> If there is genuine cause for concern, unsigned short is sufficiently
> large to make your program rather tedious to run. It would make more
> sense to binary-search. Start at USHRT_MAX, and play the max/min
> game.

Not sure how you'd apply binary search to this problem, since
there's no obvious "order": each test tells you that a given value
does or does not garble when converted to double and back, and gives
no obvious clue about what happens to larger and smaller values.
Did you have some other order relation in mind?

Anyhow, I think the only value one need test is USHRT_MAX itself,
unless FLT_RADIX is something really, really weird. For the common
FLT_RADIX values of 2 and 16 (and even for the uncommon value 10),
USHRT_MAX will need at least as many non-zero fraction digits as
any other unsigned short value.

--
Eric Sosman

Eric Sosman, Jun 1, 2010
8. ### bart.c (Guest)

"Richard Heathfield" <> wrote in message
news:...
> bart.c wrote:
>>
>> "Richard Heathfield" <> wrote in message
>> news:...

>
> <snip>
>
>>> If there is genuine cause for concern, unsigned short is sufficiently
>>> large to make your program rather tedious to run.

>>
>> 16-bit shorts are instant. 32-bit ones took a few minutes on my computer,
>> but the test is only done once.
>>
>> With 64-bits shorts, you already know they can't be represented.

>
> Why? If an implementation has 64-bit short ints, maybe it also has 256-bit
> doubles.

I think a 64-bit short/64-bit double implementation would be far more
common. But given 64-bit shorts and 256-bit doubles, a different
approach is probably needed, or just more confidence that all 64-bit
values are in fact representable.

--
Bartc

bart.c, Jun 1, 2010
9. ### Keith Thompson (Guest)

Richard Heathfield <> writes:
> Eric Sosman wrote:
>> On 6/1/2010 4:35 AM, Richard Heathfield wrote:
>>> In<hmWMn.34334$8g7.14401@hurricane>, bart.c wrote:
>>>
>>> <snip>
>>>>
>>>> You might just try testing every possible value:
>>>>

[snip]
>>>
>>> If there is genuine cause for concern, unsigned short is sufficiently
>>> large to make your program rather tedious to run. It would make more
>>> sense to binary-search. Start at USHRT_MAX, and play the max/min
>>> game.

>>
>> Not sure how you'd apply binary search to this problem, since
>> there's no obvious "order:" each test tells you that a given value
>> does or does not garble when converted to double and back, and gives
>> no obvious clue about what happens to larger and smaller values.
>> Did you have some other order relation in mind?

>
> Oh, it was just a faulty (albeit probably fairly accurate!) assumption -
> that double will be able to represent integers precisely, up to a
> certain point, but not (reliably) beyond that point. If there is such a
> point for unsigned short ints, it would be nice to know what it is.

But since the non-representable values aren't consecutive, a simple
binary search might not find the smallest one.

It's probably safe to assume that if N-1 and N are both exactly
representable, then all values from 0 to N are exactly representable.
Given that assumption, you could do a binary search to find the
smallest non-representable value.

> (Having said that, I suspect that the range of contiguous integer values
> starting from 0 and exactly representable by double is likely to exceed
> USHRT_MAX on any "normal" implementation, so there is probably no such
> point.)

Well, I've used a system (Cray T90) where both short and double
were 64 bits. But short may have had padding bits; I never had a
chance to check.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jun 1, 2010
10. ### Malcolm McLean (Guest)

On Jun 1, 12:12 am, Andy <> wrote:
>
> I will be storing a series of unsigned shorts (C language)
> as doubles.
>

Double is usually capable of representing every possible short, long
or int. However, some very small compilers have very low-precision
floating-point representations (this is not compliant), and some very
large systems have 64-bit shorts and 64-bit doubles (this is
compliant, but normally you would only use the lower two bytes of a
short).

Malcolm McLean, Jun 1, 2010
11. ### Guest

On Jun 1, 10:58 am, Malcolm McLean <>
wrote:
> On Jun 1, 12:12 am, Andy <> wrote:
>
> > I will be storing a series of unsigned shorts (C language)
> > as doubles.

>
> Double is usually capable of representing every possible short, long
> or int. However some very small compilers have very low-precision
> floating point representation (this is not compliant), and some very
> large systems have shorts with 64 bits and doubles with 64 bits (this
> is compliant, but normally you would only use the lower two bytes of a
> short).

Certainly not true for any of the 64-bit *nix platforms I'm aware of,
which have 64-bit doubles and 64-bit longs. And that would be the
case for any other LP64 or ILP64 system (and I'm sure there's a 64-bit
*nix implementation out there somewhere that *isn't* LP64, but I don't
know of it). Windows, OTOH, is LLP64 (32-bit longs, 64-bit long longs).

, Jun 1, 2010
12. ### Keith Thompson (Guest)

"" <> writes:
> On Jun 1, 10:58 am, Malcolm McLean <>
> wrote:
>> On Jun 1, 12:12 am, Andy <> wrote:
>>
>> > I will be storing a series of unsigned shorts (C language)
>> > as doubles.

>>
>> Double is usually capable of representing every possible short, long
>> or int. However some very small compilers have very low-precision
>> floating point representation (this is not compliant), and some very
>> large systems have shorts with 64 bits and doubles with 64 bits (this
>> is compliant, but normally you would only use the lower two bytes of a
>> short).

>
> Certainly not true for any of the 64 bit *nix platforms I'm aware of,
> which have 64 bit doubles and 64 bit longs. And that would be the
> case for any other LP64 or ILP64 system (and I'm sure there's a 64-bit
> *nix implementation out there somewhat that *isn't* LP64, but I don't
> know it). Windows, OTOH, is LLP64 (32 bit longs, 64 bit long longs).

As I mentioned elsethread, the Cray T90 is (was?) a Unix system with
64-bit shorts and 64-bit doubles. The more recent SV1 has the same
characteristics, and other Cray vector systems are probably similar.
But short could well have a significant number of padding bits.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jun 1, 2010