Casting integers to float.

Jonathan Fielder · Aug 12, 2003

Hi,

I have a 32 bit integer value and I wish to find the single precision
floating point value that is closest to but less than or equal to the
integer. I also have a similar case where I need to find the single
precision floating point value that is closest to but greater than or equal
to the integer. I believe that if I simply cast to a float, it may be
assigned the next higher or lower representable value, depending on
implementation.

I am aware that if I use double precision floating point values then I
shouldn't have a problem because 32 bit integers can be represented exactly,
but I really need to use float.

Is there a simple method using standard C to achieve my goal?

Many thanks,

Jon.
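The behaviour Jon describes can be shown with a small sketch. It assumes
an IEEE 754 binary32 float, a double that holds every 32-bit integer
exactly, and the default round-to-nearest mode. The helper name
cast_rounds_up is hypothetical. Above 2^24 = 16777216 not every integer
fits in a float, so the conversion has to pick a neighbour, and it can
land on either side.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether converting i to float produced a
 * value strictly greater than i. Both values are compared as double,
 * which is assumed to represent every 32-bit integer exactly. */
bool cast_rounds_up(long i)
{
    /* C99 6.3.1.4p2: an inexact conversion yields the nearest higher or
     * nearest lower float, chosen in an implementation-defined manner. */
    float f = (float) i;
    return (double) f > (double) i;
}
```

Under round-to-nearest-even, 16777217 converts to 16777216 (rounded
down) while 16777219 converts to 16777220 (rounded up), so a plain cast
gives neither of the guarantees Jon wants.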

Christian Bau · Aug 12, 2003

Interesting problem. I think this should give the required result on
most or all correct C implementations.

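float int32_to_float_rounddown (long i) {
    double d = (double) i;   /* exact: double holds every 32-bit integer */
    double e = d;
    float f;

    /* Step e down until its float conversion is no longer above i. */
    while ((f = (float) e) > d)
        e -= 1.0;

    return f;
}

float int32_to_float_roundup (long i) {
    double d = (double) i;   /* exact: double holds every 32-bit integer */
    double e = d;
    float f;

    /* Step e up until its float conversion is no longer below i. */
    while ((f = (float) e) < d)
        e += 1.0;

    return f;
}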

d and e must be double so that the conversions from 32 bit integer and
from float are exact.
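Why d and e have to be double can be checked directly. This sketch
assumes IEEE 754 types (a 24-bit float significand and a 53-bit double
significand) with round-to-nearest; the function names are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: does i survive a round trip through double?
 * With a 53-bit significand this holds for every 32-bit integer. */
bool exact_in_double(long i)
{
    return (long) (double) i == i;
}

/* Hypothetical helper: does i survive a round trip through float?
 * With a 24-bit significand this fails for many |i| > 2^24. */
bool exact_in_float(long i)
{
    return (double) (float) i == (double) i;
}
```

For example, 2147483647 is exact in double but becomes 2147483648 in
float. If d were a float it could already sit on the wrong side of i,
and the loop would stop immediately with that wrong value.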

Kevin Easton · Aug 13, 2003

Will this work?

float f = myint;

if (f > (double)myint) {
f -= FLT_EPSILON * myint;
}

- Kevin.

Tim Prince · Aug 13, 2003

Jonathan Fielder wrote:

I have a 32 bit integer value and I wish to find the single precision
floating point value that is closest to but less than or equal to the
integer.

float f = myint -.25

I also have a similar case where I need to find the single
precision floating point value that is closest to but greater than or
equal to the integer.

float f = myint +.25

I believe that if I simply cast to a float, it may be
assigned the next higher or lower representable value, depending on
implementation.

Only for some of the values satisfying myint > 1/FLT_EPSILON, assuming a
sane implementation, such as any IEEE compliant one.
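Tim's threshold in numbers, assuming IEEE 754 single precision (a 24-bit
significand, so FLT_EPSILON is 2^-23):

$$
\mathrm{FLT\_EPSILON} = 2^{-23}, \qquad
\frac{1}{\mathrm{FLT\_EPSILON}} = 2^{23} = 8388608, \qquad
|n| \le 2^{24} = 16777216 \;\Longrightarrow\; (\texttt{float})\,n = n .
$$

Every integer of magnitude up to 2^24 converts exactly, so the condition
myint > 1/FLT_EPSILON is a safe, if conservative, bound. The first
positive integer a float cannot hold is 16777217.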

Christian Bau · Aug 13, 2003

Tim Prince said:
Jonathan Fielder wrote:

float f = myint -.25

Wrong result if myint = 1

float f = myint +.25

Wrong result if myint = 1

Kevin Easton · Aug 13, 2003

Tim Prince said:
Jonathan Fielder wrote:

float f = myint -.25

If myint = 1, that gives 0.75 as f. There are many values representable
in float that are closer to 1.0 than 0.75, whilst still being less than
or equal to 1.0.

- Kevin.
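Christian's and Kevin's counterexample can be checked by wrapping Tim's
two expressions as functions. The names are hypothetical; the arithmetic
is done in double and then converted to float, as in Tim's one-liners.

```c
/* Tim's suggestions wrapped as functions (hypothetical names).
 * myint - .25 and myint + .25 are evaluated in double, then converted. */
float tim_round_down(long myint) { return (float) (myint - .25); }
float tim_round_up(long myint)   { return (float) (myint + .25); }
```

For myint = 1 the exact answer in both directions is 1.0f itself,
because 1 is representable; 0.75f and 1.25f are only floats on the
correct side.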

Kevin Easton · Aug 13, 2003

Christian Bau said:
Interesting problem. I think this should give the required result on
most or all correct C implementations.

float int32_to_float_rounddown (long i) {
    double d = (double) i;
    double e = d;
    float f;

    while ((f = (float) e) > d)
        e -= 1.0;

Why do you think that 1.0 is the smallest amount you will have to
subtract from e to make it less than i ?

- Kevin.

Jirka Klaue · Aug 13, 2003

Kevin said:
Why do you think that 1.0 is the smallest amount you will have to
subtract from e to make it less than i ?

How about this?

float f = i;
double d = i;

while (f > (double)i) {
    d -= 1;
    f = d;
}

while (f + FLT_EPSILON != f && f + FLT_EPSILON < (double)i)
    f += FLT_EPSILON;

Jirka

Christian Bau · Aug 13, 2003

Kevin Easton said:
Why do you think that 1.0 is the smallest amount you will have to
subtract from e to make it less than i ?

This makes three assumptions:

1. All integers that fit into (roughly) 32 bits can be stored exactly
in a "double" variable.
2. Adding 1 to or subtracting 1 from such a variable produces the
exact result.
3. The type float has the following property: there are two numbers
fmin and fmax such that every integer x with fmin <= x <= fmax can be
represented in a variable of type float, and no non-integer value less
than fmin or greater than fmax can be represented.

That would be the case for any simple floating point representation I
have ever seen, whether it is binary, base 10, base 16 or anything
else. (I know there are implementations of long double that work
differently.)

Given these assumptions, the loop cannot skip the answer: between fmin
and fmax the cast is exact and the loop never runs, and outside that
range every float is an integer, so stepping e by 1.0 visits every
candidate.
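Christian's third assumption can be made concrete with the <float.h>
macros. With FLT_MANT_DIG digits in base FLT_RADIX, every integer of
magnitude up to FLT_RADIX^FLT_MANT_DIG is exact, and every float of
magnitude at least FLT_RADIX^(FLT_MANT_DIG - 1) is an integer, so
fmax = FLT_RADIX^FLT_MANT_DIG and fmin = -fmax have the required
property. The sketch assumes this power is exact in double (true for
IEEE binary32 and binary64); the function name is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: FLT_RADIX raised to FLT_MANT_DIG, the fmax of
 * assumption 3. Repeated multiplication avoids <math.h>; every partial
 * product is a power of the radix, assumed exact in double. */
double float_integer_limit(void)
{
    double limit = 1.0;
    for (int k = 0; k < FLT_MANT_DIG; ++k)
        limit *= FLT_RADIX;
    return limit;
}
```

On an IEEE implementation this evaluates to 16777216 (2^24).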

Similar Threads

casting constant value from float to unsigned short - compiler bugs?	36	Dec 2, 2011
Question for a REAL expert on casting double to float...	23	Jul 12, 2008
casting	23	Jun 9, 2006
was: "mod operator for signed integers"	5	Jun 25, 2011
hashing strings to integers for sqlite3 keys	22	May 22, 2014
hexadecimal to float conversion	4	Aug 19, 2005
float point arithmetic a-a != 0.0	12	Mar 8, 2010
Casting double to float - compiler bug?	4	Aug 6, 2003
