Casting integers to float.

J

Jonathan Fielder

Hi,

I have a 32 bit integer value and I wish to find the single precision
floating point value that is closest to but less than or equal to the
integer. I also have a similar case where I need to find the single
precision floating point value that is closest to but greater than or equal
to the integer. I believe that if I simply cast to a float, it may be
assigned the next higher or lower representable value, depending on
implementation.

I am aware that if I use double precision floating point values then I
shouldn't have a problem because 32 bit integers can be represented exactly,
but I really need to use float.

Is there a simple method using standard C to achieve my goal?

Many thanks,

Jon.
 
C

Christian Bau

"Jonathan Fielder said:
Hi,

I have a 32 bit integer value and I wish to find the single precision
floating point value that is closest to but less than or equal to the
integer. I also have a similar case where I need to find the single
precision floating point value that is closest to but greater than or equal
to the integer. I believe that if I simply cast to a float, it may be
assigned the next higher or lower representable value, depending on
implementation.

I am aware that if I use double precision floating point values then I
shouldn't have a problem because 32 bit integers can be represented exactly,
but I really need to use float.

Is there a simple method using standard C to achieve my goal?

Interesting problem. I think this should give the required result on
most or all correct C implementations.

float int32_to_float_rounddown (long i) {

double d = (double) i;
double e = d;
float f;

while ((f = (float) e) > d)
e -= 1.0;

return f;
}

float int32_to_float_roundup (long i) {

double d = (double) i;
double e = d;
float f;

while ((f = (float) e)< d)
e += 1.0;

return f;
}

d and e must be double so that the conversions from 32 bit integer and
from float are exact.
 
K

Kevin Easton

Jonathan Fielder said:
Hi,

I have a 32 bit integer value and I wish to find the single precision
floating point value that is closest to but less than or equal to the
integer. I also have a similar case where I need to find the single
precision floating point value that is closest to but greater than or equal
to the integer. I believe that if I simply cast to a float, it may be
assigned the next higher or lower representable value, depending on
implementation.

Will this work?

float f = myint;

if (f > (double)myint) {
f -= FLT_EPSILON * myint;
}

- Kevin.
 
T

Tim Prince

Jonathan Fielder wrote:

I have a 32 bit integer value and I wish to find the single precision
floating point value that is closest to but less than or equal to the
integer. float f = myint -.25
I also have a similar case where I need to find the single
precision floating point value that is closest to but greater than or
equal
to the integer. float f = myint +.25
I believe that if I simply cast to a float, it may be
assigned the next higher or lower representable value, depending on
implementation.
Only for some of the values satisfying myint > 1/FLT_EPSILON, assuming a
sane implementation, such as any IEEE compliant one.
 
K

Kevin Easton

Tim Prince said:
Jonathan Fielder wrote:


float f = myint -.25

If myint = 1, that gives 0.75 as f. There are many values representable
in float that are closer to 1.0 than 0.75, whilst still being less than
or equal to 1.0.

- Kevin.
 
K

Kevin Easton

Christian Bau said:
Interesting problem. I think this should give the required result on
most or all correct C implementations.

float int32_to_float_rounddown (long i) {

double d = (double) i;
double e = d;
float f;

while ((f = (float) e) > d)
e -= 1.0;

Why do you think that 1.0 is the smallest amount you will have to
subtract from e to make it less than i ?

- Kevin.
 
J

Jirka Klaue

Kevin said:
Why do you think that 1.0 is the smallest amount you will have to
subtract from e to make it less than i ?

How about this?

float f = i;
double d = i;

while (f > (double)i) {
d -= 1;
f = d;
}

while (f + FLT_EPSILON != f && f + FLT_EPSILON < (double)i)
f += FLT_EPSILON;

Jirka
 
C

Christian Bau

Kevin Easton said:
Why do you think that 1.0 is the smallest amount you will have to
subtract from e to make it less than i ?

This makes three assumptions: 1. All integers that fit almost into 32
bit can be stored exactly in a "double" variable. 2. Adding or
subtracting 1 to/from such a variable produces the correct result. 3.
The type float has the following property: There are two numbers fmin
and fmax such that all integers x, fmin <= x <= fmax can be represented
in a variable of type float, and no non-integer value less than fmin or
greater than fmax can be represented.

That would be the case for any simple floating point representation that
I have ever seen, and it wouldn't matter if it is binary, base 10, base
sixteen or whatever. (I know there are implementations of long double
that work differently).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top