Integer to Floating-Point conversions

Discussion in 'Java' started by Kai Koehne, May 13, 2006.

  1. Kai Koehne

    Kai Koehne Guest

    Hi,

    I need an int to real32 conversion with IEEE-754 round-towards-zero mode
    .... That means, a method x, so that

    int i = 44554435
    float f = x(i)

    results in f = 3355432.0, and not f = 335546.0 .

    Are you aware of any mathematical packages that provide such a function,
    or a way to do this with pure Java?

    Regards,

    Kai Koehne
    Kai Koehne, May 13, 2006
    #1
    1. Advertising

  2. Kai Koehne

    Guest

    Kai Koehne wrote:
    > Hi,
    >
    > I need an int to real32 conversion with IEEE-754 round-towards-zero mode
    > ... That means, a method x, so that
    >
    > int i = 44554435
    > float f = x(i)
    >
    > results in f = 3355432.0, and not f = 335546.0 .
    >
    > Are you aware of any mathematical packages that provide such a function,
    > or a way to do this with pure Java?
    >
    > Regards,
    >
    > Kai Koehne


    Huh?

    I was going to mumble something automatic integer to floating point
    promotion should work as advertised, but your question does not seem to
    make sense.

    If you absolutely must work with 32 bit floats instead of 64 bit
    doubles or java.math.BigDecimal, you will lose precision when promoting
    large integers. There's no way to squeeze 32 bits of precision into
    less than 32 bits of precision, expecially when a 32 bit float needs a
    few bits for the exponent.

    But if you really need a method that converts 44554435 stored in an int
    to either 3355432.0 or 335546.0 in 32 bit floating point
    representation, you can always write your own. I'd prefer you give it a
    reasonable name, like "mungeNumber()" or "hashAFloat()" or
    my.own.strange.Promotions.conversion_x() so that if it ever gets close
    to any of my code my code doesn't get infected. But you can definitely
    build it. I'd point you to the xxxToYyyBits() and yyyBitsToXxx()
    methods of java.lang.Double and java.lang.Float, but I wouldn't want to
    be accused of contributing to the crime.
    , May 13, 2006
    #2
    1. Advertising

  3. Kai Koehne

    Kai Koehne Guest

    schrieb:
    > Kai Koehne wrote:
    >> Hi,
    >>
    >> I need an int to real32 conversion with IEEE-754 round-towards-zero mode
    >> [...]
    >> Kai Koehne

    >
    > Huh?
    >
    > I was going to mumble something automatic integer to floating point
    > promotion should work as advertised, but your question does not seem to
    > make sense.


    Just to clarify my motivation: I am building an Occam 2 Java compiler
    .... and as it happens, occam supports two ways of converting an integer
    to a floating-point variable:

    REAL32 ROUND n -- conversion with round-towards-middle
    REAL32 TRUNC n -- conversion with round-towards-zero

    The first has the same semantics as (float)n , but I couldn't find a way
    to implement the second conversion.

    > [...]
    > But if you really need a method that converts 44554435 stored in an int
    > to either 3355432.0 or 335546.0 in 32 bit floating point
    > representation, you can always write your own. I'd prefer you give it a
    > reasonable name, like "mungeNumber()" or "hashAFloat()" or
    > my.own.strange.Promotions.conversion_x() so that if it ever gets close
    > to any of my code my code doesn't get infected. But you can definitely
    > build it. I'd point you to the xxxToYyyBits() and yyyBitsToXxx()
    > methods of java.lang.Double and java.lang.Float, but I wouldn't want to
    > be accused of contributing to the crime.
    >


    Huh, this is the tough way ... but I will try to find my way through the
    implementation details, thank you for the idea :)

    Kind regards,

    Kai Koehne
    Kai Koehne, May 13, 2006
    #3
  4. Kai Koehne

    Chris Uppal Guest

    Kai Koehne wrote:

    > I need an int to real32 conversion with IEEE-754 round-towards-zero mode


    Java doesn't include a spectacularly complete set of IEEE-754 operations,
    that's one of the ones that's missing. I believe that more stuff will be added
    in 1.6 (see http://download.java.net/jdk6/docs/api/java/lang/Math.html
    ) but I don't know if that will provide what you need.

    If not, or if you can't wait, then I think you'll just have to program it
    yourself. Math.ulp(float) may help.

    -- chris
    Chris Uppal, May 13, 2006
    #4
  5. Kai Koehne

    Guest

    Kai Koehne wrote:
    > schrieb:
    > > Kai Koehne wrote:
    > >> Hi,
    > >>
    > >> I need an int to real32 conversion with IEEE-754 round-towards-zero mode
    > >> [...]
    > >> Kai Koehne

    > >
    > > Huh?
    > >
    > > I was going to mumble something automatic integer to floating point
    > > promotion should work as advertised, but your question does not seem to
    > > make sense.

    >
    > Just to clarify my motivation: I am building an Occam 2 Java compiler
    > ... and as it happens, occam supports two ways of converting an integer
    > to a floating-point variable:
    >
    > REAL32 ROUND n -- conversion with round-towards-middle
    > REAL32 TRUNC n -- conversion with round-towards-zero
    >
    > The first has the same semantics as (float)n , but I couldn't find a way
    > to implement the second conversion.


    Ahh. I see.

    > > [...]
    > > But if you really need a method that converts 44554435 stored in an int
    > > to either 3355432.0 or 335546.0 in 32 bit floating point
    > > representation, you can always write your own. I'd prefer you give it a
    > > reasonable name, like "mungeNumber()" or "hashAFloat()" or
    > > my.own.strange.Promotions.conversion_x() so that if it ever gets close
    > > to any of my code my code doesn't get infected. But you can definitely
    > > build it. I'd point you to the xxxToYyyBits() and yyyBitsToXxx()
    > > methods of java.lang.Double and java.lang.Float, but I wouldn't want to
    > > be accused of contributing to the crime.
    > >

    >
    > Huh, this is the tough way ... but I will try to find my way through the
    > implementation details, thank you for the idea :)


    If what Chris suggests doesn't get there fast enough for you, you might
    want to consider implementing it borrowing from java.math.BigDecimal
    and BigInteger. If it turns out too slow, it could at least give you a
    good baseline.

    > Kind regards,
    >
    > Kai Koehne
    , May 13, 2006
    #5
  6. Kai Koehne wrote:
    ....
    > Huh, this is the tough way ... but I will try to find my way through the
    > implementation details, thank you for the idea :)

    ....

    Here's my attempt. The test cases shown in the main method
    are ALL the testing I've done.

    public class FloatConvert {

    public static void main(String[] args) {
    // Original example
    test(44554435, 44554432);
    // Negation of original example
    test(-44554435, -44554432);
    // No rounding needed
    test(44554432, 44554432);
    // Rounding goes in correct direction
    test(44554433, 44554432);
    }

    private static void test(int input, int expected) {
    double actual = truncateIntToFloat(input);
    if (actual == expected) {
    System.out.printf("Success: input %d actual %f\n", input, actual);
    } else {
    System.out.printf("Failure: input %d actual %f expected %d\n", input,
    actual, expected);
    }
    }

    /**
    * Convert int to float with truncation towards zero.
    *
    * @param input
    * Number to convert
    * @return Largest magnitude float that has the same sign as input
    * and no greater absolute value.
    */
    public static float truncateIntToFloat(int input) {
    float rounded = (float) input;
    if (Math.abs((double) rounded) <= Math.abs((double) input)) {
    return rounded;
    } else {
    int roundedBits = Float.floatToIntBits(rounded);
    float truncated = Float.intBitsToFloat(roundedBits - 1);
    return truncated;
    }
    }
    }
    Patricia Shanahan, May 13, 2006
    #6
  7. Kai Koehne

    Chris Uppal Guest

    Patricia Shanahan wrote:

    > Here's my attempt. The test cases shown in the main method
    > are ALL the testing I've done.


    It works for the other 4294967292 inputs too. I've tried 'em all ;-)


    > int roundedBits = Float.floatToIntBits(rounded);
    > float truncated = Float.intBitsToFloat(roundedBits - 1);
    > return truncated;


    That's clever. At least, I /hope/ it's clever because I don't understand why
    it works. Do you care to expand a bit ? Please ?

    I got interested in this myself and have tried out a number of approaches,
    including an attempt to convert rounding into truncation by consideration of
    ULPs (as I suggested to the OP earlier). The best, IMO, is to do the
    truncation in the integer domain before converting into a float. Its a little
    complicated to do, but it runs fast, and has the aesthetic advantage of not
    requiring double-precision operations to produce a single-precision result
    (which all the other things I tried do).

    I've appended the code, including the test-harness, for anyone who's
    interested.


    -- chris

    =========== Test.java ===============
    /*
    * contains several static implementations of truncating a 32-bit integer
    * to a 32-bit float with rounding "towards zero" rather that to the nearest
    * float (as in the Java language spec for casting an int to a float).
    * Also contains an exhaustive test which checks all the implementations
    * against the simplest one (presumed to be correct).
    *
    * Approximate times when run on a 1.5.0_06-b05 JVM, in "client" mode on
    * a 1.5 GHz celeron WinXP Pro box. In each case the time is the time in
    * nanoseconds to convert one value averaged over the full 32-bit range of
    * integers.
    *
    * Slow: 130
    * ULP: 63.5
    * FastULP: 65
    * Twiddle: 19
    * Patricia: 74.5
    *
    * Approximate times for inputs in the resticted range [-10,000,000,
    10,000,000),
    * 83% of which is exactly representable as a float.
    *
    * Slow: 12
    * ULP: 12
    * FastULP: 20.5
    * Twiddle: 18.5
    * Patricia: 24
    */
    public class Test
    {
    public static void
    main(String[] args)
    {
    int i = 0;
    int errors = 0;
    for (;;)
    {
    if (i % 10000000 == 0)
    {
    // progress...
    System.out.printf("%11d...\r", i);
    System.out.flush();
    }
    float shouldBe = slowTruncate(i);
    float is;

    if ((is = ulpTruncate(i)) != shouldBe)
    {
    System.out.printf("Slow: %d: %f != %f%n", i, is, shouldBe);
    errors++;
    }
    if ((is = fastTruncate(i)) != shouldBe)
    {
    System.out.printf("Fast: %d: %f != %f%n", i, is, shouldBe);
    errors++;
    }
    if ((is = twiddleTruncate(i)) != shouldBe)
    {
    System.out.printf("Twiddle: %d: %f != %f%n", i, is, shouldBe);
    errors++;
    }
    if ((is = patriciaTruncate(i)) != shouldBe)
    {
    System.out.printf("Patricia: %d: %f != %f%n", i, is, shouldBe);
    errors++;
    }

    if (errors > 10)
    break;
    if (i == -1)
    break; // we've wrapped all the way around
    i++;
    }
    System.out.printf("%ndone, %d errors%n", errors);
    }


    /**
    * Slow implementation of truncating an int to a 32-bit floating point.
    * This implementation is intended to be the "so simple it can't possibly
    * be wrong" benchmark against which other implementations can be compared
    */
    public static float
    slowTruncate(int i)
    {
    // since all +ve numbers have negations, but the reverse
    // isn't true, we work with -ve numbers
    if (i > 0)
    return -slowTruncate(-i);

    // loop up (towards zero) until we find the right answer
    for (;;)
    {
    float f = (float)i;
    if ((double)f >= (double)i)
    return f;
    i++;
    }
    }


    /**
    * Implementation of truncating an int to a 32-bit floating point based on
    * ULP manipulation
    */
    public static float
    ulpTruncate(int i)
    {
    float f = (float)i;
    double error = (double)f - (double)i;

    // no error ?
    if (error == 0)
    return f;

    // or rounded towards 0 anyway ?
    if ((error < 0) == (i > 0))
    return f;

    // adjust by one ulp towards 0
    // the double ulp() is because ulp is the distance to the next
    // number away from 0, we need the distance to the next float
    // nearest zero
    if (i > 0)
    {
    float ulp = Math.ulp(f-Math.ulp(f));
    f -= ulp;
    }
    else
    {
    float ulp = Math.ulp(f+Math.ulp(f));
    f += ulp;
    }

    return f;
    }


    /**
    * Faster implementation of truncating an int to a 32-bit floating point.
    * This implementation uses a simple test to catch a useful range of
    * "eaasy" cases. If the input is likely to be in that range then
    * this provides a useful optimisation, otherwise it's marginally
    * slower.
    * NB1: this version uses ulpTruncate() for the difficult cases, but
    * it could just as easily use a different one -- even slowTruncate().
    * NB2: since this consists only of a simple test guarding simple
    operations,
    * it seems reasonable to hope that the JIT will inline it in callers
    */
    public static float
    fastTruncate(int i)
    {
    // these are all exact
    if (i >= -8388608 && i <= 8388608)
    return (float)i;
    else
    return ulpTruncate(i); // or any other xxxTruncate()
    }


    /**
    * Faster implementation of truncating an int to a 32-bit floating point
    based
    * on bit-twiddling.
    * The idea is to take the number, discover how many high-bits it has which
    * don't fit into the 24-bit mantissa of a float, and to mask off that many
    * low bits before converting to a float. I.e. we do the truncation in the
    * integer domain before converting to a float, so the conversion is always
    * is always exact.
    * The actual implementation is complicated by sign issues, but that's the
    * basic idea.
    * Note that we don't actually mess with the physical layout of an IEEE
    float,
    * but only use the fact that it has exactly 24 bits in which to represent
    the
    * mantissa
    */
    public static float
    twiddleTruncate(int i)
    {
    if (i >= 0)
    {
    // because i is +ve the high bit is always zero,
    // and so (i >>> 24) is always in the range [0, 128)
    return (float)(i & MASKS[i >>> 24]);
    }

    // nasty special case
    if (i == Integer.MIN_VALUE)
    return -2147483648.0f;

    return -twiddleTruncate(-i);
    }

    // helper data for twiddleTruncate(). A set of precomuted masks which
    // will be indexed by bits [24, 30] (zero-based) of the input integer.
    private static final int[] MASKS = new int[128];
    static
    {
    for (int i = 0; i < 128; i++)
    MASKS = -1;
    for (int bit = 0; bit < 7; bit++)
    {
    int threshold = 1 << bit;
    int mask = 1 << bit;
    for (int i = threshold; i < 128; i++)
    MASKS ^= mask;
    }
    }


    /**
    * Clever implementation of truncating an int to a 32-bit floating point
    posted
    * to comp.lang.java.programmer by Patricia Shanahan on 13 May 2006
    * (Message-ID: <TWm9g.126$> )
    */
    public static float
    patriciaTruncate(int i)
    {
    float rounded = (float) i;

    if (Math.abs((double) rounded) <= Math.abs((double) i))
    return rounded;

    int roundedBits = Float.floatToIntBits(rounded);
    float truncated = Float.intBitsToFloat(roundedBits - 1);

    return truncated;
    }
    }
    ==========================
    Chris Uppal, May 14, 2006
    #7
  8. Chris Uppal wrote:
    > Patricia Shanahan wrote:
    >
    >
    >> Here's my attempt. The test cases shown in the main method are ALL
    >> the testing I've done.

    >
    >
    > It works for the other 4294967292 inputs too. I've tried 'em all ;-)
    >
    >
    >
    >
    >> int roundedBits = Float.floatToIntBits(rounded); float truncated =
    >> Float.intBitsToFloat(roundedBits - 1); return truncated;

    >
    >
    > That's clever. At least, I /hope/ it's clever because I don't
    > understand why it works. Do you care to expand a bit ? Please ?


    Sure. First I split the problem into two cases. If the result did not
    require rounding, or the rounding was towards zero, the rounded result
    and truncated result are equal, so I just returned the rounded result.

    The code you quote is for those cases that round away from zero. The
    float we want is the one with the same sign as rounded, and the largest
    magnitude that is less than rounded.

    NaNs, infinities, and zeros would all be special cases in a general
    attempt to make the smallest possible reduction in the magnitude of a
    float, but can be ignored here. Zero can only result from converting int
    zero, and getting an exact result. The others cannot result from int
    conversion.

    If rounded is not an exact power of two, the truncation has a mantissa
    one less than rounded. If it is an exact power of two, it has an
    exponent one less than rounded, and mantissa all ones. Either way,
    subtracting one from the rounded bit pattern gets the bit pattern for
    truncated.

    > I got interested in this myself and have tried out a number of
    > approaches, including an attempt to convert rounding into truncation
    > by consideration of ULPs (as I suggested to the OP earlier). The
    > best, IMO, is to do the truncation in the integer domain before
    > converting into a float. Its a little complicated to do, but it runs
    > fast, and has the aesthetic advantage of not requiring
    > double-precision operations to produce a single-precision result
    > (which all the other things I tried do).


    If you prefer to avoid double, just do the abs in int:

    public static float truncateIntToFloat(int input) {
    float rounded = (float) input;
    if (Math.abs((int) rounded) <= Math.abs(input)) {
    return rounded;
    } else {
    int roundedBits = Float.floatToIntBits(rounded);
    float truncated = Float.intBitsToFloat(roundedBits - 1);
    return truncated;
    }
    }

    Both numbers whose absolute values are to be compared can be exactly
    represented in either int or double, so either type should work.

    Patricia
    Patricia Shanahan, May 14, 2006
    #8
  9. Kai Koehne

    Kai Koehne Guest

    Wow! Thank you all for this thorough discussion ... Two days ago I was
    not even sure if truncating conversion can be done at all, and now I
    have five working solutions + performance tests. I have the strong
    feeling to owe you a beverage of your choice :)

    Thank you!

    Kai Koehne
    Kai Koehne, May 14, 2006
    #9
  10. Kai Koehne

    Chris Uppal Guest

    Patricia Shanahan wrote:

    > If rounded is not an exact power of two, the truncation has a mantissa
    > one less than rounded. If it is an exact power of two, it has an
    > exponent one less than rounded, and mantissa all ones. Either way,
    > subtracting one from the rounded bit pattern gets the bit pattern for
    > truncated.


    Ah, I see. The IEEE fp layout is subtle.

    Thank you.


    > If you prefer to avoid double, just do the abs in int:
    >
    > public static float truncateIntToFloat(int input) {
    > float rounded = (float) input;
    > if (Math.abs((int) rounded) <= Math.abs(input)) {
    > return rounded;
    > } else {
    > int roundedBits = Float.floatToIntBits(rounded);
    > float truncated = Float.intBitsToFloat(roundedBits - 1);
    > return truncated;
    > }
    > }
    >


    Unfotunately, the comparison doesn't work for Integer.MAX_VALUE, nor for values
    near Integer.MIN_VALUE. Two example failure cases are:

    2147483647
    -2147483638


    Some test code (which uses a routine from yesterday's post):
    =============
    public class Test2
    {
    public static void
    main(String[] args)
    {
    test(2147483647);
    test(-2147483638);
    }

    static void
    test(int input)
    {
    System.out.printf("input: %d%n", input);
    System.out.printf("desired output: %f%n", Test.slowTruncate(input));

    float rounded = (float)input;
    System.out.printf("rounded: %f%n", rounded);

    System.out.printf("(int)rounded: %d%n", (int)rounded);
    System.out.printf("Math.abs((int)rounded): %d%n", Math.abs((int)rounded));
    System.out.printf("Math.abs(input): %d%n", Math.abs(input));
    System.out.println();
    }
    }
    =============

    Which produces output:

    input: 2147483647
    desired output: 2147483520.000000
    rounded: 2147483648.000000
    (int)rounded: 2147483647
    Math.abs((int)rounded): 2147483647
    Math.abs(input): 2147483647

    input: -2147483638
    desired output: -2147483520.000000
    rounded: -2147483648.000000
    (int)rounded: -2147483648
    Math.abs((int)rounded): -2147483648
    Math.abs(input): 2147483638

    I find the negative output from Math.abs() particularly entertaining ;-)

    -- chris
    Chris Uppal, May 15, 2006
    #10
  11. Chris Uppal wrote:
    > Patricia Shanahan wrote:
    >
    >
    >>If rounded is not an exact power of two, the truncation has a mantissa
    >>one less than rounded. If it is an exact power of two, it has an
    >>exponent one less than rounded, and mantissa all ones. Either way,
    >>subtracting one from the rounded bit pattern gets the bit pattern for
    >>truncated.

    >
    >
    > Ah, I see. The IEEE fp layout is subtle.
    >
    > Thank you.
    >
    >
    >
    >>If you prefer to avoid double, just do the abs in int:

    ....
    > Unfotunately, the comparison doesn't work for Integer.MAX_VALUE, nor for values
    > near Integer.MIN_VALUE. Two example failure cases are:

    ....
    > I find the negative output from Math.abs() particularly entertaining ;-)


    Good point. Shows the folly of not testing at least the maximum and
    minimum value, as well as some ordinary values.

    Patricia
    Patricia Shanahan, May 15, 2006
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. John Fouhy

    Floating point -> string conversions

    John Fouhy, Nov 8, 2004, in forum: Python
    Replies:
    5
    Views:
    333
    John Fouhy
    Nov 19, 2004
  2. Replies:
    10
    Views:
    706
    Jasen Betts
    Aug 5, 2005
  3. Stanley Rice
    Replies:
    4
    Views:
    377
    Stanley Rice
    Oct 8, 2011
  4. Saraswati lakki
    Replies:
    0
    Views:
    1,322
    Saraswati lakki
    Jan 6, 2012
  5. teeshift
    Replies:
    2
    Views:
    255
    Chris Pearl
    Dec 1, 2006
Loading...

Share This Page