Portable IEEE754 write routine

Discussion in 'C Programming' started by Malcolm McLean, Feb 17, 2010.

  1. Here it is. Many thanks to all who helped, especially Michael on whose
    contribution the code is mainly based.

    The idea is to write a float in binary IEEE format, reagrdless of the
    native representation in the machine you are on.

    I've only access to a Windows PC at the moment, so I'd be grateful if
    people could test it on other architectures, especially non-IEEE
    native floating point units. Particularly I'm not sure about the
    denormalised numbers - my printf seems to think they are identical to
    zero so I'm not sure if I'm dropping precision.

    int fwriteieee754(double x, FILE *fp, int bigendian)
    {
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if(x == 0)
    {
    hilong = 0;
    lowlong = 0;
    goto writedata;
    }
    /* infinity */
    if(x > DBL_MAX)
    {
    hilong = 1024 + ((1<<(expbits-1)) - 1);
    hilong <<= (31 - expbits);
    lowlong = 0;
    goto writedata;
    }
    /* -infinity */
    if(x < -DBL_MAX)
    {
    hilong = 1024 + ((1<<(expbits-1)) - 1);
    hilong <<= (31-expbits);
    hilong |= (1 << 31);
    lowlong = 0;
    goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test, but
    *there is no portable isnan() */
    if(x != x)
    {
    hilong = 1024 + ((1<<(expbits-1)) - 1);
    hilong <<= (31 - expbits);
    lowlong = 1234;
    goto writedata;
    }

    /* get the sign */
    if(x < 0) {sign = 1; fnorm = -x;}
    else {sign = 0; fnorm = x;}

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while(fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if(shift < -1022)
    {
    while(shift <-1022) {fnorm /= 2.0; shift++;}
    shift = -1023;
    }
    else
    fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a double for now */

    significand = fnorm * ((1LL<<significandbits) + 0.5f);


    /* get the biased exponent */
    exp = shift + ((1<<(expbits-1)) - 1); /* shift + bias */

    /* put the data into tweo longs (for convenience) */
    hibits = (long) (significand / 4294967296);
    hilong = (sign << 31) | (exp << (31-expbits) ) | hibits;
    lowlong = (long) (significand - hibits * 4294967296);

    writedata:
    /* write the bytes out to the stream */
    if(bigendian)
    {
    fputc( (hilong >> 24) & 0xFF, fp);
    fputc( (hilong >> 16) & 0xFF, fp);
    fputc( (hilong >> 8) & 0xFF, fp);
    fputc( hilong & 0xFF, fp);

    fputc( (lowlong >> 24) & 0xFF, fp);
    fputc( (lowlong >> 16) & 0xFF, fp);
    fputc( (lowlong >> 8) & 0xFF, fp);
    fputc( lowlong & 0xFF, fp);
    }
    else
    {
    fputc( lowlong & 0xFF, fp);
    fputc( (lowlong >> 8) & 0xFF, fp);
    fputc( (lowlong >> 16) & 0xFF, fp);
    fputc( (lowlong >> 24) & 0xFF, fp);

    fputc( hilong & 0xFF, fp);
    fputc( (hilong >> 8) & 0xFF, fp);
    fputc( (hilong >> 16) & 0xFF, fp);
    fputc( (hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
    }
    Malcolm McLean, Feb 17, 2010
    #1
    1. Advertising

  2. On Wed, 17 Feb 2010 09:04:16 -0800 (PST), Malcolm McLean
    <> wrote:

    >
    >Here it is. Many thanks to all who helped, especially Michael on whose
    >contribution the code is mainly based.
    >
    >The idea is to write a float in binary IEEE format, reagrdless of the
    >native representation in the machine you are on.
    >
    >I've only access to a Windows PC at the moment, so I'd be grateful if
    >people could test it on other architectures, especially non-IEEE
    >native floating point units. Particularly I'm not sure about the
    >denormalised numbers - my printf seems to think they are identical to
    >zero so I'm not sure if I'm dropping precision.
    >
    >int fwriteieee754(double x, FILE *fp, int bigendian)
    >{
    > int shift;
    > unsigned long sign, exp, hibits, hilong, lowlong;
    > double fnorm, significand;
    > int expbits = 11;
    > int significandbits = 52;
    >
    > /* zero (can't handle signed zero) */
    > if(x == 0)
    > {
    > hilong = 0;
    > lowlong = 0;
    > goto writedata;
    > }
    > /* infinity */
    > if(x > DBL_MAX)
    > {
    > hilong = 1024 + ((1<<(expbits-1)) - 1);
    > hilong <<= (31 - expbits);
    > lowlong = 0;
    > goto writedata;
    > }
    > /* -infinity */
    > if(x < -DBL_MAX)
    > {
    > hilong = 1024 + ((1<<(expbits-1)) - 1);
    > hilong <<= (31-expbits);
    > hilong |= (1 << 31);
    > lowlong = 0;
    > goto writedata;
    > }
    > /* NaN - dodgy because many compilers optimise out this test, but
    > *there is no portable isnan() */
    > if(x != x)
    > {
    > hilong = 1024 + ((1<<(expbits-1)) - 1);
    > hilong <<= (31 - expbits);
    > lowlong = 1234;
    > goto writedata;
    > }
    >
    > /* get the sign */
    > if(x < 0) {sign = 1; fnorm = -x;}
    > else {sign = 0; fnorm = x;}
    >
    > /* get the normalized form of f and track the exponent */
    > shift = 0;
    > while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    > while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
    >
    > /* check for denormalized numbers */
    > if(shift < -1022)
    > {
    > while(shift <-1022) {fnorm /= 2.0; shift++;}
    > shift = -1023;
    > }
    > else
    > fnorm = fnorm - 1.0; /* take the significant bit off mantissa */
    >
    > /* calculate the integer form of the significand */
    > /* hold it in a double for now */
    >
    > significand = fnorm * ((1LL<<significandbits) + 0.5f);


    Why did you make the 0.5 a float instead of letting it default to
    double. Now the integer expression must be converted to float and
    then the sum to double to participate in the multiplication. Without
    the f suffix, the integer expression would be converted to double
    immediately and no further conversion would be needed.

    >
    >
    > /* get the biased exponent */
    > exp = shift + ((1<<(expbits-1)) - 1); /* shift + bias */
    >
    > /* put the data into tweo longs (for convenience) */
    > hibits = (long) (significand / 4294967296);
    > hilong = (sign << 31) | (exp << (31-expbits) ) | hibits;
    > lowlong = (long) (significand - hibits * 4294967296);
    >
    >writedata:
    > /* write the bytes out to the stream */
    > if(bigendian)
    > {
    > fputc( (hilong >> 24) & 0xFF, fp);
    > fputc( (hilong >> 16) & 0xFF, fp);
    > fputc( (hilong >> 8) & 0xFF, fp);
    > fputc( hilong & 0xFF, fp);
    >
    > fputc( (lowlong >> 24) & 0xFF, fp);
    > fputc( (lowlong >> 16) & 0xFF, fp);
    > fputc( (lowlong >> 8) & 0xFF, fp);
    > fputc( lowlong & 0xFF, fp);
    > }
    > else
    > {
    > fputc( lowlong & 0xFF, fp);
    > fputc( (lowlong >> 8) & 0xFF, fp);
    > fputc( (lowlong >> 16) & 0xFF, fp);
    > fputc( (lowlong >> 24) & 0xFF, fp);
    >
    > fputc( hilong & 0xFF, fp);
    > fputc( (hilong >> 8) & 0xFF, fp);
    > fputc( (hilong >> 16) & 0xFF, fp);
    > fputc( (hilong >> 24) & 0xFF, fp);
    > }
    > return ferror(fp);
    >}


    --
    Remove del for email
    Barry Schwarz, Feb 17, 2010
    #2
    1. Advertising

  3. Malcolm McLean

    Alan Curry Guest

    |
    |The idea is to write a float in binary IEEE format, reagrdless of the
    |native representation in the machine you are on.
    |
    |I've only access to a Windows PC at the moment, so I'd be grateful if
    |people could test it on other architectures, especially non-IEEE

    I don't have anything that exotic, but on boring old big endian PowerPC
    there seems to be some trouble. I ran this test:

    #include <stdio.h>
    #include <float.h>

    |
    |int fwriteieee754(double x, FILE *fp, int bigendian)
    |{
    [...your code unchanged...]
    |}

    int main(void)
    {
    FILE *f;
    double x;
    unsigned char *p;

    x = 2.00000095367431640625; /* 0x1.000008p+1 */

    printf("Before:\n");
    printf(" %a\n", x);
    printf(" %.60f\n", x);
    for(p=(unsigned char *)&x;p<(unsigned char *)(&x+1);++p)
    printf(" %02X", *p);
    puts("");

    f=fopen("fnord", "w+");
    if(!f)
    return 1;
    fwriteieee754(x, f, 1);
    rewind(f);
    if(!fread(&x, sizeof x, 1, f))
    return 1;
    remove("fnord");

    printf("After:\n");
    printf(" %a\n", x);
    printf(" %.60f\n", x);
    for(p=(unsigned char *)&x;p<(unsigned char *)(&x+1);++p)
    printf(" %02X", *p);
    puts("");

    return 0;
    }

    And the output was:

    Before:
    0x1.000008p+1
    2.000000953674316406250000000000000000000000000000000000000000
    40 00 00 00 80 00 00 00
    After:
    0x1.000007fffffffp+1
    2.000000953674315962160790149937383830547332763671875000000000
    40 00 00 00 7F FF FF FF

    I was thrown off by the "bigendian" parameter. At first I thought it
    meant "native byte order is bigendian, so swaps must be performed on the
    output to make it acceptable to the target little endian machine", but
    it turned out to mean the opposite(?) Be clear on that when you write
    the documentation.

    --
    Alan Curry
    Alan Curry, Feb 17, 2010
    #3
  4. On Feb 18, 1:00 am, (Alan Curry) wrote:
    > |
    >
    > And the output was:
    >
    > Before:
    >  0x1.000008p+1
    >  2.000000953674316406250000000000000000000000000000000000000000
    >  40 00 00 00 80 00 00 00
    > After:
    >  0x1.000007fffffffp+1
    >  2.000000953674315962160790149937383830547332763671875000000000
    >  40 00 00 00 7F FF FF FF
    >

    Thanks for running the test.
    It's losing a bit. If native format is not IEEE754 (but something very
    close) the loss of a bit or two is inevitable. However if the mantissa
    is exactly the same size, it would be nice to preserve the exact
    pattern.
    Malcolm McLean, Feb 18, 2010
    #4
  5. Malcolm McLean

    Alan Curry Guest

    In article <>,
    Malcolm McLean <> wrote:
    |
    |Thanks for running the test.
    |It's losing a bit. If native format is not IEEE754 (but something very
    |close) the loss of a bit or two is inevitable. However if the mantissa
    |is exactly the same size, it would be nice to preserve the exact
    |pattern.

    PowerPC not IEEE754? News to me...

    It's not just that bit though. It seems to be losing the entire lower 32
    bits, replacing them with 7fffffff, whenever the high bit of those 32 bits is
    1 in the input. Otherwise, the output exactly matches the input.

    Here's a more complete list of values

    Before After
    0x1.00000feedfacep+1 0x1.000007fffffffp+1
    0x1.0000f0eedfacep+1 0x1.0000f0eedfacep+1
    0x1.0000000000000p+1 0x1.0000000000000p+1
    0x1.0000000000001p+1 0x1.0000000000001p+1
    0x1.0000000000010p+1 0x1.0000000000010p+1
    0x1.0000000000100p+1 0x1.0000000000100p+1
    0x1.0000000001000p+1 0x1.0000000001000p+1
    0x1.0000000010000p+1 0x1.0000000010000p+1
    0x1.0000000100000p+1 0x1.0000000100000p+1
    0x1.0000001000000p+1 0x1.0000001000000p+1
    0x1.0000002000000p+1 0x1.0000002000000p+1
    0x1.0000004000000p+1 0x1.0000004000000p+1
    0x1.0000008000000p+1 0x1.0000008000000p+1
    0x1.0000010000000p+1 0x1.0000010000000p+1
    0x1.000007fffffffp+1 0x1.000007fffffffp+1
    0x1.0000080000000p+1 0x1.000007fffffffp+1
    0x1.0000080000001p+1 0x1.000007fffffffp+1
    0x1.0000080000010p+1 0x1.000007fffffffp+1
    0x1.0000080000100p+1 0x1.000007fffffffp+1
    0x1.0000080001000p+1 0x1.000007fffffffp+1
    0x1.0000080010000p+1 0x1.000007fffffffp+1
    0x1.0000080100000p+1 0x1.000007fffffffp+1
    0x1.0000081000000p+1 0x1.000007fffffffp+1
    0x1.0000082000000p+1 0x1.000007fffffffp+1
    0x1.0000084000000p+1 0x1.000007fffffffp+1
    0x1.0000088000000p+1 0x1.000007fffffffp+1
    0x1.0000090000000p+1 0x1.000007fffffffp+1
    0x1.00000ffffffffp+1 0x1.000007fffffffp+1

    int main(void)
    {
    FILE *f;
    double x;
    double *p;
    double testdoubles[] =
    {
    0x1.00000feedfacep+1, 0x1.0000f0eedfacep+1, 0x1.0000000000000p+1,
    0x1.0000000000001p+1, 0x1.0000000000010p+1, 0x1.0000000000100p+1,
    0x1.0000000001000p+1, 0x1.0000000010000p+1, 0x1.0000000100000p+1,
    0x1.0000001000000p+1, 0x1.0000002000000p+1, 0x1.0000004000000p+1,
    0x1.0000008000000p+1, 0x1.0000010000000p+1, 0x1.000007fffffffp+1,
    0x1.0000080000000p+1, 0x1.0000080000001p+1, 0x1.0000080000010p+1,
    0x1.0000080000100p+1, 0x1.0000080001000p+1, 0x1.0000080010000p+1,
    0x1.0000080100000p+1, 0x1.0000081000000p+1, 0x1.0000082000000p+1,
    0x1.0000084000000p+1, 0x1.0000088000000p+1, 0x1.0000090000000p+1,
    0x1.00000ffffffffp+1, 0
    };

    printf("%-30s%-30s\n", "Before", "After");
    for(p=testdoubles;*p;++p) {
    x = *p;

    printf("%-30.13a", x);

    f=fopen("fnord", "w+");
    if(!f)
    return 1;
    fwriteieee754(x, f, 1);
    rewind(f);
    if(!fread(&x, sizeof x, 1, f))
    return 1;
    remove("fnord");

    printf("%-30.13a\n", x);
    }

    return 0;
    }
    --
    Alan Curry
    Alan Curry, Feb 18, 2010
    #5
  6. On Feb 18, 2:50 pm, (Alan Curry) wrote:
    > In article <..com>,
    > Malcolm McLean  <> wrote:
    > |
    > |Thanks for running the test.
    > |It's losing a bit. If native format is not IEEE754 (but something very
    > |close) the loss of a bit or two is inevitable. However if the mantissa
    > |is exactly the same size, it would be nice to preserve the exact
    > |pattern.
    >
    > PowerPC not IEEE754? News to me...
    >
    > It's not just that bit though. It seems to be losing the entire lower 32
    > bits, replacing them with 7fffffff, whenever the high bit of those 32 bits is
    > 1 in the input. Otherwise, the output exactly matches the input.
    >
    > Here's a more complete list of values
    >
    > Before                        After
    > 0x1.00000feedfacep+1          0x1.000007fffffffp+1
    > 0x1.0000f0eedfacep+1          0x1.0000f0eedfacep+1
    > 0x1.0000000000000p+1          0x1.0000000000000p+1
    > 0x1.0000000000001p+1          0x1.0000000000001p+1
    > 0x1.0000000000010p+1          0x1.0000000000010p+1
    > 0x1.0000000000100p+1          0x1.0000000000100p+1
    > 0x1.0000000001000p+1          0x1.0000000001000p+1
    > 0x1.0000000010000p+1          0x1.0000000010000p+1
    > 0x1.0000000100000p+1          0x1.0000000100000p+1
    > 0x1.0000001000000p+1          0x1.0000001000000p+1
    > 0x1.0000002000000p+1          0x1.0000002000000p+1
    > 0x1.0000004000000p+1          0x1.0000004000000p+1
    > 0x1.0000008000000p+1          0x1.0000008000000p+1
    > 0x1.0000010000000p+1          0x1.0000010000000p+1
    > 0x1.000007fffffffp+1          0x1.000007fffffffp+1
    > 0x1.0000080000000p+1          0x1.000007fffffffp+1
    > 0x1.0000080000001p+1          0x1.000007fffffffp+1
    > 0x1.0000080000010p+1          0x1.000007fffffffp+1
    > 0x1.0000080000100p+1          0x1.000007fffffffp+1
    > 0x1.0000080001000p+1          0x1.000007fffffffp+1
    > 0x1.0000080010000p+1          0x1.000007fffffffp+1
    > 0x1.0000080100000p+1          0x1.000007fffffffp+1
    > 0x1.0000081000000p+1          0x1.000007fffffffp+1
    > 0x1.0000082000000p+1          0x1.000007fffffffp+1
    > 0x1.0000084000000p+1          0x1.000007fffffffp+1
    > 0x1.0000088000000p+1          0x1.000007fffffffp+1
    > 0x1.0000090000000p+1          0x1.000007fffffffp+1
    > 0x1.00000ffffffffp+1          0x1.000007fffffffp+1
    >
    > int main(void)
    > {
    >   FILE *f;
    >   double x;
    >   double *p;
    >   double testdoubles[] =
    >   {
    >     0x1.00000feedfacep+1, 0x1.0000f0eedfacep+1, 0x1.0000000000000p+1,
    >     0x1.0000000000001p+1, 0x1.0000000000010p+1, 0x1.0000000000100p+1,
    >     0x1.0000000001000p+1, 0x1.0000000010000p+1, 0x1.0000000100000p+1,
    >     0x1.0000001000000p+1, 0x1.0000002000000p+1, 0x1.0000004000000p+1,
    >     0x1.0000008000000p+1, 0x1.0000010000000p+1, 0x1.000007fffffffp+1,
    >     0x1.0000080000000p+1, 0x1.0000080000001p+1, 0x1.0000080000010p+1,
    >     0x1.0000080000100p+1, 0x1.0000080001000p+1, 0x1.0000080010000p+1,
    >     0x1.0000080100000p+1, 0x1.0000081000000p+1, 0x1.0000082000000p+1,
    >     0x1.0000084000000p+1, 0x1.0000088000000p+1, 0x1.0000090000000p+1,
    >     0x1.00000ffffffffp+1, 0
    >   };
    >
    >   printf("%-30s%-30s\n", "Before", "After");
    >   for(p=testdoubles;*p;++p) {
    >     x = *p;
    >
    >     printf("%-30.13a", x);
    >
    >     f=fopen("fnord", "w+");
    >     if(!f)
    >       return 1;
    >     fwriteieee754(x, f, 1);
    >     rewind(f);
    >     if(!fread(&x, sizeof x, 1, f))
    >       return 1;
    >     remove("fnord");
    >
    >     printf("%-30.13a\n", x);
    >   }
    >
    >   return 0;}
    >
    > --
    > Alan Curry



    This
    lowlong = (long) (significand - hibits * 4294967296);
    should be
    lowlong = (unsigned long) (significand hibits * 4294967296).

    Well tested.
    Malcolm McLean, Feb 18, 2010
    #6
  7. Malcolm McLean

    Alan Curry Guest

    In article <>,
    Malcolm McLean <> wrote:
    |
    |This
    | lowlong = (long) (significand - hibits * 4294967296);
    |should be
    | lowlong = (unsigned long) (significand hibits * 4294967296).
    |

    After that fix, I ran a few million random bit patterns through, and the
    only ones that didn't make it through identical were the NaNs.

    Then I modified your function to replace all the doubles with long doubles,
    to see what would happen if it ran on a system where the native double was
    more precise than the IEEE double. The worst error was off-by-one in the
    low bit.

    Even that went away when I made sure that all the arithmetic was being done
    with the larger type, like it would be if your original code was compiled
    someplace where "double" was the larger type.

    --
    Alan Curry
    Alan Curry, Feb 18, 2010
    #7
  8. Malcolm McLean

    Ben Pfaff Guest

    (Alan Curry) writes:

    > In article <>,
    > Malcolm McLean <> wrote:
    > |
    > |This
    > | lowlong = (long) (significand - hibits * 4294967296);
    > |should be
    > | lowlong = (unsigned long) (significand hibits * 4294967296).
    > |
    >
    > After that fix, I ran a few million random bit patterns through, and the
    > only ones that didn't make it through identical were the NaNs.


    For what it's worth, it is usually pretty easy, and not too slow,
    to test floating-point routines for all 2**32 possible "float"
    values, to raise confidence even further.
    --
    Ben Pfaff
    http://benpfaff.org
    Ben Pfaff, Feb 19, 2010
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Eli Bendersky
    Replies:
    1
    Views:
    1,151
    Mike Treseler
    Mar 1, 2006
  2. Replies:
    7
    Views:
    900
  3. Pflanzen Gold
    Replies:
    11
    Views:
    437
    Moonie
    Aug 12, 2004
  4. Ravishankar S

    IEEE754 fp

    Ravishankar S, Feb 14, 2008, in forum: C Programming
    Replies:
    3
    Views:
    808
    Army1987
    Feb 14, 2008
  5. Pavel
    Replies:
    13
    Views:
    917
    Richard Herring
    Mar 14, 2008
Loading...

Share This Page