Portable IEEE754 write routine

Malcolm McLean · Feb 17, 2010

Here it is. Many thanks to all who helped, especially Michael on whose
contribution the code is mainly based.

The idea is to write a float in binary IEEE format, reagrdless of the
native representation in the machine you are on.

I've only access to a Windows PC at the moment, so I'd be grateful if
people could test it on other architectures, especially non-IEEE
native floating point units. Particularly I'm not sure about the
denormalised numbers - my printf seems to think they are identical to
zero so I'm not sure if I'm dropping precision.

int fwriteieee754(double x, FILE *fp, int bigendian)
{
int shift;
unsigned long sign, exp, hibits, hilong, lowlong;
double fnorm, significand;
int expbits = 11;
int significandbits = 52;

/* zero (can't handle signed zero) */
if(x == 0)
{
hilong = 0;
lowlong = 0;
goto writedata;
}
/* infinity */
if(x > DBL_MAX)
{
hilong = 1024 + ((1<<(expbits-1)) - 1);
hilong <<= (31 - expbits);
lowlong = 0;
goto writedata;
}
/* -infinity */
if(x < -DBL_MAX)
{
hilong = 1024 + ((1<<(expbits-1)) - 1);
hilong <<= (31-expbits);
hilong |= (1 << 31);
lowlong = 0;
goto writedata;
}
/* NaN - dodgy because many compilers optimise out this test, but
*there is no portable isnan() */
if(x != x)
{
hilong = 1024 + ((1<<(expbits-1)) - 1);
hilong <<= (31 - expbits);
lowlong = 1234;
goto writedata;
}

/* get the sign */
if(x < 0) {sign = 1; fnorm = -x;}
else {sign = 0; fnorm = x;}

/* get the normalized form of f and track the exponent */
shift = 0;
while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
while(fnorm < 1.0) { fnorm *= 2.0; shift--; }

/* check for denormalized numbers */
if(shift < -1022)
{
while(shift <-1022) {fnorm /= 2.0; shift++;}
shift = -1023;
}
else
fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

/* calculate the integer form of the significand */
/* hold it in a double for now */

significand = fnorm * ((1LL<<significandbits) + 0.5f);

/* get the biased exponent */
exp = shift + ((1<<(expbits-1)) - 1); /* shift + bias */

/* put the data into tweo longs (for convenience) */
hibits = (long) (significand / 4294967296);
hilong = (sign << 31) | (exp << (31-expbits) ) | hibits;
lowlong = (long) (significand - hibits * 4294967296);

writedata:
/* write the bytes out to the stream */
if(bigendian)
{
fputc( (hilong >> 24) & 0xFF, fp);
fputc( (hilong >> 16) & 0xFF, fp);
fputc( (hilong >> 8) & 0xFF, fp);
fputc( hilong & 0xFF, fp);

fputc( (lowlong >> 24) & 0xFF, fp);
fputc( (lowlong >> 16) & 0xFF, fp);
fputc( (lowlong >> 8) & 0xFF, fp);
fputc( lowlong & 0xFF, fp);
}
else
{
fputc( lowlong & 0xFF, fp);
fputc( (lowlong >> 8) & 0xFF, fp);
fputc( (lowlong >> 16) & 0xFF, fp);
fputc( (lowlong >> 24) & 0xFF, fp);

fputc( hilong & 0xFF, fp);
fputc( (hilong >> 8) & 0xFF, fp);
fputc( (hilong >> 16) & 0xFF, fp);
fputc( (hilong >> 24) & 0xFF, fp);
}
return ferror(fp);
}

Barry Schwarz · Feb 17, 2010

Here it is. Many thanks to all who helped, especially Michael on whose
contribution the code is mainly based.

The idea is to write a float in binary IEEE format, reagrdless of the
native representation in the machine you are on.

I've only access to a Windows PC at the moment, so I'd be grateful if
people could test it on other architectures, especially non-IEEE
native floating point units. Particularly I'm not sure about the
denormalised numbers - my printf seems to think they are identical to
zero so I'm not sure if I'm dropping precision.

int fwriteieee754(double x, FILE *fp, int bigendian)
{
int shift;
unsigned long sign, exp, hibits, hilong, lowlong;
double fnorm, significand;
int expbits = 11;
int significandbits = 52;

/* zero (can't handle signed zero) */
if(x == 0)
{
hilong = 0;
lowlong = 0;
goto writedata;
}
/* infinity */
if(x > DBL_MAX)
{
hilong = 1024 + ((1<<(expbits-1)) - 1);
hilong <<= (31 - expbits);
lowlong = 0;
goto writedata;
}
/* -infinity */
if(x < -DBL_MAX)
{
hilong = 1024 + ((1<<(expbits-1)) - 1);
hilong <<= (31-expbits);
hilong |= (1 << 31);
lowlong = 0;
goto writedata;
}
/* NaN - dodgy because many compilers optimise out this test, but
*there is no portable isnan() */
if(x != x)
{
hilong = 1024 + ((1<<(expbits-1)) - 1);
hilong <<= (31 - expbits);
lowlong = 1234;
goto writedata;
}

/* get the sign */
if(x < 0) {sign = 1; fnorm = -x;}
else {sign = 0; fnorm = x;}

/* get the normalized form of f and track the exponent */
shift = 0;
while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
while(fnorm < 1.0) { fnorm *= 2.0; shift--; }

/* check for denormalized numbers */
if(shift < -1022)
{
while(shift <-1022) {fnorm /= 2.0; shift++;}
shift = -1023;
}
else
fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

/* calculate the integer form of the significand */
/* hold it in a double for now */

significand = fnorm * ((1LL<<significandbits) + 0.5f);

Why did you make the 0.5 a float instead of letting it default to
double. Now the integer expression must be converted to float and
then the sum to double to participate in the multiplication. Without
the f suffix, the integer expression would be converted to double
immediately and no further conversion would be needed.

Alan Curry · Feb 17, 2010

|
|The idea is to write a float in binary IEEE format, reagrdless of the
|native representation in the machine you are on.
|
|I've only access to a Windows PC at the moment, so I'd be grateful if
|people could test it on other architectures, especially non-IEEE

I don't have anything that exotic, but on boring old big endian PowerPC
there seems to be some trouble. I ran this test:

#include <stdio.h>
#include <float.h>

|
|int fwriteieee754(double x, FILE *fp, int bigendian)
|{
[...your code unchanged...]
|}

int main(void)
{
FILE *f;
double x;
unsigned char *p;

x = 2.00000095367431640625; /* 0x1.000008p+1 */

printf("Before:\n");
printf(" %a\n", x);
printf(" %.60f\n", x);
for(p=(unsigned char *)&x;p<(unsigned char *)(&x+1);++p)
printf(" %02X", *p);
puts("");

f=fopen("fnord", "w+");
if(!f)
return 1;
fwriteieee754(x, f, 1);
rewind(f);
if(!fread(&x, sizeof x, 1, f))
return 1;
remove("fnord");

printf("After:\n");
printf(" %a\n", x);
printf(" %.60f\n", x);
for(p=(unsigned char *)&x;p<(unsigned char *)(&x+1);++p)
printf(" %02X", *p);
puts("");

return 0;
}

And the output was:

Before:
0x1.000008p+1
2.000000953674316406250000000000000000000000000000000000000000
40 00 00 00 80 00 00 00
After:
0x1.000007fffffffp+1
2.000000953674315962160790149937383830547332763671875000000000
40 00 00 00 7F FF FF FF

I was thrown off by the "bigendian" parameter. At first I thought it
meant "native byte order is bigendian, so swaps must be performed on the
output to make it acceptable to the target little endian machine", but
it turned out to mean the opposite(?) Be clear on that when you write
the documentation.

Malcolm McLean · Feb 18, 2010

|

And the output was:

Before:
0x1.000008p+1
2.000000953674316406250000000000000000000000000000000000000000
40 00 00 00 80 00 00 00
After:
0x1.000007fffffffp+1
2.000000953674315962160790149937383830547332763671875000000000
40 00 00 00 7F FF FF FF

Thanks for running the test.
It's losing a bit. If native format is not IEEE754 (but something very
close) the loss of a bit or two is inevitable. However if the mantissa
is exactly the same size, it would be nice to preserve the exact
pattern.

Alan Curry · Feb 18, 2010

|
|Thanks for running the test.
|It's losing a bit. If native format is not IEEE754 (but something very
|close) the loss of a bit or two is inevitable. However if the mantissa
|is exactly the same size, it would be nice to preserve the exact
|pattern.

PowerPC not IEEE754? News to me...

It's not just that bit though. It seems to be losing the entire lower 32
bits, replacing them with 7fffffff, whenever the high bit of those 32 bits is
1 in the input. Otherwise, the output exactly matches the input.

Here's a more complete list of values

Before After
0x1.00000feedfacep+1 0x1.000007fffffffp+1
0x1.0000f0eedfacep+1 0x1.0000f0eedfacep+1
0x1.0000000000000p+1 0x1.0000000000000p+1
0x1.0000000000001p+1 0x1.0000000000001p+1
0x1.0000000000010p+1 0x1.0000000000010p+1
0x1.0000000000100p+1 0x1.0000000000100p+1
0x1.0000000001000p+1 0x1.0000000001000p+1
0x1.0000000010000p+1 0x1.0000000010000p+1
0x1.0000000100000p+1 0x1.0000000100000p+1
0x1.0000001000000p+1 0x1.0000001000000p+1
0x1.0000002000000p+1 0x1.0000002000000p+1
0x1.0000004000000p+1 0x1.0000004000000p+1
0x1.0000008000000p+1 0x1.0000008000000p+1
0x1.0000010000000p+1 0x1.0000010000000p+1
0x1.000007fffffffp+1 0x1.000007fffffffp+1
0x1.0000080000000p+1 0x1.000007fffffffp+1
0x1.0000080000001p+1 0x1.000007fffffffp+1
0x1.0000080000010p+1 0x1.000007fffffffp+1
0x1.0000080000100p+1 0x1.000007fffffffp+1
0x1.0000080001000p+1 0x1.000007fffffffp+1
0x1.0000080010000p+1 0x1.000007fffffffp+1
0x1.0000080100000p+1 0x1.000007fffffffp+1
0x1.0000081000000p+1 0x1.000007fffffffp+1
0x1.0000082000000p+1 0x1.000007fffffffp+1
0x1.0000084000000p+1 0x1.000007fffffffp+1
0x1.0000088000000p+1 0x1.000007fffffffp+1
0x1.0000090000000p+1 0x1.000007fffffffp+1
0x1.00000ffffffffp+1 0x1.000007fffffffp+1

int main(void)
{
FILE *f;
double x;
double *p;
double testdoubles[] =
{
0x1.00000feedfacep+1, 0x1.0000f0eedfacep+1, 0x1.0000000000000p+1,
0x1.0000000000001p+1, 0x1.0000000000010p+1, 0x1.0000000000100p+1,
0x1.0000000001000p+1, 0x1.0000000010000p+1, 0x1.0000000100000p+1,
0x1.0000001000000p+1, 0x1.0000002000000p+1, 0x1.0000004000000p+1,
0x1.0000008000000p+1, 0x1.0000010000000p+1, 0x1.000007fffffffp+1,
0x1.0000080000000p+1, 0x1.0000080000001p+1, 0x1.0000080000010p+1,
0x1.0000080000100p+1, 0x1.0000080001000p+1, 0x1.0000080010000p+1,
0x1.0000080100000p+1, 0x1.0000081000000p+1, 0x1.0000082000000p+1,
0x1.0000084000000p+1, 0x1.0000088000000p+1, 0x1.0000090000000p+1,
0x1.00000ffffffffp+1, 0
};

printf("%-30s%-30s\n", "Before", "After");
for(p=testdoubles;*p;++p) {
x = *p;

printf("%-30.13a", x);

f=fopen("fnord", "w+");
if(!f)
return 1;
fwriteieee754(x, f, 1);
rewind(f);
if(!fread(&x, sizeof x, 1, f))
return 1;
remove("fnord");

printf("%-30.13a\n", x);
}

return 0;
}

Malcolm McLean · Feb 18, 2010

|
|Thanks for running the test.
|It's losing a bit. If native format is not IEEE754 (but something very
|close) the loss of a bit or two is inevitable. However if the mantissa
|is exactly the same size, it would be nice to preserve the exact
|pattern.

PowerPC not IEEE754? News to me...

It's not just that bit though. It seems to be losing the entire lower 32
bits, replacing them with 7fffffff, whenever the high bit of those 32 bits is
1 in the input. Otherwise, the output exactly matches the input.

Here's a more complete list of values

Before After
0x1.00000feedfacep+1 0x1.000007fffffffp+1
0x1.0000f0eedfacep+1 0x1.0000f0eedfacep+1
0x1.0000000000000p+1 0x1.0000000000000p+1
0x1.0000000000001p+1 0x1.0000000000001p+1
0x1.0000000000010p+1 0x1.0000000000010p+1
0x1.0000000000100p+1 0x1.0000000000100p+1
0x1.0000000001000p+1 0x1.0000000001000p+1
0x1.0000000010000p+1 0x1.0000000010000p+1
0x1.0000000100000p+1 0x1.0000000100000p+1
0x1.0000001000000p+1 0x1.0000001000000p+1
0x1.0000002000000p+1 0x1.0000002000000p+1
0x1.0000004000000p+1 0x1.0000004000000p+1
0x1.0000008000000p+1 0x1.0000008000000p+1
0x1.0000010000000p+1 0x1.0000010000000p+1
0x1.000007fffffffp+1 0x1.000007fffffffp+1
0x1.0000080000000p+1 0x1.000007fffffffp+1
0x1.0000080000001p+1 0x1.000007fffffffp+1
0x1.0000080000010p+1 0x1.000007fffffffp+1
0x1.0000080000100p+1 0x1.000007fffffffp+1
0x1.0000080001000p+1 0x1.000007fffffffp+1
0x1.0000080010000p+1 0x1.000007fffffffp+1
0x1.0000080100000p+1 0x1.000007fffffffp+1
0x1.0000081000000p+1 0x1.000007fffffffp+1
0x1.0000082000000p+1 0x1.000007fffffffp+1
0x1.0000084000000p+1 0x1.000007fffffffp+1
0x1.0000088000000p+1 0x1.000007fffffffp+1
0x1.0000090000000p+1 0x1.000007fffffffp+1
0x1.00000ffffffffp+1 0x1.000007fffffffp+1

int main(void)
{
FILE *f;
double x;
double *p;
double testdoubles[] =
{
0x1.00000feedfacep+1, 0x1.0000f0eedfacep+1, 0x1.0000000000000p+1,
0x1.0000000000001p+1, 0x1.0000000000010p+1, 0x1.0000000000100p+1,
0x1.0000000001000p+1, 0x1.0000000010000p+1, 0x1.0000000100000p+1,
0x1.0000001000000p+1, 0x1.0000002000000p+1, 0x1.0000004000000p+1,
0x1.0000008000000p+1, 0x1.0000010000000p+1, 0x1.000007fffffffp+1,
0x1.0000080000000p+1, 0x1.0000080000001p+1, 0x1.0000080000010p+1,
0x1.0000080000100p+1, 0x1.0000080001000p+1, 0x1.0000080010000p+1,
0x1.0000080100000p+1, 0x1.0000081000000p+1, 0x1.0000082000000p+1,
0x1.0000084000000p+1, 0x1.0000088000000p+1, 0x1.0000090000000p+1,
0x1.00000ffffffffp+1, 0
};

printf("%-30s%-30s\n", "Before", "After");
for(p=testdoubles;*p;++p) {
x = *p;

printf("%-30.13a", x);

f=fopen("fnord", "w+");
if(!f)
return 1;
fwriteieee754(x, f, 1);
rewind(f);
if(!fread(&x, sizeof x, 1, f))
return 1;
remove("fnord");

printf("%-30.13a\n", x);
}

return 0;}

This
lowlong = (long) (significand - hibits * 4294967296);
should be
lowlong = (unsigned long) (significand hibits * 4294967296).

Well tested.

Alan Curry · Feb 18, 2010

|
|This
| lowlong = (long) (significand - hibits * 4294967296);
|should be
| lowlong = (unsigned long) (significand hibits * 4294967296).
|

After that fix, I ran a few million random bit patterns through, and the
only ones that didn't make it through identical were the NaNs.

Then I modified your function to replace all the doubles with long doubles,
to see what would happen if it ran on a system where the native double was
more precise than the IEEE double. The worst error was off-by-one in the
low bit.

Even that went away when I made sure that all the arithmetic was being done
with the larger type, like it would be if your original code was compiled
someplace where "double" was the larger type.

Ben Pfaff · Feb 18, 2010

|
|This
| lowlong = (long) (significand - hibits * 4294967296);
|should be
| lowlong = (unsigned long) (significand hibits * 4294967296).
|

After that fix, I ran a few million random bit patterns through, and the
only ones that didn't make it through identical were the NaNs.

For what it's worth, it is usually pretty easy, and not too slow,
to test floating-point routines for all 2**32 possible "float"
values, to raise confidence even further.

RSA implementation issues in public key pem loader function	0	May 21, 2025
Problem Taking input:(	4	Apr 11, 2006
Universal BMP Steganography Tool (AES-128-CTR + SP800-90A CSPRNG) Full Encoder/Decoder with 3LSB Payload, PasswordDerived Key & External Key File	4	Mar 26, 2026
Secure Keyboard v2.0 Modern C++ Virtual Keyboard for Windows (Glassmorphism UI, Clipboard Auto-Clear)	0	Mar 26, 2026
Javascript programming in TheThingsNetwork	1	May 12, 2022
Portable marshalling of floating point data	3	Dec 10, 2009
Minimum Total Difficulty	0	Nov 15, 2023
Qsort() is messing with my entire Code!!!	0	Apr 25, 2022

Portable IEEE754 write routine

Malcolm McLean

Barry Schwarz

Alan Curry

Malcolm McLean

Alan Curry

Malcolm McLean

Alan Curry

Ben Pfaff

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads