fread/fwrite Portability Issues

Jonathan Lamothe

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey all.

I'm trying to find a way to (portably) write 32-bit integer values to a
file. As it stands, I've been using something like this:

    #include <stdio.h>

    /* Write val to the file pointed to by f. */
    int write32(unsigned long val, FILE *f)
    {
        unsigned char c;
        int i;

        /* Make sure the pointer is valid. */
        if(!f)
        {
            fprintf(stderr, "NULL file pointer.\n");
            return 1;
        }

        /* Write the 4 bytes from LSB to MSB. */
        for(i = 0; i < 4; i++)
        {
            c = (val >> (i * 8)) & 0xff;
            if(fwrite(&c, 1, 1, f) != 1)
            {
                /* Ensure the writing occurred. */
                fprintf(stderr, "Write error.\n");
                return 1;
            }
        }

        /* Exit normally. */
        return 0;
    }

This seems to work, but I'm told that char is not always 8 bits. This
means that on big-endian systems with larger char sizes, I'll be writing
all zeros to the file.

Can anyone suggest an alternative?

- --
Regards,
Jonathan Lamothe

/*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.
*/

die_if_kernel("Oops", regs, error_code);
-- From linux/arch/i386/mm/fault.c
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEwm7wq9nD47x87JYRAoMGAJ957wTuvop8ijiHMOrvaT81c6b6+wCfQulN
DiVY3WI5ORK+oK2YJgF0sas=
=RAEB
-----END PGP SIGNATURE-----
 
Tom St Denis

Jonathan said:
> Hey all.
>
> I'm trying to find a way to (portably) write 32-bit integer values to a
> file. As it stands, I've been using something like this:
>
>     c = (val >> (i * 8)) & 0xff;
>     if(fwrite(&c, 1, 1, f) != 1)

No, convert to a buffer, then call fwrite once. You're calling the
function for single bytes, and that's terribly inefficient.

> This seems to work, but I'm told that char is not always 8 bits. This
> means that on big-endian systems with larger char sizes, I'll be writing
> all zeros to the file.

Um, no. If you store data in the lower 32 bits, then that will always
work, regardless of endianness. For example:

    unsigned x = 0x1234;
    x >>= 8;

x will now equal 0x12 regardless of whether the platform is big-,
little-, or even middle-endian.

Tom
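Here is a minimal sketch of Tom's buffer suggestion. It assumes the same file format as the original loop: four 8-bit values, least significant first. The name write32_buffered is hypothetical. The shifts and masks are the ones from the original code; they are collected into an array so that fwrite is called only once.

```c
#include <stdio.h>

/* Hypothetical helper: write the low 32 bits of val as four 8-bit
 * quantities, least significant first, using a single fwrite call.
 * Returns 0 on success, 1 on a NULL stream or a short write. */
int write32_buffered(unsigned long val, FILE *f)
{
    unsigned char buf[4];
    int i;

    if (f == NULL)
        return 1;

    /* The shifts act on the value of val, not on its layout in memory,
     * so buf is filled identically on big- and little-endian hosts. */
    for (i = 0; i < 4; i++)
        buf[i] = (unsigned char)((val >> (i * 8)) & 0xff);

    /* One call for all four bytes instead of four one-byte calls. */
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf ? 0 : 1;
}
```

As spibou points out below, stdio already buffers output. The main gain here is probably fewer calls and a single error check, rather than a large speed-up.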
 
spibou

Tom said:
> No, convert to a buffer then call fwrite once. You're calling the
> function for single bytes and that's terribly inefficient.

Why would it be inefficient? fwrite will probably put the bytes in
a buffer anyway.

> Um, no. If you store data in the lower 32 bits then that will always
> work. That is, regardless of endianness

Indeed, the programme is correct. I can't imagine why Jonathan thinks
that zeros will be stored to the file if char is greater than 8 bits.

Spiros Bousbouras
 
Tom St Denis

> Indeed, the programme is correct. I can't imagine why Jonathan thinks
> that zeros will be stored to the file if char is greater than 8 bits.

He's probably thinking that on a 64-bit platform

    unsigned x = 0x1234;

would mean

    x = 0x1234000000000000

and the shift by 8 gets the LSB zeroes.

Sure, [for the OP] in memory it may be stored as

    12 34 00 00 00 00 ...

But when you use it as a data type, the operations have well-defined
behaviours. So "x & 255" is always 0x34 until you modify x.

You're correct that you can't just memcpy or fwrite the unsigned [or
whatever] types to a file and expect the code to work elsewhere.
However, if you [correctly] mask off the individual bytes, it will work
as desired.

Tom
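Tom's distinction between a value and its in-memory representation can be checked directly. The sketch below uses hypothetical function names. It copies the object representation of an unsigned long with memcpy, and the byte order of that copy depends on the host. It also extracts 8-bit pieces with shifts, which gives the same result on every host.

```c
#include <string.h>

/* Copy the object representation of x (its bytes in memory order)
 * into rep. The order of these bytes depends on the host's endianness. */
void representation_bytes(unsigned long x, unsigned char rep[sizeof(unsigned long)])
{
    memcpy(rep, &x, sizeof x);
}

/* Return the 8-bit piece of x starting at bit 8*i, computed from the
 * value with a shift and a mask. The result is the same on every host. */
unsigned value_byte(unsigned long x, int i)
{
    return (unsigned)((x >> (8 * i)) & 0xff);
}
```

Assuming CHAR_BIT == 8 and no padding bits, x = 0x1234 behaves as follows:
- On a little-endian PC, rep[0] holds 0x34.
- On a big-endian host with an 8-byte long, rep[0] is 0x00 and 0x34 is the last byte.
- value_byte(x, 0) and value_byte(x, 1) return 0x34 and 0x12 on both.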
 
Jonathan Lamothe

> Indeed, the programme is correct. I can't imagine why Jonathan thinks
> that zeros will be stored to the file if char is greater than 8 bits.

This is my understanding, please correct me if I'm wrong.

Assuming the system uses a 16-bit char value, a value of 0x20 would be
stored like this:

Little-endian
Offset:  0     1
Value:   0x20  0x00

Big-endian
Offset:  0     1
Value:   0x00  0x20

If c were set to this value in the function I wrote, wouldn't the
expression fprintf(c, 1, 1, f) write the value stored at offset 0 to the
output file? Or do all systems point to the least significant byte, and
big-endian systems use a negative offset? (Which in hindsight makes
more sense)

Offset:  -1    0
Value:   0x00  0x20

Unfortunately, I only have an Intel P3 processor to compile and run code
on, so I can't test this for myself. :(

 
Tom St Denis

Jonathan said:
> This is my understanding, please correct me if I'm wrong.
>
> Assuming the system uses a 16-bit char value, a value of 0x20 would be
> stored like this:
>
> Little-endian
> Offset:  0     1
> Value:   0x20  0x00
>
> Big-endian
> Offset:  0     1
> Value:   0x00  0x20

So far, reasonably correct.

> If c were set to this value in the function I wrote, wouldn't the
> expression fprintf(c, 1, 1, f) write the value stored at offset 0 to the
> output file? Or do all systems point to the least significant byte, and
> big-endian systems use a negative offset? (Which in hindsight makes
> more sense)

No. fprintf writes units of "char". That is, your command tells it to
write 1 char pointed to by "c" [you'd have to use &c in the above
example].

So even though c may be 0x00 20 in memory, the value is still 0x20, which
means the value in the file must be 0x20. The value in the file need
not occupy 8 bits [but in all honesty you can expect it to], but simply
1 char.

Now, if you did

    short c = 0x20;
    fwrite(&c, 1, 1, out);

you could get in trouble, because short could be larger than a char, in
which case the first char written may be 0x00.

Tom
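Here is a minimal sketch of the pitfall Tom describes and the usual fix; both function names are hypothetical. Writing the first char of a short's representation depends on byte order. Converting the value to unsigned char first does not.

```c
#include <stdio.h>

/* Pitfall: writes whichever byte of c's representation comes first in
 * memory. Returns 0 on success, 1 on a short write. */
int write_first_byte_of_short(short c, FILE *f)
{
    return fwrite(&c, 1, 1, f) == 1 ? 0 : 1;
}

/* Fix: convert the value to unsigned char first, so the byte written
 * is the low-order 8 bits of the value on every host. */
int write_low_byte_of_short(short c, FILE *f)
{
    unsigned char b = (unsigned char)(c & 0xff);
    return fwrite(&b, 1, 1, f) == 1 ? 0 : 1;
}
```

With a 2-byte short holding 0x20, the first function writes 0x20 on a little-endian host and 0x00 on a big-endian one. The second writes 0x20 on both.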
 
Keith Thompson

Jonathan Lamothe said:
> I'm trying to find a way to (portably) write 32-bit integer values to a
> file.

The best approach may depend on just how portable you want the code to
be. There are some things that you can reasonably assume that aren't
actually guaranteed by the standard.

For example, a conforming freestanding implementation (typically for
an embedded system) needn't provide <stdio.h>, so any code that uses
it can only be portable to hosted implementations. That's probably
not a problem.

> As it stands, I've been using something like this:
>
>     #include <stdio.h>
>
>     /* Write val to the file pointed to by f. */

This would be a good place to document the return values.

>     int write32(unsigned long val, FILE *f)

unsigned long is guaranteed to be *at least* 32 bits. It could be
longer. As long as you only use it for values from 0 to 2**31-1, this
shouldn't be a problem for the code you've written. You might
consider using uint32_t rather than unsigned long; it's defined in
<stdint.h> in C99, or you can define it yourself if your
implementation doesn't provide it.

>     {
>         unsigned char c;
>         int i;
>
>         /* Make sure the pointer is valid. */
>         if(!f)
>         {
>             fprintf(stderr, "NULL file pointer.\n");
>             return 1;
>         }
>
>         /* Write the 4 bytes from LSB to MSB. */
>         for(i = 0; i < 4; i++)
>         {
>             c = (val >> (i * 8)) & 0xff;
>             if(fwrite(&c, 1, 1, f) != 1)
>             {
>                 /* Ensure the writing occurred. */
>                 fprintf(stderr, "Write error.\n");
>                 return 1;
>             }
>         }
>
>         /* Exit normally. */
>         return 0;
>     }

> This seems to work, but I'm told that char is not always 8 bits.

Right, but you're unlikely to run into a system with CHAR_BIT > 8.
It's not uncommon on DSPs, but those systems tend to be embedded
anyway.

If you want to deal with that possibility, you should decide whether
you want to write 8-bit quantities or bytes (CHAR_BIT-bit quantities).
In either case, 8 is a "magic number"; you should replace it with
either CHAR_BIT or some constant that you declare yourself. Likewise
for 0xff.

> This
> means that on big-endian systems with larger char sizes, I'll be writing
> all zeros to the file.

Not necessarily. The ">>" operator is defined on the *values* of its
operands. val & 0xff will give you the low-order 8 bits of val, and
(val >> 8) & 0xff the next 8, regardless of where those bits are stored.

> Can anyone suggest an alternative?

If you don't care about exotic systems with CHAR_BIT > 8 (and there's
probably no good reason why you should), you can make sure your code
fails as early as possible if someone tries to run it on such a
system. For example:

    #include <limits.h>
    ....
    #if CHAR_BIT != 8
    #error CHAR_BIT != 8
    #endif

If you want 100% portability, you'll have to decide what the program
should do if CHAR_BIT > 8. Writing 8-bit quantities as bytes is one
reasonable approach, but there are others. If you're willing to
settle for 99.9% portability to systems you're likely to encounter,
you can avoid a fair amount of work re-defining the problem.

Byte order shouldn't be a problem, given the way you're breaking down
the value of val into 8-bit quantities. Padding bits would be an
issue if you wrote the 32-bit value directly to the file, but you're
not doing that (and unsigned char cannot have padding bits).

One more thing: On an error, your function returns a value indicating
the error *and* prints a message to stderr. For the function to be
more useful, drop the error message and define codes for different
error conditions. Let the caller decide how to handle any errors.
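Keith's suggestions could be combined into something like the sketch below. They are: document the return values, name the magic numbers, consider uint32_t, and return error codes instead of printing. The macro and enum names are hypothetical. The sketch assumes <stdint.h> provides uint32_t. It also assumes the file format is four 8-bit octets, least significant first, which is one of the two choices Keith describes for hosts with CHAR_BIT > 8.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for the format's constants: each value is stored
 * as OCTETS_PER_U32 pieces of OCTET_BITS bits, least significant first. */
#define OCTET_BITS     8
#define OCTET_MASK     0xffu
#define OCTETS_PER_U32 4

/* Hypothetical status codes, so the caller decides how to report errors. */
enum write32_status {
    WRITE32_OK = 0,
    WRITE32_NULL_STREAM,   /* f was a null pointer */
    WRITE32_IO_ERROR       /* fwrite wrote fewer bytes than requested */
};

/* Write val to f as four octets, least significant first.
 * Returns WRITE32_OK on success, otherwise one of the error codes above.
 * Nothing is printed; reporting is left to the caller. */
enum write32_status write32(uint32_t val, FILE *f)
{
    unsigned char buf[OCTETS_PER_U32];
    int i;

    if (f == NULL)
        return WRITE32_NULL_STREAM;

    /* Shifts act on the value, so the result does not depend on the
     * host's byte order. Each element holds 8 bits even if CHAR_BIT > 8. */
    for (i = 0; i < OCTETS_PER_U32; i++)
        buf[i] = (unsigned char)((val >> (i * OCTET_BITS)) & OCTET_MASK);

    if (fwrite(buf, 1, sizeof buf, f) != sizeof buf)
        return WRITE32_IO_ERROR;
    return WRITE32_OK;
}
```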
 
Tom St Denis

Keith said:
> unsigned long is guaranteed to be *at least* 32 bits. It could be
> longer. As long as you only use it for values from 0 to 2**31-1, this
> shouldn't be a problem for the code you've written. You might
> consider using uint32_t rather than unsigned long; it's defined in
> <stdint.h> in C99, or you can define it yourself if your
> implementation doesn't provide it.

My understanding was that *if* there is a 32-bit type then uint32_t
will exist. Otherwise, it doesn't have to be provided.

So you could write

    uint32_t mydata_yipee;

and have it not compile on certain platforms.

Tom
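Tom is right that uint32_t is optional. C99 requires it only when the implementation has a 32-bit unsigned type with no padding bits, and UINT32_MAX is defined exactly when uint32_t is. uint_least32_t, by contrast, always exists. Below is a sketch of a fallback; the typedef name u32_portable and the helper to_u32 are hypothetical.

```c
#include <stdint.h>

/* Use the exact-width type when <stdint.h> provides it (signalled by
 * UINT32_MAX being defined), otherwise the always-present
 * uint_least32_t, which may be wider than 32 bits. */
#ifdef UINT32_MAX
typedef uint32_t u32_portable;
#else
typedef uint_least32_t u32_portable;
#endif

/* Reduce a value to its low 32 bits. This is a no-op when u32_portable
 * is exactly 32 bits wide and needed only when it is wider. */
u32_portable to_u32(u32_portable v)
{
    return v & 0xffffffffu;
}
```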
 
Jonathan Lamothe

> Now, if you did
>
>     short c = 0x20;
>     fwrite(&c, 1, 1, out);
>
> You could get in trouble because short could be larger than a char. In
> which case, you may write the first char being 0x00.
>
> Tom

That's what I meant. I don't know how that slipped past me (that's
actually how it's written in the original posted code).

Anyway, I know there's a typedef called u_int8_t which is defined in
sys/types.h. It seems that using this in place of a short would solve
my problem, but can I expect it to be recognized on all (or at least
most) platforms?

 
Keith Thompson

Jonathan Lamothe said:
> This is my understanding, please correct me if I'm wrong.
>
> Assuming the system uses a 16-bit char value, a value of 0x20 would be
> stored like this:
>
> Little-endian
> Offset:  0     1
> Value:   0x20  0x00
>
> Big-endian
> Offset:  0     1
> Value:   0x00  0x20
>
> If c were set to this value in the function I wrote, wouldn't the
> expression fprintf(c, 1, 1, f) write the value stored at offset 0 to the

I think you mean fwrite(&c, 1, 1, f).

> output file? Or do all systems point to the least significant byte, and
> big-endian systems use a negative offset? (Which in hindsight makes
> more sense)
>
> Offset:  -1    0
> Value:   0x00  0x20
>
> Unfortunately, I only have an Intel P3 processor to compile and run code
> on, so I can't test this for myself. :(

You declared c as an unsigned char, which is guaranteed to be exactly
one byte. (A byte is CHAR_BIT bits. The standard requires CHAR_BIT >= 8;
on most systems you're likely to encounter, CHAR_BIT == 8.)

So fwrite(&c, 1, 1, f) writes exactly one byte to the file. There are
no byte-order issues, since you don't have multiple bytes.

If CHAR_BIT == 16 and c == 0x20, then the byte written has the value
0x20 (or, equivalently, 0x0020). If c == 0x1234, the byte written has
the value 0x1234. If you look at the representation, the byte at
offset 0 has the value 0x1234; there is no offset 1.
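Keith's point can be illustrated with a small sketch; the function names are hypothetical. An unsigned char is always exactly one byte, whatever CHAR_BIT is, so writing one has no byte-order question to answer.

```c
#include <limits.h>
#include <stdio.h>

/* unsigned char is exactly one byte (sizeof is 1 by definition), so
 * this writes exactly one byte of CHAR_BIT bits. With a single byte
 * there is no byte order to get wrong. Returns 0 on success. */
int write_one_byte(unsigned char c, FILE *f)
{
    return fwrite(&c, 1, 1, f) == 1 ? 0 : 1;
}

/* The largest value one byte can hold: 255 when CHAR_BIT == 8,
 * 65535 when CHAR_BIT == 16. */
unsigned long max_byte_value(void)
{
    return UCHAR_MAX;
}
```

On a typical PC, CHAR_BIT is 8 and max_byte_value() returns 255. On a DSP with CHAR_BIT == 16 it would return 65535, and the single byte written could hold 0x1234, as Keith describes.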
 
Jonathan Lamothe

This is very helpful. Thanks. :)

 
spibou

Tom said:
> Jonathan said:
>> This is my understanding, please correct me if I'm wrong.
>>
>> Assuming the system uses a 16-bit char value, a value of 0x20 would be
>> stored like this:
>>
>> Little-endian
>> Offset:  0     1
>> Value:   0x20  0x00
>>
>> Big-endian
>> Offset:  0     1
>> Value:   0x00  0x20
>
> So far, reasonably correct.
>
>> If c were set to this value in the function I wrote, wouldn't the
>> expression fprintf(c, 1, 1, f) write the value stored at offset 0 to the
>> output file? Or do all systems point to the least significant byte, and
>> big-endian systems use a negative offset? (Which in hindsight makes
>> more sense)
>
> No. fprintf writes units of "char". That is your command tells it to
> write 1 char pointed to by "c" [you'd have to use &c in the above
> example].
>
> So even though c may be 0x00 20 in memory the value is still 0x20 which
> means the value in the file must be 0x20. The value in the file need
> not occupy 8-bits [but in all honesty you can expect it to] but simply
> 1 char.

I take it that fprintf above was actually meant to be fwrite.

To return to the actual problem, I realize now that I'm not clear what
"portable" (mentioned in the opening post) is supposed to mean. For
example, assume that the programme runs on a platform where char is 16
bits and writes the file on some medium. Then assume that we read the
file from the medium on a platform where char is 8 bits. To begin with,
it is not even clear that this other platform will be able to read the
medium at all, and if it can, what kind of preprocessing the operating
system will do before the C programme gets any data. So I think that we
need more information on what "portable" means or what application
Jonathan has in mind.

But anyway, fwrite(&c, 1, 1, f) (I'm assuming that c is a char rather
than a pointer) will write 1 char. If char happens to be 16 bits on the
platform, then it will write 16 bits. As far as the C standard is
concerned, a byte is the same as a char, not 8 bits.

Spiros Bousbouras
 
spibou

Jonathan said:
> That's what I meant. I don't know how that slipped past me (that's
> actually how it's written in the original posted code).

In the original posted code, c is an unsigned char. Perhaps by
"original" you mean the unposted code from which the posted code came?
 
Jonathan Lamothe

> In the original posted code, c is an unsigned char. Perhaps by
> "original" you mean the unposted code from which the posted code came?

I was referring to the name of the function, not the argument types.
Sorry, I should have been more specific.

Anyway, I more or less have my answer now.

 
SM Ryan

# Hey all.
#
# I'm trying to find a way to (portably) write 32-bit integer values to a
# file. As it stands, I've been using something like this:

A highly portable technique is

    fprintf("%ld",(long)integervalue)

Beyond that it depends on how portable you want to be.
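SM Ryan's text-based approach might look like the sketch below, with the stream passed as fprintf's first argument. The helper names are hypothetical. Decimal text sidesteps byte order and CHAR_BIT entirely, at the cost of a larger, variable-length encoding. It assumes the reading side treats the file as text.

```c
#include <stdio.h>

/* Write v as decimal digits followed by a newline.
 * Returns 0 on success, 1 if fprintf reports an error. */
int put_long_text(long v, FILE *f)
{
    return fprintf(f, "%ld\n", v) < 0 ? 1 : 0;
}

/* Read one value written by put_long_text.
 * Returns 0 on success, 1 on a conversion failure or end of file. */
int get_long_text(long *v, FILE *f)
{
    return fscanf(f, "%ld", v) == 1 ? 0 : 1;
}
```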

When I do stuff like this I require 32 bit twos complement where
a long is exactly 4 consecutive chars and the system implementation
include ntohl and htonl calls. On such systems I can portably use

    long buffer = htonl(integervalue);
    assert(sizeof buffer==4);
    fwrite(&buffer,sizeof buffer,1,file);

    long buffer;
    assert(sizeof buffer==4);
    fread(&buffer,sizeof buffer,1,file);
    integervalue = ntohl(buffer);

Most systems nowadays provide ntohX and htonX, so you can use those
(on most systems) to abstract away byte order issues in a way
that is likely to be easily understood and replicated by others without
a lot of bit-twiddling grot. (For example, if a file system is defined
so that all metadata is stored in 'network order', you immediately know
how to byte-swap the metadata into a usable order for your CPU.)
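For comparison, the same network-order file layout (most significant octet first) can be produced with shifts alone. This avoids both the sizeof(long) == 4 assumption and the dependence on htonl/ntohl. The sketch below uses hypothetical helper names and is not SM Ryan's code.

```c
#include <stdio.h>

/* Store the low 32 bits of v as four octets, most significant first
 * ("network order"). Returns 0 on success, 1 on a short write. */
int put_u32_be(unsigned long v, FILE *f)
{
    unsigned char b[4];

    b[0] = (unsigned char)((v >> 24) & 0xff);
    b[1] = (unsigned char)((v >> 16) & 0xff);
    b[2] = (unsigned char)((v >> 8) & 0xff);
    b[3] = (unsigned char)(v & 0xff);
    return fwrite(b, 1, 4, f) == 4 ? 0 : 1;
}

/* Read four octets written by put_u32_be and rebuild the value.
 * Returns 0 on success, 1 on a short read. */
int get_u32_be(unsigned long *v, FILE *f)
{
    unsigned char b[4];

    if (fread(b, 1, 4, f) != 4)
        return 1;
    *v = ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
       | ((unsigned long)b[2] << 8) | (unsigned long)b[3];
    return 0;
}
```

On hosts where SM Ryan's stated assumptions hold, the four bytes written should match what his htonl/fwrite version produces.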
 
Tom St Denis

SM said:
> A highly portable technique is
> fprintf("%ld",(long)integervalue)

Assuming you add a file stream to the left of the arg list...

> When I do stuff like this I require 32 bit twos complement where
> a long is exactly 4 consecutive chars and the system implementation
> include ntohl and htonl calls. On such systems I can portably use

Linking in more code that may or may not be around is a bad idea. Also,
"long" need not be "sizeof == 4", and indeed isn't on many platforms.

> long buffer = htonl(integervalue);
> assert(sizeof buffer==4);
> fwrite(&buffer,sizeof buffer,1,file);

This isn't portable. And it's just "not" how to do it.

Tom
 
SM Ryan

# > When I do stuff like this I require 32 bit twos complement where
# > a long is exactly 4 consecutive chars and the system implementation
# > include ntohl and htonl calls. On such systems I can portably use
#
# Linking in more code that may or may not be around is a bad idea. Also
# "long" need not be "sizeof == 4", and indeed isn't on many platforms.

How many systems support ntoh32 and hton32?

If I want to use widely available ntohX and htonX calls,
then I have to match the X to the integer length. I already
required these calls be available on the machines I'm porting
to. I don't care about other machines.

# > long buffer = htonl(integervalue);
# > assert(sizeof buffer==4);
# > fwrite(&buffer,sizeof buffer,1,file);
#
# This isn't portable. And it's just "not" how to do it.

Odd that it works so well on a bunch of different architectures
with different operating systems. Perhaps I have a different
definition of portable than you do.
 
Ian Collins

SM said:
> # > When I do stuff like this I require 32 bit twos complement where
> # > a long is exactly 4 consecutive chars and the system implementation
> # > include ntohl and htonl calls. On such systems I can portably use
> #
> # Linking in more code that may or may not be around is a bad idea. Also
> # "long" need not be "sizeof == 4", and indeed isn't on many platforms.
>
> How many systems support ntoh32 and hton32?
>
> If I want to use widely available ntohX and htonX calls,
> then I have to match the X to the integer length. I already
> required these calls be available on the machines I'm porting
> to. I don't care about other machines.
>
> # > long buffer = htonl(integervalue);
> # > assert(sizeof buffer==4);
> # > fwrite(&buffer,sizeof buffer,1,file);
> #
> # This isn't portable. And it's just "not" how to do it.
>
> Odd that it works so well on a bunch of different architectures
> with different operating systems. Perhaps I have a different
> definition of portable than you do.

You'd hit a problem on a 64-bit system, where sizeof(long) tends to be 8.
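Code that does rely on SM Ryan's stated assumption about long can make it explicit with a preprocessor test, in the style of Keith's CHAR_BIT check. Below is a sketch; the macro LONG_IS_32_BITS and function long_is_32_bits are hypothetical names. Note that ULONG_MAX describes the value range of unsigned long. On common platforms a 32-bit range corresponds to sizeof(long) == 4, but padding bits could in principle make the two differ.

```c
#include <limits.h>

/* ULONG_MAX can be used in #if, so the width of unsigned long can be
 * tested during preprocessing (sizeof cannot). */
#if ULONG_MAX == 0xFFFFFFFFUL
#define LONG_IS_32_BITS 1
#else
#define LONG_IS_32_BITS 0   /* e.g. LP64 systems, where long is 64 bits */
#endif

/* Code that depends on a 32-bit long could stop the build instead:
 *   #if !LONG_IS_32_BITS
 *   #error "this code assumes a 32-bit long"
 *   #endif
 */
int long_is_32_bits(void)
{
    return LONG_IS_32_BITS;
}
```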
 
SM Ryan

# > # > When I do stuff like this I require 32 bit twos complement where
# > # > a long is exactly 4 consecutive chars and the system implementation
# > # > include ntohl and htonl calls. On such systems I can portably use

# You'd hit a problem on a 64 bit system where sizeof(long) tends to be 8.

I also would have a problem on 60 bit ones complement machines. Good thing
I stated my assumptions explicitly.
 
Ian Collins

SM said:
> # > # > When I do stuff like this I require 32 bit twos complement where
> # > # > a long is exactly 4 consecutive chars and the system implementation
> # > # > include ntohl and htonl calls. On such systems I can portably use
>
> # You'd hit a problem on a 64 bit system where sizeof(long) tends to be 8.
>
> I also would have a problem on 60 bit ones complement machines. Good thing
> I stated my assumptions explicitly.

Fine, but it does rule out a large percentage of desktop and server
platforms. So it's not an ideal solution to the OP's portability request.
 
