Structure size and binary format

gamehack · Dec 31, 2005

Hi all,

I've been wondering when I write a structure like:

struct {
    int a;
    unsigned int b;
    float c;
} mystruct;

I then use this as a record in a binary file. The problem is
that the sizes of the types differ across platforms (win/lin/osx),
so if a file is copied to another platform and read there, the
first 16 bits, say, could be treated as the integer a even though
the file was created on a system where an integer is 32 bits. Is
there a portable solution to this? Moreover, I've been looking for
resources on designing your own binary format, and I couldn't find
anything apart from short tutorials on how to read binary files.
Are there any good resources?

Thanks a lot

Mark McIntyre · Dec 31, 2005

gamehack said:
I've been wondering when I write a structure like:

struct {
    int a;
    unsigned int b;
    float c;
} mystruct;

I then use this as a record in a binary file. The problem is
that the sizes of the types differ across platforms (win/lin/osx),
so if a file is copied to another platform and read there, the
first 16 bits, say, could be treated as the integer a even though
the file was created on a system where an integer is 32 bits. Is
there a portable solution to this?

The simplest is to store the data as text, not binary data. Other
methods might involve using fixed-width data types (if your platforms
support them), or writing custom load/save functions for each platform
which still store in binary but do it element by element and take into
account the differing sizes of types on each platform.
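
A minimal sketch of the fixed-width, element-by-element idea, assuming C99's <stdint.h> provides int32_t and uint32_t on every target platform. The names struct record, save_record and load_record are made up for illustration. Writing one member at a time keeps struct padding out of the file and pins the integer sizes, but it does not settle byte order or the float format; those still have to be agreed separately.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record type; the names are illustrative only. */
struct record {
    int32_t  a;   /* exactly 32 bits wherever the type exists */
    uint32_t b;
    float    c;   /* size and format are still platform-defined */
};

/* Write one member at a time so struct padding never reaches the file.
   Returns 1 on success, 0 on failure. */
int save_record(FILE *fp, const struct record *r)
{
    return fwrite(&r->a, sizeof r->a, 1, fp) == 1
        && fwrite(&r->b, sizeof r->b, 1, fp) == 1
        && fwrite(&r->c, sizeof r->c, 1, fp) == 1;
}

/* Read the members back in the same order. Returns 1 on success. */
int load_record(FILE *fp, struct record *r)
{
    return fread(&r->a, sizeof r->a, 1, fp) == 1
        && fread(&r->b, sizeof r->b, 1, fp) == 1
        && fread(&r->c, sizeof r->c, 1, fp) == 1;
}
```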

Mark McIntyre

Chuck F. · Dec 31, 2005

gamehack said:
I've been wondering when I write a structure like:

struct {
    int a;
    unsigned int b;
    float c;
} mystruct;

I then use this as a record in a binary file. The problem is
that the sizes of the types differ across platforms (win/lin/osx),
so if a file is copied to another platform and read there, the
first 16 bits, say, could be treated as the integer a even though
the file was created on a system where an integer is 32 bits.

Good. You recognize the existence of a problem. The answer is
"Don't do that". Binary representations are, in general, not
portable. You can convert things into a sequence of bytes and
write/read those to a file, but that means you also have to write
the conversion mechanisms. Now such things as byte sex can bite you.

Far and away the most portable transportation mechanism is pure
text. You already have conversion routines in the standard
library, and all you need to do is use them. Anybody and their dog
can read the files.
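
A rough sketch of the text approach, using only snprintf and sscanf from the standard library. The struct tag and the function names record_to_text and record_from_text are hypothetical, and the choice of %.9g assumes an IEEE single-precision float, for which nine significant digits are enough to read the same value back.

```c
#include <stdio.h>

/* Hypothetical record type mirroring the struct in the question. */
struct record {
    int a;
    unsigned int b;
    float c;
};

/* Format one record as a single line of text.
   Returns the snprintf result (characters needed, or negative on error). */
int record_to_text(char *buf, size_t size, const struct record *r)
{
    /* %.9g: enough digits to round-trip an IEEE single-precision float */
    return snprintf(buf, size, "%d %u %.9g\n", r->a, r->b, (double)r->c);
}

/* Parse a line produced by record_to_text.
   Returns 1 if all three fields were converted, 0 otherwise. */
int record_from_text(const char *line, struct record *r)
{
    return sscanf(line, "%d %u %f", &r->a, &r->b, &r->c) == 3;
}
```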


Malcolm · Dec 31, 2005

gamehack said:
I've been wondering when I write a structure like:

struct {
    int a;
    unsigned int b;
    float c;
} mystruct;

I then use this as a record in a binary file. The problem is
that the sizes of the types differ across platforms (win/lin/osx),
so if a file is copied to another platform and read there, the
first 16 bits, say, could be treated as the integer a even though
the file was created on a system where an integer is 32 bits. Is
there a portable solution to this? Moreover, I've been looking for
resources on designing your own binary format, and I couldn't find
anything apart from short tutorials on how to read binary files.
Are there any good resources?

Integers are easy. Just use the AND and OR operators, together with the
bit shifts (>> and <<), to break an integer up into 8-bit chunks, and store
them, big-endian, in a file.
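
The shift-and-mask idea might look like the sketch below. The helper names put_u32_be and get_u32_be are invented here, and the code assumes the value fits in 32 bits; a signed int would first be converted to an unsigned type. Because the bytes are produced arithmetically, the result is the same whatever the host's own byte order.

```c
/* Store the low 32 bits of v as four bytes, most significant byte first. */
void put_u32_be(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Rebuild the value from four big-endian bytes. */
unsigned long get_u32_be(const unsigned char in[4])
{
    return ((unsigned long)in[0] << 24)
         | ((unsigned long)in[1] << 16)
         | ((unsigned long)in[2] << 8)
         |  (unsigned long)in[3];
}
```

The four bytes can then be written with fwrite or putc, and read back into an unsigned char buffer before calling get_u32_be.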

It is necessary to use the big-endian format because otherwise those
little-endians might take over the world, and force us all to store our
bytes at the little end, and we don't want that happening.

The float is a bit more tricky. Floating-point numbers have their own
internal format. The good news is that virtually all are 32-bit IEEE format
(sign, exponent, mantissa). You can probably get away with a binary dump,
making sure of the endianness. However, to be really portable, you do need to
break the number up into its constituents, and then rebuild it, using the
ldexp() and frexp() functions.
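
One way the frexp()/ldexp() route could go is sketched below. The struct portable_float and the function names are hypothetical, and the scaling by 2^24 assumes a float significand of at most 24 bits, as in IEEE single precision. Infinities and NaNs are not handled. The two integers that come out can be stored byte by byte like any other integer.

```c
#include <math.h>

/* Hypothetical portable form: value == mantissa * 2^exponent. */
struct portable_float {
    long mantissa;   /* signed, magnitude at most 2^24 */
    int  exponent;
};

/* Break a finite float into an integer mantissa and a power of two. */
struct portable_float split_float(float f)
{
    struct portable_float p;
    int e;
    /* frexp: f == frac * 2^e with 0.5 <= |frac| < 1, or frac == 0 */
    double frac = frexp((double)f, &e);
    p.mantissa = (long)ldexp(frac, 24);   /* scale the fraction to an integer */
    p.exponent = e - 24;
    return p;
}

/* Rebuild the float from its two integer parts. */
float join_float(struct portable_float p)
{
    return (float)ldexp((double)p.mantissa, p.exponent);
}
```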

Eric Sosman · Dec 31, 2005

Chuck said:
Good. You recognize the existence of a problem. The answer is "Don't
do that". Binary representations are, in general, not portable. You
can convert things into a sequence of bytes and write/read those to a
file, but that means you also have to write the conversion mechanisms.
Now such things as byte sex can bite you.

Far and away the most portable transportation mechanism is pure text.
You already have conversion routines in the standard library, and all
you need to do is use them. Anybody and their dog can read the files.

gamehack · Dec 31, 2005

Thanks a lot guys.

Eric Sosman · Dec 31, 2005

(Please excuse the vacuous reply that I fat-fingered
a moment ago.)

Chuck F. said:
Good. You recognize the existence of a problem. The answer is "Don't
do that". Binary representations are, in general, not portable. You
can convert things into a sequence of bytes and write/read those to a
file, but that means you also have to write the conversion mechanisms.
Now such things as byte sex can bite you.

"Don't do that" needs a little qualification, I think.
If "that" means "just read and write the struct in whatever
form the compiler happens to choose," the advice is sound.
But the claim that binary representations are not portable
(I'm not sure what "in general" means here) doesn't hold up.
Who has not transported a ZIP or GIF or JPEG file between
dissimilar systems? At a lower level, who has not exchanged
IP packets with other systems? Portability is a matter of
agreed-upon standards, not of the underlying representations
chosen.

Chuck F. said:
Far and away the most portable transportation mechanism is pure text.
You already have conversion routines in the standard library, and all
you need to do is use them. Anybody and their dog can read the files.

Text has a few pitfalls of its own. Even without appealing
to the multitude of character encoding schemes, some difficulties
are apparent. For example, it is no simple matter to devise a
portable text representation for arbitrary `double' values. A
value encoded as text, sent to another machine and decoded, then
re-encoded and sent back again may not decode to the same value
that was originally transmitted. It requires as much care to
make this work for text as for binary representations. (And I've
got the war stories from a PPOE to prove it, too ...)
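
A small sketch of the round-trip hazard, with a made-up helper name text_round_trip. It prints a double with a chosen number of significant digits and parses the text back. Assuming IEEE 754 doubles, too few digits give back a different value, while 17 significant digits are enough to recover the original; other floating-point formats would need a different count.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print x with `digits` significant digits, then parse the text back. */
double text_round_trip(double x, int digits)
{
    char buf[64];
    /* %e prints one digit before the point, so precision is digits - 1 */
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);
    return strtod(buf, NULL);
}
```

Comparing text_round_trip(x, 6) and text_round_trip(x, 17) against x for a value such as 1.0 / 3.0 shows the difference.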

Keith Thompson · Dec 31, 2005

Eric Sosman said:
Text has a few pitfalls of its own. Even without appealing
to the multitude of character encoding schemes, some difficulties
are apparent. For example, it is no simple matter to devise a
portable text representation for arbitrary `double' values. A
value encoded as text, sent to another machine and decoded, then
re-encoded and sent back again may not decode to the same value
that was originally transmitted. It requires as much care to
make this work for text as for binary representations. (And I've
got the war stories from a PPOE to prove it, too ...)

A hexadecimal floating-point representation (supported in C99,
implementable in C90) should avoid at least some of the problems.
With enough digits, you can have an exact textual representation of a
floating-point value.
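
A brief sketch of the hexadecimal approach, assuming a C99 library in which printf supports %a and strtod accepts hexadecimal floating-point input. The function names are illustrative. The exact spelling of the text may differ between implementations, but the value it denotes is exact.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write x in C99 hexadecimal floating-point notation.
   Returns the snprintf result. */
int double_to_hex_text(char *buf, size_t size, double x)
{
    return snprintf(buf, size, "%a", x);
}

/* Parse hexadecimal (or decimal) floating-point text. */
double double_from_hex_text(const char *s)
{
    return strtod(s, NULL);
}
```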

gamehack · Dec 31, 2005

Thank you. That's why I wondered how to design a format like .zip, .jpg,
etc.

Would you basically say that every 33 bytes is one pixel, with the
value of red in the first 11 bytes, green in the next 11 bytes, and
blue in the last 11 bytes? And probably some fixed-size headers at
the end of the file (or perhaps some sequence of bytes to mark the
end of fields in the header). The problem is that I haven't seen
_any_ good resources about designing file formats. Any pointers?
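
A layout along these lines can be sketched as follows. Everything in it is an assumption made for illustration: the header sits at the start of the file rather than the end, the magic bytes "GHIM" are invented, each colour channel takes one byte instead of eleven, and width and height are limited to 16 bits. It is one possible arrangement, not a recommendation of a specific format.

```c
#include <stddef.h>

/* Hypothetical image header: 4 magic bytes, 1 version byte,
   then width and height as 16-bit big-endian values. */
#define HEADER_SIZE 9

/* Fill `out` (at least HEADER_SIZE bytes) and return the header length. */
size_t write_header(unsigned char *out, unsigned width, unsigned height)
{
    out[0] = 'G'; out[1] = 'H'; out[2] = 'I'; out[3] = 'M'; /* made-up magic */
    out[4] = 1;                                             /* format version */
    out[5] = (unsigned char)((width >> 8) & 0xFF);
    out[6] = (unsigned char)(width & 0xFF);
    out[7] = (unsigned char)((height >> 8) & 0xFF);
    out[8] = (unsigned char)(height & 0xFF);
    return HEADER_SIZE;
}

/* Byte offset of pixel (x, y): rows stored top to bottom,
   3 bytes per pixel, one each for red, green and blue. */
size_t pixel_offset(unsigned width, unsigned x, unsigned y)
{
    return HEADER_SIZE + 3 * ((size_t)y * width + x);
}
```

Fixing every field's size and byte order in the format description, rather than in a C struct, is what lets readers on different platforms agree on the file.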

Regards,
gamehack

Eric Sosman · Dec 31, 2005

gamehack said:
Thank you. That's why I wondered how to design a format like .zip, .jpg,
etc. Would you basically say that every 33 bytes is one pixel, with the
value of red in the first 11 bytes, green in the next 11 bytes, and
blue in the last 11 bytes? And probably some fixed-size headers at
the end of the file (or perhaps some sequence of bytes to mark the
end of fields in the header). The problem is that I haven't seen
_any_ good resources about designing file formats. Any pointers?

<OT>

Visit http://www.wotsit.org/ to find descriptions of
many file formats. Some are binary, some are textual. Some
are designed for portability, some are not. In any event, a
review of what's already been done should give you some ideas.
Perhaps you'll even find an existing format that meets your
needs; if so, adopting it might make available whole suites of
helpful tools for dealing with it.

</OT>
