Structure size and binary format

gamehack

Hi all,

I've been wondering when I write a structure like:

struct {
int a;
unsigned int b;
float c;
} mystruct;

And then I'm using this as a record for a binary file. The problem is
that the sizes of the types differ between platforms (win/lin/osx), so
if a file was copied to another platform and read there, the first 16
bits, say, could be taken as the integer a even though the file was
created on a system where an integer was 32 bits. Is there a portable
solution to this? Moreover, I've been looking for resources on designing
your own binary format, and I couldn't find anything apart from short
tutorials on how to read binary files. Are there any good resources?

Thanks a lot
 
Mark McIntyre

gamehack said:
Hi all,

I've been wondering when I write a structure like:

struct {
int a;
unsigned int b;
float c;
} mystruct;

And then I'm using this as a record for a binary file. The problem is
that the sizes of the types differ between platforms (win/lin/osx), so
if a file was copied to another platform and read there, the first 16
bits, say, could be taken as the integer a even though the file was
created on a system where an integer was 32 bits. Is there a portable
solution to this?

The simplest is to store the data as text, not binary data. Other
methods might involve using fixed-width data types (if your platforms
support them), or writing custom load/save functions for each platform
which still store in binary but do it element by element and take into
account the differing sizes of types on each platform.
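
A minimal sketch of the element-by-element approach, assuming a C99 compiler that provides `<stdint.h>` and a two's complement encoding for the signed member. The names `record`, `record_write`, `record_read`, `write_u32` and `read_u32` are hypothetical, and the float member is left out because its representation needs separate treatment. Each member is written as exactly four bytes in a fixed order, so neither `sizeof(int)` nor struct padding leaks into the file.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record holding the two integer members of the struct above. */
struct record {
    int a;
    unsigned int b;
};

/* Write v as exactly four bytes, most significant byte first. */
static int write_u32(FILE *fp, uint32_t v)
{
    int shift;
    for (shift = 24; shift >= 0; shift -= 8) {
        if (fputc((int)((v >> shift) & 0xFFu), fp) == EOF)
            return -1;
    }
    return 0;
}

/* Read four bytes in the order used by write_u32(). */
static int read_u32(FILE *fp, uint32_t *out)
{
    uint32_t v = 0;
    int i;
    for (i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;
        v = (v << 8) | (uint32_t)c;
    }
    *out = v;
    return 0;
}

/* Save one record as 8 bytes, whatever sizeof(int) is on this platform.
   Returns -1 if a value does not fit in 32 bits or on an I/O error. */
int record_write(FILE *fp, const struct record *r)
{
    if (r->a < INT32_MIN || r->a > INT32_MAX || r->b > UINT32_MAX)
        return -1;
    if (write_u32(fp, (uint32_t)r->a) != 0)
        return -1;
    return write_u32(fp, (uint32_t)r->b);
}

/* Load one record. Returns -1 if a stored value does not fit in the
   native int or unsigned int, or on an I/O error. */
int record_read(FILE *fp, struct record *r)
{
    uint32_t ua, ub;
    long sa;
    if (read_u32(fp, &ua) != 0 || read_u32(fp, &ub) != 0)
        return -1;
    /* Undo the two's complement encoding without overflowing. */
    sa = (ua <= 0x7FFFFFFFuL) ? (long)ua : -(long)(0xFFFFFFFFuL - ua) - 1L;
    if (sa < INT_MIN || sa > INT_MAX || ub > UINT_MAX)
        return -1;
    r->a = (int)sa;
    r->b = (unsigned int)ub;
    return 0;
}
```

The file should be opened in binary mode ("wb" / "rb") so that no newline translation touches the bytes.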


Mark McIntyre
 
Chuck F.

gamehack said:
I've been wondering when I write a structure like:

struct {
int a;
unsigned int b;
float c;
} mystruct;

And then I'm using this as a record for a binary file. The
problem is that the sizes of the types differ between platforms
(win/lin/osx), so if a file was copied to another platform and
read there, the first 16 bits, say, could be taken as the integer
a even though the file was created on a system where an integer
was 32 bits.

Good. You recognize the existence of a problem. The answer is
"Don't do that". Binary representations are, in general, not
portable. You can convert things into a sequence of bytes and
write/read those to a file, but that means you also have to write
the conversion mechanisms. Now such things as byte sex can bite you.

Far and away the most portable transportation mechanism is pure
text. You already have conversion routines in the standard
library, and all you need to do is use them. Anybody and their dog
can read the files.
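
As a rough illustration of the text route, here is a sketch that formats the record as one line and parses it back with the standard library. The names `record`, `record_format` and `record_parse` are hypothetical. The `%.9g` precision is an assumption: nine significant digits are enough to recover an IEEE 754 single-precision value, but other float formats may need more.

```c
#include <stddef.h>
#include <stdio.h>

/* The struct from the question, given a tag for this sketch. */
struct record {
    int a;
    unsigned int b;
    float c;
};

/* Format one record as a line of text. Returns -1 if buf is too small. */
int record_format(char *buf, size_t n, const struct record *r)
{
    int len = snprintf(buf, n, "%d %u %.9g\n", r->a, r->b, (double)r->c);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Parse a line produced by record_format(). Returns 0 on success. */
int record_parse(const char *line, struct record *r)
{
    return sscanf(line, "%d %u %f", &r->a, &r->b, &r->c) == 3 ? 0 : -1;
}
```

A reader on a platform with a narrower int still has to decide what to do with values that are out of range there; sscanf() does not report that reliably, so strtol() with a range check is the more careful choice.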

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
 
Malcolm

gamehack said:
I've been wondering when I write a structure like:

struct {
int a;
unsigned int b;
float c;
} mystruct;

And then I'm using this as a record for a binary file. The problem is
that the sizes of the types differ between platforms (win/lin/osx), so
if a file was copied to another platform and read there, the first 16
bits, say, could be taken as the integer a even though the file was
created on a system where an integer was 32 bits. Is there a portable
solution to this? Moreover, I've been looking for resources on designing
your own binary format, and I couldn't find anything apart from short
tutorials on how to read binary files. Are there any good resources?

Integers are easy. Just use the AND and OR operators, together with the
bit shifts (>> and <<), to break up an integer into 8-bit chunks, and
store them, big-endian, in a file.

It is necessary to use the big-endian format because otherwise those
little-endians might take over the world, and force us all to store our
bytes at the little end, and we don't want that happening.
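
The shift-and-mask idea for integers can be sketched as follows, assuming 32-bit values and a buffer of 8-bit bytes. The function names `put_u32_be` and `get_u32_be` are made up for this example.

```c
#include <stdint.h>

/* Break v into four 8-bit chunks, most significant chunk first. */
void put_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

/* Rebuild the value by shifting each byte back into place and ORing. */
uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Because the code only uses arithmetic on values, never the in-memory layout, it gives the same bytes on big-endian and little-endian hosts.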

The float is a bit more tricky. Floating point numbers have their own
internal format. The good news is that virtually all are 32-bit IEEE format
(sign, exponent, mantissa). You can probably get away with a binary dump,
making sure of the endianness. However, to be really portable, you do need to
break the number up into its constituents, and then rebuild it, using the
ldexp() and frexp() functions.
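
One way the frexp()/ldexp() route might look is sketched below. The 6-byte layout (sign byte, 16-bit biased exponent, 24-bit mantissa) and the names `put_float` and `get_float` are invented for this example, not an established format. It assumes the host float has at most 24 significant bits, and it does not handle infinities or NaNs.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical layout: p[0] sign, p[1..2] exponent + 32768 (big-endian),
   p[3..5] 24-bit mantissa (big-endian). */
void put_float(unsigned char *p, float f)
{
    int e = 0;
    double m = frexp(fabs((double)f), &e);     /* m is 0 or in [0.5, 1) */
    uint32_t mant = (uint32_t)ldexp(m, 24);    /* scale to a 24-bit integer */
    uint16_t bexp = (uint16_t)((long)e + 32768L);

    p[0] = (unsigned char)(signbit(f) ? 1 : 0);
    p[1] = (unsigned char)(bexp >> 8);
    p[2] = (unsigned char)(bexp & 0xFF);
    p[3] = (unsigned char)((mant >> 16) & 0xFF);
    p[4] = (unsigned char)((mant >> 8) & 0xFF);
    p[5] = (unsigned char)(mant & 0xFF);
}

/* Rebuild the value from its sign, exponent and mantissa. */
float get_float(const unsigned char *p)
{
    long e = (long)(((unsigned long)p[1] << 8) | p[2]) - 32768L;
    uint32_t mant = ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 8) |
                    (uint32_t)p[5];
    double v = ldexp((double)mant, (int)(e - 24));
    return (float)(p[0] ? -v : v);
}
```

The file never contains the host's float representation, only integers, so the integer byte-order rules are the only ones that matter.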
 
Eric Sosman

Chuck said:
Good. You recognize the existence of a problem. The answer is "Don't
do that". Binary representations are, in general, not portable. You
can convert things into a sequence of bytes and write/read those to a
file, but that means you also have to write the conversion mechanisms.
Now such things as byte sex can bite you.

Far and away the most portable transportation mechanism is pure text.
You already have conversion routines in the standard library, and all
you need to do is use them. Anybody and their dog can read the files.
 
Eric Sosman

(Please excuse the vacuous reply that I fat-fingered
a moment ago.)

Chuck said:
Good. You recognize the existence of a problem. The answer is "Don't
do that". Binary representations are, in general, not portable. You
can convert things into a sequence of bytes and write/read those to a
file, but that means you also have to write the conversion mechanisms.
Now such things as byte sex can bite you.

"Don't do that" needs a little qualification, I think.
If "that" means "just read and write the struct in whatever
form the compiler happens to choose," the advice is sound.
But the claim that binary representations are not portable
(I'm not sure what "in general" means here) doesn't hold up.
Who has not transported a ZIP or GIF or JPEG file between
dissimilar systems? At a lower level, who has not exchanged
IP packets with other systems? Portability is a matter of
agreed-upon standards, not of the underlying representations
chosen.

Chuck said:
Far and away the most portable transportation mechanism is pure text.
You already have conversion routines in the standard library, and all
you need to do is use them. Anybody and their dog can read the files.

Text has a few pitfalls of its own. Even without appealing
to the multitude of character encoding schemes, some difficulties
are apparent. For example, it is no simple matter to devise a
portable text representation for arbitrary `double' values. A
value encoded as text, sent to another machine and decoded, then
re-encoded and sent back again may not decode to the same value
that was originally transmitted. It requires as much care to
make this work for text as for binary representations. (And I've
got the war stories from a PPOE to prove it, too ...)
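
A small sketch of the round-trip pitfall, using a hypothetical helper `survives_round_trip`. It formats a double with a chosen number of significant digits, parses the text back, and reports whether the original value came back exactly. Whether a given digit count is enough depends on the floating-point format and on how carefully the library rounds; 17 digits is commonly cited as sufficient for IEEE 754 doubles, which is an assumption here rather than a guarantee of the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format x with `digits` significant digits, parse it back, and return
   1 if the parsed value equals the original exactly, else 0. */
int survives_round_trip(double x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

With the default six digits of "%g", a value such as 1.0/3.0 comes back as a different double, which is the kind of silent drift described above.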
 
Keith Thompson

Eric Sosman said:
Text has a few pitfalls of its own. Even without appealing
to the multitude of character encoding schemes, some difficulties
are apparent. For example, it is no simple matter to devise a
portable text representation for arbitrary `double' values. A
value encoded as text, sent to another machine and decoded, then
re-encoded and sent back again may not decode to the same value
that was originally transmitted. It requires as much care to
make this work for text as for binary representations. (And I've
got the war stories from a PPOE to prove it, too ...)

A hexadecimal floating-point representation (supported in C99,
implementable in C90) should avoid at least some of the problems.
With enough digits, you can have an exact textual representation of a
floating-point value.
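
A minimal sketch of the hexadecimal route, assuming a C99 library in which printf() supports "%a" and strtod() accepts hexadecimal floating-point input. The wrapper names `double_to_hex_text` and `hex_text_to_double` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Write x as hexadecimal floating-point text. Returns -1 if buf is too small. */
int double_to_hex_text(char *buf, size_t n, double x)
{
    int len = snprintf(buf, n, "%a", x);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Parse text such as "0x1.8p+1"; C99 strtod() understands this form. */
double hex_text_to_double(const char *s)
{
    return strtod(s, NULL);
}
```

The exact digits that "%a" prints can differ between libraries, so a format built on it should compare parsed values, not strings.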
 
gamehack

Thank you. That's why I wondered how to design a format, like .zip .jpg
etc :) Do you basically say that each 33 bytes would be one pixel, and
the value of red would be the first 11 bytes, green next 11 bytes, and
then last 11 bytes are going to be blue. And probably some fixed-size
headers at the end of the file (or probably using some sequence of bytes to
mark end of fields in the header). The problem is that I haven't seen
_any_ good resources about designing file formats. Any pointers?

Regards,
gamehack
 
Eric Sosman

gamehack said:
Thank you. That's why I wondered how to design a format, like .zip .jpg
etc :) Do you basically say that each 33 bytes would be one pixel, and
the value of red would be the first 11 bytes, green next 11 bytes, and
then last 11 bytes are going to be blue. And probably some fixed-size
headers at the end of the file (or probably using some sequence of bytes to
mark end of fields in the header). The problem is that I haven't seen
_any_ good resources about designing file formats. Any pointers?

<OT>

Visit http://www.wotsit.org/ to find descriptions of
many file formats. Some are binary, some are textual. Some
are designed for portability, some are not. In any event, a
review of what's already been done should give you some ideas.
Perhaps you'll even find an existing format that meets your
needs; if so, adopting it might make available whole suites of
helpful tools for dealing with it.

</OT>
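
To make the fixed-size layout gamehack describes a bit more concrete, here is a sketch of a made-up image format. Everything in it is hypothetical: the magic string "PIX1", the 8-byte header, the 16-bit big-endian dimensions, and the choice of 3 bytes per pixel (red, green, blue) instead of 33. The header is placed at the start of the file, not the end, on the assumption that a reader needs the dimensions before it can interpret the pixels.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 8   /* 4 magic bytes + 2 width bytes + 2 height bytes */

/* Fill the header: "PIX1", then width and height, both big-endian. */
void header_encode(unsigned char *h, uint16_t width, uint16_t height)
{
    memcpy(h, "PIX1", 4);
    h[4] = (unsigned char)(width >> 8);
    h[5] = (unsigned char)(width & 0xFF);
    h[6] = (unsigned char)(height >> 8);
    h[7] = (unsigned char)(height & 0xFF);
}

/* Returns 0 and fills width/height if the magic matches, else -1. */
int header_decode(const unsigned char *h, uint16_t *width, uint16_t *height)
{
    if (memcmp(h, "PIX1", 4) != 0)
        return -1;
    *width = (uint16_t)(((unsigned)h[4] << 8) | h[5]);
    *height = (uint16_t)(((unsigned)h[6] << 8) | h[7]);
    return 0;
}

/* Byte offset of pixel (x, y) from the start of the file, rows stored
   top to bottom, each pixel 3 bytes: red, green, blue. */
size_t pixel_offset(uint16_t width, uint16_t x, uint16_t y)
{
    return HEADER_SIZE + ((size_t)y * width + x) * 3;
}
```

Real formats usually add a version field and some way to skip unknown sections, which is worth looking for when reading existing specifications.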
 
