Reading a binary file

Angel · Apr 28, 2011

Hi folks,

I'm writing a program that can manipulate files in the format as
described on this site:
http://www.ugcs.caltech.edu/~jedwin/baldur_ITM.html

Basically, the file contains four bytes that form a string, then
four bytes that form a 32-bit integer, and so on.

Currently I read the file with the fread() call and structures declared
like this:

struct item_v1_header
{
    char     signature[4];
    char     version[4];
    uint32_t generic_name_strref;
    <...>
} __attribute__((__packed__));

This works just fine, but I was wondering if there is a more
elegant/portable way to do it.

Your thoughts?

And yes, I know there are already tools out there that can manipulate
Infinity Engine files; I'm just doing this for entertainment and
education.

Chad · Apr 28, 2011

Maybe I'm not getting this, but won't this break if there is some
(additional) padding in the structure?

Chad

Angel · Apr 28, 2011

That's exactly why the "__attribute__((__packed__))" is there at the end
of the struct declaration. My first tries indeed broke on padding.
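
To make the padding point concrete, here is a minimal sketch, assuming GCC or Clang (the __attribute__ syntax is a compiler extension, not standard C). The struct names and fields are hypothetical and are not taken from the ITM format; they only show what the attribute changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only; not fields of the ITM format. */
struct plain_rec {
    uint16_t kind;
    uint32_t value;   /* many ABIs insert 2 padding bytes before this member */
};

struct packed_rec {
    uint16_t kind;
    uint32_t value;   /* packed: starts directly after kind, at offset 2 */
} __attribute__((__packed__));

/* Compile-time checks (C11): the packed layout has no padding at all. */
_Static_assert(sizeof(struct packed_rec) == 6,
               "packed_rec should be exactly 2 + 4 bytes");
_Static_assert(offsetof(struct packed_rec, value) == 2,
               "value should follow kind with no gap");
```

The size of plain_rec is left unchecked on purpose, since the amount of padding is up to the compiler and target; on common platforms it is 8.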

Ben Bacarisse · Apr 29, 2011

The thing that would bother me is that this code will only work when it
runs on a machine that uses the same representation of 32-bit integers
as the file format.

If that's fine, go for it. If not, you will want to read the integers
as unsigned char arrays so you can construct the integers "by value"
rather than "by representation".
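
A minimal sketch of building an integer "by value", assuming the file stores its 32-bit fields least significant byte first (the thread later states the data is little-endian). The function name get_le32 is made up for this example.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from four bytes.
 * The result depends only on the byte values, not on the host's byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 0x78 0x56 0x34 0x12 decode to 0x12345678 on any host, because the shifts operate on values rather than on the in-memory representation.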

<snip>

Angel · Apr 29, 2011

Endian-ness problems have indeed crossed my mind, but since the software
that uses this file format (BioWare's Infinity Engine) only runs on
Intel anyway, I didn't consider it such a big issue.

I might use functions like le32toh() to fix this issue for completeness'
sake, but since they are not standard, I have been hesitant to do so.

Angel · Apr 29, 2011

The functions htons, htonl, ntohs, and ntohl are widely available and
understood. I use them as a processor neutral format: use htonX on any CPU
and you can use ntohX on any other CPU. What actually is network order
doesn't matter as long as you can convert to/from and always get the correct
value.

Yes, but network byte order is big-endian, and the data in the file I'm
reading has nothing to do with the network and is little-endian, having
been created on Intel.

Malcolm McLean · Apr 29, 2011

Angel said:
Currently I read the file with the fread() call and structures declared
like this:

[struct item_v1_header snipped]

No, this is a bad habit.

Write functions to read 16-bit and 32-bit big- and little-endian
integers from a file, then use them to read each member separately.

The software might just run on Intel now, but you want to be able to
move routines as easily as possible to new platforms. Why create a
platform dependency?
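
One way the reader functions suggested above might look, as a sketch rather than a definitive implementation. The names read_le16, read_le32, read_be16 and read_be32 and the 0 / -1 return convention are assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Each function reads one integer from fp and stores it in *out.
 * Returns 0 on success, -1 on a short read (EOF or error). */

static int read_le16(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint16_t)(b[0] | (b[1] << 8));      /* low byte first */
    return 0;
}

static int read_be16(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint16_t)((b[0] << 8) | b[1]);      /* high byte first */
    return 0;
}

static int read_le32(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

static int read_be32(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}
```

A header reader would then call these once per member, in file order, and stop at the first -1.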

Eric Sosman · Apr 29, 2011

Angel said:
[...]
Endian-ness problems have indeed crossed my mind, but since the software
that uses this file format (BioWare's Infinity Engine) only runs on
Intel anyway, I didn't consider it such a big issue.

I might use functions like le32toh() to fix this issue for completeness'
sake, but since they are not standard, I have been hesitant to do so.

Your choice, of course, but it's an odd juxtaposition: To avoid
using non-standard functions, you rely on non-standard representation.
What's that line about "straining at gnats?"

Angel · Apr 29, 2011

Malcolm McLean said:
No this is a bad habit.

Well, actually I started this just as an excuse to meddle with the
fread() function. Since the file consists of a header followed by one or
more data blocks all with the same layout, it seemed the most logical
choice. And I was actually considering perhaps using mmap() instead of
fread().

Malcolm McLean said:
Write functions to read a 16 and 32-bit big and little endian integer
from file, then use them to read each member separately.

Each structure has quite a few such members, resulting in a huge number
of separate reads. Though I agree that is the most portable way to do
it. But also the most boring, and I am doing this mainly for fun. (And
to learn something, hence my post here.)
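
A possible middle ground, sketched under assumptions: do a single fread() of the raw bytes into an unsigned char buffer, then decode each member from its offset. That keeps one read per block without relying on struct packing or host byte order. The sketch covers only the three members shown in the thread (12 bytes); the names item_v1_prefix, ITEM_V1_PREFIX_SIZE, get_le32 and read_item_v1_prefix are hypothetical, and the elided members would need their own offsets from the format description.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Only the three members shown in the thread; the rest of the header is elided. */
enum { ITEM_V1_PREFIX_SIZE = 12 };

struct item_v1_prefix {
    char     signature[4];          /* not NUL-terminated */
    char     version[4];            /* not NUL-terminated */
    uint32_t generic_name_strref;
};

/* Decode a 32-bit little-endian value, independent of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One fread() for the whole prefix, then field-by-field decoding.
 * Returns 0 on success, -1 on a short read. */
static int read_item_v1_prefix(FILE *fp, struct item_v1_prefix *out)
{
    unsigned char buf[ITEM_V1_PREFIX_SIZE];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    memcpy(out->signature, buf, 4);                 /* bytes 0..3  */
    memcpy(out->version, buf + 4, 4);               /* bytes 4..7  */
    out->generic_name_strref = get_le32(buf + 8);   /* bytes 8..11 */
    return 0;
}
```

The in-memory struct no longer needs to be packed, because its layout is never compared with the file's layout.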

Malcolm McLean said:
The software might just run on Intel now, but you want to be able to
move routines as easily as possible to new platforms. Why create a
platform dependency?

The software that uses these files is over 10 years old and proprietary;
I don't think it'll be ported to any other platform anytime soon.

(Yes, there is a Linux implementation in GemRB, but as far as I know
that one only works on Intel as well.)

So platform independence was not the first thing on my mind when I
started on this, but with the great hints I've been given here it is
something I will attempt to achieve.

Angel · Apr 29, 2011

Eric Sosman said:
Your choice, of course, but it's an odd juxtaposition: To avoid
using non-standard functions, you rely on non-standard representation.
What's that line about "straining at gnats?"

The representation was already set; I did not invent the file format.
And keep in mind that I'm merely doing this for fun and education, so I
think I'm allowed a bit of freedom that I would not have in a
professional environment.
