Reading a binary file

Angel

Hi folks,


I'm writing a program that can manipulate files in the format
described on this site:
http://www.ugcs.caltech.edu/~jedwin/baldur_ITM.html

Basically, the file contains two four-byte strings, then four bytes
that form a 32-bit integer, and so on.

Currently I read the file with the fread() call and structures declared
like this:

struct item_v1_header
{
  char          signature[4];
  char          version[4];
  uint32_t      generic_name_strref;
  <...>
} __attribute__((__packed__));


This works just fine, but I was wondering if there is a more
elegant/portable way to do it.

Your thoughts?


And yes, I know there are already tools out there that can manipulate
Infinity Engine stuff; I'm just doing this for entertainment and
education. :)
 
Chad

Maybe I'm not getting this, but won't this break if there is some
(additional) padding in the structure?

Chad
 
Angel

That's exactly why the "__attribute__((__packed__))" is there at the end
of the struct declaration. My first tries indeed broke on padding. :)
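
To make the padding point concrete, here is a minimal sketch of how the layout could be checked at compile time. It assumes a C11 compiler that also accepts the GCC/Clang `__attribute__((__packed__))` extension. The struct name `item_v1_header_sketch` is made up for illustration and contains only the three fields shown above, so the 12-byte size applies to this sketch, not to the real header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shortened version of the header: only the three fields
   shown in the thread. The real struct has more members. */
struct item_v1_header_sketch
{
  char     signature[4];
  char     version[4];
  uint32_t generic_name_strref;
} __attribute__((__packed__));   /* GCC/Clang extension, not standard C */

/* C11 compile-time checks: the build fails if the compiler lays the
   struct out differently from what the file is assumed to contain. */
_Static_assert(offsetof(struct item_v1_header_sketch, version) == 4,
               "version must start at byte 4");
_Static_assert(offsetof(struct item_v1_header_sketch, generic_name_strref) == 8,
               "generic_name_strref must start at byte 8");
_Static_assert(sizeof(struct item_v1_header_sketch) == 12,
               "sketch header must be exactly 12 bytes");
```

Checks like these only guard against padding. They say nothing about byte order, which is a separate problem.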
 
Ben Bacarisse

The thing that would bother me is that this code will only work when it
runs on a machine that uses the same representation of 32-bit integers
as the one used by the file format.

If that's fine, go for it. If not, you will want to read the integers
as unsigned char arrays so you can construct the integers "by value"
rather than "by representation".
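
A minimal sketch of what building an integer "by value" might look like, assuming the file stores the least significant byte first (little-endian). The function name `le32_from_bytes` is just an illustration, not something from an existing library.

```c
#include <stdint.h>

/* Build a 32-bit value from four bytes stored least-significant first.
   Only shifts and ORs on values are used, so the result is the same
   whatever byte order the host machine uses internally. */
uint32_t le32_from_bytes(const unsigned char b[4])
{
  return  (uint32_t)b[0]
       | ((uint32_t)b[1] << 8)
       | ((uint32_t)b[2] << 16)
       | ((uint32_t)b[3] << 24);
}
```

The casts to `uint32_t` before shifting matter: without them the bytes would be promoted to `int`, and shifting into the sign bit of an `int` is not well defined.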

<snip>
 
Angel

Endian-ness problems have indeed crossed my mind, but since the software
that uses this file format (BioWare's Infinity Engine) only runs on
Intel anyway, I didn't consider it such a big issue.

I might use functions like le32toh() to fix this issue for completeness'
sake, but since they are not standard, I have been hesitant to do so.
 
Angel

The functions htons, htonl, ntohs, and ntohl are widely available and
understood. I use them as a processor neutral format: use htonX on any CPU
and you can use ntohX on any other CPU. What actually is network order
doesn't matter as long as you can convert to/from and always get the correct
value.

Yes, but network byte order is big-endian, and the data in the file I'm
reading has nothing to do with the network and is little-endian, having
been created on Intel.
 
Malcolm McLean

Hi folks,


Currently I read the file with the fread() call and structures declared
like this:

struct item_v1_header
{
  char          signature[4];
  char          version[4];
  uint32_t      generic_name_strref;
  <...>

} __attribute__((__packed__));

[ to fread() ]
No, this is a bad habit.

Write functions to read 16-bit and 32-bit big-endian and little-endian
integers from a file, then use them to read each member separately.

The software might just run on Intel now, but you want to be able to
move routines as easily as possible to new platforms. Why create a
platform dependency?
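
Such reader functions could look roughly like the sketch below. The names (`read_le16`, `read_le32`, `read_be16`, `read_be32`) and the convention of returning 0 on success and -1 on a short read are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Each function reads one integer from fp and stores it in *out.
   Returns 0 on success, -1 if the bytes could not all be read. */

int read_le16(FILE *fp, uint16_t *out)
{
  unsigned char b[2];
  if (fread(b, 1, sizeof b, fp) != sizeof b)
    return -1;
  *out = (uint16_t)(b[0] | (b[1] << 8));     /* low byte comes first */
  return 0;
}

int read_be16(FILE *fp, uint16_t *out)
{
  unsigned char b[2];
  if (fread(b, 1, sizeof b, fp) != sizeof b)
    return -1;
  *out = (uint16_t)((b[0] << 8) | b[1]);     /* high byte comes first */
  return 0;
}

int read_le32(FILE *fp, uint32_t *out)
{
  unsigned char b[4];
  if (fread(b, 1, sizeof b, fp) != sizeof b)
    return -1;
  *out =  (uint32_t)b[0]        | ((uint32_t)b[1] << 8)
       | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
  return 0;
}

int read_be32(FILE *fp, uint32_t *out)
{
  unsigned char b[4];
  if (fread(b, 1, sizeof b, fp) != sizeof b)
    return -1;
  *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
       | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
  return 0;
}
```

The file should be opened in binary mode ("rb"), and each struct member is then filled with one call, for example `read_le32(fp, &header.generic_name_strref)`.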
 
Eric Sosman

[...]
Endian-ness problems have indeed crossed my mind, but since the software
that uses this file format (BioWare's Infinity Engine) only runs on
Intel anyway, I didn't consider it such a big issue.

I might use functions like le32toh() to fix this issue for completeness'
sake, but since they are not standard, I have been hesitant to do so.

Your choice, of course, but it's an odd juxtaposition: To avoid
using non-standard functions, you rely on non-standard representation.
What's that line about "straining at gnats?"
 
Angel

Hi folks,


Currently I read the file with the fread() call and structures declared
like this:

struct item_v1_header
{
  char          signature[4];
  char          version[4];
  uint32_t      generic_name_strref;
  <...>

} __attribute__((__packed__));

[ to fread() ]
No this is a bad habit.

Well, actually I started this just as an excuse to meddle with the
fread() function. Since the file consists of a header followed by one or
more data blocks all with the same layout, it seemed the most logical
choice. And I was actually considering perhaps using mmap() instead of
fread().

Write functions to read 16-bit and 32-bit big-endian and little-endian
integers from a file, then use them to read each member separately.

Each structure has quite a few such members, resulting in a huge number
of separate reads. Though I agree that is the most portable way to do
it. But also the most boring, and I am doing this mainly for fun. (And
to learn something, hence my post here.)

The software might just run on Intel now, but you want to be able to
move routines as easily as possible to new platforms. Why create a
platform dependency?

The software that uses these files is over 10 years old and proprietary,
I don't think it'll be ported to any other platform anytime soon. :)
(Yes, there is a Linux implementation in GemRB, but as far as I know
that one only works on Intel as well.)

So platform independence was not the first thing on my mind when I
started on this, but with the great hints I've been given here it is
something I will attempt to achieve.
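
One possible middle ground between a single fread() into a packed struct and one read per member is sketched below: read the whole header into a byte buffer with one fread(), then decode each member from the buffer. The names `item_header_sketch`, `get_le32` and `read_item_header_sketch` are hypothetical, and the 12-byte size covers only the three fields shown in this thread, not the real header.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Size of the on-disk part decoded here: 4 + 4 + 4 bytes.
   Assumption: only the three fields shown in the thread. */
enum { ITEM_HEADER_SKETCH_SIZE = 12 };

/* In-memory form. The compiler may pad this however it likes, because
   it is never read from or written to the file directly. */
struct item_header_sketch
{
  char     signature[4];
  char     version[4];
  uint32_t generic_name_strref;
};

/* Decode a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const unsigned char *p)
{
  return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One fread() for the whole header, then per-member decoding from
   memory. Returns 0 on success, -1 on a short read. */
int read_item_header_sketch(FILE *fp, struct item_header_sketch *h)
{
  unsigned char buf[ITEM_HEADER_SKETCH_SIZE];

  if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
    return -1;

  memcpy(h->signature, buf, 4);          /* bytes 0..3  */
  memcpy(h->version, buf + 4, 4);        /* bytes 4..7  */
  h->generic_name_strref = get_le32(buf + 8);   /* bytes 8..11 */
  return 0;
}
```

This keeps the number of I/O calls the same as the packed-struct version while removing the dependence on both struct padding and host byte order. The same decoding would also work on a buffer obtained through mmap().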
 
Angel

[...]
Endian-ness problems have indeed crossed my mind, but since the software
that uses this file format (BioWare's Infinity Engine) only runs on
Intel anyway, I didn't consider it such a big issue.

I might use functions like le32toh() to fix this issue for completeness'
sake, but since they are not standard, I have been hesitant to do so.

Your choice, of course, but it's an odd juxtaposition: To avoid
using non-standard functions, you rely on non-standard representation.
What's that line about "straining at gnats?"

The representation was already set; I did not invent the file format. And
keep in mind, I'm merely doing this for fun and education, so I think I'm
allowed a bit of freedom that I would not have in a professional
environment. :)
 
