Reading a Binary File

Tanuki

Hi All:

I recently ran into a programming problem. I need to read a binary
file and translate the binary data into useful information. I
have the format at hand, e.g. 1st byte = ID, next 4 bytes (int) =
serial number, etc.

The first problem is the big-endian/little-endian issue. I can work out
whether the format is big- or little-endian, but I got confused as to how to
decipher the data.
E.g. if I know the data is little-endian, and I have an integer whose binary
representation is
20 03 00 00, then what is the equivalent decimal?

The next problem is that there is also floating-point data. How can I
infer the floating-point value from a binary representation, e.g. what
are the digits before the decimal point and those after the decimal
point?

I hope this is not too confusing. Any help is gladly appreciated. BTW, my
language is C/C++.
 
Eric Sosman

Tanuki said:
Hi All:

I recently ran into a programming problem. I need to read a binary
file and translate the binary data into useful information. I
have the format at hand, e.g. 1st byte = ID, next 4 bytes (int) =
serial number, etc.

The first problem is the big-endian/little-endian issue. I can work out
whether the format is big- or little-endian, but I got confused as to how to
decipher the data.
E.g. if I know the data is little-endian, and I have an integer whose binary
representation is
20 03 00 00, then what is the equivalent decimal?

Assuming that by "binary" you actually mean "hexadecimal,"
this value is decimal 800. How did I get there?

0x20 + 0x03*0x100 + 0x00*0x10000 + 0x00*0x1000000
or equivalently
((0x00 * 0x100 + 0x00) * 0x100 + 0x03) * 0x100 + 0x20
or equivalently
0x20 + (0x03<<8) + (0x00<<16) + (0x00<<24)
or equivalently
(((((0x00 << 8) + 0x00) << 8) + 0x03) << 8) + 0x20
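
For what it's worth, the nested shift form above could be sketched in C roughly as follows. The name le_bytes_to_u32 is made up for illustration, and the sketch assumes exactly four eight-bit bytes stored least-significant-first:

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes stored least-significant-first. */
uint32_t le_bytes_to_u32(const unsigned char b[4])
{
    uint32_t value = 0;
    /* Start with the most significant byte (the last one in the file) and
       shift the running total left by 8 bits before adding each next byte. */
    for (int i = 3; i >= 0; i--)
        value = (value << 8) + b[i];
    return value;
}
```

Called on the bytes 20 03 00 00 from the question, this yields 800.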

The next problem is that there is also floating-point data. How can I
infer the floating-point value from a binary representation, e.g. what
are the digits before the decimal point and those after the decimal
point?

Without more knowledge of the representation, you're stuck.
In the integer case you already knew a good deal about what the
representation looked like: you knew it consisted of four eight-
bit bytes arranged in Little-Endian order. (Some questions still
remain, of course: for example, what do negative integers look
like?) But in the floating-point case, all you've told us is
that you know the numbers are floating-point -- but if you know
nothing about the representation, there's no way to decode it.
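
On the negative-integer question, here is a small sketch under the assumption that the file uses two's complement, which is the usual case but is not stated anywhere in this thread. The name u32_to_s32 is hypothetical; the arithmetic avoids the implementation-defined result of casting an out-of-range unsigned value straight to a signed type:

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as two's complement. */
int32_t u32_to_s32(uint32_t u)
{
    if (u <= INT32_MAX)
        return (int32_t)u;               /* non-negative values map directly */
    /* Patterns with the top bit set stand for u - 2^32; compute that
       in two steps so no intermediate value overflows int32_t. */
    return (int32_t)(u - INT32_MAX - 1) + INT32_MIN;
}
```

If the format turns out to use sign-magnitude or something else, this mapping would be wrong, so check the documentation first.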

The best thing to do is consult the documentation for the
system that wrote the file, and see whether it tells you how
floating-point numbers are stored.

Failing that, you could try inspecting the data in the file
and seeing whether it "looks like" a well-known floating-point
format. The commonest such formats are surely the IEEE single-
and double-precision binary floating point; try interpreting
the bits according to those formats and see whether the values
you get "make sense" for the application at hand. If it doesn't
look like IEEE, you could also try the various VAX floating-
point formats, or the S/360 base-16 formats.
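
As an illustration of "interpreting the bits" for the IEEE single-precision case, here is a rough sketch. It assumes the four bytes have already been assembled into a 32-bit unsigned value in the right byte order, and the name ieee_single_to_double is made up for this example:

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit pattern as IEEE 754 single precision. */
double ieee_single_to_double(uint32_t bits)
{
    int sign = (bits >> 31) ? -1 : 1;            /* 1 sign bit */
    int exponent = (int)((bits >> 23) & 0xFF);   /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x7FFFFF;         /* 23-bit fraction */

    if (exponent == 0xFF)                        /* infinity or NaN */
        return fraction ? NAN : sign * INFINITY;
    if (exponent == 0)                           /* zero or subnormal */
        return sign * ldexp((double)fraction, -149);
    /* Normal number: restore the implicit leading 1, remove the bias of 127
       and account for the 23 fraction bits (127 + 23 = 150). */
    return sign * ldexp((double)(fraction | 0x800000), exponent - 150);
}
```

If the values that come out look plausible for the application, that is decent evidence the file really is IEEE; if they look like garbage, try the other byte order or another format.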

Of course, if the numbers were written out on the same system
that's reading them back again, they'll be in some native format
supported by that system. If you can figure out which one, you
needn't sweat the details: just read the bits into an object of
the proper type, and you're done.
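
A minimal sketch of that same-system case, assuming the value was written with fwrite as a plain float and the stream was opened in binary mode ("rb"). The helper name read_native_float is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: read one float that the same system wrote with fwrite.
   Returns 1 on success, 0 on a short read or error. */
int read_native_float(FILE *fp, float *out)
{
    /* fread returns the number of complete items read, here 0 or 1 */
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

This only works when writer and reader agree on the size, byte order, and format of float.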
 
Ben Pfaff

Bjoern Willenberg said:
fread seems to be what you are looking for.

That works without additional translation of what's read from the file
only if the data formats in the file have the same representation as in
the machine's memory.
 
Michael B Allen

Hi All:

I recently ran into a programming problem. I need to read a binary
file and translate the binary data into useful information. I
have the format at hand, e.g. 1st byte = ID, next 4 bytes (int) = serial
number, etc.

The encdec package provides the primitives necessary to pick apart as
well as encode arbitrary binary formats:

http://www.ioplex.com/~miallen/encdec/

The first problem is the big-endian/little-endian issue. I can work out
whether the format is big- or little-endian, but I got confused as to how to
decipher the data.
E.g. if I know the data is little-endian, and I have an integer whose binary
representation is
20 03 00 00, then what is the equivalent decimal?

This is 0x00000320 in hex. Use calc.exe in scientific mode, kcalc, or similar
to see what it would be in decimal. To decode this properly you should
use a function like this:

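#include <stdint.h>

uint32_t
dec_uint32le(const unsigned char *src)
{
    /* Widen each byte to uint32_t before shifting: plain unsigned
       may be only 16 bits wide on some platforms. */
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}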

If the data was big-endian you would use the dec_uint32be function. You
can also use the enc_ functions to encode the format. Etc...
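
The big-endian decoder is not shown in this thread, so the following is only a guess at what such a counterpart might look like. The _sketch suffix marks it as an illustration, not the package's actual code:

```c
#include <stdint.h>

/* Illustrative big-endian decoder: the first byte is the most significant. */
uint32_t dec_uint32be_sketch(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}
```

The same 800 would be stored as 00 00 03 20 in a big-endian file.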

The next problem is that there is also floating-point data. How can I infer
the floating-point value from a binary representation, e.g. what are the
digits before the decimal point and those after the decimal point?

If it's a reasonable format and not something that the last guy just
made up, chances are floating-point values are encoded using the standard
IEEE 754 encoding. See the {enc,dec}_{float,double}{le,be} functions, again
from the encdec package.
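
To give a rough idea of what decoding a little-endian IEEE 754 single can look like, here is a sketch that is not taken from encdec. It assumes the host's float is itself a 4-byte IEEE 754 single with the same byte order as its integers, which holds on most current machines but is not guaranteed by C:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative decoder, assuming float on this machine is IEEE 754 binary32. */
float dec_floatle_sketch(const unsigned char *src)
{
    /* Assemble the bit pattern independently of the host byte order */
    uint32_t bits = (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
                    ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
    float f;
    /* memcpy copies the bits without the aliasing problems of a pointer cast */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On a machine where float is not IEEE 754, the bit fields would have to be picked apart by hand instead.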

I hope this is not too confusing. Any help is gladly appreciated. BTW, my
language is C/C++.

This actually comes up quite frequently believe it or not.

Keep in mind that if you know the format is just the contents of a struct
in memory written to disk using fwrite, you should probably just read it
back the same way with fread. That will be faster. On the other hand, be aware you
cannot just fwrite a structure to a disk file, transfer it to another
computer, and fread it back in again. I think you understand there is
potential for endianness and structure member padding issues.
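
To make the padding point concrete, here is a sketch that decodes the record from the original question (1 byte ID, then a 4-byte serial number) one field at a time instead of reading a whole struct. The struct layout, the names, and the little-endian byte order are all assumptions for illustration:

```c
#include <stdint.h>

struct record {                  /* hypothetical in-memory form of one record */
    uint8_t  id;
    uint32_t serial;
};

enum { RECORD_FILE_SIZE = 5 };   /* 1 byte ID + 4 byte serial on disk */

/* Decode one on-disk record, assumed little-endian, field by field. */
void parse_record(const unsigned char buf[RECORD_FILE_SIZE], struct record *out)
{
    out->id = buf[0];
    out->serial = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
                  ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
}
```

Many compilers pad struct record so that sizeof is larger than the 5 bytes in the file, which is why a single fread of sizeof(struct record) bytes would misalign every record after the first.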

Mike
 
