Question about file formats


James

Hi,

I have a DOS program that creates data files which are used by another
program written by the same company. I am trying to figure out how I
can read the data from the data files.

When I open the files in Notepad they look like crap, so I'm assuming
they are written in binary?

Now, assuming they are written in binary, are there any methods I can
use to try to determine the format?

I opened one in a hex editor and I see 4 rows of numbers (like 01 00 20
02 00 20 03 etc.) on the left and a bunch of dots on the right side view.
From my limited knowledge I'm guessing that I'm looking at the file in
hexadecimal on the left, and that the right side shows non-printable
(non-text) characters as dots.

So I'm curious how people go about determining file formats. Is it
mostly guesswork, or is there a more strategic approach I can use?

Thanks a lot!!

Btw, please recommend a group where I can ask this if it doesn't apply here.
 

Karl Heinz Buchegger

James said:
> Hi,
>
> I have a DOS program that creates data files which are used by another
> program written by the same company. I am trying to figure out how I
> can read the data from the data files.
>
> When I open the files in Notepad they look like crap, so I'm assuming
> they are written in binary?

Reasonable assumption.

> Now, assuming they are written in binary, are there any methods I can
> use to try to determine the format?

> I opened one in a hex editor and I see 4 rows of numbers (like 01 00 20
> 02 00 20 03 etc.) on the left and a bunch of dots on the right side view.
> From my limited knowledge I'm guessing that I'm looking at the file in
> hexadecimal on the left, and that the right side shows non-printable
> (non-text) characters as dots.

Right. Most hex editors present the data that way.

> So I'm curious how people go about determining file formats. Is it
> mostly guesswork, or is there a more strategic approach I can use?

Ask the company for documentation of the file format.
If they don't give you that information, then it is... guesswork.

Usually you start like this:
Let the program create a data file with minimal data (no user data
at all, if possible). Name that file 'Empty'.
Now let the program create a data file with a little more user
data. Compare that file with 'Empty' and try to find the user data
(the things that changed). If your user data contains some text, you
will most likely find that text somewhere in the file. Other parts
of the file may have changed as well. They could be organizational
entries, such as where in the file the text section starts or how
many entries there are (e.g. if a byte changes from 0 to 1). Things
like that. Try to make sense of them.
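A minimal Python sketch of the comparison step, assuming you have saved the two files described above (the file names `Empty` and `OneRecord` and the script name `diffbin.py` are hypothetical). It lists the offsets whose bytes differ and, optionally, where a piece of text you typed shows up, assuming the program stores that text as plain ASCII.

```python
import sys


def changed_ranges(old: bytes, new: bytes) -> list:
    """Return (start, end) pairs, end exclusive, of offsets where the files differ.

    Compares byte-for-byte at the same offset; bytes beyond the end of the
    shorter file count as changed.
    """
    ranges = []
    start = None
    length = max(len(old), len(new))
    for i in range(length):
        a = old[i] if i < len(old) else None
        b = new[i] if i < len(new) else None
        if a != b:
            if start is None:
                start = i  # a run of changed bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges


def find_all(data: bytes, needle: bytes) -> list:
    """Return every offset at which `needle` occurs in `data`."""
    if not needle:
        return []
    hits = []
    i = data.find(needle)
    while i != -1:
        hits.append(i)
        i = data.find(needle, i + 1)
    return hits


def as_hex(chunk: bytes) -> str:
    """Format bytes as space-separated two-digit hex, e.g. '01 FF'."""
    return " ".join(f"{b:02X}" for b in chunk)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python diffbin.py Empty OneRecord [TextYouTyped]
    with open(sys.argv[1], "rb") as f:
        empty = f.read()
    with open(sys.argv[2], "rb") as f:
        other = f.read()
    for start, end in changed_ranges(empty, other):
        print(f"{start:08X}: {as_hex(empty[start:end])} -> {as_hex(other[start:end])}")
    if len(sys.argv) >= 4:
        # Assumes the program stores typed text as plain ASCII bytes.
        needle = sys.argv[3].encode("ascii")
        print("text found at offsets:", [f"{o:08X}" for o in find_all(other, needle)])
```

The comparison is offset by offset, which is the simplest choice but only works cleanly if the new data does not shift the rest of the file. If the program inserts data in the middle, everything after the insertion point will show up as changed; in that case, searching for the text you typed with `find_all` is usually the better first step.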
Try various other data files, but start with small ones. There is
no sense in analyzing a multi-MB data file; you will never figure out
how all those bytes are connected.
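When a changed byte looks like a counter or an offset, a common next guess is to read it and its neighbours as multi-byte integers. The sketch below assumes little-endian byte order (the usual convention on the x86 machines DOS ran on); that is an assumption about this file, not something established here, and the script name and arguments are hypothetical.

```python
import struct
import sys


def candidate_ints(data: bytes, offset: int) -> dict:
    """Read the bytes at `offset` as a few common unsigned integer types.

    Assumption: little-endian byte order, as is usual for x86 DOS programs;
    nothing here confirms how this particular file stores numbers.
    """
    out = {}
    if offset + 1 <= len(data):
        out["u8"] = data[offset]
    if offset + 2 <= len(data):
        out["u16le"] = struct.unpack_from("<H", data, offset)[0]
    if offset + 4 <= len(data):
        out["u32le"] = struct.unpack_from("<I", data, offset)[0]
    return out


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python ints.py DataFile 0x10
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print(candidate_ints(data, int(sys.argv[2], 0)))
```

Applied to the bytes from the question, `bytes.fromhex("01 00 20 02 00 20 03")`, offset 0 reads as 1 for both `u8` and `u16le`, and offset 2 reads as 544 as a `u16le`. The sample also repeats with a stride of three bytes (01 00 20, 02 00 20, 03 ...), which might hint at small fixed-size records, but that is exactly the kind of guess that needs checking against more files.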

Good luck. It can take days or weeks to analyze a binary data format.
 

Tim Slattery

James said:
> Hi,
>
> I have a DOS program that creates data files which are used by another
> program written by the same company. I am trying to figure out how I
> can read the data from the data files.
>
> When I open the files in Notepad they look like crap, so I'm assuming
> they are written in binary?

Something other than ASCII. Anything other than ASCII can be called
binary.
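The point that anything other than ASCII can be called binary can be turned into a quick check. This is a heuristic sketch, not a standard test: it counts the share of bytes that are printable ASCII or tab/CR/LF, and the 95% threshold is an arbitrary assumption.

```python
import sys

# Printable ASCII (space through '~') plus tab, LF and CR.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def printable_fraction(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 1.0
    return sum(1 for b in data if b in TEXT_BYTES) / len(data)


def looks_like_ascii_text(data: bytes, threshold: float = 0.95) -> bool:
    # The 0.95 cut-off is an arbitrary assumption, not a standard value.
    return printable_fraction(data) >= threshold


if __name__ == "__main__" and len(sys.argv) >= 2:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    frac = printable_fraction(data)
    print(f"{frac:.1%} printable ->", "text" if looks_like_ascii_text(data) else "binary")
```

Because it only recognises ASCII, a DOS text file containing accented characters from code page 437 would score lower than it should, so treat a borderline result as "maybe text".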

> Now, assuming they are written in binary, are there any methods I can
> use to try to determine the format?

No. The only foolproof way is to examine the source code of the
program that wrote it, or to examine documentation written by somebody
who knew that code.

> I opened one in a hex editor and I see 4 rows of numbers (like 01 00 20
> 02 00 20 03 etc.) on the left and a bunch of dots on the right side view.
> From my limited knowledge I'm guessing that I'm looking at the file in
> hexadecimal on the left, and that the right side shows non-printable
> (non-text) characters as dots.

That's right; that's how reasonable hex editors work: a hex display
side by side with an ASCII display. Dots are usually displayed on the
ASCII side for unprintable bytes.
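For reference, the layout described here (offset, hex bytes, then an ASCII column with dots for unprintable bytes) takes only a few lines to reproduce. This is a sketch of the usual convention, not the output format of any particular hex editor; 16 bytes per row is an assumption.

```python
import sys


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes the way a typical hex editor shows them:
    offset, hex bytes, then an ASCII column with '.' for unprintables."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes a dot.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) >= 2:
    with open(sys.argv[1], "rb") as f:
        print(hex_dump(f.read(256)))  # first 256 bytes only
```

For the sample bytes in the question, the ASCII column would show dots for 01, 00, 02 and 03 and a blank for each 20, since 0x20 is the space character.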

> So I'm curious how people go about determining file formats. Is it
> mostly guesswork, or is there a more strategic approach I can use?

As stated above: guesswork, unless you can find source code or
documentation.
 
