recommendation for dealing with legacy data

Jeff Kish

Greetings.

I am not too awfully advanced when it comes to java programming, but I have
done a fair amount of c/c++.

I have some legacy data files which are fixed length binary.
I'd like to figure out the best way to read the files from a java program that
may be running on any variety of platforms, and subsequently process the data.

The data fields in each record in the file may have '\n', nulls or any other
data.

Can someone recommend the best way, or even a good way to go about reading
this file and taking the 1st n bytes and processing it etc. I'll need to
find/recognize/skip '\n' etc.

The datafiles will always have come from a Wintel machine, using probably the
default western character set whatever the heck that is.

Any gotchas/watch out fors/etc would be appreciated.

Thanks
Jeff Kish
 
Thomas Weidenfeller

Jeff said:
I have some legacy data files

Your description does not match:

(a) which are fixed length binary.
(b) I'll need to find/recognize/skip '\n' etc.
(c) using probably the default western character set


(a) says you have fixed-length records, (b) suggests you have
variable-length records. What is it?

If you have binary fixed-length records, then RandomAccessFile might
be a good start for accessing and skipping records.
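
A minimal sketch of that approach, assuming every record has the same known size. RECORD_LENGTH and the class name FixedRecordFile are hypothetical; the real record size has to come from the legacy format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch only: RECORD_LENGTH is a made-up value, not taken from the real format.
class FixedRecordFile {
    static final int RECORD_LENGTH = 64;

    /** Reads record number index (0-based) as raw bytes. */
    static byte[] readRecord(RandomAccessFile file, long index) throws IOException {
        byte[] record = new byte[RECORD_LENGTH];
        file.seek(index * RECORD_LENGTH); // jump straight to the start of the record
        file.readFully(record);           // throws EOFException if the record is incomplete
        return record;
    }

    /** Number of complete records in the file. */
    static long recordCount(RandomAccessFile file) throws IOException {
        return file.length() / RECORD_LENGTH;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            long count = recordCount(file);
            for (long i = 0; i < count; i++) {
                byte[] record = readRecord(file, i);
                System.out.println("record " + i + ": first byte = " + (record[0] & 0xFF));
            }
        }
    }
}
```

Because the bytes are never interpreted as text, a '\n' or NUL inside a record is read like any other byte.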

If you have binary variable-length records, a FileInputStream plus a
BufferedInputStream are a good start. Depending on the individual data
encoding, a DataInputStream might help, too. Or you need to program the
decoding by hand.
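
One case where decoding by hand may be needed: DataInputStream.readInt() always reads big-endian, whereas C programs on x86 typically write integers in little-endian order. The sketch below assumes the legacy files contain 32-bit little-endian integers, which the thread does not confirm; EndianDemo and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    /** Decodes a 32-bit little-endian integer starting at offset. */
    static int readIntLE(byte[] buf, int offset) {
        return ByteBuffer.wrap(buf, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** What DataInputStream makes of the same four bytes (big-endian). */
    static int readIntBE(byte[] buf, int offset) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf, offset, 4));
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {0x01, 0x00, 0x00, 0x00}; // the value 1 as x86 C code would write it
        System.out.println(readIntLE(bytes, 0)); // prints 1
        System.out.println(readIntBE(bytes, 0)); // prints 16777216
    }
}
```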

(a) says you have binaries, in (b) and (c) you seem to talk about text
files. What is it?

If you do indeed have a text file, then FileReader and a BufferedReader are
a good start.

Jeff said:
Any gotchas/watch out fors/etc would be appreciated.

Clarify your requirements.

/Thomas
 
Chris Uppal

Jeff said:
Any gotchas/watch out fors/etc would be appreciated.

Take some time out to get your head properly around the difference between
textual information and binary data. As a C (or C++) programmer, you have
probably spent your life thus far without having to consider the difference
between the two (assuming you use char, or unsigned char, for both). In Java
the two are (correctly) not conflated, and you will /have/ to be aware at all
times which you are dealing with.

A recent thread, entitled "Strings and binary data", (this will probably
wrap):
http://groups.google.co.uk/group/comp.lang.java.programmer/browse_frm/thread/1df0f881855c6d8f
might be an effective starting point. It contains an over-long post from
myself on the topic, plus -- probably more helpful -- several links to further
info.

Beyond that, it's simple as long as you keep your head straight. Represent
binary data as binary (byte[] arrays, or ints, etc), and textual data as text
(Strings, char[] arrays, etc). If the input is binary, read it using an
InputStream (or one of its variants); if it is textual, use a Reader (or one of
the variants). You can't easily mix the two, so if the data is actually mixed,
then read it as binary, treat it as binary while you identify the textual
subsequences, and then convert them to some suitable textual representation (for
which you will need to understand the basics of charsets/character
encodings/code-pages).
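
A rough sketch of that last step, under two assumptions the thread does not confirm: the text was written in the Western European Windows code page (windows-1252), and text fields are padded at the end with NUL bytes. TextField, its method name and the field offsets are hypothetical.

```java
import java.nio.charset.Charset;

class TextField {
    // Assumption: "default western character set" on Windows means code page 1252.
    static final Charset LEGACY = Charset.forName("windows-1252");

    /** Decodes the text field at [offset, offset + length), dropping trailing NUL padding. */
    static String decode(byte[] record, int offset, int length) {
        int end = offset + length;
        while (end > offset && record[end - 1] == 0) {
            end--; // assumption: unused space at the end of a field is NUL-filled
        }
        // Naming the charset explicitly keeps the result identical on every platform.
        return new String(record, offset, end - offset, LEGACY);
    }

    public static void main(String[] args) {
        // A 6-byte text field followed by one unrelated byte.
        byte[] record = {'C', 'a', 'f', (byte) 0xE9, 0, 0, 'X'};
        String text = decode(record, 0, 6); // "Café": 0xE9 is 'é' in windows-1252
        System.out.println(text.length());  // prints 4
    }
}
```

The main gotcha is relying on the platform default charset, for example through new String(bytes) without a charset argument: the same file could then decode differently on different machines.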

DataInputStream is occasionally useful, but not -- in my experience -- very
often.

-- chris
 
Jeff Kish

Thomas Weidenfeller said:
(a) says you have fixed-length records, (b) suggests you have
variable-length records. What is it?

If you have binary fixed-length records, then RandomAccessFile might
be a good start for accessing and skipping records.

If you have binary variable-length records, a FileInputStream plus a
BufferedInputStream are a good start. Depending on the individual data
encoding, a DataInputStream might help, too. Or you need to program the
decoding by hand.

(a) says you have binaries, in (b) and (c) you seem to talk about text
files. What is it?

If you have indeed a text file, then FileReader and a BufferedReader are
a good start.

Clarify your requirements.

/Thomas

It's a really nasty fixed length binary with some bytes corresponding to text
data.
But it is binary, and it is fixed length and some sections of each record have
text data.

Sorry it was badly designed by someone about 13 years ago who didn't know what
they were doing (and who shall go un-named ;> ) )

regards
Jeff Kish
 
Oliver Wong

Jeff Kish said:
Greetings.

I am not too awfully advanced when it comes to java programming, but I have
done a fair amount of c/c++.

I have some legacy data files which are fixed length binary.
I'd like to figure out the best way to read the files from a java program that
may be running on any variety of platforms, and subsequently process the data.

The data fields in each record in the file may have '\n', nulls or any other
data.

Can someone recommend the best way, or even a good way to go about reading
this file and taking the 1st n bytes and processing it etc. I'll need to
find/recognize/skip '\n' etc.

The datafiles will always have come from a Wintel machine, using probably the
default western character set whatever the heck that is.

Any gotchas/watch out fors/etc would be appreciated.

There are too many "ors" and "etcs." here for me to make much sense of
the nature of your files. If it's arbitrary binary data, what's wrong with
using java.io.FileInputStream to turn your file into a stream of bytes, and
working from there?

I don't understand what you mean by "I'll need to find/recognize/skip
'\n' etc." Why would the '\n' character be treated specially in arbitrary
binary data? It's fixed length, so surely this character isn't acting as a
separator, right?
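
To illustrate the point, here is a hedged sketch of reading fixed-length records from a plain byte stream. The record length of 4 and the names RecordStream, readRecord and countNewlines are invented for the example; with a real file the stream would be a FileInputStream, ideally wrapped in a BufferedInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class RecordStream {
    /** Fills record completely; returns false if the stream ended cleanly before it. */
    static boolean readRecord(InputStream in, byte[] record) throws IOException {
        int filled = 0;
        while (filled < record.length) {
            int n = in.read(record, filled, record.length - filled);
            if (n < 0) {
                if (filled == 0) return false; // clean end of stream
                throw new EOFException("truncated record");
            }
            filled += n; // read() may return fewer bytes than requested
        }
        return true;
    }

    /** Counts 0x0A bytes; in binary data they are ordinary values, not separators. */
    static int countNewlines(byte[] record) {
        int count = 0;
        for (byte b : record) {
            if (b == '\n') count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Two hypothetical 4-byte records containing '\n' and NUL bytes.
        byte[] data = {'A', '\n', 0, 'B', 'C', 'D', '\n', '\n'};
        InputStream in = new ByteArrayInputStream(data);
        byte[] record = new byte[4];
        int index = 0;
        while (readRecord(in, record)) {
            System.out.println("record " + index++ + ": " + countNewlines(record) + " newline byte(s)");
        }
    }
}
```

Record boundaries come only from the byte count, so embedded '\n' bytes pass through untouched.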

- Oliver
 
