How to determine integer values in a txt file?

C-man

Hey there, basically I want to use InputStream to read in a txt file into a
buffer, then I want to find an integer value in the text file so that I know
where a certain record starts. Basically, there are all these records in the
file that would look like a C struct:
integer ID=?
Name=?
Address?

and stuff and each record will have an ID, I want to search the txt file for
the id, then I know that the next 40 bytes are the record and I can read
that into a temporary record so I can extract the info. How would I go about
sorting through the text byte by byte to find that integer value? Hopefully
that all makes sense. Any help would be appreciated.

Thanks

C
 
Anthony Borla

C-man said:
Hey there, basically I want to use InputStream to read
in a txt file into a buffer, then I want to find an integer
value in the text file so that I know where a certain record
starts. Basically, there are all these records in the
file that would look like a C struct:
integer ID=?
Name=?
Address?

and stuff and each record will have an ID, I want to search
the txt file for the id, then I know that the next 40 bytes are
the record and I can read that into a temporary record so
I can extract the info. How would I go about sorting through
the text byte by byte to find that integer value? Hopefully
that all makes sense. Any help would be appreciated.

A few questions about your file's layout. You say it is a text file, meaning
that all content is in human-readable form, e.g.

* Integer 12345 ==> Five characters [12345]
* Floating Point 2.2 ==> Three characters [2.2]
* Non-numeric "ABC" ==> Three characters [ABC]

Assuming this is the case, you also mention that the data is formatted as
'records' of a fixed size: the record identifier plus 40 bytes. The
way this data is best handled depends on its layout:

* If each record is newline-delimited it might be easiest to
use 'readLine' to retrieve a record, then parse the resulting
'String'

* If the record identifier is fixed-size, thereby making each
record fixed-size, then it might be easiest to read the data
into a 'char' array via 'read(charBuffer, 0, RECSIZE)',
then extract each 'field' using positional information e.g.
'char' 0 - 5 ==> Record Identifier
'char' 6 - 8 ==> ...

* If the record identifier is variable-size and separated from
the rest of the data by a delimiter character, then a two-stage
'read' might be appropriate:

- Read character-by-character, appending each character
to a 'StringBuffer', until the delimiter is found - this is your
'Record Identifier' [use: while ((ch = read()) != -1) ...]

- Next, read the 40 characters into a 'char' array via:
'read(charDataBuf, 0, 40)'

* If the record identifier is variable-size but *not* separated
from the rest of the data by a delimiter character you have
no choice but to read byte-by-byte until a non-numeric
character is found - this 'marks off' the end of the 'Record
Identifier'; the next part is the 40 character data, handled as
for the previous section

In all cases it has been assumed any file header data has already been
extracted, and that something like the following code has been used:

BufferedReader in =
    new BufferedReader(
        new InputStreamReader(... inputstream ...));
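
As a rough illustration of the delimiter case from the list above, here is a minimal sketch. It assumes each record is "ID, delimiter, 40 characters of data"; the '|' delimiter and the names 'RecordReader', 'readId', 'readData' and 'findRecord' are made up for the example and are not from the original post.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class RecordReader {
    static final int DATA_SIZE = 40;    // record body length, as stated in the question
    static final char DELIMITER = '|';  // hypothetical delimiter between ID and data

    /** Reads characters up to DELIMITER and parses them as the ID; null at end of stream. */
    static Integer readId(Reader in) throws IOException {
        StringBuilder id = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == DELIMITER) {
                return Integer.valueOf(id.toString().trim());
            }
            id.append((char) ch);
        }
        return null; // no further record
    }

    /** Reads exactly DATA_SIZE characters; loops because read() may return fewer. */
    static String readData(Reader in) throws IOException {
        char[] buf = new char[DATA_SIZE];
        int filled = 0;
        while (filled < DATA_SIZE) {
            int n = in.read(buf, filled, DATA_SIZE - filled);
            if (n == -1) {
                throw new EOFException("truncated record");
            }
            filled += n;
        }
        return new String(buf);
    }

    /** Scans record by record; returns the data of the wanted ID, or null if absent. */
    static String findRecord(Reader in, int wantedId) throws IOException {
        Integer id;
        while ((id = readId(in)) != null) {
            String data = readData(in);
            if (id == wantedId) {
                return data;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Two sample records, each body padded to 40 characters.
        String text = "17|" + String.format("%-40s", "Name=Ann;Address=1 Main St")
                    + "204|" + String.format("%-40s", "Name=Bob;Address=2 Oak Ave");
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            System.out.println(findRecord(in, 204).trim());
        }
    }
}
```

For a real file the 'StringReader' would be replaced by an 'InputStreamReader' around the file's input stream, as in the snippet above.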

I hope this helps.

Anthony Borla

P.S.

Post again, perhaps providing more details of your data layout, if any of
this is not clear.
 
Anthony Borla

----- Original Message -----
From: "bejay" <[email protected]>
To: <[email protected]>
Sent: Saturday, December 06, 2003 1:28 AM
Subject: Re: How to determine integer values in a txt file?

Baj,

<original text>

Sorry for butting into this thread, but the initial question that was posed
here is something that I am having trouble with too, and you seem to be
somebody well versed in the art of Java... unlike myself I hasten to add.

I have read a few books, but need more examples to better understand the
code and thought if I can write a little programme I would be able to
understand and build on this.

I have written a basic programme that basically accepts a binary file and
creates an output file. Nothing much in between, a few Exception Catches,
but so far it was a little exercise to have input and output selected from
the command line.

I would like to do similar to the question below but not sure where to start
and what the syntax should/would be.

The task:

From any given binary input file, which has a fixed header for example in
hex 01,02,03. I would like my utility to search the input file, lock onto
the fixed header, skip 2 bytes and read the 5th byte to determine how many
bytes to output to my new file?

For example if the 5th byte was in hex 32 (Decimal 50) then take the next 50
bytes and output this to my new file?

Is there an easy way of doing this?

</original text>

The problem boils down to one of positioning and then reading - how it can
best be accomplished depends on how the data is stored.

You say the data is in binary. Sadly, there is no such thing as a
'universal' binary format, so, if your data file was generated by a non-Java
program, maybe on another platform [i.e. different processor / operating
system], you could experience difficulties with different byte orders, or
different floating point formats [if applicable]. For the current
discussion, we'll assume no such problems exist.

For this problem the most appropriate stream would seem to be a
'FileInputStream', optionally wrapped in a 'BufferedInputStream'. However,
it is also possible to use a 'RandomAccessFile', particularly useful if
arbitrary file pointer movement is needed.

Before describing the location / read process in detail, it is worth
mentioning that had the data been generated by a 'DataOutputStream',
creating a 'standardised' type of binary data, a 'DataInputStream' would
have been used to read the data. The biggest benefit of doing so would have
been the many convenient methods for reading data such as 'readInt', and
'readDouble' etc. It goes without saying that if you can control the format
the data is stored in, much can be done to make its processing
convenient and efficient.
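
To show what that 'standardised' format buys you, here is a small round-trip sketch. The record layout (an int ID, a double, a name) and the names 'DataStreamRoundTrip' and 'writeRecord' are invented for the illustration; only the 'DataOutputStream' / 'DataInputStream' calls are the real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamRoundTrip {
    /** Writes one hypothetical record (ID, balance, name) in DataOutputStream's format. */
    static byte[] writeRecord(int id, double balance, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);         // 4 bytes, big-endian
            out.writeDouble(balance); // 8 bytes
            out.writeUTF(name);       // 2-byte length prefix, then modified UTF-8
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = writeRecord(12345, 2.2, "ABC");
        // Fields must be read back in the same order they were written.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored))) {
            int id = in.readInt();
            double balance = in.readDouble();
            String name = in.readUTF();
            System.out.println(id + " " + balance + " " + name);
        }
    }
}
```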

Ok, now to the problem at hand. If the fixed-size header is at the start of
the stream then the task is easy - just read the header in! If the
fixed-size header is in an arbitrary location you need to search the stream
for the header's signature [i.e. a sequence of bytes acting as a magic
number or identifier]. I see no other means than a byte-by-byte read [e.g.
if 1st byte found, check for 2nd, and 3rd etc - either a signature is found,
or you simply keep searching and testing].
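
A minimal sketch of such a byte-by-byte search follows. It uses a small sliding window over the last few bytes rather than the "check 1st, then 2nd, then 3rd" state tracking, because the window also copes with inputs such as 01 01 02 03 where a partial match overlaps the real one. The names 'SignatureSearch' and 'findSignature' are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class SignatureSearch {
    /**
     * Reads the stream one byte at a time until the signature has been seen.
     * Returns the number of bytes consumed, signature included, or -1 if the
     * stream ends first. On success the stream is positioned just past the signature.
     */
    static long findSignature(InputStream in, byte[] signature) throws IOException {
        if (signature.length == 0) {
            throw new IllegalArgumentException("signature must not be empty");
        }
        byte[] window = new byte[signature.length]; // the most recent bytes read
        long consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            // Shift the window left by one and append the new byte.
            System.arraycopy(window, 1, window, 0, window.length - 1);
            window[window.length - 1] = (byte) b;
            consumed++;
            if (consumed >= signature.length && Arrays.equals(window, signature)) {
                return consumed;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x7F, 0x01, 0x01, 0x02, 0x03, 0x00};
        byte[] header = {0x01, 0x02, 0x03}; // the example header from the question
        System.out.println(findSignature(new ByteArrayInputStream(data), header));
    }
}
```

When reading from a file, wrapping the 'FileInputStream' in a 'BufferedInputStream' keeps the single-byte reads cheap.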

In either case, having found the header, skipping bytes can be accomplished
via 'read' [and discarding the unneeded bytes], or via 'skip' [checking its
return value, since it may skip fewer bytes than requested]. Reading the byte
count takes a single 'read', while the data itself can probably best be read
into a 'byte' array.
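
Those steps might look roughly like the sketch below, which assumes the stream is already positioned just past the header. How many bytes to skip is passed in as a parameter, since the exact offset of the length byte depends on the real file layout; 'PayloadReader' and 'readPayload' are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PayloadReader {
    /**
     * Discards bytesToSkip bytes, reads one unsigned length byte, then reads
     * that many data bytes. Throws EOFException if the stream is too short.
     */
    static byte[] readPayload(InputStream in, int bytesToSkip) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readFully(new byte[bytesToSkip]); // read and discard the unneeded bytes
        int length = data.readUnsignedByte();  // 0..255, e.g. hex 32 gives 50
        byte[] payload = new byte[length];
        data.readFully(payload);               // loops internally until the array is full
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Two filler bytes, a length byte of 3, three data bytes, one trailing byte.
        byte[] afterHeader = {0x00, 0x00, 0x03, 'A', 'B', 'C', 'X'};
        byte[] payload = readPayload(new ByteArrayInputStream(afterHeader), 2);
        System.out.println(new String(payload, StandardCharsets.US_ASCII));
    }
}
```

'readFully' is used instead of a bare 'read(byte[])' because a single 'read' is allowed to return fewer bytes than the array holds.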

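For writing the 'byte' array to a text file [previously opened using a
'FileWriter' ?] just wrap it up in a 'String' [though note there may be
problems decoding byte -> char], and use the 'FileWriter's'
'write(String, offset, len)' method. If the output should stay binary, a
'FileOutputStream' and its 'write(byte[], offset, len)' method avoid the
decoding step altogether.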

Barring binary incompatibility / byte decoding problems, this task is,
really, quite easy to accomplish. The best of luck in your endeavours :) !

Cheers,

Anthony Borla
 
Anthony Borla

I just realised I forgot to include a post-post script to my earlier post.
You could also use 'RandomAccessFile', useful because it allows arbitrary
file pointer positioning, and allows in-place rewriting of data [though care
needs to be taken].
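
A small sketch of the positioning part, assuming a file made up of nothing but fixed-size 40-byte ASCII records [a simplification of the layout discussed earlier]. 'RandomAccessRecords' and 'readRecord' are names invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RandomAccessRecords {
    static final int RECORD_SIZE = 40; // assumed fixed record length in bytes

    /** Jumps straight to record number 'index' (0-based) and reads it. */
    static String readRecord(Path file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek((long) index * RECORD_SIZE); // move the file pointer
            byte[] record = new byte[RECORD_SIZE];
            raf.readFully(record);                // EOFException if the record is incomplete
            return new String(record, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".dat");
        try {
            // Build three sample records, each padded to RECORD_SIZE characters.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 3; i++) {
                sb.append(String.format("%-40s", "record " + i));
            }
            Files.write(file, sb.toString().getBytes(StandardCharsets.US_ASCII));
            System.out.println(readRecord(file, 2).trim());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```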

I hope this helps.

Anthony Borla
 
P.Hill

The original poster might also consider that if he
has fixed separator tokens, e.g. comma-separated values,
the use of readLine followed by StringTokenizer on the string works
fine.

Just make sure you set returnDelims to true
when you construct the StringTokenizer so you can
count the commas in a,,,b,123,5

see
http://java.sun.com/j2se/1.4.2/docs...(java.lang.String, java.lang.String, boolean)

Then once you have a String token,
use Integer.parseInt or anything else Roedy's
helpful pages suggest.
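
A quick sketch of the comma-counting idea, in case it helps. The class and method names ['CsvFields', 'split'] are made up; the 'StringTokenizer(String, String, boolean)' constructor is the real one, with the third argument being returnDelims.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class CsvFields {
    /** Splits a line on commas, keeping empty fields such as those in "a,,,b". */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // returnDelims = true: each comma comes back as its own token.
        StringTokenizer st = new StringTokenizer(line, ",", true);
        String current = "";
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(",")) {
                fields.add(current); // a comma closes the current field
                current = "";
            } else {
                current = tok;
            }
        }
        fields.add(current);         // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = split("a,,,b,123,5");
        int value = Integer.parseInt(fields.get(4)); // the field holding "123"
        System.out.println(fields.size() + " fields, value " + value);
    }
}
```

Without returnDelims the empty fields would be silently skipped and "123" would land at the wrong position.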

-Paul
 
