Reading lines of a Text File from the end

Debajyoti Sarma

Suppose a text file is given. Each line is separated by a newline character:

line 1
line 2
line 3
..
..
line n-1
line n

I want to read the file from the end, i.e. first read line n, then
line n-1, and so on, until some condition is met in a loop.
Please provide guidance.

One more question: can we insert a line 0 at the beginning of the file
using some built-in facility, without manually shifting lines 1 to n
forward?
 
Tom Anderson

Suppose a text file is given.Each line is separated by next line char

line 1
line 2
line 3
.
.
line n-1
line n

I want to read the file from the end i.e. first read line n ,than line
n-1 , so on up to some condition in a loop Please provide guidance.

Start at the end and scan backwards for a linefeed. Pull out the text you
cover as a string. To read the previous line, scan for the next linefeed
before that, and pull out some more text, and so on.
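
A minimal sketch of that backward scan, assuming a UTF-8 (or ASCII) file whose lines end in LF or CRLF. The class name ReverseLineScanner and the maxLines parameter are made up for illustration, and reading one byte per seek is slow; a practical version would read blocks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReverseLineScanner {

    /** Returns up to maxLines lines of the file, last line first. */
    static List<String> lastLines(String path, int maxLines) throws IOException {
        List<String> out = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();              // exclusive end of the current line
            if (end == 0) {
                return out;                       // empty file: no lines
            }
            raf.seek(end - 1);
            if (raf.read() == '\n') {
                end--;                            // ignore the final line terminator
            }
            while (out.size() < maxLines) {
                long pos = end;
                // Walk backwards until the byte just before pos is a linefeed.
                while (pos > 0) {
                    raf.seek(pos - 1);
                    if (raf.read() == '\n') {
                        break;
                    }
                    pos--;
                }
                byte[] buf = new byte[(int) (end - pos)];
                raf.seek(pos);
                raf.readFully(buf);
                int len = buf.length;
                if (len > 0 && buf[len - 1] == '\r') {
                    len--;                        // drop the CR of a CRLF pair
                }
                out.add(new String(buf, 0, len, StandardCharsets.UTF_8));
                if (pos == 0) {
                    break;                        // reached the start of the file
                }
                end = pos - 1;                    // step over the linefeed
            }
        }
        return out;
    }
}
```

The "some condition" from the question is reduced to a line count here; any other stop test could replace the `out.size() < maxLines` check.
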
One more doubt : can we insert line 0 at the beginning of the file by
using some inbuilt facility without doing manual shift of the line 1 to
line n in forward??

No. If you need to do that, a plain file is not a very good way to store
your data. If you could use a database or something instead, you might
find it easier.
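
To make the cost concrete, here is a hedged sketch of what prepending to a plain file involves: every existing byte gets copied, here through a temporary file. PrependLine and prepend are hypothetical names, and UTF-8 with LF line endings is assumed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PrependLine {

    /** Rewrites the file so that the given line comes first. */
    static void prepend(Path file, String line) throws IOException {
        // Create the temp file next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "prepend", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            Files.copy(file, out);                // the "shift": all old content is copied
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```
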

tom
 
Martin Gregorie

Suppose a text file is given.Each line is separated by next line char

line 1
line 2
line 3
.
.
line n-1
line n

I want to read the file from the end i.e. first read line n ,than line
n-1 , so on up to some condition in a loop
Please provide guidance.
How big is the file? If it's only a few MB it might be easiest to read it
into an ArrayList and simply index back through it. Possibly faster too,
since you could use BufferedReader.readLine() and expect decent
performance from the buffering scheme.
One more doubt : can we insert line 0 at the beginning of the file by
using some inbuilt facility without doing manual shift of the line 1 to
line n in forward??
An ArrayList will let you add lines wherever you like.
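
A small sketch of the in-memory approach, assuming the file fits comfortably in the heap and is UTF-8. LinesInMemory, load and backwardsUntil are illustrative names, and the stop predicate stands in for whatever loop condition applies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class LinesInMemory {

    /** Reads the whole file into a modifiable list, one element per line. */
    static List<String> load(Path file) throws IOException {
        // readAllLines does not promise a modifiable list, so copy it.
        return new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Walks from the last line towards the first until stop matches a line. */
    static List<String> backwardsUntil(List<String> lines, Predicate<String> stop) {
        List<String> seen = new ArrayList<>();
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i);
            if (stop.test(line)) {
                break;
            }
            seen.add(line);
        }
        return seen;
    }
}
```

With the list in hand, `lines.add(0, "line 0")` followed by `Files.write(file, lines, StandardCharsets.UTF_8)` puts a new first line into the file, at the price of rewriting it.
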
 
Lew

Martin said:
An ArrayList will let you add lines wherever you like.

Which avoids the manual move of elements (insertpos+1) forward, but not the
move of those elements, which now happens automatically and invisibly.

There are a few data structures around the API that will optimize that action
better if it's frequent enough.
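
One candidate, offered as an assumption since the post names no particular class: java.util.ArrayDeque adds at the front in amortized constant time and can be iterated from the back. FrontInsertDemo is a made-up name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class FrontInsertDemo {

    /** Builds a small deque, prepends a line, and returns the lines last to first. */
    static List<String> demo() {
        Deque<String> lines = new ArrayDeque<>();
        lines.addLast("line 1");
        lines.addLast("line 2");
        lines.addFirst("line 0");                 // no element shifting, unlike ArrayList.add(0, x)

        List<String> backwards = new ArrayList<>();
        for (Iterator<String> it = lines.descendingIterator(); it.hasNext();) {
            backwards.add(it.next());
        }
        return backwards;
    }
}
```
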
 
Stefan Ram

Debajyoti Sarma said:
Please provide guidance.

In pseudocode:

process( file )
{ if( not eof( file ))
{ string line = getline( file ); process( file ); read( line ); }}

read( line )
{ /* is called with lines in inverse order */ }
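
A possible Java rendering of the pseudocode, as a sketch only. RecursiveReverse is a hypothetical name, and the Consumer plays the role of read( line ). Each line costs one stack frame, so this suits small inputs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

final class RecursiveReverse {

    /** Reads one line, recurses for the rest, then hands the line to the callback. */
    static void process(BufferedReader in, Consumer<String> read) throws IOException {
        String line = in.readLine();
        if (line == null) {
            return;                               // end of file: start unwinding
        }
        process(in, read);                        // deeper lines are delivered first
        read.accept(line);                        // called with lines in inverse order
    }
}
```
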
 
BGB / cr88192

Stefan Ram said:
In pseudocode:

process( file )
{ if( not eof( file ))
{ string line = getline( file ); process( file ); read( line ); }}

read( line )
{ /* is called with lines in inverse order */ }

errm, this operation is likely to be slower and risks overflowing the stack
for large inputs, which is not good.

it could be better to read all the lines into an array, and then either
reverse the array or step over it backwards. this should at least
scale a little better.
 
Roedy Green

Suppose a text file is given.Each line is separated by next line char

line 1
line 2
line 3
.
.
line n-1
line n

Your file will be encoded. Encodings are not designed to be
interpreted backwards. To handle it char by char, you must read bytes
and do your own decoding. Much easier: read the file in one big I/O
into a String, then scan backwards in your String. See
http://mindprod.com/products1.html#HUNKIO If the file is too big, read
it forward line by line and export it with a line number on the front.
Use an external sort such as optsort to sort it in reverse order.
see http://mindprod.com/jgloss/optsort.html
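
For the read-it-all-into-a-String route, a minimal sketch using only the standard library rather than the HunkIO class linked above. BackwardStringScan is an invented name; UTF-8 and LF or CRLF line endings are assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class BackwardStringScan {

    /** Reads the whole file in one go and decodes it as UTF-8. */
    static String slurp(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Splits the text into lines, scanning from the end towards the start. */
    static List<String> linesBackwards(String text) {
        List<String> out = new ArrayList<>();
        int end = text.length();                  // exclusive end of the current line
        if (end == 0) {
            return out;
        }
        if (text.charAt(end - 1) == '\n') {
            end--;                                // ignore the final line terminator
        }
        while (true) {
            int nl = text.lastIndexOf('\n', end - 1);   // -1 if no earlier linefeed
            String line = text.substring(nl + 1, end);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }
            out.add(line);
            if (nl < 0) {
                break;                            // that was the first line of the file
            }
            end = nl;
        }
        return out;
    }
}
```
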
 
Stefan Ram

BGB / cr88192 said:
better could be to read in all the lines into an array, and then either
reverse the array, or step over the array backwards. this should at least
scale a little better.

This is correct with respect to usual Java implementations.

AFAIK, the Java language specification (possibly, not the
JVM specification) would allow for implementations with a
stack implemented via linked segments on the heap, which
would allow the stack to grow even larger than an array:
While an array needs to have contiguous storage IIRC, a
segmented stack might even use smaller areas of free memory.
 
Tom Anderson

Some are, some aren't (well, I suppose technically none are _designed_ for
backwards use, but many are suitable for backwards use nonetheless). ASCII
works fine in reverse, as does UTF-16, and a number of other pre-Unicode
encodings.

Off the top of my head, i think UTF-8 would be straightforward to read
backwards; there's enough type data in the high bits of each byte to get
the structure right from the wrong end. You'd have to write your own
decoder, but it wouldn't be too hard.

It wouldn't be necessary for this application, though, since that only
needs the lines in reverse order, not a complete backwards stream, and LF
and CR are single-byte characters which can be unambiguously identified in
the stream without having to decode everything in between.

tom
 
Jim Janney

Debajyoti Sarma said:
Suppose a text file is given.Each line is separated by next line char

line 1
line 2
line 3
.
.
line n-1
line n

I want to read the file from the end i.e.
first read line n ,than line n-1 , so on up to some condition in a
loop
Please provide guidance.

Yet another approach: scan the file once and build a list in memory of
offsets. To read a particular line, pull its offset from the list,
seek to that location, and read until you see a newline (or to the
following offset).
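
A rough sketch of such an offset index, assuming UTF-8 text with LF or CRLF endings. LineIndex and its methods are hypothetical names, and reopening the file for every lookup is only done to keep the example short.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class LineIndex {
    private final Path file;
    private final List<Long> starts = new ArrayList<>();  // byte offset of each line start
    private final long length;                            // total file size in bytes

    /** Scans the file once, recording where every line begins. */
    LineIndex(Path file) throws IOException {
        this.file = file;
        long pos = 0;
        boolean atLineStart = true;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    starts.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                pos++;
            }
        }
        length = pos;
    }

    int lineCount() {
        return starts.size();
    }

    /** Seeks to line i (0-based) and reads up to the following offset. */
    String line(int i) throws IOException {
        long start = starts.get(i);
        long end = (i + 1 < starts.size()) ? starts.get(i + 1) : length;
        byte[] buf = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        int n = buf.length;
        if (n > 0 && buf[n - 1] == '\n') n--;     // strip the LF
        if (n > 0 && buf[n - 1] == '\r') n--;     // and the CR of a CRLF pair
        return new String(buf, 0, n, StandardCharsets.UTF_8);
    }
}
```

Reading backwards is then a loop from `lineCount() - 1` down to 0 calling `line(i)`, and only the offsets stay in memory, not the text.
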
 
BGB / cr88192

Stefan Ram said:
This is correct with respect to usual Java implementations.

AFAIK, the Java language specification (possibly, not the
JVM specification) would allow for implementations with a
stack implemented via linked segments on the heap, which
would allow the stack to grow even larger than an array:
While an array needs to have contiguous storage IIRC, a
segmented stack might even use smaller areas of free memory.

AFAIK, the JVM spec would allow this as well, although I would hesitate to
assume that this would be a typical implementation, especially if JIT'ed,
since this would likely lead to higher performance overhead and increased
interop issues vs using a more traditional fixed-size stack.

and "java.lang.StackOverflowError" probably exists for some good reason...

granted, I can't rule out this possibility, and am not sure what most JVMs
actually do.
 
Tom Anderson

No, that's not how it works in UTF-8:
-- code points which encode into a single byte yield byte values between
0 and 127 (inclusive);
-- other code points become a sequence of bytes:
** first byte has value between 192 and 247 (inclusive)
** subsequent bytes (one to three extra bytes) have value between
128 and 191 (inclusive)

The first byte of a multi-byte sequence also encodes how many extra
bytes are to be found afterwards. With Unicode as currently defined, no
code point requires more than four bytes: valid code points are in the
0..1114111 range, while allocated code points use about 10% of that
range (so there is still quite some room). Four-byte UTF-8 sequences are
good up to 2097151. If a future Unicode version extends the range, UTF-8
encoding can be extended to up to 6-byte encodings, and the first byte
may then assume values 192 to 253. It is a feature of UTF-8 that byte
values 254 and 255 never appear anywhere (this is used for BOM handling,
so that UTF-8 and UTF-16 can be told apart unambiguously).

Anyway, the ending byte of the UTF-8 encoding of a code point is not
specially marked; but _starting_ bytes are easy to detect. Hence it is
easy to know whether you are at the start of a code point, or should go
back for at least one byte.

Exactly.

To rephrase Thomas's description in terms of bits, bytes in a UTF-8 stream
look like this:

0xxxxxxx  ASCII
10xxxxxx  trail byte of a multibyte character
110xxxxx  start byte of a two-byte character
1110xxxx  start byte of a three-byte character
11110xxx  start byte of a four-byte character

A character starts with a byte which does not start with 10. Those are
pretty easy to spot.
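
As a sketch of how that rule lets you walk a UTF-8 buffer from the back: step left while the byte has the 10xxxxxx pattern, then decode from the start byte. Utf8Backward and its methods are illustrative names, and well-formed input is assumed (no validation is done).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8Backward {

    /** True for trail bytes, i.e. bytes of the form 10xxxxxx. */
    static boolean isTrailByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** Index of the start byte of the character that contains byte i. */
    static int startOfCodePoint(byte[] utf8, int i) {
        while (i > 0 && isTrailByte(utf8[i])) {
            i--;
        }
        return i;
    }

    /** Decodes the buffer one character at a time, last character first. */
    static List<Integer> codePointsBackwards(byte[] utf8) {
        List<Integer> out = new ArrayList<>();
        int end = utf8.length;                    // exclusive end of the current character
        while (end > 0) {
            int start = startOfCodePoint(utf8, end - 1);
            String s = new String(utf8, start, end - start, StandardCharsets.UTF_8);
            out.add(s.codePointAt(0));
            end = start;
        }
        return out;
    }
}
```
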

See also:

http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html

And everyone should know about this, highly useful:

http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder

tom
 
