Reading lines of a Text File from the end

Discussion in 'Java' started by Debajyoti Sarma, Aug 25, 2010.

  1. Suppose a text file is given. Each line is separated by a newline character:

    line 1
    line 2
    line 3
    ..
    ..
    line n-1
    line n

    I want to read the file from the end, i.e.
    first read line n, then line n-1, and so on in a loop until some
    condition is met.
    Please provide guidance.

    One more question: can we insert a line 0 at the beginning of the file
    using some built-in facility, without manually shifting line 1
    to line n forward?
    Debajyoti Sarma, Aug 25, 2010
    #1

  2. Debajyoti Sarma

    Tom Anderson Guest

    On Wed, 25 Aug 2010, Debajyoti Sarma wrote:

    > Suppose a text file is given.Each line is separated by next line char
    >
    > line 1
    > line 2
    > line 3
    > .
    > .
    > line n-1
    > line n
    >
    > I want to read the file from the end i.e. first read line n ,than line
    > n-1 , so on up to some condition in a loop Please provide guidance.


    Start at the end and scan backwards for a linefeed. Pull out the text you
    cover as a string. To read the previous line, scan for the next linefeed
    before that, and pull out some more text, and so on.
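
    The backward scan described above could be sketched in Java roughly as
    follows. This is only an illustration under stated assumptions: the
    class and method names are made up, lines end in a single LF, and the
    file is UTF-8 (where the LF byte never occurs inside a multibyte
    character). It reads one byte per seek for clarity; a practical version
    would read blocks instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names are illustrative, not from the thread.
final class BackwardLineScan {

    /** Returns up to maxLines lines, last line first, by scanning bytes from the end. */
    static List<String> lastLines(String path, int maxLines) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long end = file.length();           // exclusive end of the current line
            if (end == 0) {
                return result;                  // empty file: no lines at all
            }
            // Ignore one trailing linefeed so it does not yield an empty last line.
            file.seek(end - 1);
            if (file.read() == '\n') {
                end--;
            }
            while (end >= 0 && result.size() < maxLines) {
                long start = end;
                // Walk back until the byte just before 'start' is a linefeed,
                // or until the beginning of the file is reached.
                while (start > 0) {
                    file.seek(start - 1);
                    if (file.read() == '\n') {
                        break;
                    }
                    start--;
                }
                byte[] buf = new byte[(int) (end - start)];
                file.seek(start);
                file.readFully(buf);
                result.add(new String(buf, StandardCharsets.UTF_8));
                end = start - 1;                // step over the linefeed; -1 ends the loop
            }
        }
        return result;
    }
}
```

    The "some condition" in the question maps onto the loop test: here it
    is a simple line count, but any predicate on the line just read would
    do.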

    > One more doubt : can we insert line 0 at the beginning of the file by
    > using some inbuilt facility without doing manual shift of the line 1 to
    > line n in forward??


    No. If you need to do that, a plain file is not a very good way to store
    your data. If you could use a database or something instead, you might
    find it easier.

    tom

    --
    I sometimes think that the IETF is one of the crown jewels in all of
    western civilization. -- Tim O'Reilly
    Tom Anderson, Aug 25, 2010
    #2

  3. On Wed, 25 Aug 2010 12:21:13 -0700, Debajyoti Sarma wrote:

    > Suppose a text file is given.Each line is separated by next line char
    >
    > line 1
    > line 2
    > line 3
    > .
    > .
    > line n-1
    > line n
    >
    > I want to read the file from the end i.e. first read line n ,than line
    > n-1 , so on up to some condition in a loop
    > Please provide guidance.
    >

    How big is the file? If it's only a few MB it might be easiest to read it
    into an ArrayList and simply index back through it. Possibly faster too,
    since you could use BufferedReader.readLine() and expect decent
    performance from the buffering scheme.
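
    A minimal sketch of that approach, assuming the file fits in memory and
    is UTF-8 encoded (the class name and the use of a command-line argument
    for the path are illustrative choices, not part of the suggestion):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class for the read-everything-then-index-back idea.
final class ReverseViaList {

    /** Reads the whole file forward into a list, one element per line. */
    static List<String> readAll(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readAll(args[0]);
        // Index back through the list: line n first, then n-1, and so on.
        for (int i = lines.size() - 1; i >= 0; i--) {
            System.out.println(lines.get(i));
        }
    }
}
```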

    > One more doubt : can we insert line 0 at the beginning of the file by
    > using some inbuilt facility without doing manual shift of the line 1 to
    > line n in forward??
    >

    An ArrayList will let you add lines wherever you like.


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
    Martin Gregorie, Aug 25, 2010
    #3
  4. Debajyoti Sarma

    Lew Guest

    Debajyoti Sarma wrote:
    >> One more doubt : can we insert line 0 at the beginning of the file by
    >> using some inbuilt facility without doing manual shift of the line 1 to
    >> line n in forward??


    Martin Gregorie wrote:
    > An ArrayList will let you add lines wherever you like.


    Which avoids the manual move of elements (insertpos+1) forward, but not the
    move of those elements, which now happens automatically and invisibly.

    There are a few data structures around the API that will optimize that action
    better if it's frequent enough.
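
    The post does not name those structures. As one possible reading,
    java.util.ArrayDeque (or LinkedList) can add at the front without
    shifting the existing elements. The sketch below is an assumption along
    those lines, with a made-up class and method name; it works on the
    in-memory lines only, so the file itself still has to be rewritten
    afterwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper showing front insertion with a deque.
final class FrontInsert {

    /**
     * Returns a new list with each of newLines pushed onto the front in turn.
     * Because every element goes to the very front, the last argument ends up first.
     */
    static List<String> prependAll(List<String> existing, String... newLines) {
        Deque<String> deque = new ArrayDeque<>(existing);
        for (String s : newLines) {
            deque.addFirst(s);      // no shifting of the elements already stored
        }
        return new ArrayList<>(deque);
    }
}
```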

    --
    Lew
    Lew, Aug 26, 2010
    #4
  5. Debajyoti Sarma

    Stefan Ram Guest

    Debajyoti Sarma <> writes:
    >Please provide guidance.


    In pseudocode:

    process( file )
    { if( not eof( file ))
    { string line = getline( file ); process( file ); read( line ); }}

    read( line )
    { /* is called with lines in inverse order */ }
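
    One way the pseudocode might translate to Java is sketched below. The
    class name and the Consumer callback standing in for read( line ) are
    assumptions for illustration. Note that it uses one stack frame per
    line, so a sufficiently long input can end in a StackOverflowError on
    typical JVMs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

// Hypothetical Java rendering of the recursive pseudocode.
final class RecursiveReverse {

    /** Passes every remaining line of 'in' to 'sink', last line first. */
    static void process(BufferedReader in, Consumer<String> sink) throws IOException {
        String line = in.readLine();
        if (line == null) {
            return;                 // end of file: start unwinding
        }
        process(in, sink);          // handle all later lines first
        sink.accept(line);          // then this line, on the way back up
    }
}
```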
    Stefan Ram, Aug 26, 2010
    #5
  6. "Stefan Ram" <-berlin.de> wrote in message
    news:-berlin.de...
    > Debajyoti Sarma <> writes:
    >>Please provide guidance.

    >
    > In pseudocode:
    >
    > process( file )
    > { if( not eof( file ))
    > { string line = getline( file ); process( file ); read( line ); }}
    >
    > read( line )
    > { /* is called with lines in inverse order */ }
    >


    errm, this operation is likely to be slower and risk overflowing the stack
    for large inputs, which is not good.

    better could be to read in all the lines into an array, and then either
    reverse the array, or step over the array backwards. this should at least
    scale a little better.
    BGB / cr88192, Aug 26, 2010
    #6
  7. Debajyoti Sarma

    Roedy Green Guest

    On Wed, 25 Aug 2010 12:21:13 -0700 (PDT), Debajyoti Sarma
    <> wrote, quoted or indirectly quoted someone
    who said :

    >Suppose a text file is given.Each line is separated by next line char
    >
    >line 1
    >line 2
    >line 3
    >.
    >.
    >line n-1
    >line n


    Your file will be encoded. Encodings are not designed to be
    interpreted backwards. To handle it char by char, you must read bytes
    and do your own decoding. It is much easier to read the file in one big
    I/O into a String, then scan backwards in your String
    (see http://mindprod.com/products1.html#HUNKIO ). If the file is too
    big, read it forward line by line and export it with a line number on
    the front of each line. Then use an external sort such as optsort to
    sort it in reverse order
    (see http://mindprod.com/jgloss/optsort.html ).
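
    Scanning backwards through an already decoded String could look roughly
    like this. It is a sketch only: the class and method names are invented,
    lines are assumed to end in LF, and how the String gets loaded is left
    out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks a decoded String from its end, one line at a time.
final class StringBackwardScan {

    /** Returns the lines of 'text', last line first. */
    static List<String> linesFromEnd(String text) {
        List<String> result = new ArrayList<>();
        int end = text.length();                 // exclusive end of the current line
        if (end == 0) {
            return result;
        }
        if (text.charAt(end - 1) == '\n') {
            end--;                               // ignore one trailing linefeed
        }
        while (end >= 0) {
            // Position of the linefeed before this line, or -1 if there is none.
            int lf = text.lastIndexOf('\n', end - 1);
            result.add(text.substring(lf + 1, end));
            end = lf;                            // -1 after the first line, ending the loop
        }
        return result;
    }
}
```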

    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    You encapsulate not just to save typing, but more importantly, to make it easy and safe to change the code later, since you then need change the logic in only one place. Without it, you might fail to change the logic in all the places it occurs.
    Roedy Green, Aug 26, 2010
    #7
  8. Debajyoti Sarma

    Stefan Ram Guest

    "BGB / cr88192" <> writes:
    >better could be to read in all the lines into an array, and then either
    >reverse the array, or step over the array backwards. this should at least
    >scale a little better.


    This is correct with respect to usual Java implementations.

    AFAIK, the Java language specification (possibly, not the
    JVM specification) would allow for implementations with a
    stack implemented via linked segments on the heap, which
    would allow the stack to grow even larger than an array:
    While an array needs to have contiguous storage IIRC, a
    segmented stack might even use smaller areas of free memory.
    Stefan Ram, Aug 26, 2010
    #8
  9. Debajyoti Sarma

    Tom Anderson Guest

    On Thu, 26 Aug 2010, Peter Duniho wrote:

    > Roedy Green wrote:
    >> Your file will be encoded. Encodings are not designed to be
    >> interpreted backwards.

    >
    > Some are, some aren't (well, I suppose technically none are _designed_ for
    > backwards use, but many are suitable for backwards use nonetheless). ASCII
    > works fine in reverse, as does UTF-16, and a number of other pre-Unicode
    > encodings.


    Off the top of my head, i think UTF-8 would be straightforward to read
    backwards; there's enough type data in the high bits of each byte to get
    the structure right from the wrong end. You'd have to write your own
    decoder, but it wouldn't be too hard.

    It wouldn't be necessary for this application, though, since that only
    needs the lines in reverse order, not a complete backwards stream, and LF
    and CR are single-byte characters which can be unambiguously identified in
    the stream without having to decode everything in between.

    tom

    --
    I'd get more sense out of a crossed line with the Krankies
    Tom Anderson, Aug 26, 2010
    #9
  10. Debajyoti Sarma

    Jim Janney Guest

    Debajyoti Sarma <> writes:

    > Suppose a text file is given.Each line is separated by next line char
    >
    > line 1
    > line 2
    > line 3
    > .
    > .
    > line n-1
    > line n
    >
    > I want to read the file from the end i.e.
    > first read line n ,than line n-1 , so on up to some condition in a
    > loop
    > Please provide guidance.


    Yet another approach: scan the file once and build a list in memory of
    offsets. To read a particular line, pull its offset from the list,
    seek to that location, and read until you see a newline (or to the
    following offset).
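
    A rough Java sketch of the offset-list idea, assuming LF line endings
    and UTF-8 text; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one forward scan builds the index, then any line can be fetched.
final class LineOffsetIndex {

    /** Records the byte offset at which each line starts. */
    static List<Long> buildIndex(RandomAccessFile file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long length = file.length();
        if (length == 0) {
            return offsets;
        }
        offsets.add(0L);                         // the first line starts at offset 0
        file.seek(0);
        byte[] buf = new byte[8192];
        long pos = 0;
        int n;
        while ((n = file.read(buf)) > 0) {
            for (int i = 0; i < n; i++) {
                pos++;
                // A linefeed starts a new line unless it is the last byte of the file.
                if (buf[i] == '\n' && pos < length) {
                    offsets.add(pos);
                }
            }
        }
        return offsets;
    }

    /** Seeks to the given line and reads up to the following offset (or end of file). */
    static String readLine(RandomAccessFile file, List<Long> offsets, int index)
            throws IOException {
        long start = offsets.get(index);
        long end = (index + 1 < offsets.size()) ? offsets.get(index + 1) : file.length();
        byte[] buf = new byte[(int) (end - start)];
        file.seek(start);
        file.readFully(buf);
        int len = buf.length;
        if (len > 0 && buf[len - 1] == '\n') {
            len--;                               // drop the terminating linefeed
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }
}
```

    Reading in reverse is then a loop from offsets.size() - 1 down to 0
    calling readLine. The index costs one long per line, which is usually
    far less memory than holding the lines themselves.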

    --
    Jim Janney
    Jim Janney, Aug 26, 2010
    #10
  11. "Stefan Ram" <-berlin.de> wrote in message
    news:-berlin.de...
    > "BGB / cr88192" <> writes:
    >>better could be to read in all the lines into an array, and then either
    >>reverse the array, or step over the array backwards. this should at least
    >>scale a little better.

    >
    > This is correct with respect to usual Java implementations.
    >
    > AFAIK, the Java language specification (possibly, not the
    > JVM specification) would allow for implementations with a
    > stack implemented via linked segments on the heap, which
    > would allow the stack to grow even larger than an array:
    > While an array needs to have contiguous storage IIRC, a
    > segmented stack might even use smaller areas of free memory.
    >


    AFAIK, the JVM spec would allow this as well, although I would hesitate to
    assume that this would be a typical implementation, especially if JIT'ed,
    since this would likely lead to higher performance overhead and increased
    interop issues vs using a more traditional fixed-size stack.

    and "java.lang.StackOverflowError" probably exists for some good reason...

    granted, I can't rule out this possibility, and am not sure what most JVMs
    actually do.
    BGB / cr88192, Aug 27, 2010
    #11
  12. Debajyoti Sarma

    Tom Anderson Guest

    On Fri, 27 Aug 2010, Thomas Pornin wrote:

    > According to Peter Duniho <>:
    >
    >> My understanding is that you can tell from a single byte in UTF-8
    >> whether it's the end of a character or not. But to identify the
    >> beginning of a character, you need to look for the end of the
    >> _previous_ character.

    >
    > No, that's not how it works in UTF-8:
    > -- code points which encode into a single byte yield byte values between
    > 0 and 127 (inclusive);
    > -- other code points become a sequence of bytes:
    > ** first byte has value between 192 and 247 (inclusive)
    > ** subsequent bytes (one to three extra bytes) have value between
    > 128 and 191 (inclusive)
    >
    > The first byte of a multi-byte sequence also encodes how many extra
    > bytes are to be found afterwards. With Unicode as currently defined, no
    > code point requires more than four bytes: valid code points are in the
    > 0..1114111 range, while allocated code points use about 10% of that
    > range (so there is still quite some room). The UTF-8 encoding is good up
    > to 2097151. If a future Unicode version extends the range, UTF-8
    > encoding can be extended to up to 6-byte encodings, and the first byte
    > may then assume values 192 to 253. It is a feature of UTF-8 that byte
    > values 254 and 255 never appear anywhere (it is used for BOM handling,
    > so that UTF-8 and UTF-16 can be told apart unambiguously).
    >
    > Anyway, the ending byte of the UTF-8 encoding of a code point is not
    > specially marked; but _starting_ bytes are easy to detect. Hence it is
    > easy to know whether you are at the start of a code point, or should go
    > back for at least one byte.


    Exactly.

    To rephrase Thomas's description in terms of bits, bytes in a UTF-8 stream
    look like this:

    0xxxxxxx  ASCII (single-byte character)
    10xxxxxx  trail byte of a multibyte character
    110xxxxx  start byte of a two-byte character
    1110xxxx  start byte of a three-byte character
    11110xxx  start byte of a four-byte character

    A character starts with a byte which does not start with 10. Those are
    pretty easy to spot.
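
    In Java, that test is a single mask. The sketch below (class and method
    names are made up for illustration) steps back from an arbitrary byte
    position to the start byte of the character containing it, assuming the
    input is well-formed UTF-8:

```java
// Hypothetical helper for locating character boundaries when moving backwards.
final class Utf8Backward {

    /** True for trail bytes, i.e. bytes of the form 10xxxxxx. */
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** Returns the index of the first byte of the character that contains bytes[pos]. */
    static int startOfCharacter(byte[] bytes, int pos) {
        while (pos > 0 && isContinuation(bytes[pos])) {
            pos--;                  // still inside a multibyte character: keep backing up
        }
        return pos;
    }
}
```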

    See also:

    http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html

    And everyone should know about this, highly useful:

    http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder

    tom

    --
    A problem well stated is a problem half solved. -- Charles F. Kettering
    Tom Anderson, Aug 27, 2010
    #12