Find the number of lines in a text file

Discussion in 'Java' started by Chris Brat, Sep 13, 2006.

  1. Chris Brat

    Chris Brat Guest

    Hi,

    I need to find the total number of lines in a text file so that I can
    skip the header and filler information and process just the body.

    I've done some searching and can't find a class or method that does
    this directly, and the solutions I've found require one of the following:

    - reading each line of the file (using a BufferedReader) and
    incrementing a line counter for each line, or
    - using a LineNumberReader directly and taking the result of its
    getLineNumber() method once the entire file has been read, or
    - searching for the EOL characters and counting them, or
    - using a RandomAccessFile, seeking to the end of the file and
    dividing the total number of bytes by the number of bytes expected per
    line (I believe this relies on a guarantee that each line has
    the same number of characters).
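    For reference, the first approach in the list above might look like the
    following minimal sketch. The class and method names (LineCounter,
    countLines) are hypothetical, and FileReader uses the platform's default
    charset, which is assumed to match the file.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

final class LineCounter {
    // Reads the file line by line and increments a counter per line.
    static int countLines(File file) throws IOException {
        int count = 0;
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```

    Note that readLine() also returns a final line that has no trailing line
    terminator, so "a\nb\nc" and "a\nb\nc\n" both count as 3 lines.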

    I don't like the idea of counting EOL characters or having to read the
    entire file twice (once to get the number of lines and a
    second time to do my actual processing).


    Does anyone have another or better solution?

    Thanks,
    Chris
     
    Chris Brat, Sep 13, 2006
    #1

  2. Hi,

    Chris Brat wrote:
    > Hi,
    >
    > I need to find the total number of lines in a text file -so that I can
    > skip the header and filler information and process just the body.
    >
    > I've done some searching and can't find a class or method that does
    > this directly and the solutions I've found require either :
    >
    > - reading each line of the file (using the BufferedReader) and
    > incrementing a line counter for each line, or
    > - using the LineNumberReader directly and using its result from its
    > getLineNumber() method once the entire file is read, or


    Note that this is the same thing. (IIRC, LineNumberReader internally does
    exactly the same as your first suggestion.)

    > - searching for the eol characters and counting these.


    Note that this is also nearly the same, especially in terms of runtime complexity.
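    A rough sketch of the EOL-counting variant, working on raw bytes, shows
    why the complexity is the same: every byte of the file is still read
    once (O(n)). EolCounter is a hypothetical name, and the sketch assumes
    '\n' or "\r\n" line endings in an ASCII-compatible encoding (for example
    ASCII, ISO-8859-1 or UTF-8), where the byte 0x0A can only mean a line feed.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class EolCounter {
    // Counts lines by scanning raw bytes for '\n' (0x0A).
    // Assumes '\n' or "\r\n" endings and an ASCII-compatible encoding.
    static long countLines(String path) throws IOException {
        long newlines = 0;
        boolean sawAnyByte = false;
        boolean lastWasNewline = false;
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        newlines++;
                    }
                }
                if (n > 0) {
                    sawAnyByte = true;
                    lastWasNewline = buf[n - 1] == '\n';
                }
            }
        }
        // A final line without a trailing '\n' still counts as a line.
        return (sawAnyByte && !lastWasNewline) ? newlines + 1 : newlines;
    }
}
```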

    > - using the the RandomAccessFile, seeking to the end of the file and
    > dividing the total number of bytes by the number of bytes expected in
    > the line (I believe this relies on guarantee that each line will have
    > the same number of characters).


    Of course this relies on that guarantee! Is it really guaranteed? If
    so, this is the best idea. BTW: AFAIK you do not need a RandomAccessFile
    for that; java.io.File has the method you need (length()) as well.

    > I dont like the idea of counting eol characters or having to read the
    > entire file twice (once to get the number of line numbers and the
    > second time to do my actual processing).
    >
    > Does anyone have another or better solution?


    It depends on what *exactly* you want to do, on the format of the file,
    and on what you mean by "skip the header and filler information and
    process just the body". Isn't it possible to do everything by reading
    the file only once? (Do "headers" and "fillers" have a certain prefix? ...?)

    Ciao,
    Ingo
     
    Ingo R. Homann, Sep 13, 2006
    #2

  3. Chris Brat wrote:
    ....
    > I dont like the idea of counting eol characters or having to read the
    > entire file twice (once to get the number of line numbers and the
    > second time to do my actual processing).

    ...
    > Does anyone have another or better solution?


    'Use a file-system that counts them for you'?

    (Which is my way of saying: other tools that provide
    a line count are doing something like "counting the EOLs"
    internally, even if they might imply otherwise and obscure
    the details.)

    Note that if you *know* that further processing is required
    on the file(s), it probably makes more sense to read the
    lines into an array on the first pass.

    (And a LineNumberReader or similar might be the best
    way to sort out those EOLs.)
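    A minimal sketch of reading the lines on the first pass, assuming Java 7+
    (Files.readAllLines), UTF-8 text, and a file that fits in memory. Once the
    lines are loaded, the count is just size(), and the header and footer can
    be dropped with subList(). InMemoryBody and its parameters are
    hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

final class InMemoryBody {
    // Returns the lines between the first `header` and the last `footer` lines.
    static List<String> body(Path file, int header, int footer) throws IOException {
        // One pass over the file; every line is kept in memory.
        List<String> all = Files.readAllLines(file, StandardCharsets.UTF_8);
        int end = all.size() - footer;
        if (end <= header) {
            return Collections.emptyList(); // too short to have a body
        }
        // subList is a view; copy it if the full list should be freed.
        return all.subList(header, end);
    }
}
```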

    Andrew T.
     
    Andrew Thompson, Sep 13, 2006
    #3
  4. Chris Brat

    Chris Brat Guest

    Hi Ingo,

    I effectively want to skip a known number of lines at the very
    beginning of the file and at the very end of the file. The header is
    not a problem to skip, but the footer is.

    Sorry, I actually meant 'footer' and not 'filler'.

    The scenario is: the user defines that the first 6 lines and the last 3
    lines of the file are in an unknown format and must be ignored, which
    means that they must be neither validated nor processed.

    Regards,
    Chris


     
    Chris Brat, Sep 13, 2006
    #4
  5. Hi,

    Chris Brat wrote:
    > The scenario is :The user defines that the first 6 lines and the last 3
    > lines of the file are in an unknown format and must be ignored - this
    > means that they must not be checked for validity and processed.


    I think the simplest idea would be to buffer the last 3 lines...

    Ciao,
    Ingo
     
    Ingo R. Homann, Sep 13, 2006
    #5
  6. Chris Brat

    Chris Brat Guest

    Hi Andrew,

    > 'Use a file-system that counts them for you'?

    Unfortunately I do not maintain the environment, and it is quite possible
    that the OS and everything associated with it will change in the future
    without my knowledge.

    > Note that if you *know* that further processing is required
    > on the file(s), it probably makes more sense to read the
    > lines into an array on the first pass.

    True, but do you think this is a good idea with a file of 30,000+ lines?
    I don't think the memory expense is worth the few extra seconds.

    To be honest, I was hoping that someone knew of an OSS library (like Commons
    IO) or a method in the java.io package that I didn't know of that already
    did this.

    Thanks for the input though.

    Regards,
    Chris
     
    Chris Brat, Sep 13, 2006
    #6
  7. Hi,

    Chris Brat wrote:
    > To be honest I was hoping that someone knew of a OSS lib (like commons
    > IO) or a method I didn't know of in the java.io package that already
    > did this.


    Well, if the file system/OS does not cache this information, how could a
    library get it without reading the whole file? It would have
    to be 'magic'!

    Ciao,
    Ingo
     
    Ingo R. Homann, Sep 13, 2006
    #7
  8. bugbear

    bugbear Guest

    Chris Brat wrote:
    > Hi Ingo,
    >
    > I effectively want to skip an known number of lines in a file
    > (immediately at the beginning of the file) and immediately at the end
    > of a file. The header is not a problem to skip but the footer is.
    >
    > Sorry, I actually meant 'footer' and not 'filller'
    >
    > The scenario is :The user defines that the first 6 lines and the last 3
    > lines of the file are in an unknown format and must be ignored - this
    > means that they must not be checked for validity and processed.


    In that case you certainly don't need to count the total
    number of lines in the file.

    Simply count the first 6 lines moving forward,
    then seek to the end and count the last 3 lines backwards.

    RandomAccessFile may be useful to you.

    You now have the offsets within the file that define
    your "valid zone".

    Either work with these, or create
    an IO decorator that presents that subset of the
    file as a stream/reader object.
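    One possible sketch of this idea, under stated assumptions: '\n' or
    "\r\n" line endings, an ASCII-compatible encoding, and byte offsets
    rather than character offsets. ValidZone.find() is a hypothetical helper
    that returns the body as the byte range [start, end); only the header and
    the footer are actually read, never the middle of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class ValidZone {
    // Returns {start, end}: the body occupies bytes [start, end).
    // Assumes '\n' or "\r\n" line endings and an ASCII-compatible encoding.
    static long[] find(RandomAccessFile raf, int header, int footer)
            throws IOException {
        long len = raf.length();

        // Forward pass: the body starts just after the header-th '\n'.
        long start = 0;
        raf.seek(0);
        int seen = 0;
        while (seen < header && start < len) {
            if (raf.read() == '\n') {
                seen++;
            }
            start++;
        }
        if (footer == 0) {
            return new long[] {start, len};
        }

        // Backward pass: ignore a trailing '\n' so it does not look like
        // the boundary of an extra, empty last line.
        long end = len;
        if (end > start) {
            raf.seek(end - 1);
            if (raf.read() == '\n') {
                end--;
            }
        }
        // Walk back until `footer` line starts are found; `end` is then
        // the offset where the first footer line begins.
        int found = 0;
        while (end > start) {
            raf.seek(end - 1);
            if (raf.read() == '\n' && ++found == footer) {
                break;
            }
            end--;
        }
        return new long[] {start, end};
    }
}
```

    Reading one byte at a time keeps the sketch short but is slow; real code
    would buffer the reads. A decorator as described above could then serve
    only the bytes in [start, end).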

    BugBear
     
    bugbear, Sep 13, 2006
    #8
  9. Simon

    Simon Guest

    Chris Brat wrote:
    > I effectively want to skip an known number of lines in a file
    > (immediately at the beginning of the file) and immediately at the end
    > of a file. The header is not a problem to skip but the footer is.
    >
    > Sorry, I actually meant 'footer' and not 'filller'
    >
    > The scenario is :The user defines that the first 6 lines and the last 3
    > lines of the file are in an unknown format and must be ignored - this
    > means that they must not be checked for validity and processed.


    Use a queue, e.g. a List<String>:

    1. Skip the header.
    2. Create a queue to hold the lines.
    3. Read 3 lines into the queue.
    4. As long as there are more lines:
    - read one line and append it to the queue;
    - take the first line out of the queue and process it.
    5. Throw away the 3 lines remaining in the queue.
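    Simon's steps map almost directly onto java.util.ArrayDeque. This is a
    sketch, not his code: it uses ArrayDeque instead of a List (a LinkedList
    would also work, since it implements both List and Deque), requires
    Java 8 for Consumer, and FooterSkippingReader is a hypothetical name.
    At most footer + 1 lines are held in memory at any time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

final class FooterSkippingReader {
    // Passes every line except the first `header` and last `footer` lines
    // to `processor`, reading the input only once.
    static void processBody(BufferedReader reader, int header, int footer,
                            Consumer<String> processor) throws IOException {
        // 1. Skip the header.
        for (int i = 0; i < header; i++) {
            if (reader.readLine() == null) {
                return; // file shorter than the header
            }
        }
        // 2. Create a queue to hold the lines.
        Deque<String> queue = new ArrayDeque<>();
        // 3. Read `footer` lines into the queue.
        String line;
        while (queue.size() < footer && (line = reader.readLine()) != null) {
            queue.addLast(line);
        }
        // 4. For each further line: append it, then process the oldest one.
        while ((line = reader.readLine()) != null) {
            queue.addLast(line);
            processor.accept(queue.removeFirst());
        }
        // 5. The lines still queued are the footer; they are discarded.
    }
}
```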

    Cheers,
    Simon
     
    Simon, Sep 13, 2006
    #9
  10. Chris Brat

    Chris Brat Guest

    That's brilliant !!

    Thanks Simon.

     
    Chris Brat, Sep 13, 2006
    #10
  11. Chris Brat wrote:
    ....
    > > Note that if you *know* that further processing is required
    > > on the file(s), it probably makes more sense to read the
    > > lines into an array on the first pass.

    > True - do you think this is a good idea with a file of 30 000+ lines
    > though?


    I would need to run some tests (as I suggest
    you do, since I do not 'need to know'*).

    * In my current environment I have no need to
    parse text files of that length.

    > I dont think the memory expense is worth the few extra seconds.


    The results might surprise you (or they might not).
    In situations as fundamental as this,
    it pays to do a quick test, though.

    OTOH, Ingo raised some interesting points regarding the
    file format. There might be some significant 'cheating'
    you can do if the files have a fixed line length.

    Andrew T.
     
    Andrew Thompson, Sep 13, 2006
    #11
  12. EJP

    EJP Guest

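    import java.io.*;

    File file = ...;
    // skip() still reads (and counts) every character it passes over
    try (LineNumberReader lnr = new LineNumberReader(new FileReader(file))) {
        lnr.skip(Long.MAX_VALUE); // file.length() is bytes, skip() takes chars
        int lines = lnr.getLineNumber(); // counts line terminators seen
    }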
     
    EJP, Sep 13, 2006
    #12
  13. Hi,

    EJP wrote:
    > File file = ...;
    > LineNumberReader lnr = new LineNumberReader(new FileReader(file));
    > lnr.skip(file.length()-1);
    > int lines = lnr.getLineNumber();


    Bad idea: that's exactly what Chris wanted to avoid! (Or what do you
    think this code does internally?)

    Ciao,
    Ingo
     
    Ingo R. Homann, Sep 13, 2006
    #13
  14. Chris Brat

    Chris Brat Guest

    Hi EJP,

    Thanks - very interesting idea.

    Will give it a try.
    Chris


     
    Chris Brat, Sep 13, 2006
    #14
  15. EJP

    EJP Guest

    Ingo R. Homann wrote:
    > Bad idea - that's exactly what Chris wanted to avoid! (Or what do you
    > think this code does internally?)


    I don't think he *can* avoid it, actually. And thanks, I know exactly
    what the code does internally too.
     
    EJP, Sep 14, 2006
    #15
  16. Hi EJP,

    EJP wrote:
    > Ingo R. Homann wrote:
    >
    >> Bad idea - that's exactly what Chris wanted to avoid! (Or what do you
    >> think this code does internally?)

    >
    > I don't think he *can* avod it actually, and thanks, I know exactly what
    > the code does internally too.


    Then I think it would be a good idea to tell the OP, because I
    imagine that he does not know exactly what the code does internally.

    I think he might find it a very interesting idea and give it a try,
    only to find out that it is a bad idea and exactly what he
    wanted to avoid. ;-)

    Ciao,
    Ingo
     
    Ingo R. Homann, Sep 14, 2006
    #16
  17. Chris Brat

    Chris Brat Guest

    Ingo,

    I was asking for other, possibly better, solutions because none of those
    that I found myself seemed like the best way to do it.

    Please don't make comments on my behalf; I appreciate any suggestions
    from contributors.

    EJP, I tested your solution and it gives a 300 ms performance
    improvement on a 40 MB file.

    Regards,
    Chris

     
    Chris Brat, Sep 14, 2006
    #17
  18. Hi,

    Chris Brat wrote:
    > I was asking for other possibly better solutions...
    >
    > EJP, I tested your solution and it gives [no real] performance
    > improvement on a 40 Mb file.


    Well, internally it does *exactly* what you wanted to avoid.
    Without asking anyone and without testing anything, just by thinking
    about the problem a bit, I can tell you:

    THERE IS NO WAY TO GET THE NUMBER OF LINES IN A FILE WITHOUT
    READING THE WHOLE FILE.

    Sorry for shouting, but that is a fact.

    However, your (*other*) problem (reading a file only once but skipping
    the last three lines) can be solved another way, as Simon and I mentioned.

    Ciao,
    Ingo
     
    Ingo R. Homann, Sep 14, 2006
    #18
  19. Hi Ingo ;)

    Ingo R. Homann wrote:
    >> EJP, I tested your solution and it gives [no real] performance
    >> improvement on a 40 Mb file.

    >
    > Well, internally, it does *exactly* the same what you wanted to avoid.
    > Without asking someone and without testing anything, just with thinking
    > a bit about the problem, I can tell you:
    >
    > THERE IS NO POSSIBILITY TO GET THE NUMER OF LINES IN A FILE WITHOUT
    > READING THE WHOLE FILE.
    >
    > Sorry for shouting, but that is a fact.


    No reason to shout. What you predicted has already happened:

    <quote>
    I think he might find it a very interesting idea and will give it a try
    just to find out that it is a bad idea and that it is exactly what he
    wanted to avoid. ;-)
    </quote>

    LOL
    Michael
     
    Michael Rauscher, Sep 14, 2006
    #19
  20. "Ingo R. Homann" writes:

    > THERE IS NO POSSIBILITY TO GET THE NUMER OF LINES IN A FILE WITHOUT
    > READING THE WHOLE FILE.


    Exception: if it is known that the file has a fixed line (record) size in
    bytes, and the line separator is known, then the number of lines =
    file.length() / (recordSize + separatorSize)
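    A minimal sketch of this shortcut, assuming Java 7+ (Files.size) and that
    every line, including the last, is exactly recordSize bytes followed by a
    separatorSize-byte separator. FixedLengthLines is a hypothetical name;
    the divisibility check is a cheap sanity test of that assumption, not a
    guarantee of it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FixedLengthLines {
    // Computes the line count from the file size alone; no content is read.
    static long countLines(Path file, int recordSize, int separatorSize)
            throws IOException {
        long size = Files.size(file);
        long lineBytes = (long) recordSize + separatorSize;
        if (size % lineBytes != 0) {
            throw new IllegalStateException("file size " + size
                    + " is not a multiple of " + lineBytes
                    + "; lines are not fixed-length as assumed");
        }
        return size / lineBytes;
    }
}
```

    With the count known, a footer of the last 3 lines would start at byte
    offset (lines - 3) * (recordSize + separatorSize).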
     
    Tor Iver Wilhelmsen, Sep 14, 2006
    #20

Similar Threads
  1. Mullin: 4 replies, 5,489 views; Harald, Jun 12, 2005
  2. Joe Wright: 0 replies, 526 views; Joe Wright, Jul 27, 2003
  3. Murali: 2 replies, 575 views; Jerry Coffin, Mar 9, 2006
  4. Andrés Suárez: 1 reply, 88 views; Andrés Suárez, Jul 16, 2008
  5. Cah Sableng: 0 replies, 241 views; Cah Sableng, Apr 23, 2007