Line count, the best strategy?

Discussion in 'Perl Misc' started by MSG, Jan 26, 2006.

  1. MSG

    MSG Guest

    Suppose I had a big big text file and I needed to count the number
    of lines. I have two questions about it:

    (A). The Perl FAQ (perlfaq5) suggests counting "\n" characters like this:
    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    but in the Perl Cookbook, the buffer size becomes 2**20 (1 MB).
    I am sure the authors didn't choose those numbers completely
    arbitrarily, so the question is: how did they come up with the
    different numbers, or how should a programmer go about choosing
    the "right" number?
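A self-contained version of method (A) is sketched below. The sub name `count_lines_sysread` and the usage line are illustrative, not taken from the FAQ or the Cookbook. Unlike the two-line fragment, it also checks sysread's return value for read errors. Like `wc -l`, it counts newline characters, so a final line with no trailing "\n" is not counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Method (A): count "\n" bytes in fixed-size chunks read with sysread.
# $bufsize defaults to 4096 (the FAQ value); 2**20 is the Cookbook value.
sub count_lines_sysread {
    my ($path, $bufsize) = @_;
    $bufsize ||= 4096;

    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;    # raw bytes: no encoding or CRLF layers

    my ($lines, $buffer) = (0, '');
    while (1) {
        my $got = sysread $fh, $buffer, $bufsize;
        die "Read error on '$path': $!" unless defined $got;
        last if $got == 0;                 # 0 bytes read means end of file
        $lines += ($buffer =~ tr/\n//);    # tr/// returns the number of matches
    }
    close $fh;
    return $lines;
}

# Usage (hypothetical script name): perl count_a.pl FILE [BUFSIZE]
if (@ARGV) {
    print count_lines_sysread($ARGV[0], $ARGV[1]), "\n";
}
```

The buffer size only changes how many sysread calls are made, not the result, so any positive value gives the same count.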

    (B). How does the above method compare to the following code,
    which simply reads Perl's built-in line counter $. at the end?
    (also from the Perl Cookbook)
    1 while ( <FH> );
    print $. ;
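Method (B) wrapped the same way, again as a sketch; `count_lines_readline` is an illustrative name. Two details matter here: `$.` is reset when its filehandle is closed, so it has to be copied before `close`; and since readline returns a final unterminated line as one more record, `$.` counts that line, while the `tr/\n//` count in (A) does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Method (B): let readline do the buffering, then read $. (the input line
# number of the last filehandle read from).
sub count_lines_readline {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    local $_;               # don't clobber the caller's $_
    1 while <$fh>;          # read and discard every line
    my $lines = $. || 0;    # copy it now (close() resets $.); 0 for an empty file
    close $fh;
    return $lines;
}

# Usage (hypothetical script name): perl count_b.pl FILE
if (@ARGV) {
    print count_lines_readline($ARGV[0]), "\n";
}
```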
    MSG, Jan 26, 2006

  2. Jimbo

    Jimbo Guest

    Suppose the "big big text" file was 800MB and contained only one
    \n, as the last character(s) of the file. Method (A) would still
    work, but method (B) would have to read all 800MB into memory as a
    single line and could choke on so much data.

    If you know ahead of time what line lengths to expect, and they are
    reasonable given modern RAM sizes, then use (B). If you *really*
    want to be robust, use (A).
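If the only worry is the single-huge-line case above, there is a third option not mentioned in the thread: setting `$/` to a reference to an integer makes `<$fh>` return fixed-size records instead of lines (documented in perlvar), so readline keeps its buffering while memory use stays bounded. This is a minimal sketch; the sub name `count_lines_records` and the 64 KB default record size are arbitrary assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A variant not from the thread: keep buffered readline, but set $/ to a
# reference to an integer so <$fh> returns fixed-size records instead of
# lines. Memory use stays bounded even for one enormous line. Because $.
# would count records here, newlines are counted with tr/// as in (A).
sub count_lines_records {
    my ($path, $recsize) = @_;
    $recsize ||= 65536;        # arbitrary default for this sketch

    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;               # record size is then measured in bytes

    local $/ = \$recsize;      # record mode: read at most $recsize bytes
    my $lines = 0;
    while (my $chunk = <$fh>) {    # implicitly tested with defined()
        $lines += ($chunk =~ tr/\n//);
    }
    close $fh;
    return $lines;
}

# Usage (hypothetical script name): perl count_rec.pl FILE [RECSIZE]
if (@ARGV) {
    print count_lines_records($ARGV[0], $ARGV[1]), "\n";
}
```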

    As for how the numbers were arrived at: 4096 was the typical size of
    "disk buffers" years ago. Modern drives have 4MB or 8MB buffers, so
    you can raise 4096 to match your drive. (This is the number of bytes
    the drive will read at one time. Even if you wanted to read just one
    byte, the drive will actually populate its entire buffer.)
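Another way to answer "how should a programmer choose the right number?" is to measure instead of guessing. The sketch below times the sysread count for several candidate buffer sizes on a file you supply. The sub names and the list of sizes are assumptions for illustration; results depend on the OS, the file system and whether the file is already in the OS cache, so run it more than once before drawing conclusions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Count "\n" bytes with sysread using a given buffer size (method A).
sub count_with_bufsize {
    my ($path, $bufsize) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;
    my ($lines, $buf, $got) = (0, '');
    while ($got = sysread $fh, $buf, $bufsize) {
        $lines += ($buf =~ tr/\n//);
    }
    die "Read error on '$path': $!" unless defined $got;
    close $fh;
    return $lines;
}

# Time the count for each candidate size. Returns a hash ref:
#   size => { lines => N, seconds => elapsed wall-clock time }
# Note: the first size timed may also pay for pulling the file into the
# OS cache, which favors the sizes timed after it.
sub time_bufsizes {
    my ($path, @sizes) = @_;
    my %result;
    for my $size (@sizes) {
        my $t0    = time();
        my $lines = count_with_bufsize($path, $size);
        $result{$size} = { lines => $lines, seconds => time() - $t0 };
    }
    return \%result;
}

# Usage (hypothetical script name): perl bufsizes.pl FILE
if (@ARGV) {
    my $r = time_bufsizes($ARGV[0], 512, 4096, 65536, 2**20, 2**23);
    for my $size (sort { $a <=> $b } keys %$r) {
        printf "%8d bytes: %d lines in %.4f s\n",
            $size, $r->{$size}{lines}, $r->{$size}{seconds};
    }
}
```

Every size should report the same line count; only the timings are expected to differ.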
    Jimbo, Jan 26, 2006

  3. Dr.Ruud

    Dr.Ruud Guest

    [OT] quoting (was: Re: Line count, the best strategy?)

    Jimbo wrote:
    > Suppose


    For Jimbo, and everybody else with "User-Agent: G2/#.#":

    "How can I automatically quote the previous message
    when I post a reply?"
    http://groups.google.co.uk/support/bin/answer.py?answer=14213

    See also:
    http://www.safalra.com/special/googlegroupsreply/


    What's good 'netiquette' when posting to Usenet?
    http://groups.google.co.uk/support/bin/answer.py?answer=12348
    http://directory.google.com/Top/Computers/Usenet/Etiquette/

    But Google needs you to vote for 'Default quoting of
    previous message in replies'
    http://groups-beta.google.com/support/bin/request.py?contact_type=features


    --
    Affijn ("anyway"), Ruud

    "Gewoon is een tijger." ("Ordinary is a tiger.")
    Dr.Ruud, Jan 26, 2006
  4. Brad Baxter

    Brad Baxter Guest

    MSG wrote:
    > Suppose I had a big big text file and I needed to count the number
    > of lines. I have two questions about it:
    >
    > (A). Perldoc FAQ suggests counting "\n" like this:
    > while (sysread FILE, $buffer, 4096) {
    > $lines += ($buffer =~ tr/\n//);
    > but in Perl Cookbook, the buffer becomes 2**20 (1Mb).
    > I am sure the authors didn't choose those numbers completely
    > arbitrarily, so the question is: how did they come up with the
    > different numbers, or how should a programmer go about choosing
    > the "right" number?
    >
    > (B). How does the above method compare to the following code,
    > which simply uses the automatic line number?
    > (also from Perl Cookbook)
    > 1 while ( <FH> );
    > print $. ;


    Well, B doesn't explicitly manage a buffer.

    FWIW:

    - perl -wlpe '}{*_=*.}{' file
    (http://perl.abigail.nl/Talks/Japhs/)

    - http://search.cpan.org/~cwest/ppt-0.14/bin/wc

    --
    Brad
    Brad Baxter, Jan 26, 2006