Line count, the best strategy?

MSG · Jan 26, 2006

Suppose I had a big big text file and I needed to count the number
of lines. I have two questions about it:

(A). The Perl FAQ (perlfaq5) suggests counting "\n" like this:

    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }

but in the Perl Cookbook, the buffer size becomes 2**20 (1 MB).
I am sure the authors didn't choose those numbers completely
arbitrarily, so the question is: how did they come up with the
different numbers, or how should a programmer go about choosing
the "right" number?
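For reference, here is a self-contained version of method (A), as a minimal sketch. The sub name count_lines_sysread and its $bufsize parameter are hypothetical, not taken from perlfaq5 or the Cookbook; the loop body is the FAQ's tr/\n// count, with an explicit check for read errors added. Like wc -l, it counts newline characters, so a last line without a trailing "\n" is not counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count "\n" characters by reading the file in fixed-size chunks.
# Memory use is bounded by $bufsize no matter how long the lines are.
sub count_lines_sysread {
    my ($path, $bufsize) = @_;
    $bufsize ||= 4096;    # perlfaq5's figure; the Cookbook uses 2**20

    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;          # count raw "\n" bytes, no layer translation

    my ($buffer, $lines) = ('', 0);
    while (1) {
        my $got = sysread $fh, $buffer, $bufsize;
        die "Read error on '$path': $!" unless defined $got;
        last if $got == 0;                 # end of file
        $lines += ($buffer =~ tr/\n//);    # tr/// returns the match count
    }
    close $fh;
    return $lines;
}

# Usage: perl count.pl FILE...
if (@ARGV) {
    printf "%d %s\n", count_lines_sysread($_, 2**20), $_ for @ARGV;
}
```

The buffer size only affects speed, not the result: a 1-byte buffer and a 1 MB buffer give the same count.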

(B). How does the above method compare to the following code, which
simply relies on Perl's built-in line counter ($.)?
(also from the Perl Cookbook)

    1 while ( <FH> );
    print $. ;
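Method (B) wrapped the same way, again as a hedged sketch with a hypothetical sub name. The one subtlety is that close() resets $., so the count has to be copied out before the handle is closed.

```perl
use strict;
use warnings;

# Count lines with Perl's buffered readline; $. holds the number of
# the last line read from the most recently read filehandle.
sub count_lines_readline {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    local $_;               # don't clobber the caller's $_
    1 while <$fh>;          # read and discard every line
    my $count = $. || 0;    # copy it now: close() resets $.
    close $fh;
    return $count;
}

if (@ARGV) {
    printf "%d %s\n", count_lines_readline($_), $_ for @ARGV;
}
```

Unlike (A), this also counts a final line that lacks a trailing "\n", so the two methods can differ by one on such files.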

Jimbo · Jan 26, 2006

Suppose the "big big text" file was 800MB and contained only one
\n, as its very last character. Method (A) would work; method (B)
would choke, because <FH> would try to read all 800MB into memory
as a single line.

If you know--ahead of time--what line lengths to expect and they are
reasonable (given modern RAM sizes), then use (B). If you *really*
want to be robust, use (A).
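If you want (A)'s bounded memory use but prefer readline to sysread, Perl can also read fixed-size records instead of lines: setting $/ to a reference to an integer (documented in perlvar) makes <$fh> return chunks of at most that many characters. A minimal sketch, assuming a hypothetical sub name and a default record size of 65536 bytes chosen only for illustration:

```perl
use strict;
use warnings;

# Read fixed-size records instead of lines, then count "\n" in each,
# so a single huge "line" never has to fit in memory at once.
sub count_lines_records {
    my ($path, $recsize) = @_;
    $recsize ||= 65536;                 # assumed default, tune as needed

    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;                        # characters == bytes
    local $/ = \$recsize;               # record mode: at most $recsize per read

    my $lines = 0;
    while (my $chunk = <$fh>) {         # implicitly tests defined()
        $lines += ($chunk =~ tr/\n//);
    }
    close $fh;
    return $lines;
}

if (@ARGV) {
    printf "%d %s\n", count_lines_records($_), $_ for @ARGV;
}
```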

As for how the numbers were arrived at... 4096 bytes was the typical
size of a "disk buffer" (a filesystem block or memory page) years ago,
and it is still common. Modern drives have 4MB or 8MB on-board caches,
so you can raise 4096 to suit your system. (Reads happen in whole
blocks: even if you ask for just one byte, the OS and the drive
actually read at least a full block.)
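Since the best buffer size depends on the OS, filesystem and hardware, one practical way to pick the "right" number is simply to measure it. A rough timing sketch using the core Time::HiRes module; the sub names and the list of candidate sizes are assumptions for illustration. Keep in mind that after the first pass the file is likely to sit in the OS page cache, so later runs mostly measure memory speed rather than the disk.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Same chunked newline count as method (A).
sub count_newlines {
    my ($path, $bufsize) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;
    my ($buf, $n) = ('', 0);
    while (1) {
        my $got = sysread $fh, $buf, $bufsize;
        die "Read error on '$path': $!" unless defined $got;
        last unless $got;
        $n += ($buf =~ tr/\n//);
    }
    close $fh;
    return $n;
}

# Time each candidate size; returns a list of [size, lines, seconds].
sub time_buffer_sizes {
    my ($path, @sizes) = @_;
    my @results;
    for my $size (@sizes) {
        my $t0    = time();
        my $lines = count_newlines($path, $size);
        push @results, [ $size, $lines, time() - $t0 ];
    }
    return @results;
}

if (@ARGV) {
    for my $r (time_buffer_sizes($ARGV[0], 4096, 65536, 2**20, 4 * 2**20)) {
        printf "%8d bytes: %d lines in %.3fs\n", @$r;
    }
}
```

Every size should report the same line count; only the timings should differ.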

Dr.Ruud · Jan 26, 2006

Jimbo wrote:

> Suppose

For Jimbo, and everybody else with "User-Agent: G2/#.#":

"How can I automatically quote the previous message
when I post a reply?"
http://groups.google.co.uk/support/bin/answer.py?answer=14213

See also:
http://www.safalra.com/special/googlegroupsreply/

What's good 'netiquette' when posting to Usenet?
http://groups.google.co.uk/support/bin/answer.py?answer=12348
http://directory.google.com/Top/Computers/Usenet/Etiquette/

But Google needs you to vote for 'Default quoting of
previous message in replies'
http://groups-beta.google.com/support/bin/request.py?contact_type=features

Brad Baxter · Jan 26, 2006

MSG said:
(B). How does the above method compare to the following code,
which simply uses the automatic line number?
(also from Perl Cookbook)

    1 while ( <FH> );
    print $. ;

Well, B doesn't explicitly manage a buffer.

FWIW:

- perl -wlpe '}{*_=*.}{' file
(http://perl.abigail.nl/Talks/Japhs/)

- http://search.cpan.org/~cwest/ppt-0.14/bin/wc
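As far as I can tell, the one-liner works like this: -p wraps the code in a while (<>) { ... } loop followed by a continue block that prints $_; the first }{ closes that loop early, *_=*. then aliases $_ to $., and the trailing }{ opens an empty block that picks up the continue clause, so the line count is printed exactly once. A hedged plain-Perl equivalent (the sub name total_lines is hypothetical):

```perl
use strict;
use warnings;

# Plain-Perl equivalent of:  perl -wlpe '}{*_=*.}{' file ...
# Reads every line via <> and returns the final value of $.
sub total_lines {
    local @ARGV = @_;    # <> reads the files named in @ARGV
    local $_;            # don't clobber the caller's $_
    1 while <>;
    return $. || 0;
}

if (@ARGV) {
    print total_lines(@ARGV), "\n";
}
```

Because <> never explicitly closes ARGV between files, $. keeps counting across all the files given, so this returns one combined total rather than per-file counts.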

Similar threads:

- FAQ 5.3 How do I count the number of lines in a file? (Jan 31, 2011)
- How to keep count of right answer and wrong answers in C++? (Nov 3, 2021)
- strategy for parsing text file (Aug 28, 2009)
- FAQ 4.29 How can I count the number of occurrences of a substring within a string? (Jan 4, 2011)
- html-->text, keep line breaks, best strategy is? (Dec 17, 2003)
- count the number of element in an array that are greater than somevalues? (Jun 19, 2010)
- FAQ 5.2 How do I change, delete, or insert a line in a file, or append to the beginning of a file? (Feb 24, 2011)
- make a program that count lines in a text (Aug 17, 2010)
