Line count: the best strategy?

MSG

Suppose I had a big big text file and I needed to count the number
of lines. I have two questions about it:

(A). The Perl FAQ (perlfaq5) suggests counting "\n" like this:

    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }

but in the Perl Cookbook, the buffer becomes 2**20 (1 MB).
I am sure the authors didn't choose those numbers completely
arbitrarily, so the question is: how did they come up with the
different numbers, or how should a programmer go about choosing
the "right" number?
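For reference, here is a complete, runnable version of method (A). It is a minimal sketch that makes the buffer size a parameter, so the FAQ's 4096 and the Cookbook's 2**20 can be compared directly. The sub name `count_newlines` is hypothetical, and the error handling is an addition of this sketch, not part of either book's code.

```perl
use strict;
use warnings;

# Count "\n" characters in a file by reading it in fixed-size chunks,
# as in the perlfaq5 approach. $bufsize is the number of bytes requested
# per sysread call (4096 in the FAQ, 2**20 in the Cookbook).
sub count_newlines {
    my ($path, $bufsize) = @_;
    $bufsize ||= 4096;

    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;    # raw bytes: count "\n" exactly as stored

    my ($buffer, $lines) = ('', 0);
    while (1) {
        my $got = sysread $fh, $buffer, $bufsize;
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;                     # end of file
        $lines += ($buffer =~ tr/\n//);        # tr with no replacement only counts
    }
    close $fh;
    return $lines;
}
```

For example, `print count_newlines('big.txt', 2**20), "\n";`, where `big.txt` is a placeholder filename. Since each sysread call fills at most one buffer, memory use stays at roughly one buffer no matter how long the lines are.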

(B). How does the above method compare to the following code,
which simply uses the automatic line counter ($.)?
(also from the Perl Cookbook)

    1 while ( <FH> );
    print $. ;
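Method (B) spelled out as a self-contained sub, a sketch that assumes the file is opened by path; `count_lines_dot` is a hypothetical name. Note that `$.` has to be read before the handle is closed, because closing the handle resets it.

```perl
use strict;
use warnings;

# Count lines with readline and the built-in line counter $.
# Each <$fh> reads up to and including the next "\n" (the default $/),
# so a single very long line is held in memory in full.
sub count_lines_dot {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    1 while <$fh>;        # read and discard every line
    my $count = $.;       # $. is the number of the last line read
    close $fh;            # closing the handle resets $.
    return $count;
}
```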
 

Jimbo

Suppose the "big big text" file was 800MB and contained only one
"\n", as its very last character. Method (A) would still work, because
it reads the file in fixed-size chunks; method (B) would choke, because
readline would have to hold the entire 800MB line in memory at once.

If you know ahead of time what line lengths to expect, and they are
reasonable given modern RAM sizes, then use (B). If you *really*
want to be robust, use (A).
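One further difference between (A) and (B): (A) counts "\n" characters, while (B) counts the records returned by readline, so the two disagree when the final line has no terminating newline. The 800MB example above ends in "\n", so both would report 1 there. The sketch below uses an in-memory filehandle and a hypothetical helper, `compare_counts`, to show the difference.

```perl
use strict;
use warnings;

# Hypothetical helper: run both counting methods on the same string.
# Method (A) counts "\n" characters; method (B) counts records returned
# by readline. They differ when the last line has no trailing newline.
sub compare_counts {
    my ($data) = @_;

    my $newlines = ($data =~ tr/\n//);     # what method (A) reports

    open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
    1 while <$fh>;
    my $records = $.;                       # what method (B) reports
    close $fh;

    return ($newlines, $records);
}
```

For example, `compare_counts("x\ny")` returns `(1, 2)`: one newline, but two lines.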

As for how the numbers were arrived at... 4096 bytes was the typical
size of "disk buffers" years ago, and it is still a common filesystem
block and memory page size. Modern drives have 4MB or 8MB buffers, so
you can raise 4096 to match your drive. (Data comes off the disk a whole
block at a time: even if you wanted to read just one byte, at least a
full block is read, and the drive may fill more of its buffer in
anticipation of the next request.)
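On choosing the number: one heuristic, offered here as an assumption rather than the books' actual reasoning, is to start from the filesystem's preferred I/O size, which Perl's `stat` returns as field 11 (`blksize`) on systems that report it, and read a whole multiple of that. This is a filesystem value, not the drive's cache size. The `pick_buffer_size` name, the 1 MB target, and the 4096 fallback are all illustrative; timing a few sizes on your own data is the only reliable way to settle it.

```perl
use strict;
use warnings;

# Illustrative heuristic for a sysread buffer size: round a target size
# up to a whole multiple of the filesystem's preferred I/O block size,
# which stat() reports as field 11 (blksize) where supported.
# The 1 MB target and the 4096 fallback are assumptions, not tuned values.
sub pick_buffer_size {
    my ($path, $target) = @_;
    $target ||= 2**20;                      # aim for about 1 MB per read

    my $blksize = (stat $path)[11];
    $blksize = 4096 unless $blksize;        # '' or undef if unsupported or stat failed

    # Smallest multiple of the block size that is >= the target.
    my $blocks = int(($target + $blksize - 1) / $blksize);
    return $blocks * $blksize;
}
```

With a 4096-byte block size, `pick_buffer_size($path, 10000)` gives 12288, the next multiple of 4096.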
 

Dr.Ruud

Jimbo wrote:

For Jimbo, and everybody else with "User-Agent: G2/#.#":

"How can I automatically quote the previous message
when I post a reply?"
http://groups.google.co.uk/support/bin/answer.py?answer=14213

See also:
http://www.safalra.com/special/googlegroupsreply/


What's good 'netiquette' when posting to Usenet?
http://groups.google.co.uk/support/bin/answer.py?answer=12348
http://directory.google.com/Top/Computers/Usenet/Etiquette/

But Google needs you to vote for 'Default quoting of
previous message in replies'
http://groups-beta.google.com/support/bin/request.py?contact_type=features
 

Brad Baxter

MSG said:
Suppose I had a big big text file and I needed to count the number
of lines. I have two questions about it:

(A). The Perl FAQ (perlfaq5) suggests counting "\n" like this:

    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }

but in the Perl Cookbook, the buffer becomes 2**20 (1 MB).
I am sure the authors didn't choose those numbers completely
arbitrarily, so the question is: how did they come up with the
different numbers, or how should a programmer go about choosing
the "right" number?

(B). How does the above method compare to the following code,
which simply uses the automatic line counter ($.)?
(also from the Perl Cookbook)

    1 while ( <FH> );
    print $. ;

Well, (B) doesn't explicitly manage a buffer; the buffering happens inside perl's readline.
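If you like the readline style of (B) but want the bounded memory of (A), one option is to set `$/` to a reference to an integer: readline then returns fixed-size records of at most that many bytes instead of lines, and the newlines in each record can be counted with `tr`. This is a minimal sketch; the sub name `count_newlines_records` and the 64KB default are assumptions.

```perl
use strict;
use warnings;

# Readline with an explicit buffer: with $/ set to a reference to an
# integer, <$fh> returns fixed-size records of up to that many bytes,
# so memory use stays bounded even for one huge line. Newlines inside
# each record are then counted as in method (A).
sub count_newlines_records {
    my ($fh, $bufsize) = @_;
    my $size = $bufsize || 65536;    # 64KB records by default (assumption)
    local $/ = \$size;

    my $lines = 0;
    while (my $chunk = <$fh>) {      # perl adds an implicit defined() here
        $lines += ($chunk =~ tr/\n//);
    }
    return $lines;
}
```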

FWIW:

- perl -wlpe '}{*_=*.}{' file
(http://perl.abigail.nl/Talks/Japhs/)

- http://search.cpan.org/~cwest/ppt-0.14/bin/wc
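Abigail's one-liner relies on `-p` wrapping the code in a `while (<>) { ... } continue { print }` loop: the `}{` closes the loop early, `*_=*.` aliases `$_` to `$.` once all input has been read, and the final print outputs the last line number. Given several files it prints a single total, because `$.` is not reset between files read through `<>`. For per-file counts in the spirit of the `wc` linked above (this is not its code), the usual idiom is `close ARGV if eof`, shown in this hypothetical `line_counts` sketch.

```perl
use strict;
use warnings;

# Per-file line counts, wc -l style. Inside a while (<>) loop, $. keeps
# counting across files unless ARGV is closed at each file's end;
# eof (no parentheses) is true at the end of the current file.
sub line_counts {
    local @ARGV = @_;
    my %count = map { ($_ => 0) } @_;   # empty files never enter the loop
    while (<>) {
        if (eof) {
            $count{$ARGV} = $.;         # $ARGV is the current file name
            close ARGV;                 # resets $. for the next file
        }
    }
    return \%count;
}
```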
 
