Thus spoke toylet:
In many examples I found on the web, they just do $buffer=<INFILE>. What
would happen if INFILE is a big file? Would it hang the server? Would
Perl handle the memory usage properly?
It depends entirely on $/. Whenever you see <FILEHANDLE>, you have to
look at $/.
When $/ is set to undef, perl reads the whole file in one go. Naturally,
with very large files this will exhaust your memory. There is nothing
perl can do about it.
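If however $/ is set to something other than undef (see 'perldoc
perlvar' for all the details about $/), then reading happens chunk-wise,
provided you read in scalar context as in the while loops below. Each
read returns one chunk of data. (In list context, as in @lines = <FILE>,
perl still reads all the chunks at once.) What the chunk will be like is
determined by $/: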
local $/ = "\n"; # this is the default
while (<FILE>) {
# $_ now contains one line of the file
}
You very likely won't run out of memory with the above. However, it can
happen, namely when the file contains ridiculously long lines. This is
extremely unlikely for text files, but it can happen with binary
files. A binary file that doesn't contain the byte sequence "\012" on
UNIXish systems, "\015\012" on Win32 or "\015" on Macintoshes will
necessarily be slurped as a whole, because perl doesn't find any newline
in it.
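To see this effect without a real binary file, the sketch below reads a made-up 100,000-byte payload that contains no "\012" at all, using the default $/. It assumes Perl 5.8 or later, where open() accepts a reference to a scalar and reads from that string as if it were a file; the payload itself is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical binary payload: 100_000 bytes with no "\012" anywhere.
my $data = "\x00\x01\x02\x03" x 25_000;

# In-memory filehandle on the string (PerlIO feature, Perl 5.8+).
open my $fh, '<:raw', \$data or die "Cannot open in-memory handle: $!";

my @records;
while (my $rec = <$fh>) {    # default $/ = "\n", i.e. "read a line"
    push @records, $rec;
}
close $fh;

# Perl never finds a line terminator, so the whole payload is one "line".
printf "%d record(s), first is %d bytes\n", scalar @records, length $records[0];
```

With a file of several gigabytes instead of 100 KB, that single "line" is exactly what exhausts memory.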
local $/ = undef; # slurp mode
while (<FILE>) {
# whole file in $_
}
# the while loop here is equivalent to
local $_ = <FILE>;
Here you'll run out of memory when the file is larger than your memory
(minus some overhead).
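When slurping is genuinely what the algorithm needs, checking the file size first at least turns "out of memory" into a clear error. This is a minimal sketch: the helper name slurp_small_file and the 10 MB limit are hypothetical choices for illustration, not something perl provides. It uses the common do { local $/; <$fh> } idiom so that slurp mode does not leak into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical limit: refuse to slurp anything larger than 10 MB.
use constant MAX_SLURP_BYTES => 10 * 1024 * 1024;

# slurp_small_file is a made-up helper name for this example.
# Returns the whole file as one string, or dies if it is too large.
sub slurp_small_file {
    my ($path) = @_;
    my $size = -s $path;    # undef if the file cannot be stat()ed
    die "Cannot stat $path: $!\n" unless defined $size;
    die "$path is $size bytes; too large to slurp\n"
        if $size > MAX_SLURP_BYTES;

    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    # local restricts the undef $/ to this do block only.
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}

# Usage (hypothetical file name):
#   my $text = slurp_small_file('config.txt');
```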
local $/ = \4096;
while (<FILE>) {
# up to 4096 bytes of data in $_
}
This is harmless. Each read returns at most 4096 bytes (fewer only for
the last chunk at the end of the file).
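A quick way to convince yourself of the chunk sizes: the sketch below feeds 10,000 hypothetical bytes through an in-memory filehandle (Perl 5.8+) with $/ = \4096 and records the length of each chunk. Since 10,000 = 2 * 4096 + 1808, it should print "4096 4096 1808".

```perl
use strict;
use warnings;

# Hypothetical input: 10_000 bytes of data held in memory.
my $data = "\xFF" x 10_000;
open my $fh, '<:raw', \$data or die "Cannot open in-memory handle: $!";

my @sizes;
{
    local $/ = \4096;    # record mode: each <$fh> returns up to 4096 bytes
    while (my $chunk = <$fh>) {
        push @sizes, length $chunk;
    }
}
close $fh;

print "@sizes\n";    # lengths of the successive chunks
```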
There are some simple rules you can stick to:
- Read text-files linewise (i.e. don't change the value of $/).
- Only slurp whole files into memory (local $/ = undef) when
your algorithm requires it AND the file is not too large.
- Read binary files with a fixed block size (local $/ = \$SIZE)
or use read() for them.
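For the read() variant of the last rule, a block-wise copy loop might look like the sketch below. The sub name copy_binary and the 4096-byte block size are assumptions made for this example. read() returns the number of bytes read, 0 at end of file and undef on error, and binmode() keeps perl from translating line endings in binary data.

```perl
use strict;
use warnings;

# Arbitrary block size chosen for this sketch.
use constant BLOCK_SIZE => 4096;

# copy_binary is a hypothetical helper: copies everything from $in to
# $out in fixed-size blocks and returns the number of bytes copied.
sub copy_binary {
    my ($in, $out) = @_;
    binmode $in;     # no newline translation on input
    binmode $out;    # ... nor on output
    my $total = 0;
    while (1) {
        my $n = read($in, my $buf, BLOCK_SIZE);
        die "read failed: $!\n" unless defined $n;
        last if $n == 0;    # 0 means end of file
        print {$out} $buf or die "write failed: $!\n";
        $total += $n;
    }
    return $total;
}
```

Memory use stays bounded by BLOCK_SIZE no matter how large the input is, which is the point of the rule.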
There are some more tricks. One useful one is setting $/ to the empty
string. This is referred to as paragraph mode. Perl then treats two or
more consecutive empty lines as a single empty line, so each read
returns the next paragraph. Needless to say, this only makes sense with
text files.
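A small illustration of paragraph mode, again using an in-memory filehandle (Perl 5.8+) on some made-up text with three empty lines between two paragraphs. Note that chomp also behaves differently here: with $/ = "" it strips all trailing newlines from the record.

```perl
use strict;
use warnings;

# Hypothetical text: two paragraphs separated by three empty lines.
my $text = "first paragraph, line one\nline two\n\n\n\nsecond paragraph\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @paragraphs;
{
    local $/ = "";    # paragraph mode
    while (my $para = <$fh>) {
        chomp $para;  # in paragraph mode, removes all trailing newlines
        push @paragraphs, $para;
    }
}
close $fh;

print scalar(@paragraphs), " paragraphs\n";
```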
Tassilo