perl fileio

toylet

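use strict;
use warnings;

my $buffer;
open INFILE, '<', "/etc/textfile" or die "Cannot open /etc/textfile: $!";
while (<INFILE>) {
    $buffer = $_;    # after the loop, $buffer holds the last line read
}
close INFILE;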

Is $buffer a null-terminated string? Does it contain the end-of-line or
end-of-file character? Is that the reason to use chop() and chomp() on
$buffer?

Most perl examples use text files. What if one wants to read a binary
file? How does one specify the number of bytes to read when the position
of the end-of-line character is not known? Should one read it all into
memory with $buffer=<INFILE> and let virtual memory cater for it?
 
Tassilo v. Parseval

Thus spoke toylet:
use strict;

my $buffer;
open INFILE, "/etc/textfile";
while (<INFILE>) {
$buffer=$_;
}
close INFILE;

Is $buffer a null-terminated string? Does it contain the end-of-line or
end-of-file character? Is it the reason to use chop() and chomp() on
$buffer?

$buffer is null-terminated inside of perl, yes. But this is hardly ever
your concern. Null-terminated strings are an issue when you are
programming C or so, but not Perl. Observe:

perl -MDevel::Peek -e 'Dump("foobar")'
SV = PV(0x814b9ac) at 0x81430a0
REFCNT = 1
FLAGS = (POK,READONLY,pPOK)
PV = 0x8158e40 "foobar"\0
CUR = 6
LEN = 7

The PV slot is a pointer to the underlying C string, which needs to have
a terminating null character. CUR is the equivalent of Perl's length()
and LEN is the number of bytes allocated for the buffer (here it is CUR+1
because of the null termination).
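
As a small illustration of why the terminating null is rarely a concern at the Perl level, the sketch below assumes nothing beyond core Perl. Perl tracks the length itself (the CUR field above), so a string may carry null bytes in the middle. The variable name is only for illustration.

```perl
use strict;
use warnings;

# A Perl string may hold "\0" anywhere; length() reports the stored
# length, not the distance to the first null byte as C's strlen() would.
my $with_null = "foo\0bar";

print length($with_null), "\n";        # prints 7
print index($with_null, "\0"), "\n";   # prints 3
```

This is also why binary data can be kept in ordinary scalars without any special handling.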

When reading through a file linewise, each string will contain the
trailing newline character. chomp() removes it for you. Or more
correctly: it chops off the value of the special variable $/ when it
finds it at the end of the string.
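
A minimal sketch of that behaviour, with made-up sample strings: chomp() follows whatever $/ currently holds, and it returns the number of characters it removed.

```perl
use strict;
use warnings;

my $line = "first line\n";
chomp($line);                       # default $/ is "\n"
print "[$line]\n";                  # prints [first line]

my $record = "alpha|";
{
    local $/ = "|";                 # chomp() now strips a trailing "|"
    chomp($record);
}
print "[$record]\n";                # prints [alpha]

my $untouched = "no newline";
my $removed   = chomp($untouched);  # nothing to strip
print "$removed\n";                 # prints 0
```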

chop() on the other hand always removes the last character no matter
what it is. chomp() is therefore what you should be using.

Most perl examples use text files. What if one wants to read a binary
file? How to specify the number of bytes to read when there is no known
position of end-of-line character? read it all into memory by
$buffer=<INFILE> and let virtual memory cater for it?

There are several ways. You could use read():

my $bytes_read = read INFILE, my $buffer, 4096;
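
In practice read() is usually called in a loop until it reports end of file. The following is a sketch under stated assumptions: it uses a lexical filehandle instead of INFILE and creates its own 10000-byte scratch file with File::Temp so that it runs anywhere; the file contents and variable names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a 10000-byte scratch file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xFF" x 10_000;
close $tmp_fh or die "close failed: $!";

open my $in, '<', $tmp_name or die "Cannot open $tmp_name: $!";
binmode $in;

my $total = 0;
my @sizes;
while (1) {
    my $bytes_read = read($in, my $buffer, 4096);
    die "read failed: $!" unless defined $bytes_read;   # undef signals an error
    last if $bytes_read == 0;                           # 0 signals end of file
    $total += $bytes_read;
    push @sizes, $bytes_read;
}
close $in;

print "read $total bytes in chunks of @sizes\n";
```

The return value distinguishes the three cases: a positive count for data, 0 at end of file, and undef on error.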

But you can also use the readline() operator. In this case, set $/ to a
reference to a number:

local $/ = \4096;
while (<INFILE>) {
...
}

That will make perl read the file in chunks of 4096 bytes.

When dealing with binary files, it can never hurt to binmode() the
filehandle before doing any reads or writes on it:

binmode INFILE;

On Windows this is obligatory; otherwise the I/O subsystem will
translate Windows newlines ("\015\012") to the logical newline \n.
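
Putting binmode() and the record-size form of $/ together might look like the sketch below. The record layout (32-bit big-endian integers) and the scratch file are assumptions made up for the example; note that the data deliberately contains 0x0A bytes, which a line-oriented read would treat as newlines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical data: three 32-bit big-endian integers.
# 168430090 is 0x0A0A0A0A, i.e. four bytes that look like "\n".
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} pack('N3', 1, 10, 168430090);
close $tmp_fh or die "close failed: $!";

open my $in, '<', $tmp_name or die "Cannot open $tmp_name: $!";
binmode $in;                        # no newline translation on any platform

my @values;
{
    local $/ = \4;                  # each <$in> returns one 4-byte record
    while (my $record = <$in>) {
        push @values, unpack('N', $record);
    }
}
close $in;

print "@values\n";                  # prints 1 10 168430090
```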

Tassilo
 
toylet

local $/ = \4096;
while (<INFILE>) {
...
}

In many examples I found from the web, they just $buffer=<INFILE>. What
would happen if INFILE is a big one? Would it hang the server? Would
Perl handle the memory usage properly?
 
Tassilo v. Parseval

Thus spoke toylet:
In many examples I found from the web, they just $buffer=<INFILE>. What
would happen if INFILE is a big one? Would it hang the server? Would
Perl handle the memory usage properly?

It entirely depends. You have to look at $/ when you see <FILEHANDLE>.

When $/ is set to undef, perl reads the whole file in one go. Naturally,
with very large files this will blow your memory. There is nothing that
perl can do about it.

If however $/ is set to something other than undef (see 'perldoc
perlvar' for all the details about $/), then reading happens chunk-wise.
Each read returns one chunk of data. What the chunk will be like is
determined by $/:

local $/ = "\n"; # this is the default
while (<FILE>) {
# $_ now contains one line of the file
}

You very likely won't run out of memory with the above. However, it could
happen, namely when the file contains ridiculously long lines. This is
extremely unlikely for text files, but it could happen with binary
files. A binary file that doesn't contain the byte sequence "\012" on
UNIXish systems, "\015\012" on Win32 or "\015" on Macintoshes will
necessarily be slurped as a whole because perl doesn't find any newline
in it.

local $/ = undef; # slurp mode
while (<FILE>) {
# whole file in $_
}
# the while loop here is equivalent to
local $_ = <FILE>;

Here you'll run out of memory when the file is larger than your memory
(minus some overhead).

local $/ = \4096;
while (<FILE>) {
# 4096 bytes of data in $_
}

This is harmless. Each read returns at most 4096 bytes (fewer only at
the end of the file).

There are some simple rules you can stick to:

- Read text-files linewise (i.e. don't change the value of $/).

- Only slurp whole files into memory (local $/ = undef) when
your algorithm requires it AND the file is not too large.

- Read binary files with a fixed block size (local $/ = \$SIZE)
or use read() for them.
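
The second rule can be enforced in code. The helper below is a hypothetical sketch, not an established idiom from this thread: slurp_if_small() and its size limit are invented names, and the scratch file exists only to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a file only if it is no larger than $max_bytes.
sub slurp_if_small {
    my ($path, $max_bytes) = @_;

    my $size = (stat $path)[7];     # element 7 of stat() is the size in bytes
    die "Cannot stat $path: $!\n" unless defined $size;
    die "$path is $size bytes, limit is $max_bytes\n" if $size > $max_bytes;

    open my $in, '<', $path or die "Cannot open $path: $!\n";
    binmode $in;
    local $/;                       # undef means slurp mode; restored on return
    my $content = <$in>;
    close $in;
    return $content;
}

# Scratch file with 18 bytes of text.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "close failed: $!";

my $text = slurp_if_small($tmp_name, 1024);
print length($text), "\n";          # prints 18

my $ok = eval { slurp_if_small($tmp_name, 10); 1 };
print $ok ? "slurped\n" : "refused: $@";
```

Using local inside the sub keeps the change to $/ from leaking into the rest of the program.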

There are some more tricks. One useful one is setting $/ to the empty
string. This is referred to as paragraph-mode. It treats multiple
consecutive empty lines as one empty line. And thus each read will
return the next paragraph. Needless to say, this only makes sense with
text-files.
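
A short sketch of paragraph mode, reading from an in-memory filehandle so that no file is needed; the sample text is made up. In paragraph mode chomp() removes all trailing newlines from each record.

```perl
use strict;
use warnings;

# Two paragraphs separated by several empty lines.
my $text = "para one, line 1\npara one, line 2\n\n\n\npara two\n";
open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my @paragraphs;
{
    local $/ = "";                  # paragraph mode
    while (my $para = <$in>) {
        chomp($para);               # strips every trailing newline here
        push @paragraphs, $para;
    }
}
close $in;

print scalar(@paragraphs), "\n";    # prints 2
print "$paragraphs[1]\n";           # prints para two
```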

Tassilo
 
Tad McClellan

In many examples I found from the web, they just $buffer=<INFILE>. What
would happen if INFILE is a big one? Would it hang the server?


What server?

You don't need a server to run Perl programs.
 
toylet

There are some more tricks. One useful one is setting $/ to the empty
string. This is referred to as paragraph-mode. It treats multiple
consecutive empty lines as one empty line. And thus each read will
return the next paragraph. Needless to say, this only makes sense with
text-files.

I don't understand the paragraph, but there shouldn't be any need for it in
my near future. Knowing the replacement for fget() and fgets() is good
enough.
 
toylet

Tad McClellan said:
What server?
You don't need a server to run Perl programs.

You do. The server could be localhost, which could be running Windows or
Linux. I should have used the word "machine" or "operating system".
 
Matt Garrish

toylet said:
you do. The server could be a localhost, which could be Windows, Linux.
I should have used the word "machine" or "operating system".

You don't (even CGI apps in Perl don't require a server for testing,
although it can make life simpler). It's a polite way of telling you you're
getting off-topic. One, you never said what server you're using, and two,
there's little point in asking in a Perl newsgroup what the limitations of
your server/OS are.

Matt
 
toylet

You don't (even CGI apps in Perl don't require a server for testing,
although it can make life simpler). It's a polite way of telling you you're
getting off-topic. One, you never said what server you're using, and two,
there's little point in asking in a Perl newsgroup what the limitations of
your server/OS are.

That depends on how you use the word server. Anyway, I don't want to
expand this sub-thread, please.

I was worrying about perl's memory management related to
$buffer=<INFILE>, which is quite on-topic, I believe.
 
James Willmore

you do. The server could be a localhost, which could be Windows, Linux.
I should have used the word "machine" or "operating system".

You're right, you need to use another term :)

Linux != server && Windows != server

Linux == OS && Windows == OS

Think about it :)

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

 
Matt Garrish

toylet said:
I rather make it:

Linux = (server|client)
Windows = (server|client)

And that's where you're confusing people. You just threw out the word server
without explaining what you meant. My impression was that you were talking
about IIS or Apache. For future reference, use statements like "my Linux
box, which is running as a file server..." then no one needs to guess what
you're talking about.

Matt
 
Uri Guttman

t> I rather make it:

t> Linux = (server|client)
t> Windows = (server|client)

wrong again. stop making up shit.

you can have a box running an OS that is neither a server nor a
client. it could be simply not communicating with any other boxes and so
calling it client or server is dumb. boxes are cpus running an OS. how
they are used is irrelevant to running perl (or any other program) on them.

t> I agree.

then stop calling them client or server.

and just to annoy you, what if a box is both a client AND a server? will
your head explode? best to not use those terms unless you have a proper
context and understand it. perl does not need a server nor client to
run. it needs an OS.

uri
 
Joe Smith

toylet said:
That depends on how you use the word server. Anyway, I don't want to
expand this sub-thread, please.

I was worrying about perl's memory management related to
$buffer=<INFILE>, which is quite on-topic, I believe.

When you run programs from the C:\> prompt, your PC is not running
in a client/server mode. Here is a command-line program you can
run without a server:

C:\>perl -le "$_=1;while(1){print length;$_.=$_;}"

It will give a hint to the question "what's the biggest string perl can
handle on my computer".
-Joe
 
