How to tell what is using memory

Nikki R

Hi,

I'm running a Perl script on Linux Red Hat 7.3 (in a company - I can't
upgrade).
My script uses up a lot of memory because it is reading large data files.
The largest file is about 10MB.

The data files were stored by using Data::Dumper, and my script reads each
data file one at a time:

foreach .... {
    my $var1;
    # Suck in the entire Data::Dumper'd file.
    my $filestr = join('', <SOMEFILE>);
    # The Data::Dumper file consists of a statement like:
    #   $var1 = {....};   # Large hash reference.
    eval $filestr;
    # That sets $var1, which is a hash reference. Now do something with $var1.
    ....
}

As the script runs, memory grows fairly quickly. I don't think I have a
memory leak anywhere - I think that Perl allocates new memory whenever
it sees a data file larger than any it has seen before.

That's my understanding of the way Perl's garbage collection works, anyway.
In the above loop even though all of the 'my' variables go out of scope,
Perl doesn't normally release any memory back to the operating system until
the script finishes running.

Is there anything I can do about this, e.g. change the way that the data
file is read in? I definitely need to use a hash, and I don't have the
choice of using anything other than Data::Dumper (this is part of a large
project that I cannot change).

Is there also a good way of finding out (via a tracer or profiler of
some sort, perhaps) which variables are using how much memory in a Perl
script? I eventually tracked down the above culprits only after a fair
amount of analysis with the Perl debugger and Unix's "top" program to
see how much memory my Perl script was using in total.
 
chris-usenet

Nikki R said:
I'm running a Perl script on Linux Red Hat 7.3 (in a company - I can't
upgrade).

I don't see the logical equivalence between working for a company and
not being able to upgrade.
Perl doesn't normally release any memory back to the operating system until
the script finishes running.

That's normal behaviour for any unix-based application.
Is there anything I can do about this, e.g. change the way that the data
file is read in?

Have you read the FAQ on this?
Nikki R said:
I definitely need to use a hash and I don't have the choice of using
anything else but Data::Dumper (this is part of a large project that
I cannot change).

If memory really is a problem, *and* it's only relevant for the small
section you've described, then you may want to consider forking a
separate process to read in the Data::Dumper format and translate it
into a more efficient form that the rest of your program can use. Of
course, if you need these large data structures throughout your program,
then unless you can make them more space efficient you're pretty much
stuck.
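
Roughly, and untested (the helper name read_dumper_file and the choice of
Storable over a pipe are just for illustration, not anything from the
original project): the child pays the slurp-and-eval memory cost and hands
it all back to the OS when it exits, while the parent only ever holds the
final structure.

use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);

sub read_dumper_file {
    my ($path) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                       # child
        close $reader;
        # do() runs the Data::Dumper'd file; the value of its last
        # statement ($var1 = {...};) is the hash reference itself.
        my $var1 = do $path or die "do $path: " . ($@ || $!);
        store_fd($var1, $writer);          # send it back in Storable form
        close $writer;
        exit 0;
    }

    close $writer;                         # parent
    my $data = fd_retrieve($reader);       # rebuild the hash reference
    close $reader;
    waitpid($pid, 0);
    return $data;
}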

Chris
 
Sherm Pendley

I don't see the logical equivalence between working for a company and
not being able to upgrade.

Corporate politics are rarely based in logic. Don't think "My company
can't upgrade." Think more along the lines of "I don't have the personal
authority to force an upgrade through the political morass of our IT
department in a reasonable amount of time."

sherm--
 
Brian McCauley

Nikki said:
# Suck in the entire Data::Dumper'd file.
my $filestr = join('', <SOMEFILE>);
# The Data::Dumper file consists of a statement like:
# $var1 = {....} ; # Large hash reference.
eval $filestr;

I've not tested it, but I'd expect do() to be somewhat more memory
efficient than slurp-and-eval().

If you do want to slurp the whole file into a string, there are more
efficient ways (see the FAQ) than slurping it into a list of lines and
then joining those lines together.
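
For example, the do() version of the loop body could look something like
this (untested; $datafile stands in for whichever file the loop is on):

# do() compiles and runs the file directly, with no intermediate string.
# The file's last statement is the assignment, and the value of an
# assignment is the value assigned, so do() hands back the hash reference.
my $var1 = do $datafile
    or die "Couldn't load $datafile: " . ($@ || $!);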
 
mjcarman

Nikki said:
even though all of the 'my' variables go out of scope, Perl doesn't
normally release any memory back to the operating system until the
script finishes running.

Correct. Perl will hold on to the memory in case it needs it again
later. Note that data allocated for my() variables will *not* be reused
elsewhere. It is reserved for the original user (in case it comes back
into scope).
Is there anything I can do about this

A little, maybe. With the restrictions you've imposed you haven't left
much room for us to help you.
e.g. change the way that the data file is read in? I definitely need
to use a hash and I don't have the choice of using anything else but
Data::Dumper (this is part of a large project that I cannot change).

Brian McCauley gave you a suggestion that stays within these
boundaries, but you should consider trying to change this anyway.
Data::Dumper's output is meant to be human readable and eval-able. It
really isn't all that suitable for storing and restoring large data
structures. That's what Storable is for:
* It has very little overhead.
* It creates much smaller data files.
* It's much faster.

If your data file is 10 MB then using Data::Dumper+eval would have *at
least* 10 MB of overhead (the data structure itself + the string it's
created from). Storable's overhead is comparatively negligible.

If your project is well-architected, the change could be as simple as
changing a few lines of code inside the routines for reading/writing a
data file.
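
As a sketch (nstore/retrieve are Storable's own functions; the surrounding
names are placeholders), those routines could boil down to:

use Storable qw(nstore retrieve);

# Writing side: replaces the Data::Dumper dump of the hash reference.
nstore($var1, $datafile);       # 'n' = network byte order, portable across machines

# Reading side: replaces the slurp-and-eval; returns the hash reference.
my $var1 = retrieve($datafile);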

Of course, changing between Data::Dumper and Storable won't make a bit
of difference in how large your data structure is after it's been read
into memory. If that's the real problem you need to try something else
-- maybe tying the data structure to disk. (This has the side effect
of making it slower to access.)
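
A rough sketch of what that can look like with DB_File (purely
illustrative -- plain DB_File values must be flat strings, so nested
structures would need an extra layer such as MLDBM):

use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# The hash lives in data.db on disk; only the entries actually touched
# are brought into memory, at the cost of much slower access.
tie my %data, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie data.db: $!";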
Is there also a good way of finding out [...] which variables are
using how much memory in a Perl script?

Take a look at Devel::Size. If you can't change the data structure,
though, then that information won't be very helpful either.
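
For instance (a sketch, using your loop's variable names):

use Devel::Size qw(size total_size);

# size() counts only the top-level variable itself; total_size() follows
# references and reports the memory used by the whole structure.
printf "string: %d bytes, structure: %d bytes\n",
    total_size($filestr), total_size($var1);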

-mjc
 
xhoster

Hi,

I'm running a Perl script on Linux Red Hat 7.3 (in a company - I can't
upgrade).
My script uses up a lot of memory because it is reading large data files.
The largest file is about 10MB.

And to what size does this 10MB file cause the script to grow? What
size would you find acceptable?

The data files were stored by using Data::Dumper, and my script reads
each data file one at a time:

foreach .... {
    my $var1;
    # Suck in the entire Data::Dumper'd file.
    my $filestr = join('', <SOMEFILE>);

I think this first reads the entire file into an anonymous array or list
(taking one file-size worth of memory plus overhead) and then joins that
list into one string, requiring a second file-size worth of memory.
The array part can be freed as soon as the join is done. If you are going
to go this route, you should slurp it directly into a scalar.
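
Something like this (the usual idiom, untested against your script):

# Undefining $/ turns the next readline into a single slurp, so the whole
# file lands in one scalar with no intermediate list of lines.
my $filestr = do {
    local $/;          # slurp mode for this block only
    <SOMEFILE>;
};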

    # The Data::Dumper file consists of a statement like:
    #   $var1 = {....};   # Large hash reference.
    eval $filestr;
    # That sets $var1, which is a hash reference. Now do something with $var1.
    ....
}

As the script runs, memory grows fairly quickly. I don't think I have a
memory leak anywhere - I think that Perl allocates new memory whenever
it sees a data file larger than any it has seen before.

Well, then arrange for it to process the largest data file first. If your
theory is correct, there should be no (or at least little) further increase
in memory use as it goes on to tackle the smaller files. Then at least you
will know, one way or the other.
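
For example (a sketch; @datafiles is a stand-in for however the script
currently builds its file list):

# Visit the files largest-first, so the first pass forces the peak
# allocation and the smaller files then fit in memory Perl already holds.
foreach my $file (sort { -s $b <=> -s $a } @datafiles) {
    # ... read and process $file as before ...
}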

Xho
 
