How to tell what is using memory

Nikki R

Hi,

I'm running a Perl script on Linux Red Hat 7.3 (in a company - I can't
upgrade).
My script uses up a lot of memory because it is reading large data files.
The largest file is about 10MB.

The data files were stored by using Data::Dumper, and my script reads each
data file one at a time:

foreach .... {
    my $var1;
    # Suck in the entire Data::Dumper'd file.
    my $filestr = join('', <SOMEFILE>);
    # The Data::Dumper file consists of a statement like:
    #   $var1 = {....};   # Large hash reference.
    eval $filestr;
    # That sets $var1, which is a hash reference. Now do something with $var1.
    ....
}

As the script runs, memory grows fairly quickly. I don't think I have a
memory leak anywhere - I think that Perl allocates new memory whenever
it sees a data file larger than any it has seen before.

That's my understanding of the way Perl's garbage collection works, anyway.
In the above loop even though all of the 'my' variables go out of scope,
Perl doesn't normally release any memory back to the operating system until
the script finishes running.

Is there anything I can do about this, e.g. change the way that the data
file is read in? I definitely need to use a hash, and I don't have the
choice of using anything other than Data::Dumper (this is part of a large
project that I cannot change).

Is there also a good way of finding out (via a tracer or profiler of
some sort, perhaps) which variables are using how much memory in a Perl
script? I eventually tracked down the above culprits only after a fair
amount of analysis with the Perl debugger and Unix's "top" program to
see how much memory my Perl script was using in total.
 
chris-usenet

Nikki R said:
I'm running a Perl script on Linux Red Hat 7.3 (in a company - I can't
upgrade).

I don't see the logical equivalence between working for a company and
not being able to upgrade.
Perl doesn't normally release any memory back to the operating system until
the script finishes running.

That's normal behaviour for any unix-based application.
Is there anything I can do about this, e.g. change the way that the data
file is read in?

Have you read the FAQ on this?
Nikki R said:
I definitely need to use a hash and I don't have the choice of using
anything else but Data::Dumper (this is part of a large project that
I cannot change).

If memory really is a problem, *and* it's only relevant for the small
section you've described, then you may want to consider forking a
separate process to read in the Data::Dumper format and translate it
into a more efficient form that the rest of your program can use. Of
course, if you need these large data structures throughout your program,
then unless you can make them more space efficient you're pretty much
stuck.
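
Roughly, and untested (the helper name read_dumper_file and the choice of
Storable over a pipe are just for illustration, not anything from the
original project): the child pays the slurp-and-eval memory cost and hands
it all back to the OS when it exits, while the parent only ever holds the
final structure.

use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);

sub read_dumper_file {
    my ($path) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                       # child
        close $reader;
        # do() runs the Data::Dumper'd file; the value of its last
        # statement ($var1 = {...};) is the hash reference itself.
        my $var1 = do $path or die "do $path: " . ($@ || $!);
        store_fd($var1, $writer);          # send it back in Storable form
        close $writer;
        exit 0;
    }

    close $writer;                         # parent
    my $data = fd_retrieve($reader);       # rebuild the hash reference
    close $reader;
    waitpid($pid, 0);
    return $data;
}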

Chris
 
Sherm Pendley

I don't see the logical equivalence between working for a company and
not being able to upgrade.

Corporate politics are rarely based in logic. Don't think "My company
can't upgrade." Think more along the lines of "I don't have the personal
authority to force an upgrade through the political morass of our IT
department in a reasonable amount of time."

sherm--
 
Brian McCauley

Nikki said:
# Suck in the entire Data::Dumper'd file.
my $filestr = join('', <SOMEFILE>);
# The Data::Dumper file consists of a statement like:
# $var1 = {....} ; # Large hash reference.
eval $filestr;

I've not tested it, but I'd expect do() to be somewhat more memory
efficient than slurp-and-eval().

If you do want to slurp the whole file into a string, there are more
efficient ways (see the FAQ) than slurping it into a list of lines and
then joining those lines together.
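
For example, the do() version of the loop body could look something like
this (untested; $datafile stands in for whichever file the loop is on):

# do() compiles and runs the file directly, with no intermediate string.
# The file's last statement is the assignment, and the value of an
# assignment is the value assigned, so do() hands back the hash reference.
my $var1 = do $datafile
    or die "Couldn't load $datafile: " . ($@ || $!);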
 
mjcarman

Nikki said:
even though all of the 'my' variables go out of scope, Perl doesn't
normally release any memory back to the operating system until the
script finishes running.

Correct. Perl will hold on to the memory in case it needs it again
later. Note that data allocated for my() variables will *not* be reused
elsewhere. It is reserved for the original user (in case it comes back
into scope).
Is there anything I can do about this

A little, maybe. With the restrictions you've imposed you haven't left
much room for us to help you.
e.g. change the way that the data file is read in? I definitely need
to use a hash and I don't have the choice of using anything else but
Data::Dumper (this is part of a large project that I cannot change).

Brian McCauley gave you a suggestion that stays within these
boundaries, but you should consider trying to change this anyway.
Data::Dumper's output is meant to be human readable and eval-able. It
really isn't all that suitable for storing and restoring large data
structures. That's what Storable is for:
* It has very little overhead.
* It creates much smaller data files.
* It's much faster.

If your data file is 10 MB then using Data::Dumper+eval would have *at
least* 10 MB of overhead (the data structure itself + the string it's
created from). Storable's overhead is comparatively negligible.

If your project is well-architected, the change could be as simple as
changing a few lines of code inside the routines for reading/writing a
data file.
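
As a sketch (nstore/retrieve are Storable's own functions; the surrounding
names are placeholders), those routines could boil down to:

use Storable qw(nstore retrieve);

# Writing side: replaces the Data::Dumper dump of the hash reference.
nstore($var1, $datafile);       # 'n' = network byte order, portable across machines

# Reading side: replaces the slurp-and-eval; returns the hash reference.
my $var1 = retrieve($datafile);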

Of course, changing between Data::Dumper and Storable won't make a bit
of difference in how large your data structure is after it's been read
into memory. If that's the real problem you need to try something else
-- maybe tying the data structure to disk. (This has the side effect
of making it slower to access.)
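
A rough sketch of what that can look like with DB_File (purely
illustrative -- plain DB_File values must be flat strings, so nested
structures would need an extra layer such as MLDBM):

use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# The hash lives in data.db on disk; only the entries actually touched
# are brought into memory, at the cost of much slower access.
tie my %data, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie data.db: $!";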
Is there also a good way of finding out [...] which variables are
using how much memory in a Perl script?

Take a look at Devel::Size. If you can't change the data structure,
though, then that information won't be very helpful either.
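
For instance (a sketch, using your loop's variable names):

use Devel::Size qw(size total_size);

# size() counts only the top-level variable itself; total_size() follows
# references and reports the memory used by the whole structure.
printf "string: %d bytes, structure: %d bytes\n",
    total_size($filestr), total_size($var1);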

-mjc
 
xhoster

Hi,

I'm running a Perl script on Linux Red Hat 7.3 (in a company - I can't
upgrade).
My script uses up a lot of memory because it is reading large data files.
The largest file is about 10MB.

And to what size does this 10MB file cause the script to grow? What
size would you find acceptable?

The data files were stored by using Data::Dumper, and my script reads
each data file one at a time:

foreach .... {
    my $var1;
    # Suck in the entire Data::Dumper'd file.
    my $filestr = join('', <SOMEFILE>);

I think this first reads the entire file into an anonymous array or list
(taking one file-size worth of memory plus overhead) and then joins that
list into one string, requiring a second file-size worth of memory.
The array part can be freed as soon as the join is done. If you are going
to go this route, you should slurp it directly into a scalar.
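
Something like this (the usual idiom, untested against your script):

# Undefining $/ turns the next readline into a single slurp, so the whole
# file lands in one scalar with no intermediate list of lines.
my $filestr = do {
    local $/;          # slurp mode for this block only
    <SOMEFILE>;
};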

    # The Data::Dumper file consists of a statement like:
    #   $var1 = {....};   # Large hash reference.
    eval $filestr;
    # That sets $var1, which is a hash reference. Now do something with $var1.
    ....
}

As the script runs, memory grows fairly quickly. I don't think I have a
memory leak anywhere - I think that Perl allocates new memory whenever
it sees a data file larger than any it has seen before.

Well, then arrange for it to process the largest data file first. If your
theory is correct, there should be no (or at least little) further increase
in memory use as it goes on to tackle the smaller files. Then at least you
will know, one way or the other.
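
For example (a sketch; @datafiles is a stand-in for however the script
currently builds its file list):

# Visit the files largest-first, so the first pass forces the peak
# allocation and the smaller files then fit in memory Perl already holds.
foreach my $file (sort { -s $b <=> -s $a } @datafiles) {
    # ... read and process $file as before ...
}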

Xho
 
