memory management

Ted Byers

ActiveState Perl 5.10.0 on Windows XP.

I have recently found a couple of my scripts failing with
out-of-memory error messages, notably with XML::Twig.

This makes no sense, since the files being processed are only on the
order of a few dozen megabytes to a maximum of 100 MB, and the system
has 4 GB of RAM. The machine is not especially heavily loaded; most
of the time when these scripts fail, they have executed overnight
with nothing else running except, of course, the OS.

Curiously, I have yet to find anything useful in the ActiveState
documentation for (Active)Perl 5.10.0 regarding memory management.
Is there anything, or any package, that I can use to tell me what is
going wrong and how to fix it? I didn't see any likely candidates on
PPM or CPAN. It would be nice if I could have my script tell me how
much memory it is using, and for which data structures. Or must I
remain effectively blind and just split the task into smaller tasks
until each runs to completion?

Thanks

Ted
 
A. Sinan Unur

> ActiveState Perl 5.10.0 on Windows XP.
>
> I have recently found a couple of my scripts failing with
> out-of-memory error messages, notably with XML::Twig.
>
> This makes no sense, since the files being processed are only on the
> order of a few dozen megabytes to a maximum of 100 MB, and the system
> has 4 GB of RAM. The machine is not especially heavily loaded; most
> of the time when these scripts fail, they have executed overnight
> with nothing else running except, of course, the OS.

This seems to be a FAQ:

http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q12

http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q21

http://tomacorp.com/perl/xml/saxvstwig.html

The last of these reports memory usage of 12 MB for a 614 KB input
file.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
sln

> ActiveState Perl 5.10.0 on Windows XP.
>
> I have recently found a couple of my scripts failing with
> out-of-memory error messages, notably with XML::Twig.
>
> This makes no sense, since the files being processed are only on the
> order of a few dozen megabytes to a maximum of 100 MB, and the system
> has 4 GB of RAM. The machine is not especially heavily loaded; most
> of the time when these scripts fail, they have executed overnight
> with nothing else running except, of course, the OS.
>
> Curiously, I have yet to find anything useful in the ActiveState
> documentation for (Active)Perl 5.10.0 regarding memory management.
> Is there anything, or any package, that I can use to tell me what is
> going wrong and how to fix it? I didn't see any likely candidates on
> PPM or CPAN. It would be nice if I could have my script tell me how
> much memory it is using, and for which data structures. Or must I
> remain effectively blind and just split the task into smaller tasks
> until each runs to completion?
>
> Thanks
>
> Ted

You can check data structure sizes with the Devel:: modules:

use Devel::Size qw( total_size );
my @array = (1 .. 10_000);        # build an array or create objects, then:
print total_size(\@array), "\n";  # bytes used, including referenced elements

Twig does its own special memory management. Mostly it builds node
trees in memory, but it might have hybrid qualities as well. This
adds tremendous memory overhead, probably on the order of 10-50 to
1, depending on what you're doing.

Another consideration is what you're doing in the code. Are you
making temporaries all over the place?

By and large, 100 MB of raw data can translate into a gigabyte or
more with all the overhead.

sln
 
Ted Byers

> This seems to be a FAQ:
>
> http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q12
>
> http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q21
>
> http://tomacorp.com/perl/xml/saxvstwig.html
>
> The last of these reports memory usage of 12 MB for a 614 KB input
> file.
>
> Sinan

Ah, OK. I hadn't thought it specific to Twig, since I had seen
memory issues with other scripts using LWP. I thought maybe Perl, or
ActiveState's distribution of it, might have some issues, because
each of the scripts that ran into trouble was handling only a few
MB, and ran perfectly when working with contrived data of only a few
hundred KB.

Thanks, I'll take a look there too.
 
Ted Byers

> You can check data structure sizes with the Devel:: modules:
>
> use Devel::Size qw( total_size );
> my @array = (1 .. 10_000);        # build an array or create objects, then:
> print total_size(\@array), "\n";  # bytes used, including referenced elements
>
> Twig does its own special memory management. Mostly it builds node
> trees in memory, but it might have hybrid qualities as well. This
> adds tremendous memory overhead, probably on the order of 10-50 to
> 1, depending on what you're doing.
>
> Another consideration is what you're doing in the code. Are you
> making temporaries all over the place?
>
> By and large, 100 MB of raw data can translate into a gigabyte or
> more with all the overhead.
>
> sln

Thanks.

Actually, the script giving the most trouble is just using Twig to
parse an XML file and write the data to flat, tab-delimited files to
be used to bulk load the data into our DB (but that is done using a
SQL script passed to a command-line client in a separate process).

Usually, when this script is executed, about half of the 4 GB of
physical memory is free, so even with the numbers you give, we ought
to have plenty of memory available. In fact, I have yet to see
anything less than 1.5 GB of free memory even when I am working my
system hard (the bottleneck is usually HDD I/O, regardless of the
language I'm using).

Thanks again,

Ted
 
sln

> Thanks.
>
> Actually, the script giving the most trouble is just using Twig to
> parse an XML file and write the data to flat, tab-delimited files to
> be used to bulk load the data into our DB (but that is done using a
> SQL script passed to a command-line client in a separate process).
>
> Usually, when this script is executed, about half of the 4 GB of
> physical memory is free, so even with the numbers you give, we ought
> to have plenty of memory available. In fact, I have yet to see
> anything less than 1.5 GB of free memory even when I am working my
> system hard (the bottleneck is usually HDD I/O, regardless of the
> language I'm using).
>
> Thanks again,
>
> Ted

Be careful when you say Twig and parse in the same sentence.
Although I think Twig does its own parsing on some level, it can
use other parsers if directed. The unique thing about Twig is its
ability to do its own parsing; how it does that I don't know.
What it means is that it can introduce tools outside of mainstream
SAX parsers. This results in the ability to do stream as well as
buffered processing, culminating in a node tree. But the node tree
is the result. There are performance issues, but it can also search
(like XPath), replace, and then rewrite XML. This is no small feat.

I am in the process of building similar tools, but mine captures,
does SAX, does search and replace with regular expressions, and some
other stuff. I can tell you it's fairly complicated. The reward,
though, is just phenomenal. I manage memory differently, and I do
other things than Twig does.

Perhaps you could post a skeleton of what it is you're doing and I
could run it through my routines.

You could, however, do this all yourself with a fast SAX parser.
The fastest parser on the planet is Expat - not the Perl interface
to it, which is about six times slower, but the C/C++ library itself.
Unfortunately, all it does is parse; it's really bare-bones, lacking
any tools whatsoever.
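For what it's worth, the Perl interface in question is XML::Parser.
A minimal stream-style sketch (the XML and element names here are
invented for illustration): it counts elements without ever building
a tree, so memory stays flat no matter how big the input is.

```perl
use strict;
use warnings;
use XML::Parser;

# Count how many times each element appears, SAX-style.
my %count;
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;   # fired once per opening tag
            $count{$element}++;
        },
    },
);

$parser->parse('<root><row/><row/><row/></root>');
print "$_: $count{$_}\n" for sort keys %count;
```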

sln
 
Ted Byers

> Be careful when you say Twig and parse in the same sentence.
> Although I think Twig does its own parsing on some level, it can
> use other parsers if directed. The unique thing about Twig is its
> ability to do its own parsing; how it does that I don't know.
> What it means is that it can introduce tools outside of mainstream
> SAX parsers. This results in the ability to do stream as well as
> buffered processing, culminating in a node tree. But the node tree
> is the result. There are performance issues, but it can also search
> (like XPath), replace, and then rewrite XML. This is no small feat.
>
> I am in the process of building similar tools, but mine captures,
> does SAX, does search and replace with regular expressions, and
> some other stuff. I can tell you it's fairly complicated. The
> reward, though, is just phenomenal. I manage memory differently,
> and I do other things than Twig does.
>
> Perhaps you could post a skeleton of what it is you're doing and I
> could run it through my routines.
>
> You could, however, do this all yourself with a fast SAX parser.
> The fastest parser on the planet is Expat - not the Perl interface
> to it, which is about six times slower, but the C/C++ library
> itself. Unfortunately, all it does is parse; it's really
> bare-bones, lacking any tools whatsoever.
>
> sln

OK, I'll work up a skeleton after dinner (once I'm not on the clock).
Basically, I get a data feed, in well-formed XML, and I need to get
that data into our DB. This feed consists of over 100 XML files,
ranging from less than 1 KB to several dozen MB. Since I have no
direct connection between the feed and the DB (which lacks the
ability to import XML data), I resorted to reading the XML and
writing tab-delimited files, which the DB can bulk load in a flash.
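In outline, the conversion is something like the following sketch
(the element and field names - record, id, name, amount - and the
file names are invented here for illustration; the real feed has
different ones):

```perl
use strict;
use warnings;
use XML::Twig;

open my $out, '>', 'records.tsv' or die "records.tsv: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires once per <record> element, as soon as it is fully parsed.
        record => sub {
            my ($t, $rec) = @_;
            print {$out} join("\t",
                map { $rec->field($_) } qw(id name amount)), "\n";
            $t->purge;    # discard everything parsed so far to free memory
        },
    },
);

$twig->parse('<feed><record><id>1</id><name>a</name><amount>9.99</amount></record></feed>');
close $out;
```

The $t->purge call in the handler is the part that keeps memory flat
on large files.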

Maybe it is blasphemy here, but C++ is one of my favourite
programming languages.

I respect guys like you and your efforts with XML. You're strong in
an area where I am challenged. One of the things I always hated
doing was writing code to parse and validate input. My forte is in
making numeric algorithms fast (hence my preference for Fortran and
C++). I believe you when you say it is complicated, and would be
very interested in hearing about the rewards you describe as
phenomenal. Maybe I'll develop a taste for it? ;-)

Anyway, this relates to one of the things I find frustrating in
modern application development: I can define a suite of interrelated
data structures (picture a properly normalized database with dozens
of tables), but then I have to waste time repeating that definition,
in SQL to set up the tables, in classes in (pick one of C++, Java,
Perl, your favourite OO language) for use in business logic, and then
again in the user interface. And of course, XML can be added to the
mix, for communicating between layers (back end, business layer, GUI,
&c.). The data and the relationships in it remain the same, and it is
quite tedious to duplicate them in so many languages across the
different layers.

Thanks

Ted
 
A. Sinan Unur

> The frustration is that I have to waste time repeating this,
> in SQL to set up the tables, in classes in (pick one of C++, Java,
> Perl, your favourite OO language) for use in business logic, and then
> again in the user interface. And of course, XML can be added to the
> mix, for communicating between layers

http://www.google.com/search?&q=site:thedailywtf.com+xml

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
Peter J. Holzer

You may have significantly less memory available per process. On
32-bit Windows, a process normally gets only 2 GB of user address
space, no matter how much RAM is installed.


Yup. Each string in perl has quite noticeable overhead. Now add
hashes or arrays to build a tree structure, and each element in the
XML file may consume a few hundred bytes ...

(I haven't actually measured this for XML::Twig - just a general
observation.)
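That per-string overhead is easy to see with Devel::Size, if you have
it installed (the exact number varies by perl version and platform,
so no figure is promised here):

```perl
use strict;
use warnings;
use Devel::Size qw(total_size);

my $one_char = "x";                 # one byte of payload ...
my $size = total_size($one_char);
print "$size bytes\n";              # ... but many bytes of perl overhead
```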

> Actually, the script giving the most trouble is just using Twig to
> parse an XML file and write the data to flat, tab delimited files to

The nice thing about Twig is that you can flush each subtree from
memory once you are done with it. For converting an XML file into a
tab-delimited file, I suspect that you only need to keep a small
portion of the tree in memory and can flush frequently - are you
doing this?

If you need to keep some information from previously seen subtrees, keep
this information in a separate data structure so that you can flush
these subtrees.
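As a sketch (element and attribute names invented): accumulate the
information you need in a plain hash inside the handler, then purge
the subtree - or flush it, which also prints the finished XML to the
current output before freeing it:

```perl
use strict;
use warnings;
use XML::Twig;

my %totals;    # summary data survives after the subtrees are freed

my $twig = XML::Twig->new(
    twig_handlers => {
        item => sub {
            my ($t, $item) = @_;
            $totals{ $item->att('type') } += $item->text;
            $t->purge;    # free the subtree; keep only the hash
        },
    },
);

$twig->parse('<list><item type="a">1</item><item type="a">2</item><item type="b">5</item></list>');
print "$_ => $totals{$_}\n" for sort keys %totals;
```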

hp
 
