memory management

Discussion in 'Perl Misc' started by Ted Byers, Dec 22, 2008.

  1. Ted Byers

    Ted Byers Guest

    Activestate's perl 5.10.0 on WXP.

    I have recently found a couple of my scripts failing with out of
    memory error messages, notably with XML::Twig.

    This makes no sense since the files being processed are only of the
    order of a few dozen megabytes to a maximum of 100MB, and the system
    has 4 GB RAM. The machine is not especially heavily loaded (e.g.,
    most of the time, when these scripts fail, they have executed over
    night with nothing else running except, of course, the OS - WXP).

    Curiously, I have yet to find anything useful in the Activestate
    documentation for (Active)Perl.5.10.0 regarding memory management. Is
    there anything, or any package, that I can use to tell me what is
    going awry and how to fix it? I didn't see any likely candidates
    using PPM and CPAN. It would be nice if I could have my script tell
    me how much memory it is using, and for which data structures. Or
    must I remain effectively blind and just split the task into smaller
    tasks until it runs to completion on each?

    Thanks

    Ted
     
    Ted Byers, Dec 22, 2008
    #1

  2. Ted Byers <> wrote in news:e58a033c-c05c-4dd4-85a4-
    :

    > Activestate's perl 5.10.0 on WXP.
    >
    > I have recently found a couple of my scripts failing with out of
    > memory error messages, notably with XML::Twig.
    >
    > This makes no sense since the files being processed are only of the
    > order of a few dozen megabytes to a maximum of 100MB, and the system
    > has 4 GB RAM. The machine is not especially heavily loaded (e.g.,
    > most of the time, when these scripts fail, they have executed over
    > night with nothing else running except, of course, the OS - WXP).


    This seems to be a FAQ:

    http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q12

    http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q21

    http://tomacorp.com/perl/xml/saxvstwig.html

    Reports memory usage of 12M for a 614K input file.

    Sinan

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://www.rehabitation.com/clpmisc/
     
    A. Sinan Unur, Dec 22, 2008
    #2

  3. Ted Byers

    Guest

    On Mon, 22 Dec 2008 10:05:01 -0800 (PST), Ted Byers <> wrote:

    >Activestate's perl 5.10.0 on WXP.
    >
    >I have recently found a couple of my scripts failing with out of
    >memory error messages, notably with XML::Twig.
    >[...]


    You can check data structure sizes with some Devil:: packages.

    use Devel::Size qw( total_size );
    my @data = ( [ 1, 2, 3 ], { a => 'b' } );  # build an array or create objects.. then
    print total_size( \@data ), "\n";          # bytes used, including referenced contents

    Twig does its own special memory management. Mostly it builds
    node trees in memory, but it might have hybrid qualities as well.
    This adds tremendous memory overhead, probably on the order of
    10-50 to 1, depending on what you're doing.

    Another consideration is what you're doing in the code. Are you
    making temporaries all over the place?

    By and large, 100 MB of raw data can translate into a gig or more
    with all the overhead.
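    A rough way to see that expansion for yourself, assuming the CPAN
    module Devel::Size is installed (the exact numbers depend on your
    Perl build, and the record shape here is just made up):

```perl
use strict;
use warnings;
use Devel::Size qw( total_size );

# 1000 small records, roughly 25 bytes of raw data each.
my @records = map { { id => $_, name => "item$_" } } 1 .. 1000;

my $raw      = 25 * 1000;                 # approximate raw-data size
my $expanded = total_size( \@records );   # actual bytes held in memory

printf "raw ~%d bytes, in memory %d bytes (~%.0fx)\n",
    $raw, $expanded, $expanded / $raw;
```

    On a typical build the in-memory figure comes out many times the
    raw size, which is the overhead being described above.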

    sln
     
    , Dec 22, 2008
    #3
  4. <> wrote:


    > You can check data structure sizes with some Devil:: packages.



    But those only work on October 31st...


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Dec 22, 2008
    #4
  5. Ted Byers

    Guest

    On Mon, 22 Dec 2008 13:24:39 -0600, Tad J McClellan <> wrote:

    > <> wrote:
    >
    >
    >> You can check data structure sizes with some Devil:: packages.

    >
    >
    >But those only work on October 31st...


    Oh, maybe just 1 then. I'm not a Devil fan so dunno.

    sln
     
    , Dec 22, 2008
    #5
  6. Ted Byers

    Ted Byers Guest

    On Dec 22, 1:42 pm, "A. Sinan Unur" <> wrote:
    > [...]
    > This seems to be a FAQ:
    >
    > http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q12
    >
    > http://xmltwig.com/xmltwig/XML-Twig-FAQ.html#Q21
    >
    > http://tomacorp.com/perl/xml/saxvstwig.html
    >
    > Reports memory usage of 12M for a 614K input file.
    > [...]


    Ah, OK. I hadn't thought it specific to Twig, since I had seen memory
    issues in other scripts using LWP. I thought maybe Perl, or
    ActiveState's distribution of it, might have some issues, because
    each of the scripts that encountered trouble was handling only a few
    MB, and ran perfectly when working with contrived data of only a few
    hundred KB.

    Thanks, I'll take a look there too.
     
    Ted Byers, Dec 22, 2008
    #6
  7. Ted Byers

    Ted Byers Guest

    On Dec 22, 1:53 pm, wrote:
    > [...]
    > Twig does its own special memory management. Mostly it builds
    > node trees in memory, but it might have hybrid qualities as well.
    > This adds tremendous memory overhead, probably on the order of
    > 10-50 to 1, depending on what you're doing.
    > [...]
    > By and large, 100 MB of raw data can translate into a gig or more
    > with all the overhead.
    >
    > sln


    Thanks.

    Actually, the script giving the most trouble is just using Twig to
    parse an XML file and write the data to flat, tab-delimited files to
    be used to bulk load the data into our DB (but that is done using a
    SQL script passed to a command-line client in a separate process).

    Usually, when this script is executed, about half of the 4 GB of
    physical memory is free, so even with the numbers you give, we ought
    to have plenty of memory available. In fact, I have yet to see
    anything less than 1.5 GB of free memory, even when I am working my
    system hard (the bottleneck is usually HDD I/O, regardless of the
    language I'm using).

    Thanks again,

    Ted
     
    Ted Byers, Dec 22, 2008
    #7
  8. Ted Byers

    Guest

    On Mon, 22 Dec 2008 12:39:01 -0800 (PST), Ted Byers <> wrote:

    >[...]
    >
    >Thanks.
    >
    >Actually, the script giving the most trouble is just using Twig to
    >parse an XML file and write the data to flat, tab delimited files to
    >be used to bulk load the data into our DB (but that is done using a
    >SQL script passed to a command line client in a separate process).
    >
    >Usually, when this script is executed, there is about half of the 4 GB
    >of physical memory free, so even with the numbers you give, we ought
    >to have plenty of memory available. In fact, I have yet to see
    >anything less than 1.5 GB free memory even when I am working my system
    >hard (the bottle neck is usually HDD IO, regardless of the language
    >I'm using).
    >
    >Thanks again,
    >
    >Ted


    Be careful when you say Twig and parse in the same sentence.
    Although I think Twig does its own parsing on some level, it can
    use other parsers if directed. The unique thing about Twig is its
    ability to do its own parsing; how it does that I don't know.
    What it means is that it can introduce tools outside of mainstream
    SAX parsers. This results in the ability to do stream as well as
    buffered processing, culminating in a node tree (possibly an
    illusory object, in the hybrid sense), but the node tree is the
    result. There are performance issues, but it can also search, like
    XPath, and replace, then rewrite the XML. This is no small feat.

    I am in the process of building similar tools, but mine captures,
    does SAX, does search and replace with regular expressions, and
    some other stuff. I can tell you it's fairly complicated. The
    reward, though, is just phenomenal. I manage memory differently,
    and I do other things than Twig.

    Perhaps you could post a skeleton structure of what it is you're
    doing and I could run it through my routines.

    You could however do this all yourself with a fast SAX parser.
    The fastest parser on the planet is Expat, not the Perl interface
    to it, which is about six times slower, but the C/C++ library
    itself. Unfortunately, all it does is parse; it's really a
    tremendously impaired work, lacking any tools whatsoever.

    sln
     
    , Dec 22, 2008
    #8
  9. Ted Byers

    Ted Byers Guest

    On Dec 22, 4:12 pm, wrote:
    > [...]
    > Perhaps you could post a skeleton structure of what it is you're
    > doing and I could run it through my routines.
    >
    > You could however do this all yourself with a fast SAX parser.
    > The fastest parser on the planet is Expat, not the Perl interface
    > to it, which is about six times slower, but the C/C++ library
    > itself.
    > [...]
    >
    > sln


    OK, I'll work up a skeleton after dinner (once I'm not on the clock).
    Basically, I get a data feed, in well-formed XML, and I need to get
    that data into our DB. This feed consists of over 100 XML files,
    ranging from less than 1 KB to several dozen MB. Since I have no
    direct connection between the feed and the DB (which lacks the
    ability to import XML data), I resorted to reading the XML and
    writing tab-delimited files, which the DB can bulk load in a flash.
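    One detail worth sketching for the tab-delimited step (a
    hypothetical helper, not the real script): any field that itself
    contains a tab or newline will corrupt the bulk load, so it pays to
    scrub those before joining:

```perl
use strict;
use warnings;

# Hypothetical helper: make fields safe for a tab-delimited bulk load
# by replacing embedded tabs/newlines with spaces, then joining.
sub tsv_row {
    my @fields = @_;
    return join( "\t",
        map { my $f = $_ // ''; $f =~ s/[\t\r\n]/ /g; $f } @fields ) . "\n";
}

print tsv_row( '42', 'Acme, Inc.', "line1\nline2" );
# the embedded newline becomes a space, keeping one record per line
```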

    Maybe it is blasphemy here, but C++ is one of my favourite
    programming languages.

    I respect guys like you and your efforts with XML. You're strong in
    an area where I am challenged. One of the things I always hated
    doing was writing code to parse and validate input. My forte is in
    making numeric algorithms fast (hence my preference for Fortran and
    C++). I believe you when you say it is complicated, and would be
    very interested in hearing about the rewards you describe as
    phenomenal. Maybe I'll develop a taste for it? ;-)

    Anyway, this relates to one of the things I find frustrating in
    modern application development: I can define a suite of interrelated
    data structures (picture a properly normalized database with dozens
    of tables). The frustration is that I have to waste time repeating
    this, in SQL to set up the tables, in classes in (pick one of C++,
    Java, Perl, your favourite OO language) for use in business logic,
    and then again in the user interface. And of course, XML can be
    added to the mix, for communicating between layers (back end,
    business layer, GUI, &c.). The data and the relationships in it
    remain the same, and it is quite tedious to duplicate them in so
    many languages across the different layers.

    Thanks

    Ted
     
    Ted Byers, Dec 22, 2008
    #9
  10. Ted Byers <> wrote in
    news::

    > The frustration is that I have to waste time repeating this,
    > in SQL to set up the tables, in classes in (pick one of C++, Java,
    > Perl, your favourite OO language) for use in business logic, and then
    > again in the user interface. And of course, XML can be added to the
    > mix, for communicating between layers


    http://www.google.com/search?&q=site:thedailywtf.com xml

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://www.rehabitation.com/clpmisc/
     
    A. Sinan Unur, Dec 22, 2008
    #10
    On 2008-12-22 20:39, Ted Byers <> wrote:
    > On Dec 22, 1:53 pm, wrote:
    >> On Mon, 22 Dec 2008 10:05:01 -0800 (PST), Ted Byers <> wrote:
    >> >Activestate's perl 5.10.0 on WXP.
    >> >
    >> >I have recently found a couple of my scripts failing with out of
    >> >memory error messages, notably with XML::Twig.
    >> >
    >> >This makes no sense since the files being processed are only of the
    >> >order of a few dozen megabytes to a maximum of 100MB, and the system
    >> >has 4 GB RAM.


    You may have significantly less memory available per process. On
    32-bit Windows, a single process normally gets only 2 GB of address
    space, no matter how much physical RAM happens to be free.


    >> By and large, 100MB's of raw data will translate into a possible Gig or
    >> more with all the overhead.


    Yup. Each string in Perl has quite noticeable overhead. Now add the
    hashes or arrays used to build a tree structure, and each element in
    the XML files may consume a few hundred bytes ...

    (I haven't actually measured this for XML::Twig - just a general
    observation.)


    > Actually, the script giving the most trouble is just using Twig to
    > parse an XML file and write the data to flat, tab delimited files to


    The nice thing about Twig is that you can flush each subtree from
    memory once you are done with it. For converting an XML file into a
    tab-delimited file, I suspect that you only need to keep a small
    portion of the tree in memory and can flush frequently - are you
    doing this?

    If you need to keep some information from previously seen subtrees, keep
    this information in a separate data structure so that you can flush
    these subtrees.
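    A minimal sketch of that pattern (the element and field names here
    are made up; adjust them to your feed, and note XML::Twig comes from
    CPAN):

```perl
use strict;
use warnings;
use XML::Twig;

open my $out, '>', 'records.tsv' or die "records.tsv: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        # called each time a complete <record> element has been parsed
        record => sub {
            my ( $t, $rec ) = @_;
            print {$out} join( "\t",
                map { $rec->field($_) } qw( id name amount ) ), "\n";
            $t->purge;   # discard everything parsed so far, so memory
                         # stays proportional to one record, not the file
        },
    },
);
$twig->parsefile('feed.xml');
close $out or die "close: $!";
```

    purge() simply frees the parsed tree; flush() does the same but also
    prints the XML seen so far, which matters only if you are rewriting
    XML rather than extracting from it.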

    hp
     
    Peter J. Holzer, Dec 23, 2008
    #11
