Re: python resource management

Discussion in 'Python' started by Philip Semanchuk, Jan 19, 2009.

  1. On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:

    > Hi all,
    >
    > I am running a Python script that uses BeautifulSoup to parse nearly
    > 22,000 locally stored HTML files. The problem is that memory usage
    > increases linearly as the files are parsed. Once the script has parsed
    > 200 files or so, it consumes all the available RAM and CPU usage drops
    > to 0% (maybe due to excessive paging).
    >
    > We tried 'del soup_object' and 'gc.collect()', but saw no improvement.
    >
    > Please advise how to limit Python's memory usage, or the proper way to
    > handle BeautifulSoup objects in a resource-efficient manner.


    You need to figure out where the memory is disappearing. Try
    commenting out parts of your script. For instance, maybe start with a
    minimalist script: open and close the files but don't process them.
    See if the memory usage continues to be a problem. Then add elements
    back in, making your minimalist script more and more like the real
    one. If the extreme memory usage problem is isolated to one component
    or section, you'll find it this way.
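
As a rough sketch of that bisection approach (not a definitive tool), the loop below reads each file and optionally hands the bytes to a processing step, sampling memory as it goes. The names measure_growth, process, and report_every are made up for illustration. It relies on the standard-library tracemalloc module, which only sees allocations made through Python's own allocators, so memory grabbed directly by a C extension would not show up here.

```python
import tracemalloc


def measure_growth(paths, process=None, report_every=50):
    """Read each file, optionally process it, and sample traced memory.

    Returns a list of (files_done, traced_bytes) tuples. `process` is a
    hypothetical hook: leave it as None for the "open and close only"
    baseline, then pass in more and more of the real script's work.
    """
    samples = []
    tracemalloc.start()
    try:
        for count, path in enumerate(paths, start=1):
            with open(path, "rb") as handle:
                data = handle.read()
            if process is not None:
                process(data)
            del data  # drop our own reference before sampling
            if count % report_every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((count, current))
    finally:
        tracemalloc.stop()
    return samples
```

Run it first with process=None. If the samples stay flat, pass in the parsing step alone, then parsing plus extraction, and so on. The step at which the numbers start climbing is the one to look at.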

    HTH
    Philip
    Philip Semanchuk, Jan 19, 2009

  2. Philip Semanchuk

    Tim Arnold Guest

    Also, are you creating a separate soup object for each file or reusing one
    object over and over?
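
If it is one soup per file, a pattern along these lines may help keep each tree short-lived. This is only a sketch under assumptions: summarize_files, make_soup, and extract are hypothetical names, and the caller supplies the real parser (with a current bs4 install that could be lambda text: BeautifulSoup(text, "html.parser")). The point is to copy results into plain str objects, because a NavigableString taken from the tree keeps a reference back to the whole parse tree.

```python
def summarize_files(paths, make_soup, extract):
    """Parse each file into its own tree and keep only plain strings.

    make_soup(text) builds a parse tree and extract(soup) pulls out the
    value of interest; both are caller-supplied, hypothetical hooks.
    """
    results = {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            soup = make_soup(handle.read())
        # str() makes a plain copy, so the result does not hold on to
        # the tree the way a NavigableString would.
        results[path] = str(extract(soup))
        # Beautiful Soup trees provide decompose(); call it when the
        # object has one so the tree is torn down right away.
        decompose = getattr(soup, "decompose", None)
        if callable(decompose):
            decompose()
        del soup
    return results
```

If memory still grows with this shape, the leak is probably not in how the soup objects are held, and it is worth going back to stripping the script down piece by piece.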
    --Tim
    Tim Arnold, Jan 19, 2009
