4DOM eating all my memory

Discussion in 'Python' started by ewan, Feb 1, 2004.

  1. ewan

    ewan Guest

    hello all -

    I'm looping over a set of urls pulled from a database, fetching the
    corresponding webpage, and building a DOM tree for it using
    xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
    catalogue). all the trees seem to be kept in memory,

    however, when I get through fifty or so iterations the program has used
    about half my memory and slowed the system to a crawl.

    tried turning on all gc debugging flags. they produce lots of output, but it
    all says 'collectable' - sounds fine to me.

    I even tried doing gc.collect() at the end of every iteration. nothing.
    everything seems to be being collected. so why does each iteration increase
    the memory usage by several megabytes?
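    One way to check whether a tree really is freed, rather than only reported as collectable, is to hold a weak reference to it and look again after a collection. This is a minimal sketch: the `Node` class is a hypothetical stand-in for a DOM node with parent/child links, not 4DOM's own class.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a DOM node with parent/child links."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # parent <-> child forms a cycle


gc.disable()  # keep the automatic collector out of the way for the demo

root = Node()
child = Node(root)
probe = weakref.ref(root)  # a weak reference does not keep the tree alive

del root, child  # drop our names; the cycle still keeps both nodes alive
alive_before = probe() is not None

found = gc.collect()  # returns the number of unreachable objects found
alive_after = probe() is not None

gc.enable()
print(alive_before, found, alive_after)
```

    Note that `gc.DEBUG_LEAK` includes `gc.DEBUG_SAVEALL`, which appends unreachable objects to `gc.garbage` instead of freeing them, so a run with every debug flag switched on can hold memory for that reason alone.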

    below is some code (and by the way, do I have those 'global's in the right
    places?)

    any suggestions would be appreciated immeasurably...
    ewan



    import MySQLdb

    ....

    cursor = db.cursor()
    result = cursor.execute("""SELECT CALLNO, TITLE FROM %s""" % table)
    rows = cursor.fetchall()
    cursor.close()

    for row in rows:
        current_callno = row[0]
        title = row[1]
        url = construct_url(title)
        cf = callno_finder()
        cf.find(title.decode('latin-1'), url)
        ...

    (meanwhile, in another file)
    ....

    class callno_finder:
        def __init__(self):
            global root
            root = None

        def find(self, title, uri):
            global root

            reader = HtmlLib.Reader()
            root = reader.fromUri(uri)

            # find what we're looking for
            ...
     
    ewan, Feb 1, 2004
    #1

  2. ewan

    John J. Lee Guest

    ewan writes:

    > I'm looping over a set of urls pulled from a database, fetching the
    > corresponding webpage, and building a DOM tree for it using
    > xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
    > catalogue).


    Hmm, if this is open-source and it's more than a quick hack, let me
    know when you have it working, I maintain a page on open-source stuff
    of this nature (bibliographic and cataloguing).


    > all the trees seem to be kept in memory,
    >
    > however, when I get through fifty or so iterations the program has used
    > about half my memory and slowed the system to a crawl.
    >
    > tried turning on all gc debugging flags. they produce lots of output, but it
    > all says 'collectable' - sounds fine to me.


    I've never had to resort to this... does it tell you what types /
    classes are involved? IIRC, there was some code posted to python-dev
    to give hints about this (though I guess that was mostly/always for
    debugging leaks at the C level).
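    A rough way to see which types are piling up is to count the objects the collector tracks before and after a batch of iterations and compare. This is only a sketch using the standard `gc` module: `Node` and the `leaked` list are hypothetical stand-ins for whatever the real loop keeps alive, and `gc.get_objects()` only sees container objects the collector tracks.

```python
import gc
from collections import Counter


def live_type_counts():
    """Return a Counter of type names for all objects the collector tracks."""
    gc.collect()  # clear collectable cycles first so only survivors are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=10):
    """List the types whose live-instance count grew the most."""
    diff = after - before  # Counter subtraction keeps only positive deltas
    return diff.most_common(top)


class Node:
    """Hypothetical stand-in for a DOM node."""

    def __init__(self):
        self.children = []


leaked = []  # simulates a reference that survives each loop iteration

before = live_type_counts()
for _ in range(100):
    leaked.append(Node())
after = live_type_counts()

print(growth(before, after))
```

    In the real program the two snapshots would be taken a few iterations apart in the URL loop; a type whose count rises steadily is the one to chase.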


    > I even tried doing gc.collect() at the end of every iteration. nothing.
    > everything seems to be being collected. so why does each iteration increase
    > the memory usage by several megabytes?
    >
    > below is some code (and by the way, do I have those 'global's in the right
    > places?)


    Yes, they're in the right places. Not sure a global is really needed,
    though...


    > any suggestions would be appreciated immeasurably...

    [...]
    > def find(self, title, uri):
    >     global root
    >
    >     reader = HtmlLib.Reader()
    >     root = reader.fromUri(uri)
    >
    >     # find what we're looking for
    >     ...


    +     reader.releaseNode(root)

    ?
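    The suggestion above can be sketched as a `find` that keeps the tree in a local variable and releases it in a `finally` block, so the release happens even if the search raises. `FakeReader`, the dictionary it returns, the title check, and the example URL are all hypothetical placeholders so the sketch runs without 4DOM installed; only the `fromUri` and `releaseNode` names come from the thread.

```python
class FakeReader:
    """Hypothetical stand-in for HtmlLib.Reader so the sketch runs anywhere."""

    def __init__(self):
        self.released = []

    def fromUri(self, uri):
        return {"uri": uri}  # placeholder for the parsed DOM tree

    def releaseNode(self, node):
        # Placeholder: the real method is meant to free the tree.
        self.released.append(node)


def find(reader, title, uri):
    """Parse one page, search it, and always release the tree afterwards."""
    root = reader.fromUri(uri)  # local variable, no module-level global
    try:
        return title in root["uri"]  # placeholder for the real title search
    finally:
        reader.releaseNode(root)


reader = FakeReader()
hit = find(reader, "dune", "http://example.invalid/search?title=dune")
print(hit, len(reader.released))
```

    Keeping `root` local also means nothing outside `find` holds on to the previous page's tree between iterations.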


    John
     
    John J. Lee, Feb 2, 2004
    #2