how not to run out of memory in cursor.execute

Discussion in 'Python' started by johnlichtenstein@gmail.com, Jun 4, 2006.

  1. Guest

    I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
    and I find that the cursor.execute method uses a lot of memory that
    never gets garbage collected. Using fetchmany instead of fetchall does
    not seem to make any difference, since it's the execute that uses
    memory. Breaking the query down to build lots of small tables doesn't
    help, since execute doesn't give its memory back, after reading enough
    small tables execute returns a memory error. What is the trick to get
    memory back from execute in cx_Oracle and MySQLdb?
Jun 4, 2006
    #1

  2. amberite Guest

    wrote:
    > I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
    > and I find that the cursor.execute method uses a lot of memory that
    > never gets garbage collected. Using fetchmany instead of fetchall does
    > not seem to make any difference, since it's the execute that uses
    > memory. Breaking the query down to build lots of small tables doesn't
    > help, since execute doesn't give its memory back, after reading enough
    > small tables execute returns a memory error. What is the trick to get
    > memory back from execute in cx_Oracle and MySQLdb?


    cx_Oracle and MySQLdb must be handled differently, due to the fact that
    MySQL does not actually have cursors (MySQLdb fakes them for you).

    To handle large resultsets efficiently in cx_Oracle simply use the
    cursor iteration idiom:

for row in cursor:
    # do stuff with the row

    cx_Oracle takes care of the fetching for you, and your memory usage
    should remain fairly constant when using this idiom.
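
A fuller sketch of the same idiom (the connection string, table name, and process() handler here are placeholders, not anything from this thread; arraysize is cx_Oracle's knob for how many rows are fetched per round trip):

import cx_Oracle

conn = cx_Oracle.connect('user/password@dsn')  # placeholder credentials
cursor = conn.cursor()
cursor.arraysize = 500  # rows fetched per round trip; tune by testing
cursor.execute('SELECT foo FROM Bar')
for row in cursor:
    process(row)  # hypothetical per-row handler
cursor.close()
conn.close()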

    To handle large resultsets in MySQLdb, you have to resort to multiple
    queries:

limit = 1000
offset = 0

cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
rows = cursor.fetchall()
while len(rows) > 0:
    # process the fetched rows
    offset += limit
    cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
    rows = cursor.fetchall()

cursor.close()

    As you can see, the MySQLdb version is more involved, due to the lack
    of real cursor support in the MySQL database. Any database with good
    cursor support will likely have good cursor iteration support in the
    corresponding DBAPI driver.

    Hope this helps,

    L. Daniel Burr
    amberite, Jun 5, 2006
    #2

  3. Steve Holden Guest

    amberite wrote:
    > wrote:
    >
    >>I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
    >>and I find that the cursor.execute method uses a lot of memory that
    >>never gets garbage collected. Using fetchmany instead of fetchall does
    >>not seem to make any difference, since it's the execute that uses
    >>memory. Breaking the query down to build lots of small tables doesn't
    >>help, since execute doesn't give its memory back, after reading enough
    >>small tables execute returns a memory error. What is the trick to get
    >>memory back from execute in cx_Oracle and MySQLdb?

    >
    >
    > cx_Oracle and MySQLdb must be handled differently, due to the fact that
    > MySQL does not actually have cursors (MySQLdb fakes them for you).
    >
    > To handle large resultsets efficiently in cx_Oracle simply use the
    > cursor iteration idiom:
    >
> for row in cursor:
>     # do stuff with the row
    >
    > cx_Oracle takes care of the fetching for you, and your memory usage
    > should remain fairly constant when using this idiom.
    >
    > To handle large resultsets in MySQLdb, you have to resort to multiple
    > queries:
    >
> limit = 1000
> offset = 0
>
> cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
> rows = cursor.fetchall()
> while len(rows) > 0:
>     # process the fetched rows
>     offset += limit
>     cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
>     rows = cursor.fetchall()
>
> cursor.close()
    >
    > As you can see, the MySQLdb version is more involved, due to the lack
    > of real cursor support in the MySQL database. Any database with good
    > cursor support will likely have good cursor iteration support in the
    > corresponding DBAPI driver.
    >
    > Hope this helps,
    >
    > L. Daniel Burr
    >

    The MySQLdb solution you give is way more complicated than it needs to
    be, thereby skewing your opinion towards cx_Oracle unnecessarily.

Look up the .fetchmany() method of cursors in the DB API. There is never
any need to execute more than a single query, no matter how large the
result set: you simply keep calling .fetchmany(N) (where N is whatever
you've decided by testing is your optimum chunk size) until it returns
fewer than N rows, at which point you have exhausted the query.

    It's very little more effort to wrap this all up as a generator that
    effectively allows you to use the same solution as you quote for cx_Oracle.
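
Something along these lines (a sketch only; the chunk size and the process() handler are placeholders):

def iter_rows(cursor, size=1000):
    # yield rows from an already-executed cursor, size rows at a time
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        for row in batch:
            yield row

cursor.execute('SELECT foo FROM Bar')  # a single query, however large the result
for row in iter_rows(cursor):
    process(row)  # hypothetical per-row handler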

    regards
    Steve

    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Love me, love my blog http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
    Steve Holden, Jun 5, 2006
    #3
  4. On Mon, Jun 05, 2006 at 07:34:05PM +0100, Steve Holden wrote:
    > amberite wrote:
    > > wrote:
    > >
    > >>I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
    > >>and I find that the cursor.execute method uses a lot of memory that
    > >>never gets garbage collected. Using fetchmany instead of fetchall does
    > >>not seem to make any difference, since it's the execute that uses
    > >>memory. Breaking the query down to build lots of small tables doesn't
    > >>help, since execute doesn't give its memory back, after reading enough
    > >>small tables execute returns a memory error. What is the trick to get
    > >>memory back from execute in cx_Oracle and MySQLdb?

    > >
    > >
    > > cx_Oracle and MySQLdb must be handled differently, due to the fact that
    > > MySQL does not actually have cursors (MySQLdb fakes them for you).
    > >
    > > To handle large resultsets efficiently in cx_Oracle simply use the
    > > cursor iteration idiom:
    > >
> > for row in cursor:
> >     # do stuff with the row
    > >
    > > cx_Oracle takes care of the fetching for you, and your memory usage
    > > should remain fairly constant when using this idiom.
    > >
    > > To handle large resultsets in MySQLdb, you have to resort to multiple
    > > queries:
    > >
> > limit = 1000
> > offset = 0
> >
> > cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
> > rows = cursor.fetchall()
> > while len(rows) > 0:
> >     # process the fetched rows
> >     offset += limit
> >     cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
> >     rows = cursor.fetchall()
> >
> > cursor.close()
    > >
    > > As you can see, the MySQLdb version is more involved, due to the lack
    > > of real cursor support in the MySQL database. Any database with good
    > > cursor support will likely have good cursor iteration support in the
    > > corresponding DBAPI driver.
    > >
    > > Hope this helps,
    > >
    > > L. Daniel Burr
    > >

    > The MySQLdb solution you give is way more complicated than it needs to
    > be, thereby skewing your opinion towards cx_Oracle unnecessarily.
    >
> Look up the .fetchmany() method of cursors in the DB API. There is never
> any need to execute more than a single query, no matter how large the
> result set: you simply keep calling .fetchmany(N) (where N is whatever
> you've decided by testing is your optimum chunk size) until it returns
> fewer than N rows, at which point you have exhausted the query.
    >
    > It's very little more effort to wrap this all up as a generator that
    > effectively allows you to use the same solution as you quote for cx_Oracle.


MySQL will keep table locks until the results are all fetched, so even though
the DB API allows fetchone() and fetchmany(), using them with MySQLdb is
dangerous.

    -Jack
    Jack Diederich, Jun 5, 2006
    #4
  5. amberite Guest

    Steve Holden wrote:
    >
    > The MySQLdb solution you give is way more complicated than it needs to
    > be, thereby skewing your opinion towards cx_Oracle unnecessarily.
    >


I respectfully disagree with your assertion here. The code I presented
for MySQLdb is what you *have* to do to avoid using up too much memory.
This thread is about how to avoid running out of memory when handling
large resultsets, and the approach I outlined is really the only way to
do that with MySQLdb's DBAPI support.

    > Look up the .fetchmany() method of cursors in the DB API. There is only
    > any need to execute a single query no matter how large the result set:
    > you simply need to keep calling .fetchmany(N) (where N is whatever
    > you've decided by testing is your optimum chunk size) until it returns
    > less than N rows, at which point you have exhausted the query.
    >


.fetchmany() buys you nothing in the case of large resultsets. The
memory usage will continue to climb with each call to fetchmany. This
isn't the fault of MySQLdb; it is just that MySQL doesn't support
cursors, so MySQLdb has to fake it.

    > It's very little more effort to wrap this all up as a generator that
    > effectively allows you to use the same solution as you quote for cx_Oracle.
    >


    Again, I respectfully disagree. Your proposed solution, while
    resulting in shorter code, will continue to eat memory until the entire
    resultset has been delivered.

    L. Daniel Burr
    amberite, Jun 6, 2006
    #5
  6. John J. Lee Guest

    Jack Diederich <> writes:
    [...]
> MySQL will keep table locks until the results are all fetched, so even though
> the DB API allows fetchone() and fetchmany(), using them with MySQLdb is
> dangerous.

    [...]

    That's not true of InnoDB tables.


    John
    John J. Lee, Jun 6, 2006
    #6
  7. Guest

Whenever you are using a package that leaks memory, it can be
appropriate to use Rpyc (http://rpyc.wikispaces.com/) to run the
leaking code in a different process and restart it from time to time.
I've been using this method to avoid the leaks of matplotlib.
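
For the record, the same process-isolation idea can be sketched without Rpyc; this hypothetical example uses the multiprocessing module (which entered the standard library in Python 2.6, after this thread was written), with compute() standing in for the leaking code:

from multiprocessing import Process, Queue

def worker(args, out):
    # run the leaking code here; every byte it consumes is returned
    # to the OS when this child process exits
    out.put(compute(args))  # compute() is a stand-in for the leaky call

def run_isolated(args):
    out = Queue()
    p = Process(target=worker, args=(args, out))
    p.start()
    result = out.get()  # drain the queue before joining to avoid deadlock
    p.join()
    return result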
Jun 7, 2006
    #7
  8. John Hunter Guest

    >>>>> "yairchu@gmail" == yairchu@gmail com <> writes:

    yairchu@gmail> whenever you are using a package that leaks memory.
    yairchu@gmail> it can be appropriate to use Rpyc
    yairchu@gmail> (http://rpyc.wikispaces.com/) to run the leaking
    yairchu@gmail> code in a different process, and restart it from
    yairchu@gmail> time to time. I've been using this method to avoid
    yairchu@gmail> the leaks of matplotlib.

    The only known leak in matplotlib is in the tkagg backend which we
    believe comes from tkinter and is not in matplotlib proper. There are
    a variety of ways to make it look like matplotlib is leaking memory,
e.g. overplotting when you want to first clear the plot, or failing to
    close figures properly. We have unit tests to check for leaks, and
    they are passing. Perhaps you can post some code which exposes the
    problem.

    JDH
    John Hunter, Jun 7, 2006
    #8
  9. Guest

    wrote:
    > I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
    > and I find that the cursor.execute method uses a lot of memory that
    > never gets garbage collected. Using fetchmany instead of fetchall does
    > not seem to make any difference, since it's the execute that uses
    > memory. [...]


    For MySQLdb, the SSCursor class ("Server Side Cursor"), rather than the
    default cursor class, may do what you want: retrieve the result set
    row-by-row on demand, rather than all at once at the time of
.execute(). You'll need to remember to .fetch...() every row and call
.close(), however.

    See the docstrings for CursorUseResultMixIn and CursorStoreResultMixIn
    classes in MySQLdb.cursors for more information.
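
A minimal sketch of the SSCursor approach (the connection parameters, table, and process() handler are placeholders):

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host='localhost', user='user', passwd='secret',
                       db='test', cursorclass=MySQLdb.cursors.SSCursor)
cursor = conn.cursor()
cursor.execute('SELECT foo FROM Bar')
for row in cursor:  # rows stream from the server on demand
    process(row)    # hypothetical per-row handler
cursor.close()      # fetch every row and close before reusing the connection
conn.close()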

    > [...] Breaking the query down to build lots of small tables doesn't
    > help, since execute doesn't give its memory back, after reading enough
    > small tables execute returns a memory error. What is the trick to get
    > memory back from execute in cx_Oracle and MySQLdb?


    Generally, the trick is to avoid consuming the memory in the first
    place. :)

    Regards,
    Mike
Jun 11, 2006
    #9
