how not to run out of memory in cursor.execute


johnlichtenstein

I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back; after reading enough
small tables, execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?
 

amberite

johnlichtenstein said:
I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. [...] What is the trick to get memory back
from execute in cx_Oracle and MySQLdb?

cx_Oracle and MySQLdb must be handled differently, because MySQL does
not actually have cursors (MySQLdb fakes them for you).

To handle large resultsets efficiently in cx_Oracle, simply use the
cursor iteration idiom:

for row in cursor:
    # do stuff with the row
    pass

cx_Oracle takes care of the fetching for you, and your memory usage
should remain fairly constant when using this idiom.
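
For what it's worth, the number of rows cx_Oracle pulls from the server
per round trip during that iteration is controlled by the standard DB API
cursor.arraysize attribute, so you can tune it against your row size. A
rough sketch (the connection object and the query are just placeholders):

cursor = connection.cursor()
cursor.arraysize = 500                 # rows fetched per round trip while iterating
cursor.execute('SELECT foo FROM Bar')
for row in cursor:
    # do stuff with the row
    pass
cursor.close()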

To handle large resultsets in MySQLdb, you have to resort to multiple
queries:

limit = 1000
offset = 0

cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
rows = cursor.fetchall()
while len(rows) > 0:
    # process the fetched rows
    offset += limit
    cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
    rows = cursor.fetchall()

cursor.close()

As you can see, the MySQLdb version is more involved, due to the lack
of real cursor support in the MySQL database. Any database with good
cursor support will likely have good cursor iteration support in the
corresponding DBAPI driver.

Hope this helps,

L. Daniel Burr
 

Steve Holden

amberite said:
cx_Oracle and MySQLdb must be handled differently, because MySQL does
not actually have cursors (MySQLdb fakes them for you).
[...]
As you can see, the MySQLdb version is more involved, due to the lack
of real cursor support in the MySQL database. Any database with good
cursor support will likely have good cursor iteration support in the
corresponding DBAPI driver.
The MySQLdb solution you give is way more complicated than it needs to
be, thereby skewing your opinion towards cx_Oracle unnecessarily.

Look up the .fetchmany() method of cursors in the DB API. You only ever
need to execute a single query, no matter how large the result set:
simply keep calling .fetchmany(N) (where N is whatever you've decided by
testing is your optimum chunk size) until it returns fewer than N rows,
at which point you have exhausted the query.

It's very little more effort to wrap this all up as a generator that
effectively allows you to use the same solution as you quote for cx_Oracle.
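
Such a generator might look roughly like this (iter_rows and its chunk
size are placeholder names, not part of the DB API; the query is the one
from the earlier example):

def iter_rows(cursor, query, params=None, chunk_size=1000):
    # Execute once, then hand rows out a chunk at a time via .fetchmany()
    if params is None:
        cursor.execute(query)
    else:
        cursor.execute(query, params)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row

for row in iter_rows(cursor, 'SELECT foo FROM Bar'):
    # do stuff with the row
    pass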

regards
Steve
 

Jack Diederich

Steve said:
Look up the .fetchmany() method of cursors in the DB API. [...] It's very
little more effort to wrap this all up as a generator that effectively
allows you to use the same solution as you quote for cx_Oracle.

MySQL will keep table locks until the results are all fetched, so even
though the DB API allows fetchone() or fetchmany(), using those with
MySQLdb is dangerous.

-Jack
 

amberite

Steve said:
The MySQLdb solution you give is way more complicated than it needs to
be, thereby skewing your opinion towards cx_Oracle unnecessarily.

I respectfully disagree with your assertion here. The code I presented
for MySQLdb is what you *have* to do, to avoid using up too much
memory. This thread is about how to avoid running out of memory when
handling large resultsets, and the approach I outlined is really the
only way to do that with MySQLdb's DBAPI support.
Look up the .fetchmany() method of cursors in the DB API. You only ever
need to execute a single query, no matter how large the result set:
simply keep calling .fetchmany(N) (where N is whatever you've decided by
testing is your optimum chunk size) until it returns fewer than N rows,
at which point you have exhausted the query.

.fetchmany() buys you nothing in the case of large resultsets. The
memory usage will continue to climb with each call to fetchmany. This
isn't the fault of MySQLdb; it is just that MySQL doesn't support
cursors, so MySQLdb has to fake it.
It's very little more effort to wrap this all up as a generator that
effectively allows you to use the same solution as you quote for cx_Oracle.

Again, I respectfully disagree. Your proposed solution, while
resulting in shorter code, will continue to eat memory until the entire
resultset has been delivered.

L. Daniel Burr
 

John J. Lee

Jack Diederich said:
MySQL will keep table locks until the results are all fetched so even though
the DB API allows fetchone() or fetchmany() using those with MySQLdb is
dangerous.
[...]

That's not true of InnoDB tables.


John
 

yairchu

Whenever you are using a package that leaks memory, it can be
appropriate to use Rpyc (http://rpyc.wikispaces.com/) to run the leaking
code in a different process and restart it from time to time.
I've been using this method to avoid the leaks of matplotlib.
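
Rpyc's API aside, the core idea of pushing the leaky calls into a worker
process that gets thrown away and respawned can be sketched with the
standard multiprocessing module (do_leaky_thing and work_items below are
placeholders for the leaking call and its inputs):

from multiprocessing import Pool

def leaky_work(item):
    # Calls into the leaking library; any leaked memory stays in the worker
    return do_leaky_thing(item)

if __name__ == '__main__':
    # maxtasksperchild=50 recycles the worker after 50 tasks, so the OS
    # reclaims whatever the library leaked when the old process exits
    pool = Pool(processes=1, maxtasksperchild=50)
    try:
        results = pool.map(leaky_work, work_items)
    finally:
        pool.close()
        pool.join()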
 

John Hunter

yairchu@gmail> whenever you are using a package that leaks memory.
yairchu@gmail> it can be appropriate to use Rpyc
yairchu@gmail> (http://rpyc.wikispaces.com/) to run the leaking
yairchu@gmail> code in a different process, and restart it from
yairchu@gmail> time to time. I've been using this method to avoid
yairchu@gmail> the leaks of matplotlib.

The only known leak in matplotlib is in the tkagg backend which we
believe comes from tkinter and is not in matplotlib proper. There are
a variety of ways to make it look like matplotlib is leaking memory,
e.g. overplotting when you want to first clear the plot, or failing to
close figures properly. We have unit tests to check for leaks, and
they are passing. Perhaps you can post some code which exposes the
problem.
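
For instance, a minimal sketch of the figure-closing point using the
pyplot interface (the loop and filenames are just an illustration):

import matplotlib.pyplot as plt

for i in range(100):
    fig = plt.figure()
    plt.plot(range(10))
    fig.savefig('frame%03d.png' % i)
    plt.close(fig)    # without this, every figure stays alive and memory grows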

JDH
 

mjp

I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. [...]

For MySQLdb, the SSCursor class ("Server Side Cursor"), rather than the
default cursor class, may do what you want: retrieve the result set
row-by-row on demand, rather than all at once at the time of
.execute(). You'll need to remember to .fetch...() every row and call
.close(), however.

See the docstrings for CursorUseResultMixIn and CursorStoreResultMixIn
classes in MySQLdb.cursors for more information.
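
A minimal sketch of that (the connection parameters and the query are
placeholders):

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host='localhost', user='user', passwd='secret',
                       db='test', cursorclass=MySQLdb.cursors.SSCursor)
cursor = conn.cursor()
cursor.execute('SELECT foo FROM Bar')
try:
    row = cursor.fetchone()
    while row is not None:
        # process the row; rows stream from the server instead of piling
        # up in client memory
        row = cursor.fetchone()
finally:
    cursor.close()    # fetch everything and close before issuing another query
conn.close()
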
[...] Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back; after reading enough
small tables, execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?

Generally, the trick is to avoid consuming the memory in the first
place. :)

Regards,
Mike
 
