psycopg2 & large result set

J

Jon Clements

Hi All.

I'm using psycopg2 to retrieve results from a rather large query (it
returns 22m records); unsurprisingly this doesn't fit in memory all at
once. What I'd like to achieve is something similar to a .NET data
provider I have which allows you to set a 'FetchSize' property; it
then retrieves 'n' many rows at a time, and fetches the next 'chunk'
after you read past the end of the current chunk. I suppose I could
use Python for .NET or IronPython but I'd rather stick with CPython
2.5 if possible.

I'm not 100% sure if it's an interface or a server thing. Any ideas
are most welcome.

Cheers,

Jon.
 
R

Rob Wolfe

Jon said:
Hi All.

I'm using psycopg2 to retrieve results from a rather large query (it
returns 22m records); unsurprisingly this doesn't fit in memory all at
once. What I'd like to achieve is something similar to a .NET data
provider I have which allows you to set a 'FetchSize' property; it
then retrieves 'n' many rows at a time, and fetches the next 'chunk'
after you read past the end of the current chunk. I suppose I could
use Python for .NET or IronPython but I'd rather stick with CPython
2.5 if possible.

psycopg2 is DB-API 2.0 [1]_ compliant, so you can use
``fetchmany`` method and set ``cursor.arraysize`` accordingly.

[1] .. http://www.python.org/dev/peps/pep-0249/
 
P

Paul Boddie

I'm using psycopg2 to retrieve results from a rather large query (it
returns 22m records); unsurprisingly this doesn't fit in memory all at
once. What I'd like to achieve is something similar to a .NET data
provider I have which allows you to set a 'FetchSize' property; it
then retrieves 'n' many rows at a time, and fetches the next 'chunk'
after you read past the end of the current chunk. I suppose I could
use Python for .NET or IronPython but I'd rather stick with CPython
2.5 if possible.

I'm not 100% sure if it's an interface or a server thing. Any ideas
are most welcome.

It's an interface thing. The DB-API has fetchone, fetchmany and
(optionally) iteration methods on cursor objects; PostgreSQL supports
what you have in mind; pyPgSQL supports it at the interface level, but
psycopg2 only supports it if you use "named cursors", which is not
part of the DB-API specification as far as I recall, and not
particularly convenient if you're thinking of targeting more than one
database system with the same code. See this bug filed against
psycopg2 and the resulting discussion:

http://www.initd.org/tracker/psycopg/ticket/158

I've been running a patched version of psycopg2, but haven't developed
the patch further since it may be more convenient for me to switch
back to pyPgSQL eventually.

Paul
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,527
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top