Geneator/Iterator Nesting Problem - Any Ideas? 2.4

C

ChaosKCW

Hi

Using Python 2.4 I am trying to procduce a generator which will return
the results of dbi SQL statement using fetchmany for performance.

So instead of fetching one record for each call, I will fetch batches
of X (eg 100) and yeild each record in turn.

For reasons of pure asthetics and my own learning I wanted it to look
like this:

</B>
def resultsetbatchgen(cursor, size=100):
for results in (recordbatch for recordbatch in
cursor.fetchmany(size)):
for rec in results:
yield rec
</B>

Problem is this this gives spurious results. To understand the problem
if I excute the following in the console I get the correct results:

1000
</B>

This is PERFECT! 1000 is what I expected. but now lets nest this inside
another for construct like so

cur.fetchmany(1000)):
.... print sum(map(lambda x: 1, (rec for rec in results)))
76
76
76
76
76
76
76
76
............
</B>

Now it thinks each batch size is 76 ... ? This completly wrong, and
baffling.

The commands are exactly the same as far as I can tell, the only
difference is that it is now nested wihtin another for loop?

Any help would be greatly aprpeciated.

PS a working but far less elegant version of this below but I would
like to understand why teh above doesnt work and doesnt work
consitantly:

def resultsetbatch(cursor, size=100):
results = cursor.fetchmany(size)
while results <> []:
for rec in results:
yield rec
results = cursor.fetchmany(size)
 
S

Steven Bethard

ChaosKCW said:
For reasons of pure asthetics and my own learning I wanted it to look
like this:

</B>
def resultsetbatchgen(cursor, size=100):
for results in (recordbatch for recordbatch in
cursor.fetchmany(size)):
for rec in results:
yield rec
</B>

Note that this is equivalent to:

def resultsetbatchgen(cursor, size=100):
for results in cursor.fetchmany(size):
for rec in results:
yield rec

That is, your generator expression isn't really doing anything.

This should be equivalent to
....
... print sum(map(lambda x: 1, (rec for rec in results)))

This is now equivalent to:.... print len(list(results))

Note that you're counting the number of elements in record, which I'm
guessing is 76?

I'm thinking you want to do something like:

def resultsetbatchgen(cursor, size=100):
while True:
results = cursor.fetchmany(size)
if not results:
break
for rec in results:
yield rec

I don't actually know how you tell when there isn't anything more to
fetch, but if it returns an empty list, you could also write this like:

def resultsetbatchgen(cursor, size=100):
for results in iter(lambda: cursor.fetchmany(size), [])
for rec in results:
yield rec


HTH,

STeVe
 
C

ChaosKCW

Hi

Thanks this was very helpfull.

Your final solution seems like the best (most elegant). I was trying to
avoid the ugly while loops with breaks and this final one certainly
does that.

Thanks for your help .
 
C

ChaosKCW

For those that are interested I ran a performance comparison of various
functions for wrapping sql results in a interables and generators. The
results are below and the code is at the very bottom.

Surprisinly (in a happy way) the functional version of the batch
retrieve comes in a very close second beating out two other common
iterative approaches. However the winner by a small margin is still an
iterative routine. The elegance of the more functional approach (thanks
to Steve) and its excellent performance makes it a clear winner in my
mind.

The other intersting conclusion is that batch fetching results gives a
fairly significant and real performance boost. Its not just academic.

The Winner:

def resultset_functional_batchgenerator(cursor, size=100):
for results in iter(lambda: cursor.fetchmany(size), []):
for rec in results:
yield rec


Test Results (P2.4 IBM T41 Thinkpad):
Ordered by: cumulative time
List reduced from 57 to 7 due to restriction <'test_'>

ncalls tottime percall cumtime percall filename:lineno(function)
1 2.140 2.140 54.002 54.002
PerformanceTestGenerators.py:102(test_resultset_functional_generator)
1 1.957 1.957 45.484 45.484
PerformanceTestGenerators.py:98(test_resultset_iterative_generator)
1 2.433 2.433 41.844 41.844
PerformanceTestGenerators.py:94(test_resultset_iterator)
1 1.930 1.930 39.793 39.793
PerformanceTestGenerators.py:110(test_resultset_iterative_batchgenerator_2)
1 1.734 1.734 35.561 35.561
PerformanceTestGenerators.py:114(test_resultset_iterative_batchgenerator_3)
1 1.980 1.980 34.579 34.579
PerformanceTestGenerators.py:118(test_resultset_functional_batchgenerator)
1 1.780 1.780 31.696 31.696
PerformanceTestGenerators.py:106(test_resultset_iterative_batchgenerator_1)

Code:

import unittest
import odbc
import profile
import pstats

class resultset_iterator:
"Iterate over the recordset and frees the cursor afterwards."
def __init__(self, cursor):
self.cursor = cursor
def __iter__(self): return self
def next(self):
rec = self.cursor.fetchone()
if not rec:
raise StopIteration
return rec

def resultset_iterative_generator(cursor):
rec = cursor.fetchone();
while rec:
yield rec
rec = cursor.fetchone();

def resultset_functional_generator(cursor):
for rec in iter(lambda: cursor.fetchone(), None):
yield rec

def resultset_iterative_batchgenerator_1(cursor, size=100):
results = cursor.fetchmany(size)
while results:
for rec in results:
yield rec
results = cursor.fetchmany(size)

def resultset_iterative_batchgenerator_2(cursor, arraysize=100):
'An iterator that uses fetchmany to keep memory usage down'
done = False
while not done:
results = cursor.fetchmany(arraysize)
if results == []:
done = True
for result in results:
yield result

def resultset_iterative_batchgenerator_3(cursor, size=100):
while True:
results = cursor.fetchmany(size)
if not results:
break
for rec in results:
yield rec

def resultset_functional_batchgenerator(cursor, size=100):
for results in iter(lambda: cursor.fetchmany(size), []):
for rec in results:
yield rec

class testResultSetGenerators(unittest.TestCase):

connectstring = "*REMOVED*"
sql = "*REMOVED*"

def setUp(self):
self.con = odbc.odbc(self.connectstring)
self.cur = self.con.cursor()
self.cur.execute(self.sql)

def tearDown(self):
self.cur.close()
self.con.close()

def test_resultset_iterator(self):
for row in resultset_iterator(self.cur):
pass

def test_resultset_iterative_generator(self):
for row in resultset_iterative_generator(self.cur):
pass

def test_resultset_functional_generator(self):
for row in resultset_functional_generator(self.cur):
pass

def test_resultset_iterative_batchgenerator_1(self):
for row in resultset_iterative_batchgenerator_1(self.cur):
pass

def test_resultset_iterative_batchgenerator_2(self):
for row in resultset_iterative_batchgenerator_2(self.cur):
pass

def test_resultset_iterative_batchgenerator_3(self):
for row in resultset_iterative_batchgenerator_3(self.cur):
pass

def test_resultset_functional_batchgenerator(self):
for row in resultset_functional_batchgenerator(self.cur):
pass

if __name__ == '__main__':
suite = unittest.makeSuite(testResultSetGenerators)
profile.run('unittest.TextTestRunner(verbosity=2).run(suite)',
'c:\\temp\\profile')

p = pstats.Stats('c:\\temp\\profile')
p.strip_dirs().sort_stats('cumulative').print_stats('test_')
 
P

Peter Otten

ChaosKCW said:
def resultset_functional_generator(cursor):
for rec in iter(lambda: cursor.fetchone(), None):
yield rec

This can be simplified to

def resultset_functional_generator(cursor):
return iter(cursor.fetchone, None)

It should be a bit faster, too.

Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top