performance of recursive generator

A

aurora

I love generator and I use it a lot. Lately I've been writing some
recursive generator to traverse tree structures. After taking closer look
I have some concern on its performance.

Let's take the inorder traversal from
http://www.python.org/peps/pep-0255.html as an example.

def inorder(t):
if t:
for x in inorder(t.left):
yield x
yield t.label
for x in inorder(t.right):
yield x

Consider a 4 level deep tree that has only a right child:

1
\
2
\
3
\
4


Using the recursive generator, the flow would go like this:

main gen1 gen2 gen3 gen4
----------------------------------------------------

inorder(1..4)

yield 1
inorder(2..4)
yield 2
yield 2
inorder(3..4)
yield 3
yield3
yield 3
inorder(4)
yield 4
yield 4
yield 4
yield 4


Note that there are 4 calls to inorder() and 10 yield. Indeed the
complexity of traversing this kind of tree would be O(n^2)!


Compare that with a similar recursive function using callback instead of
generator.

def inorder(t, foo):
if t:
inorder(t.left, foo):
foo(t.label)
inorder(t.right, foo):


The flow would go like this:

main stack1 stack2 stack3 stack4
----------------------------------------------------

inorder(1..4)
foo(1)
inorder(2..4)
foo(2)
inorder(3..4)
foo(3)
inorder(4)
foo(4)


There will be 4 calls to inorder() and 4 call to foo(), give a reasonable
O(n) performance.

Is it an inherent issue in the use of recursive generator? Is there any
compiler optimization possible?
 
M

Matt Hammond

Is it an inherent issue in the use of recursive generator? Is there any
compiler optimization possible?

Hi, I could be misunderstanding it myself, but I think the short answer to
your question is that its an inherent limitation.
def inorder(t):
if t:
for x in inorder(t.left):
yield x
yield t.label
for x in inorder(t.right):
yield x

Using the generator you pass the tree node labels back up through each
nested generator - therefore the deeper you are, obviously, the more
yields it will take to pass the data back up.

As you seem to have realised, you are, in effect traversing back up the
tree from every node you visit, in order to deliver the data.
def inorder(t, foo):
if t:
inorder(t.left, foo):
foo(t.label)
inorder(t.right, foo):

Inherently, by using callbacks you are shortcutting this - delivering
direct. (I assume foo is a list you're building or something similar) Of
course there might be some cunning alternative formulation, but I can't
think of it :)

To optimise this away, I guess python would need to see this:
for x in inorder(t.left):
yield x

And understand that you're just asking it to pass on all yields from a
nested generator as if they were coming from this generator.

Perhaps if there existed some kind of syntax to hint this to python it
could optimise it away, eg:

yield *inorder(t.left)

.... but AFAIK there isn't :-( so I guess you'll have to avoid recursive
generators for this app!



Matt
 
D

Duncan Booth

aurora wrote:

Note that there are 4 calls to inorder() and 10 yield. Indeed the
complexity of traversing this kind of tree would be O(n^2)!
There will be 4 calls to inorder() and 4 call to foo(), give a
reasonable O(n) performance.

Is it an inherent issue in the use of recursive generator? Is there
any compiler optimization possible?

Do you actually have timing evidence that this is a problem with the data
you are processing? The complexity of the algorithm may be important for
very large data sets, but until you time it you won't know whether it is
likely to be an issue for realistic data.

Your O(n^2) for the generator example is only n^2 if the tree is, as in
your example, completely unbalanced. If you can keep your binary tree
balanced it is n log n. If the tree is likely to become as wildly
unbalanced as in your example then you should consider using a different
data structure (e.g. a btree) where O(n log n) would be the worst case.

If you make the more realistic comparison of n calls to foo, versus nlogn
yields the question is at what point the greater number of fast yields
outweighs the lesser number of slow Python function calls. Until you know
speed really is an issue, go with whichever method makes the programming
easier.
 
T

Terry Reedy

aurora said:
I love generator and I use it a lot. Lately I've been writing some
recursive generator to traverse tree structures. After taking closer look
I have some concern on its performance.

When the stacked yields become a measureable problem over the anticipated
future use of the code, then you can rewrite the obviously correct
recursive generator as a less-obviously correct iterative generator, using
the first to generate test results to check the second.

Terry J. Reedy
 
S

Steven Bethard

aurora said:
I love generator and I use it a lot. Lately I've been writing some
recursive generator to traverse tree structures. After taking closer
look I have some concern on its performance.

Let's take the inorder traversal from
http://www.python.org/peps/pep-0255.html as an example.

def inorder(t):
if t:
for x in inorder(t.left):
yield x
yield t.label
for x in inorder(t.right):
yield x

Consider a 4 level deep tree that has only a right child:

1
\
2
\
3
\
4


Using the recursive generator, the flow would go like this:

main gen1 gen2 gen3 gen4
----------------------------------------------------

inorder(1..4)

yield 1
inorder(2..4)
yield 2
yield 2
inorder(3..4)
yield 3
yield3
yield 3
inorder(4)
yield 4
yield 4
yield 4
yield 4


Note that there are 4 calls to inorder() and 10 yield. Indeed the
complexity of traversing this kind of tree would be O(n^2)!

You seem to be assuming that a yield statement and a function call are
equivalent. I'm not sure that's a valid assumption.

Anyway, here's some data to consider:

-------------------- test.py --------------------
def gen(n):
if n:
for i in gen(n/2):
yield i
yield n
for i in gen(n/2):
yield i

def gen_wrapper(n):
return list(gen(n))

def nongen(n, func):
if n:
nongen(n/2, func)
func(n)
nongen(n/2, func)

def nongen_wrapper(n):
result = []
nongen(n, result.append)
return result
-------------------------------------------------

D:\steve>python -m timeit -s "import test" "test.gen_wrapper(100)"
1000 loops, best of 3: 395 usec per loop

D:\steve>python -m timeit -s "import test" "test.nongen_wrapper(100)"
1000 loops, best of 3: 216 usec per loop

D:\steve>python -m timeit -s "import test" "test.gen_wrapper(1000)"
100 loops, best of 3: 3.58 msec per loop

D:\steve>python -m timeit -s "import test" "test.nongen_wrapper(1000)"
1000 loops, best of 3: 1.59 msec per loop

D:\steve>python -m timeit -s "import test" "test.gen_wrapper(10000)"
10 loops, best of 3: 70.5 msec per loop

D:\steve>python -m timeit -s "import test" "test.nongen_wrapper(10000)"
10 loops, best of 3: 26.6 msec per loop

The non-generator version is consistently faster, somewhere around twice
as fast.

Of course, I'll still be writing generator-based solutions util I'm
certain the recursion's the speed bottleneck in my application.

STeVe
 
A

aurora

Hi, I could be misunderstanding it myself, but I think the short answer
to your question is that its an inherent limitation.
....

Perhaps if there existed some kind of syntax to hint this to python it
could optimise it away, eg:

yield *inorder(t.left)

... but AFAIK there isn't :-( so I guess you'll have to avoid recursive
generators for this app!


That would be unfortunately. I think generator is most elegant in
traversing recursive structure. It is non-trivial to use most other
methods. But the O(n^2) price tag is a big caveat to keep in mind.

Of course I agree we should not optimize prematurely. I'm not about to
rewrite my recursive generators just yet. But O(n^2) complexity is
something important to bear in mind. It doesn't necessary cause problems
in practice. But it might.
 
A

aurora

You seem to be assuming that a yield statement and a function call are
equivalent. I'm not sure that's a valid assumption.

I don't know. I was hoping the compiler can optimize away the chain of
yields.
Anyway, here's some data to consider:

-------------------- test.py --------------------
def gen(n):
if n:
for i in gen(n/2):
yield i
yield n
for i in gen(n/2):
yield i

def gen_wrapper(n):
return list(gen(n))

def nongen(n, func):
if n:
nongen(n/2, func)
func(n)
nongen(n/2, func)

def nongen_wrapper(n):
result = []
nongen(n, result.append)
return result
-------------------------------------------------

This test somehow water down the n^2 issue. The problem is in the depth of
recursion, in this case it is only log(n). It is probably more interesting
to test:

def gen(n):
if n:
yield n
for i in gen(n-1):
yield i
 
S

Steven Bethard

aurora said:
This test somehow water down the n^2 issue. The problem is in the depth
of recursion, in this case it is only log(n). It is probably more
interesting to test:

def gen(n):
if n:
yield n
for i in gen(n-1):
yield i

You should be able to do this yourself ;) but here's the results anyway:

-------------------- test.py --------------------
def gen(n):
if n:
yield n
for i in gen(n-1):
yield i

def gen_wrapper(n):
return list(gen(n))

def nongen(n, func):
if n:
func(n)
nongen(n-1, func)

def nongen_wrapper(n):
result = []
nongen(n, result.append)
return result
-------------------------------------------------

D:\steve>python -m timeit -s "import test" "test.gen_wrapper(10)"
10000 loops, best of 3: 22.3 usec per loop

D:\steve>python -m timeit -s "import test" "test.nongen_wrapper(10)"
100000 loops, best of 3: 12 usec per loop

D:\steve>python -m timeit -s "import test" "test.gen_wrapper(20)"
10000 loops, best of 3: 60.8 usec per loop

D:\steve>python -m timeit -s "import test" "test.nongen_wrapper(20)"
10000 loops, best of 3: 20.9 usec per loop

D:\steve>python -m timeit -s "import test" "test.gen_wrapper(100)"
1000 loops, best of 3: 1.01 msec per loop

D:\steve>python -m timeit -s "import test" "test.nongen_wrapper(100)"
10000 loops, best of 3: 93.3 usec per loop

It does appear that you get O(N**2)-ish behavior, but the difference
isn't horribly noticeable at a depth of 10 or 20. How deep are your trees?

I also have to mention that, for this kind of problem, it's silly to use
any recursive solution, when there's a faster non-recursive solution:

-------------------- test.py --------------------
....
def nonrecur(n):
result = []
append = result.append
while n:
append(n)
n -= 1
return result
-------------------------------------------------

D:\steve>python -m timeit -s "import test" "test.nonrecur(10)"
100000 loops, best of 3: 6.26 usec per loop

D:\steve>python -m timeit -s "import test" "test.nonrecur(100)"
10000 loops, best of 3: 37.9 usec per loop

Sure, this only works on non-branching trees, but if that's what you're
worried about, use the solution designed for it. ;)

STeVe
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,022
Latest member
MaybelleMa

Latest Threads

Top