Style question on recursive generators


Carlos Ribeiro

Hello all,

Here I am using some deeply nested, tree-like data structures. In some
situations I need to traverse the tree; the old-style way to do it is
to write a recursive method on the node class, as in:

def walk(self):
"""old-style recursive tree traversal"""
child.do_something
for child in childs:
child.walk()

Of course, "do_something" can be a configurable method or action on
the node, whose actual implementation doesn't matter for this
discussion. The recursive solution is clean and readable, and that's
why it's frequently preferred over the iterative solution, which (a)
requires manual housekeeping (using a stack) to work; and (b) is not
truly 'object oriented', in the sense that it does not delegate node
traversal behavior to the node being traversed.

In my recent projects I've been tending towards a greater use of
generators. However, when a generator is used in the situation
described above, the end result does not 'read' as naturally:

def walk(self):
    """generator-based recursive tree traversal"""
    yield child
    for child in childs:
        for node in child.walk():
            yield node

The problem is that I need to 'chain' the yields from the subtrees to
yield back the result to the caller. It does not seem to be as elegant
as the simple recursive version; it's confusing to read, because one
has to figure out what is being yielded in the inner loop, and it's
also error-prone -- it's easy to forget to include the 'yield' in the
inner loop.

I'm now wondering which is better: to keep using old-style recursive
tree traversal code, or to use the chained-yield generator solution.
It's more of a stylistic decision -- I'm not concerned about raw
performance at this point; clarity and elegance of design matter more.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 

Daniel Dittmar

Carlos said:
Hello all,

Here I am using some deeply nested, tree-like data structures. In some
situations I need to traverse the tree; the old-style way to do it is
to write a recursive method on the node class, as in:

def walk(self):
    """old-style recursive tree traversal"""
    child.do_something
    for child in childs:
        child.walk()
[...]

def walk(self):
    """generator-based recursive tree traversal"""
    yield child
    for child in childs:
        for node in child.walk():
            yield node
[...]
I'm now wondering which is better: to keep using old-style recursive
tree traversal code, or to use the chained-yield generator solution.
It's more of a stylistic decision -- I'm not concerned about raw
performance at this point; clarity and elegance of design matter more.

You must also consider the code that uses the iteration, at least if you
want to walk the structure for multiple purposes:

# In the recursive case
def nodeHandler (node):
    node.do_something ()

root.walk (nodeHandler)

# in the generator case
for node in root.walk ():
    node.do_something ()

And the latter can also be fed into routines that expect an iterator.

Compare also os.path.walk with os.walk.
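
For instance, os.path.walk forces the callback style, while os.walk,
being a generator, composes with anything that consumes an iterator (a
sketch; the paths under '.' are just whatever happens to be on disk):

import os
from itertools import islice

def visit(arg, dirname, names):
    # os.path.walk calls back with (arg, dirname, names)
    arg.append(dirname)

collected = []
os.path.walk('.', visit, collected)   # callback style

# generator style: take just the first five (dirpath, dirnames,
# filenames) triples without walking the whole tree
first_five = list(islice(os.walk('.'), 5))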

Daniel
 

Carlos Ribeiro

You must also consider the code that uses the iteration, at least if you
want to walk the structure for multiple purposes

<snip>

Compare also os.path.walk with os.walk.

Daniel

Yes, that's a good point, and it's one of the reasons I tend to prefer
the generator now. But anyway, is it just me, or does the 'chained
yield' look ugly to anyone else? It may be just a question of getting
used to it, but nevertheless it's bugging me...


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 

Alex Martelli

Carlos Ribeiro said:
Yes, that's a good point, and it's one of the reasons I tend to prefer
the generator now. But anyway, is it just me, or does the 'chained
yield' look ugly to anyone else? It may be just a question of getting
used to it, but nevertheless it's bugging me...

I'm not sure why it should look ugly to you. What alternatives would
you consider non-ugly for the common (in any generator) idiom:
for x in y: yield x
?

Perhaps some syntax hack such as a new statement keyword 'yieldall'
where the above idiom could be expressed as
yieldall y
? Or some other syntax hack such as 'yield<<y', or 'yield from y',
ditto? Personally, it would seem to me that such special syntax would
be just one more bit of Python to learn, without substantial benefits,
but that's because I see nothing ugly in today's idiom -- it does just
what it says, plainly and unpretentiously...


Alex
 

Carlos Ribeiro

I'm not sure why it should look ugly to you. What alternatives would
you consider non-ugly for the common (in any generator) idiom:
for x in y: yield x
?

Perhaps some syntax hack such as a new statement keyword 'yieldall'
where the above idiom could be expressed as
yieldall y
? Or some other syntax hack such as 'yield<<y', or 'yield from y',
ditto? Personally, it would seem to me that such special syntax would
be just one more bit of Python to learn, without substantial benefits,
but that's because I see nothing ugly in today's idiom -- it does just
what it says, plainly and unpretentiously...

Generally speaking, new keywords hardly solve any problem; in fact,
they create new problems of their own :) In this case, my main gripe
is that the code doesn't read as naturally as the simple recursive
version, because of the double for loop. It's also strange because I
often have to stop and think, "what am I yielding now?". For deeply
nested recursive cases there is also the feeling that the yield chain
may prove to be a performance dog, but that should not be the main
point here anyway.

I have a strange idea in the back of my mind that tells me that a
possible solution would be to have any of the nested yields
immediately return the result to the original caller. In this case, I
would write my code this way:

def walk(self):
    """generator-based recursive tree traversal"""
    yield child
    for child in childs:
        transfer child.walk()

It's quite weird -- a generator plus a direct flow-of-execution
transfer. It's not a goto, it's not a co-routine... It requires a new
keyword ('transfer', in this case), which _is_ a real problem --
something like this can't be lightly proposed. It's just a thought
experiment at this point; nothing serious, and far from a proposal.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 

Steve Holden

Carlos said:
Generally speaking, new keywords hardly solve any problem; in fact,
they create new problems of their own :) In this case, my main gripe
is that the code doesn't read as naturally as the simple recursive
version, because of the double for loop. It's also strange because I
often have to stop and think, "what am I yielding now?". For deeply
nested recursive cases there is also the feeling that the yield chain
may prove to be a performance dog, but that should not be the main
point here anyway.

I have a strange idea in the back of my mind that tells me that a
possible solution would be to have any of the nested yields
immediately return the result to the original caller. In this case, I
would write my code this way:

def walk(self):
    """generator-based recursive tree traversal"""
    yield child
    for child in childs:
        transfer child.walk()

It's quite weird -- a generator plus a direct flow-of-execution
transfer. It's not a goto, it's not a co-routine... It requires a new
keyword ('transfer', in this case), which _is_ a real problem --
something like this can't be lightly proposed. It's just a thought
experiment at this point; nothing serious, and far from a proposal.

I can see why this might seem attractive, but in point of fact it would
do little more than eliminate tail recursion, and would have similar
disadvantages - it wouldn't be possible to get useful tracebacks in the
event that problems occurred.

If a data structure is recursive then a recursive generator or a
recursive function is often the best way to handle it, and yielding a
result to another yield statement (at however many levels) surely isn't
problematic enough to remove a desirable regularity from the language
and its implementations.

regards
Steve
 

Stefan Behnel

Carlos said:
def walk(self):
    """generator-based recursive tree traversal"""
    yield child
    for child in childs:
        transfer child.walk()

It's quite weird -- a generator plus a direct flow-of-execution
transfer. It's not a goto, it's not a co-routine... It requires a new
keyword ('transfer', in this case), which _is_ a real problem --
something like this can't be lightly proposed. It's just a thought
experiment at this point; nothing serious, and far from a proposal.


I know what it feels like. I had to implement the same thing for a
recursive parser once - it looks somewhat inefficient to iterate over an
iterator only to yield the iterator's results...

Maybe you know the itertools module?

http://www.python.org/dev/doc/devel/lib/itertools-functions.html

It has some nice functions that can make this sort of code more readable
(and maybe even more efficient). You may find itertools.chain especially
useful.
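
For instance, the chaining can be pushed into itertools.chain itself --
a sketch, assuming the same 'childs' attribute as in your example:

from itertools import chain

def walk(self):
    """recursive traversal via itertools.chain"""
    # yield self first, then everything each child's walk() produces
    return chain([self], *[child.walk() for child in self.childs])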

Stefan
 

Carlos Ribeiro

I know what it feels like. I had to implement the same thing for a
recursive parser once - it looks somewhat inefficient to iterate over an
iterator only to yield the iterator's results...

Maybe you know the itertools module?

http://www.python.org/dev/doc/devel/lib/itertools-functions.html

It has some nice functions that can make this sort of code more readable
(and maybe even more efficient). You may find itertools.chain especially
useful.

Thanks -- that's a really good reference. I think I should read more
of the documentation instead of banging out my own home-made solution
:) Anyway, it does not solve the recursion problem per se, but it
gave me some ideas on how I can rewrite the code. I'll try it later...


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 

Alex Martelli

On 2004 Oct 18, at 16:57, Carlos Ribeiro wrote:
...
def walk(self):
    """generator-based recursive tree traversal"""
    yield child
    for child in childs:

Btw, what are these 'child' and 'childs' -- _globals_? I assume you must
mean something like self and self.children, but as that may become
important in further discussion it would be nice to be more precise in
the example.
        transfer child.walk()

It's quite weird -- a generator plus a direct flow-of-execution
transfer. It's not a goto, it's not a co-routine... It requires a new
keyword ('transfer', in this case), which _is_ a real problem --

How does the semantics of this differ from those of
for x in child.walk(): yield x
? I don't know what a "direct flow of execution transfer" _is_, since
when the 'transfer' is finished execution had better come back here.
So, I fail to appreciate the difference, if any, between:
for x in y: yield x
and
transfer y
which is just the same thing I had mentioned (under the guise of
'yieldall' in lieu of 'transfer', but that's mere sugar) in the post
you're replying to. And I think that a combination such as
yield from y
besides not requiring a new keyword might be even more readable. I'm
using y as just a placeholder here, it would be "yield from
child.walk()" in your case.

If the issue is strictly one of whether it's worth somehow making
for x in y: yield x
terser (by a new kw, combination of kw, and/or punctuation) I guess we
can discuss that (I just don't see the gain to offset the price of
introducing new syntax and one more language thing to learn for
Pythonistas and newcomers to Python). If you mean something different
for your 'transfer' then maybe you could clarify _what_...


Alex
 

Carlos Ribeiro

On 2004 Oct 18, at 16:57, Carlos Ribeiro wrote:
...

Btw, what are these 'child' and 'childs' -- _globals_? I assume you must
mean something like self and self.children, but as that may become
important in further discussion it would be nice to be more precise in
the example.

Sorry. It's pseudo code, but it's _so_ obviously broken :-( Should
have taken more care...

def walk(self):
    """generator-based recursive tree traversal"""
    yield self
    for child in self.childs:
        transfer child.walk()
How does the semantics of this differ from those of
for x in child.walk(): yield x

I was thinking about something along the lines of tail recursion
elimination -- sort of, I mean. Something like "call the child.walk()
code, but yield results directly to the original caller". So you're
right, this is just syntactic sugar, but bad sugar at that :) (one with
a bitter taste, that is).

But rest assured that I'm now convinced that the original recursive
generator is the way to go, and my discomfort with it is not
justified. It seemed weird at first, as if I was doing something wrong
-- having to loop over an iterator just to return its members. But
that's just fine.
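
For the archive, here is a cleaned-up, runnable rendering of the
example (Node, 'name' and the little tree below are of course just
placeholders for the real structure):

class Node:
    def __init__(self, name, childs=()):
        self.name = name
        self.childs = list(childs)

    def walk(self):
        """generator-based recursive tree traversal"""
        yield self
        for child in self.childs:
            for node in child.walk():
                yield node

tree = Node('root', [Node('a', [Node('a1')]), Node('b')])
assert [node.name for node in tree.walk()] == ['root', 'a', 'a1', 'b']
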
If the issue is strictly one of whether it's worth somehow making
for x in y: yield x
terser (by a new kw, combination of kw, and/or punctuation) I guess we
can discuss that (I just don't see the gain to offset the price of
introducing new syntax and one more language thing to learn for
Pythonistas and newcomers to Python). If you mean something different
for your 'transfer' then maybe you could clarify _what_...

In the future, and _if_ this idiom (recursive generators) becomes
common enough, maybe it's justified to talk about a new idiom, method
or keyword to support/facilitate it. Not now, really.


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 

Dima Dorfman

Alex Martelli said:
If the issue is strictly one of whether it's worth somehow making
for x in y: yield x
terser (by a new kw, combination of kw, and/or punctuation) I guess we
can discuss that (I just don't see the gain to offset the price of
introducing new syntax and one more language thing to learn for
Pythonistas and newcomers to Python).

I find myself writing a lot of recursive generators, and a while ago I
implemented this:

yield *other

to mean:

for x in other:
    yield x

No new keywords, and the asterisk is already used for a similar
purpose (iterator expansion) in the function call syntax.
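
That is, the analogy with the existing call-syntax expansion:

def f(a, b, c):
    return (a, b, c)

args = [1, 2, 3]
f(*args)         # existing: expand an iterable into call arguments
# yield *other   # proposed: expand an iterable into yielded values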

Thoughts? "Neat!" or "Yuck!"?
 

Andrea Griffini

I find myself writing a lot of recursive generators, and a while ago I
implemented this:

yield *other

to mean:

for x in other:
    yield x

No new keywords, and the asterisk is already used for a similar
purpose (iterator expansion) in the function call syntax.

Thoughts? "Neat!" or "Yuck!"?

An ignored "Neat" for me

Andrea
 

Alex Martelli

Dima Dorfman said:
I find myself writing a lot of recursive generators, and a while ago I
implemented this:

yield *other

to mean:

for x in other:
    yield x

No new keywords, and the asterisk is already used for a similar
purpose (iterator expansion) in the function call syntax.

Thoughts? "Neat!" or "Yuck!"?

Looks to me roughly equivalent to such proposals as 'yield from other';
not totally out of whack but IMHO not warranted. Re specifically
expanding the use of prefix *, I think there might be other good uses
for that -- say:
a = b, *c, d
meaning the same as
a = (b,) + tuple(c) + (d,)
or the old, currently out of favour proposal
a, b, c, *d = e
where d gets (an iterator for) 'the rest of e' after the first three
items of e are unpacked to a, b, and c. If your idea to specialcase the
* after a yield can work together with these other possibilities, the
whole package might perhaps get in together.


Alex
 

Dima Dorfman

Alex said:
say:
a = b, *c, d
meaning the same as
a = (b,) + tuple(c) + (d,)
or the old, currently out of favour proposal
a, b, c, *d = e

Both of these are intuitive to me. I'm generally in favor of expanding
the use of '*' to mean aggregation in this manner, but someone else is
gonna have to organize the crusade if they want one.
where d gets (an iterator for) 'the rest of e' after the first three
items of e are unpacked to a, b, and c.

FWIW, the implementation for the unpacking case is mostly
straightforward (I did one a while back). The only caveat is what to do
with *d. You suggest storing an iterator for the rest of e, but this is
not useful if e is a real sequence, like a tuple. It's easy enough to
special-case tuples and lists, but what about other user-defined
types? Maybe slice off the rest? Real iterators (iter(x) is x) would
have to be treated specially in any case.
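
FWIW, the iterator flavor is easy enough to emulate by hand today (a
sketch with itertools.islice; the names here are just for
illustration):

from itertools import islice

e = range(10)
it = iter(e)
a, b, c = islice(it, 3)   # the first three items of e
d = it                    # an iterator over the rest of e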
 

Alex Martelli

Dima Dorfman said:
Both of these are intuitive to me. I'm generally in favor of expanding
the use of '*' to mean aggregation in this manner, but someone else is
gonna have to organize the crusade if they want one.

It will take an able and subtle spokesman, for it IS currently out of
favour (and the prospect of no language changes in 2.4->2.5, just like
we had none in 2.2->2.3, appears quite possible).

FWIW, the implementation for the unpacking case is mostly
straightforward (I did one a while back). The only caveat is what to do
with *d. You suggest storing an iterator for the rest of e, but this is
not useful if e is a real sequence, like a tuple. It's easy enough to
special-case tuples and lists, but what about other user-defined
types? Maybe slice off the rest? Real iterators (iter(x) is x) would
have to be treated specially in any case.

I hate, detest, and abhor specialcasing anything. "Special cases are
not special enough to break the rules". 'filter' (which specialcases
some kinds of sequences, while returning a list for all others) is a
good example of the kind of thing I believe should be avoided.

Leaving d as an iterator over the rest of e is something that is
extremely simple to explain and implement. Explaining that 'd is a list,
string or tuple if that's the type of e; otherwise, if e supplies
__getitem__ AND that getitem accepts a slice object with None values for
stop and step, or alternatively if e supplies a __getslice__ method
that's callable with None as the stop argument, then d is e[3:]; in
other cases, d is an iterator over the rest of e' is just the kind of
thing that sends shivers through every fiber of my being. Maybe it's
because I've had to write enough of those GD 'caveats' in the Nutshell,
enough to show me that there are already too many special cases... I
want to _get rid_ of the existing ones (once backwards compatibility can
be broken), definitely *NOT* add more 'convenient' ones...!!!-)

Saving a once-in-a-blue-moon 'd = list(d)' followup statement, or the
like, is just not worth this kind of aggravation, IMHO.


Alex
 

Carlos Ribeiro

I find myself writing a lot of recursive generators, and a while ago I
implemented this:

yield *other

to mean:

for x in other:
    yield x

Just a clarification. You _implemented_ it as a patch to Python, or
using some clever trick that I'm not aware of? Or did you just write a
specification?


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 

Greg Ewing

Alex said:
Looks to me roughly equivalent to such proposals as 'yield from other';
not totally out of whack but IMHO not warranted.

If it's a matter of whether a special syntax for this
is justified, it might open the way for some kind of
optimisation to bypass the overhead of the extra
for-loop.
 

Alex Martelli

Greg Ewing said:
If it's a matter of whether a special syntax for this
is justified, it might open the way for some kind of
optimisation to bypass the overhead of the extra
for-loop.

What "extra" for-loop? Sorry, must be missing something here.

Anyway, the bytecode that idiom makes is something like:

  3 LOAD_FAST            0 (other)
  6 GET_ITER
  7 FOR_ITER            10 (to 20)
 10 STORE_FAST           1 (x)
 13 LOAD_FAST            1 (x)
 16 YIELD_VALUE
 17 JUMP_ABSOLUTE        7

where of course [3] is to be replaced by the computation of 'other' by
whatever means. The pattern in 6..17 doesn't seem hard to recognize if
we want to optimize commonly occurring blocks of bytecode into something
faster -- assuming this IS commonly occurring, of course, and that there
is a worthwhile optimization (maybe into a new bytecode, or something).
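
For the record, a listing like the one above can be regenerated with
the dis module (exact offsets and opcodes vary a bit across Python
versions):

import dis

def g(other):
    for x in other:
        yield x

dis.dis(g)   # prints the loop: GET_ITER, FOR_ITER, ..., YIELD_VALUE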

But introducing special syntax just to replace the need for an
optimizer's looking into bytecode patterns leaves me a bit dubious.
Also, I think there may be other more common patterns to optimize with
higher priority/returns than this one (just guessing).

I wouldn't mind if 2.4->2.5 focused (besides the library) on
optimization (perhaps conditional on -O, which right now doesn't _do_
much;-) -- and peephole optimizing the bytecode looks like something
that might well be worthwhile. I know of attempts to do that in the
past, and as I recall they weren't very fruitful, but the internals have
changed enough since then that this might now be different.


Alex
 

Dima Dorfman

Carlos Ribeiro said:
Just a clarification. You _implemented_ it as a patch to Python, or
using some clever trick that I'm not aware of? Or did you just write a
specification?

As an [incomplete] patch against CPython.
 
