Walking deeply nested lists


D

donn

This is all about walking trees, recursion and generators. None of which
fit my brain at all!

From an XML tree (an SVG file) I build a bunch of Tag objects.

[I use lxml, but I am combining multiple svg files into a 'forest' of
trees, so I can't use the lxml walking methods because they all stop at
the 'end' of a branch where there is actually a 'link' over to another
tree.]

Each Tag has a flatwalk() method. The return from that is a list which
can be deeply nested. As an example, something like this:
L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]

(The numbers in the example would actually be Tag Instances.)

Aim 1
---
I'm trying to write a generator around such lists so that I can 'walk' them:

for parent, child in mystery_generator(L):
print level, item

Right now, I am running a 'flatten' function on the entire list (which I
don't savvy, I found it online) and then I iterate over that flattened
list. This doesn't feel right to me. (Code at end.)

Aim 2
---
The Objects that represent the svg tags use that flatwalk() method to
build the nested list, I'd far prefer flatwalk() to be directly useable
in something like this way:

for container, child in some_tag.flatwalk():
print container, child

The flatwalk() function is (and this is another puzzle see *) kind of
recursive. "For each of my children, tell that child to go flatwalk()".
(Code at end.)
I am not sure how to turn such a thing into a generator.

I keep wondering how os.walk() does its thing. My hierarchy of Tag
objects can be thought of as directories (tags:g, symbol, svg), files
(path, circle, etc.) and soft-links (use tags).

* If an Instance calls a method on *another* Instance of the *same*
class, is this still recursion? And how does this 'stack up'? I mean,
literally, on the stack. Does each instance get its own stack, or does
all the push, call, pop stuff happen in one main stack?
(I worry about recursion depth limits because svg trees can get quite deep.)


The walking code so far:

## Found code.
def flatten(input):
output = []
stack = []
stack.extend(reversed(input))
while stack:
top = stack.pop()
if isinstance(top, list):
stack.extend(reversed(top))
else:
output.append(top)
return output

## My walker
def test_walk(e): #lxml Element comes in
obj = ebag_get(e)['obj'] #get a tag object
l=obj.flatwalk()
ll= flatten(l)
#No current solution to get 'parent' in this iterator
#ie. for parent, child in ...
for tag in ll:
print tag

Here's one of my Tag objects:

class Brancher(object):
def __init__(self, elem):
self.elem = elem
self.children = []

def flatwalk(self):
l=[self]
for child in self.children:
l.append( child.flatwalk() ) #recur(ish)ion here
return l

class G( Brancher ): #Object for <g> tags.
pass

\d
 
Ad

Advertisements

P

Peter Otten

donn said:
* If an Instance calls a method on another Instance of the same
class, is this still recursion? And how does this 'stack up'? I mean,
literally, on the stack. Does each instance get its own stack, or does
all the push, call, pop stuff happen in one main stack?
(I worry about recursion depth limits because svg trees can get quite
deep.)

If you call functions within functions (or methods, it doesn't matter) they
consume stack space, e. g:
.... return beta()
........ return gamma()
........ return random.choice([alpha, beta, gamma])()
....Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in alpha
File "<stdin>", line 2, in beta
File "<stdin>", line 2, in gamma
File "<stdin>", line 2, in gamma
File "<stdin>", line 2, in alpha
File "<stdin>", line 2, in beta
File "<stdin>", line 2, in gamma
File "<stdin>", line 2, in beta
File "<stdin>", line 2, in gamma
RuntimeError: maximum recursion depth exceeded

The normal recursion limit is 1000, I'm reducing it to give you a smaller
traceback.

Peter
 
A

Arnaud Delobelle

donn said:
This is all about walking trees, recursion and generators. None of
which fit my brain at all!

From an XML tree (an SVG file) I build a bunch of Tag objects.

[I use lxml, but I am combining multiple svg files into a 'forest' of
trees, so I can't use the lxml walking methods because they all stop
at the 'end' of a branch where there is actually a 'link' over to
another tree.]

Each Tag has a flatwalk() method. The return from that is a list which
can be deeply nested. As an example, something like this:
L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]

(The numbers in the example would actually be Tag Instances.)

Aim 1
---
I'm trying to write a generator around such lists so that I can 'walk' them:

for parent, child in mystery_generator(L):
print level, item

Right now, I am running a 'flatten' function on the entire list (which
I don't savvy, I found it online) and then I iterate over that
flattened list. This doesn't feel right to me. (Code at end.)

Aim 2
---
The Objects that represent the svg tags use that flatwalk() method to
build the nested list, I'd far prefer flatwalk() to be directly
useable in something like this way:

for container, child in some_tag.flatwalk():
print container, child

The flatwalk() function is (and this is another puzzle see *) kind of
recursive. "For each of my children, tell that child to go
flatwalk()".
(Code at end.)
I am not sure how to turn such a thing into a generator.

I keep wondering how os.walk() does its thing. My hierarchy of Tag
objects can be thought of as directories (tags:g, symbol, svg), files
(path, circle, etc.) and soft-links (use tags).

* If an Instance calls a method on *another* Instance of the *same*
class, is this still recursion? And how does this 'stack up'? I mean,
literally, on the stack. Does each instance get its own stack, or does
all the push, call, pop stuff happen in one main stack?
(I worry about recursion depth limits because svg trees can get quite deep.)


The walking code so far:

## Found code.
def flatten(input):
output = []
stack = []
stack.extend(reversed(input))
while stack:
top = stack.pop()
if isinstance(top, list):
stack.extend(reversed(top))
else:
output.append(top)
return output

not a bad idea. I would rather write it as:

def flatten(input):
output = []
stack = list(input)
while stack:
top = stack.pop()
if isinstance(top, list):
stack.extend(top)
else:
output.append(top)
output.reverse()
return output

If you want to make it a generator function though, the initial version
is better. All you need to do is:

* change the line "output.append(top)" to "yield top"
* delete the line "return output"

Or you can go for the simple recursive approach:

def flatten(lst):
for el in lst:
if isinstance(el, list):
for x in flatten(el):
yield x
else:
yield el
## My walker
def test_walk(e): #lxml Element comes in
obj = ebag_get(e)['obj'] #get a tag object
l=obj.flatwalk()
ll= flatten(l)
#No current solution to get 'parent' in this iterator
#ie. for parent, child in ...
for tag in ll:
print tag

Here's one of my Tag objects:

class Brancher(object):
def __init__(self, elem):
self.elem = elem
self.children = []

def flatwalk(self):
l=[self]
for child in self.children:
l.append( child.flatwalk() ) #recur(ish)ion here
return l

This flattens the list in the flatwalk method (which IMHO it should do
given its name!):

def flatwalk(self):
flatlist = [self]
for child in self.children:
for el is child.flatwalk():
flatlist.append(el)
return flatlist

This turns it into a generator method:

def flatwalk(self):
yield self
for child in self.children:
for el is child.flatwalk():
yield el

This turns it into a generator method which yields parents as well:

def flatwalk(self, parent=None):
yield self, parent
for child in self.children:
for el is child.flatwalk(self):
yield el

Then you can write:

for tag, parent in mytag.flatwalk():
...

Of course, for the above to work, "leaf" objects need a modified
flatwalk method, e.g.:

Class LeafTag:
def flatwalk(self, parent=None):
yield self, parent

HTH (warning: all code untested).
 
D

donn

If you call functions within functions (or methods, it doesn't matter) they
consume stack space

Right, got it. Darn, but at least there's that setrecursionlimit call.

Thanks,
\e
 
D

donn

This flattens the list in the flatwalk method (which IMHO it should
do given its name!):
Heh, I often name things ahead of my actual capacity to implement them!
el is child.flatwalk():
Ah, I see what you mean. I think 'is' is 'in', but I kind of get the idea.
This turns it into a generator method:
And thanks for the generator versions too. I shall hack them in and poke
them with a stick.
Of course, for the above to work, "leaf" objects need a modified
flatwalk method, e.g.:
Yes, My 'stubs' (leaves) do have such, but I will edit to use yield.

Thanks a mill.
\d
 
C

Carl Banks

Each Tag has a flatwalk() method. The return from that is a list which
can be deeply nested. As an example, something like this:
L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]

(The numbers in the example would actually be Tag Instances.)

Aim 1
---
I'm trying to write a generator around such lists so that I can 'walk' them:

for parent, child in mystery_generator(L):
  print level, item

Right now, I am running a 'flatten' function on the entire list (which I
don't savvy, I found it online) and then I iterate over that flattened
list. This doesn't feel right to me. (Code at end.)


Hmm.

In the past I've argued that iterative techniques are superior to
recursive approaches in terms of readability, understandability, and
conciseness, and thus Python made the right decision to stress
iteration over the Lisp/functional preference for recursion.

I did consider recursion to be superior to operate on branched
structures like trees. However, lately I've started thinking it's
better to use iterative techniques even for situations like that. I
say that as someone with no problem getting my head around recursion.

Even if you disagree, I think there's value in learning iterative
approaches to nested problems, in the same way that there's value to
learning recursive approaches to linear problems. So here it is:


def flatten_iter(s):
stack = list()
stack.extend(reversed(s))
while stack:
item = stack.pop()
if isinstance(item,list):
stack.extend(reversed(item))
else:
yield item


It's simple. Copy the object to flatten onto your stack. Pop one item
off the stack. If the item you popped is a list, push each item of
that list onto the stack. Otherwise yield the value. Loop until stack
is empty.

There's many advantages to iterative approaches:
1. Less function call overhead (in time and space, I'd think)
2. Opportunity to optimize by scanning through the stack, something
you can't do *at all* with recursion
3. Might be able to avoid things like passing around a namespace
4. Iteration is more readable, understandable, and concise in
general (though I'd expect recursion is more refactorable than
iteration so as the system grows the refactorability of recursion will
start to outweigh other factors)

The main advantage of recursion is if you have baggage associated with
processing a node which does needed to be passed around. In the
iterative approach that state has to be stored on the stack. So in
those cases recursion is better.

So I'm not suggesting that recursion be avoided (I've probably written
a dozen recursive functions in the last week), I'm just saying
sometimes it makes sense to use iteration even for problems recursion
is tailor-made for.


Carl Banks
 
Ad

Advertisements

P

Peter Otten

donn said:
Right, got it. Darn, but at least there's that setrecursionlimit call.

But be warned that if you set the limit too high instead of giving you a
RuntimeError your program will segfault.

Peter
 
C

Carl Banks

But be warned that if you set the limit too high instead of giving you a
RuntimeError your program will segfault.

Ah, an advantage of iteration I forgot: no recursion limits or stack
overflows.


Carl Banks
 
D

donn

It's simple. Copy the object to flatten onto your stack. Pop one item
off the stack. If the item you popped is a list, push each item of
that list onto the stack. Otherwise yield the value. Loop until stack
is empty.
Nice. The reversed thing was throwing me, but I think it's so that what
comes first in a list will thus come first (on the end) of the stack.
So I'm not suggesting that recursion be avoided (I've probably written
a dozen recursive functions in the last week), I'm just saying
sometimes it makes sense to use iteration even for problems recursion
is tailor-made for.
Thanks for that. In parsing the XML (using lxml) I actually did my own
stack thing with while loops, to build the entire Tag object 'tree' —
just because I wanted to see how to do it sans recursion. I still get
cold shakes when I scroll past that monster!

On the subject of recursion, and looking at my OP object model: the Tag
objects that model the tags in an SVG file; how would I 'walk' the
object tree without employing recursion?
I am stuck on the eventual need to call child.flatwalk() and bang!
there's recursion.

I get the sense that it would be an external walk() function that does
some stackery-trickery and reuturns/yields the tree/branch — all
divorced from the actual objects. So, no flatwalk() methods needed at
all. This kind of bothers me because it's nice to have objects 'know'
what to do and addressing their siblings and children seems a vital part
of that.

Look at the case of asking a Tag for its XML source:
Say g is a G() instance: print g.get_xml(), would have to do some string
churning (specific to a g tag) and then start walking its children and
ask them to do specific string stuff (in their contexts).
This means I have short methods in each Tag instance that "know" how
to represent themselves as XML and they can return that value.
If I divorce the thing, it becomes a large loop with a lot of
switchy-ifs to engage various blocks of string-fu.

I hope that made sense. I can post my code if that would help.

\d
 
D

donn

But be warned that if you set the limit too high instead of giving you a
RuntimeError your program will segfault.
Silly question: is there any way to tell the future in this case? I
mean, ask for X recursion limit, and catch an error (or something) if
that won't fly.

\d
 
P

Peter Otten

donn said:
Silly question: is there any way to tell the future in this case? I
mean, ask for X recursion limit, and catch an error (or something) if
that won't fly.

I don't think it's a silly question, and I don't think there is an exact
answer, even for a given platform. You'll want to stay well below the
number calculated by the following script:

import os
import subprocess

SCRIPT = "tmp_check.py"

def ok(n):
return subprocess.call(["python", SCRIPT, str(n)]) == 0

if __name__ == "__main__":
if not os.path.exists(SCRIPT):
with open(SCRIPT, "w") as out:
out.write("""\
import sys

def f():
return f()

if __name__ == "__main__":
n = int(sys.argv[1])
sys.setrecursionlimit(n)
try:
f()
except RuntimeError:
pass
""")

low = 1000
while True:
new_low = 2*low
if not ok(new_low):
high = new_low
break
low = new_low

while high - low > 1:
mid = (low + high) // 2
if ok(mid):
low = mid
else:
high = mid
print "max recursion limit", low, high

BTW, I didn't expect it but I get different results on different runs.

Peter
 
Ad

Advertisements

D

donn

BTW, I didn't expect it but I get different results on different
runs.
Clever code. I will give it a go soonest. Elec off for the next 24 hours
in my neck of the woods. Urgh. Python can't "import electricity" just yet :)

\d
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top