Walking deeply nested lists

Discussion in 'Python' started by donn, Aug 28, 2010.

  1. donn

    donn Guest

    This is all about walking trees, recursion and generators. None of which
    fit my brain at all!

    From an XML tree (an SVG file) I build a bunch of Tag objects.

    [I use lxml, but I am combining multiple svg files into a 'forest' of
    trees, so I can't use the lxml walking methods because they all stop at
    the 'end' of a branch where there is actually a 'link' over to another
    tree.]

    Each Tag has a flatwalk() method. The return from that is a list which
    can be deeply nested. As an example, something like this:
    L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]

    (The numbers in the example would actually be Tag Instances.)

    Aim 1
    ---
    I'm trying to write a generator around such lists so that I can 'walk' them:

    for parent, child in mystery_generator(L):
    print level, item

    Right now, I am running a 'flatten' function on the entire list (which I
    don't savvy, I found it online) and then I iterate over that flattened
    list. This doesn't feel right to me. (Code at end.)

    Aim 2
    ---
    The Objects that represent the svg tags use that flatwalk() method to
    build the nested list, I'd far prefer flatwalk() to be directly useable
    in something like this way:

    for container, child in some_tag.flatwalk():
    print container, child

    The flatwalk() function is (and this is another puzzle see *) kind of
    recursive. "For each of my children, tell that child to go flatwalk()".
    (Code at end.)
    I am not sure how to turn such a thing into a generator.

    I keep wondering how os.walk() does its thing. My hierarchy of Tag
    objects can be thought of as directories (tags:g, symbol, svg), files
    (path, circle, etc.) and soft-links (use tags).

    * If an Instance calls a method on *another* Instance of the *same*
    class, is this still recursion? And how does this 'stack up'? I mean,
    literally, on the stack. Does each instance get its own stack, or does
    all the push, call, pop stuff happen in one main stack?
    (I worry about recursion depth limits because svg trees can get quite deep.)


    The walking code so far:

    ## Found code.
    def flatten(input):
    output = []
    stack = []
    stack.extend(reversed(input))
    while stack:
    top = stack.pop()
    if isinstance(top, list):
    stack.extend(reversed(top))
    else:
    output.append(top)
    return output

    ## My walker
    def test_walk(e): #lxml Element comes in
    obj = ebag_get(e)['obj'] #get a tag object
    l=obj.flatwalk()
    ll= flatten(l)
    #No current solution to get 'parent' in this iterator
    #ie. for parent, child in ...
    for tag in ll:
    print tag

    Here's one of my Tag objects:

    class Brancher(object):
    def __init__(self, elem):
    self.elem = elem
    self.children = []

    def flatwalk(self):
    l=[self]
    for child in self.children:
    l.append( child.flatwalk() ) #recur(ish)ion here
    return l

    class G( Brancher ): #Object for <g> tags.
    pass

    \d
    donn, Aug 28, 2010
    #1
    1. Advertising

  2. donn

    Peter Otten Guest

    donn wrote:

    > * If an Instance calls a method on another Instance of the same
    > class, is this still recursion? And how does this 'stack up'? I mean,
    > literally, on the stack. Does each instance get its own stack, or does
    > all the push, call, pop stuff happen in one main stack?
    > (I worry about recursion depth limits because svg trees can get quite
    > deep.)


    If you call functions within functions (or methods, it doesn't matter) they
    consume stack space, e. g:

    >>> def alpha():

    .... return beta()
    ....
    >>> def beta():

    .... return gamma()
    ....
    >>> import random
    >>> def gamma():

    .... return random.choice([alpha, beta, gamma])()
    ....
    >>> import sys
    >>> sys.setrecursionlimit(10)
    >>> alpha()

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 2, in alpha
    File "<stdin>", line 2, in beta
    File "<stdin>", line 2, in gamma
    File "<stdin>", line 2, in gamma
    File "<stdin>", line 2, in alpha
    File "<stdin>", line 2, in beta
    File "<stdin>", line 2, in gamma
    File "<stdin>", line 2, in beta
    File "<stdin>", line 2, in gamma
    RuntimeError: maximum recursion depth exceeded

    The normal recursion limit is 1000, I'm reducing it to give you a smaller
    traceback.

    Peter
    Peter Otten, Aug 28, 2010
    #2
    1. Advertising

  3. donn <> writes:

    > This is all about walking trees, recursion and generators. None of
    > which fit my brain at all!
    >
    > From an XML tree (an SVG file) I build a bunch of Tag objects.
    >
    > [I use lxml, but I am combining multiple svg files into a 'forest' of
    > trees, so I can't use the lxml walking methods because they all stop
    > at the 'end' of a branch where there is actually a 'link' over to
    > another tree.]
    >
    > Each Tag has a flatwalk() method. The return from that is a list which
    > can be deeply nested. As an example, something like this:
    > L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]
    >
    > (The numbers in the example would actually be Tag Instances.)
    >
    > Aim 1
    > ---
    > I'm trying to write a generator around such lists so that I can 'walk' them:
    >
    > for parent, child in mystery_generator(L):
    > print level, item
    >
    > Right now, I am running a 'flatten' function on the entire list (which
    > I don't savvy, I found it online) and then I iterate over that
    > flattened list. This doesn't feel right to me. (Code at end.)
    >
    > Aim 2
    > ---
    > The Objects that represent the svg tags use that flatwalk() method to
    > build the nested list, I'd far prefer flatwalk() to be directly
    > useable in something like this way:
    >
    > for container, child in some_tag.flatwalk():
    > print container, child
    >
    > The flatwalk() function is (and this is another puzzle see *) kind of
    > recursive. "For each of my children, tell that child to go
    > flatwalk()".
    > (Code at end.)
    > I am not sure how to turn such a thing into a generator.
    >
    > I keep wondering how os.walk() does its thing. My hierarchy of Tag
    > objects can be thought of as directories (tags:g, symbol, svg), files
    > (path, circle, etc.) and soft-links (use tags).
    >
    > * If an Instance calls a method on *another* Instance of the *same*
    > class, is this still recursion? And how does this 'stack up'? I mean,
    > literally, on the stack. Does each instance get its own stack, or does
    > all the push, call, pop stuff happen in one main stack?
    > (I worry about recursion depth limits because svg trees can get quite deep.)
    >
    >
    > The walking code so far:
    >
    > ## Found code.
    > def flatten(input):
    > output = []
    > stack = []
    > stack.extend(reversed(input))
    > while stack:
    > top = stack.pop()
    > if isinstance(top, list):
    > stack.extend(reversed(top))
    > else:
    > output.append(top)
    > return output


    not a bad idea. I would rather write it as:

    def flatten(input):
    output = []
    stack = list(input)
    while stack:
    top = stack.pop()
    if isinstance(top, list):
    stack.extend(top)
    else:
    output.append(top)
    output.reverse()
    return output

    If you want to make it a generator function though, the initial version
    is better. All you need to do is:

    * change the line "output.append(top)" to "yield top"
    * delete the line "return output"

    Or you can go for the simple recursive approach:

    def flatten(lst):
    for el in lst:
    if isinstance(el, list):
    for x in flatten(el):
    yield x
    else:
    yield el

    > ## My walker
    > def test_walk(e): #lxml Element comes in
    > obj = ebag_get(e)['obj'] #get a tag object
    > l=obj.flatwalk()
    > ll= flatten(l)
    > #No current solution to get 'parent' in this iterator
    > #ie. for parent, child in ...
    > for tag in ll:
    > print tag
    >
    > Here's one of my Tag objects:
    >
    > class Brancher(object):
    > def __init__(self, elem):
    > self.elem = elem
    > self.children = []
    >
    > def flatwalk(self):
    > l=[self]
    > for child in self.children:
    > l.append( child.flatwalk() ) #recur(ish)ion here
    > return l


    This flattens the list in the flatwalk method (which IMHO it should do
    given its name!):

    def flatwalk(self):
    flatlist = [self]
    for child in self.children:
    for el is child.flatwalk():
    flatlist.append(el)
    return flatlist

    This turns it into a generator method:

    def flatwalk(self):
    yield self
    for child in self.children:
    for el is child.flatwalk():
    yield el

    This turns it into a generator method which yields parents as well:

    def flatwalk(self, parent=None):
    yield self, parent
    for child in self.children:
    for el is child.flatwalk(self):
    yield el

    Then you can write:

    for tag, parent in mytag.flatwalk():
    ...

    Of course, for the above to work, "leaf" objects need a modified
    flatwalk method, e.g.:

    Class LeafTag:
    def flatwalk(self, parent=None):
    yield self, parent

    HTH (warning: all code untested).

    --
    Arnaud
    Arnaud Delobelle, Aug 28, 2010
    #3
  4. donn

    donn Guest

    On 28/08/2010 08:43, Peter Otten wrote:
    > If you call functions within functions (or methods, it doesn't matter) they
    > consume stack space


    Right, got it. Darn, but at least there's that setrecursionlimit call.

    Thanks,
    \e
    donn, Aug 28, 2010
    #4
  5. donn

    donn Guest

    On 28/08/2010 09:21, Arnaud Delobelle wrote:

    > This flattens the list in the flatwalk method (which IMHO it should
    > do given its name!):

    Heh, I often name things ahead of my actual capacity to implement them!

    > el is child.flatwalk():

    Ah, I see what you mean. I think 'is' is 'in', but I kind of get the idea.

    > This turns it into a generator method:

    And thanks for the generator versions too. I shall hack them in and poke
    them with a stick.

    > Of course, for the above to work, "leaf" objects need a modified
    > flatwalk method, e.g.:

    Yes, My 'stubs' (leaves) do have such, but I will edit to use yield.

    Thanks a mill.
    \d
    donn, Aug 28, 2010
    #5
  6. donn

    Carl Banks Guest

    On Aug 27, 8:21 pm, donn <> wrote:

    > Each Tag has a flatwalk() method. The return from that is a list which
    > can be deeply nested. As an example, something like this:
    > L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]
    >
    > (The numbers in the example would actually be Tag Instances.)
    >
    > Aim 1
    > ---
    > I'm trying to write a generator around such lists so that I can 'walk' them:
    >
    > for parent, child in mystery_generator(L):
    >   print level, item
    >
    > Right now, I am running a 'flatten' function on the entire list (which I
    > don't savvy, I found it online) and then I iterate over that flattened
    > list. This doesn't feel right to me. (Code at end.)



    Hmm.

    In the past I've argued that iterative techniques are superior to
    recursive approaches in terms of readability, understandability, and
    conciseness, and thus Python made the right decision to stress
    iteration over the Lisp/functional preference for recursion.

    I did consider recursion to be superior to operate on branched
    structures like trees. However, lately I've started thinking it's
    better to use iterative techniques even for situations like that. I
    say that as someone with no problem getting my head around recursion.

    Even if you disagree, I think there's value in learning iterative
    approaches to nested problems, in the same way that there's value to
    learning recursive approaches to linear problems. So here it is:


    def flatten_iter(s):
    stack = list()
    stack.extend(reversed(s))
    while stack:
    item = stack.pop()
    if isinstance(item,list):
    stack.extend(reversed(item))
    else:
    yield item


    It's simple. Copy the object to flatten onto your stack. Pop one item
    off the stack. If the item you popped is a list, push each item of
    that list onto the stack. Otherwise yield the value. Loop until stack
    is empty.

    There's many advantages to iterative approaches:
    1. Less function call overhead (in time and space, I'd think)
    2. Opportunity to optimize by scanning through the stack, something
    you can't do *at all* with recursion
    3. Might be able to avoid things like passing around a namespace
    4. Iteration is more readable, understandable, and concise in
    general (though I'd expect recursion is more refactorable than
    iteration so as the system grows the refactorability of recursion will
    start to outweigh other factors)

    The main advantage of recursion is if you have baggage associated with
    processing a node which does needed to be passed around. In the
    iterative approach that state has to be stored on the stack. So in
    those cases recursion is better.

    So I'm not suggesting that recursion be avoided (I've probably written
    a dozen recursive functions in the last week), I'm just saying
    sometimes it makes sense to use iteration even for problems recursion
    is tailor-made for.


    Carl Banks
    Carl Banks, Aug 28, 2010
    #6
  7. donn

    Peter Otten Guest

    donn wrote:

    > On 28/08/2010 08:43, Peter Otten wrote:
    >> If you call functions within functions (or methods, it doesn't matter)
    >> they consume stack space

    >
    > Right, got it. Darn, but at least there's that setrecursionlimit call.


    But be warned that if you set the limit too high instead of giving you a
    RuntimeError your program will segfault.

    Peter
    Peter Otten, Aug 28, 2010
    #7
  8. donn

    Carl Banks Guest

    On Aug 28, 3:03 am, Peter Otten <> wrote:
    > donn wrote:
    > > On 28/08/2010 08:43, Peter Otten wrote:
    > >> If you call functions within functions (or methods, it doesn't matter)
    > >> they consume stack space

    >
    > > Right, got it. Darn, but at least there's that setrecursionlimit call.

    >
    > But be warned that if you set the limit too high instead of giving you a
    > RuntimeError your program will segfault.


    Ah, an advantage of iteration I forgot: no recursion limits or stack
    overflows.


    Carl Banks
    Carl Banks, Aug 28, 2010
    #8
  9. donn

    donn Guest

    On 28/08/2010 11:17, Carl Banks wrote:
    > It's simple. Copy the object to flatten onto your stack. Pop one item
    > off the stack. If the item you popped is a list, push each item of
    > that list onto the stack. Otherwise yield the value. Loop until stack
    > is empty.

    Nice. The reversed thing was throwing me, but I think it's so that what
    comes first in a list will thus come first (on the end) of the stack.

    > So I'm not suggesting that recursion be avoided (I've probably written
    > a dozen recursive functions in the last week), I'm just saying
    > sometimes it makes sense to use iteration even for problems recursion
    > is tailor-made for.
    >

    Thanks for that. In parsing the XML (using lxml) I actually did my own
    stack thing with while loops, to build the entire Tag object 'tree' —
    just because I wanted to see how to do it sans recursion. I still get
    cold shakes when I scroll past that monster!

    On the subject of recursion, and looking at my OP object model: the Tag
    objects that model the tags in an SVG file; how would I 'walk' the
    object tree without employing recursion?
    I am stuck on the eventual need to call child.flatwalk() and bang!
    there's recursion.

    I get the sense that it would be an external walk() function that does
    some stackery-trickery and reuturns/yields the tree/branch — all
    divorced from the actual objects. So, no flatwalk() methods needed at
    all. This kind of bothers me because it's nice to have objects 'know'
    what to do and addressing their siblings and children seems a vital part
    of that.

    Look at the case of asking a Tag for its XML source:
    Say g is a G() instance: print g.get_xml(), would have to do some string
    churning (specific to a g tag) and then start walking its children and
    ask them to do specific string stuff (in their contexts).
    This means I have short methods in each Tag instance that "know" how
    to represent themselves as XML and they can return that value.
    If I divorce the thing, it becomes a large loop with a lot of
    switchy-ifs to engage various blocks of string-fu.

    I hope that made sense. I can post my code if that would help.

    \d
    donn, Aug 28, 2010
    #9
  10. donn

    donn Guest

    On 28/08/2010 12:03, Peter Otten wrote:
    > But be warned that if you set the limit too high instead of giving you a
    > RuntimeError your program will segfault.

    Silly question: is there any way to tell the future in this case? I
    mean, ask for X recursion limit, and catch an error (or something) if
    that won't fly.

    \d
    donn, Aug 28, 2010
    #10
  11. donn

    Peter Otten Guest

    donn wrote:

    > On 28/08/2010 12:03, Peter Otten wrote:
    >> But be warned that if you set the limit too high instead of giving you a
    >> RuntimeError your program will segfault.

    > Silly question: is there any way to tell the future in this case? I
    > mean, ask for X recursion limit, and catch an error (or something) if
    > that won't fly.


    I don't think it's a silly question, and I don't think there is an exact
    answer, even for a given platform. You'll want to stay well below the
    number calculated by the following script:

    import os
    import subprocess

    SCRIPT = "tmp_check.py"

    def ok(n):
    return subprocess.call(["python", SCRIPT, str(n)]) == 0

    if __name__ == "__main__":
    if not os.path.exists(SCRIPT):
    with open(SCRIPT, "w") as out:
    out.write("""\
    import sys

    def f():
    return f()

    if __name__ == "__main__":
    n = int(sys.argv[1])
    sys.setrecursionlimit(n)
    try:
    f()
    except RuntimeError:
    pass
    """)

    low = 1000
    while True:
    new_low = 2*low
    if not ok(new_low):
    high = new_low
    break
    low = new_low

    while high - low > 1:
    mid = (low + high) // 2
    if ok(mid):
    low = mid
    else:
    high = mid
    print "max recursion limit", low, high

    BTW, I didn't expect it but I get different results on different runs.

    Peter
    Peter Otten, Aug 28, 2010
    #11
  12. donn

    donn Guest

    On 28/08/2010 14:41, Peter Otten wrote:
    > BTW, I didn't expect it but I get different results on different
    > runs.

    Clever code. I will give it a go soonest. Elec off for the next 24 hours
    in my neck of the woods. Urgh. Python can't "import electricity" just yet :)

    \d
    donn, Aug 28, 2010
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Edward C. Jones

    Pickle vs. eval for deeply nested objects

    Edward C. Jones, Feb 18, 2004, in forum: Python
    Replies:
    0
    Views:
    484
    Edward C. Jones
    Feb 18, 2004
  2. Ori Y
    Replies:
    1
    Views:
    402
    Bob Ippolito
    Feb 28, 2004
  3. Kirk Strauser
    Replies:
    1
    Views:
    298
    Kirk Strauser
    Jun 11, 2004
  4. Kay Schluehr
    Replies:
    7
    Views:
    341
    Kay Schluehr
    Jan 20, 2006
  5. Replies:
    3
    Views:
    106
Loading...

Share This Page