Graph algorithms - DFS, generators callbacks, and optimisation

Discussion in 'Python' started by Paul Moore, Nov 29, 2003.

  1. Paul Moore

    Paul Moore Guest

    I'm trying to check a graph (represented in the usual Python adjacency
    list format) for loops. This is dead simple - you use a depth-first
    search, and look out for "back edges" (edges from a vertex "back" to
    one you've already seen).

    I already have a DFS algorithm which generates each edge in turn. It's
    great for what it does, but it ignores back edges.

    def DFS(graph, start, seen = None):
    if not seen:
    seen = {start: 1}
    for v in graph[start]:
    if v not in seen:
    seen[v] = 1
    yield start, v
    if v in graph:
    for e in DFS(graph, v, seen):
    yield e

    I could code a variation on this, just for this one problem, but that
    sort of cut and paste coding offends me (and DFS is *just* fiddly
    enough that I'm not confident of getting it right each time I
    reimplement it). So what I'd rather do is enhance my existing code to
    be a bit more general.

    I have a C++ reference which implements a highly general DFS
    algorithm, using the visitor pattern (pass a visitor object to the
    function, and various callback methods are called at points in the
    algorithm - there are discover_vertex, discover_edge, back_edge, etc
    callbacks).

    I could code something like this (the pseudocode for the algorithm is
    almost executable in Python!) but it doesn't feel particularly
    "pythonic". Having to write a visitor class for each use case feels
    unwieldy (as well as being perceptibly slower than the generator
    implementation). Having the various callbacks as arguments (pass a
    callable or None) isn't much faster or cleaner.

    Can anybody think of a good way of making the DFS code above a little
    more generic, without losing the efficient and simple (from the
    caller's POV) approach of the current version?

    Paul.
     
    Paul Moore, Nov 29, 2003
    #1
    1. Advertisements

  2. Paul Moore

    Robin Becker Guest

    Well in this case couldn't you make seen do the detection for you.
    If I understand your analysis you could make the seen dictionary a
    special kind of dictionary.

    ie

    class DFSSeen(dict):
    def __contains__(self,x):
    r = super(DFSSeen,self).__contains__(x)
    if r:
    print 'back link to',x
    return r



    G = {
    1: (2,3),
    2: (3,5),
    3: (4,),
    4: (6,),
    5: (2,6),
    6: (1,),
    }
    for v in DFS(G,1,DFSSeen({1:1})):
    print v

    using the above I get
    C:\tmp>dfs.py
    (1, 2)
    (2, 3)
    (3, 4)
    (4, 6)
    back link to 1
    (2, 5)
    back link to 2
    back link to 6
    back link to 3

    so cycle detection is possible without altering the original code, but
    it probably isn't sufficient for your purposes in that we don't have the
    start vertex.

    writes

     
    Robin Becker, Nov 29, 2003
    #2
    1. Advertisements

  3. Paul Moore

    Paul Moore Guest

    Interesting trick. I hadn't thought of this, mainly because "seen" is
    only in the parameter list to allow the recursive call to work - I'd
    never intended it to be passed by the user. (I have a non-recursive
    version of DFS without a "seen" parameter).

    More general DFS algorithms have a "colour" dictionary, where vertices
    can be "White" (not seen) "Black" (seen) or "Grey" (essentially,
    "still being looked at"). I'm not sure how that would interact with
    this trick.

    Hmm. I wish I had a "big" use case where issues like this matter. But
    all of my uses are small - the sort of thing where a general library
    helps a lot, because the whole project is too small to justify
    (re-)implementing the algorithm. But with just small examples, you
    never get the breadth of use cases to make the optimal design obvious.

    Or maybe I'm just a lousy library designer :)

    Time to stop rambling and do something productive...

    Paul.
     
    Paul Moore, Nov 29, 2003
    #3
  4. Not very tested, but how about (borrowing Robin's example data):

    ====< PaulMoore.py >=============================================
    def DFS2(graph, start, rlevel=0, seen = None):
    if seen is None: seen = {}
    seen[start] = 1 # allow for jumping into arbitrary subtrees with
    # same seen dict in new generator ?
    for v in graph[start]:
    is_back = v in seen
    seen[v] = True # ok if redundant
    yield (start, v), is_back, rlevel
    if not is_back:
    if v in graph:
    for e in DFS2(graph, v, rlevel+1, seen):
    yield e

    if __name__ == '__main__':
    G = {
    1: (2,3),
    2: (3,5),
    3: (4,),
    4: (6,),
    5: (2,6),
    6: (1,),
    }
    for e, is_back, level in DFS2(G, 1):
    print '%s%s -> %s%s' %(level*' ',e[0], e[1], ('',' (back)')[is_back])
    =================================================================

    The output is:

    [15:15] C:\pywk\clp>PaulMoore.py
    1 -> 2
    2 -> 3
    3 -> 4
    4 -> 6
    6 -> 1 (back)
    2 -> 5
    5 -> 2 (back)
    5 -> 6 (back)
    1 -> 3 (back)
    My .02 above ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Nov 29, 2003
    #4
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.