Graph algorithms - DFS, generators callbacks, and optimisation

P

Paul Moore

I'm trying to check a graph (represented in the usual Python adjacency
list format) for loops. This is dead simple - you use a depth-first
search, and look out for "back edges" (edges from a vertex "back" to
one you've already seen).

I already have a DFS algorithm which generates each edge in turn. It's
great for what it does, but it ignores back edges.

def DFS(graph, start, seen = None):
if not seen:
seen = {start: 1}
for v in graph[start]:
if v not in seen:
seen[v] = 1
yield start, v
if v in graph:
for e in DFS(graph, v, seen):
yield e

I could code a variation on this, just for this one problem, but that
sort of cut and paste coding offends me (and DFS is *just* fiddly
enough that I'm not confident of getting it right each time I
reimplement it). So what I'd rather do is enhance my existing code to
be a bit more general.

I have a C++ reference which implements a highly general DFS
algorithm, using the visitor pattern (pass a visitor object to the
function, and various callback methods are called at points in the
algorithm - there are discover_vertex, discover_edge, back_edge, etc
callbacks).

I could code something like this (the pseudocode for the algorithm is
almost executable in Python!) but it doesn't feel particularly
"pythonic". Having to write a visitor class for each use case feels
unwieldy (as well as being perceptibly slower than the generator
implementation). Having the various callbacks as arguments (pass a
callable or None) isn't much faster or cleaner.

Can anybody think of a good way of making the DFS code above a little
more generic, without losing the efficient and simple (from the
caller's POV) approach of the current version?

Paul.
 
R

Robin Becker

Well in this case couldn't you make seen do the detection for you.
If I understand your analysis you could make the seen dictionary a
special kind of dictionary.

ie

class DFSSeen(dict):
def __contains__(self,x):
r = super(DFSSeen,self).__contains__(x)
if r:
print 'back link to',x
return r



G = {
1: (2,3),
2: (3,5),
3: (4,),
4: (6,),
5: (2,6),
6: (1,),
}
for v in DFS(G,1,DFSSeen({1:1})):
print v

using the above I get
C:\tmp>dfs.py
(1, 2)
(2, 3)
(3, 4)
(4, 6)
back link to 1
(2, 5)
back link to 2
back link to 6
back link to 3

so cycle detection is possible without altering the original code, but
it probably isn't sufficient for your purposes in that we don't have the
start vertex.

writes

I'm trying to check a graph (represented in the usual Python adjacency
list format) for loops. This is dead simple - you use a depth-first
search, and look out for "back edges" (edges from a vertex "back" to
one you've already seen).

I already have a DFS algorithm which generates each edge in turn. It's
great for what it does, but it ignores back edges.

def DFS(graph, start, seen = None):
if not seen:
seen = {start: 1}
for v in graph[start]:
if v not in seen:
seen[v] = 1
yield start, v
if v in graph:
for e in DFS(graph, v, seen):
yield e

I could code a variation on this, just for this one problem, but that
sort of cut and paste coding offends me (and DFS is *just* fiddly
enough that I'm not confident of getting it right each time I
reimplement it). So what I'd rather do is enhance my existing code to
be a bit more general.

I have a C++ reference which implements a highly general DFS
algorithm, using the visitor pattern (pass a visitor object to the
function, and various callback methods are called at points in the
algorithm - there are discover_vertex, discover_edge, back_edge, etc
callbacks).

I could code something like this (the pseudocode for the algorithm is
almost executable in Python!) but it doesn't feel particularly
"pythonic". Having to write a visitor class for each use case feels
unwieldy (as well as being perceptibly slower than the generator
implementation). Having the various callbacks as arguments (pass a
callable or None) isn't much faster or cleaner.

Can anybody think of a good way of making the DFS code above a little
more generic, without losing the efficient and simple (from the
caller's POV) approach of the current version?

Paul.
 
P

Paul Moore

Robin Becker said:
Well in this case couldn't you make seen do the detection for you.
If I understand your analysis you could make the seen dictionary a
special kind of dictionary.

Interesting trick. I hadn't thought of this, mainly because "seen" is
only in the parameter list to allow the recursive call to work - I'd
never intended it to be passed by the user. (I have a non-recursive
version of DFS without a "seen" parameter).

More general DFS algorithms have a "colour" dictionary, where vertices
can be "White" (not seen) "Black" (seen) or "Grey" (essentially,
"still being looked at"). I'm not sure how that would interact with
this trick.

Hmm. I wish I had a "big" use case where issues like this matter. But
all of my uses are small - the sort of thing where a general library
helps a lot, because the whole project is too small to justify
(re-)implementing the algorithm. But with just small examples, you
never get the breadth of use cases to make the optimal design obvious.

Or maybe I'm just a lousy library designer :)

Time to stop rambling and do something productive...

Paul.
 
B

Bengt Richter

I'm trying to check a graph (represented in the usual Python adjacency
list format) for loops. This is dead simple - you use a depth-first
search, and look out for "back edges" (edges from a vertex "back" to
one you've already seen).

I already have a DFS algorithm which generates each edge in turn. It's
great for what it does, but it ignores back edges.

def DFS(graph, start, seen = None):
if not seen:
seen = {start: 1}
for v in graph[start]:
if v not in seen:
seen[v] = 1
yield start, v
if v in graph:
for e in DFS(graph, v, seen):
yield e

I could code a variation on this, just for this one problem, but that
sort of cut and paste coding offends me (and DFS is *just* fiddly
enough that I'm not confident of getting it right each time I
reimplement it). So what I'd rather do is enhance my existing code to
be a bit more general.
Not very tested, but how about (borrowing Robin's example data):

====< PaulMoore.py >=============================================
def DFS2(graph, start, rlevel=0, seen = None):
if seen is None: seen = {}
seen[start] = 1 # allow for jumping into arbitrary subtrees with
# same seen dict in new generator ?
for v in graph[start]:
is_back = v in seen
seen[v] = True # ok if redundant
yield (start, v), is_back, rlevel
if not is_back:
if v in graph:
for e in DFS2(graph, v, rlevel+1, seen):
yield e

if __name__ == '__main__':
G = {
1: (2,3),
2: (3,5),
3: (4,),
4: (6,),
5: (2,6),
6: (1,),
}
for e, is_back, level in DFS2(G, 1):
print '%s%s -> %s%s' %(level*' ',e[0], e[1], ('',' (back)')[is_back])
=================================================================

The output is:

[15:15] C:\pywk\clp>PaulMoore.py
1 -> 2
2 -> 3
3 -> 4
4 -> 6
6 -> 1 (back)
2 -> 5
5 -> 2 (back)
5 -> 6 (back)
1 -> 3 (back)
Can anybody think of a good way of making the DFS code above a little
more generic, without losing the efficient and simple (from the
caller's POV) approach of the current version?
My .02 above ;-)

Regards,
Bengt Richter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top