perl graph theory (using Graph::Directed)

I

IRR

Hi all,
I'm fairly new to perl and graph theory, so hopefully this question isn't
too ignorant on both counts. I'm trying to input basically an edge list for
a graph, and get an output of the connected graphs that all of these edges
comprise. So for example:
Edge list input:
A B
A C
B A
D E
Perl script output:
A+B+C,D+E
using strongly_connected_graph (described in Graph::Base). Admittedly,
strongly_connected_graph is a black box to me -- I'm not sure if its doing a
DFS versus BFS or ?.
But what is clear is that this scales exponentially -- O(n^2.3) or so, based
on a few test sets) -- in time versus the total number of input edges. This
is fine for a few smaller edge lists that I have, but my eventual goal is to
do this for a set with ~50,000 vertices and ~2 million edges total. Based
on those test sets I mentioned, this size dataset will take ~7000 hours to
run on my current system, which obviously isn't reasonable -- and that's not
even accounting for things getting choked up in inputing that amount of data
with add_edge!
So my question is, am I approaching this problem with the right tools? Is
there another way within the Graph module to do this more efficiently?
Thanks,
I.R.
 
G

Greg Bacon

: I'm fairly new to perl and graph theory, so hopefully this question
: isn't too ignorant on both counts. I'm trying to input basically an
: edge list for a graph, and get an output of the connected graphs that
: all of these edges comprise. [...]

Do you want connected subgraphs or strongly connected? You say
connected here, but you're searching for strongly connected subgraphs
with Graph::Base::strongly_connected_graph.

You say you're new to graph theory, so I'll give quick overviews in
case you don't know the difference. A connected graph (perhaps called
weakly connected for emphasis) is a graph such that for each vertex v,
v is reachable from some other vertex w. A strongly connected graph is
one such that for each pair of vertices v and w, there's a path from v
to w and a path from w to v.

Below is a rendering in Perl of the algorithm from Baase that
enumerates weakly connected subgraphs. The author reports time
complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
number of vertices and m is the number of edges.

#! /usr/local/bin/perl

use warnings;
use strict;

# from Baase's *Computer Algorithms*, 2nd ed., pg. 180

sub dfs {
my $v = shift;
my $adj = shift;
my $mark = shift;
my $tag = shift;

$mark->{$v} = $tag;

while ($adj->{$v} && @{ $adj->{$v} }) {
my $w = shift @{ $adj->{$v} };

dfs($w,$adj,$mark,$tag) unless $mark->{$w};
}
}

sub concomps {
# destructive, so copy
my %adj = map ref($_) ? [ @$_ ] : $_, %{ shift @_ };
my @v = @_;

my %mark;

my $componentNumber = 0;

foreach my $v (@v) {
unless ($mark{$v}) {
++$componentNumber;

dfs $v, \%adj, \%mark, $componentNumber;
}
}

my %subgraph;
for (keys %mark) {
push @{ $subgraph{$mark{$_}} }, $_;
}

values %subgraph;
}

## main
my %adj;
my %v;

while (<DATA>) {
my($v,$w) = split;

# treat graph as undirected
push @{ $adj{$v} } => $w;
push @{ $adj{$w} } => $v;

++$v{$v};
++$v{$w};
}

print join("," => map join("+", sort @$_), concomps \%adj, keys %v), "\n";

__DATA__
A B
A C
B A
D E

The author used arrays, and the code above uses hashes to be more
Perlish but with time and space penalties. If the size of your input
makes you consider switching back to arrays, you might also consider
switching to a lower-level language.

Hope this helps,
Greg
 
I

IRR

Thanks Greg,
You're exactly right, and I can see how strongly_connected_graph could
really eat up a substantial amount of time. That script you gave works
seamlessly and gave exactly what I was after with about 2 lines of
modification -- and it did so around 100x faster than my previous attempt!
Also picked up a copy of the book at the local library (1st edition, but it
still has an informative section on graph algorithms).
Hugely obliged,
I.R.


Greg Bacon said:
: I'm fairly new to perl and graph theory, so hopefully this question
: isn't too ignorant on both counts. I'm trying to input basically an
: edge list for a graph, and get an output of the connected graphs that
: all of these edges comprise. [...]

Do you want connected subgraphs or strongly connected? You say
connected here, but you're searching for strongly connected subgraphs
with Graph::Base::strongly_connected_graph.

You say you're new to graph theory, so I'll give quick overviews in
case you don't know the difference. A connected graph (perhaps called
weakly connected for emphasis) is a graph such that for each vertex v,
v is reachable from some other vertex w. A strongly connected graph is
one such that for each pair of vertices v and w, there's a path from v
to w and a path from w to v.

Below is a rendering in Perl of the algorithm from Baase that
enumerates weakly connected subgraphs. The author reports time
complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
number of vertices and m is the number of edges.

#! /usr/local/bin/perl

use warnings;
use strict;

# from Baase's *Computer Algorithms*, 2nd ed., pg. 180

sub dfs {
my $v = shift;
my $adj = shift;
my $mark = shift;
my $tag = shift;

$mark->{$v} = $tag;

while ($adj->{$v} && @{ $adj->{$v} }) {
my $w = shift @{ $adj->{$v} };

dfs($w,$adj,$mark,$tag) unless $mark->{$w};
}
}

sub concomps {
# destructive, so copy
my %adj = map ref($_) ? [ @$_ ] : $_, %{ shift @_ };
my @v = @_;

my %mark;

my $componentNumber = 0;

foreach my $v (@v) {
unless ($mark{$v}) {
++$componentNumber;

dfs $v, \%adj, \%mark, $componentNumber;
}
}

my %subgraph;
for (keys %mark) {
push @{ $subgraph{$mark{$_}} }, $_;
}

values %subgraph;
}

## main
my %adj;
my %v;

while (<DATA>) {
my($v,$w) = split;

# treat graph as undirected
push @{ $adj{$v} } => $w;
push @{ $adj{$w} } => $v;

++$v{$v};
++$v{$w};
}

print join("," => map join("+", sort @$_), concomps \%adj, keys %v), "\n";

__DATA__
A B
A C
B A
D E

The author used arrays, and the code above uses hashes to be more
Perlish but with time and space penalties. If the size of your input
makes you consider switching back to arrays, you might also consider
switching to a lower-level language.

Hope this helps,
Greg
--
Those who wish to preserve freedom should recognize, however, that inflation
is probably the most important single factor in that vicious circle wherein
one kind of government action makes more and more government control necessary.
-- F. A. Hayek
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top