perl graph theory (using Graph::Directed)

Discussion in 'Perl Misc' started by IRR, Jul 26, 2004.

  1. IRR

    IRR Guest

    Hi all,
    I'm fairly new to perl and graph theory, so hopefully this question isn't
    too ignorant on both counts. I'm trying to input basically an edge list for
    a graph, and get an output of the connected graphs that all of these edges
    comprise. So for example:
    Edge list input:
    A B
    A C
    B A
    D E
    Perl script output:
    A+B+C,D+E
    using strongly_connected_graph (described in Graph::Base). Admittedly,
    strongly_connected_graph is a black box to me -- I'm not sure if its doing a
    DFS versus BFS or ?.
    But what is clear is that this scales exponentially -- O(n^2.3) or so, based
    on a few test sets) -- in time versus the total number of input edges. This
    is fine for a few smaller edge lists that I have, but my eventual goal is to
    do this for a set with ~50,000 vertices and ~2 million edges total. Based
    on those test sets I mentioned, this size dataset will take ~7000 hours to
    run on my current system, which obviously isn't reasonable -- and that's not
    even accounting for things getting choked up in inputing that amount of data
    with add_edge!
    So my question is, am I approaching this problem with the right tools? Is
    there another way within the Graph module to do this more efficiently?
    Thanks,
    I.R.
     
    IRR, Jul 26, 2004
    #1
    1. Advertising

  2. IRR

    Greg Bacon Guest

    In article <z1YMc.454$>,
    IRR <> wrote:

    : I'm fairly new to perl and graph theory, so hopefully this question
    : isn't too ignorant on both counts. I'm trying to input basically an
    : edge list for a graph, and get an output of the connected graphs that
    : all of these edges comprise. [...]

    Do you want connected subgraphs or strongly connected? You say
    connected here, but you're searching for strongly connected subgraphs
    with Graph::Base::strongly_connected_graph.

    You say you're new to graph theory, so I'll give quick overviews in
    case you don't know the difference. A connected graph (perhaps called
    weakly connected for emphasis) is a graph such that for each vertex v,
    v is reachable from some other vertex w. A strongly connected graph is
    one such that for each pair of vertices v and w, there's a path from v
    to w and a path from w to v.

    Below is a rendering in Perl of the algorithm from Baase that
    enumerates weakly connected subgraphs. The author reports time
    complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
    number of vertices and m is the number of edges.

    #! /usr/local/bin/perl

    use warnings;
    use strict;

    # from Baase's *Computer Algorithms*, 2nd ed., pg. 180

    sub dfs {
    my $v = shift;
    my $adj = shift;
    my $mark = shift;
    my $tag = shift;

    $mark->{$v} = $tag;

    while ($adj->{$v} && @{ $adj->{$v} }) {
    my $w = shift @{ $adj->{$v} };

    dfs($w,$adj,$mark,$tag) unless $mark->{$w};
    }
    }

    sub concomps {
    # destructive, so copy
    my %adj = map ref($_) ? [ @$_ ] : $_, %{ shift @_ };
    my @v = @_;

    my %mark;

    my $componentNumber = 0;

    foreach my $v (@v) {
    unless ($mark{$v}) {
    ++$componentNumber;

    dfs $v, \%adj, \%mark, $componentNumber;
    }
    }

    my %subgraph;
    for (keys %mark) {
    push @{ $subgraph{$mark{$_}} }, $_;
    }

    values %subgraph;
    }

    ## main
    my %adj;
    my %v;

    while (<DATA>) {
    my($v,$w) = split;

    # treat graph as undirected
    push @{ $adj{$v} } => $w;
    push @{ $adj{$w} } => $v;

    ++$v{$v};
    ++$v{$w};
    }

    print join("," => map join("+", sort @$_), concomps \%adj, keys %v), "\n";

    __DATA__
    A B
    A C
    B A
    D E

    The author used arrays, and the code above uses hashes to be more
    Perlish but with time and space penalties. If the size of your input
    makes you consider switching back to arrays, you might also consider
    switching to a lower-level language.

    Hope this helps,
    Greg
    --
    Those who wish to preserve freedom should recognize, however, that inflation
    is probably the most important single factor in that vicious circle wherein
    one kind of government action makes more and more government control necessary.
    -- F. A. Hayek
     
    Greg Bacon, Jul 26, 2004
    #2
    1. Advertising

  3. IRR

    IRR Guest

    Thanks Greg,
    You're exactly right, and I can see how strongly_connected_graph could
    really eat up a substantial amount of time. That script you gave works
    seamlessly and gave exactly what I was after with about 2 lines of
    modification -- and it did so around 100x faster than my previous attempt!
    Also picked up a copy of the book at the local library (1st edition, but it
    still has an informative section on graph algorithms).
    Hugely obliged,
    I.R.


    "Greg Bacon" <> wrote in message
    news:...
    > In article <z1YMc.454$>,
    > IRR <> wrote:
    >
    > : I'm fairly new to perl and graph theory, so hopefully this question
    > : isn't too ignorant on both counts. I'm trying to input basically an
    > : edge list for a graph, and get an output of the connected graphs that
    > : all of these edges comprise. [...]
    >
    > Do you want connected subgraphs or strongly connected? You say
    > connected here, but you're searching for strongly connected subgraphs
    > with Graph::Base::strongly_connected_graph.
    >
    > You say you're new to graph theory, so I'll give quick overviews in
    > case you don't know the difference. A connected graph (perhaps called
    > weakly connected for emphasis) is a graph such that for each vertex v,
    > v is reachable from some other vertex w. A strongly connected graph is
    > one such that for each pair of vertices v and w, there's a path from v
    > to w and a path from w to v.
    >
    > Below is a rendering in Perl of the algorithm from Baase that
    > enumerates weakly connected subgraphs. The author reports time
    > complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
    > number of vertices and m is the number of edges.
    >
    > #! /usr/local/bin/perl
    >
    > use warnings;
    > use strict;
    >
    > # from Baase's *Computer Algorithms*, 2nd ed., pg. 180
    >
    > sub dfs {
    > my $v = shift;
    > my $adj = shift;
    > my $mark = shift;
    > my $tag = shift;
    >
    > $mark->{$v} = $tag;
    >
    > while ($adj->{$v} && @{ $adj->{$v} }) {
    > my $w = shift @{ $adj->{$v} };
    >
    > dfs($w,$adj,$mark,$tag) unless $mark->{$w};
    > }
    > }
    >
    > sub concomps {
    > # destructive, so copy
    > my %adj = map ref($_) ? [ @$_ ] : $_, %{ shift @_ };
    > my @v = @_;
    >
    > my %mark;
    >
    > my $componentNumber = 0;
    >
    > foreach my $v (@v) {
    > unless ($mark{$v}) {
    > ++$componentNumber;
    >
    > dfs $v, \%adj, \%mark, $componentNumber;
    > }
    > }
    >
    > my %subgraph;
    > for (keys %mark) {
    > push @{ $subgraph{$mark{$_}} }, $_;
    > }
    >
    > values %subgraph;
    > }
    >
    > ## main
    > my %adj;
    > my %v;
    >
    > while (<DATA>) {
    > my($v,$w) = split;
    >
    > # treat graph as undirected
    > push @{ $adj{$v} } => $w;
    > push @{ $adj{$w} } => $v;
    >
    > ++$v{$v};
    > ++$v{$w};
    > }
    >
    > print join("," => map join("+", sort @$_), concomps \%adj, keys %v),

    "\n";
    >
    > __DATA__
    > A B
    > A C
    > B A
    > D E
    >
    > The author used arrays, and the code above uses hashes to be more
    > Perlish but with time and space penalties. If the size of your input
    > makes you consider switching back to arrays, you might also consider
    > switching to a lower-level language.
    >
    > Hope this helps,
    > Greg
    > --
    > Those who wish to preserve freedom should recognize, however, that

    inflation
    > is probably the most important single factor in that vicious circle

    wherein
    > one kind of government action makes more and more government control

    necessary.
    > -- F. A. Hayek
     
    IRR, Jul 27, 2004
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Barak
    Replies:
    0
    Views:
    814
    Barak
    Aug 7, 2003
  2. Altu
    Replies:
    2
    Views:
    1,359
    Dimitre Novatchev
    Nov 14, 2007
  3. bukzor

    Directed Graph Traversal

    bukzor, Apr 1, 2008, in forum: Python
    Replies:
    3
    Views:
    1,719
    bukzor
    Apr 3, 2008
  4. Noé Alejandro

    Finding all cycles in a directed graph

    Noé Alejandro, Oct 9, 2010, in forum: Ruby
    Replies:
    2
    Views:
    316
    Jeremy Bopp
    Oct 10, 2010
  5. Emilio Mayorga
    Replies:
    6
    Views:
    342
    Martien Verbruggen
    Oct 8, 2003
Loading...

Share This Page