Advise on implementation of custum graph

Discussion in 'C++' started by Jef Driesen, Jan 6, 2006.

1. Jef DriesenGuest

I'm working on an image segmentation algorithm. An essential part of the
algorithm is a graph to keep track of the connectivity between regions.
At the moment I have a working implementation, but it's not flexible
enough anymore and I need something more advanced. I already have a data
structure in mind, but I don't know how to implement that nicely in C++.

// Main class which holds the lists with
// edges and vertices. These lists are
class graph {
// Edge list
edge_list edges;
// Vertex list
vertex_list vertices;
};
// Class which maintains a link between two
// vertices (neighboring regions of pixels
// in my application).
class edge {
// Source and target vertices.
vertex *source, *target;
};
// Class which represent a single region of
// pixels in my application. Each region has
// a list of incoming and outgoing edges.
// (For my application both lists contains the same
// elements but with source and target swapped.)
// Those lists are special because they consists
// of a subset of the elements of the edge list
// of the graph.
class vertex {
// (Where for each element in the list,
// the source can be any other vertex and
// the target is always this vertex)
// (Where for each element in the list,
// the source is always this vertex and
// the target can be any other vertex)
};

To implement this, i was thinking of using 4 kins of custom lists
type of node (one for a vertex and for an edge) and differ only in the
way they are traversed.

struct vertex_node {
// Vertex data
vertex v;

// Successor and predecessor nodes (vertex list)
vertex_node *next, *prev;
};

struct edge_node {
// Edge data
edge e;

// Successor and predecessor nodes (adjacency in list)
edge_node *in_edges_next, *in_edges_prev;

// Successor and predecessor nodes (adjacency out list)
edge_node *out_edges_next, *out_edges_prev;

// Successor and predecessor nodes (edge list)
edge_node *next, *prev;
};

Is this a workable implementation or will I encounter some problems
which I'm unaware of at the moment. One problem I don't know how to
solve is the constructors of the vertex and edge classes because they
have to know to which graph it belongs. (An edge between vertices of two
different graphs is an error and a vertex can only have edge which are
part of the same graph.) And I also don't see how I should implement the
copy constructor too.

All suggestions are welcome.

Jef Driesen, Jan 6, 2006

2. Ben PopeGuest

Jef Driesen wrote:
> I'm working on an image segmentation algorithm. An essential part of the
> algorithm is a graph to keep track of the connectivity between regions.
> At the moment I have a working implementation, but it's not flexible
> enough anymore and I need something more advanced. I already have a data
> structure in mind, but I don't know how to implement that nicely in C++.

<snip>

> All suggestions are welcome.

Have you looked at the Boost Graph Library?
http://www.boost.org/libs/graph/doc/

Apologies if this is not relevant, I haven't studied your example or the
following library, but perhaps it will be of use.

Boost is a very widely used, very high quality library, some of it is
going into the new additions to the C++ standard library of TR1.

Ben Pope
--
I'm not just a number. To many, I'm known as a string...

Ben Pope, Jan 6, 2006

3. Jef DriesenGuest

Ben Pope wrote:
> Jef Driesen wrote:
>> I'm working on an image segmentation algorithm. An essential part of
>> the algorithm is a graph to keep track of the connectivity between
>> regions. At the moment I have a working implementation, but it's not
>> flexible enough anymore and I need something more advanced. I already
>> have a data structure in mind, but I don't know how to implement that
>> nicely in C++.

>
> <snip>
>
>> All suggestions are welcome.

>
> Have you looked at the Boost Graph Library?
> http://www.boost.org/libs/graph/doc/
>
> Apologies if this is not relevant, I haven't studied your example or the
> following library, but perhaps it will be of use.
>
> Boost is a very widely used, very high quality library, some of it is
> going into the new additions to the C++ standard library of TR1.

I have looked at BGL in the past, but found it quite complicated to use
(i only need the datastructure and the possibility to update the graph,
no graph algorithms,...) and also very different from my current code.
But I'll have a second look at it.

Jef Driesen, Jan 6, 2006
4. Luke MeyersGuest

Jef Driesen wrote:
> I'm working on an image segmentation algorithm. An essential part of the
> algorithm is a graph to keep track of the connectivity between regions.
> At the moment I have a working implementation, but it's not flexible
> enough anymore and I need something more advanced. I already have a data
> structure in mind, but I don't know how to implement that nicely in C++.
>
> // Main class which holds the lists with
> // edges and vertices. These lists are
> // "normal" lists. (read further)
> class graph {
> // Edge list
> edge_list edges;
> // Vertex list
> vertex_list vertices;
> };
> // Class which maintains a link between two
> // vertices (neighboring regions of pixels
> // in my application).
> class edge {
> // Source and target vertices.
> vertex *source, *target;
> };
> // Class which represent a single region of
> // pixels in my application. Each region has
> // a list of incoming and outgoing edges.
> // (For my application both lists contains the same
> // elements but with source and target swapped.)
> // Those lists are special because they consists
> // of a subset of the elements of the edge list
> // of the graph.
> class vertex {
> // (Where for each element in the list,
> // the source can be any other vertex and
> // the target is always this vertex)
> // (Where for each element in the list,
> // the source is always this vertex and
> // the target can be any other vertex)
> };
>
> To implement this, i was thinking of using 4 kins of custom lists
> type of node (one for a vertex and for an edge) and differ only in the
> way they are traversed.
>
> struct vertex_node {
> // Vertex data
> vertex v;
>
> // Successor and predecessor nodes (vertex list)
> vertex_node *next, *prev;
> };
>
> struct edge_node {
> // Edge data
> edge e;
>
> // Successor and predecessor nodes (adjacency in list)
> edge_node *in_edges_next, *in_edges_prev;
>
> // Successor and predecessor nodes (adjacency out list)
> edge_node *out_edges_next, *out_edges_prev;
>
> // Successor and predecessor nodes (edge list)
> edge_node *next, *prev;
> };
>
> Is this a workable implementation or will I encounter some problems
> which I'm unaware of at the moment. One problem I don't know how to
> solve is the constructors of the vertex and edge classes because they
> have to know to which graph it belongs. (An edge between vertices of two
> different graphs is an error and a vertex can only have edge which are
> part of the same graph.) And I also don't see how I should implement the
> copy constructor too.
>
> All suggestions are welcome.

Jef,

Unfortunately, it's somewhat difficult to evaluate the suitability of a
data structure without a better idea of its intended use. That said,
the one thing that strikes me is that you seem to have some redundant
information in your implementation, which in almost all cases results

A graph is (as you know) fundamentally just a 2-tuple (V,E), where V is
a set of vertices and E is a binary relation on V. If you're mostly
interested in traversing the set of edges in various ways, your
implementation should focus on making E easy to work with, and not
worry much about V. The list of edges would just be some container of
std:air<Vertex, Vertex> which supports the traversals you want. You
might take a look at boost::multi_index_container, which has a bit of a
learning curve but would probably let you do what you want by defining
custom indices. That is, assuming that the BGL isn't suitable for you
(which I'd doubt -- I'd give that another try, first. Buy the book,
it's like \$20).

Luke

Luke Meyers, Jan 6, 2006
5. Dietmar KuehlGuest

Jef Driesen wrote:
> I have looked at BGL in the past, but found it quite complicated to use
> (i only need the datastructure and the possibility to update the graph,
> no graph algorithms,...) and also very different from my current code.

The reason why BGL looks very different from typical code using graphs
is that the abstractions are not really that obvious although once you
have understood them, they make working with graphs much easier than
essentially any obvious approach. Whether or not you need existing
graph algorithms, does not really matter too much, either: you apparently
need the graph and if you want to do anything with it, you need some
form of graph algorithms, be it existing ones or own ones. The BGL
abstractions make graph algorithms a great deal easier to cope with!

Of course, I should mention that I'm biased: I essentially came up
with basically the same abstractions as those used by Jeremy for BGL.
--
<mailto:> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Dietmar Kuehl, Jan 6, 2006
6. Jef DriesenGuest

Luke Meyers wrote:
> Jef Driesen wrote:
>
>>I'm working on an image segmentation algorithm. An essential part of the
>>algorithm is a graph to keep track of the connectivity between regions.
>>At the moment I have a working implementation, but it's not flexible
>>enough anymore and I need something more advanced. I already have a data
>>structure in mind, but I don't know how to implement that nicely in C++.
>>
>>// Main class which holds the lists with
>>// edges and vertices. These lists are
>>class graph {
>> // Edge list
>> edge_list edges;
>> // Vertex list
>> vertex_list vertices;
>>};
>>// Class which maintains a link between two
>>// vertices (neighboring regions of pixels
>>// in my application).
>>class edge {
>> // Source and target vertices.
>> vertex *source, *target;
>>};
>>// Class which represent a single region of
>>// pixels in my application. Each region has
>>// a list of incoming and outgoing edges.
>>// (For my application both lists contains the same
>>// elements but with source and target swapped.)
>>// Those lists are special because they consists
>>// of a subset of the elements of the edge list
>>// of the graph.
>>class vertex {
>> // (Where for each element in the list,
>> // the source can be any other vertex and
>> // the target is always this vertex)
>> // (Where for each element in the list,
>> // the source is always this vertex and
>> // the target can be any other vertex)
>>};
>>
>>To implement this, i was thinking of using 4 kins of custom lists
>>type of node (one for a vertex and for an edge) and differ only in the
>>way they are traversed.
>>
>>struct vertex_node {
>> // Vertex data
>> vertex v;
>>
>> // Successor and predecessor nodes (vertex list)
>> vertex_node *next, *prev;
>>};
>>
>>struct edge_node {
>> // Edge data
>> edge e;
>>
>> // Successor and predecessor nodes (adjacency in list)
>> edge_node *in_edges_next, *in_edges_prev;
>>
>> // Successor and predecessor nodes (adjacency out list)
>> edge_node *out_edges_next, *out_edges_prev;
>>
>> // Successor and predecessor nodes (edge list)
>> edge_node *next, *prev;
>>};
>>
>>Is this a workable implementation or will I encounter some problems
>>which I'm unaware of at the moment. One problem I don't know how to
>>solve is the constructors of the vertex and edge classes because they
>>have to know to which graph it belongs. (An edge between vertices of two
>>different graphs is an error and a vertex can only have edge which are
>>part of the same graph.) And I also don't see how I should implement the
>>copy constructor too.
>>
>>All suggestions are welcome.

>
>
> Jef,
>
> Unfortunately, it's somewhat difficult to evaluate the suitability of a
> data structure without a better idea of its intended use. That said,
> the one thing that strikes me is that you seem to have some redundant
> information in your implementation, which in almost all cases results

First I want to make clear that I have not done the implementation yet.
The above is only a design (on paper) I came up with after some
thinking. Before i started the implementation, I wanted some advice on
this idea.

I don't see the redundancy you are talking about. Are talking about the
incoming and outgoing edges (which contains the same elements, but with
source and target swapped)? In that case I need both edges (A,B) and
(B,A) because if region A is connected to region B, the opposited is
also always true.

What I need is basically this:
* Easy access to all edges, because once the datastructure is created, I
need to be able to scan through all adjacent regions and perform some
calculations (based on the properties associated with each region).
(That's why I have the edge_list for a graph.)
* Easy addition and removal of a region and update all edges which refer
to that region. If a certain pair of regions matches my criteria I need
to merge them and add the result to the graph as a new region. All edges
referring to the old regions need to be pointed towards that region.

Maybe you are referring with the redundancy to the multiple lists? In my
code, all nodes are contained in the main graph (edge_list for the edges
and vertex_list for the vertices). The to other adj_*_list do not
contain new elements, but are only 'viewing' a subset of the main list
in another way. Hope this makes it a little more clear.

> A graph is (as you know) fundamentally just a 2-tuple (V,E), where V is
> a set of vertices and E is a binary relation on V. If you're mostly
> interested in traversing the set of edges in various ways, your
> implementation should focus on making E easy to work with, and not
> worry much about V. The list of edges would just be some container of
> std:air<Vertex, Vertex> which supports the traversals you want. You
> might take a look at boost::multi_index_container, which has a bit of a
> learning curve but would probably let you do what you want by defining
> custom indices. That is, assuming that the BGL isn't suitable for you
> (which I'd doubt -- I'd give that another try, first. Buy the book,
> it's like \$20).

If I understand you correctly, you are saying I should get rid of the
lists inside my vertex class? That would really simplify my problem. But
how do I keep track of the incoming/outgoing edges for a vertex now?

I'll also have a second look at the BGL. Now I have some more knowledge
about graphs then with my first try, so maybe I have more luck with it
this time.

Jef Driesen, Jan 7, 2006
7. Luke MeyersGuest

Jef Driesen wrote:
> I don't see the redundancy you are talking about. Are talking about the
> incoming and outgoing edges (which contains the same elements, but with
> source and target swapped)? In that case I need both edges (A,B) and
> (B,A) because if region A is connected to region B, the opposited is
> also always true.

Since your edges are directional, it is not redundant to have both
(A,B) and (B,A), of course. That's not what I was referring to.

> Maybe you are referring with the redundancy to the multiple lists? In my
> code, all nodes are contained in the main graph (edge_list for the edges
> and vertex_list for the vertices). The to other adj_*_list do not
> contain new elements, but are only 'viewing' a subset of the main list
> in another way. Hope this makes it a little more clear.

Right, that's the issue right there. The multiple lists mean that
you're keeping the same information in more than one place. That means
any time you update one list, you have to update the others. That's
not necessarily a deal-breaker, but this is why I said that it depends
on what you're doing with the graph. If updates are infrequent, as
compared to reads which require the different "views," then it may be
acceptable to keep this information redundantly. But even in that
case, I think it's important to see this as a form of caching -- a
performance optimization -- rather than a part of the data structure
proper. A data structure should organize data, not duplicate it,
generally speaking.

So, getting back to your requirements (I'm quoting you a little out of
order here):

> What I need is basically this:
> * Easy access to all edges, because once the datastructure is created, I
> need to be able to scan through all adjacent regions and perform some
> calculations (based on the properties associated with each region).
> (That's why I have the edge_list for a graph.)

So you're iterating across the whole set of edges, is that correct?
Versus iterating across the edges for each vertex in turn? This is
likely to be a deciding factor, so I want to make sure I understand you
correctly.

> * Easy addition and removal of a region and update all edges which refer
> to that region. If a certain pair of regions matches my criteria I need
> to merge them and add the result to the graph as a new region. All edges
> referring to the old regions need to be pointed towards that region.
> (That's why I have the adj_in_list and adj_out_list for a vertex.)

See, I don't think that this requirement is sufficient to justify
maintaining a separate list for each vertex. What about the following
(pseudocode)?

To remove all edges referring to vertex Foo:
E = list of all edges (V1, V2)
Sort E by V1
Get the sublist S1 of E where V1 == Foo
Update each edge in S1
Sort E by V2
Get the sublist S2 of E where V2 == Foo
Update each edge in S2

> If I understand you correctly, you are saying I should get rid of the
> lists inside my vertex class? That would really simplify my problem. But
> how do I keep track of the incoming/outgoing edges for a vertex now?

By taking a sublist of the sorted edge list, as in the pseudocode
above. If you find that the sorts are too expensive, you can always
cache the results. The nice thing about that approach is that you can
evaluate it lazily, rather than computing results you may never use.

> I'll also have a second look at the BGL. Now I have some more knowledge
> about graphs then with my first try, so maybe I have more luck with it
> this time.

I heartily agree.

HTH,
Luke

Luke Meyers, Jan 8, 2006
8. Jef DriesenGuest

Luke Meyers wrote:
> Jef Driesen wrote:
>> I don't see the redundancy you are talking about. Are talking about the
>> incoming and outgoing edges (which contains the same elements, but with
>> source and target swapped)? In that case I need both edges (A,B) and
>> (B,A) because if region A is connected to region B, the opposited is
>> also always true.

>
> Since your edges are directional, it is not redundant to have both
> (A,B) and (B,A), of course. That's not what I was referring to.

They are not really directional, since (A,B) and (B,A) are functional
equivalent for my purpose, but leaving these kind of duplicates was
easier to work with in my code. (But at the expense of maintaining both

>> Maybe you are referring with the redundancy to the multiple lists? In my
>> code, all nodes are contained in the main graph (edge_list for the edges
>> and vertex_list for the vertices). The to other adj_*_list do not
>> contain new elements, but are only 'viewing' a subset of the main list
>> in another way. Hope this makes it a little more clear.

>
> Right, that's the issue right there. The multiple lists mean that
> you're keeping the same information in more than one place. That means
> any time you update one list, you have to update the others. That's
> not necessarily a deal-breaker, but this is why I said that it depends
> on what you're doing with the graph. If updates are infrequent, as
> compared to reads which require the different "views," then it may be
> acceptable to keep this information redundantly. But even in that
> case, I think it's important to see this as a form of caching -- a
> performance optimization -- rather than a part of the data structure
> proper. A data structure should organize data, not duplicate it,
> generally speaking.

The information is not really duplicated in my approach, because all
edge lists share the internal node type and only use a different set of
pointers to traverse through the nodes.

struct edge_node {
// Edge data
edge e;

// Successor and predecessor nodes (adjacency in list)
edge_node *in_edges_next, *in_edges_prev;

// Successor and predecessor nodes (adjacency out list)
edge_node *out_edges_next, *out_edges_prev;

// Successor and predecessor nodes (edge list)
edge_node *next, *prev;
};

> So, getting back to your requirements (I'm quoting you a little out of
> order here):
>
>> What I need is basically this:
>> * Easy access to all edges, because once the datastructure is created, I
>> need to be able to scan through all adjacent regions and perform some
>> calculations (based on the properties associated with each region).
>> (That's why I have the edge_list for a graph.)

>
> So you're iterating across the whole set of edges, is that correct?
> Versus iterating across the edges for each vertex in turn? This is
> likely to be a deciding factor, so I want to make sure I understand you
> correctly.

That is correct. The outline of one iteration of the main algorithm is
something like this pseudo code:

for each edge e in G
if (criteria(e) > threshold) tag e
sort tagged edges (by the criteria)
for each tagged edge e(A,B)
new_region = merge(A,B)

>> * Easy addition and removal of a region and update all edges which refer
>> to that region. If a certain pair of regions matches my criteria I need
>> to merge them and add the result to the graph as a new region. All edges
>> referring to the old regions need to be pointed towards that region.
>> (That's why I have the adj_in_list and adj_out_list for a vertex.)

>
> See, I don't think that this requirement is sufficient to justify
> maintaining a separate list for each vertex. What about the following
> (pseudocode)?
>
> To remove all edges referring to vertex Foo:
> E = list of all edges (V1, V2)
> Sort E by V1
> Get the sublist S1 of E where V1 == Foo
> Update each edge in S1
> Sort E by V2
> Get the sublist S2 of E where V2 == Foo
> Update each edge in S2

That would work, but i think the sorting will be too slow (I did not
test that). A typical (initial) graph for a 512x512 image contains
20.000 regions (typically 10 à 20 pixels in size) and more than 100.000
edges (counting (A,B) and (B,A) as different).

If I want to reduce the number of regions by merging them 2 by 2 (to
something like 1.000 for instance), I would have to sort that list many
times. And sorting is not a very cheap operation.

>> If I understand you correctly, you are saying I should get rid of the
>> lists inside my vertex class? That would really simplify my problem. But
>> how do I keep track of the incoming/outgoing edges for a vertex now?

>
> By taking a sublist of the sorted edge list, as in the pseudocode
> above. If you find that the sorts are too expensive, you can always
> cache the results. The nice thing about that approach is that you can
> evaluate it lazily, rather than computing results you may never use.

It seems to me that caching the sorting result is more or less
equivalent to maintaining multiple lists? Each time the graph is
updated, the cache needs to be updated also. What do you mean by
"evaluate it lazily"?

>> I'll also have a second look at the BGL. Now I have some more knowledge
>> about graphs then with my first try, so maybe I have more luck with it
>> this time.

>
> I heartily agree.

I'm experimenting with BGL at the moment. I'm trying to implementation
some parts of my algorithm.

Jef Driesen, Jan 9, 2006
9. JustBooGuest

On 8 Jan 2006 09:36:02 -0800, "Luke Meyers" <>
wrote:

[Snipped 79 lines that have NOTHING to do with C++. Nothing.]

<quote>
From: "Luke Meyers" <>
Newsgroups: comp.lang.c++
Subject: Re: Start when windows starts.
Date: 8 Jan 2006 17:18:14 -0800
Lines: 3
Message-ID: <>
References: <op.s20e9vn6vpyu6i@celsius>

This thread has nothing whatsoever to do with the C++ language.
</quote>

That would be: comp.graphics.algorithms

Which this group is NOT.

>I heartily agree.

I thought you would.

>Luke

The nerve of some people eh Luke. We'll show them though eh, Luke, yes
we'll show them how superior we can be, you betcha'.

"Luke, I am your fa... Oh forget it!"

JustBoo, Jan 9, 2006