Advise on implementation of custum graph

J

Jef Driesen

I'm working on an image segmentation algorithm. An essential part of the
algorithm is a graph to keep track of the connectivity between regions.
At the moment I have a working implementation, but it's not flexible
enough anymore and I need something more advanced. I already have a data
structure in mind, but I don't know how to implement that nicely in C++.

// Main class which holds the lists with
// edges and vertices. These lists are
// "normal" lists. (read further)
class graph {
// Edge list
edge_list edges;
// Vertex list
vertex_list vertices;
};
// Class which maintains a link between two
// vertices (neighboring regions of pixels
// in my application).
class edge {
// Source and target vertices.
vertex *source, *target;
};
// Class which represent a single region of
// pixels in my application. Each region has
// a list of incoming and outgoing edges.
// (For my application both lists contains the same
// elements but with source and target swapped.)
// Those lists are special because they consists
// of a subset of the elements of the edge list
// of the graph.
class vertex {
// Adjacency in list
// (Where for each element in the list,
// the source can be any other vertex and
// the target is always this vertex)
adj_in_list in_edges;
// Adjacency out list
// (Where for each element in the list,
// the source is always this vertex and
// the target can be any other vertex)
adj_out_list out_edges;
};

To implement this, i was thinking of using 4 kins of custom lists
(vertex_list, edge_list, adj_in_list, adj_out_list) which share the same
type of node (one for a vertex and for an edge) and differ only in the
way they are traversed.

struct vertex_node {
// Vertex data
vertex v;

// Successor and predecessor nodes (vertex list)
vertex_node *next, *prev;
};

struct edge_node {
// Edge data
edge e;

// Successor and predecessor nodes (adjacency in list)
edge_node *in_edges_next, *in_edges_prev;

// Successor and predecessor nodes (adjacency out list)
edge_node *out_edges_next, *out_edges_prev;

// Successor and predecessor nodes (edge list)
edge_node *next, *prev;
};

Is this a workable implementation or will I encounter some problems
which I'm unaware of at the moment. One problem I don't know how to
solve is the constructors of the vertex and edge classes because they
have to know to which graph it belongs. (An edge between vertices of two
different graphs is an error and a vertex can only have edge which are
part of the same graph.) And I also don't see how I should implement the
copy constructor too.

All suggestions are welcome.
 
B

Ben Pope

Jef said:
I'm working on an image segmentation algorithm. An essential part of the
algorithm is a graph to keep track of the connectivity between regions.
At the moment I have a working implementation, but it's not flexible
enough anymore and I need something more advanced. I already have a data
structure in mind, but I don't know how to implement that nicely in C++.

All suggestions are welcome.

Have you looked at the Boost Graph Library?
http://www.boost.org/libs/graph/doc/

Apologies if this is not relevant, I haven't studied your example or the
following library, but perhaps it will be of use.

Boost is a very widely used, very high quality library, some of it is
going into the new additions to the C++ standard library of TR1.

Ben Pope
 
J

Jef Driesen

Ben said:
Have you looked at the Boost Graph Library?
http://www.boost.org/libs/graph/doc/

Apologies if this is not relevant, I haven't studied your example or the
following library, but perhaps it will be of use.

Boost is a very widely used, very high quality library, some of it is
going into the new additions to the C++ standard library of TR1.

I have looked at BGL in the past, but found it quite complicated to use
(i only need the datastructure and the possibility to update the graph,
no graph algorithms,...) and also very different from my current code.
But I'll have a second look at it.
 
L

Luke Meyers

Jef said:
I'm working on an image segmentation algorithm. An essential part of the
algorithm is a graph to keep track of the connectivity between regions.
At the moment I have a working implementation, but it's not flexible
enough anymore and I need something more advanced. I already have a data
structure in mind, but I don't know how to implement that nicely in C++.

// Main class which holds the lists with
// edges and vertices. These lists are
// "normal" lists. (read further)
class graph {
// Edge list
edge_list edges;
// Vertex list
vertex_list vertices;
};
// Class which maintains a link between two
// vertices (neighboring regions of pixels
// in my application).
class edge {
// Source and target vertices.
vertex *source, *target;
};
// Class which represent a single region of
// pixels in my application. Each region has
// a list of incoming and outgoing edges.
// (For my application both lists contains the same
// elements but with source and target swapped.)
// Those lists are special because they consists
// of a subset of the elements of the edge list
// of the graph.
class vertex {
// Adjacency in list
// (Where for each element in the list,
// the source can be any other vertex and
// the target is always this vertex)
adj_in_list in_edges;
// Adjacency out list
// (Where for each element in the list,
// the source is always this vertex and
// the target can be any other vertex)
adj_out_list out_edges;
};

To implement this, i was thinking of using 4 kins of custom lists
(vertex_list, edge_list, adj_in_list, adj_out_list) which share the same
type of node (one for a vertex and for an edge) and differ only in the
way they are traversed.

struct vertex_node {
// Vertex data
vertex v;

// Successor and predecessor nodes (vertex list)
vertex_node *next, *prev;
};

struct edge_node {
// Edge data
edge e;

// Successor and predecessor nodes (adjacency in list)
edge_node *in_edges_next, *in_edges_prev;

// Successor and predecessor nodes (adjacency out list)
edge_node *out_edges_next, *out_edges_prev;

// Successor and predecessor nodes (edge list)
edge_node *next, *prev;
};

Is this a workable implementation or will I encounter some problems
which I'm unaware of at the moment. One problem I don't know how to
solve is the constructors of the vertex and edge classes because they
have to know to which graph it belongs. (An edge between vertices of two
different graphs is an error and a vertex can only have edge which are
part of the same graph.) And I also don't see how I should implement the
copy constructor too.

All suggestions are welcome.

Jef,

Unfortunately, it's somewhat difficult to evaluate the suitability of a
data structure without a better idea of its intended use. That said,
the one thing that strikes me is that you seem to have some redundant
information in your implementation, which in almost all cases results
in headaches.

A graph is (as you know) fundamentally just a 2-tuple (V,E), where V is
a set of vertices and E is a binary relation on V. If you're mostly
interested in traversing the set of edges in various ways, your
implementation should focus on making E easy to work with, and not
worry much about V. The list of edges would just be some container of
std::pair<Vertex, Vertex> which supports the traversals you want. You
might take a look at boost::multi_index_container, which has a bit of a
learning curve but would probably let you do what you want by defining
custom indices. That is, assuming that the BGL isn't suitable for you
(which I'd doubt -- I'd give that another try, first. Buy the book,
it's like $20).

Luke
 
D

Dietmar Kuehl

Jef said:
I have looked at BGL in the past, but found it quite complicated to use
(i only need the datastructure and the possibility to update the graph,
no graph algorithms,...) and also very different from my current code.

The reason why BGL looks very different from typical code using graphs
is that the abstractions are not really that obvious although once you
have understood them, they make working with graphs much easier than
essentially any obvious approach. Whether or not you need existing
graph algorithms, does not really matter too much, either: you apparently
need the graph and if you want to do anything with it, you need some
form of graph algorithms, be it existing ones or own ones. The BGL
abstractions make graph algorithms a great deal easier to cope with!

Of course, I should mention that I'm biased: I essentially came up
with basically the same abstractions as those used by Jeremy for BGL.
 
J

Jef Driesen

Luke said:
Jef,

Unfortunately, it's somewhat difficult to evaluate the suitability of a
data structure without a better idea of its intended use. That said,
the one thing that strikes me is that you seem to have some redundant
information in your implementation, which in almost all cases results
in headaches.

First I want to make clear that I have not done the implementation yet.
The above is only a design (on paper) I came up with after some
thinking. Before i started the implementation, I wanted some advice on
this idea.

I don't see the redundancy you are talking about. Are talking about the
incoming and outgoing edges (which contains the same elements, but with
source and target swapped)? In that case I need both edges (A,B) and
(B,A) because if region A is connected to region B, the opposited is
also always true.

What I need is basically this:
* Easy access to all edges, because once the datastructure is created, I
need to be able to scan through all adjacent regions and perform some
calculations (based on the properties associated with each region).
(That's why I have the edge_list for a graph.)
* Easy addition and removal of a region and update all edges which refer
to that region. If a certain pair of regions matches my criteria I need
to merge them and add the result to the graph as a new region. All edges
referring to the old regions need to be pointed towards that region.
(That's why I have the adj_in_list and adj_out_list for a vertex.)

Maybe you are referring with the redundancy to the multiple lists? In my
code, all nodes are contained in the main graph (edge_list for the edges
and vertex_list for the vertices). The to other adj_*_list do not
contain new elements, but are only 'viewing' a subset of the main list
in another way. Hope this makes it a little more clear.
A graph is (as you know) fundamentally just a 2-tuple (V,E), where V is
a set of vertices and E is a binary relation on V. If you're mostly
interested in traversing the set of edges in various ways, your
implementation should focus on making E easy to work with, and not
worry much about V. The list of edges would just be some container of
std::pair<Vertex, Vertex> which supports the traversals you want. You
might take a look at boost::multi_index_container, which has a bit of a
learning curve but would probably let you do what you want by defining
custom indices. That is, assuming that the BGL isn't suitable for you
(which I'd doubt -- I'd give that another try, first. Buy the book,
it's like $20).

If I understand you correctly, you are saying I should get rid of the
lists inside my vertex class? That would really simplify my problem. But
how do I keep track of the incoming/outgoing edges for a vertex now?

I'll also have a second look at the BGL. Now I have some more knowledge
about graphs then with my first try, so maybe I have more luck with it
this time.
 
L

Luke Meyers

Jef said:
I don't see the redundancy you are talking about. Are talking about the
incoming and outgoing edges (which contains the same elements, but with
source and target swapped)? In that case I need both edges (A,B) and
(B,A) because if region A is connected to region B, the opposited is
also always true.

Since your edges are directional, it is not redundant to have both
(A,B) and (B,A), of course. That's not what I was referring to.
Maybe you are referring with the redundancy to the multiple lists? In my
code, all nodes are contained in the main graph (edge_list for the edges
and vertex_list for the vertices). The to other adj_*_list do not
contain new elements, but are only 'viewing' a subset of the main list
in another way. Hope this makes it a little more clear.

Right, that's the issue right there. The multiple lists mean that
you're keeping the same information in more than one place. That means
any time you update one list, you have to update the others. That's
not necessarily a deal-breaker, but this is why I said that it depends
on what you're doing with the graph. If updates are infrequent, as
compared to reads which require the different "views," then it may be
acceptable to keep this information redundantly. But even in that
case, I think it's important to see this as a form of caching -- a
performance optimization -- rather than a part of the data structure
proper. A data structure should organize data, not duplicate it,
generally speaking.

So, getting back to your requirements (I'm quoting you a little out of
order here):
What I need is basically this:
* Easy access to all edges, because once the datastructure is created, I
need to be able to scan through all adjacent regions and perform some
calculations (based on the properties associated with each region).
(That's why I have the edge_list for a graph.)

So you're iterating across the whole set of edges, is that correct?
Versus iterating across the edges for each vertex in turn? This is
likely to be a deciding factor, so I want to make sure I understand you
correctly.
* Easy addition and removal of a region and update all edges which refer
to that region. If a certain pair of regions matches my criteria I need
to merge them and add the result to the graph as a new region. All edges
referring to the old regions need to be pointed towards that region.
(That's why I have the adj_in_list and adj_out_list for a vertex.)

See, I don't think that this requirement is sufficient to justify
maintaining a separate list for each vertex. What about the following
(pseudocode)?

To remove all edges referring to vertex Foo:
E = list of all edges (V1, V2)
Sort E by V1
Get the sublist S1 of E where V1 == Foo
Update each edge in S1
Sort E by V2
Get the sublist S2 of E where V2 == Foo
Update each edge in S2
If I understand you correctly, you are saying I should get rid of the
lists inside my vertex class? That would really simplify my problem. But
how do I keep track of the incoming/outgoing edges for a vertex now?

By taking a sublist of the sorted edge list, as in the pseudocode
above. If you find that the sorts are too expensive, you can always
cache the results. The nice thing about that approach is that you can
evaluate it lazily, rather than computing results you may never use.
I'll also have a second look at the BGL. Now I have some more knowledge
about graphs then with my first try, so maybe I have more luck with it
this time.

I heartily agree.

HTH,
Luke
 
J

Jef Driesen

Luke said:
Since your edges are directional, it is not redundant to have both
(A,B) and (B,A), of course. That's not what I was referring to.

They are not really directional, since (A,B) and (B,A) are functional
equivalent for my purpose, but leaving these kind of duplicates was
easier to work with in my code. (But at the expense of maintaining both
an adj_in_list and adj_out_list in my new design.)
Right, that's the issue right there. The multiple lists mean that
you're keeping the same information in more than one place. That means
any time you update one list, you have to update the others. That's
not necessarily a deal-breaker, but this is why I said that it depends
on what you're doing with the graph. If updates are infrequent, as
compared to reads which require the different "views," then it may be
acceptable to keep this information redundantly. But even in that
case, I think it's important to see this as a form of caching -- a
performance optimization -- rather than a part of the data structure
proper. A data structure should organize data, not duplicate it,
generally speaking.

The information is not really duplicated in my approach, because all
edge lists share the internal node type and only use a different set of
pointers to traverse through the nodes.

struct edge_node {
// Edge data
edge e;

// Successor and predecessor nodes (adjacency in list)
edge_node *in_edges_next, *in_edges_prev;

// Successor and predecessor nodes (adjacency out list)
edge_node *out_edges_next, *out_edges_prev;

// Successor and predecessor nodes (edge list)
edge_node *next, *prev;
};
So, getting back to your requirements (I'm quoting you a little out of
order here):


So you're iterating across the whole set of edges, is that correct?
Versus iterating across the edges for each vertex in turn? This is
likely to be a deciding factor, so I want to make sure I understand you
correctly.

That is correct. The outline of one iteration of the main algorithm is
something like this pseudo code:

for each edge e in G
if (criteria(e) > threshold) tag e
sort tagged edges (by the criteria)
for each tagged edge e(A,B)
if !already_merged(A) && !already_merged(B)
new_region = merge(A,B)
See, I don't think that this requirement is sufficient to justify
maintaining a separate list for each vertex. What about the following
(pseudocode)?

To remove all edges referring to vertex Foo:
E = list of all edges (V1, V2)
Sort E by V1
Get the sublist S1 of E where V1 == Foo
Update each edge in S1
Sort E by V2
Get the sublist S2 of E where V2 == Foo
Update each edge in S2

That would work, but i think the sorting will be too slow (I did not
test that). A typical (initial) graph for a 512x512 image contains
20.000 regions (typically 10 à 20 pixels in size) and more than 100.000
edges (counting (A,B) and (B,A) as different).

If I want to reduce the number of regions by merging them 2 by 2 (to
something like 1.000 for instance), I would have to sort that list many
times. And sorting is not a very cheap operation.
By taking a sublist of the sorted edge list, as in the pseudocode
above. If you find that the sorts are too expensive, you can always
cache the results. The nice thing about that approach is that you can
evaluate it lazily, rather than computing results you may never use.

It seems to me that caching the sorting result is more or less
equivalent to maintaining multiple lists? Each time the graph is
updated, the cache needs to be updated also. What do you mean by
"evaluate it lazily"?
I heartily agree.

I'm experimenting with BGL at the moment. I'm trying to implementation
some parts of my algorithm.
 
J

JustBoo

On 8 Jan 2006 09:36:02 -0800, "Luke Meyers" <[email protected]>
wrote:

[Snipped 79 lines that have NOTHING to do with C++. Nothing.]

<quote>
From: "Luke Meyers" <[email protected]>
Newsgroups: comp.lang.c++
Subject: Re: Start when windows starts.
Date: 8 Jan 2006 17:18:14 -0800
Organization: http://groups.google.com
Lines: 3
Message-ID: <[email protected]>
References: <op.s20e9vn6vpyu6i@celsius>

This thread has nothing whatsoever to do with the C++ language.
Please go find a more appropriate forum for your question.
</quote>

That would be: comp.graphics.algorithms

Which this group is NOT.
I heartily agree.

I thought you would.

The nerve of some people eh Luke. We'll show them though eh, Luke, yes
we'll show them how superior we can be, you betcha'.

"Luke, I am your fa... Oh forget it!"
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top