# Algorithmic vs design complexity

Discussion in 'C++' started by Vidar Hasfjord, Mar 27, 2011.

1. ### Vidar HasfjordGuest

Hi all,

In my field of work I am often faced with algorithms where the
traversal of one data structure (or series) involves the traversal of
a secondary structure. For example:

// Calculate heights for bus stops along a route.
BOOST_FOREACH (const Stop& s, stops)
output (s.distance, profile.height (s.distance));

Here the query "height" makes a linear traversal of the profile
structure to calculate the height at the given distance. This leads to
an algorithmic complexity of O(n*m) where n and m is the size of the
structures.

The interesting point is that very often the traversal is dependent
(ordered) and could progress in parallel for both structures, leading
to linear complexity O(n+m). E.g. in the example above, bus stops are
sorted on distance.

But parallel traversal involves complicating the design; either in
effect reimplementing the inner traversal in-place, inverting the
calling sequence (callback), or in some other way exchanging the
parallel state of the traversal.

Any advice on how to attack this problem, suitable design patterns and
C++ idioms, would be very much appreciated.

Regards,
Vidar Hasfjord

Vidar Hasfjord, Mar 27, 2011

2. ### ralphGuest

On Sun, 27 Mar 2011 08:35:54 -0700 (PDT), Vidar Hasfjord
<> wrote:

>Hi all,
>
>In my field of work I am often faced with algorithms where the
>traversal of one data structure (or series) involves the traversal of
>a secondary structure. For example:
>
> // Calculate heights for bus stops along a route.
> BOOST_FOREACH (const Stop& s, stops)
> output (s.distance, profile.height (s.distance));
>
>Here the query "height" makes a linear traversal of the profile
>structure to calculate the height at the given distance. This leads to
>an algorithmic complexity of O(n*m) where n and m is the size of the
>structures.
>
>The interesting point is that very often the traversal is dependent
>(ordered) and could progress in parallel for both structures, leading
>to linear complexity O(n+m). E.g. in the example above, bus stops are
>sorted on distance.
>
>But parallel traversal involves complicating the design; either in
>effect reimplementing the inner traversal in-place, inverting the
>calling sequence (callback), or in some other way exchanging the
>parallel state of the traversal.
>
>Any advice on how to attack this problem, suitable design patterns and
>C++ idioms, would be very much appreciated.
>

I'm not sure I understand what you are trying to do... Well... strike
that. I'm positive I don't understand what you are trying to do. But
since I've never let not knowing all the facts deter me from providing
an opinion ...

It appears you have some kind of View thingy that needs to render Stop
thingies differently based on their position known by the View thingy.

The first solution that presents itself is to have each Stop thingy
provide a Rendering method that accepts positon information to return
a properly scaled attribute for rendering. Then all the View thingy
needs to do is reach into its bag of Stops and call each of their
Rendering methods. It is as simple as that.

If these thingies need to traverse complex data structures, or in
anyway get involved with "exchanging the parallel state of the
traversal" or other mystical processes, then I suggest you might have
wonderful structures and algorithms but you don't have any first-class
objects nor thought-out rules for working with them.

Go back to the whiteboard.

-ralph

ralph, Mar 29, 2011

3. ### Michael DoubezGuest

On 27 mar, 17:35, Vidar Hasfjord <> wrote:
> Hi all,
>
> In my field of work I am often faced with algorithms where the
> traversal of one data structure (or series) involves the traversal of
> a secondary structure. For example:
>
>   // Calculate heights for bus stops along a route.
>   BOOST_FOREACH (const Stop& s, stops)
>     output (s.distance, profile.height (s.distance));
>
> Here the query "height" makes a linear traversal of the profile
> structure to calculate the height at the given distance. This leads to
> an algorithmic complexity of O(n*m) where n and m is the size of the
> structures.
>
> The interesting point is that very often the traversal is dependent
> (ordered) and could progress in parallel for both structures, leading
> to linear complexity O(n+m). E.g. in the example above, bus stops are
> sorted on distance.
>
> But parallel traversal involves complicating the design; either in
> effect reimplementing the inner traversal in-place, inverting the
> calling sequence (callback), or in some other way exchanging the
> parallel state of the traversal.
>
> Any advice on how to attack this problem, suitable design patterns and
> C++ idioms, would be very much appreciated.

You could use a proxy for computing the height that memoize the result/
state of the previous computation and call

ProfileHeight height(profile);
BOOST_FOREACH (const Stop& s, stops)
output (s.distance, height (s.distance));

The ProfileHeight class would simply keep an iterator on its current
position in height data with relevant associated partial sums
(distance, heigh).

Another solution is to have profile provides the calculation of
distance from a sorted iterator range, written in output iterator (or
functor):

template<class TFwdItInput, class TOutputIt>
TOutputIt Profile::height(TFwdItInput it, TFwdItInput end, TOutputIt
out )
{
for( ; it != end ; ++it )
{
// compute height for it keeping context for next iteration
*out++ = std::make_pair(*it,height);
// functor: out(*it;height);
}
return out;
}

--
Michael

Michael Doubez, Mar 29, 2011
4. ### Vidar HasfjordGuest

Hi Ralph,

On Mar 29, 3:20 am, ralph <> wrote:
> I'm not sure I understand what you are trying to do... Well... strike
> that. I'm positive I don't understand what you are trying to do. But
> since I've never let not knowing all the facts deter me from providing
> an opinion ...

Thank you for your reply; it is very kind of you to take the time,
despite my failure to clearly put across the problem. I was concious
of making my post too long-winded and complex, so in my attempt to
simplify, I probably made it unclear and too abstract.

> It appears you have some kind of View thingy that needs to render Stop
> thingies differently based on their position known by the View thingy.

Actually, the bus-stop example is just theoretical. My field involves
similar things though; such as calculating road centre-lines on the
map, longitudinal sections (peaks and troughs), cross-sections,
earthwork and that kind of stuff. So it involves a lot of work with
geometrical objects and queries on these objects; such as the height
of a point on a profile given the horizontal distance along it. To do
various calculations I write code that combine these queries, and a
lot of the time the functions combine in the way described in my
original post.

> If these thingies need to traverse complex data structures, or in
> anyway get involved with "exchanging the parallel state of the
> traversal" or other mystical processes, then I suggest you might have
> wonderful structures and algorithms but you don't have any first-class
> objects nor thought-out rules for working with them.

The thing is that I have very nicely encapsulated object-oriented
structures with simple implementations of key queries; such as the
height of a profile. The problem is one of efficiency, and it occurs
when one function runs in a loop (n) where each iteration makes a
query that itself needs to loop (m) to calculate the result. The
result is a quadratic number of iterations (n*m).

But, sometimes there is an opportunity for the loops to progress in
parallel (along distance in my example), thus reducing the number of
iterations drastically (n+m). The problem is: my nice encapsulated
design doesn't support that, and it is unclear what the cleanest
solution to the problem is. In my bus-stop example, the brute-force
solution would be to re-implement the height function within the bus-
stop loop, keeping track of where we are on the profile along the way.
This totally breaks the encapsulation of the profile height
calculation.

To make things more concrete, I've included a complete code example
below based on a simpler scenario. Note the messy implementation of
the optimized parallel traversal. The aim is to somehow structure the
code so that the query remains encapsulated and usage remains simple
and clean; while running just as fast.

> Go back to the whiteboard.

In spirit, that has been my location throughout.

My aim is to rewrite and improve a lot of my code, so thanks again for
your reply. It has forced me to more clearly formulate the problem,
and it has helped me better characterize it and recognize when it
applies. In fact, a better subject title might be "Optimizing ordered
queries" or something like that.

Here is an attempt at a more concise characterization of the problem:

* Function A makes a series of calls (queries) to function B.
* The queries are ordered in terms of a query argument x.
* B iterates in the same domain and order as x.

The above characteristics indicate that the code can be optimized by
using a single traversal over the domain of the query argument and
keeping parallel track of the state of A and B.

Here's the code example:

<code>
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
using namespace std;

struct Floor
{
int height;
};

struct Building
{
typedef vector <Floor> Floors;
Floors floors;

int floor (int height)
{
int hf = 0;
const int nf = floors.size ();
for (int i = 0; i != nf; ++i)
{
hf += floors .height;
if (height < hf) return i;
}
return nf;
}
};

struct Object
{
int height;
};

bool less_height (const Object& a, const Object& b)
{
return a.height < b.height;
}

void output (int height, int floor)
{
cout << "object height: " << height
<< ", floor: " << floor << "\n";
}

int main ()
{
Building building;
const int nf = 6;
Floor floors [nf] = {5, 4, 4, 4, 3, 3};
building.floors.assign (floors, floors + nf);

const int no = 5;
Object objects [no] = {21, 7, 14, 11, 8};

cout << "Unordered traversal:\n\n";
for (int i = 0; i != no; ++i)
{
int h = objects .height;
output (h, building.floor (h));
}

// Now let's sort the objects on height.

sort (objects, objects + no, &less_height);
cout << "\nOrdered traversal:\n\n";
for (int i = 0; i != no; ++i)
{
int h = objects .height;
output (h, building.floor (h));
}

// Keeping the objects sorted opens an opportunity
// to optimize algorithms that traverse them.

cout << "\nParallel traversal:\n\n";
{
int hf = 0; // absolute height of floor
int io = 0; // object index
for (int i = 0; i != nf; ++i)
{
hf += floors .height;
for (; io != no; ++io)
{
int h = objects [io].height;
if (h > hf) break;
output (h, i);
}
}
}

cin.get (); // pause
return 0;
}
</code>

Regards,
Vidar Hasfjord

Vidar Hasfjord, Mar 30, 2011
5. ### Vidar HasfjordGuest

Hi Michael,

On Mar 29, 9:20 am, Michael Doubez <> wrote:
> You could use a proxy for computing the height that memoize the result/
> state of the previous computation and call
>
> ProfileHeight height(profile);
> BOOST_FOREACH (const Stop& s, stops)
> output (s.distance, height (s.distance));

Nice. This is the solution I'm leaning towards. The most important
characteristics are encapsulation and scalability. It scores better in
the latter respect compared to your alternative suggestion (iterator/
callback). E.g. if there is a second query involved in the calculation
which also iterates in the same domain, this solution will support it
nicely.

Here's an implementation for the example I posted to Ralph:

<code>
struct FloorQuery
{
Building* b;
int floor;
int height;

FloorQuery (Building& br)
: b (&br), floor (0), height (0)
{height += b->floors [0].height;}

int operator () (int h)
{
while (h > height)
{
++floor;
height = (floor == b->floors.size ()) ?
numeric_limits <int>::max () :
height + b->floors [floor].height;
}
return floor;
}
};

cout << "\nParallel traversal 2:\n\n";
{
FloorQuery get_floor (building);
for (int i = 0; i != no; ++i)
{
int h = objects .height;
output (h, get_floor (h));
}
}
</code>

> Another solution is to have profile provides the calculation of
> distance from a sorted iterator range, written in output iterator (or
> functor):

You mean "calculation of height for a sorted sequence of distances
given by an iterator range", I presume. It is an interesting
alternative, but it seems less scalable than your previous suggestion.
For example, what if our function involved querying both the height
and curvature of the profile? You could use result vectors to hold the
results of the queries and then compute on those, but this requires
extra memory and seems less elegant than your previous suggestion. The
callback alternative would not scale at all, as far as I can see,
although the outer function could be implemented as a functor and
passed to a generic algorithm such as for_each. For example,

for_each(objects, objects + no, CalcFloor (building));