Framework for iterating over product types?

B

Ben Rudiak-Gould

Background: I have some structs containing std::strings and std::vectors of
other structs containing std::strings and std::vectors of .... I'd like to
make a std::vector of these. Unfortunately the overhead of the useless
copies made each time the vector is resized is too large for me to ignore. I
know that rvalue references fix this problem, but I don't think they'll be
widely available for years and I need something that works now. There's no
sensible value I can pass to reserve(). vector<shared_ptr> is faster, but
the per-item allocation overhead is still significant, and the interface is
different. I could wrap it with the interface I wanted, but that might be
error-prone since it would violate std::vector's memory layout guarantee.

I've seriously thought about writing a linear_vector which moves instead of
copying its elements, perhaps calling

relocate(T* dst, T& src)

which could be overloaded for each type. But writing relocate() for every
type I care about is a hassle; worse, there's no good place to put the
definitions. Putting them in the header with the struct itself feels like an
abstraction violation (why should classes need to anticipate that they might
be put in a linear_vector?), and putting it anywhere else is out of the
question since it will inevitably get out of sync with the struct, leading
to nasty bugs.

I would feel happier about writing boilerplate in each class if it were more
widely applicable. I thought about piggybacking on Boost's serialize()
functions, which often look like this:

struct Foo {
int a, b, c;
std::vector< std::map<std::string,int> > d;

// ...

template<class Archive>
void serialize(Archive& ar, unsigned version) {
ar & a & b & c & d;
}
};

but since you can only call serialize on one instance at once, you'd have to
make separate passes over the source struct and the destination memory. This
is unpleasant at the least, and I'm not sure it could be made to work at
all. But a static method supplying pointers-to-members avoids this problem:

struct Foo {
// ...

template<class T>
static void enum_children(T& has) {
has (&Foo::a) (&Foo::b) (&Foo::c) (&Foo::d);
}
};

Then I could write something like:

template<class T, class X>
void relocate_member(T* dst, T& src, X T::* mbr) {
relocate(&dst->*mbr, src.*mbr);
}

// SFINAE magic omitted
template<class T>
void relocate(T* dst, T& src) {
T::enum_children(bind(&relocate_member, dst, src));
}

and furthermore I could write generic implementations of a lot of other
useful functions, like

* a better default swap() than the standard library's

* componentwise operator==() and lexicographical operator<()

* Boost serialize()

* iostream inserters and extracters using (e.g.) a notation
resembling initializer lists

* the usual (deep) visitor pattern

all of which are pretty commonly needed and annoying to write by hand. A
similar idea is implemented in Haskell and described in the "scrap your
boilerplate" papers [1], which have additional motivating examples.

[1] http://research.microsoft.com/~simonpj/papers/hmap/

Of course, I don't want to invent my own private version of this technique;
I want to use a standardized library. I can't be the first person to propose
this for C++, but I can't find anything like it in Boost, to my surprise. Am
I missing something?

Addendum: you probably noticed (it took me annoyingly long to notice) that
my relocate() interface is not a very good one, since there is, to my
knowledge, no guarantee in the standard that any non-POD type can be
correctly relocated by relocating its members. I don't think the proposed
C++0x changes fix this. Couldn't there be a guarantee that a struct like

struct X { A a; B b; };

can be placement-constructed by placement-constructing its members,
regardless of the POD-ness of those members? This is a kind of "shallow POD"
as distinguished from the usual "deep" POD. Has this been discussed?

-- Ben
 
A

Alf P. Steinbach

* Ben Rudiak-Gould:
Background: I have some structs containing std::strings and std::vectors
of other structs containing std::strings and std::vectors of .... I'd
like to make a std::vector of these. Unfortunately the overhead of the
useless copies made each time the vector is resized is too large for me
to ignore.

Indirection.

Cheers, & hth.,

- Alf
 
K

Kai-Uwe Bux

Ben said:
Background: I have some structs containing std::strings and std::vectors
of other structs containing std::strings and std::vectors of .... I'd like
to make a std::vector of these. Unfortunately the overhead of the useless
copies made each time the vector is resized is too large for me to ignore.
I know that rvalue references fix this problem, but I don't think they'll
be widely available for years and I need something that works now. There's
no sensible value I can pass to reserve(). vector<shared_ptr> is faster,
but the per-item allocation overhead is still significant, and the
interface is different. I could wrap it with the interface I wanted, but
that might be error-prone since it would violate std::vector's memory
layout guarantee.

I've seriously thought about writing a linear_vector which moves instead
of copying its elements, perhaps calling

relocate(T* dst, T& src)

which could be overloaded for each type. But writing relocate() for every
type I care about is a hassle; worse, there's no good place to put the
definitions. Putting them in the header with the struct itself feels like
an abstraction violation (why should classes need to anticipate that they
might be put in a linear_vector?),

Because that is a conceptual requirement of linear_vector. It makes perfect
sense to put the relocate function with the class so that it is found
through ADL. This is just the same as all other conceptual requirements
like CopyConstructible or Swappable (in the current draft for C++0X).

and putting it anywhere else is out of
the question since it will inevitably get out of sync with the struct,
leading to nasty bugs.

Agreed.


[snip]


Best

Kai-Uwe Bux
 
K

Kai-Uwe Bux

Roland said:
Have you really understood what he means? I haven't. Or have you
merely read the first paragraph?

I read further, but the OP lost me somewhere around the lines

struct Foo {
// ...

template<class T>
static void enum_children(T& has) {
has (&Foo::a) (&Foo::b) (&Foo::c) (&Foo::d);
}
};


Best

Kai-Uwe Bux
 
A

Alf P. Steinbach

* Roland Pibinger:
Have you really understood what he means?

Enough of it.

I haven't.

Neither have I, the article is not a great example of clarity. However,
disregarding the proposed solutions discussed later on, and second-level
problems arising from that, the main problem seems to be summarized by
what I quoted. A desire to avoid copying when using a std::vector.

Or have you
merely read the first paragraph?

No, I read it all and quoted what was relevant.

Cheers,

- Alf
 
M

Marco Manfredini

The whole boilerplate idea looks suspiciously like runtime reflection to
me and while it may be challenging to develop a system within C++ only,
in practice it turns out to be just annoying (at least to me) and error
prone (again...me).

However, people had success with mining the required information from
source code itself. Generating code or data structures from the output
of gccxml is a very popular choice, for example in the SEAL-REFLEX
project: <http://seal-reflex.web.cern.ch/seal-reflex/index.html>

The boost::python binding library also enumerates the members (to be
used by python) and can also utliize gccxml to generate enumeration
code. You might want to have a look at it.

Regarding the vector-resize problem. I would simply build a custom
vector type, which holds a vector of pointers to small arrays of
constant size (say 16 or 256). Resizing such a vector of N elements,
only requires copying N/16 pointers, access by an integer index is a bit
more expensive of course (two indirections by [i/16] and [i%16]) but not
really much.
 
K

Kai-Uwe Bux

Ben said:
Background: I have some structs containing std::strings and std::vectors
of other structs containing std::strings and std::vectors of .... I'd like
to make a std::vector of these. Unfortunately the overhead of the useless
copies made each time the vector is resized is too large for me to ignore.

Another thought: if you don't need the contiguity guarantee of std::vector,
you could use std::deque instead.


Best

Kai-Uwe Bux
 
B

Ben Rudiak-Gould

Alf said:
Neither have I, the article is not a great example of clarity.

I seriously regret writing it in the bottom-up way that I did. The initial
motivating example was almost irrelevant. Here's a shorter version.

There are many classes for which swap() can be defined like this:

swap(this->a, other.a);
swap(this->b, other.b);
swap(this->c, other.c);
// ...

and serialize() can be defined like this:

ar & a;
ar & b;
ar & c;
// ...

and operator==() can be defined like this:

if (!(this->a == other.a)) return false;
if (!(this->b == other.b)) return false;
if (!(this->c == other.c)) return false;
// ...

and operator<() can be defined like this:

if (this->a < other.a) return true;
if (other.a < this->a) return false;
if (this->b < other.b) return true;
if (other.b < this->b) return false;
if (this->c < other.c) return true;
if (other.c < this->c) return false;
// ...

and so on. It's a hassle to write this boilerplate for a large number of
user-defined classes. All of the above boilerplate functions can be
auto-generated from a single per-class boilerplate method looking something
like this:

template<class Visitor>
static void enum_children(Visitor& has) {
has (&Foo::a) (&Foo::b) (&Foo::c) /* ... */ ;
}

for example, here's how to do it for swap():

template<class T>
class swap_member {
public:
swap_member(T& a, T& b) : a(a), b(b) {}

template<class X>
void operator()(X T::* p) {
swap(a.*p, b.*p);
}

private:
T& a; T& b;
};

template<class T>
void generic_swap(T& a, T& b) {
T::enum_children(swap_member(a,b));
}

For a system with a lot of boilerplate methods of this kind, this technique
looks like it could save a lot of time and avoid a lot of bugs. I was
surprised not to find support for it in Boost. I can write my own
implementation, but I'd rather use someone else's, or at least use a
standardized name for the method (instead of "enum_children") for future
compatibility. So, has anyone implemented this before or chosen a naming
convention for it?

-- Ben
 
V

Victor Bazarov

Ben said:
[..]
There are many classes for which swap() can be defined like this:

swap(this->a, other.a);
swap(this->b, other.b);
swap(this->c, other.c);
// ...

and serialize() can be defined like this:

ar & a;
ar & b;
ar & c;
// ...

and operator==() can be defined like this:

if (!(this->a == other.a)) return false;
if (!(this->b == other.b)) return false;
if (!(this->c == other.c)) return false;
// ...

and operator<() can be defined like this:

if (this->a < other.a) return true;
if (other.a < this->a) return false;
if (this->b < other.b) return true;
if (other.b < this->b) return false;
if (this->c < other.c) return true;
if (other.c < this->c) return false;
// ...

and so on. It's a hassle to write this boilerplate for a large number
of user-defined classes. All of the above boilerplate functions can be
auto-generated from a single per-class boilerplate method looking
something like this:

template<class Visitor>
static void enum_children(Visitor& has) {
has (&Foo::a) (&Foo::b) (&Foo::c) /* ... */ ;
}

for example, here's how to do it for swap():

template<class T>
class swap_member {
public:
swap_member(T& a, T& b) : a(a), b(b) {}

template<class X>
void operator()(X T::* p) {
swap(a.*p, b.*p);
}

private:
T& a; T& b;
};

template<class T>
void generic_swap(T& a, T& b) {
T::enum_children(swap_member(a,b));
}

For a system with a lot of boilerplate methods of this kind, this
technique looks like it could save a lot of time and avoid a lot of
bugs. I was surprised not to find support for it in Boost. I can
write my own implementation, but I'd rather use someone else's, or at
least use a standardized name for the method (instead of
"enum_children") for future compatibility. So, has anyone implemented
this before or chosen a naming convention for it?

In all of my programming career the need for a struct/class like that
have not arisen even once. Forgive my bluntness, but it looks like
an exemplary representation of a pure numerical (and rather table-
oriented) data chunk. For such data it is advised NOT to write
operator<, operator==, etc., as member functions. Use stand-alone
functions instead.

I honestly doubt that you have "a large number of user-defined classes"
that all look like the one you presented here. Two, three, maybe, but
"a large number"?

As to whether it's been done before, I am not sure, but if it hasn't,
write it (since you claim you can) and submit it to Boost. That's
how Boost grows. You will undoubtedly find those who also need the
functionality; they might even be interested in helping you write it,
and if not, no big deal, they'll tell you what's wrong with yours.
If you don't find people who need something like that, well, you may
be alone, but they you're stuck implementing it for yourself anyway.

And to simplify the task I'd probably put those 'a', 'b', 'c', etc.
members in arrays of the respective types (hopefully they are all
pretty much the same, aren't they?), and to access them I'd have the
member functions that would simply returned the array elements (by
ref or by value)...

V
 
B

Ben Rudiak-Gould

Victor said:
In all of my programming career the need for a struct/class like that
have not arisen even once.

What kind of struct/class do you mean? Obviously I don't have a class with
members named 'a', 'b', 'c', etc. and all of those functions defined as
members. The letters of the alphabet were just placeholders for actual
member names. All that's relevant for the purposes of this idiom is the
pattern of code that repeats once for each data member of a class. It
doesn't matter if we're talking about member operator< or nonmember
operator< or member "less" or nonmember "less". I'm also not saying that I
have a bunch of classes defining every single one of the above methods, just
that I often find myself writing function bodies that look like this.
As to whether it's been done before, I am not sure, but if it hasn't,
write it (since you claim you can) and submit it to Boost.

I don't think I can write it well. Writing generic code that seems to work
most of the time is a lot easier than writing generic code that works. Also,
I just have a hard time believing that no one has formalized this pattern in
decades of C++ use. But if it really is new, I might indeed submit it to
Boost and hope their code review catches my mistakes. It would be a good
learning experience.

-- Ben
 
C

coal

I seriously regret writing it in the bottom-up way that I did. The initial
motivating example was almost irrelevant. Here's a shorter version.

There are many classes for which swap() can be defined like this:

swap(this->a, other.a);
swap(this->b, other.b);
swap(this->c, other.c);
// ...

and serialize() can be defined like this:

ar & a;
ar & b;
ar & c;
// ...

and operator==() can be defined like this:

if (!(this->a == other.a)) return false;
if (!(this->b == other.b)) return false;
if (!(this->c == other.c)) return false;
// ...

and operator<() can be defined like this:

if (this->a < other.a) return true;
if (other.a < this->a) return false;
if (this->b < other.b) return true;
if (other.b < this->b) return false;
if (this->c < other.c) return true;
if (other.c < this->c) return false;
// ...

and so on. It's a hassle to write this boilerplate for a large number of
user-defined classes. All of the above boilerplate functions can be
auto-generated from a single per-class boilerplate method looking something
like this:

template<class Visitor>
static void enum_children(Visitor& has) {
has (&Foo::a) (&Foo::b) (&Foo::c) /* ... */ ;
}

Thanks for reworking the post... I guess by the above
you mean a function will be applied to each member?

That would probably work in some cases, but what if you
want to swap all the members but not marshall them all?
I think that comes up sometimes.

In my efforts to address similar problems,
www.webebenezer.net, I want to avoid users having to
duplicate members even once. In my opinion, the
language/compilers have to change to better support
the automation of functions like you mention.

Brian Wood
Ebenezer Enterprises
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,058
Latest member
QQXCharlot

Latest Threads

Top