Rationale behind copy semantics in STL containers.

dragoncoder · Oct 25, 2006

Just a simple theoritical question to the experts.

What was the rationale behind making STL containers follow copy
semantics rather than reference semantics. References almost always
make things easier without much of overhead. Then why not reference ?

Thanks
/P

Mark P · Oct 25, 2006

dragoncoder said:
Just a simple theoritical question to the experts.

What was the rationale behind making STL containers follow copy
semantics rather than reference semantics. References almost always
make things easier without much of overhead. Then why not reference ?

Thanks
/P

Because to have a valid reference you must somewhere else have an object
and you must therefore be aware of the lifetime of that object. This
leaves open the possibilities of dangling references and memory leaks.
STL containers own their objects and clean up after themselves when the
container is destructed, essentially solving both of these problems.

If you really want the equivalent of a stored reference, you can make a
container of pointers-- likely that's what a reference is "under the
hood" anyway.

persenaama · Oct 26, 2006

Mark P kirjoitti:

If you really want the equivalent of a stored reference, you can make a
container of pointers-- likely that's what a reference is "under the
hood" anyway.

Or more precisely, a memory address.

peter koch · Oct 26, 2006

dragoncoder said:
Just a simple theoritical question to the experts.

What was the rationale behind making STL containers follow copy
semantics rather than reference semantics. References almost always
make things easier without much of overhead. Then why not reference ?

Because C and C++ always has been value-based and always has allowed
you to use pointers in case you really wanted "reference" semantics. (I
believe the wording is from Java - thus it is misleading in a C++
context). And references do have quite a lot of overhead: that is why
"simple objects" such as integers are value-based in languages such as
Java (needing "boxing" to work as real objects).

/Peter

bjarne · Oct 30, 2006

peter said:
Because C and C++ always has been value-based and always has allowed
you to use pointers in case you really wanted "reference" semantics. (I
believe the wording is from Java - thus it is misleading in a C++
context). And references do have quite a lot of overhead: that is why
"simple objects" such as integers are value-based in languages such as
Java (needing "boxing" to work as real objects).

/Peter

and containers of built-in types (e.g. int and double) and value types
(e.g., complex and pair) are very common and important.

-- Bjarne Stroustrup; http://www.reasearch.att.com/~bs

Tony · Dec 13, 2006

Peter:

Can you quantify/explain, "references do have quite a lot of overhead" a
bit?

Tony

eriwik · Dec 13, 2006

Can you quantify/explain, "references do have quite a lot of overhead" a
bit?

When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance.

Tony · Dec 13, 2006

message

Can you quantify/explain, "references do have quite a lot of overhead" a
bit?

"When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

Tony

eriwik · Dec 13, 2006

memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

Maybe, I don't know what he was thinking of, but as others have pointed
out reference semantics is not clearly defined in C++ since we have
both pointers and references and neither of them are the same thing as
in Java or C#. <speculation> In those languages the references are more
than just pointers since they are also used to manage garbage
collection, this might have some kind of impact on performance too.
</speculation>

Notice that in certain applications the overhead of the extra
indirection can be quite a large part of the total time needed to
perform an operation, say for example if you have a really large
std::vector<int*> and you go through it sum up all the elements, since
the actual operation is quite fast the overhead could be quite
noticeable, if you on the other hand performed a complicated operation
on each element the impact can be relatively small compared to the
total running time.

Roland Pibinger · Dec 13, 2006

STL uses 'value semantics' because Alexander Stepanov's view of
programming is rooted in the functional programming 'paradigm' which
is value based (Stepanov is the designer of STL). Anyone interested in
STL should know the basics of functional programming.
Generic programming (programing with templates) per se is not bound to
(i.e. independant of) value semantics. Generic containers and
functions (algorithms) can be designed with reference semantics alike
(just not by Stepanov).
Interestingly, the the theoretical background of STL is never
explained, at least not in the available books and (online) articles I
know. The probably best-selling STL-book, Scott Meyer's 'Effective
STL', not even mentions value semantics. Some information can be found
at the collection of Stepanov's papers: http://www.stepanovpapers.com/

Best wishes,
Roland Pibinger

Mirek Fidler · Dec 13, 2006

When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance.

Yes. Great. Means we have container library able to store just values.

Translates in container library really efficient only for fundamental
and trivial concrete types.

Therefore you have to store anything non-copyable (or hard to copyable)
as pointer.

Means we are getting all troubles described above, plus the problem of
that container is no more able to manage object lifetime.

Regards,

Mirek

Mirek Fidler · Dec 13, 2006

Generic programming (programing with templates) per se is not bound to
(i.e. independant of) value semantics. Generic containers and
functions (algorithms) can be designed with reference semantics alike
(just not by Stepanov).

BTW, lately I am less sure about permutating algorithms. Maybe things
like "sort" or "remove_if" should rather be implemented as container
methods. (I will most likely do that in next U++ Core iteration).

In fact, even current value based STL has to deal with this to some
degree.

Mirek

peter koch · Dec 13, 2006

Tony skrev:

"When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

Well, I do not believe you should call it "just one extra level of
indirection". It is something that easily equates to a factor of 10 (or
more) in performance on modern processors (due to cache-misses).

/Peter

Roland Pibinger · Dec 13, 2006

Yes. Great. Means we have container library able to store just values.

Yes, because it was designed only for values but not for objects. Not
for some obscure performance reasons.

Translates in container library really efficient only for fundamental
and trivial concrete types.

Therefore you have to store anything non-copyable (or hard to copyable)
as pointer.

You have to store pointers anyway (unless you use some inplace
creation method). The user interface need not be a 'pointer
interface'. But i guess you know that already

Means we are getting all troubles described above, plus the problem of
that container is no more able to manage object lifetime.

Don't fall into the same trap as Boost. Containment and ownership are
distinct and independant responsibilities.

Best wishes,
Roland Pibinger

Roland Pibinger · Dec 13, 2006

BTW, lately I am less sure about permutating algorithms. Maybe things
like "sort" or "remove_if" should rather be implemented as container
methods. (I will most likely do that in next U++ Core iteration).

'sort' should be a container function, at least for convenience. Other
mutating algorithms must somehow know that they have to swap
(underlying) pointers instead of pointed-to objects.
Any fundamental changes in the next U++ Core version?

Best wishes,
Roland Pibinger

Mirek Fidler · Dec 13, 2006

Roland said:
'sort' should be a container function, at least for convenience. Other
mutating algorithms must somehow know that they have to swap
(underlying) pointers instead of pointed-to objects.

Unfortunately, the problem is that swapping is often not enough... E.g.
you cannot implement stable_sort or remove_if using the swap (AFAIK!

.

Any fundamental changes in the next U++ Core version?

No, nothing really fundamental. Maybe getting rid of last traces of
iterators and this one.

U++ Core works as expected (means great) for years now... Actually, the
only trouble is exactly this - stable sort of polymorphic containers.
Easy to be implemented as method, impossible as algorithm. OTOH, in
reality, I never really needed stable sort of polymorphic container

Mirek

Mirek Fidler · Dec 13, 2006

Don't fall into the same trap as Boost. Containment and ownership are

distinct and independant responsibilities.

Well, IME it is really a good idea to merge two.

Mirek

Evan · Dec 13, 2006

Tony said:
"When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

There's also a hidden cost of pretty much destroying optimization
opportunities on the part of traditional compilers.

AFAIK, compilers won't register allocate pointed-to values. Not really
a problem in containers, but to efficently use primitive types if they
were accessed through an indirection would require compilers to get
smarter.

Evan

Tony · Dec 13, 2006

peter koch said:
Tony skrev:

Well, I do not believe you should call it "just one extra level of
indirection". It is something that easily equates to a factor of 10 (or
more) in performance on modern processors (due to cache-misses).

The cache misses are a container implementation issue and not something
inherent to references.

Tony

peter koch · Dec 13, 2006

Tony skrev:

The cache misses are a container implementation issue and not something
inherent to references.

They certainly can't be a container implementation issue. At best it
can be a memory allocation issue (requiring subsequent allocation to be
sequential in memory), but that would require that those allocations
happens in an optimal order - e.g. allocate element n before element n
+1 and require no allocations in between.

/Peter

The Semantics of 'volatile'	73	Jun 2, 2009
STL containers in shared memory	14	May 21, 2005
Proc / def / yield semantics (long)	15	Sep 25, 2004
Understanding how to design STL-compatible iterators	4	Nov 29, 2007
Question about STL Vector design	3	Oct 12, 2007
I'm tempted to quit out of frustration	1	Aug 13, 2023
Understanding search queries, semantics, and "Meaning" ...aren't weall looking for meaning?	4	Dec 30, 2008
On the creation of an API for containers in C	14	Oct 26, 2007

Rationale behind copy semantics in STL containers.

dragoncoder

Mark P

persenaama

peter koch

bjarne

Tony

eriwik

Tony

eriwik

Roland Pibinger

Mirek Fidler

Mirek Fidler

peter koch

Roland Pibinger

Roland Pibinger

Mirek Fidler

Mirek Fidler

Evan

Tony

peter koch

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads