Rationale behind copy semantics in STL containers.

D

dragoncoder

Just a simple theoritical question to the experts.

What was the rationale behind making STL containers follow copy
semantics rather than reference semantics. References almost always
make things easier without much of overhead. Then why not reference ?

Thanks
/P
 
M

Mark P

dragoncoder said:
Just a simple theoritical question to the experts.

What was the rationale behind making STL containers follow copy
semantics rather than reference semantics. References almost always
make things easier without much of overhead. Then why not reference ?

Thanks
/P

Because to have a valid reference you must somewhere else have an object
and you must therefore be aware of the lifetime of that object. This
leaves open the possibilities of dangling references and memory leaks.
STL containers own their objects and clean up after themselves when the
container is destructed, essentially solving both of these problems.

If you really want the equivalent of a stored reference, you can make a
container of pointers-- likely that's what a reference is "under the
hood" anyway.
 
P

persenaama

Mark P kirjoitti:
If you really want the equivalent of a stored reference, you can make a
container of pointers-- likely that's what a reference is "under the
hood" anyway.

Or more precisely, a memory address. ;)
 
P

peter koch

dragoncoder said:
Just a simple theoritical question to the experts.

What was the rationale behind making STL containers follow copy
semantics rather than reference semantics. References almost always
make things easier without much of overhead. Then why not reference ?
Because C and C++ always has been value-based and always has allowed
you to use pointers in case you really wanted "reference" semantics. (I
believe the wording is from Java - thus it is misleading in a C++
context). And references do have quite a lot of overhead: that is why
"simple objects" such as integers are value-based in languages such as
Java (needing "boxing" to work as real objects).

/Peter
 
B

bjarne

peter said:
Because C and C++ always has been value-based and always has allowed
you to use pointers in case you really wanted "reference" semantics. (I
believe the wording is from Java - thus it is misleading in a C++
context). And references do have quite a lot of overhead: that is why
"simple objects" such as integers are value-based in languages such as
Java (needing "boxing" to work as real objects).

/Peter

and containers of built-in types (e.g. int and double) and value types
(e.g., complex and pair) are very common and important.

-- Bjarne Stroustrup; http://www.reasearch.att.com/~bs
 
T

Tony

Peter:

Can you quantify/explain, "references do have quite a lot of overhead" a
bit?

Tony
 
E

eriwik

Can you quantify/explain, "references do have quite a lot of overhead" a
bit?

When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance.
 
T

Tony

message



Can you quantify/explain, "references do have quite a lot of overhead" a
bit?

"When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

Tony
 
E

eriwik

memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

Maybe, I don't know what he was thinking of, but as others have pointed
out reference semantics is not clearly defined in C++ since we have
both pointers and references and neither of them are the same thing as
in Java or C#. <speculation> In those languages the references are more
than just pointers since they are also used to manage garbage
collection, this might have some kind of impact on performance too.
</speculation>

Notice that in certain applications the overhead of the extra
indirection can be quite a large part of the total time needed to
perform an operation, say for example if you have a really large
std::vector<int*> and you go through it sum up all the elements, since
the actual operation is quite fast the overhead could be quite
noticeable, if you on the other hand performed a complicated operation
on each element the impact can be relatively small compared to the
total running time.
 
R

Roland Pibinger

STL uses 'value semantics' because Alexander Stepanov's view of
programming is rooted in the functional programming 'paradigm' which
is value based (Stepanov is the designer of STL). Anyone interested in
STL should know the basics of functional programming.
Generic programming (programing with templates) per se is not bound to
(i.e. independant of) value semantics. Generic containers and
functions (algorithms) can be designed with reference semantics alike
(just not by Stepanov).
Interestingly, the the theoretical background of STL is never
explained, at least not in the available books and (online) articles I
know. The probably best-selling STL-book, Scott Meyer's 'Effective
STL', not even mentions value semantics. Some information can be found
at the collection of Stepanov's papers: http://www.stepanovpapers.com/


Best wishes,
Roland Pibinger
 
M

Mirek Fidler

When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance.

Yes. Great. Means we have container library able to store just values.

Translates in container library really efficient only for fundamental
and trivial concrete types.

Therefore you have to store anything non-copyable (or hard to copyable)
as pointer.

Means we are getting all troubles described above, plus the problem of
that container is no more able to manage object lifetime.

Regards,

Mirek
 
M

Mirek Fidler

Generic programming (programing with templates) per se is not bound to
(i.e. independant of) value semantics. Generic containers and
functions (algorithms) can be designed with reference semantics alike
(just not by Stepanov).

BTW, lately I am less sure about permutating algorithms. Maybe things
like "sort" or "remove_if" should rather be implemented as container
methods. (I will most likely do that in next U++ Core iteration).

In fact, even current value based STL has to deal with this to some
degree.

Mirek
 
P

peter koch

Tony skrev:
"When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

Well, I do not believe you should call it "just one extra level of
indirection". It is something that easily equates to a factor of 10 (or
more) in performance on modern processors (due to cache-misses).

/Peter
 
R

Roland Pibinger

Yes. Great. Means we have container library able to store just values.

Yes, because it was designed only for values but not for objects. Not
for some obscure performance reasons.
Translates in container library really efficient only for fundamental
and trivial concrete types.

Therefore you have to store anything non-copyable (or hard to copyable)
as pointer.

You have to store pointers anyway (unless you use some inplace
creation method). The user interface need not be a 'pointer
interface'. But i guess you know that already :)
Means we are getting all troubles described above, plus the problem of
that container is no more able to manage object lifetime.

Don't fall into the same trap as Boost. Containment and ownership are
distinct and independant responsibilities.

Best wishes,
Roland Pibinger
 
R

Roland Pibinger

BTW, lately I am less sure about permutating algorithms. Maybe things
like "sort" or "remove_if" should rather be implemented as container
methods. (I will most likely do that in next U++ Core iteration).

'sort' should be a container function, at least for convenience. Other
mutating algorithms must somehow know that they have to swap
(underlying) pointers instead of pointed-to objects.
Any fundamental changes in the next U++ Core version?

Best wishes,
Roland Pibinger
 
M

Mirek Fidler

Roland said:
'sort' should be a container function, at least for convenience. Other
mutating algorithms must somehow know that they have to swap
(underlying) pointers instead of pointed-to objects.

Unfortunately, the problem is that swapping is often not enough... E.g.
you cannot implement stable_sort or remove_if using the swap (AFAIK!
:).
Any fundamental changes in the next U++ Core version?

No, nothing really fundamental. Maybe getting rid of last traces of
iterators and this one.

U++ Core works as expected (means great) for years now... Actually, the
only trouble is exactly this - stable sort of polymorphic containers.
Easy to be implemented as method, impossible as algorithm. OTOH, in
reality, I never really needed stable sort of polymorphic container :)

Mirek
 
M

Mirek Fidler

Don't fall into the same trap as Boost. Containment and ownership are
distinct and independant responsibilities.

Well, IME it is really a good idea to merge two.

Mirek
 
E

Evan

Tony said:
"When using references/pointers you need to perform an additional
memory-access, first one when you get the pointer from the container
and a second one when following the pointer to the actual object. Take
for example a vector (in which the contained elements are stored
contiguously), what you need to keep in the processors cache when
working with it can be the iterator and the contained elements, if you
use pointers you need to keep the iterator, the contained elements
_and_ the actual data, which might be spread all over the place leading
to repeated cache-misses and severely affecting performance."

OK. I thought he (you?) meant that references had some kind of overhead
over that which pointers have (something behind the scenes). The above
is obvious. "references do have quite a lot of overhead" sounds like
something more than just the extra level of indirection.

There's also a hidden cost of pretty much destroying optimization
opportunities on the part of traditional compilers.

AFAIK, compilers won't register allocate pointed-to values. Not really
a problem in containers, but to efficently use primitive types if they
were accessed through an indirection would require compilers to get
smarter.

Evan
 
T

Tony

peter koch said:
Tony skrev:

Well, I do not believe you should call it "just one extra level of
indirection". It is something that easily equates to a factor of 10 (or
more) in performance on modern processors (due to cache-misses).

The cache misses are a container implementation issue and not something
inherent to references.

Tony
 
P

peter koch

Tony skrev:
The cache misses are a container implementation issue and not something
inherent to references.

They certainly can't be a container implementation issue. At best it
can be a memory allocation issue (requiring subsequent allocation to be
sequential in memory), but that would require that those allocations
happens in an optimal order - e.g. allocate element n before element n
+1 and require no allocations in between.

/Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top