std::list iterators and swapping

J

Juha Nieminen

Assume we have this:

std::list<Type> list1(10, 1), list2(20, 2);
std::list<Type>::iterator iter = list1.end();
list1.swap(list2);

What happens here, according to the standard?

1) 'iter' still points to list1::end().
2) 'iter' now points to list2::end().
3) Undefined behavior.
 
H

Howard Hinnant

Since 'swap' is required to invalidate *no iterators*, I believe the '2'
is correct.  Iterators by definition "point to elements".  There is an
imaginary element "one past the end of the collection".  It sits right
after the last one.  Since the "one past the last" in list1 follows "the
last one" in list1, and that element _migrates_ to 'list2' (and becomes
its "last one"), then it's logical to conclude that the iterator that
pointed to the end of list1 after swapping will point to the end of the
list2...

Tricky area. The definition of "invalidate" is probably a little
fuzzier than it should be. Certainly if an iterator is not
invalidated then (if it was dereferenceable) if you dereference it you
should get the same thing. But if you increment or decrement it
should you get an iterator pointing to the same sibling?

The splice example points to a counter example of my previous
statement. If you splice from one list to another, the iterators are
not really invalidated. C++03 says they are, but C++0X working draft
says they aren't. And in all existing implementations, the
outstanding iterators referring to spliced elements continue to point
to the spliced elements, now in another list. But the neighbors of
these spliced elements (found via increment or decrement of the
outstanding iterators) are not necessarily the same as before the
splice.

In the case of end(), invalidation is particularly ill-defined. One
can not dereference end() to ensure that it is still pointing to the
same place. All one can do is associate it with its predecessor by
decrementing it. Thus detecting "invalidation" of end() is
problematic at least by current definition (or lack thereof) of
"invalidation".

In the case of list, the OP's question is quite relevant to this area
as there are two implementation techniques of list where the OP
detects the difference in implementations (which is probably what
prompted the post in the first place).
std::list<Type> list1(10, 1), list2(20, 2);
std::list<Type>::iterator iter = list1.end();
list1.swap(list2);
What happens here, according to the standard?
1) 'iter' still points to list1::end().
2) 'iter' now points to list2::end().

Some implementations will satisfy (1), while others will satisfy (2).
And there are good reasons for both implementations.

In the early days, all implementations satisfied (2). These
implementations worked by creating a "ghost node" for end() to point
to on the heap. This node was just like all of the other nodes in the
list except that it held a spot for T which was simply never
constructed. The implications of this is that even the default
constructor of list needed to allocate a "ghost node" so that end()
would have something to point to. Naturally under swap, two lists
would swap "ghost nodes" and thus any outstanding end() iterators
would be swapped (exactly analogous to vector).

In the early part of the 2000 decade Metrowerks' CodeWarrior switched
to an "embedded end node". This is somewhat analogous to the "short
string optimization". Instead of the end node being allocated on the
heap, it was allocated within the list class itself as a member. This
has a couple of advantages:

1. The default constructor no longer needs to allocate an end node on
the heap. end() simply points to within the list class itself.
2. In C++0X this design means that the list move constructor can be
nothrow since no resources need be left behind in the moved-from
source.

A few years later the FSF (gcc) independently developed this same
design. But with this design, end() is not swapped when the lists are
swapped.

Now we have multiple and widespread STL implementations with both (1)
and (2) behaviors. A LWG issue is not out of the question on this
subject (you can open a LWG issue by emailing me). My preferred
resolution would be to say that one can not detect invalidation of an
end() iterator, at least after a swap or splice. At the very least, I
do not want to make illegal the "embedded end node" implementation for
node based containers such as list (and map, set, etc.). Nothrow
default constructors and nothrow move constructors serve clients very
well imho. They are wicked fast compared to constructors which must
allocate memory.

-Howard
 
J

Juha Nieminen

Howard said:
The splice example points to a counter example of my previous
statement. If you splice from one list to another, the iterators are
not really invalidated. C++03 says they are, but C++0X working draft
says they aren't. And in all existing implementations, the
outstanding iterators referring to spliced elements continue to point
to the spliced elements, now in another list.

Couldn't such a requirement (ie. splicing never invalidates iterators)
theoretically cause problems with custom allocators?

Assume this:

MyAllocator<int> alloc1(1); // Use allocation strategy 1
MyAllocator<int> alloc2(2); // Use allocation strategy 2, which
// is completely different from 1

std::list<int, MyAllocator<int> > list1(10, 1, alloc1);
std::list<int, MyAllocator<int> > list2(20, 2, alloc2);

list1.splice(list1.begin(), list2);

The question is now: Can std::list simply link the last element of
list2 with the first element of list1 and then take hold of the first
element of list2 (and make list2 empty), given that list1 and list2 are
using *different* allocators (even though they are of the same type)?

If it did that, it would own elements allocated differently, using a
different allocator, and when it's for example destroyed, it will
attempt to deallocate the spliced elements with the wrong allocator.

Obviously if the allocators in list1 and list2 differ, list1 must
re-allocate the elements during the splicing using the allocator given
to list1. The question is: Can this be done without invalidating
iterators pointing to the spliced elements?

AFAIK it's not only the 'int' element which must be re-allocated, but
the entire list node structure (the one which contains the prev and next
pointers). Thus a simple re-allocation and assignment of the 'int' value
itself will not be enough.

If the new standard really demands for iterators pointing to spliced
elements to not to be invalidated, then it must put some constraints on
what kind of allocators you can use with a std::list. This would seem a
bit odd.
In the early part of the 2000 decade Metrowerks' CodeWarrior switched
to an "embedded end node". This is somewhat analogous to the "short
string optimization". Instead of the end node being allocated on the
heap, it was allocated within the list class itself as a member.

One advantage of allocating the end node on the heap is that it
doesn't need to be constructed. In other words, space can be allocated
for it, but the list element inside it doesn't need to be constructed.
There may be cases where constructing such an extra element would be
counter-productive, if not outright hindering (for example if the
element constructor always allocates an enormous array, for instance,
something the user might not want the list data container doing for no
reason).

Or, put in another way: You can allocate an "end node" which has its
"prev" and "next" pointers, but where the element itself has not been
needlessly constructed (because it's not used for anything, and
dereferencing an iterator pointing to the end node is undefined behavior
anyways).

Is it possible to achieve this same thing with a class member?
(I would be interested in knowing the trick, if there exists one.)
 
J

James Kanze

Couldn't such a requirement (ie. splicing never invalidates
iterators) theoretically cause problems with custom
allocators?
Assume this:
MyAllocator<int> alloc1(1); // Use allocation strategy 1
MyAllocator<int> alloc2(2); // Use allocation strategy 2, which
// is completely different from 1
std::list<int, MyAllocator<int> > list1(10, 1, alloc1);
std::list<int, MyAllocator<int> > list2(20, 2, alloc2);
list1.splice(list1.begin(), list2);
The question is now: Can std::list simply link the last
element of list2 with the first element of list1 and then take
hold of the first element of list2 (and make list2 empty),
given that list1 and list2 are using *different* allocators
(even though they are of the same type)?
If it did that, it would own elements allocated differently,
using a different allocator, and when it's for example
destroyed, it will attempt to deallocate the spliced elements
with the wrong allocator.

That's probably worth a defect report, if there isn't already
one. I don't see off hand how you can implement splice if the
allocators compare unequal. (Does anyone know what Dinkumware
does? I believe they claim to support unequal allocators.)
Obviously if the allocators in list1 and list2 differ, list1
must re-allocate the elements during the splicing using the
allocator given to list1. The question is: Can this be done
without invalidating iterators pointing to the spliced
elements?

Can it be done in constant time: splice is required to have
constant time (and is uninteresting if it doesn't).
One advantage of allocating the end node on the heap is that it
doesn't need to be constructed.

Yes and no. Whether the node is on the heap or part of the
object doesn't change anything here: the pointers in the node
need to be initialized, but the data element doesn't need to be
constructed (and can't be constructed, because the class has no
initial element to copy, and it doesn't have to support default
construction). In pre-standard GB_DLList, the memory for the
data element wasn't even allocated.
Or, put in another way: You can allocate an "end node" which
has its "prev" and "next" pointers, but where the element
itself has not been needlessly constructed (because it's not
used for anything, and dereferencing an iterator pointing to
the end node is undefined behavior anyways).
Is it possible to achieve this same thing with a class
member? (I would be interested in knowing the trick, if
there exists one.)

Obviously, it's possible, since the all of the implementations
which have the node as a class member do it. And are required
to, by the standard.

The simplest solution (in my mind, anyway) uses inheritance: you
have a BaseNode with the pointers, and a DerivedNode which
contains the memory for the data. In my pre-standard
implementation, an "unsigned char buffer[ sizeof( T ) ]", with
some additional tricks to ensure alignment. I think the
standard more or less requires this as well, since the
implementation is required to use the allocator, and the
allocator separates allocation and initialization. And since
dereferencing the end iterator is undefined behavior, it can be
a simple BaseNode, without even the memory for the data.
 
B

Bo Persson

James said:
That's probably worth a defect report, if there isn't already
one. I don't see off hand how you can implement splice if the
allocators compare unequal. (Does anyone know what Dinkumware
does? I believe they claim to support unequal allocators.)

They create new copies of the elements, if the allocators compare
unequal.

Pretty useless, but with a smaller support problem than crashing. :)


Bo Persson
 
J

Juha Nieminen

Pete said:
When the allocators don't compare equal, you have to copy the elements.

That's what I thought, and that's why it sounded odd that the new
standard requires for iterators pointing to spliced elements to not to
be invalidated.

If there were an additional condition "unless the allocators in both
lists compare unequal", then it would make sense.
 
J

Juha Nieminen

James said:
The simplest solution (in my mind, anyway) uses inheritance: you
have a BaseNode with the pointers, and a DerivedNode which
contains the memory for the data.

Do you mean that you have a BaseNode object as member (as the end
node), but an iterator pointing to it would have a DerivedNode type
pointer pointing to this BaseNode type object? Is this even allowed?
 
J

James Kanze

Actually, you don't, because...
That's what I thought, and that's why it sounded odd that the
new standard requires for iterators pointing to spliced
elements to not to be invalidated.
If there were an additional condition "unless the allocators
in both lists compare unequal", then it would make sense.

At least in the latest draft, there is a statement: "The
behavior of splice operations is undefined if get_allocator() !=
x.get_allocator()." Which means that anything the application
does in this case is OK. (And so it doesn't have to copy---it
could reformat the hard disk instead, for example. Although
from a QoI point of view, I'd prefer the deep copy. Or better
yet, an assertion failure.)
 
J

James Kanze

Do you mean that you have a BaseNode object as member (as the
end node),
Yes.

but an iterator pointing to it would have a
DerivedNode type pointer pointing to this BaseNode type
object?

No. All pointers are always to the BaseNode; the element()
function of my iterator (which corresponds more or less to the
operator*() of a standard iterator) used a static_cast (actually
a C style cast, back then) to convert the pointer, e.g.:

return static_cast said:
Is this even allowed?

Having a pointer to base which actually points to a derived is
certainly allowed:). And templates (actually <generic.h>, back
then) guarantee the type, so the static_cast downcast is
guaranteed (unless you're at the end, of course).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,167
Latest member
SusanaSwan
Top