Resizing arrays in C++?

J

John Harrison

Chris Dearlove said:
John Harrison ([email protected]) wrote:

: Well believe it or not I had good reasons for having a const member.
: Specifically I needed a vector of map<T, U>::value_type. And the C++
: standard defines map<T, U>::value_type as std::pair<const T, U>. Note the
: const.

OK. Now are those pairs still in a map, or have they been removed
from one? (If the answer is some of each I give up.) In the former
case is there a reason not to have a vector not of the pairs but
of iterators to the pairs? In the latter case can you not work
with a vector of non-const pairs converted from the map value type?

(I'm working from the position that unpleasant although these options
may be compared to what you really want, they aren't as unpleasant
as reinventing vector.)

What I'm actually trying to do is develop a general data structure which I'm
calling IndexedMap (add IndexedSet, IndexedMultiMap, and IndexedMultiSet).
This will be very similar to map etc. but the iterators will be random
access and reflect not the ordering of the keys but the order in which the
values were added to the set. A sort of cross between a map and a vector.
You can do it provided you don't mind deletion being O(n) instead of O(lg n)
as it is with map, insert and find remain O(lg n).

My first attempt was something similar to what you suggest, basically
combining a map with a vector and storing links between the two. One thing
bothered my about this however which was that as far as I can tell you need
to store two allocators, one in the map and one in the vector. If the user
wants to use a stateful allocator then they aren't going to be happy.

Then I realised that you could implement a red-black tree algorithm where
all the nodes are stored in a vector and everything would fall into place.
That was when I found out about the assignable requirement of vectors.

I've implemented my vector replacement, its called LightweightVector, didn't
take too long, it has everything that doesn't require you to assign elements
(push_back and pop_back for instance, but no insert or erase), even whole
vector assignment, which is done by copy and swap. As a nice bonus you get
strong exception safety as well. I think I may be using it more often.

john
 
T

tom_usenet

What I'm actually trying to do is develop a general data structure which I'm
calling IndexedMap (add IndexedSet, IndexedMultiMap, and IndexedMultiSet).
This will be very similar to map etc. but the iterators will be random
access and reflect not the ordering of the keys but the order in which the
values were added to the set. A sort of cross between a map and a vector.
You can do it provided you don't mind deletion being O(n) instead of O(lg n)
as it is with map, insert and find remain O(lg n).

My first attempt was something similar to what you suggest, basically
combining a map with a vector and storing links between the two. One thing
bothered my about this however which was that as far as I can tell you need
to store two allocators, one in the map and one in the vector. If the user
wants to use a stateful allocator then they aren't going to be happy.

Allocators should be cheap to copy, and shouldn't take up too much
space. Remember that, for example, std::deque has to hold two copies
of the allocator (one for the segment map, one for the segments), so I
don't think it's a problem your map doing the same.
Then I realised that you could implement a red-black tree algorithm where
all the nodes are stored in a vector and everything would fall into place.
That was when I found out about the assignable requirement of vectors.

I see a slight problem with this - you lose stability of references to
elements, an important feature of node based containers. Also, how do
you erase elements from the map? Do you swap the last element over the
erased one, updating the pointers (presumably you've used vector
indices rather than pointers for stability) in the referencing nodes?
Or do you just mark the node as deleted, and skip it during an
insertion-order-iteration?

Tuning your tree to be as fast as a good std::map implementation may
be hard, but the last few % of performance may not be an issue for you
of course. Testing your implementation for correctness is another
issue, and I bet there's a bug in there somewhere if you haven't
written comprehensive unit tests yet.
I've implemented my vector replacement, its called LightweightVector, didn't
take too long, it has everything that doesn't require you to assign elements
(push_back and pop_back for instance, but no insert or erase), even whole
vector assignment, which is done by copy and swap. As a nice bonus you get
strong exception safety as well. I think I may be using it more often.

This should be easier to implement bug free, but have you implemented
all of the comparison operators, etc.?

Finally, have you looked at boost's multi_index library? It's not part
of an official release yet, but it's in the main CVS.

Tom
 
J

John Harrison

tom_usenet said:
Allocators should be cheap to copy, and shouldn't take up too much
space. Remember that, for example, std::deque has to hold two copies
of the allocator (one for the segment map, one for the segments), so I
don't think it's a problem your map doing the same.

It's not the cost of copying, but the fact that if the user uses a stateful
allocator, then they are going to have trouble accessing that state. To make
a trivial example, suppose the allocator counts allocations made, then with
two allocators we have two counts, presumably the get_allocator method would
return one or other of the allocators and the other count would be lost.

I hadn't realised that std::deque does this already, but then the standard
does give STL implementors license to assume stateless allocators.
Personally I think it would be entirely reasonable for the standard to
mandate that all allocators be stateless.

Of course I didn't abandon my original design solely for this reason. It's
just this issue set me thinking about how I could implement my class with
only a single allocator and when I realised I could, I preferred the new
design anyway.
I see a slight problem with this - you lose stability of references to
elements, an important feature of node based containers. Also, how do
you erase elements from the map? Do you swap the last element over the
erased one, updating the pointers (presumably you've used vector
indices rather than pointers for stability) in the referencing nodes?
Or do you just mark the node as deleted, and skip it during an
insertion-order-iteration?

The main point of my class is that is allows you to efficiently get an index
for any entry added to the container. Because the iterators are random
access you can just subtract the iterator to the start of the container from
the iterator for the inserted element. And of course the reverse, given an
index efficiently get the associated entry.

To answer your other points - I'm not bothered about O(n) deletion time.
Obviously this is a drawback, but it won't matter for the application I have
in mind where elements are only going to be added to the container. I
haven't implemented the erase yet, but it is probably going to successively
destroy and copy construct elements from the deletion point to the end of
the vector. I don't fancy mark as deleted, though it hadn't occurred to me
until now.

And I do use vector indices internally.

And the class will have reserve and capacity methods which will help with
stability of references in some situations (and help with efficiency).
Tuning your tree to be as fast as a good std::map implementation may
be hard, but the last few % of performance may not be an issue for you
of course.

I'll run some comparisons. I suspect the use of vector indices will slow my
implementation down. It might be interesting to see how allocating in a
single block vs allocating nodes affects things. As you say I'm not that
bothered about a few percentage points (but how many is a few?).
Testing your implementation for correctness is another
issue, and I bet there's a bug in there somewhere if you haven't
written comprehensive unit tests yet.

I hate unit tests, but they will be written at some point.
This should be easier to implement bug free, but have you implemented
all of the comparison operators, etc.?

Yes, they are easy enough.
Finally, have you looked at boost's multi_index library? It's not part
of an official release yet, but it's in the main CVS.

No, but I'll check it out, thanks.

john
 
T

tom_usenet

It's not the cost of copying, but the fact that if the user uses a stateful
allocator, then they are going to have trouble accessing that state. To make
a trivial example, suppose the allocator counts allocations made, then with
two allocators we have two counts, presumably the get_allocator method would
return one or other of the allocators and the other count would be lost.

An allocator like that should use reference counting, using a
representation class that is independent of the template parameter of
the allocator so that differently typed instances can share the data.
get_allocator returns a copy of the allocator, so there will be
multiple copies of it around if someone is using it like that.
I hadn't realised that std::deque does this already, but then the standard
does give STL implementors license to assume stateless allocators.
Personally I think it would be entirely reasonable for the standard to
mandate that all allocators be stateless.

But that makes pool allocators impossible, and they are very useful
for high performance applications, such as games; a lot of games
programmers use the STL these days I think. I think most libraries do
allow stateful allocators, but very few allow exotic
allocator::pointer types (Dinkumware is the only one that I know for
sure supports them).
Of course I didn't abandon my original design solely for this reason. It's
just this issue set me thinking about how I could implement my class with
only a single allocator and when I realised I could, I preferred the new
design anyway.

It is more efficient in terms of memory usage and performance, until
you start deallocating nodes (and as long as you reserve sufficient
space). The locality of reference gained from using contiguous storage
might help performance too, although a custom (stateful!) allocator
can achieve this too.
To answer your other points - I'm not bothered about O(n) deletion time.
Obviously this is a drawback, but it won't matter for the application I have
in mind where elements are only going to be added to the container. I
haven't implemented the erase yet, but it is probably going to successively
destroy and copy construct elements from the deletion point to the end of
the vector. I don't fancy mark as deleted, though it hadn't occurred to me
until now.

When deleting, you'll have to update all of the nodes, subtracting 1
from indices greater than the element (or elements) being deleted, so
erase is going to be a slow operation.

Still, it's always fun writing low level code like containers, if
rather wasteful of time and energy (there are almost always free
alternatives already written and tested). I've never written a
red-black tree, but I bet it's great fun! :eek:)

Tom
 
J

John Harrison

On Thu, 15 Jul 2004 16:56:36 +0100, John Harrison
haven't implemented the erase yet, but it is probably going to
successively
destroy and copy construct elements from the deletion point to the end of
the vector. I don't fancy mark as deleted, though it hadn't occurred to
me
until now.

I'm starting to get a bit worried about this. It seems difficult to
arrange any sort of execption safety without copying the entire vector,
which I'm a bit reluctent to do. Any thoughts?

john
 
J

John Harrison

Still, it's always fun writing low level code like containers, if
rather wasteful of time and energy (there are almost always free
alternatives already written and tested). I've never written a
red-black tree, but I bet it's great fun! :eek:)

In a perverse sort of way, yes it is. Of course this is a hobby activity,
I wouldn't be allowed to waste my time on this at work.

john
 
T

tom_usenet

On Thu, 15 Jul 2004 16:56:36 +0100, John Harrison


I'm starting to get a bit worried about this. It seems difficult to
arrange any sort of execption safety without copying the entire vector,
which I'm a bit reluctent to do. Any thoughts?

Hmm, that's a thorny one. I guess you'll have to start by relinking
the nodes as though the erased elements aren't there, then actually do
erase in the vector, finally updating the indices to reflect the
change. If an exception is thrown during the erase in the vector,
you'll have to then update the indices to elements that have moved,
and somehow cope with the fact that you've got two copies of one of
the elements, before rethrowing.

Alternative implementations suddenly do look more attractive, although
it's always hard to give very good exception safety with vector::erase
- if an exception is thrown you inevitably end up with two copies of
one of the elements.

Tom
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,143
Latest member
SterlingLa
Top