Fast Container

Thomas Lehmann · Nov 2, 2011

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.

Victor Bazarov · Nov 2, 2011

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.

You didn't say whether the order of the objects in a container is
important. If you can define deletion from your custom vector as
swapping with the last element and resizing by -1, it's fast. Insertion
at the end is not too bad, and causes reallocation only once in a while
(when the vector grows past its capacity), and you can kind of control
that by managing the capacity at "calm" times (shooter's rule: reload
when you have a chance, don't wait 'til you're empty).

Also, consider std::deque, it's appends are even faster than vector's.

V

jacob navia · Nov 2, 2011

Le 02/11/11 16:43, Leigh Johnston a écrit :
[snip]

if you want fast inserts/deletes

anywhere but also want fast indexing then consider using my
"segmented_array" container which can be found at:

http://i42.co.uk/stuff/segmented_array.htm

/Leigh

Actually this is a list of vectors. Why should be faster
than using list and vector?

As far as I can read from your code
you do not use those containers. Why?

Thanks for publishing that code, it is full of ideas.

jacob

Werner · Nov 2, 2011

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.

I think you can use a combination of list and vector.
The list is used for fast insertions, deletions etc of the data
elements,
and the vector contains iterators that are sorted iaw. criteria as
defined by the actual data elements, and not the iterator (list
unsorted).

The vector always remains sorted (insertion of iterator via
lower_bound).

You have the benefit of random access iterators for fast access on
sorted sequence.
You have the benefit of fast insertion, removal.

Adding data elements happen by pushing back onto the list, and adding
the iterator to the sorted sequence (vector).
Removing data elements happen by finding the item in the (sorted)
vector
and using the iterator to erase the data element from the list. The
"referring" iterators in the vector remain valid.

Kind regards,

Werner

Werner · Nov 2, 2011

I think you can use a combination of list and vector.
The list is used for fast insertions, deletions etc of the data
elements,
and the vector contains iterators that are sorted iaw. criteria as
defined by the actual data elements, and not the iterator (list
unsorted).

typedef std::list<DataElement> ElementList;
typedef std::vector<ElementList::iterator> ElementRefSequence;
ElementList elementList;
ElementRefSequence elementRefSequence;

Thomas Lehmann · Nov 3, 2011

Well, I did say that it is a sorted container.

Thomas Lehmann · Nov 3, 2011

Maybe I'm doing something wrong but comparing to vector
and deque your segmented_array is not faster.

- I'm compiling on Windows Ultimate 64 Bit
- I've tried with 64 and 1024 segments.
- 200000 elements (with push_back)
- took 5ms (vector took 2ms, deque took 5ms)
- The code looks same for deque and vector:

<code>
BOOST_AUTO_TEST_CASE(SegmentedArrayAddValues)
{
const unsigned int limit = 200000;

SegmentedArrayTest<int>* container = new SegmentedArrayTest<int>();
container->createContainer();

// creating content
for (unsigned int ix = 0; ix < limit; ++ix)
container->addValue(ix);

BOOST_CHECK_EQUAL(container->size(), (size_t)limit);

// cleanup
delete container;
}
</code>

Victor Bazarov · Nov 3, 2011

Well, I did say that it is a sorted container.

You did, I wasn't paying attention, sorry.

Still, take a look at 'std::deque'.

V

Nick Keighley · Nov 3, 2011

On Nov 2, 3:21 pm, Thomas Lehmann

I'm feeling I've missed something...

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

how are "elements" identified?

Imagine a filter that decides on each update [that] a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

"final scope"? So when it's sorted it goes in a "table" that's
indexed?

NOW, what is the right way to implement a fast container for this?

use a relational database.

I was thinking std::map but I don't know how your elements are
identified. std::map has reasonable performance for insertion and
deletion- anywhere in the container. And you can choose what to index
it with. at the end you can copy it into whatever your user finds
conventient.

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?

yes. Elements are always contiuous so if you delete say the 50th
element of a 100 element vector its going to have to move elements
51-99 down one. At 500000 elements that gets expensive.

- Using a std::set I have no index access. Or is there a way to handle this properly?

so you want to be able to index it? How are these indexes assigned?
Time?

- Somebody told me something about a weighted balanced tree as possible solution.

some sort of tree. (which is what a std::map usually is under the
hood). B-tree? (that's not a binary tree).

- Somebody else told me to use standard containers only.

why? I mean its a good place to start but is it an absolute
constraint- and why?

Do you have any performance criteria?

STL Container?	4	May 2, 2012
Container for Look Up to be Fast	10	Aug 13, 2011
Building a Large Container	26	Dec 3, 2013
A Better Container Choice?	3	Aug 22, 2013
Selecting Container for "Top 20" Application	25	Aug 17, 2013
heterogenous container class	12	Jan 12, 2011
Retrieve data 'live' from spinner within a RecyclerView	1	May 20, 2023
STL Container Choice	11	Mar 28, 2012

Fast Container

Thomas Lehmann

Victor Bazarov

jacob navia

Werner

Werner

Thomas Lehmann

Thomas Lehmann

Victor Bazarov

Nick Keighley

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads