Fast Container

Discussion in 'C++' started by Thomas Lehmann, Nov 2, 2011.

  1. There are written so many articles but no article is really providing
    the answer I'm looking for.

    Given a scenario with 100000 data elements the day (500000 the week).
    Potential we might have 3000-5000 updates for individual elements
    each second.

    Imagine a filter that decides on each update the a data element will
    be added, updated or removed.

    The final container which is sorted by a user defined criteria must be
    able to handle this amount of data.

    Info: basically each value of that sorted container is a pointer to the
    real data.

    Final scope: we display the data in a table which provides index based
    positions (row)

    NOW, what is the right way to implement a fast container for this?

    Some of the problems I was confronted with:
    - Using a std::vector deletion/insertion is expensive, is it?
    - Using a std::set I have no index access. Or is there a way to handle this properly?
    - Somebody told me something about a weighted balanced tree as possible solution.
    - Somebody else told me to use standard containers only.
    Thomas Lehmann, Nov 2, 2011
    #1
    1. Advertising

  2. On 11/2/2011 11:21 AM, Thomas Lehmann wrote:
    > There are written so many articles but no article is really providing
    > the answer I'm looking for.
    >
    > Given a scenario with 100000 data elements the day (500000 the week).
    > Potential we might have 3000-5000 updates for individual elements
    > each second.
    >
    > Imagine a filter that decides on each update the a data element will
    > be added, updated or removed.
    >
    > The final container which is sorted by a user defined criteria must be
    > able to handle this amount of data.
    >
    > Info: basically each value of that sorted container is a pointer to the
    > real data.
    >
    > Final scope: we display the data in a table which provides index based
    > positions (row)
    >
    > NOW, what is the right way to implement a fast container for this?
    >
    > Some of the problems I was confronted with:
    > - Using a std::vector deletion/insertion is expensive, is it?
    > - Using a std::set I have no index access. Or is there a way to handle this properly?
    > - Somebody told me something about a weighted balanced tree as possible solution.
    > - Somebody else told me to use standard containers only.


    You didn't say whether the order of the objects in a container is
    important. If you can define deletion from your custom vector as
    swapping with the last element and resizing by -1, it's fast. Insertion
    at the end is not too bad, and causes reallocation only once in a while
    (when the vector grows past its capacity), and you can kind of control
    that by managing the capacity at "calm" times (shooter's rule: reload
    when you have a chance, don't wait 'til you're empty).

    Also, consider std::deque, it's appends are even faster than vector's.

    V
    --
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Nov 2, 2011
    #2
    1. Advertising

  3. Thomas Lehmann

    jacob navia Guest

    Le 02/11/11 16:43, Leigh Johnston a écrit :
    [snip]

    if you want fast inserts/deletes
    > anywhere but also want fast indexing then consider using my
    > "segmented_array" container which can be found at:
    >
    > http://i42.co.uk/stuff/segmented_array.htm
    >
    > /Leigh


    Actually this is a list of vectors. Why should be faster
    than using list and vector?

    As far as I can read from your code
    you do not use those containers. Why?

    Thanks for publishing that code, it is full of ideas.

    jacob
    jacob navia, Nov 2, 2011
    #3
  4. Thomas Lehmann

    Werner Guest

    On Nov 2, 5:21 pm, Thomas Lehmann
    <> wrote:
    > There are written so many articles but no article is really providing
    > the answer I'm looking for.
    >
    > Given a scenario with 100000 data elements the day (500000 the week).
    > Potential we might have 3000-5000 updates for individual elements
    > each second.
    >
    > Imagine a filter that decides on each update the a data element will
    > be added, updated or removed.
    >
    > The final container which is sorted by a user defined criteria must be
    > able to handle this amount of data.
    >
    > Info: basically each value of that sorted container is a pointer to the
    > real data.
    >
    > Final scope: we display the data in a table which provides index based
    > positions (row)
    >
    > NOW, what is the right way to implement a fast container for this?
    >
    > Some of the problems I was confronted with:
    > - Using a std::vector deletion/insertion is expensive, is it?
    > - Using a std::set I have no index access. Or is there a way to handle this properly?
    > - Somebody told me something about a weighted balanced tree as possible solution.
    > - Somebody else told me to use standard containers only.


    I think you can use a combination of list and vector.
    The list is used for fast insertions, deletions etc of the data
    elements,
    and the vector contains iterators that are sorted iaw. criteria as
    defined by the actual data elements, and not the iterator (list
    unsorted).

    The vector always remains sorted (insertion of iterator via
    lower_bound).

    You have the benefit of random access iterators for fast access on
    sorted sequence.
    You have the benefit of fast insertion, removal.

    Adding data elements happen by pushing back onto the list, and adding
    the iterator to the sorted sequence (vector).
    Removing data elements happen by finding the item in the (sorted)
    vector
    and using the iterator to erase the data element from the list. The
    "referring" iterators in the vector remain valid.

    Kind regards,

    Werner
    Werner, Nov 2, 2011
    #4
  5. Thomas Lehmann

    Werner Guest

    On Nov 2, 11:06 pm, Werner <> wrote:

    > I think you can use a combination of list and vector.
    > The list is used for fast insertions, deletions etc of the data
    > elements,
    > and the vector contains iterators that are sorted iaw. criteria as
    > defined by the actual data elements, and not the iterator (list
    > unsorted).


    typedef std::list<DataElement> ElementList;
    typedef std::vector<ElementList::iterator> ElementRefSequence;
    ElementList elementList;
    ElementRefSequence elementRefSequence;
    Werner, Nov 2, 2011
    #5
  6. Well, I did say that it is a sorted container.
    Thomas Lehmann, Nov 3, 2011
    #6
  7. Maybe I'm doing something wrong but comparing to vector
    and deque your segmented_array is not faster.

    - I'm compiling on Windows Ultimate 64 Bit
    - I've tried with 64 and 1024 segments.
    - 200000 elements (with push_back)
    - took 5ms (vector took 2ms, deque took 5ms)
    - The code looks same for deque and vector:

    <code>
    BOOST_AUTO_TEST_CASE(SegmentedArrayAddValues)
    {
    const unsigned int limit = 200000;

    SegmentedArrayTest<int>* container = new SegmentedArrayTest<int>();
    container->createContainer();

    // creating content
    for (unsigned int ix = 0; ix < limit; ++ix)
    container->addValue(ix);

    BOOST_CHECK_EQUAL(container->size(), (size_t)limit);

    // cleanup
    delete container;
    }
    </code>
    Thomas Lehmann, Nov 3, 2011
    #7
  8. On 11/3/2011 10:01 AM, Thomas Lehmann wrote:
    > Well, I did say that it is a sorted container.


    You did, I wasn't paying attention, sorry.

    Still, take a look at 'std::deque'.

    V
    --
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Nov 3, 2011
    #8
  9. On Nov 2, 3:21 pm, Thomas Lehmann
    <> wrote:

    I'm feeling I've missed something...

    > There are written so many articles but no article is really providing
    > the answer I'm looking for.
    >
    > Given a scenario with 100000 data elements the day (500000 the week).
    > Potential we might have 3000-5000 updates for individual elements
    > each second.


    how are "elements" identified?

    > Imagine a filter that decides on each update [that] a data element will
    > be added, updated or removed.
    >
    > The final container which is sorted by a user defined criteria must be
    > able to handle this amount of data.
    >
    > Info: basically each value of that sorted container is a pointer to the
    > real data.
    >
    > Final scope: we display the data in a table which provides index based
    > positions (row)


    "final scope"? So when it's sorted it goes in a "table" that's
    indexed?

    > NOW, what is the right way to implement a fast container for this?


    use a relational database.

    I was thinking std::map but I don't know how your elements are
    identified. std::map has reasonable performance for insertion and
    deletion- anywhere in the container. And you can choose what to index
    it with. at the end you can copy it into whatever your user finds
    conventient.

    > Some of the problems I was confronted with:
    > - Using a std::vector deletion/insertion is expensive, is it?


    yes. Elements are always contiuous so if you delete say the 50th
    element of a 100 element vector its going to have to move elements
    51-99 down one. At 500000 elements that gets expensive.

    > - Using a std::set I have no index access. Or is there a way to handle this properly?


    so you want to be able to index it? How are these indexes assigned?
    Time?

    > - Somebody told me something about a weighted balanced tree as possible solution.


    some sort of tree. (which is what a std::map usually is under the
    hood). B-tree? (that's not a binary tree).

    > - Somebody else told me to use standard containers only.


    why? I mean its a good place to start but is it an absolute
    constraint- and why?

    Do you have any performance criteria?
    Nick Keighley, Nov 3, 2011
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    662
  2. Michele Simionato

    Python is darn fast (was: How fast is Python)

    Michele Simionato, Aug 23, 2003, in forum: Python
    Replies:
    13
    Views:
    561
  3. Replies:
    2
    Views:
    322
    Thorsten Kiefer
    Jul 21, 2006
  4. Juha Nieminen
    Replies:
    22
    Views:
    1,023
    Kai-Uwe Bux
    Oct 12, 2007
  5. Pallav singh

    Container for Look Up to be Fast

    Pallav singh, Aug 13, 2011, in forum: C++
    Replies:
    10
    Views:
    883
    Jorgen Grahn
    Aug 16, 2011
Loading...

Share This Page