Container for Look Up to be Fast

Discussion in 'C++' started by Pallav singh, Aug 13, 2011.

  1. Pallav singh (Guest)

    Hi All,

    I have data coming in at a variable rate, and the basic requirement is
    that lookup and insertion be very fast. I used the STL std::map for
    that, but when the data got very large (10 GB stored in the map), its
    performance degraded.

    The time to fetch data by key increases exponentially.

    When I use C++0x's unordered_map to store the data, lookup time is
    reduced significantly.

    Is there any other container, provided by Boost or the ACE framework,
    that I can use to reduce my lookup time further?

    Thanks
    Pallav Singh
    Pallav singh, Aug 13, 2011
    #1

  2. Dombo (Guest)

    On 13-Aug-11 11:42, Pallav singh wrote:
    > Hi All,
    >
    > I have data coming in at a variable rate, and the basic requirement is
    > that lookup and insertion be very fast. I used the STL std::map for
    > that, but when the data got very large (10 GB stored in the map), its
    > performance degraded.
    >
    > The time to fetch data by key increases exponentially.


    Considering the computational complexity of std::map lookups
    (O(log n)), I wouldn't expect the lookup time to increase
    exponentially with the size of the data set.

    You are probably experiencing the effects of caching and/or virtual
    memory, which would explain why performance degrades dramatically when
    the data set gets large. When there is a huge number of elements in
    the map, chances are that the nodes are spread out over memory. A
    lookup could then touch memory all over the place, causing many cache
    misses and/or page faults. These days the way you access memory
    (locality of reference) can have a bigger impact on performance than
    computational complexity.

    > When I use C++0x's unordered_map to store the data, lookup time is
    > reduced significantly.
    >
    > Is there any other container, provided by Boost or the ACE framework,
    > that I can use to reduce my lookup time further?


    As far as lookup performance is concerned, hash-based associative
    containers are about as good as it gets, assuming the hash function is
    good. You may be able to provide a better hash function for your
    data.

    If that doesn't help, the only other option I can think of is to
    tackle the problem at a more fundamental (design) level.
    Dombo, Aug 13, 2011
    #2

  3. Jorgen Grahn (Guest)

    On Sat, 2011-08-13, Pallav singh wrote:
    > Hi All,
    >
    > I have data coming in at a variable rate, and the basic requirement is
    > that lookup and insertion be very fast. I used the STL std::map for
    > that, but when the data got very large (10 GB stored in the map), its
    > performance degraded.
    >
    > The time to fetch data by key increases exponentially.


    But what about insertions and deletions? I'd expect those to be even
    worse.

    > When I use C++0x's unordered_map to store the data, lookup time is
    > reduced significantly.
    >
    > Is there any other container, provided by Boost or the ACE framework,
    > that I can use to reduce my lookup time further?


    I think I would experiment with a BerkeleyDB of some kind stored on
    disk, to see if that's better for huge data sets. Maybe also try
    going from an unordered_map<Key, Value> to an
    unordered_map<Key, Value*> to see if that is more cache-friendly.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 13, 2011
    #3
  4. On 13/08/2011 11:42, Pallav singh wrote:
    > Hi All,
    >
    > I have data coming in at a variable rate, and the basic requirement is
    > that lookup and insertion be very fast. I used the STL std::map for
    > that, but when the data got very large (10 GB stored in the map), its
    > performance degraded.
    >
    > The time to fetch data by key increases exponentially.
    >
    > When I use C++0x's unordered_map to store the data, lookup time is
    > reduced significantly.
    >
    > Is there any other container, provided by Boost or the ACE framework,
    > that I can use to reduce my lookup time further?
    >
    > Thanks
    > Pallav Singh


    As you have 10 GB of data, your OS is likely to swap, which explains
    the performance issue (unless you have 10 GB of RAM?). Moreover, even
    if you have the RAM needed, RAM access isn't as fast as cache access.
    So maybe you should use something like a B-tree with the number of
    children per node adapted to your processor cache: you would get
    O(log_K N) lookups (with K the number of children and N the number of
    elements) while reducing RAM accesses by adapting to the processor
    cache.

    But I'm not sure whether retrieving, say, 1 MB of data from RAM to
    the processor cache performs better or worse than retrieving 100 KB
    ten times.

    However, if your OS is swapping (if you don't have enough memory),
    then a B-tree is very likely to improve performance compared to
    std::map.

    Compared to std::unordered_map, I'm not sure which one is best.
    A B-tree is an improvement over std::map for large amounts of data,
    but an unordered_map can also be improved: for example, reserve
    enough buckets when creating the unordered_map, or provide a better
    hash function.

    Some libraries may also offer you additional data structures.

    So, my advice:
    - Try B-trees.
    - For unordered_map:
      - Reserve enough buckets to avoid re-hashing.
      - Provide a good hash function (examine the space your keys live
        in, and try to map it onto as large a range of integers as
        possible, with as few collisions as possible).
    - Finally, there may be other data structures I don't know of yet.

    Cheers & hth.,
    Leo
    Leo Equinox Gaspard, Aug 13, 2011
    #4
  5. Dombo <> wrote:
    > Probably you are experiencing the effects caching and/or virtual memory


    I think the term you are looking for is "swapping".
    Juha Nieminen, Aug 14, 2011
    #5
  6. On Aug 14, 7:21 pm, Juha Nieminen <> wrote:
    > Dombo <> wrote:
    > > Probably you are experiencing the effects caching and/or virtual memory

    >
    >   I think the term you are looking for is "swapping".


    not really
    Nick Keighley, Aug 15, 2011
    #6
  7. Nick Keighley <> wrote:
    > On Aug 14, 7:21 pm, Juha Nieminen <> wrote:
    >> Dombo <> wrote:
    >> > Probably you are experiencing the effects caching and/or virtual memory

    >>
    >>   I think the term you are looking for is "swapping".

    >
    > not really


    I don't think caching and virtual memory would cause std::map to slow
    down from logarithmic to exponential. However, swapping sounds
    exactly like it could.
    Juha Nieminen, Aug 15, 2011
    #7
  8. Jorgen Grahn (Guest)

    On Mon, 2011-08-15, Juha Nieminen wrote:
    > Nick Keighley <> wrote:
    >> On Aug 14, 7:21 pm, Juha Nieminen <> wrote:
    >>> Dombo <> wrote:
    >>> > Probably you are experiencing the effects caching and/or virtual memory
    >>>
    >>>   I think the term you are looking for is "swapping".

    >>
    >> not really

    >
    > I don't think caching and virtual memory would cause std::map to slow
    > down from logarithmic to exponential. However, swapping sounds
    > exactly like it could.


    Are you two /really/ talking about exponential, or are you using the
    word to mean "ridiculously slow"?

    Even if you have a tree that's 40 levels deep and everything has been
    swapped to disk, the number of disk hits needed to find a certain
    node is proportional to the depth -- that is, it increases
    logarithmically with the size of the tree.

    /Jorgen

    Jorgen Grahn, Aug 15, 2011
    #8
  9. On 13/08/2011 14:22, Jorgen Grahn wrote:
    [...]
    > I think I would do an experiment with a BerkeleyDB of some kind stored
    > on disk, to see if that's better for huge data sets. Maybe also try to
    > go from an unordered_map<Key, Value> to unordered_map<Key, Value*> to
    > see if that is more cache-friendly.
    >
    > /Jorgen
    >


    I don't think an unordered_map caches well, because of its
    random-like access pattern. So unless you are always requesting the
    same elements, I don't think the processor cache can usefully keep
    any of the thousands of entries in your map.

    BTW, to get better advice, you (Pallav singh) could say how large
    each record is (10 GB of 8-byte records isn't handled by the same
    data structure as 10 GB of 1 GB records).

    Leo
    Leo Equinox Gaspard, Aug 16, 2011
    #9
  10. Jorgen Grahn <> wrote:
    > Are you two /really/ talking about exponential, or are you using the
    > word to mean "ridiculously slow"?


    Ok, it's probably not, technically speaking, O(2^n) with swapping,
    just extremely slow. My point was, however, that caching or virtual
    memory is not the cause of this, but disk swapping.
    Juha Nieminen, Aug 16, 2011
    #10
  11. Jorgen Grahn (Guest)

    On Tue, 2011-08-16, Leo Equinox Gaspard wrote:
    > On 13/08/2011 14:22, Jorgen Grahn wrote:
    > [...]
    >> [...] Maybe also try to
    >> go from an unordered_map<Key, Value> to unordered_map<Key, Value*> to
    >> see if that is more cache-friendly.
    >>
    >> /Jorgen
    >>

    >
    > I don't think an unordered_map caches well, because of its
    > random-like access pattern. So unless you are always requesting the
    > same elements, I don't think the processor cache can usefully keep
    > any of the thousands of entries in your map.


    True, but my idea was that unordered_map<Key, Value*> might be better
    than unordered_map<Key, Value>, which was what he was already using
    and was apparently pretty happy with.

    > BTW, to get better advice, you (Pallav singh) could say how large
    > each record is (10 GB of 8-byte records isn't handled by the same
    > data structure as 10 GB of 1 GB records).


    Yes, moving to unordered_map<Key, Value*> can only be useful if his
    value type is much bigger than his key. I forgot to point that out.

    /Jorgen

    Jorgen Grahn, Aug 16, 2011
    #11
