reserve() for map

Discussion in 'C++' started by Ralf Goertz, May 12, 2010.

  1. Ralf Goertz

    Ralf Goertz Guest

    Hi,

    I frequently have to read in very many elements (about 60 Million) of a
    map. I assume that using insert() or the []-operator leads to many
    allocations of memory which I understand to be slow. Is there a way to
    tell the map in advance to allocate enough memory for all elements? Do I
    need to write my own allocator or can I use allocate() of the standard
    allocator?
     
    Ralf Goertz, May 12, 2010
    #1
    1. Advertising

  2. Ralf Goertz

    Öö Tiib Guest

    On May 12, 11:22 am, Ralf Goertz <> wrote:
    > Hi,
    >
    > I frequently have to read in very many elements (about 60 Million) of a
    > map. I assume that using insert() or the []-operator leads to many
    > allocations of memory which I understand to be slow. Is there a way to
    > tell the map in advance to allocate enough memory for all elements? Do I
    > need to write my own allocator or can I use allocate() of the standard
    > allocator?


    There are tons of ways to avoid using map if that is causing
    performance issues. Most obvious is to use vector of pairs. If later
    linear search is too slow use std::sort or better to sort and
    std::lower_bound to binary search.
     
    Öö Tiib, May 12, 2010
    #2
    1. Advertising

  3. Ralf Goertz

    Ralf Goertz Guest

    Öö Tiib wrote:

    > On May 12, 11:22 am, Ralf Goertz <> wrote:
    >> Hi,
    >>
    >> I frequently have to read in very many elements (about 60 Million) of a
    >> map. I assume that using insert() or the []-operator leads to many
    >> allocations of memory which I understand to be slow. Is there a way to
    >> tell the map in advance to allocate enough memory for all elements? Do I
    >> need to write my own allocator or can I use allocate() of the standard
    >> allocator?

    >
    > There are tons of ways to avoid using map if that is causing
    > performance issues. Most obvious is to use vector of pairs. If later
    > linear search is too slow use std::sort or better to sort and
    > std::lower_bound to binary search.


    The map was chosen because it is used in different threads where one
    thread may erase an element which is irrelevant for the other threads
    but which does not invalidate iterators. I wouldn't know how to do that
    with vector. A list would be the other option where iterators remain
    valid but then the same question as in the OP arises.
     
    Ralf Goertz, May 12, 2010
    #3
  4. On 12.05.2010 12:44, * Ralf Goertz:
    > Öö Tiib wrote:
    >
    >> On May 12, 11:22 am, Ralf Goertz<> wrote:
    >>> Hi,
    >>>
    >>> I frequently have to read in very many elements (about 60 Million) of a
    >>> map. I assume that using insert() or the []-operator leads to many
    >>> allocations of memory which I understand to be slow. Is there a way to
    >>> tell the map in advance to allocate enough memory for all elements? Do I
    >>> need to write my own allocator or can I use allocate() of the standard
    >>> allocator?

    >>
    >> There are tons of ways to avoid using map if that is causing
    >> performance issues. Most obvious is to use vector of pairs. If later
    >> linear search is too slow use std::sort or better to sort and
    >> std::lower_bound to binary search.

    >
    > The map was chosen because it is used in different threads where one
    > thread may erase an element which is irrelevant for the other threads
    > but which does not invalidate iterators.


    You need exclusive access to the map for the whole of that operation.

    Removing an element may adjust several pointers in the internal data structure
    (usually a red-black tree), involving elements used by other threads.

    If you're not locking - bang.


    > I wouldn't know how to do that with vector.


    A simple solution is to set a boolean flag in the item. Depends how many
    elements will usually be set as "to be ignored".


    > A list would be the other option where iterators remain
    > valid but then the same question as in the OP arises.


    I'm guessing that you could always try block insertions, if std::map supports
    that (I'm pretty sure that it does but check). Read e.g. 2048 elements into a
    vector, then block insert into the map, which /might/ allow it to do things more
    efficiently. Measurement is the important thing, though.


    Cheers & hth.,

    - Alf
     
    Alf P. Steinbach, May 12, 2010
    #4
  5. Ralf Goertz <> wrote:
    > I frequently have to read in very many elements (about 60 Million) of a
    > map. I assume that using insert() or the []-operator leads to many
    > allocations of memory which I understand to be slow. Is there a way to
    > tell the map in advance to allocate enough memory for all elements? Do I
    > need to write my own allocator or can I use allocate() of the standard
    > allocator?


    Try using this allocator:

    http://warp.povusers.org/FSBAllocator/

    (You don't need to tell the allocator anything, just use it as the
    allocator for your map. You will probably notice a singificant improvement
    in allocation performance.)
     
    Juha Nieminen, May 12, 2010
    #5
  6. Ralf Goertz

    Ralf Goertz Guest

    Juha Nieminen wrote:

    >
    > Try using this allocator:
    >
    > http://warp.povusers.org/FSBAllocator/
    >
    > (You don't need to tell the allocator anything, just use it as the
    > allocator for your map. You will probably notice a singificant improvement
    > in allocation performance.)


    Hm, this gave me a segfault upon termination of the program.
    Interestingly, the segfault seems to have appeared in a
    boost::unordered_set for which I hadn't used the FSBAllocator (it didn't
    compile when I tried it for a boost::unordered_map, so I didn't try it
    for unordered_set). This is the backtrace:

    #0 0x00007fd195ea5efd in ?? () from /lib64/libc.so.6
    #1 0x00007fd195ea75d8 in ?? () from /lib64/libc.so.6
    #2 0x00007fd195eaa96c in free () from /lib64/libc.so.6
    #3 0x000000000040e6f8 in deallocate (this=<value optimized out>,
    __p=<value optimized out>)
    at /home/usr/bin/../lib64/gcc/../../include/c++/4.4/ext/new_allocator.h:95
    #4 delete_buckets (this=<value optimized out>, __p=<value optimized out>)
    at /usr/include/boost/unordered/detail/buckets.hpp:101
    #5 ~hash_buckets (this=<value optimized out>, __p=<value optimized out>)
    at /usr/include/boost/unordered/detail/buckets.hpp:135
    #6 ~hash_table (this=<value optimized out>, __p=<value optimized out>)
    at /usr/include/boost/unordered/detail/fwd.hpp:492
    #7 ~hash_unique_table (this=<value optimized out>, __p=<value optimized out>)
    at /usr/include/boost/unordered/detail/fwd.hpp:604
    #8 boost::unordered_set<std::basic_string<wchar_t, std::char_traits<wchar_t>,
    std::allocator<wchar_t> >, boost::hash<std::basic_string<wchar_t,
    std::char_traits<wchar_t>, std::allocator<wchar_t> > >,
    std::equal_to<std::basic_string<wchar_t, std::char_traits<wchar_t>,
    std::allocator<wchar_t> > >, std::allocator<std::basic_string<wchar_t,
    std::char_traits<wchar_t>, std::allocator<wchar_t> > > >::~unordered_set
    (this=<value optimized out>, __p=<value optimized out>)
    at /usr/include/boost/unordered/unordered_set.hpp:154
    #9 0x00007fd195e68065 in ?? () from /lib64/libc.so.6
    #10 0x00007fd195e680b5 in exit () from /lib64/libc.so.6
     
    Ralf Goertz, May 12, 2010
    #6
  7. Ralf Goertz

    Brian Guest

    On May 12, 5:44 am, Ralf Goertz <> wrote:
    > Öö Tiib wrote:


    > > There are tons of ways to avoid using map if that is causing
    > > performance issues. Most obvious is to use vector of pairs. If later
    > > linear search is too slow use std::sort or better to sort and
    > > std::lower_bound to binary search.

    >
    > The map was chosen because it is used in different threads where one
    > thread may erase an element which is irrelevant for the other threads
    > but which does not invalidate iterators. I wouldn't know how to do that
    > with vector. A list would be the other option where iterators remain
    > valid but then the same question as in the OP arises.


    Boost has a stable_vector class.
    http://bannalia.blogspot.com/2008/09/introducing-stablevector.html


    Brian Wood
    http://webEbenezer.net
    (651) 251-9384
     
    Brian, May 12, 2010
    #7
  8. Ralf Goertz <> wrote:
    > Hm, this gave me a segfault upon termination of the program.
    > Interestingly, the segfault seems to have appeared in a
    > boost::unordered_set for which I hadn't used the FSBAllocator (it didn't
    > compile when I tried it for a boost::unordered_map, so I didn't try it
    > for unordered_set).


    unordered_map/unordered_set usually don't need specialized allocators
    because the usually already use arrays (rather than allocating individual
    elements), which is efficient.

    Are you using multithreading? Note that FSBAllocator is not thread-safe
    by default (although thread-safety can be turned on).

    A minimal program which exhibits the problem would be nice.
     
    Juha Nieminen, May 13, 2010
    #8
  9. Ralf Goertz

    Ralf Goertz Guest

    Juha Nieminen wrote:

    > Ralf Goertz <> wrote:
    >> Hm, this gave me a segfault upon termination of the program.
    >> Interestingly, the segfault seems to have appeared in a
    >> boost::unordered_set for which I hadn't used the FSBAllocator (it didn't
    >> compile when I tried it for a boost::unordered_map, so I didn't try it
    >> for unordered_set).

    >
    > unordered_map/unordered_set usually don't need specialized allocators
    > because the usually already use arrays (rather than allocating individual
    > elements), which is efficient.
    >
    > Are you using multithreading? Note that FSBAllocator is not thread-safe
    > by default (although thread-safety can be turned on).


    Yes I read that. No for this particular test I didn't use mt. I just
    wanted to see whether inserting would be faster, after all pairs are
    read I returned from main().

    > A minimal program which exhibits the problem would be nice.


    I will try that but probably not before Monday.
     
    Ralf Goertz, May 13, 2010
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. dot
    Replies:
    0
    Views:
    436
  2. joeblack
    Replies:
    3
    Views:
    542
    joeblack
    Nov 4, 2003
  3. MarionEll
    Replies:
    0
    Views:
    370
    MarionEll
    Oct 26, 2004
  4. john smith

    vector.reserve

    john smith, Jul 25, 2003, in forum: C++
    Replies:
    5
    Views:
    713
    John Harrison
    Jul 25, 2003
  5. David Hilsee

    Re: map::reserve

    David Hilsee, Oct 19, 2004, in forum: C++
    Replies:
    0
    Views:
    520
    David Hilsee
    Oct 19, 2004
Loading...

Share This Page