STL hashmap

Discussion in 'C++' started by Daniel Heiserer, Jan 13, 2004.

  1. Hello,

    I used a unique associative container such as:
    //---------------
    map<vector<int>,double> X;
    //---------------

    In general I fill X with millions of entries.

    Sorting plays no role for me. All I need is uniqueness
    of the keys and speed when inserting them.
    At the end I retrieve them using iterators from the beginning
    to the end. Which should lead to constant complexity.

    Memory requirement is important but less then speed.

    Unfortunately the used "map" is too slow.
    I need a faster implementation.

    I found hash_map as an alterantive, but it does not work
    by simply replacing "map" with "hash_map".

    Can anybody help me out?
    So is hash_map part of the STL?
    When not where can I get a good implementation of it?

    I use g++ 3.2.2 on linux.

    -- thanks, daniel
     
    Daniel Heiserer, Jan 13, 2004
    #1
    1. Advertising

  2. Daniel Heiserer

    tom_usenet Guest

    On Tue, 13 Jan 2004 12:23:44 +0100, Daniel Heiserer
    <> wrote:

    >Hello,
    >
    >I used a unique associative container such as:
    >//---------------
    >map<vector<int>,double> X;
    >//---------------
    >
    >In general I fill X with millions of entries.


    What code do you use to add element? There may be a more efficient
    way.

    >
    >Sorting plays no role for me. All I need is uniqueness
    >of the keys and speed when inserting them.


    Why are you using vector<int> as your key? If there a fixed maximum
    length of vector? How do you create the vector? You would probably get
    an improvement from a custom key class.

    >At the end I retrieve them using iterators from the beginning
    >to the end. Which should lead to constant complexity.
    >
    >Memory requirement is important but less then speed.
    >
    >Unfortunately the used "map" is too slow.
    >I need a faster implementation.
    >
    >I found hash_map as an alterantive, but it does not work
    >by simply replacing "map" with "hash_map".


    No, you need to provide a hashing function at least.

    >
    >Can anybody help me out?
    >So is hash_map part of the STL?


    No. unordered_map is part of the draft standard library technical
    report (TR1). It is basically hash_map by another name, with a few
    differences from the various different hash_map versions in use.

    >When not where can I get a good implementation of it?
    >
    >I use g++ 3.2.2 on linux.


    GCC comes with an implementation <ext/hash_map>. I don't know how good
    it is. Docs are here:

    http://gcc.gnu.org/onlinedocs/libstdc /ext/howto.html#1

    You might be able to make std::map fast enough if you optimize your
    use of it...

    Tom

    C++ FAQ: http://www.parashift.com/c -faq-lite/
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
     
    tom_usenet, Jan 13, 2004
    #2
    1. Advertising


  3. > >I used a unique associative container such as:
    > >//---------------
    > >map<vector<int>,double> X;
    > >//---------------
    > >
    > >In general I fill X with millions of entries.

    >
    > What code do you use to add element? There may be a more efficient
    > way.


    vector<int> a;
    double b;
    X[a]=b;
    // or
    X[a]+=b;

    > >Sorting plays no role for me. All I need is uniqueness
    > >of the keys and speed when inserting them.

    >
    > Why are you using vector<int> as your key? If there a fixed maximum
    > length of vector? How do you create the vector? You would probably get
    > an improvement from a custom key class.


    foreach of my maps the vectors have the same length, but I want to
    use different maps. e.g. one with vectors of size 3 others with
    longer or shorter ones. Before I "add" my element I "X.resize(length)"
    the vectors.
    map<vector<int>,double> X; // vector length 3
    map<vector<int>,double> Y; // vector lenght 5

    Of course this distinction has to be made during runtime.
    How much would a

    hash_map<int[3],double> X;

    speed that up? And how much memory would I save?

    > >At the end I retrieve them using iterators from the beginning
    > >to the end. Which should lead to constant complexity.
    > >
    > >Memory requirement is important but less then speed.
    > >
    > >Unfortunately the used "map" is too slow.
    > >I need a faster implementation.
    > >
    > >I found hash_map as an alterantive, but it does not work
    > >by simply replacing "map" with "hash_map".

    >
    > No, you need to provide a hashing function at least.


    How do I define one for vectors?

    >
    > >
    > >Can anybody help me out?
    > >So is hash_map part of the STL?

    >
    > No. unordered_map is part of the draft standard library technical
    > report (TR1). It is basically hash_map by another name, with a few
    > differences from the various different hash_map versions in use.
    >
    > >When not where can I get a good implementation of it?
    > >
    > >I use g++ 3.2.2 on linux.

    >
    > GCC comes with an implementation <ext/hash_map>. I don't know how good
    > it is. Docs are here:
    >
    > http://gcc.gnu.org/onlinedocs/libstdc /ext/howto.html#1
    >
    > You might be able to make std::map fast enough if you optimize your
    > use of it...


    Well a suggestion might be useful ;-)
    Again: The time critical part is insertion and therefore checking
    for duplicate ones. In maybe 50% of the cases the next insertion
    "might" be "close" to the previous one so this might help to speed it
    up.
    After that I only retrieve all pairs from the beginning to the end
    which should be of complexity O(1) using the iterators.

    I would appreciate if I could use as much of the standard container
    functions as possible and avoid writing new templates and classes.


    -- thanks, daniel
     
    Daniel Heiserer, Jan 13, 2004
    #3
  4. Daniel Heiserer

    tom_usenet Guest

    On Tue, 13 Jan 2004 14:26:01 +0100, Daniel Heiserer
    <> wrote:

    >
    >> >I used a unique associative container such as:
    >> >//---------------
    >> >map<vector<int>,double> X;
    >> >//---------------
    >> >
    >> >In general I fill X with millions of entries.

    >>
    >> What code do you use to add element? There may be a more efficient
    >> way.

    >
    >vector<int> a;
    >double b;
    >X[a]=b;
    >// or
    >X[a]+=b;


    Ok, first improvement is to insert using:

    X.insert(mymaptype::value_type(a, b));

    >
    >> >Sorting plays no role for me. All I need is uniqueness
    >> >of the keys and speed when inserting them.

    >>
    >> Why are you using vector<int> as your key? If there a fixed maximum
    >> length of vector? How do you create the vector? You would probably get
    >> an improvement from a custom key class.

    >
    >foreach of my maps the vectors have the same length, but I want to
    >use different maps. e.g. one with vectors of size 3 others with
    >longer or shorter ones. Before I "add" my element I "X.resize(length)"
    >the vectors.
    >map<vector<int>,double> X; // vector length 3
    >map<vector<int>,double> Y; // vector lenght 5
    >
    >Of course this distinction has to be made during runtime.
    >How much would a
    >
    >hash_map<int[3],double> X;
    >
    >speed that up? And how much memory would I save?


    Well, the above is illegal (arrays aren't copyable), but you might try
    this:

    #include <cstddef>
    #include <algorithm>

    template <std::size_t Length>
    class Key
    {
    int m_data[Length];
    public:
    static std::size_t const SIZE = Length;

    Key()
    {
    //0 initialize
    std::set(m_data, m_data + SIZE, 0);
    }

    int& operator[](std::size_t i)
    {
    assert(i < SIZE);
    return m_data;
    }

    int const& operator[](std::size_t i) const
    {
    assert(i < Length);
    return m_data;
    }

    friend bool operator<(Key const& lhs, Key const& rhs)
    {
    return std::lexicographical_compare(
    lhs.m_data, lhs.m_data + SIZE,
    rhs.m_data, rhs.m_data + SIZE);
    }

    friend bool operator==(Key const& lhs, Key const& rhs)
    {
    return std::equal_range(
    lhs.m_data, lhs.m_data + SIZE,
    rhs.m_data, rhs.m_data + SIZE);
    }

    //add anything required for convenience.
    };

    std::map<Key<3>, double> m;

    That would add a pretty big speedup at insert time (in terms of
    constant factor), and an even bigger memory saving. Key<3> takes up
    12-16 bytes whereas std::vector<int> takes up 16 bytes before you even
    consider the contents of the vector (another 16 at least, if not more
    will allocation overhead). Using the above should roughly halve the
    memory requirements and provide a major speed increase.

    >
    >> >At the end I retrieve them using iterators from the beginning
    >> >to the end. Which should lead to constant complexity.
    >> >
    >> >Memory requirement is important but less then speed.
    >> >
    >> >Unfortunately the used "map" is too slow.
    >> >I need a faster implementation.
    >> >
    >> >I found hash_map as an alterantive, but it does not work
    >> >by simply replacing "map" with "hash_map".

    >>
    >> No, you need to provide a hashing function at least.

    >
    >How do I define one for vectors?


    For your vectors you could have something like:

    std::size_t hash(std::vector<int> const& v)
    {
    return std::accumulate(v.begin(), v.end(), std::size_t(0));
    }

    That's a bad hashing algorithm, so you may want to read up on hashing
    algorithms.

    >> GCC comes with an implementation <ext/hash_map>. I don't know how good
    >> it is. Docs are here:
    >>
    >> http://gcc.gnu.org/onlinedocs/libstdc /ext/howto.html#1
    >>
    >> You might be able to make std::map fast enough if you optimize your
    >> use of it...

    >
    >Well a suggestion might be useful ;-)
    >Again: The time critical part is insertion and therefore checking
    >for duplicate ones. In maybe 50% of the cases the next insertion
    >"might" be "close" to the previous one so this might help to speed it
    >up.
    >After that I only retrieve all pairs from the beginning to the end
    >which should be of complexity O(1) using the iterators.


    Iteration over the whole container is O(n). Inserting n elements is
    O(n log n).

    >I would appreciate if I could use as much of the standard container
    >functions as possible and avoid writing new templates and classes.


    The standard containers are building blocks and in many cases aren't
    the optimal solution. If your problem definitely requires these
    multi-int keys and performance isn't good enough using the standard
    approach, then first trying the Key class and then trying the Key
    class in a hash_map (writing a hash function for Key) is going to work
    best.

    Tom

    C++ FAQ: http://www.parashift.com/c -faq-lite/
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
     
    tom_usenet, Jan 13, 2004
    #4
  5. Daniel Heiserer

    Paul Dubuc Guest

    Daniel Heiserer wrote:
    > Hello,
    >
    > I used a unique associative container such as:
    > //---------------
    > map<vector<int>,double> X;
    > //---------------

    ....
    > Unfortunately the used "map" is too slow.
    > I need a faster implementation.
    >
    > I found hash_map as an alterantive, but it does not work
    > by simply replacing "map" with "hash_map".
    >
    > Can anybody help me out?
    > So is hash_map part of the STL?
    > When not where can I get a good implementation of it?
    >
    > I use g++ 3.2.2 on linux.
    >
    > -- thanks, daniel


    With this compiler hash_map is int the __gnu_cxx namespace, not int the std
    namespace. hash_map isn't C++ standard. Use

    __gnu_cxx::hash_map<vector<int>,double> X;

    Of course, you'll need to define a hash function for your key. See
    http://www.sgi.com/tech/stl/hash_map.html for details.

    Paul Dubuc
     
    Paul Dubuc, Jan 13, 2004
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Vince Darley
    Replies:
    4
    Views:
    4,551
    emilchacko
    Mar 2, 2010
  2. Allan Bruce

    To STL or not to STL

    Allan Bruce, Oct 16, 2003, in forum: C++
    Replies:
    41
    Views:
    1,122
    Christopher Benson-Manica
    Oct 17, 2003
  3. sam
    Replies:
    1
    Views:
    687
    Gianni Mariani
    May 1, 2005
  4. Replies:
    4
    Views:
    830
    Daniel T.
    Feb 16, 2006
  5. Rakesh
    Replies:
    10
    Views:
    12,267
    Mike Schilling
    Apr 8, 2008
Loading...

Share This Page