STL Map uses hashing?

Discussion in 'C++' started by Vinodh, May 29, 2008.

  1. Vinodh

    Vinodh Guest

    I am reading about hashing techniques. The map data structure
    available in C++ STL uses hashing techniques?
     
    Vinodh, May 29, 2008
    #1
    1. Advertising

  2. Vinodh

    Barry Guest

    Vinodh wrote:
    > I am reading about hashing techniques. The map data structure
    > available in C++ STL uses hashing techniques?


    std::map std::multimap std::set std::multiset are often implemented
    using Red-Black Tree

    In tr1, there are unordered_xxx types using hash table
    which were implemented by some vendors as hash_xxx for extension before tr1

    --
    Best Regards
    Barry
     
    Barry, May 29, 2008
    #2
    1. Advertising

  3. Vinodh

    Jerry Coffin Guest

    In article <4353ee37-4fa4-4bc0-924f-482523333e21
    @u12g2000prd.googlegroups.com>, says...
    > I am reading about hashing techniques. The map data structure
    > available in C++ STL uses hashing techniques?


    No -- it uses a balanced tree of some sort (e.g. AVL or red-black tree).

    TR1 (and C++ 0x) add unordered_[multi]set and unordered_[multi]map,
    which are intended to be based on hashing.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 29, 2008
    #3
  4. Vinodh

    Jerry Coffin Guest

    In article <>, says...

    [ ... ]

    > For information, this is because of requirements:


    That's true.

    > A STL map is required to have a worse case find() of O log(N).
    > Balanced trees give you that. Although hash maps are typically faster
    > than balanced tree on large amount of random data, they do not
    > guarantee a worse case scenario of O log(N).


    Actually, it IS entirely possible to create a hash map that guarantees O
    (log(N)) complexity in the worst case for insert, delete and find.

    The requirement that's difficult to meet is iterating in order in a
    specified period of time.

    > >TR1 (and C++ 0x) add unordered_[multi]set and unordered_[multi]map,
    > >which are intended to be based on hashing.

    >
    > I.e. TR1 did accept that hash maps are also useful and added them.
    > Now all we need is for the developpers to be able to make an informed
    > choice on the correct one to use for a particular application. :)


    Actually, there never seems to have been a belief that hash-based
    containers weren't/coudn't be useful -- but they're relatively complex
    to specify correctly, and the committee had decided not to accept any
    new features (for the original 1998 standard) before a suitable
    specification of hash-based containers was finished.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 31, 2008
    #4
  5. Vinodh

    James Kanze Guest

    On May 30, 4:28 pm, (Yannick Tremblay) wrote:
    > In article <>,
    > Jerry Coffin <> wrote:


    > >In article <4353ee37-4fa4-4bc0-924f-482523333e21
    > >@u12g2000prd.googlegroups.com>, says...
    > >> I am reading about hashing techniques. The map data structure
    > >> available in C++ STL uses hashing techniques?


    > >No -- it uses a balanced tree of some sort (e.g. AVL or
    > >red-black tree).


    > For information, this is because of requirements:


    > A STL map is required to have a worse case find() of O log(N).
    > Balanced trees give you that. Although hash maps are
    > typically faster than balanced tree on large amount of random
    > data, they do not guarantee a worse case scenario of O log(N).


    I don't know about the "typically faster". Their average access
    time is faster IF the hash function is good. Typically, I find
    that it's not always that good (although the situation is
    improving).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 31, 2008
    #5
  6. James Kanze wrote:
    > I don't know about the "typically faster". Their average access
    > time is faster IF the hash function is good. Typically, I find
    > that it's not always that good (although the situation is
    > improving).


    Question: Will the new C++ standard require for the libraries to
    provide high-quality default hashing functions for internal types (and
    perhaps some other standard types such as std::string), or will the user
    be forced to always provide one himself?

    Creating good-quality hashing functions are not a trivial task at all
    (and the subject of continuing extensive research). One cannot expect
    the average user to invent a good-quality function without extensive
    knowledge and experience on the subject.
     
    Juha Nieminen, May 31, 2008
    #6
  7. On 2008-05-31 17:50, Juha Nieminen wrote:
    > James Kanze wrote:
    >> I don't know about the "typically faster". Their average access
    >> time is faster IF the hash function is good. Typically, I find
    >> that it's not always that good (although the situation is
    >> improving).

    >
    > Question: Will the new C++ standard require for the libraries to
    > provide high-quality default hashing functions for internal types (and
    > perhaps some other standard types such as std::string), or will the user
    > be forced to always provide one himself?
    >
    > Creating good-quality hashing functions are not a trivial task at all
    > (and the subject of continuing extensive research). One cannot expect
    > the average user to invent a good-quality function without extensive
    > knowledge and experience on the subject.


    Currently the parametrised hash struct (which is used by the unordered
    containers) is required to be instantiable for integers, float, double,
    pointers, and string-types. It says nothing about the quality of the
    implementation.

    --
    Erik Wikström
     
    Erik Wikström, May 31, 2008
    #7
  8. Pete Becker schrieb:
    >> Creating good-quality hashing functions are not a trivial task at all
    >> (and the subject of continuing extensive research). One cannot expect
    >> the average user to invent a good-quality function without extensive
    >> knowledge and experience on the subject.

    >
    > Indeed. That's why I've never thought that standardizing hashed
    > containers was a good idea. There's just too much flexibility, making
    > them hard to specify well, with the result that naive users can get
    > horrible performance without knowing why.
    >
    > But, to answer your question, no, there is no requirement for "high
    > quality" hashing functions. Implementations will be required to provide
    > hashing functions for the builtin types, pointers, std::string,
    > std::u16string, std::u32string, std::wstring, std::error_code, and
    > std::thread::id.


    What about std::pair<>, tuple<>, etc.?

    How to do a good hashing function for a combined type like

    struct key
    {
    int key1;
    std::string key2;
    };

    ?

    --
    Thomas
     
    Thomas J. Gritzan, May 31, 2008
    #8
  9. Vinodh

    Jerry Coffin Guest

    In article <Bte0k.157$>, lid
    says...

    [ ... ]

    > Question: Will the new C++ standard require for the libraries to
    > provide high-quality default hashing functions for internal types (and
    > perhaps some other standard types such as std::string), or will the user
    > be forced to always provide one himself?


    Hash functions are required in the standard library. <functional> has
    specializations of hash<> for the obvious arithmetic types, plus
    pointers, string types (string, u16string, u32string, wstring) and
    error_codes and thread::id's.

    As usual, the quality is up to the individual implementation. [N2521,
    unord.hash/2]: "The return value of operator() is unspecified, except
    that equal arguments shall yield the same result."

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 31, 2008
    #9
  10. Vinodh

    James Kanze Guest

    On May 31, 6:02 pm, Pete Becker <> wrote:
    > On 2008-05-31 11:50:57 -0400, Juha Nieminen <> said:


    [...]
    > But, to answer your question, no, there is no requirement for
    > "high quality" hashing functions. Implementations will be
    > required to provide hashing functions for the builtin types,
    > pointers, std::string, std::u16string, std::u32string,
    > std::wstring, std::error_code, and std::thread::id.


    But not std::type_info?

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 1, 2008
    #10
  11. Vinodh

    James Kanze Guest

    On Jun 2, 3:28 pm, (Yannick Tremblay) wrote:
    > In article
    > <>,
    > James Kanze <> wrote:


    > >On May 30, 4:28 pm, (Yannick Tremblay) wrote:


    > >> A STL map is required to have a worse case find() of O log(N).
    > >> Balanced trees give you that. Although hash maps are
    > >> typically faster than balanced tree on large amount of random
    > >> data, they do not guarantee a worse case scenario of O log(N).


    > >I don't know about the "typically faster". Their average access
    > >time is faster IF the hash function is good. Typically, I find
    > >that it's not always that good (although the situation is
    > >improving).


    > Ok, sorry about my lack of definition for "typically":


    > Assuming a non-perfect hash function that return you a bucket
    > identifier (almost necessary since a perfect hash function
    > would be too expensive for it to be worthwhile), this hash
    > function makes assumptions on the data it receives (e.g.
    > random). The hash table making use of this hash function will
    > be very fast when presented with "typical" data (i.e. data
    > that generally meet the assumptions used for creating the hash
    > function). Unfortunately, there may be some worse case
    > scenario under which the performance of the hash table will be
    > abyssimal.


    That's not my point. The performance of a hash table depends
    very much on the quality of the hash function. While a perfect
    hash function is only possible if the exact set of data are
    known in advance, hash functions do differ enormously in their
    quality, and I've seen more than a few cases where even some
    very competent programmers have come up with relatively poor
    hash functions; the most obvious is in Java's hash map, where
    Java was forced to change the hash function as a result. I
    suspect that part of the reason why the original STL had a
    balanced tree, but not a hash table, is that one can sort of
    expect an average programmer to get the ordering relationship
    right, but you can't expect him to define a good hash function.
    (There's also the point that if he gets the ordering function
    wrong, there's a good chance of the program not working at all,
    so he'll notice. Where as in the case of a hash function, the
    program will only be a little bit slower than it should be.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 3, 2008
    #11
  12. "James Kanze" <> wrote in message
    news:...
    On Jun 2, 3:28 pm, (Yannick Tremblay) wrote:

    > I suspect that part of the reason why the original STL had a
    > balanced tree, but not a hash table, is that one can sort of
    > expect an average programmer to get the ordering relationship
    > right, but you can't expect him to define a good hash function.
    > (There's also the point that if he gets the ordering function
    > wrong, there's a good chance of the program not working at all,
    > so he'll notice. Where as in the case of a hash function, the
    > program will only be a little bit slower than it should be.)


    I believe that I implemented the first associative-array class in C++ (20
    years ago -- how time flies! -- see
    http://www.softwarepreservation.org/projects/c_plus_plus/library/att/koenig-assoc.pdf),
    and the architecture of the STL map class was partly based on my design; so
    I'm pretty confident that I understand why it was designed as it was :)

    James Kanze's description is correct, as far as it goes; but it misses one
    point that I considered socially (or, if you prefer, politically) important
    at the time: If a user supplies a naive hash function that results in
    terrible performance, the user is likely to blame the library (or the
    language!) rather than the hash function. Therefore, when I designed the
    associative-array class, I felt that it was more important to obtain
    acceptable worst-case performance easily than it was to obtain excellent
    average-case performance with substantial effort.
     
    Andrew Koenig, Jun 18, 2008
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ed
    Replies:
    3
    Views:
    337
    Jeff Flinn
    Apr 4, 2005
  2. Marcus
    Replies:
    2
    Views:
    620
    Marcus
    Dec 9, 2005
  3. Replies:
    2
    Views:
    574
    klaus hoffmann
    Feb 22, 2006
  4. kl
    Replies:
    7
    Views:
    1,347
    James Kanze
    Jan 1, 2008
  5. Luca Risolia

    STL map to STL vector

    Luca Risolia, Jan 13, 2014, in forum: C++
    Replies:
    32
    Views:
    434
    Seungbeom Kim
    Jan 18, 2014
Loading...

Share This Page