STL Map uses hashing?

Vinodh

I am reading about hashing techniques. Does the map data structure
available in the C++ STL use hashing techniques?
 
Barry

Vinodh said:
I am reading about hashing techniques. Does the map data structure
available in the C++ STL use hashing techniques?

std::map, std::multimap, std::set and std::multiset are often implemented
using a red-black tree.

In TR1, there are unordered_xxx types using a hash table,
which some vendors had implemented as hash_xxx extensions before TR1.
 
Jerry Coffin

I am reading about hashing techniques. Does the map data structure
available in the C++ STL use hashing techniques?

No -- it uses a balanced tree of some sort (e.g. AVL or red-black tree).

TR1 (and C++0x) add unordered_[multi]set and unordered_[multi]map,
which are intended to be based on hashing.
 
Jerry Coffin

[email protected] said:
For information, this is because of requirements:

That's true.
An STL map is required to have a worst-case find() of O(log N).
Balanced trees give you that. Although hash maps are typically faster
than balanced trees on large amounts of random data, they do not
guarantee a worst-case scenario of O(log N).

Actually, it IS entirely possible to create a hash map that guarantees
O(log N) complexity in the worst case for insert, delete and find.

The requirement that's difficult to meet is iterating in order in a
specified period of time.
TR1 (and C++0x) add unordered_[multi]set and unordered_[multi]map,
which are intended to be based on hashing.

I.e. TR1 did accept that hash maps are also useful and added them.
Now all we need is for the developers to be able to make an informed
choice on the correct one to use for a particular application. :)

Actually, there never seems to have been a belief that hash-based
containers weren't/couldn't be useful -- but they're relatively complex
to specify correctly, and the committee had decided not to accept any
new features (for the original 1998 standard) before a suitable
specification of hash-based containers was finished.
 
James Kanze

For information, this is because of requirements:
An STL map is required to have a worst-case find() of O(log N).
Balanced trees give you that. Although hash maps are
typically faster than balanced trees on large amounts of random
data, they do not guarantee a worst-case scenario of O(log N).

I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
 
Juha Nieminen

James said:
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

Question: Will the new C++ standard require the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?

Creating good-quality hashing functions is not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.
 
Erik Wikström

Question: Will the new C++ standard require the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?

Creating good-quality hashing functions is not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.

Currently the parametrised hash struct (which is used by the unordered
containers) is required to be instantiable for integers, float, double,
pointers, and string-types. It says nothing about the quality of the
implementation.
 
Thomas J. Gritzan

Pete said:
Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.

What about std::pair<>, tuple<>, etc.?

How to do a good hashing function for a combined type like

struct key
{
    int key1;
    std::string key2;
};

?
 
Jerry Coffin

[ ... ]
Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?

Hash functions are required in the standard library. <functional> has
specializations of hash<> for the obvious arithmetic types, plus
pointers, string types (string, u16string, u32string, wstring),
error_code, and thread::id.

As usual, the quality is up to the individual implementation. [N2521,
unord.hash/2]: "The return value of operator() is unspecified, except
that equal arguments shall yield the same result."
 
James Kanze

On 2008-05-31 11:50:57 -0400, Juha Nieminen <[email protected]> said:

[...]
But, to answer your question, no, there is no requirement for
"high quality" hashing functions. Implementations will be
required to provide hashing functions for the builtin types,
pointers, std::string, std::u16string, std::u32string,
std::wstring, std::error_code, and std::thread::id.

But not std::type_info?
 
James Kanze

Ok, sorry about my lack of definition for "typically":
Assuming a non-perfect hash function that returns a bucket
identifier (almost necessary, since a perfect hash function
would be too expensive to be worthwhile), this hash
function makes assumptions about the data it receives (e.g.
that it is random). The hash table making use of this hash function
will be very fast when presented with "typical" data (i.e. data
that generally meet the assumptions used for creating the hash
function). Unfortunately, there may be some worst-case
scenario under which the performance of the hash table is
abysmal.

That's not my point. The performance of a hash table depends
very much on the quality of the hash function. While a perfect
hash function is only possible if the exact set of data are
known in advance, hash functions do differ enormously in their
quality, and I've seen more than a few cases where even some
very competent programmers have come up with relatively poor
hash functions; the most obvious is in Java's hash map, where
Java was forced to change the hash function as a result. I
suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Whereas in the case of a hash function, the
program will only be a little bit slower than it should be.)
 
Andrew Koenig

I suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Whereas in the case of a hash function, the
program will only be a little bit slower than it should be.)

I believe that I implemented the first associative-array class in C++ (20
years ago -- how time flies! -- see
http://www.softwarepreservation.org/projects/c_plus_plus/library/att/koenig-assoc.pdf),
and the architecture of the STL map class was partly based on my design; so
I'm pretty confident that I understand why it was designed as it was :)

James Kanze's description is correct, as far as it goes; but it misses one
point that I considered socially (or, if you prefer, politically) important
at the time: If a user supplies a naive hash function that results in
terrible performance, the user is likely to blame the library (or the
language!) rather than the hash function. Therefore, when I designed the
associative-array class, I felt that it was more important to obtain
acceptable worst-case performance easily than it was to obtain excellent
average-case performance with substantial effort.
 
