Help migrating hash_set to c++0x

Discussion in 'C++' started by Paulo da Silva, Dec 20, 2010.

  1. Please correct me if I am wrong ...

    I replaced hash_set by unordered_set. It seems that hash_set does not
    exist using c++0x.

    I am having the following problem:

    pair<some_container::iterator,bool> r=ct_dir->insert(de);

    When "de" already exists, with hash_table r.second contained false, but
    using unordered_set r.second contains true, i.e. it accepts duplicates.

    How do I fix this to avoid duplicates in the set and also be aware of that?

    Thanks
     
    Paulo da Silva, Dec 20, 2010
    #1
    1. Advertising

  2. Paulo da Silva

    red floyd Guest

    On 12/20/2010 2:59 PM, Paulo da Silva wrote:
    > Please correct me if I am wrong ...
    >
    > I replaced hash_set by unordered_set. It seems that hash_set does not
    > exist using c++0x.
    >


    Just an FYI. hash_set does not exist in c++98 or c++03 either.
     
    red floyd, Dec 21, 2010
    #2
    1. Advertising

  3. Em 21-12-2010 02:24, red floyd escreveu:
    > On 12/20/2010 2:59 PM, Paulo da Silva wrote:
    >> Please correct me if I am wrong ...
    >>
    >> I replaced hash_set by unordered_set. It seems that hash_set does not
    >> exist using c++0x.
    >>

    >
    > Just an FYI. hash_set does not exist in c++98 or c++03 either.

    It existed, at least as an experimental feature.
    The problem is what replaced it? In other words, how to implement a hash
    set without duplicates?
     
    Paulo da Silva, Dec 21, 2010
    #3
  4. Em 20-12-2010 22:59, Paulo da Silva escreveu:

    I just saw (wikipedia for example) that unordered_set should implement
    the same behaviour as that of hash_set. Neverthless it is not working in
    this code!

    #include <memory>
    #include <unordered_set>
    #include <iostream>

    using namespace std;

    class Foo
    {public:
    string s;
    Foo(char const * const sc): s(sc) {}
    };

    class eqf
    {public:
    inline bool operator()(Foo const &s1,Foo const &s2) const
    { return (s1.s==s2.s);
    }
    };

    class hf
    {public:
    inline size_t operator()(Foo const &x) const
    { return hash<char const *>()(x.s.c_str());
    }
    };

    // typedef __gnu_cxx::hash_set<Foo,hf,eqf> MSet;
    typedef unordered_set<Foo,hf,eqf> MSet;

    int main()
    { MSet mc;
    pair<MSet::iterator,bool> r;
    r=mc.insert(Foo("xxxx"));
    // OK expected and obtained
    cout << "xxxx " << (r.second?"OK":"BAD") << endl;
    mc.insert(Foo("zzzz"));
    // Does not allow duplicates ...
    r=mc.insert(Foo("zzzz"));
    // BAD (duplicate) expected but OK obtained
    cout << "zzzz " << (r.second?"OK":"BAD") << endl;
    MSet::const_iterator it=mc.find(Foo("xxxx"));
    for (it=mc.begin();it!=mc.end();++it)
    cout << it->s << endl;
    return 0;
    }

    Anything wrong? Better way to implement?
    Thanks for any comments.






    > Please correct me if I am wrong ...
    >
    > I replaced hash_set by unordered_set. It seems that hash_set does not
    > exist using c++0x.
    >
    > I am having the following problem:
    >
    > pair<some_container::iterator,bool> r=ct_dir->insert(de);
    >
    > When "de" already exists, with hash_table r.second contained false, but
    > using unordered_set r.second contains true, i.e. it accepts duplicates.
    >
    > How do I fix this to avoid duplicates in the set and also be aware of that?
    >
    > Thanks
     
    Paulo da Silva, Dec 21, 2010
    #4
  5. Re: [SOLVED] Help migrating hash_set to c++0x

    Em 21-12-2010 03:37, Paulo da Silva escreveu:
    > Em 20-12-2010 22:59, Paulo da Silva escreveu:
    >
    > I just saw (wikipedia for example) that unordered_set should implement
    > the same behaviour as that of hash_set. Neverthless it is not working in
    > this code!
    >
    > #include <memory>
    > #include <unordered_set>
    > #include <iostream>
    >
    > using namespace std;
    >
    > class Foo
    > {public:
    > string s;
    > Foo(char const * const sc): s(sc) {}
    > };
    >
    > class eqf
    > {public:
    > inline bool operator()(Foo const &s1,Foo const &s2) const
    > { return (s1.s==s2.s);
    > }
    > };
    >
    > class hf
    > {public:
    > inline size_t operator()(Foo const &x) const
    > { return hash<char const *>()(x.s.c_str());
    > }
    > };
    >
    > // typedef __gnu_cxx::hash_set<Foo,hf,eqf> MSet;
    > typedef unordered_set<Foo,hf,eqf> MSet;
    >
    > int main()
    > { MSet mc;
    > pair<MSet::iterator,bool> r;
    > r=mc.insert(Foo("xxxx"));
    > // OK expected and obtained
    > cout << "xxxx " << (r.second?"OK":"BAD") << endl;
    > mc.insert(Foo("zzzz"));
    > // Does not allow duplicates ...
    > r=mc.insert(Foo("zzzz"));
    > // BAD (duplicate) expected but OK obtained
    > cout << "zzzz " << (r.second?"OK":"BAD") << endl;
    > MSet::const_iterator it=mc.find(Foo("xxxx"));
    > for (it=mc.begin();it!=mc.end();++it)
    > cout << it->s << endl;
    > return 0;
    > }
    >
    > Anything wrong? Better way to implement?
    > Thanks for any comments.


    Replacing class hf with this works.
    class hf
    {public:
    inline size_t operator()(Foo const &s) const
    { return hash<string const &>()(s.s);
    }
    };

    I still don't understand why the previous example stopped to work!


    >
    >
    >
    >
    >
    >
    >> Please correct me if I am wrong ...
    >>
    >> I replaced hash_set by unordered_set. It seems that hash_set does not
    >> exist using c++0x.
    >>
    >> I am having the following problem:
    >>
    >> pair<some_container::iterator,bool> r=ct_dir->insert(de);
    >>
    >> When "de" already exists, with hash_table r.second contained false, but
    >> using unordered_set r.second contains true, i.e. it accepts duplicates.
    >>
    >> How do I fix this to avoid duplicates in the set and also be aware of that?
    >>
    >> Thanks

    >
     
    Paulo da Silva, Dec 21, 2010
    #5
  6. On Dec 20, 11:25 pm, Paulo da Silva
    <> wrote:
    > Em 21-12-2010 03:37, Paulo da Silva escreveu:
    >
    >
    >
    >
    >
    > > Em 20-12-2010 22:59, Paulo da Silva escreveu:

    >
    > > I just saw (wikipedia for example) that unordered_set should implement
    > > the same behaviour as that of hash_set. Neverthless it is not working in
    > > this code!

    >
    > > #include <memory>
    > > #include <unordered_set>
    > > #include <iostream>

    >
    > > using namespace std;

    >
    > > class Foo
    > > {public:
    > >    string s;
    > >    Foo(char const * const sc): s(sc) {}
    > > };

    >
    > > class eqf
    > > {public:
    > >    inline bool operator()(Foo const &s1,Foo const &s2) const
    > >    {       return (s1.s==s2.s);
    > >    }
    > > };

    >
    > > class hf
    > > {public:
    > >    inline size_t operator()(Foo const &x) const
    > >    {       return hash<char const *>()(x.s.c_str());
    > >    }
    > > };

    >
    > > // typedef __gnu_cxx::hash_set<Foo,hf,eqf> MSet;
    > > typedef unordered_set<Foo,hf,eqf> MSet;

    >
    > > int main()
    > > {  MSet mc;
    > >    pair<MSet::iterator,bool> r;
    > >    r=mc.insert(Foo("xxxx"));
    > >    // OK expected and obtained
    > >    cout << "xxxx " << (r.second?"OK":"BAD") << endl;
    > >    mc.insert(Foo("zzzz"));
    > >    // Does not allow duplicates ...
    > >    r=mc.insert(Foo("zzzz"));
    > >    // BAD (duplicate) expected but OK obtained
    > >    cout << "zzzz " << (r.second?"OK":"BAD") << endl;
    > >    MSet::const_iterator it=mc.find(Foo("xxxx"));
    > >    for (it=mc.begin();it!=mc.end();++it)
    > >            cout << it->s << endl;
    > >    return 0;
    > > }

    >
    > > Anything wrong? Better way to implement?
    > > Thanks for any comments.

    >
    > Replacing class hf with this works.
    > class hf
    > {public:
    >         inline size_t operator()(Foo const &s) const
    >         {       return hash<string const &>()(s.s);
    >         }
    >
    > };
    >
    > I still don't understand why the previous example stopped to work!


    I suspect it was a change in the behavior of the hash<char const*>
    function. In C++0x std::hash<char const*> simply hashes the pointer,
    not a null terminated string. Therefore using std::hash<char const*>
    combined with your eqf would give you the situation that two Foo's
    could compare equal but hash to different buckets. This is a
    situation that will essentially corrupt your container. Your fix
    (using hash<string const &>) is correct.

    -Howard
     
    Howard Hinnant, Dec 21, 2010
    #6
  7. Em 21-12-2010 16:05, Howard Hinnant escreveu:
    > On Dec 20, 11:25 pm, Paulo da Silva
    > <> wrote:
    >> Em 21-12-2010 03:37, Paulo da Silva escreveu:
    >>
    >>
    >>
    >>
    >>
    >>> Em 20-12-2010 22:59, Paulo da Silva escreveu:

    >>
    >>> I just saw (wikipedia for example) that unordered_set should implement
    >>> the same behaviour as that of hash_set. Neverthless it is not working in
    >>> this code!

    >>
    >>> #include <memory>
    >>> #include <unordered_set>
    >>> #include <iostream>

    >>
    >>> using namespace std;

    >>
    >>> class Foo
    >>> {public:
    >>> string s;
    >>> Foo(char const * const sc): s(sc) {}
    >>> };

    >>
    >>> class eqf
    >>> {public:
    >>> inline bool operator()(Foo const &s1,Foo const &s2) const
    >>> { return (s1.s==s2.s);
    >>> }
    >>> };

    >>
    >>> class hf
    >>> {public:
    >>> inline size_t operator()(Foo const &x) const
    >>> { return hash<char const *>()(x.s.c_str());
    >>> }
    >>> };

    >>
    >>> // typedef __gnu_cxx::hash_set<Foo,hf,eqf> MSet;
    >>> typedef unordered_set<Foo,hf,eqf> MSet;

    >>
    >>> int main()
    >>> { MSet mc;
    >>> pair<MSet::iterator,bool> r;
    >>> r=mc.insert(Foo("xxxx"));
    >>> // OK expected and obtained
    >>> cout << "xxxx " << (r.second?"OK":"BAD") << endl;
    >>> mc.insert(Foo("zzzz"));
    >>> // Does not allow duplicates ...
    >>> r=mc.insert(Foo("zzzz"));
    >>> // BAD (duplicate) expected but OK obtained
    >>> cout << "zzzz " << (r.second?"OK":"BAD") << endl;
    >>> MSet::const_iterator it=mc.find(Foo("xxxx"));
    >>> for (it=mc.begin();it!=mc.end();++it)
    >>> cout << it->s << endl;
    >>> return 0;
    >>> }

    >>
    >>> Anything wrong? Better way to implement?
    >>> Thanks for any comments.

    >>
    >> Replacing class hf with this works.
    >> class hf
    >> {public:
    >> inline size_t operator()(Foo const &s) const
    >> { return hash<string const &>()(s.s);
    >> }
    >>
    >> };
    >>
    >> I still don't understand why the previous example stopped to work!

    >
    > I suspect it was a change in the behavior of the hash<char const*>
    > function. In C++0x std::hash<char const*> simply hashes the pointer,
    > not a null terminated string. Therefore using std::hash<char const*>
    > combined with your eqf would give you the situation that two Foo's
    > could compare equal but hash to different buckets. This is a
    > situation that will essentially corrupt your container. Your fix
    > (using hash<string const &>) is correct.
    >

    That makes sense and explains the whole thing.
    Nevertheless it seems a weird behaviour of hash<char const*>.
    Thanks Howard.
     
    Paulo da Silva, Dec 21, 2010
    #7
  8. Pete Becker <> wrote:
    > The C++ standard doesn't have experimental features.


    One could cynically argue that export templates turned out to be one,
    in practice.
     
    Juha Nieminen, Dec 21, 2010
    #8
  9. Paulo da Silva

    James Kanze Guest

    On Dec 21, 4:05 pm, Howard Hinnant <> wrote:
    > On Dec 20, 11:25 pm, Paulo da Silva

    [...]
    > In C++0x std::hash<char const*> simply hashes the pointer,
    > not a null terminated string.


    Why did they do that? I can see hashing the pointer for other
    pointer types, but char const*? (Of course, if the indices are
    strings, which is usually the case, hash<std::string> seems more
    logical.)

    --
    James Kanze
     
    James Kanze, Dec 21, 2010
    #9
  10. On Dec 21, 2:54 pm, James Kanze <> wrote:
    > On Dec 21, 4:05 pm, Howard Hinnant <> wrote:
    >
    >     [...]
    > > In C++0x std::hash<char const*> simply hashes the pointer,
    > > not a null terminated string.

    >
    > Why did they do that?  I can see hashing the pointer for other
    > pointer types, but char const*?  (Of course, if the indices are
    > strings, which is usually the case, hash<std::string> seems more
    > logical.)


    Because it makes the behavior uniform for different pointer types.
    This is a big win in generic code. For example storing pointers or
    iterators in containers is common practice. Making hash<char*> behave
    differently than hash<int*> would have even more severe consequences
    than making vector<bool> behave differently than vector<int>. If
    there is a silver lining to the vector<bool> cloud, it is that it
    taught us a lesson in generic programming. Best we not forget that
    lesson.

    -Howard
     
    Howard Hinnant, Dec 21, 2010
    #10
  11. Paulo da Silva

    James Kanze Guest

    On Dec 21, 9:05 pm, Howard Hinnant <> wrote:
    > On Dec 21, 2:54 pm, James Kanze <> wrote:


    > > On Dec 21, 4:05 pm, Howard Hinnant <> wrote:


    > > [...]
    > > > In C++0x std::hash<char const*> simply hashes the pointer,
    > > > not a null terminated string.


    > > Why did they do that? I can see hashing the pointer for other
    > > pointer types, but char const*? (Of course, if the indices are
    > > strings, which is usually the case, hash<std::string> seems more
    > > logical.)


    > Because it makes the behavior uniform for different pointer types.


    Like in iostream. For better or for worse, it's a different
    behavior that most programmers are used to, and even count on.

    > This is a big win in generic code. For example storing pointers or
    > iterators in containers is common practice.


    You mean an excessively dangerous practice, banned by a lot of
    coding guidelines. (Iterators and pointers tend to become
    invalid at inconvenient times.)

    > Making hash<char*> behave differently than hash<int*> would
    > have even more severe consequences than making vector<bool>
    > behave differently than vector<int>.


    Come now. In both cases, all that's needed is a little
    specialization (or overloading, if it's a question of
    functions). With the difference that I suspect unordered
    containers are fairly rare in generic code, where as vectors
    aren't.

    > If there is a silver lining to the vector<bool> cloud, it is
    > that it taught us a lesson in generic programming. Best we
    > not forget that lesson.


    The problem with vector<bool> isn't just that it's unorthogonal.
    It's that it doesn't meet any of the requirements for a
    container. (Arguable, some of the requirements may be
    overspecification; it's a bit hard that operator[] must return a
    real reference, for example. But that's a different issue.)

    If we were designing the language from zero, I'd totally agree
    with you. But we're not, and for better or worse, char const*
    is, in most people's minds (and in a lot of the interfaces we
    have to deal with), a string.

    Anyway, I doubt that it's a real problem. Since the standard
    doesn't make any quality guarantees (and I don't see how it
    could), no one will actually use the standard hash function
    anyway. (It suffers from the same problem rand() does: anything
    that's good enough to work in all cases will be unnecessarily
    slow in a number of frequent cases.)

    --
    James Kanze
     
    James Kanze, Dec 21, 2010
    #11
  12. On Dec 21, 6:32 pm, James Kanze <> wrote:
    > On Dec 21, 9:05 pm, Howard Hinnant <> wrote:
    >
    > > On Dec 21, 2:54 pm, James Kanze <> wrote:
    > > > On Dec 21, 4:05 pm, Howard Hinnant <> wrote:
    > > >     [...]
    > > > > In C++0x std::hash<char const*> simply hashes the pointer,
    > > > > not a null terminated string.
    > > > Why did they do that?  I can see hashing the pointer for other
    > > > pointer types, but char const*?  (Of course, if the indices are
    > > > strings, which is usually the case, hash<std::string> seems more
    > > > logical.)

    > > Because it makes the behavior uniform for different pointer types.


    My apologies, I was probably brusk.

    > Like in iostream.  For better or for worse, it's a different
    > behavior that most programmers are used to, and even count on.


    Agreed for iostream.

    > > This is a big win in generic code.  For example storing pointers or
    > > iterators in containers is common practice.

    >
    > You mean an excessively dangerous practice, banned by a lot of
    > coding guidelines.  (Iterators and pointers tend to become
    > invalid at inconvenient times.)


    No, I see containers of iterators or pointers into other containers on
    a regular basis used in a safe and useful manner. I see this pattern
    recommended here on this newsgroup with little or no objections.

    > > Making hash<char*> behave differently than hash<int*> would
    > > have even more severe consequences than making vector<bool>
    > > behave differently than vector<int>.

    >
    > Come now.  In both cases, all that's needed is a little
    > specialization (or overloading, if it's a question of
    > functions).  With the difference that I suspect unordered
    > containers are fairly rare in generic code, where as vectors
    > aren't.


    I expect unordered containers to become about as popular as (multi)map/
    set, perhaps a little more. Naturally the future is always difficult
    to predict.

    > > If there is a silver lining to the vector<bool> cloud, it is
    > > that it taught us a lesson in generic programming.  Best we
    > > not forget that lesson.

    >
    > The problem with vector<bool> isn't just that it's unorthogonal.
    > It's that it doesn't meet any of the requirements for a
    > container.  (Arguable, some of the requirements may be
    > overspecification; it's a bit hard that operator[] must return a
    > real reference, for example.  But that's a different issue.)


    The problem with vector<bool> is that it has expected semantics based
    on experience with vector<int>, vector<char>, vector<char*>, etc. And
    those expectations are not always met. This has more to do with the
    consistency of the client's experience than it does with a spec that
    may or may not be thoroughly read by the majority of the client
    authors. Don't get me wrong: I think a dynamically sized array of
    bits is a very beautiful and useful data structure. It just shouldn't
    have been named vector<bool> because vector<T> for all scalars has a
    well defined behavior without the specialization and the
    specialization of vector<bool> implementing a dynamic array of bits is
    unable to completely mimic the data structure of a dynamic array of
    bools. /That/ is why it was a mistake -- a mismatch of reasonable
    expectations.

    > If we were designing the language from zero, I'd totally agree
    > with you.  But we're not, and for better or worse, char const*
    > is, in most people's minds (and in a lot of the interfaces we
    > have to deal with), a string.


    For C++0x we are designing std::hash from scratch. And we are also
    designing std::unordered_(multi)map/set from scratch. Though
    admittedly gratefully guided by a fair amount field experience. That
    field experience indicates to me that:

    1. std::hash should treat all pointers uniformly.
    2. For storing containers of strings in a hash container, std::string,
    not char* is the preferred key. The latter is replete with memory
    ownership issues that the beginner is ill-equiped to deal with.
    3. If the client really wants to store null-terminated char*'s,
    dealing with memory ownership himself, then the hash containers should
    allow that. They do: write your own hash function. This client must
    be knowledgable enough to handle the manual memory management, manual
    equality comparison, and manual hash functions. It is all allowed, it
    just isn't the default.

    > Anyway, I doubt that it's a real problem.  Since the standard
    > doesn't make any quality guarantees (and I don't see how it
    > could), no one will actually use the standard hash function
    > anyway.  (It suffers from the same problem rand() does: anything
    > that's good enough to work in all cases will be unnecessarily
    > slow in a number of frequent cases.)


    rand suffered from a different problem: a implicitly recommended poor
    implementation. <random> goes a long way to correct that problem ...
    a very long way. <random> is one of the most impressive random number
    libraries I've seen for general purpose use. And it took a tremendous
    effort and expense from a few very dedicated individuals to guide it
    through the standardization process (I was not one of those).

    But I agree with you that clients should feel free to write their own
    hash functions. And I'm quite glad we didn't try to standardize, or
    make an example hashing function for arrays of char. That would have
    repeated the problems of rand. When prototyping, try the default
    first. If hashing turns up performance problems, then write your own,
    else don't. The library anticipates both cases. I expect some
    std::hash<std::string> to be better than others, and for the market to
    decide which implementation does best in this area. I have no doubt
    that all implementations will do their best to become the best.
    Vendors of the std::lib's want to please their customers. They have
    high motivation to do so.

    -Howard
     
    Howard Hinnant, Dec 22, 2010
    #12
  13. Paulo da Silva

    James Kanze Guest

    On Dec 22, 12:59 am, Howard Hinnant <> wrote:
    > On Dec 21, 6:32 pm, James Kanze <> wrote:


    [...]
    > > > This is a big win in generic code. For example storing pointers or
    > > > iterators in containers is common practice.


    > > You mean an excessively dangerous practice, banned by a lot of
    > > coding guidelines. (Iterators and pointers tend to become
    > > invalid at inconvenient times.)


    > No, I see containers of iterators or pointers into other containers on
    > a regular basis used in a safe and useful manner. I see this pattern
    > recommended here on this newsgroup with little or no objections.


    I've not seen it recommended by any serious professional. And
    it *is* dangerous: iterators are by design to be ephemeral.
    They are more or less the equivalent to pointers to local or
    member variables. (As with all "rules", there are some
    exceptions. But you won't find them much in application code.)

    > > > Making hash<char*> behave differently than hash<int*> would
    > > > have even more severe consequences than making vector<bool>
    > > > behave differently than vector<int>.


    > > Come now. In both cases, all that's needed is a little
    > > specialization (or overloading, if it's a question of
    > > functions). With the difference that I suspect unordered
    > > containers are fairly rare in generic code, where as vectors
    > > aren't.


    > I expect unordered containers to become about as popular as (multi)map/
    > set, perhaps a little more. Naturally the future is always difficult
    > to predict.


    I don't doubt that they'll become popular. But I can't imagine
    much generic code which deals with hash---if it does, it's not
    too generic, since it is limited to unordered containers.

    > > > If there is a silver lining to the vector<bool> cloud, it is
    > > > that it taught us a lesson in generic programming. Best we
    > > > not forget that lesson.


    > > The problem with vector<bool> isn't just that it's unorthogonal.
    > > It's that it doesn't meet any of the requirements for a
    > > container. (Arguable, some of the requirements may be
    > > overspecification; it's a bit hard that operator[] must return a
    > > real reference, for example. But that's a different issue.)


    > The problem with vector<bool> is that it has expected semantics based
    > on experience with vector<int>, vector<char>, vector<char*>, etc.


    Expected semantics based on the fact that vector is a sequence.

    > And
    > those expectations are not always met. This has more to do with the
    > consistency of the client's experience than it does with a spec that
    > may or may not be thoroughly read by the majority of the client
    > authors. Don't get me wrong: I think a dynamically sized array of
    > bits is a very beautiful and useful data structure. It just shouldn't
    > have been named vector<bool> because vector<T> for all scalars has a
    > well defined behavior without the specialization and the
    > specialization of vector<bool> implementing a dynamic array of bits is
    > unable to completely mimic the data structure of a dynamic array of
    > bools. /That/ is why it was a mistake -- a mismatch of reasonable
    > expectations.


    Agreed. Or in more formal terms, the standard defines
    a contract for std::vector, then violates it for
    std::vector<bool>.

    > > If we were designing the language from zero, I'd totally agree
    > > with you. But we're not, and for better or worse, char const*
    > > is, in most people's minds (and in a lot of the interfaces we
    > > have to deal with), a string.


    > For C++0x we are designing std::hash from scratch. And we are also
    > designing std::unordered_(multi)map/set from scratch. Though
    > admittedly gratefully guided by a fair amount field experience. That
    > field experience indicates to me that:


    > 1. std::hash should treat all pointers uniformly.
    > 2. For storing containers of strings in a hash container, std::string,
    > not char* is the preferred key. The latter is replete with memory
    > ownership issues that the beginner is ill-equiped to deal with.
    > 3. If the client really wants to store null-terminated char*'s,
    > dealing with memory ownership himself, then the hash containers should
    > allow that. They do: write your own hash function. This client must
    > be knowledgable enough to handle the manual memory management, manual
    > equality comparison, and manual hash functions. It is all allowed, it
    > just isn't the default.


    I'm not convinced that all of the above are that essential, but
    I am beginning to think that treating std::hash<char*> as just
    another pointer may be OK. If only because you also want it to
    work when the pointer is null. And of course, because the cases
    where the container will be indexed by C style strings should be
    very exceptional, and the coder should expect to have to do
    something exceptional in that case. (On the other hand, I'm
    sure we both know programmers who thing char const* is
    a string.)

    > > Anyway, I doubt that it's a real problem. Since the standard
    > > doesn't make any quality guarantees (and I don't see how it
    > > could), no one will actually use the standard hash function
    > > anyway. (It suffers from the same problem rand() does: anything
    > > that's good enough to work in all cases will be unnecessarily
    > > slow in a number of frequent cases.)


    > rand suffered from a different problem: a implicitly recommended poor
    > implementation.


    That aggrevated the problem, I agree. But the problem is, IMHO,
    more general: if you're doing any serious work, and you need
    random numbers, you need a minimum quality for those random
    numbers.

    Similarly, if you're using an unordered container, it's probably
    because of speed as well -- if not, std::vector and std::find
    work very well. And speed likely implies that you'll want
    a guaranteed minimum quality for the hash. For the data sets
    you expect. And while I think it would be possible to define
    a hash with a minimum quality for std::string (but
    std::basic_string already causes problems, because the hash has
    to be consistent with char_traits::eq), but I don't know of any
    generic way of specifying quality criteria for all possible
    types; for that matter, I don't know how you'd specify the
    quality critera other than by specifying an implementation.

    > <random> goes a long way to correct that problem ...
    > a very long way.


    Some would say too far:). (Over engineering and all that.
    Personally, since it's someone else who did the work, I'm all
    for over engineering.)

    > <random> is one of the most impressive random number
    > libraries I've seen for general purpose use. And it took a tremendous
    > effort and expense from a few very dedicated individuals to guide it
    > through the standardization process (I was not one of those).


    > But I agree with you that clients should feel free to write their own
    > hash functions. And I'm quite glad we didn't try to standardize, or
    > make an example hashing function for arrays of char. That would have
    > repeated the problems of rand. When prototyping, try the default
    > first. If hashing turns up performance problems, then write your own,
    > else don't. The library anticipates both cases. I expect some
    > std::hash<std::string> to be better than others, and for the market to
    > decide which implementation does best in this area. I have no doubt
    > that all implementations will do their best to become the best.
    > Vendors of the std::lib's want to please their customers. They have
    > high motivation to do so.


    Agreed in general, although I suspect that there's one issue
    that people will tend to forget: portability of the data. As
    long as the unordered containers are entirely in a single
    process, or only shared by code using the same implementation
    (and those are the only cases which the standard should be
    concerned with), using the standard hash is find, even if it is
    underspecified. I fear, however, that there are people who will
    use it (because it's there) to hash tables on disk, or other
    external media. I don't think it's a problem the standard
    should address (other than to specify that the hashing
    algorithms are "unspecified"), but teachers should at least
    mention the issue.

    --
    James Kanze
     
    James Kanze, Dec 22, 2010
    #13
  14. Paulo da Silva

    Jorgen Grahn Guest

    On Tue, 2010-12-21, Paulo da Silva wrote:
    > Em 20-12-2010 22:59, Paulo da Silva escreveu:
    >
    > I just saw (wikipedia for example) that unordered_set should implement
    > the same behaviour as that of hash_set.


    Where more exactly? As I recall it, the iterator invalidation rules are
    different, and there may be other differences as well. Unfortunately I
    don't have any references handy.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Dec 30, 2010
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Vasileios
    Replies:
    4
    Views:
    501
    Rolf Magnus
    Nov 4, 2003
  2. Abhijit Ray
    Replies:
    1
    Views:
    621
    John Harrison
    Apr 27, 2004
  3. Alex Gerdemann

    Performance of hash_set vs. Java

    Alex Gerdemann, Oct 11, 2004, in forum: C++
    Replies:
    10
    Views:
    2,314
    Tom Widmer
    Oct 13, 2004
  4. breck

    hash_set help

    breck, May 15, 2006, in forum: C++
    Replies:
    2
    Views:
    542
    Mark P
    May 15, 2006
  5. eric
    Replies:
    3
    Views:
    1,367
    Michael DOUBEZ
    Jul 12, 2011
Loading...

Share This Page