String hashing design/implementation questions involving string literals

Discussion in 'C++' started by Sebastian Karlsson, Feb 12, 2008.

  1. I'm creating an audio player and want its clients to be able to play a
    certain sound in a certain category using an identifier of some sort.
    The identifier corresponding to a certain sound will be linked to a
    file using an XML file, and as a consequence I'm thinking the best
    design is probably using a string as an identifier. For performance
    reasons, as well as to satisfy my curiosity about the subject, I'm
    thinking this might be an ideal candidate for string hashing. Now I
    have some questions regarding this which I'm having trouble finding
    an answer to:

    I don't want the client of my player to really have to familiarize
    himself too much with my string hashing. Simplified, I'm thinking
    something along the lines of audioplayer.Play( "boom" ) where Play is
    Play( StringHash p_Identifier ). Since construction of the hash is
    pretty expensive I'm thinking of having a private static hash table in
    StringHash which would maintain the actual hashes, and then use the
    memory address of the passed argument as the key. Then I would just do a
    fast lookup from the argument to see if there already is a hash that
    is computed for that particular string. I'm not exactly an expert on
    the particulars of string literals but from my experiments and from
    what I've read their memory address should remain constant throughout
    all the calls. Is this true in theory? Or at least in practice?

    If above is true there's still a very large problem remaining, users
    passing strings that aren't string literals, for example
    std::string::c_str() comes to mind. These aren't guaranteed to be
    constant between calls to a StringHash constructor so I can't use the
    memory address as a key to the value. I could create a constructor of
    StringHash which accepts a std::string&, however that certainly
    doesn't guarantee that the user doesn't use c_str(), and std::string
    isn't really the only possible problem anyway. What really would help
    me here would be if there's any neat way to guarantee that the client
    only uses string literals for my StringHash( const char* )
    constructor. Is there any?
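
To make the idea concrete, here is a minimal sketch of the address-keyed cache described above. Every name in it (fnv1a, lookup, cache, cached_count) is made up for illustration, the hash function is FNV-1a chosen only because it is short, and the cache is not thread-safe. It assumes the caller only ever passes pointers to text that never changes, which is exactly the assumption being asked about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical StringHash: all names here are illustrative assumptions.
class StringHash {
public:
    // Implicit on purpose so that Play("boom") compiles.
    StringHash(const char* text) : value_(lookup(text)) {}

    std::uint32_t value() const { return value_; }

    // Number of distinct addresses seen so far (for demonstration only).
    static std::size_t cached_count() { return cache().size(); }

private:
    // FNV-1a, 32-bit: a simple, well-known string hash.
    static std::uint32_t fnv1a(const char* s) {
        std::uint32_t h = 2166136261u;
        for (; *s != '\0'; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 16777619u;
        }
        return h;
    }

    // Function-local static avoids static initialization order problems.
    static std::unordered_map<const void*, std::uint32_t>& cache() {
        static std::unordered_map<const void*, std::uint32_t> c;
        return c;
    }

    // Keyed by address: only sound if the pointed-to text never changes
    // for as long as the address stays in the cache.
    static std::uint32_t lookup(const char* text) {
        assert(text != nullptr);
        auto& c = cache();
        auto it = c.find(text);
        if (it != c.end()) return it->second;  // hash already computed
        const std::uint32_t h = fnv1a(text);
        c.emplace(text, h);
        return h;
    }

    std::uint32_t value_;
};
```

Identical literals in different places may or may not share one address, since the standard leaves that unspecified. With this layout that only costs a duplicate cache entry; the hash value is the same either way. A pointer to a buffer that is later rewritten (c_str() of a modified std::string, a reused stack array) would return a stale hash, which is the problem raised in the paragraph above.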

    Any feedback appreciated!
     
    Sebastian Karlsson, Feb 12, 2008
    #1

  2. On 12 Feb, 13:30, Sebastian Karlsson wrote:
    [ snip ]
    >
    > If above is true there's still a very large problem remaining, users
    > passing strings that aren't string literals, for example
    > std::string::c_str() comes to mind. These aren't guaranteed to be
    > constant between calls to a StringHash constructor so I can't use the
    > memory address as a key to the value. I could create a constructor of
    > StringHash which accepts a std::string&, however that certainly
    > doesn't guarantee that the user doesn't use c_str(), and std::string
    > isn't really the only possible problem anyway. What really would help
    > me here would be if there's any neat way to guarantee that the client
    > only uses string literals for my StringHash( const char* )
    > constructor. Is there any?
    >
    > Any feedback appreciated!


    Firstly, I don't think there is any way to enforce string
    literals. One might be tempted to defeat passing a char*
    via e.g.
      template<int N> void bar(const char(&ca)[N]) { ... }
    This will accept
      bar("yowza");
    but reject e.g.
      bar(std_string.c_str());
    The problem is it _also_ accepts
      char oh_dear[] = "In trouble now";
      bar(oh_dear);
    where oh_dear may e.g. be on the stack.
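
For reference, a compilable version of the array-reference trick above. It uses std::size_t for N rather than int, which is a choice made for this sketch, and the helper functions from_literal and from_local_array are hypothetical names added only for the demonstration.

```cpp
#include <cassert>
#include <cstddef>

// Accepts only arrays of char and reports the length without the
// terminating '\0'. The array bound N is deduced from the argument.
template <std::size_t N>
std::size_t bar(const char (&ca)[N]) {
    assert(ca[N - 1] == '\0');  // holds for literals, not for arbitrary arrays
    return N - 1;
}

// A string literal is a const char[6] here, so N is deduced as 6.
std::size_t from_literal() { return bar("yowza"); }

// A local array also binds: char[15] converts to const char(&)[15].
std::size_t from_local_array() {
    char oh_dear[] = "In trouble now";  // a copy on the stack, not a literal
    return bar(oh_dear);
}

// bar(some_std_string.c_str());  // would not compile: const char* is not an array
```

So the template filters out plain pointers at compile time, but it cannot tell a literal from any other char array.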

    But I think the problem is this: you've gone down
    a rat-hole looking for a solution without looking
    properly at the problem, because what actual use
    is:
    bar("magically works only with literals"); ??

    When the string has to be hard-coded in, you
    may as well just do this:
      void bar(int tag) { .... }

    so that the user can just do this:
      enum my_tags {
        wow, yowza, hooray
      };
      bar(wow);
      bar(yowza);
    etc.
     
    tragomaskhalos, Feb 12, 2008
    #2

  3. * tragomaskhalos:
    > On 12 Feb, 13:30, Sebastian Karlsson <>
    > wrote:
    > [ snip ]
    >> If above is true there's still a very large problem remaining, users
    >> passing strings that aren't string literals, for example
    >> std::string::c_str() comes to mind. These aren't guaranteed to be
    >> constant between calls to a StringHash constructor so I can't use the
    >> memory address as a key to the value. I could create a constructor of
    >> StringHash which accepts a std::string&, however that certainly
    >> doesn't guarantee that the user doesn't use c_str(), and std::string
    >> isn't really the only possible problem anyway. What really would help
    >> me here would be if there's any neat way to guarantee that the client
    >> only uses string literals for my StringHash( const char* )
    >> constructor. Is there any?
    >>
    >> Any feedback appreciated!

    >
    > Firstly, I don't think there is any way to enforce string
    > literals. One might be tempted to defeat passing a char*
    > via eg
    > template<int N> void bar(const char(&ca)[N]) { ...}
    > This will accept
    > bar("yowza");
    > but reject e.g.
    > bar(std_string.c_str());
    > The problem is it _also_ accepts
    > char oh_dear[] = "In trouble now";
    > bar(oh_dear);
    > Where oh_dear may eg be on the stack.


    How to avoid this is a FAQ. Essentially, you document that one
    shouldn't do that. No design can prevent all cases of PEBCAK, and so
    it's futile to try, and generally invalid to use the impossibility of
    100% perfection in that regard as an argument against something.


    > But I think the problem is this: you've gone down
    > a rat-hole looking for a solution without looking
    > properly at the problem, because what actual use
    > is:
    > bar("magically works only with literals"); ??


    For example, an exception object where there should be no possibility of
    failure for construction of the exception object.

    For another example, returning a string from a function, in a general
    format that requires no further processing to be used along with e.g.
    dynamically created strings, without the overhead of dynamic allocation
    or string copying.
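
As a rough illustration of the first example, an exception type that only stores a pointer to a literal needs no allocation, so constructing it cannot fail. This is a sketch with hypothetical names (LiteralError, roundtrip); it is not taken from the library linked below and makes no claim about how that library works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <exception>

// Hypothetical exception type for illustration only.
class LiteralError : public std::exception {
public:
    // Accepts only char arrays (in practice: string literals). Nothing is
    // allocated or copied, so construction cannot throw.
    template <std::size_t N>
    explicit LiteralError(const char (&msg)[N]) noexcept : msg_(msg) {
        assert(msg[N - 1] == '\0');
    }

    const char* what() const noexcept override { return msg_; }

private:
    const char* msg_;  // must point at storage that outlives the exception
};

// Throws and catches, returning true when the message arrives intact.
bool roundtrip() {
    try {
        throw LiteralError("sound id not found");
    } catch (const std::exception& e) {
        return std::strcmp(e.what(), "sound id not found") == 0;
    }
}
```

As noted earlier in the thread, the constructor would also accept a local char array, so the requirement that the message outlives the exception still has to be documented.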

    Some code available at <url: http://alfsstringvalue.sourceforge.net/>,
    with an example showing constant time for various operations at bottom.



    > When the string has to be hard-coded in, you
    > may as well just do this:
    > void bar(int tag) { .... }
    >
    > so that the user can just do this:
    > enum my_tags {
    > wow, yowza, hooray
    > };
    > bar(wow);
    > bar(yowza);
    > etc


    No, that's an invalid argument. Try to replace all string literals in
    your code with symbolic constants. Then go on to maintain that code.


    Cheers, & hth.,

    - Alf

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Feb 12, 2008
    #3
  4. On 12 Feb, 20:27, "Alf P. Steinbach" wrote:
    > * tragomaskhalos:
    >
    > > On 12 Feb, 13:30, Sebastian Karlsson <>
    > > wrote:
    > > [ snip ]
    > >> If above is true there's still a very large problem remaining, users
    > >> passing strings that aren't string literals, for example
    > >> std::string::c_str() comes to mind. These aren't guaranteed to be
    > >> constant between calls to a StringHash constructor so I can't use the
    > >> memory address as a key to the value. I could create a constructor of
    > >> StringHash which accepts a std::string&, however that certainly
    > >> doesn't guarantee that the user doesn't use c_str(), and std::string
    > >> isn't really the only possible problem anyway. What really would help
    > >> me here would be if there's any neat way to guarantee that the client
    > >> only uses string literals for my StringHash( const char* )
    > >> constructor. Is there any?

    >
    > >> Any feedback appreciated!

    >
    > > Firstly, I don't think there is any way to enforce string
    > > literals. One might be tempted to defeat passing a char*
    > > via eg
    > >   template<int N> void bar(const char(&ca)[N]) { ...}
    > > This will accept
    > >   bar("yowza");
    > > but reject e.g.
    > >   bar(std_string.c_str());
    > > The problem is it _also_ accepts
    > >   char oh_dear[] = "In trouble now";
    > >   bar(oh_dear);
    > > Where oh_dear may eg be on the stack.

    >
    > How to avoid this is a FAQ.  Essentially, you document that one
    > shouldn't do that.  No design can prevent all cases of PEBCAK, and so
    > it's futile to try, and generally invalid to use the impossibility of
    > 100% perfection in that regard as an argument against something.
    >
    > > But I think the problem is this: you've gone down
    > > a rat-hole looking for a solution without looking
    > > properly at the problem, because what actual use
    > > is:
    > >   bar("magically works only with literals");  ??

    >
    > For example, an exception object where there should be no possibility of
    > failure for construction of the exception object.
    >
    > For another example, returning a string from a function, in a general
    > format that requires no further processing to be used along with e.g.
    > dynamically created strings, without the overhead of dynamic allocation
    > or string copying.
    >
    > Some code available at <url:http://alfsstringvalue.sourceforge.net/>,
    > with an example showing constant time for various operations at bottom.
    >
    > > When the string has to be hard-coded in, you
    > > may as well just do this:
    > >   void bar(int tag) { .... }

    >
    > > so that the user can just do this:
    > >   enum my_tags {
    > >     wow, yowza, hooray
    > >   };
    > >   bar(wow);
    > >   bar(yowza);
    > > etc

    >
    > No, that's an invalid argument.  Try to replace all string literals in
    > your code with symbolic constants.  Then go on to maintain that code.
    >
    > Cheers, & hth.,
    >
    > - Alf
    >


    I think we may be at cross-purposes; I wasn't trying to
    imply that having functions that take/return only string
    literals is never useful, or that one should systematically
    replace them with enums (that would be crazy), just
    suggesting an alternative in this particular case.
    But rereading the OP's post again this is moot, because
    if I understand correctly he wants to also use the text
    value in an XML file, so he does need a string of some
    sort. And your StringValue looks like it would do very
    nicely.
     
    tragomaskhalos, Feb 12, 2008
    #4
  5. On 12 Feb, 22:35, tragomaskhalos wrote:
    > On 12 Feb, 20:27, "Alf P. Steinbach" <> wrote:
    >
    > > * tragomaskhalos:

    >
    > > > On 12 Feb, 13:30, Sebastian Karlsson <>
    > > > wrote:
    > > > [ snip ]
    > > >> If above is true there's still a very large problem remaining, users
    > > >> passing strings that aren't string literals, for example
    > > >> std::string::c_str() comes to mind. These aren't guaranteed to be
    > > >> constant between calls to a StringHash constructor so I can't use the
    > > >> memory address as a key to the value. I could create a constructor of
    > > >> StringHash which accepts a std::string&, however that certainly
    > > >> doesn't guarantee that the user doesn't use c_str(), and std::string
    > > >> isn't really the only possible problem anyway. What really would help
    > > >> me here would be if there's any neat way to guarantee that the client
    > > >> only uses string literals for my StringHash( const char* )
    > > >> constructor. Is there any?

    >
    > > >> Any feedback appreciated!

    >
    > > > Firstly, I don't think there is any way to enforce string
    > > > literals. One might be tempted to defeat passing a char*
    > > > via eg
    > > > template<int N> void bar(const char(&ca)[N]) { ...}
    > > > This will accept
    > > > bar("yowza");
    > > > but reject e.g.
    > > > bar(std_string.c_str());
    > > > The problem is it _also_ accepts
    > > > char oh_dear[] = "In trouble now";
    > > > bar(oh_dear);
    > > > Where oh_dear may eg be on the stack.

    >
    > > How to avoid this is a FAQ. Essentially, you document that one
    > > shouldn't do that. No design can prevent all cases of PEBCAK, and so
    > > it's futile to try, and generally invalid to use the impossibility of
    > > 100% perfection in that regard as an argument against something.

    >
    > > > But I think the problem is this: you've gone down
    > > > a rat-hole looking for a solution without looking
    > > > properly at the problem, because what actual use
    > > > is:
    > > > bar("magically works only with literals"); ??

    >
    > > For example, an exception object where there should be no possibility of
    > > failure for construction of the exception object.

    >
    > > For another example, returning a string from a function, in a general
    > > format that requires no further processing to be used along with e.g.
    > > dynamically created strings, without the overhead of dynamic allocation
    > > or string copying.

    >
    > > Some code available at <url:http://alfsstringvalue.sourceforge.net/>,
    > > with an example showing constant time for various operations at bottom.

    >
    > > > When the string has to be hard-coded in, you
    > > > may as well just do this:
    > > > void bar(int tag) { .... }

    >
    > > > so that the user can just do this:
    > > > enum my_tags {
    > > > wow, yowza, hooray
    > > > };
    > > > bar(wow);
    > > > bar(yowza);
    > > > etc

    >
    > > No, that's an invalid argument. Try to replace all string literals in
    > > your code with symbolic constants. Then go on to maintain that code.

    >
    > > Cheers, & hth.,

    >
    > > - Alf

    >
    > I think we may be at cross-purposes; I wasn't trying to
    > imply that having functions that take/return only string
    > literals is never useful, or that one should systematically
    > replace them with enums (that would be crazy), just
    > suggesting an alternative in this particular case.
    > But rereading the OP's post again this is moot, because
    > if I understand correctly he wants to also use the text
    > value in an XML file, so he does need a string of some
    > sort. And your StringValue looks like it would do very
    > nicely.


    Thanks for all the feedback. I'll take a look at StringValue and see
    where that takes me.
     
    Sebastian Karlsson, Feb 13, 2008
    #5
