String hashing design/implementation questions involving string literals

Sebastian Karlsson

I'm creating an audio player and want its clients to be able to play a
certain sound in a certain category using an identifier of some sort.
The identifier corresponding to a certain sound will be linked to a
file using an XML file, and as a consequence I'm thinking the best
design is probably using a string as an identifier. For performance
reasons, as well as to satisfy my curiosity about the subject, I'm thinking
this might be an ideal candidate for string hashing. Now I have some
questions regarding this which I'm having trouble finding an answer
to:

I don't want the client of my player to really have to familiarize
himself too much with my string hashing. Simplified, I'm thinking
something along the lines of audioplayer.Play( "boom" ) where Play is
Play( StringHash p_Identifier ). Since construction of the hash is
pretty expensive, I'm thinking of having a private static hash table in
StringHash which would maintain the actual hashes, and then use the
memory address of the passed argument as the key. Then I would just do a
fast lookup from the argument to see if there already is a hash that
is computed for that particular string. I'm not exactly an expert on
the particulars of string literals, but from my experiments and from
what I've read their memory address should remain constant throughout
all the calls. Is this true in theory? Or at least in practice?
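
A minimal sketch of the design described above, not a definitive implementation. It assumes C++11 or later (for std::unordered_map), a hypothetical 32-bit FNV-1a hash (the post does not name a hash function), and hypothetical member names such as value() and cached_count(). The cache is keyed by the pointer value, so it is only correct when a given address always refers to the same text for as long as the cache lives.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical hash function: 32-bit FNV-1a. The post does not say which hash is used.
inline std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;          // FNV offset basis
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 16777619u;                     // FNV prime
    }
    return h;
}

class StringHash {
public:
    // Implicit on purpose, so that Play("boom") compiles.
    StringHash(const char* text) : value_(lookup(text)) {}

    std::uint32_t value() const { return value_; }

    // Exposed only for illustration: number of distinct addresses seen so far.
    static std::size_t cached_count() { return cache().size(); }

private:
    // Private static table keyed by address, as the post proposes. Not thread-safe.
    static std::unordered_map<const char*, std::uint32_t>& cache() {
        static std::unordered_map<const char*, std::uint32_t> table;
        return table;
    }

    static std::uint32_t lookup(const char* text) {
        auto it = cache().find(text);
        if (it != cache().end()) return it->second;  // address seen before
        std::uint32_t h = fnv1a(text);               // first time: hash the characters
        cache().emplace(text, h);
        return h;
    }

    std::uint32_t value_;
};
```

With this sketch, two occurrences of the same literal text at different addresses would simply create two cache entries holding the same hash. A buffer whose contents change while its address stays the same would return a stale hash.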

If the above is true there's still a very large problem remaining: users
passing strings that aren't string literals, for example
std::string::c_str() comes to mind. These aren't guaranteed to be
constant between calls to a StringHash constructor, so I can't use the
memory address as a key to the value. I could create a constructor of
StringHash which accepts a std::string&; however, that certainly
doesn't guarantee that the user doesn't use c_str(), and std::string
isn't really the only possible problem anyway. What really would help
me here would be a neat way to guarantee that the client
only uses string literals for my StringHash( const char* )
constructor. Is there any?

Any feedback appreciated!
 
tragomaskhalos

On 12 Feb, 13:30, Sebastian Karlsson wrote:
[ snip ]
> If above is true there's still a very large problem remaining, users
> passing strings that aren't string literals, for example
> std::string::c_str() comes to mind. These aren't guaranteed to be
> constant between calls to a StringHash constructor so I can't use the
> memory address as a key to the value. I could create a constructor of
> StringHash which accepts a std::string&, however that certainly
> doesn't guarantee that the user doesn't use c_str(), and std::string
> isn't really the only possible problem anyway. What really would help
> me here would be if there's any neat way to guarantee that the client
> only uses string literals for my StringHash( const char* )
> constructor. Is there any?
>
> Any feedback appreciated!

Firstly, I don't think there is any way to enforce string
literals. One might be tempted to defeat passing a char*
via e.g.
  template<int N> void bar(const char(&ca)[N]) { ... }
This will accept
  bar("yowza");
but reject e.g.
  bar(std_string.c_str());
The problem is that it _also_ accepts
  char oh_dear[] = "In trouble now";
  bar(oh_dear);
where oh_dear may e.g. be on the stack.
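
The behaviour described above can be checked with a small self-contained sketch. The function body, the return value and the demo_* helpers are assumptions added for illustration, and N is declared as std::size_t here rather than int, which is a choice of this sketch and not of the post.

```cpp
#include <cstddef>

// Binds only to arrays of const char whose size is known at compile time.
template <std::size_t N>
std::size_t bar(const char (&ca)[N]) {
    (void)ca;      // the characters are not needed for this demonstration
    return N - 1;  // length without the terminating '\0'
}

// A string literal is an array of const char, so this compiles.
inline std::size_t demo_literal() {
    return bar("yowza");
}

// A local array initialised from a literal is accepted as well,
// even though its storage lives on the stack.
inline std::size_t demo_local_array() {
    char oh_dear[] = "In trouble now";
    return bar(oh_dear);
}

// By contrast, this would not compile, because c_str() returns a
// const char* and a pointer cannot bind to a reference to an array:
//   std::string s = "yowza";
//   bar(s.c_str());
```

So the template filters out plain pointers but cannot tell a literal from any other character array.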

But I think the problem is this: you've gone down
a rat-hole looking for a solution without looking
properly at the problem, because what actual use
is:
  bar("magically works only with literals"); ??

When the string has to be hard-coded in, you
may as well just do this:
  void bar(int tag) { .... }

so that the user can just do this:
  enum my_tags {
    wow, yowza, hooray
  };
  bar(wow);
  bar(yowza);
etc.
 
Alf P. Steinbach

* tragomaskhalos:
> Firstly, I don't think there is any way to enforce string
> literals. One might be tempted to defeat passing a char*
> via eg
>   template<int N> void bar(const char(&ca)[N]) { ...}
> This will accept
>   bar("yowza");
> but reject e.g.
>   bar(std_string.c_str());
> The problem is it _also_ accepts
>   char oh_dear[] = "In trouble now";
>   bar(oh_dear);
> Where oh_dear may eg be on the stack.

How to avoid this is a FAQ. Essentially, you document that one
shouldn't do that. No design can prevent all cases of PEBCAK, and so
it's futile to try, and generally invalid to use the impossibility of
100% perfection in that regard as an argument against something.

> But I think the problem is this: you've gone down
> a rat-hole looking for a solution without looking
> properly at the problem, because what actual use
> is:
>   bar("magically works only with literals"); ??

For example, an exception object where there should be no possibility of
failure for construction of the exception object.

For another example, returning a string from a function, in a general
format that requires no further processing to be used along with e.g.
dynamically created strings, without the overhead of dynamic allocation
or string copying.

Some code available at <url: http://alfsstringvalue.sourceforge.net/>,
with an example showing constant time for various operations at bottom.


> When the string has to be hard-coded in, you
> may as well just do this:
>   void bar(int tag) { .... }
>
> so that the user can just do this:
>   enum my_tags {
>     wow, yowza, hooray
>   };
>   bar(wow);
>   bar(yowza);
> etc

No, that's an invalid argument. Try to replace all string literals in
your code with symbolic constants. Then go on to maintain that code.


Cheers, & hth.,

- Alf
 
tragomaskhalos

I think we may be at cross-purposes; I wasn't trying to
imply that having functions that take/return only string
literals is never useful, or that one should systematically
replace them with enums (that would be crazy), just
suggesting an alternative in this particular case.
But rereading the OP's post again this is moot, because
if I understand correctly he wants to also use the text
value in an XML file, so he does need a string of some
sort. And your StringValue looks like it would do very
nicely.
 
Sebastian Karlsson

Thanks for all the feedback, I'll take a look at StringValue and see
where that takes me.
 
