Justus said:
I was told that searching by integers instead of strings would
speed up performance when using large mysql tables. Is that not so?
This is what's known as "premature optimisation".
The general rule is: build your application first. If it has performance
problems, profile it. Only after profiling to determine where it
*really* is slow, then modify it. Even the most experienced of
architects and programmers often get it wrong, when they rely on their
intuition as to where optimisation actually benefits you.
I'd say that if you have a few hundred thousand rows in a table, and the
column you are searching on is indexed, then I doubt you will get any
noticeable speed improvement searching on an integer rather than a
string column.
However, note that it is common practice in database applications to
have an integer primary key assigned from a sequence.
1 : "Peter"
2 : "James"
3 : "John"
4 : "Andrew"
... etc
In that way, if you happen to know the ID of the row you want already,
you can jump to it by ID. But if you want to search for it by name -
which might return 0, 1 or more results - you can do that efficiently
too as long as that column is indexed.
What you seem to be asking for is to allocate the IDs in such a way that
given the string, you can calculate the ID off-line without performing a
database search. But to avoid the possibility of two strings giving the
same integer, then you would have to use a strong cryptographic hash
like SHA1. This will give you an integer of size 2^160, which is very
large; so large that actually just storing the string (and searching on
it) will likely be more efficient anyway. Furthermore, the integers
themselves will effectively be randomly distributed, rather than a
linear sequence, so the same sort of tree index and lookup will be
required.