hash()

J

John Marshall

Hi,

For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

My goal is to uniquely identify multicharacter strings,
all of which begin with "/" and never end with "/".
Therefore, st != st[::-1].

Thanks,
John
 
S

Scott David Daniels

John said:
For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

Why not grab a dictionary and do the stats yourself?

--Scott David Daniels
(e-mail address removed)
 
J

John Marshall

Scott said:
John said:
For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?


Why not grab a dictionary and do the stats yourself?

I was actually interested in the mathematical/probability
side rather than the empirical w/r to the current
hash function in python. Although I imagine I could do
a brute force test for x-character strings.

John
 
C

Christopher Subich

John said:
I was actually interested in the mathematical/probability
side rather than the empirical w/r to the current
hash function in python. Although I imagine I could do
a brute force test for x-character strings.

Hah. No.

At least on the version I have handy (Py 2.2.3 on Itanium2), hash
returns a 64-bit value. Brute-forcing that in any reasonable length of
time is rather impossible.
 
S

Scott David Daniels

John said:
I was actually interested in the mathematical/probability
side rather than the empirical w/r to the current
hash function in python.
Well, the probability depends on the universe you are choosing from.
That was why I was suggesting a dictionary: words may well have a
different distribution than arbitrary strings.

--Scott David Daniels
(e-mail address removed)
 
R

Raymond Hettinger

[John Marshall]
For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

Python's string hash algorithm is non-commutative, so a collision with
a reversed string is not likely. The exact answer depends on the
population of strings being hashed, but it's not hard to compute
collision statistics for a sampling of those strings:

collisions = len(sample) - len(set(hash(s) for s in sample))


FWIW, here is how Python computes string hash values:

static long
string_hash(PyStringObject *a)
{
register int len;
register unsigned char *p;
register long x;

len = a->ob_size;
p = (unsigned char *) a->ob_sval;
x = *p << 7;
while (--len >= 0)
x = (1000003*x) ^ *p++;
x ^= a->ob_size;
if (x == -1)
x = -2;
return x;
}


My goal is to uniquely identify multicharacter strings,
all of which begin with "/" and never end with "/".
Therefore, st != st[::-1].

Just use a set -- no string reversal is needed for detection of unique
multicharacter strings..


Raymond
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,677
Members
48,796
Latest member
Greg L.

Latest Threads

Top