hash()

John Marshall · Dec 5, 2005

Hi,

For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

My goal is to uniquely identify multicharacter strings,
all of which begin with "/" and never end with "/".
Therefore, st != st[::-1].

Thanks,
John

Scott David Daniels · Dec 5, 2005

John said:
For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

Why not grab a dictionary and do the stats yourself?

--Scott David Daniels

John Marshall · Dec 5, 2005

Scott said:
John said:

For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

Why not grab a dictionary and do the stats yourself?

I was actually interested in the mathematical/probability
side rather than the empirical, with respect to the current
hash function in Python, although I imagine I could do
a brute-force test for x-character strings.

John

Christopher Subich · Dec 5, 2005

John said:
I was actually interested in the mathematical/probability
side rather than the empirical, with respect to the current
hash function in Python, although I imagine I could do
a brute-force test for x-character strings.

Hah. No.

At least on the version I have handy (Py 2.2.3 on Itanium2), hash
returns a 64-bit value. Brute-forcing that in any reasonable length of
time is rather impossible.

Scott David Daniels · Dec 5, 2005

John said:
I was actually interested in the mathematical/probability
side rather than the empirical, with respect to the current
hash function in Python.

Well, the probability depends on the universe you are choosing from.
That was why I was suggesting a dictionary: words may well have a
different distribution than arbitrary strings.

--Scott David Daniels

Raymond Hettinger · Dec 5, 2005

[John Marshall]

For strings of > 1 character, what are the chances
that hash(st) and hash(st[::-1]) would return the
same value?

Python's string hash algorithm is non-commutative, so a collision with
a reversed string is not likely. The exact answer depends on the
population of strings being hashed, but it's not hard to compute
collision statistics for a sampling of those strings:

collisions = len(sample) - len(set(hash(s) for s in sample))
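
To make that one-liner concrete, here is a minimal sketch that also counts
how many strings collide with their own reversal. The names collision_stats
and make_sample, the alphabet, and the length limit are all made up for
illustration; the sample only mimics the "starts with /, never ends with /"
constraint from the question. Note that on recent Python 3 versions string
hashes are randomized per process, so the counts describe a single run only.

```python
from itertools import product

def collision_stats(sample, hash_func=hash):
    """Return (total_collisions, reversed_collisions) for a sample of strings."""
    sample = set(sample)  # drop literal duplicates first
    # Same formula as the one-liner: strings minus distinct hash values.
    total = len(sample) - len({hash_func(s) for s in sample})
    # Strings whose hash equals the hash of their reversal (palindromes excluded).
    reversed_hits = sum(
        1 for s in sample
        if s != s[::-1] and hash_func(s) == hash_func(s[::-1])
    )
    return total, reversed_hits

def make_sample(alphabet="abc/", max_len=4):
    """Hypothetical universe: strings that start with '/' and do not end with '/'."""
    out = []
    for n in range(2, max_len + 1):
        for tail in product(alphabet, repeat=n - 1):
            s = "/" + "".join(tail)
            if not s.endswith("/"):
                out.append(s)
    return out

if __name__ == "__main__":
    print(collision_stats(make_sample()))
```

Passing a deliberately commutative hash_func, such as the sum of the
character codes, makes every string collide with its reversal, which is a
handy sanity check for the counter.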

FWIW, here is how Python computes string hash values:

static long
string_hash(PyStringObject *a)
{
    register int len;
    register unsigned char *p;
    register long x;

    len = a->ob_size;
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= a->ob_size;
    if (x == -1)
        x = -2;
    return x;
}
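
For experimenting without a C compiler, the function above can be
approximated in pure Python. This is only a sketch under assumptions: the
bits parameter stands in for the width of a C long on the platform, and
signed overflow is assumed to wrap around (two's complement), which the C
code relies on in practice. It takes bytes, matching the 8-bit strings the C
code walks over.

```python
def string_hash(data: bytes, bits: int = 64) -> int:
    """Pure-Python approximation of the C string_hash shown above.

    bits is an assumed width for the C long (32 or 64 on common platforms).
    """
    mask = (1 << bits) - 1
    # For an empty string the C code reads the terminating NUL, so x starts at 0.
    x = (data[0] << 7) if data else 0
    for byte in data:
        # Multiply, XOR in the next byte, and emulate wraparound with the mask.
        x = ((1000003 * x) ^ byte) & mask
    x = (x ^ len(data)) & mask
    # Reinterpret the unsigned value as a signed long.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    # -1 is reserved as an error indicator, so it is remapped to -2.
    if x == -1:
        x = -2
    return x

if __name__ == "__main__":
    for s in (b"/a", b"a/"):
        print(s, string_hash(s))
```

Because each step multiplies the running value before mixing in the next
byte, the order of the bytes matters, which is why a string and its
reversal generally end up with different values.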

My goal is to uniquely identify multicharacter strings,
all of which begin with "/" and never end with "/".
Therefore, st != st[::-1].

Just use a set -- no string reversal is needed for detection of unique
multicharacter strings.
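
A minimal sketch of the set approach, with find_duplicates and the example
paths being made-up names for illustration. A set compares members by
equality after hashing, so two different strings that happen to share a
hash value are still kept apart.

```python
def find_duplicates(paths):
    """Return the set of strings that occur more than once in paths."""
    seen = set()
    dupes = set()
    for p in paths:
        if p in seen:
            dupes.add(p)   # already encountered: a true duplicate
        else:
            seen.add(p)    # first occurrence
    return dupes

if __name__ == "__main__":
    print(find_duplicates(["/usr/lib", "/usr/bin", "/usr/lib"]))
```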

Raymond
