String#hash changed in Ruby 1.9?

David Palm

Hi all,
in ruby 1.8.7:
david@trince ~$ ruby -e 'puts "abc".hash'
833038373
david@trince ~$ ruby -e 'puts "abc".hash'
833038373
david@trince ~$ ruby -e 'puts "abc".hash'
833038373

[always the same number]

in ruby 1.9.1:
david@trince ~$ ruby -e 'puts "abc".hash'
402929305
david@trince ~$ ruby -e 'puts "abc".hash'
-403532784
david@trince ~$ ruby -e 'puts "abc".hash'
-650364342

What happened? Is this intentional? Rationale? Any tips on how to replace it?
 
Sebastian Hungerecker

On Monday, 4 May 2009 at 16:22:01, David Palm wrote:
> in ruby 1.9.1:
> david@trince ~$ ruby -e 'puts "abc".hash'
> 402929305
> david@trince ~$ ruby -e 'puts "abc".hash'
> -403532784
> david@trince ~$ ruby -e 'puts "abc".hash'
> -650364342
>
> What happened? Is this intentional?

Ruby 1.9 uses MurmurHash (http://murmurhash.googlepages.com/) with a random seed
that is generated once per run of the application.

> Any tips on how to replace it?

What does it hurt if the hash value of a string does not remain constant
between runs of the application?
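
To illustrate the point about the per-run seed, here is a minimal sketch that compares the value of "abc".hash in the current process with the value reported by a freshly started interpreter. It assumes a ruby executable is reachable through RbConfig.ruby; the variable names are illustrative only.

```ruby
require "rbconfig"

# Within one process, equal strings always produce the same hash value.
# That is the only guarantee Hash and Set lookups depend on.
a = "abc".hash
b = String.new("abc").hash
same_within_process = (a == b)

# Ask a second interpreter for the same value. With a per-process seed,
# the two numbers will almost certainly differ.
child_hash = IO.popen([RbConfig.ruby, "-e", 'print "abc".hash'], &:read).to_i

puts "this process:  #{a}"
puts "child process: #{child_hash}"
puts "stable within this process: #{same_within_process}"
```

So the value is fine as a key inside a running program, but it should not be written to disk or otherwise kept across runs.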

HTH,
Sebastian
 
David Palm

> > Any tips on how to replace it?
> What does it hurt if the hash value of a string does not remain constant
> between runs of the application?

In my case it's pretty bad. I use it in a command-line utility to cache rake tasks. I create one cache file per directory, named after the String#hash of the full path (Dir.pwd.hash). If the hash is different the next time the program runs, the cache lookup fails (and I get a new cache file instead of the old one).

So I don't need anything fancy, just an equivalent of Dir.pwd.hash that stays consistent between runs. Do I need to MD5 it? That feels like overkill. Why was this changed in the first place?
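
For reference, a digest of the path gives a name that is the same on every run. This is a minimal sketch using Digest::MD5 and Zlib.crc32 from the standard library; the helper names and the ".cache" suffix are made up for illustration and are not taken from the utility discussed here.

```ruby
require "digest/md5"
require "zlib"

# Hypothetical helper: derive a cache file name from a directory path.
# The MD5 hex digest of a given string never changes between runs.
def cache_file_name(path)
  "#{Digest::MD5.hexdigest(path)}.cache"
end

# Hypothetical shorter variant based on CRC32. It yields only 32 bits,
# so collisions between different paths are more likely than with MD5.
def short_cache_file_name(path)
  format("%08x.cache", Zlib.crc32(path))
end

puts cache_file_name(Dir.pwd)
puts short_cache_file_name(Dir.pwd)
```

Either variant avoids depending on String#hash. MD5 is not needed for security here; it only serves as a stable, well-distributed fingerprint of the path.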
 
Robert Klemme

> In my case it's pretty bad. I use it in a command-line utility to cache rake tasks. I create one cache file per directory, named after the String#hash of the full path (Dir.pwd.hash). If the hash is different the next time the program runs, the cache lookup fails (and I get a new cache file instead of the old one).

Hm... But you do admit that this is a bit abusive, don't you? Especially
since there is no guarantee that you won't get collisions with a
hash value like the one returned by #hash.

How about storing your cache files with a fixed name in the original
directory? Or have a file with metadata (mapping from path to cache
file name)?

> So I don't need anything fancy, just an equivalent of Dir.pwd.hash that stays consistent between runs. Do I need to MD5 it? That feels like overkill. Why was this changed in the first place?

That's an interesting question. I'm curious as well. Maybe the changes
are just a side effect of a new - supposedly better - hashing algorithm.

Kind regards

robert
 
David Palm

> Hm... But you do admit that this is a bit abusive, don't you?
> Especially since there is no guarantee that you won't get
> collisions with a hash value like the one returned by #hash.

Oh yes, it's just a quick and convenient way of doing it. Dunno if I'd call it "abusive", but it's sure not military-grade programming...

> How about storing your cache files with a fixed name in the original
> directory? Or have a file with metadata (mapping from path to cache
> file name)?

Fixed name won't work; most directories are scm tracked so it'd be a mess to keep the cache files out of the way. One big(ish) cache file might work. Maybe even a sqlite db. Have to run some benchmark on that.
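
The single-cache-file idea could look roughly like the sketch below: one JSON file that maps each directory path to its cached task names. The TaskCache class, its method names and the file location are all hypothetical, and the whole file is rewritten on every store, which is only reasonable while the cache stays small.

```ruby
require "json"
require "tmpdir"

# Hypothetical single-file cache: one JSON object mapping
# directory path => list of task names.
class TaskCache
  def initialize(file)
    @file = file
  end

  # Returns the cached task list for dir, or nil if nothing is cached.
  def fetch(dir)
    load_all[dir]
  end

  # Stores the task list for dir and rewrites the cache file.
  def store(dir, tasks)
    all = load_all
    all[dir] = tasks
    File.write(@file, JSON.generate(all))
  end

  private

  # Reads the whole mapping, or starts empty if the file does not exist yet.
  def load_all
    File.exist?(@file) ? JSON.parse(File.read(@file)) : {}
  end
end

cache = TaskCache.new(File.join(Dir.tmpdir, "rake_task_cache_example.json"))
cache.store(Dir.pwd, ["build", "test"])
p cache.fetch(Dir.pwd)
```

Because the directory path itself is the key, no hash value has to be stable across runs, and nothing is written into the SCM-tracked directories.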

> That's an interesting question. I'm curious as well. Maybe the
> changes are just a side effect of a new - supposedly better - hashing
> algorithm.

The link Sebastian provided (http://murmurhash.googlepages.com/) was interesting but not exhaustive, and I still don't know when, how or why the behaviour was changed. Perhaps the mailing list archives will tell?
 
