How to write a good hash method for my struct-like class?

J

Jeremy Henty

I have an RGB colour class which wraps a C struct { uchar r,g,b; } .
I have defined #eql? to be true if two objects have the same r, g and
b values. This means I have to define #hash as well, right? So
what's a good algorithm? XOR the r, g and b values? Or concatenate
them into a 24-bit number? Or something else?

Thanks in advance,

Jeremy Henty
 
C

coachhilton

Jeremy - this may not be too much help to you but I recall reading a
paper in the Communications of the ACM about the creation of an optimal
hash relative to a given input space. I can't recall the issue but it
was in the last 5 or so years. The paper was very interesting and
accessible so if you can find it perhaps it can guide you (although you
may have to join the ACM to ge a copy ;) Otherwise, I'd google the
topic.

Ken
 
D

Dale Martenson

You don't have to define #hash. It depends on the behavior you wish to
have. I like to think of it as:

#eql? and #== are used to check equivalence.
#equal? is used to check sameness as it relates to object space.
#hash is used to check sameness as it relates to Hash only.

That means that if you wish to treat equivalent objects as the same key
values for a Hash you must define #hash. Otherwise, they will be seen
as unique key values for the Hash.

Please correct me if I am wrong, but this is my understanding.

I think generating a 24-bit number is a fine idea. I even use that
method in my example below.

Here is a long boring example that attempts to illustrate some of the
differences:

class RGB1
attr_accessor :r, :g, :b
def initialize(r,g,b)
@r = r
@g = g
@b = b
end
end

class RGB2 < RGB1
def eql?(x)
((@r==x.r)&&(@g==x.g)&&(@b==x.b))
end
def ==(x)
((@r==x.r)&&(@g==x.g)&&(@b==x.b))
end
end

class RGB3 < RGB2
def hash
@r<<16|@g<<8|@b
end
end

def test_equivalence( class1, class2 )
puts "#{class1}.new(255,0,0).eql?(#{class2}.new(255,0,0)):
#{class1.new(255,0,0).eql?(class2.new(255,0,0))}"
puts "#{class1}.new(255,0,0) == #{class2}.new(255,0,0):
#{class1.new(255,0,0) == class2.new(255,0,0)}"
puts "#{class1}.new(255,0,0).equal?(#{class2}.new(255,0,0)):
#{class1.new(255,0,0).equal?(class2.new(255,0,0))}"
puts "-----------------------------------------------------------"
end

def test_hash( class1 )
puts "Using #{class1}:"
h = {}
h[class1.new(255,0,0)] = 10
h[class1.new(255,0,0)] = 20
h.each_pair {|k,v| puts "h[#{k}]: #{v}"}
puts "-----------------------------------------------------------"
end

test_equivalence( RGB1, RGB2 )
test_equivalence( RGB1, RGB1 )
test_equivalence( RGB2, RGB2 )
test_equivalence( RGB3, RGB3 )

test_hash(RGB2)
test_hash(RGB3)


++++++++++++++++ END OF CODE ++++++++++++++++

The output from the above would be:

RGB1.new(255,0,0).eql?(RGB2.new(255,0,0)): false
RGB1.new(255,0,0) == RGB2.new(255,0,0): false
RGB1.new(255,0,0).equal?(RGB2.new(255,0,0)): false
-----------------------------------------------------------
RGB1.new(255,0,0).eql?(RGB1.new(255,0,0)): false
RGB1.new(255,0,0) == RGB1.new(255,0,0): false
RGB1.new(255,0,0).equal?(RGB1.new(255,0,0)): false
-----------------------------------------------------------
RGB2.new(255,0,0).eql?(RGB2.new(255,0,0)): true
RGB2.new(255,0,0) == RGB2.new(255,0,0): true
RGB2.new(255,0,0).equal?(RGB2.new(255,0,0)): false
 
R

Robert Klemme

Dale Martenson said:
You don't have to define #hash. It depends on the behavior you wish to
have. I like to think of it as:

#eql? and #== are used to check equivalence.
#equal? is used to check sameness as it relates to object space.
#hash is used to check sameness as it relates to Hash only.

That means that if you wish to treat equivalent objects as the same
key values for a Hash you must define #hash. Otherwise, they will be
seen as unique key values for the Hash.

Please correct me if I am wrong, but this is my understanding.

That's formally correct but I recommend to implement #hash, #eql? and #== at
the same time to make it consistent - even if the hash implementation is not
optimal or remains unused. Later you'll stuff objects into a Hash and
wonder why it won't work out the way you expected.
I think generating a 24-bit number is a fine idea. I even use that
method in my example below.

Yep, that's what I'd do in this case as well.

A much simpler approach is to just do

Color = Struct.new :r, :g, :b

and get #hash, #== and #eql? for free:
=> true

And you get even more:
c1.to_a => [1, 2, 3]
c1.members
=> ["r", "g", "b"]

Kind regards

robert
 
J

Jeremy Henty

... I recall reading a paper in the Communications of the ACM about
the creation of an optimal hash relative to a given input space. I
can't recall the issue but it was in the last 5 or so years. The
paper was very interesting and accessible so if you can find it
perhaps it can guide you (although you may have to join the ACM to
ge a copy ;) Otherwise, I'd google the topic.

Yes, Google turns up lots of references to this sort of thing, but so
far I've found nothing I can download without joining various
organisations. I was really hoping to find a simple tip for writing
good hash functions for the hash package that Ruby uses internally.

Regards,

Jeremy Henty
 
J

Jeremy Henty

... I recommend to implement #hash, #eql? and #== at the same time
to make it consistent - even if the hash implementation is not
optimal or remains unused. Later you'll stuff objects into a Hash
and wonder why it won't work out the way you expected.

I agree, particularly as this code is going into Ruby-FLTK . I don't
want to inflict unexpected behaviour on users.
Yep, that's what I'd do in this case as well.

I went for XOR-ing them together, so that each field affects the hash
in the "same way". Of course that means you have only 256 possible
hash values. Maybe this would affect really large hashes badly? I'm
guessing here, I've never dealt with these issues before.

What would be *really* nice is a way to instrument Ruby's hashes, then
throw real-life data at them and just see how it distributes itself
between the hash buckets.
A much simpler approach is to just do

Color = Struct.new :r, :g, :b

and get #hash, #== and #eql? for free:

That's what I'd do in pure Ruby but the FLTK C API uses uchars. The
code is cleaner if the Ruby/FLTK color objects wrap a C struct { uchar
r, g, b; } . I wouldn't be doing this in C if I didn't have to!
And you get even more:
c1.to_a => [1, 2, 3]
c1.members
=> ["r", "g", "b"]

Good point! But it's easy enough to add these methods by loading a
bit of Ruby after dl-ing the extension. (/me adds that to the TO DO
list.)

Thanks for all the suggestions,

Jeremy Henty
 
F

Florian Groß

Jeremy said:
I have an RGB colour class which wraps a C struct { uchar r,g,b; } .
I have defined #eql? to be true if two objects have the same r, g and
b values. This means I have to define #hash as well, right? So
what's a good algorithm? XOR the r, g and b values? Or concatenate
them into a 24-bit number? Or something else?

I'd just do [r, g, b].hash. :)
 
J

Jeremy Henty

Jeremy said:
This means I have to define #hash as well, right? So what's a good
algorithm? XOR the r, g and b values? Or concatenate them into a
24-bit number? Or something else?

I'd just do [r, g, b].hash. :)

D'oh! <slap> Luckily the Release Notes state "the implementation may
change". :)

Regards,

Jeremy Henty
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,432
Messages
2,571,681
Members
48,796
Latest member
Greg L.

Latest Threads

Top