How to write a good hash method for my struct-like class?

Jeremy Henty · Apr 27, 2006

I have an RGB colour class which wraps a C struct { uchar r,g,b; } .
I have defined #eql? to be true if two objects have the same r, g and
b values. This means I have to define #hash as well, right? So
what's a good algorithm? XOR the r, g and b values? Or concatenate
them into a 24-bit number? Or something else?

Thanks in advance,

Jeremy Henty

coachhilton · Apr 28, 2006

Jeremy - this may not be too much help to you but I recall reading a
paper in the Communications of the ACM about the creation of an optimal
hash relative to a given input space. I can't recall the issue but it
was in the last 5 or so years. The paper was very interesting and
accessible so if you can find it perhaps it can guide you (although you
may have to join the ACM to ge a copy

Otherwise, I'd google the
topic.

Ken

Dale Martenson · Apr 28, 2006

You don't have to define #hash. It depends on the behavior you wish to
have. I like to think of it as:

#eql? and #== are used to check equivalence.
#equal? is used to check sameness as it relates to object space.
#hash is used to check sameness as it relates to Hash only.

That means that if you wish to treat equivalent objects as the same key
values for a Hash you must define #hash. Otherwise, they will be seen
as unique key values for the Hash.

Please correct me if I am wrong, but this is my understanding.

I think generating a 24-bit number is a fine idea. I even use that
method in my example below.

Here is a long boring example that attempts to illustrate some of the
differences:

class RGB1
attr_accessor :r, :g, :b
def initialize(r,g,b)
@r = r
@g = g
@b = b
end
end

class RGB2 < RGB1
def eql?(x)
((@r==x.r)&&(@g==x.g)&&(@b==x.b))
end
def ==(x)
((@r==x.r)&&(@g==x.g)&&(@b==x.b))
end
end

class RGB3 < RGB2
def hash
@r<<16|@g<<8|@b
end
end

def test_equivalence( class1, class2 )
puts "#{class1}.new(255,0,0).eql?(#{class2}.new(255,0,0)):
#{class1.new(255,0,0).eql?(class2.new(255,0,0))}"
puts "#{class1}.new(255,0,0) == #{class2}.new(255,0,0):
#{class1.new(255,0,0) == class2.new(255,0,0)}"
puts "#{class1}.new(255,0,0).equal?(#{class2}.new(255,0,0)):
#{class1.new(255,0,0).equal?(class2.new(255,0,0))}"
puts "-----------------------------------------------------------"
end

def test_hash( class1 )
puts "Using #{class1}:"
h = {}
h[class1.new(255,0,0)] = 10
h[class1.new(255,0,0)] = 20
h.each_pair {|k,v| puts "h[#{k}]: #{v}"}
puts "-----------------------------------------------------------"
end

test_equivalence( RGB1, RGB2 )
test_equivalence( RGB1, RGB1 )
test_equivalence( RGB2, RGB2 )
test_equivalence( RGB3, RGB3 )

test_hash(RGB2)
test_hash(RGB3)

++++++++++++++++ END OF CODE ++++++++++++++++

The output from the above would be:

RGB1.new(255,0,0).eql?(RGB2.new(255,0,0)): false
RGB1.new(255,0,0) == RGB2.new(255,0,0): false
RGB1.new(255,0,0).equal?(RGB2.new(255,0,0)): false
-----------------------------------------------------------
RGB1.new(255,0,0).eql?(RGB1.new(255,0,0)): false
RGB1.new(255,0,0) == RGB1.new(255,0,0): false
RGB1.new(255,0,0).equal?(RGB1.new(255,0,0)): false
-----------------------------------------------------------
RGB2.new(255,0,0).eql?(RGB2.new(255,0,0)): true
RGB2.new(255,0,0) == RGB2.new(255,0,0): true
RGB2.new(255,0,0).equal?(RGB2.new(255,0,0)): false

Robert Klemme · Apr 29, 2006

Dale Martenson said:
You don't have to define #hash. It depends on the behavior you wish to
have. I like to think of it as:

#eql? and #== are used to check equivalence.
#equal? is used to check sameness as it relates to object space.
#hash is used to check sameness as it relates to Hash only.

That means that if you wish to treat equivalent objects as the same
key values for a Hash you must define #hash. Otherwise, they will be
seen as unique key values for the Hash.

Please correct me if I am wrong, but this is my understanding.

That's formally correct but I recommend to implement #hash, #eql? and #== at
the same time to make it consistent - even if the hash implementation is not
optimal or remains unused. Later you'll stuff objects into a Hash and
wonder why it won't work out the way you expected.

I think generating a 24-bit number is a fine idea. I even use that
method in my example below.

Yep, that's what I'd do in this case as well.

A much simpler approach is to just do

Color = Struct.new :r, :g, :b

and get #hash, #== and #eql? for free:
=> true

And you get even more:

c1.to_a => [1, 2, 3]
c1.members

Click to expand...

=> ["r", "g", "b"]

Kind regards

robert

Jeremy Henty · Apr 30, 2006

... I recall reading a paper in the Communications of the ACM about
the creation of an optimal hash relative to a given input space. I
can't recall the issue but it was in the last 5 or so years. The
paper was very interesting and accessible so if you can find it
perhaps it can guide you (although you may have to join the ACM to
ge a copy Otherwise, I'd google the topic.

Yes, Google turns up lots of references to this sort of thing, but so
far I've found nothing I can download without joining various
organisations. I was really hoping to find a simple tip for writing
good hash functions for the hash package that Ruby uses internally.

Regards,

Jeremy Henty

Jeremy Henty · Apr 30, 2006

... I recommend to implement #hash, #eql? and #== at the same time
to make it consistent - even if the hash implementation is not
optimal or remains unused. Later you'll stuff objects into a Hash
and wonder why it won't work out the way you expected.

I agree, particularly as this code is going into Ruby-FLTK . I don't
want to inflict unexpected behaviour on users.

Yep, that's what I'd do in this case as well.

I went for XOR-ing them together, so that each field affects the hash
in the "same way". Of course that means you have only 256 possible
hash values. Maybe this would affect really large hashes badly? I'm
guessing here, I've never dealt with these issues before.

What would be *really* nice is a way to instrument Ruby's hashes, then
throw real-life data at them and just see how it distributes itself
between the hash buckets.

A much simpler approach is to just do

Color = Struct.new :r, :g, :b

and get #hash, #== and #eql? for free:

That's what I'd do in pure Ruby but the FLTK C API uses uchars. The
code is cleaner if the Ruby/FLTK color objects wrap a C struct { uchar
r, g, b; } . I wouldn't be doing this in C if I didn't have to!

And you get even more:

c1.to_a => [1, 2, 3]
c1.members

Click to expand...

Click to expand...

=> ["r", "g", "b"]

Good point! But it's easy enough to add these methods by loading a
bit of Ruby after dl-ing the extension. (/me adds that to the TO DO
list.)

Thanks for all the suggestions,

Jeremy Henty

Florian Groß · May 1, 2006

Jeremy said:
I have an RGB colour class which wraps a C struct { uchar r,g,b; } .
I have defined #eql? to be true if two objects have the same r, g and
b values. This means I have to define #hash as well, right? So
what's a good algorithm? XOR the r, g and b values? Or concatenate
them into a 24-bit number? Or something else?

I'd just do [r, g, b].hash.

Jeremy Henty · May 1, 2006

Jeremy said:
Jeremy said:

This means I have to define #hash as well, right? So what's a good
algorithm? XOR the r, g and b values? Or concatenate them into a
24-bit number? Or something else?

Click to expand...

I'd just do [r, g, b].hash.

D'oh! <slap> Luckily the Release Notes state "the implementation may
change".

Regards,

Jeremy Henty

Database Manager: A C++ Console Application	14	May 12, 2025
Help: How to pass a struct as a pointer to Win32API?	3	Jul 26, 2010
PATCH to make internal Hash class retain order...	20	Aug 12, 2006
why, or when to, use a class method?	5	Oct 31, 2007
Testing C include file for C++ compatibility without a C++ caller?	3	Oct 17, 2013
Interning own classes like strings for speed and size?	11	Dec 27, 2010
Rectangle: struct or class?	15	Aug 16, 2004
For performance, write it in C	130	Jul 26, 2006

How to write a good hash method for my struct-like class?

Jeremy Henty

coachhilton

Dale Martenson

Robert Klemme

Jeremy Henty

Jeremy Henty

Florian Groß

Jeremy Henty

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads