how to use tuple as hash key

  • Thread starter SpringFlowers AutumnMoon
  • Start date
S

SpringFlowers AutumnMoon

in Python, a hash key cannot be [1,2,3] but must be (1,2,3), a tuple.

in Ruby, a list can actually be used as a hash key... but is it true
that it is using the list's ID as the key, not the list's content...

(that's why in Python, tuples can be used as hash keys, since they are
unique by the values they contain)

what about in Ruby, since there is no tuple, how can you similate using
tuple as hash key? using list.join(",") as a key?
 
S

Stefano Crocco

Alle luned=C3=AC 17 settembre 2007, SpringFlowers AutumnMoon ha scritto:
in Python, a hash key cannot be [1,2,3] but must be (1,2,3), a tuple.

in Ruby, a list can actually be used as a hash key... but is it true
that it is using the list's ID as the key, not the list's content...

(that's why in Python, tuples can be used as hash keys, since they are
unique by the values they contain)

what about in Ruby, since there is no tuple, how can you similate using
tuple as hash key? using list.join(",") as a key?

I think that ruby uses the content of the array, not its id:

irb: 001> a=3D['a','b']
["a", "b"]
irb: 002> h =3D {a =3D> 2}
{["a", "b"]=3D>2}
irb: 003> b =3D %w[a b]
["a", "b"]
irb: 004> h
2
irb: 005> a.object_id =3D=3D b.object_id
false

If the object_id of the two arrays were used as hash keys, h would have=
=20
returned nil, since the two ids are different.

You must be careful when using mutable objects as hash keys, since modifyin=
g=20
the object will also modify the key, and requires to re-index the hash (usi=
ng=20
Hash#rehash). For example, referring to the previous irb session:

irb: 010> a << 'c'
["a", "b", "c"]
irb: 011> h[a]
nil
irb: 012> h
nil
irb: 014> h.rehash
{["a", "b", "c"]=3D>2}
irb: 015> h[a]
2

I hope this helps

Stefano
 
R

Rick DeNatale

in Python, a hash key cannot be [1,2,3] but must be (1,2,3), a tuple.

in Ruby, a list can actually be used as a hash key... but is it true
that it is using the list's ID as the key, not the list's content...

Not sure that this is true, since I don't understand what you mean.

(that's why in Python, tuples can be used as hash keys, since they are
unique by the values they contain)

what about in Ruby, since there is no tuple, how can you similate using
tuple as hash key? using list.join(",") as a key?

There's a better way. You can use just about any object as a key in a
ruby hash, but you need to be aware of a few key things.

1) Ruby hashes use two methods on a key to insert it or to search
for it. Those two methods are
hash and eql? Two objects are considered the same as far as the
hash is concerned if first
their hash values are equal, and second if they compare using
the :eql? method. When you
insert into a hash using a key, it's your responsibility to
ensure that if two objects are :eql? they
will return the same hash value. Ruby arrays (and most if not
all standard Ruby objects do)

2) In most cases Ruby hashes capture a reference to a key when it's inserted.

3) If you change an object which is referenced as a key in a hash in
such a way that it's hash
value changes, then the hash won't know about it without help,
and you'll likely be surprised.

Here's some code to walk through to illustrate the above:

# Make two different arrays, referenced by variables a and b respectively
a = [1, 2, 3]
b = [1, 2, 3]

# They are eql
a.eql? b # => true
# But not the same object
a.equal? b # => false
a.hash .eql? b.hash # => true

# Now create a hash using a as a key
h = {a => "Fred"}

# We can retrieve using the object referenced by a
h[a] # => "Fred"
# And using the variable referenced by b
h # => "Fred"
# Or using an new literal array which is eql
h[[1,2,3]] # => "Fred"
# Note that a and b STILL refer to different objects
a.eql? b # => true
a.equal? b # => false

# Note also that hash.keys does not return the original object.
h.keys.first.eql? a # => true
h.keys.first.eql? b # => true

# Now lets change a
a << 4
# a and b should no longer be eql
a.eql? b # => false
# What happens if we access the hash now
h[a] # => nil
# And using the variable referenced by b
h # => nil
# Or using an new literal array which is eql
h[[1,2,3]] # => nil
# What's happened is that the hash actually DID capture the key originally.
# The internal key changed so we can't find a value using [1,2,3]

# Notice that none of the above found a value even h[a], despite the
fact that a does
# in fact reference the new key:
a # => [1, 2, 3, 4]

# And we can't find it with an eql literal array either
h[[1,2,3,4]] # => nil

# The hash didn't know about the change to the key, so the hashing is
out of sync with reality
# Rehashing the hash rights things:
h.rehash
h[[1,2,3,4]] # => "Fred"

# So if we WANT to allow changing keys after putting them in a hash we
need to rehash after a change
# If we don't want to allow changing keys, then dup the key when you
put it in the hash.

h2 = {a.dup => "Ethel"}

h2[a] # => "Ethel"
a # => [1, 2, 3, 4]
h2[[1,2,3,4]] #=>
a << 5

# Since a has changed we don't expect to find it
h2[a] # => nil
# But since we duped it the key in the hash didn't change
h2[[1,2,3,4,5]] # => nil
h2[[1,2,3,4]] # =>
 
S

SpringFlowers AutumnMoon

Stefano said:
I think that ruby uses the content of the array, not its id:

irb: 001> a=['a','b']
["a", "b"]
irb: 002> h = {a => 2}
{["a", "b"]=>2}
irb: 003> b = %w[a b]
["a", "b"]
irb: 004> h
2
irb: 005> a.object_id == b.object_id
false


wow... this is deep... the reason I thought it uses the ID instead of
the value is this:

the code:
=======================

h = {}

for i in 1..2
for j in 1..2
h[[i,j]] = "ha"
end
end

p h


h2 = {}
a = [0, 0]

for i in 1..2
for j in 1..2
a[0] = i
a[1] = j
h2[a] = "ha"
end
end

p h2

=======================
will produce:

{[1, 2]=>"ha", [2, 1]=>"ha", [1, 1]=>"ha", [2, 2]=>"ha"}
{[2, 2]=>"ha", [2, 2]=>"ha", [2, 2]=>"ha", [2, 2]=>"ha"}
 
S

Stefano Crocco

Alle luned=C3=AC 17 settembre 2007, SpringFlowers AutumnMoon ha scritto:
h2 =3D {}
a =3D [0, 0]

for i in 1..2
=C2=A0 for j in 1..2
=C2=A0 =C2=A0 a[0] =3D i
=C2=A0 =C2=A0 a[1] =3D j
=C2=A0 =C2=A0 h2[a] =3D "ha"
=C2=A0 end
end

p h2

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
will produce:

...
{[2, 2]=3D>"ha", [2, 2]=3D>"ha", [2, 2]=3D>"ha", [2, 2]=3D>"ha"}

This happens because you don't call h2.rehash after modifying the array you=
=20
use as key (look at the ri documentation for Hash#rehash for a longer=20
explaination). If you insert the line=20

h2.rehash

after=20

h2[a] =3D "ha"

you'll get the expected behaviour.

Stefano
 
S

SpringFlowers AutumnMoon

Stefano said:
insert the line

h2.rehash

after

h2[a] = "ha"

you'll get the expected behaviour.

actually if the code is

for i in 1..2
for j in 1..2
a[0] = i
a[1] = j
h2[a] = "ha"
h2.rehash
end
end

p h2

it will print out:

{[2, 2]=>"ha"}

=========================
if i use

h2 = {}
a = [0, 0]

for i in 1..2
for j in 1..2
a[0] = i
a[1] = j
h2[a.dup] = "ha"
end
end

p h2

then it will print out

{[1, 2]=>"ha", [2, 1]=>"ha", [1, 1]=>"ha", [2, 2]=>"ha"}


but my question is, if i run the loop 10 million times (for some
statistics), then it will actually duplicate the array 10 million times,
which is memory hungry. so i wonder what is a better way. if a is just
a list of numbers, i can certainly use a.join(",") as the key... but
looping 10 million times will also create "frozen strings" in the hash
table 10 million times too?
 
S

SpringFlowers AutumnMoon

SpringFlowers said:
but my question is, if i run the loop 10 million times (for some
statistics), then it will actually duplicate the array 10 million times,
which is memory hungry.

actually, i tried running it and watch ruby.exe in Windows or ruby in
Mac keep on being 2 or 3MB without every growing. (i set the loop to
100 million and dup the key and watched it keep on running with the
program size about the same while running).

I think it is due to the hash table's size is constant. so when Hash
called hash() to find the slot, and do the eql? and find that they are
the same. and so it either places that dup key in that slot or don't
place it there at all (doesn't matter), and the un-referenced
dupulicates are garbage collected. so that's why the size of the
program keep on being 2 to 3MB in the RAM.

if however, i set an array just_test and always do a just_test <<
key where key is the dup, then the program will grow from 3MB to 6MB,
to 12MB, to 20MB and it just keeps growing. (just_test is an array to
keep a reference to the dup objects so that they won't be garbage
collected). So looks like the garbage collection in Ruby is doing a
good job.
 
S

Stefano Crocco

Alle marted=C3=AC 18 settembre 2007, SpringFlowers AutumnMoon ha scritto:
actually, i tried running it and watch ruby.exe in Windows or ruby in
Mac keep on being 2 or 3MB without every growing. (i set the loop to
100 million and dup the key and watched it keep on running with the
program size about the same while running).

I think it is due to the hash table's size is constant. so when Hash
called hash() to find the slot, and do the eql? and find that they are
the same. and so it either places that dup key in that slot or don't
place it there at all (doesn't matter), and the un-referenced
dupulicates are garbage collected. so that's why the size of the
program keep on being 2 to 3MB in the RAM.

if however, i set an array just_test and always do a just_test <<
key where key is the dup, then the program will grow from 3MB to 6MB,
to 12MB, to 20MB and it just keeps growing. (just_test is an array to
keep a reference to the dup objects so that they won't be garbage
collected). So looks like the garbage collection in Ruby is doing a
good job.

I think it's easier (and more efficient) to create a new array with the=20
current values of i and j at each iteration, instead of modifying one singl=
e=20
array and duping it.

for i in 1..2
for j in 1..2
h2[[i,j]] =3D "ha"
end
end

Stefano
 
S

SpringFlowers AutumnMoon

Stefano said:
I think it's easier (and more efficient) to create a new array with the
current values of i and j at each iteration, instead of modifying one
single
array and duping it.

for i in 1..2
for j in 1..2
h2[[i,j]] = "ha"
end
end

that is already listed in the initial program. the main requirement is
that I have a list (an array), and how do I use it as a hash key.
 
R

Rick DeNatale

Stefano said:
I think it's easier (and more efficient) to create a new array with the
current values of i and j at each iteration, instead of modifying one
single
array and duping it.

for i in 1..2
for j in 1..2
h2[[i,j]] = "ha"
end
end

that is already listed in the initial program. the main requirement is
that I have a list (an array), and how do I use it as a hash key.


But this code does meet this requirement.

[i,j] IS an array, so

h2[[i,j]] = "ha"

is storing "ha" in the hash with a key which is an array containing
references to the objects currently referenced by i and j.

I think you might be tripping over the difference between a variable
and an object here. I know that lots of posters to this forum do.
Variables aren't objects per se. A variable references an object.
Variable assignment copies the reference NOT the object.

So, at the risk of being pedantic.

a = [i,j] # a now refers to the object which was generated by the
literal [i,j]
h[a] = "foo" # The hash now has a key which references the same object
a = "Something else" # a now refers to the string, but the hash
still references the array through the key.

h[["betty", "boop"] = "boop, boop, a doo" # h now ALSO references
the object generated by the array literal as a new key.

I would say that best practice for hash keys is to use either literal
values or expressions for keys, OR variables whose scope doesn't last
outside of the code which stores the key,value, you want to control
cases where an object being used as a key in a hash changes outside of
the hash.

You seem to be worried about the memory requirements for storing the
keys. I'm not sure there's much that can be done about that though,
If you want to have a hash with 10,000 entries, then all 10,000 keys
are going to have to be unique and will have to be stored.

When that becomes an issue, you probably want to explore other
implementations which store the key->value mapping externally, say a
database or something in between.
 
S

SpringFlowers AutumnMoon

Rick said:
that is already listed in the initial program. the main requirement is
that I have a list (an array), and how do I use it as a hash key.


But this code does meet this requirement.

[i,j] IS an array, so

h2[[i,j]] = "ha"

is storing "ha" in the hash with a key which is an array containing
references to the objects currently referenced by i and j.

my application is this: i have an array with numbers in it.
and i will have a shuffle method to shuffle numbers in that array.
and then i have to use that array's value as a hash key, so as to

h[a] ||= 0 # if nil, then set to 0
h[a] += 1 # to tally up the count

(and repeat the shuffling 100,000 times and keep the count)

so in this case, i don't think rehash will do the job, as rehash merely
look at what "a" currently references to, and rebuild the hash. it will
only have 1 key.

so what I will do is either to use

key = a.dup
h[key] ||= 0
h[key] += 1

or even safer:

key = a.dup
key.freeze
h[key] ||= 0
h[key] += 1

so this is like, to use an array that can change as a tuple.
 
S

SpringFlowers AutumnMoon

Rick said:
On 9/18/07, SpringFlowers AutumnMoon <[email protected]> wrote:
I think you might be tripping over the difference between a variable
and an object here. I know that lots of posters to this forum do.
Variables aren't objects per se. A variable references an object.
Variable assignment copies the reference NOT the object.

actually, my posting of "The meaning of a = b in object oriented
languages"
reminded me of what I learned in Java... that the variables are not
objects
themselves. the variable merely references an object, like a pointer to
an
object.

the main confusion I had with hash (and i think now clear) is that Hash
first
used hash() on the key's value to find a hash slot. and then at the
slot, the
mapping is done not for the key's value to the final value.
the mapping merely stores the pointer to the key object, and the pointer
to the value object.

so if the key object changes, then the hash() will most likely hash to
another
hash slot, and then found that there is no mapping there, and therefore
store
another pointer to key, pointer to value pair there.

thus the result:

irb(main):084:0> a = [1,2]
=> [1, 2]
irb(main):085:0> h = {}
=> {}
irb(main):086:0> h[a] = "ha"
=> "ha"
irb(main):087:0> a[1] = 99
=> 99
irb(main):088:0> a
=> [1, 99]
irb(main):089:0> h[a] = "hee"
=> "hee"
irb(main):090:0> h
=> {[1, 99]=>"ha", [1, 99]=>"hee"}
irb(main):091:0> h[[1,2]]
=> nil
irb(main):092:0> h[[1,99]]
=> "hee"
irb(main):093:0> h.rehash
=> {[1, 99]=>"hee"}
irb(main):094:0>


Example 2:

irb(main):094:0> h = {}
=> {}
irb(main):095:0> b = "haha"
=> "haha"
irb(main):096:0> h["a"] = b
=> "haha"
irb(main):097:0> h
=> {"a"=>"haha"}
irb(main):098:0> b[3] = "i"
=> "i"
irb(main):099:0> h
=> {"a"=>"hahi"}
irb(main):100:0> b = "another"
=> "another"
irb(main):101:0> h
=> {"a"=>"hahi"}
irb(main):102:0>
 
S

SpringFlowers AutumnMoon

Rick said:
On 9/18/07, SpringFlowers AutumnMoon <[email protected]> wrote:
You seem to be worried about the memory requirements for storing the
keys. I'm not sure there's much that can be done about that though,
If you want to have a hash with 10,000 entries, then all 10,000 keys
are going to have to be unique and will have to be stored.

fortunately, if the loop "dup" the key 100 million times, but the hash
only has 10,000 unique key values, then only 10,000 objects will exist.
the other "dup" keys will be garbage collected. (and quickly too, as
the memory usage never went up to 4 or 5 MB).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,022
Latest member
MaybelleMa

Latest Threads

Top