how to use tuple as hash key

Discussion in 'Ruby' started by SpringFlowers AutumnMoon, Sep 17, 2007.

  1. in Python, a hash key cannot be [1,2,3] but must be (1,2,3), a tuple.

    in Ruby, a list can actually be used as a hash key... but is it true
    that it is using the list's ID as the key, not the list's content...

    (that's why in Python, tuples can be used as hash keys, since they are
    unique by the values they contain)

    what about in Ruby, since there is no tuple, how can you similate using
    tuple as hash key? using list.join(",") as a key?
    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 17, 2007
    #1
    1. Advertising

  2. Alle luned=C3=AC 17 settembre 2007, SpringFlowers AutumnMoon ha scritto:
    > in Python, a hash key cannot be [1,2,3] but must be (1,2,3), a tuple.
    >
    > in Ruby, a list can actually be used as a hash key... but is it true
    > that it is using the list's ID as the key, not the list's content...
    >
    > (that's why in Python, tuples can be used as hash keys, since they are
    > unique by the values they contain)
    >
    > what about in Ruby, since there is no tuple, how can you similate using
    > tuple as hash key? using list.join(",") as a key?


    I think that ruby uses the content of the array, not its id:

    irb: 001> a=3D['a','b']
    ["a", "b"]
    irb: 002> h =3D {a =3D> 2}
    {["a", "b"]=3D>2}
    irb: 003> b =3D %w[a b]
    ["a", "b"]
    irb: 004> h
    2
    irb: 005> a.object_id =3D=3D b.object_id
    false

    If the object_id of the two arrays were used as hash keys, h would have=
    =20
    returned nil, since the two ids are different.

    You must be careful when using mutable objects as hash keys, since modifyin=
    g=20
    the object will also modify the key, and requires to re-index the hash (usi=
    ng=20
    Hash#rehash). For example, referring to the previous irb session:

    irb: 010> a << 'c'
    ["a", "b", "c"]
    irb: 011> h[a]
    nil
    irb: 012> h
    nil
    irb: 014> h.rehash
    {["a", "b", "c"]=3D>2}
    irb: 015> h[a]
    2

    I hope this helps

    Stefano
     
    Stefano Crocco, Sep 17, 2007
    #2
    1. Advertising

  3. On 9/17/07, SpringFlowers AutumnMoon <> wrote:
    > in Python, a hash key cannot be [1,2,3] but must be (1,2,3), a tuple.
    >
    > in Ruby, a list can actually be used as a hash key... but is it true
    > that it is using the list's ID as the key, not the list's content...


    Not sure that this is true, since I don't understand what you mean.


    > (that's why in Python, tuples can be used as hash keys, since they are
    > unique by the values they contain)
    >
    > what about in Ruby, since there is no tuple, how can you similate using
    > tuple as hash key? using list.join(",") as a key?


    There's a better way. You can use just about any object as a key in a
    ruby hash, but you need to be aware of a few key things.

    1) Ruby hashes use two methods on a key to insert it or to search
    for it. Those two methods are
    hash and eql? Two objects are considered the same as far as the
    hash is concerned if first
    their hash values are equal, and second if they compare using
    the :eql? method. When you
    insert into a hash using a key, it's your responsibility to
    ensure that if two objects are :eql? they
    will return the same hash value. Ruby arrays (and most if not
    all standard Ruby objects do)

    2) In most cases Ruby hashes capture a reference to a key when it's inserted.

    3) If you change an object which is referenced as a key in a hash in
    such a way that it's hash
    value changes, then the hash won't know about it without help,
    and you'll likely be surprised.

    Here's some code to walk through to illustrate the above:

    # Make two different arrays, referenced by variables a and b respectively
    a = [1, 2, 3]
    b = [1, 2, 3]

    # They are eql
    a.eql? b # => true
    # But not the same object
    a.equal? b # => false
    a.hash .eql? b.hash # => true

    # Now create a hash using a as a key
    h = {a => "Fred"}

    # We can retrieve using the object referenced by a
    h[a] # => "Fred"
    # And using the variable referenced by b
    h # => "Fred"
    # Or using an new literal array which is eql
    h[[1,2,3]] # => "Fred"
    # Note that a and b STILL refer to different objects
    a.eql? b # => true
    a.equal? b # => false

    # Note also that hash.keys does not return the original object.
    h.keys.first.eql? a # => true
    h.keys.first.eql? b # => true

    # Now lets change a
    a << 4
    # a and b should no longer be eql
    a.eql? b # => false
    # What happens if we access the hash now
    h[a] # => nil
    # And using the variable referenced by b
    h # => nil
    # Or using an new literal array which is eql
    h[[1,2,3]] # => nil
    # What's happened is that the hash actually DID capture the key originally.
    # The internal key changed so we can't find a value using [1,2,3]

    # Notice that none of the above found a value even h[a], despite the
    fact that a does
    # in fact reference the new key:
    a # => [1, 2, 3, 4]

    # And we can't find it with an eql literal array either
    h[[1,2,3,4]] # => nil

    # The hash didn't know about the change to the key, so the hashing is
    out of sync with reality
    # Rehashing the hash rights things:
    h.rehash
    h[[1,2,3,4]] # => "Fred"

    # So if we WANT to allow changing keys after putting them in a hash we
    need to rehash after a change
    # If we don't want to allow changing keys, then dup the key when you
    put it in the hash.

    h2 = {a.dup => "Ethel"}

    h2[a] # => "Ethel"
    a # => [1, 2, 3, 4]
    h2[[1,2,3,4]] #=>
    a << 5

    # Since a has changed we don't expect to find it
    h2[a] # => nil
    # But since we duped it the key in the hash didn't change
    h2[[1,2,3,4,5]] # => nil
    h2[[1,2,3,4]] # =>

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Sep 17, 2007
    #3
  4. Stefano Crocco wrote:
    >
    > I think that ruby uses the content of the array, not its id:
    >
    > irb: 001> a=['a','b']
    > ["a", "b"]
    > irb: 002> h = {a => 2}
    > {["a", "b"]=>2}
    > irb: 003> b = %w[a b]
    > ["a", "b"]
    > irb: 004> h
    > 2
    > irb: 005> a.object_id == b.object_id
    > false


    wow... this is deep... the reason I thought it uses the ID instead of
    the value is this:

    the code:
    =======================

    h = {}

    for i in 1..2
    for j in 1..2
    h[[i,j]] = "ha"
    end
    end

    p h


    h2 = {}
    a = [0, 0]

    for i in 1..2
    for j in 1..2
    a[0] = i
    a[1] = j
    h2[a] = "ha"
    end
    end

    p h2

    =======================
    will produce:

    {[1, 2]=>"ha", [2, 1]=>"ha", [1, 1]=>"ha", [2, 2]=>"ha"}
    {[2, 2]=>"ha", [2, 2]=>"ha", [2, 2]=>"ha", [2, 2]=>"ha"}


    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 17, 2007
    #4
  5. Alle luned=C3=AC 17 settembre 2007, SpringFlowers AutumnMoon ha scritto:
    > h2 =3D {}
    > a =3D [0, 0]
    >
    > for i in 1..2
    > =C2=A0 for j in 1..2
    > =C2=A0 =C2=A0 a[0] =3D i
    > =C2=A0 =C2=A0 a[1] =3D j
    > =C2=A0 =C2=A0 h2[a] =3D "ha"
    > =C2=A0 end
    > end
    >
    > p h2
    >
    > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
    > will produce:
    >
    >...
    > {[2, 2]=3D>"ha", [2, 2]=3D>"ha", [2, 2]=3D>"ha", [2, 2]=3D>"ha"}


    This happens because you don't call h2.rehash after modifying the array you=
    =20
    use as key (look at the ri documentation for Hash#rehash for a longer=20
    explaination). If you insert the line=20

    h2.rehash

    after=20

    h2[a] =3D "ha"

    you'll get the expected behaviour.

    Stefano
     
    Stefano Crocco, Sep 18, 2007
    #5
  6. Stefano Crocco wrote:
    >
    > insert the line
    >
    > h2.rehash
    >
    > after
    >
    > h2[a] = "ha"
    >
    > you'll get the expected behaviour.


    actually if the code is

    for i in 1..2
    for j in 1..2
    a[0] = i
    a[1] = j
    h2[a] = "ha"
    h2.rehash
    end
    end

    p h2

    it will print out:

    {[2, 2]=>"ha"}

    =========================
    if i use

    h2 = {}
    a = [0, 0]

    for i in 1..2
    for j in 1..2
    a[0] = i
    a[1] = j
    h2[a.dup] = "ha"
    end
    end

    p h2

    then it will print out

    {[1, 2]=>"ha", [2, 1]=>"ha", [1, 1]=>"ha", [2, 2]=>"ha"}


    but my question is, if i run the loop 10 million times (for some
    statistics), then it will actually duplicate the array 10 million times,
    which is memory hungry. so i wonder what is a better way. if a is just
    a list of numbers, i can certainly use a.join(",") as the key... but
    looping 10 million times will also create "frozen strings" in the hash
    table 10 million times too?



    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 18, 2007
    #6
  7. SpringFlowers AutumnMoon wrote:

    > but my question is, if i run the loop 10 million times (for some
    > statistics), then it will actually duplicate the array 10 million times,
    > which is memory hungry.


    actually, i tried running it and watch ruby.exe in Windows or ruby in
    Mac keep on being 2 or 3MB without every growing. (i set the loop to
    100 million and dup the key and watched it keep on running with the
    program size about the same while running).

    I think it is due to the hash table's size is constant. so when Hash
    called hash() to find the slot, and do the eql? and find that they are
    the same. and so it either places that dup key in that slot or don't
    place it there at all (doesn't matter), and the un-referenced
    dupulicates are garbage collected. so that's why the size of the
    program keep on being 2 to 3MB in the RAM.

    if however, i set an array just_test and always do a just_test <<
    key where key is the dup, then the program will grow from 3MB to 6MB,
    to 12MB, to 20MB and it just keeps growing. (just_test is an array to
    keep a reference to the dup objects so that they won't be garbage
    collected). So looks like the garbage collection in Ruby is doing a
    good job.

    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 18, 2007
    #7
  8. Alle marted=C3=AC 18 settembre 2007, SpringFlowers AutumnMoon ha scritto:
    > SpringFlowers AutumnMoon wrote:
    > > but my question is, if i run the loop 10 million times (for some
    > > statistics), then it will actually duplicate the array 10 million times,
    > > which is memory hungry.

    >
    > actually, i tried running it and watch ruby.exe in Windows or ruby in
    > Mac keep on being 2 or 3MB without every growing. (i set the loop to
    > 100 million and dup the key and watched it keep on running with the
    > program size about the same while running).
    >
    > I think it is due to the hash table's size is constant. so when Hash
    > called hash() to find the slot, and do the eql? and find that they are
    > the same. and so it either places that dup key in that slot or don't
    > place it there at all (doesn't matter), and the un-referenced
    > dupulicates are garbage collected. so that's why the size of the
    > program keep on being 2 to 3MB in the RAM.
    >
    > if however, i set an array just_test and always do a just_test <<
    > key where key is the dup, then the program will grow from 3MB to 6MB,
    > to 12MB, to 20MB and it just keeps growing. (just_test is an array to
    > keep a reference to the dup objects so that they won't be garbage
    > collected). So looks like the garbage collection in Ruby is doing a
    > good job.


    I think it's easier (and more efficient) to create a new array with the=20
    current values of i and j at each iteration, instead of modifying one singl=
    e=20
    array and duping it.

    for i in 1..2
    for j in 1..2
    h2[[i,j]] =3D "ha"
    end
    end

    Stefano
     
    Stefano Crocco, Sep 18, 2007
    #8
  9. Stefano Crocco wrote:
    >
    > I think it's easier (and more efficient) to create a new array with the
    > current values of i and j at each iteration, instead of modifying one
    > single
    > array and duping it.
    >
    > for i in 1..2
    > for j in 1..2
    > h2[[i,j]] = "ha"
    > end
    > end


    that is already listed in the initial program. the main requirement is
    that I have a list (an array), and how do I use it as a hash key.

    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 18, 2007
    #9
  10. On 9/18/07, SpringFlowers AutumnMoon <> wrote:
    > Stefano Crocco wrote:
    > >
    > > I think it's easier (and more efficient) to create a new array with the
    > > current values of i and j at each iteration, instead of modifying one
    > > single
    > > array and duping it.
    > >
    > > for i in 1..2
    > > for j in 1..2
    > > h2[[i,j]] = "ha"
    > > end
    > > end

    >
    > that is already listed in the initial program. the main requirement is
    > that I have a list (an array), and how do I use it as a hash key.



    But this code does meet this requirement.

    [i,j] IS an array, so

    h2[[i,j]] = "ha"

    is storing "ha" in the hash with a key which is an array containing
    references to the objects currently referenced by i and j.

    I think you might be tripping over the difference between a variable
    and an object here. I know that lots of posters to this forum do.
    Variables aren't objects per se. A variable references an object.
    Variable assignment copies the reference NOT the object.

    So, at the risk of being pedantic.

    a = [i,j] # a now refers to the object which was generated by the
    literal [i,j]
    h[a] = "foo" # The hash now has a key which references the same object
    a = "Something else" # a now refers to the string, but the hash
    still references the array through the key.

    h[["betty", "boop"] = "boop, boop, a doo" # h now ALSO references
    the object generated by the array literal as a new key.

    I would say that best practice for hash keys is to use either literal
    values or expressions for keys, OR variables whose scope doesn't last
    outside of the code which stores the key,value, you want to control
    cases where an object being used as a key in a hash changes outside of
    the hash.

    You seem to be worried about the memory requirements for storing the
    keys. I'm not sure there's much that can be done about that though,
    If you want to have a hash with 10,000 entries, then all 10,000 keys
    are going to have to be unique and will have to be stored.

    When that becomes an issue, you probably want to explore other
    implementations which store the key->value mapping externally, say a
    database or something in between.

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Sep 18, 2007
    #10
  11. Rick Denatale wrote:
    > On 9/18/07, SpringFlowers AutumnMoon <> wrote:
    >> > end
    >> > end

    >>
    >> that is already listed in the initial program. the main requirement is
    >> that I have a list (an array), and how do I use it as a hash key.

    >
    >
    > But this code does meet this requirement.
    >
    > [i,j] IS an array, so
    >
    > h2[[i,j]] = "ha"
    >
    > is storing "ha" in the hash with a key which is an array containing
    > references to the objects currently referenced by i and j.


    my application is this: i have an array with numbers in it.
    and i will have a shuffle method to shuffle numbers in that array.
    and then i have to use that array's value as a hash key, so as to

    h[a] ||= 0 # if nil, then set to 0
    h[a] += 1 # to tally up the count

    (and repeat the shuffling 100,000 times and keep the count)

    so in this case, i don't think rehash will do the job, as rehash merely
    look at what "a" currently references to, and rebuild the hash. it will
    only have 1 key.

    so what I will do is either to use

    key = a.dup
    h[key] ||= 0
    h[key] += 1

    or even safer:

    key = a.dup
    key.freeze
    h[key] ||= 0
    h[key] += 1

    so this is like, to use an array that can change as a tuple.


    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 18, 2007
    #11
  12. Rick Denatale wrote:
    > On 9/18/07, SpringFlowers AutumnMoon <> wrote:


    > I think you might be tripping over the difference between a variable
    > and an object here. I know that lots of posters to this forum do.
    > Variables aren't objects per se. A variable references an object.
    > Variable assignment copies the reference NOT the object.


    actually, my posting of "The meaning of a = b in object oriented
    languages"
    reminded me of what I learned in Java... that the variables are not
    objects
    themselves. the variable merely references an object, like a pointer to
    an
    object.

    the main confusion I had with hash (and i think now clear) is that Hash
    first
    used hash() on the key's value to find a hash slot. and then at the
    slot, the
    mapping is done not for the key's value to the final value.
    the mapping merely stores the pointer to the key object, and the pointer
    to the value object.

    so if the key object changes, then the hash() will most likely hash to
    another
    hash slot, and then found that there is no mapping there, and therefore
    store
    another pointer to key, pointer to value pair there.

    thus the result:

    irb(main):084:0> a = [1,2]
    => [1, 2]
    irb(main):085:0> h = {}
    => {}
    irb(main):086:0> h[a] = "ha"
    => "ha"
    irb(main):087:0> a[1] = 99
    => 99
    irb(main):088:0> a
    => [1, 99]
    irb(main):089:0> h[a] = "hee"
    => "hee"
    irb(main):090:0> h
    => {[1, 99]=>"ha", [1, 99]=>"hee"}
    irb(main):091:0> h[[1,2]]
    => nil
    irb(main):092:0> h[[1,99]]
    => "hee"
    irb(main):093:0> h.rehash
    => {[1, 99]=>"hee"}
    irb(main):094:0>


    Example 2:

    irb(main):094:0> h = {}
    => {}
    irb(main):095:0> b = "haha"
    => "haha"
    irb(main):096:0> h["a"] = b
    => "haha"
    irb(main):097:0> h
    => {"a"=>"haha"}
    irb(main):098:0> b[3] = "i"
    => "i"
    irb(main):099:0> h
    => {"a"=>"hahi"}
    irb(main):100:0> b = "another"
    => "another"
    irb(main):101:0> h
    => {"a"=>"hahi"}
    irb(main):102:0>

    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 18, 2007
    #12
  13. Rick Denatale wrote:
    > On 9/18/07, SpringFlowers AutumnMoon <> wrote:


    > You seem to be worried about the memory requirements for storing the
    > keys. I'm not sure there's much that can be done about that though,
    > If you want to have a hash with 10,000 entries, then all 10,000 keys
    > are going to have to be unique and will have to be stored.


    fortunately, if the loop "dup" the key 100 million times, but the hash
    only has 10,000 unique key values, then only 10,000 objects will exist.
    the other "dup" keys will be garbage collected. (and quickly too, as
    the memory usage never went up to 4 or 5 MB).

    --
    Posted via http://www.ruby-forum.com/.
     
    SpringFlowers AutumnMoon, Sep 18, 2007
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Michal Mikolajczyk
    Replies:
    1
    Views:
    813
    Larry Bates
    Apr 20, 2004
  2. Jeff Epler
    Replies:
    0
    Views:
    972
    Jeff Epler
    Apr 20, 2004
  3. Bill Scherer
    Replies:
    0
    Views:
    618
    Bill Scherer
    Apr 20, 2004
  4. rp
    Replies:
    1
    Views:
    543
    red floyd
    Nov 10, 2011
  5. Une bévue
    Replies:
    5
    Views:
    153
    Une bévue
    Aug 10, 2006
Loading...

Share This Page