Newbie question > Hash workings

Discussion in 'Ruby' started by AD60, Feb 28, 2009.

  1. AD60

    AD60 Guest

    Hi,

    I am puzzled by the workings a the following program. I placed the
    output after the corresponding lines. To me it lookes like a sort of
    bug in the workings of de p method. Can anybody set me straight?

    h = Hash.new([])
    h[0] = [1,2,3]
    p h #> {0=>[1, 2, 3]}
    h[0] += [4]
    p h #> {0=>[1, 2, 3, 4]}
    h[1] += [10,11]
    p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
    h[1] << 12
    p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
    h[2] << 20
    p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
    12] ?
    p h[2] #>
    [20] ???
    h[2] << 21
    p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
    12]} ?
    h[2] += [22]
    p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
    21, 22]} ????????


    Grtz. André
    AD60, Feb 28, 2009
    #1
    1. Advertising

  2. AD60

    Stefan Lang Guest

    2009/2/28 AD60 <>:
    > Hi,
    >
    > I am puzzled by the workings a the following program. I placed the
    > output after the corresponding lines. To me it lookes like a sort of
    > bug in the workings of de p method. Can anybody set me straight?


    There's no bug involved.

    > h = Hash.new([])


    Here you create Hash with an empty array as
    default value. For every lookup of a key that is
    not in the hash, this single array instance is
    returned, but not automatically stored in the hash.
    Again: It's a single object, not a new one on every
    lookup.

    > h[0] = [1,2,3]


    Here you associate 0 with an array. The hash default
    value is not involved.

    > p h #> {0=>[1, 2, 3]}
    > h[0] += [4]
    > p h #> {0=>[1, 2, 3, 4]}


    This is a syntactic shortcut for "h[0] = h[0] + [4]".
    It looks up the value of 0 in the hash, which is
    an array. array + array creates a new array,
    which overwrites the previous value of h[0].

    > h[1] += [10,11]
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}


    Again, this is a shortcut for "h[1] = h[1] + [10, 11]".
    The hash doesn't have a value for 1 yet, so
    the default value ([]) is returned. [] + [10, 11] is
    the new array [10, 11] which is then assigned
    to h[1].

    > h[1] << 12
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}


    Look up the existing array stored in h[1] and append
    12. No new array created.

    > h[2] << 20
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12] ?


    The hash doesn't have a value for 2, so the default
    value, [], is returned. The default value is destructively
    modified by appending 20. _No new hash entry is
    created (there is no assignment!)._

    Future lookups of non-existent keys will return
    the modified default value!

    > p h[2] #> [20] ???


    The hash still doesn't have an entry for 2, so
    the default value is returned, which by now is
    our modified array.

    > h[2] << 21
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]} ?


    Still no entry for 2, default value is returned and modified
    by appending 21.

    > h[2] += [22]
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20, 21, 22]} ????????


    Again, this is equivalent to "h[2] = h[2] + [22]".
    Still no entry for h[2], so the default value is returned,
    which by now is [20, 21]. Then the two arrays
    [20, 21] and [22] are concatenated, returning the
    new array [20, 21, 22] which is then finally associated
    with 2 in the hash (see: here is an assignment!).

    What you probably want to achieve can be
    done using Hash's block constructor:

    h = Hash.new { |hash, key| hash[key] = [] }

    This automatically stores a new empty array
    on every lookup of a non-existent key.

    HTH,
    Stefan
    Stefan Lang, Feb 28, 2009
    #2
    1. Advertising

  3. On 28.02.2009 22:15, AD60 wrote:
    > I am puzzled by the workings a the following program. I placed the
    > output after the corresponding lines. To me it lookes like a sort of
    > bug in the workings of de p method. Can anybody set me straight?


    p works correctly but...

    > h = Hash.new([])
    > h[0] = [1,2,3]
    > p h #> {0=>[1, 2, 3]}
    > h[0] += [4]
    > p h #> {0=>[1, 2, 3, 4]}
    > h[1] += [10,11]
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
    > h[1] << 12
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
    > h[2] << 20
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
    > 12] ?
    > p h[2] #>
    > [20] ???
    > h[2] << 21
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
    > 12]} ?
    > h[2] += [22]
    > p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
    > 21, 22]} ????????


    .... you are modifying the default value of the Hash in those cases where
    you place the question marks. The default value is the one returned in
    case a key does not exist. But there is no automatic insertion, that's
    why you do not see key 2 after doing h[2] << 20.

    You probably rather want this famous idiom:

    h = Hash.new {|ha,ke| ha[ke] = []}

    Now run your tests again. :)

    Kind regards

    robert
    Robert Klemme, Feb 28, 2009
    #3
  4. AD60

    AD60 Guest

    Thank you both for seeing the light at the end of the tunnel. I fully
    understand what I have to do to get it working but just to clarify;

    By doing "h[2] << 20" I modified the original empty array I set as a
    default value and inserted no hash key because it is not an assignment
    to h?

    Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
    []}" and just doing "p h[2]" before any assigment to h[2] has the
    surprising/logical side effect of creating a new key "2", thank god
    with value "[]" so no problems there, but still... I am now thinking
    that stating "h = Hash.new([])" is problematic idiom under all
    circumstances. Am I correct?

    Kind regards,
    André
    AD60, Mar 1, 2009
    #4
  5. 2009/3/1 AD60 <>:
    > Thank you both for seeing the light at the end of the tunnel. I fully
    > understand what I have to do to get it working but just to clarify;


    Good!

    > By doing "h[2] << 20" I modified the original empty array I set as a
    > default value and inserted no hash key because it is not an assignment
    > to h?


    Yes. Although "assignment to h" is a bit ambiguous. I guess you
    meant the correct thing but keep in mind that normally "assignment to
    h" means "h = something" and not "h[key] = something". The latter is
    actually a method defined in class Hash.

    > Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
    > []}" and just doing "p h[2]" before any assigment to h[2] has the
    > surprising/logical side effect of creating a new key "2", thank god
    > with value "[]" so no problems there, but still... I am now thinking
    > that stating "h = Hash.new([])" is problematic idiom under all
    > circumstances. Am I correct?


    No, because it works well in other cases - the most popular probably being

    counters = Hash.new 0
    ...
    counters[foo] += 1

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    Robert Klemme, Mar 1, 2009
    #5
  6. AD60

    AD60 Guest

    I see the use with an integer default, I specifically meant the
    Hash.new([]) syntax as problematic, but it probably all depends on
    knowing what your doing.

    Just to detail the difference:


    hav = Hash.new { |h,k| h[k] = []}
    hav[0] += [1]
    p hav # {0=>[1]}
    p hav[1] # []
    p hav # {0=>[1], 1=>[]}

    hiv = Hash.new 0
    hiv[0] += 1
    p hiv # {0=>1}
    p hiv[1] # 0
    p hiv # {0=>1}

    it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
    had me wondering. Anyway thanks again, I understand how to use it and
    that's more than enough for now.

    Kind regards,
    André
    AD60, Mar 1, 2009
    #6
  7. On 01.03.2009 12:35, AD60 wrote:
    > I see the use with an integer default, I specifically meant the
    > Hash.new([]) syntax as problematic, but it probably all depends on
    > knowing what your doing.


    Ah. Since you talked about "idiom" I thought you were referring to the
    general pattern.

    > Just to detail the difference:
    >
    >
    > hav = Hash.new { |h,k| h[k] = []}
    > hav[0] += [1]


    Actually this is very inefficient. With this idiom you should rather do

    hav[0] << [1]

    because otherwise you'll end up creating and deleting Array instances
    all the time without any benefit.

    > p hav # {0=>[1]}
    > p hav[1] # []
    > p hav # {0=>[1], 1=>[]}
    >
    > hiv = Hash.new 0
    > hiv[0] += 1
    > p hiv # {0=>1}
    > p hiv[1] # 0
    > p hiv # {0=>1}
    >
    > it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
    > had me wondering. Anyway thanks again, I understand how to use it and
    > that's more than enough for now.


    Good.

    Kind regards

    robert
    Robert Klemme, Mar 1, 2009
    #7
  8. AD60

    Adam Gardner Guest

    Robert Klemme wrote:
    > On 01.03.2009 12:35, AD60 wrote:
    >> I see the use with an integer default, I specifically meant the
    >> Hash.new([]) syntax as problematic, but it probably all depends on
    >> knowing what your doing.

    >
    > Ah. Since you talked about "idiom" I thought you were referring to the
    > general pattern.
    >
    >> Just to detail the difference:
    >>
    >>
    >> hav = Hash.new { |h,k| h[k] = []}
    >> hav[0] += [1]

    >
    > Actually this is very inefficient. With this idiom you should rather do
    >
    > hav[0] << [1]
    >
    > because otherwise you'll end up creating and deleting Array instances
    > all the time without any benefit.
    >
    >> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
    >> had me wondering. Anyway thanks again, I understand how to use it and
    >> that's more than enough for now.

    >
    > Good.
    >
    > Kind regards
    >
    > robert


    The method I would use is this:

    hav = Hash.new {[]}

    This gives you the benefit of creating a new array when necessary
    (rather than a single default array which can be destructively modified
    in troubling ways), but means that you only get permanent effects if you
    explicitly use assignment, so you don't get this

    p hav # {0=>[1]}
    p hav[1] # []
    p hav # {0=>[1], 1=>[]}

    Of course, it means you can't use the << operator to stick values into
    non-present keys, but I rarely use the << operator anyway.

    --
    Posted via http://www.ruby-forum.com/.
    Adam Gardner, Mar 2, 2009
    #8
  9. On 02.03.2009 19:18, Adam Gardner wrote:
    > Robert Klemme wrote:
    >> On 01.03.2009 12:35, AD60 wrote:
    >>> I see the use with an integer default, I specifically meant the
    >>> Hash.new([]) syntax as problematic, but it probably all depends on
    >>> knowing what your doing.

    >> Ah. Since you talked about "idiom" I thought you were referring to the
    >> general pattern.
    >>
    >>> Just to detail the difference:
    >>>
    >>> hav = Hash.new { |h,k| h[k] = []}
    >>> hav[0] += [1]

    >> Actually this is very inefficient. With this idiom you should rather do
    >>
    >> hav[0] << [1]
    >>
    >> because otherwise you'll end up creating and deleting Array instances
    >> all the time without any benefit.
    >>
    >>> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
    >>> had me wondering. Anyway thanks again, I understand how to use it and
    >>> that's more than enough for now.

    >> Good.


    > The method I would use is this:
    >
    > hav = Hash.new {[]}
    >
    > This gives you the benefit of creating a new array when necessary
    > (rather than a single default array which can be destructively modified
    > in troubling ways), but means that you only get permanent effects if you
    > explicitly use assignment, so you don't get this
    >
    > p hav # {0=>[1]}
    > p hav[1] # []
    > p hav # {0=>[1], 1=>[]}
    >
    > Of course, it means you can't use the << operator to stick values into
    > non-present keys, but I rarely use the << operator anyway.


    You probably should get rid of that habit. And here's why: your
    approach has no means to efficiently work with Arrays as Hash values.
    Either, you have to use += all the time which is inefficient because it
    will create a lot of garbage - or you have to explicitly test for
    presence of the key, which is less efficient than letting Hash's C
    implementation do it. It is much more elegant to just always do
    hash[key] << value.

    If you just want to make sure that you always get an Array back, you
    could as well use Hash.new([].freeze) to avoid accidentally appending to
    the single default all the time.

    But it's a free country so if you like to stick with your habits...

    Cheers

    robert
    Robert Klemme, Mar 2, 2009
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. digz
    Replies:
    1
    Views:
    502
    Ondra Holub
    Apr 6, 2007
  2. Replies:
    3
    Views:
    385
    Joseph Kesselman
    Apr 11, 2007
  3. rp
    Replies:
    1
    Views:
    512
    red floyd
    Nov 10, 2011
  4. Tom Werner

    Object#extend inner workings

    Tom Werner, Jul 20, 2006, in forum: Ruby
    Replies:
    5
    Views:
    138
    Tom Werner
    Jul 20, 2006
  5. Nick Grandy

    workings of the freaky freaky sandbox

    Nick Grandy, May 3, 2008, in forum: Ruby
    Replies:
    0
    Views:
    109
    Nick Grandy
    May 3, 2008
Loading...

Share This Page