Unexpected problem: hash[key] << value

Discussion in 'Ruby' started by Joey Zhou, Mar 9, 2011.

  1. Joey Zhou

    Joey Zhou Guest

    # ruby 1.9.2p180 (2011-02-18) [i386-mingw32]

    get_page_hash = {}
    get_page_hash.default = []

    File.foreach("page.txt") do |line|
      word, page = line.chomp.split(':')
      get_page_hash[word] << page # the problem is here
    end

    p get_page_hash['Aword'] # => ["1", "2", "3", "4", "5"]
    p get_page_hash['Bword'] # => ["1", "2", "3", "4", "5"]
    p get_page_hash.default # => ["1", "2", "3", "4", "5"]

    __END__

    content of page.txt:
    Aword:1
    Bword:2
    Cword:3
    Aword:4
    Dword:5


    Simple program, clear purpose. I don't know why get_page_hash.default
    becomes ["1", "2", "3", "4", "5"]; it seems ridiculous.

    Only when I change that line to:

    get_page_hash[word] += [page]

    I get what I want:

    p get_page_hash['Aword'] # => ["1", "4"]
    p get_page_hash['Bword'] # => ["2"]
    p get_page_hash.default # => []

    I thought using "<<" would be intuitive, but the result is unexpected.
    What's wrong with it?

    Thank you!

    Joey

    --
    Posted via http://www.ruby-forum.com/.
     
    Joey Zhou, Mar 9, 2011
    #1

  2. Joey Zhou

    Haruka YAGNI Guest

    On Wed, Mar 9, 2011 at 12:45 PM, Joey Zhou <> wrote:
    > I think use "<<" maybe intuitive, but the result is unexpected. What's
    > wrong with it?


    Hash#default returns the same Array object for every missing key.
    Your first version cannot work, because "<<" modifies that shared
    default array instead of storing a new array under each key.

    Here is a good example from the manual, but it is in Japanese, sorry:
    http://www.ruby-lang.org/ja/man/html/trap_Hash.html
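    A minimal sketch of what Haruka describes, runnable on its own (the keys
    'Aword', 'Bword' and 'Cword' are just illustrative): a lookup of a missing
    key returns the default object itself, so "<<" appends to that one Array
    and never stores anything in the hash.

```ruby
# Reproduces the shared-default behavior from the original program.
h = {}
h.default = []          # one Array object, returned for every missing key

h['Aword'] << '1'       # mutates the shared default; nothing is stored under 'Aword'
h['Bword'] << '2'       # same object again

p h                              # => {}
p h.default                      # => ["1", "2"]
p h['Cword'].equal?(h.default)   # => true
```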

    --
    Haruka YAGNI
     
    Haruka YAGNI, Mar 9, 2011
    #2

  3. Joey Zhou

    Mark Beek Guest

    I just stumbled across this surprising behavior myself. It's the first
    counter-intuitive mechanism I have come across in my short sweet
    experience with Ruby.

    Check this thread for an elaborate discussion (in English) of this
    behavior:

    http://www.ruby-forum.com/topic/134424#new

    Here's my take:

    Before we look at your case, let's look at a case that actually works as
    you'd expect: initializing a hash with a Fixnum:

    Code
    h = Hash.new(0)
    puts "h['key1']: #{h['key1']}"
    puts "h['key2']: #{h['key2']}"
    h['key1'] += 1
    puts "after updating key1"
    puts "h['key1']: #{h['key1']}"
    puts "h['key2']: #{h['key2']}"

    Result:
    h['key1']: 0
    h['key2']: 0
    after updating key1
    h['key1']: 1
    h['key2']: 0

    Perfect! Mighty handy for word count programs and all sorts of other use
    cases.

    Which would lead you to expect the following behavior when you
    initialize a hash with an empty array, then append:

    Code
    h = Hash.new([])
    puts "h['key1']: #{h['key1']}"
    puts "h['key2']: #{h['key2']}"
    h['key1'] << 1
    puts "after updating key1"
    puts "h['key1']: #{h['key1']}"
    puts "h['key2']: #{h['key2']}"

    Result
    h['key1']: []
    h['key2']: []
    after updating key1
    h['key1']: [1]
    h['key2']: [] #<-- what you'd expect, but NOT what you get

    The actual result is the following:

    h['key1']: []
    h['key2']: []
    after updating key1
    h['key1']: [1]
    h['key2']: [1]
    . . . and so on

    The problem is that when you initialize a hash with a mutable default
    value, all of the defaults are actually references to THE SAME OBJECT.
    So when you append to the default array in one hash value, you're
    actually changing them all. Witness:

    puts "#{h['key1'].object_id}"
    puts "#{h['key2'].object_id}"
    puts "#{h['key3'].object_id}"

    Result:
    116528
    116528
    116528

    By contrast, when you update a value with the += construction rather
    than <<, you're actually creating a new array object for that value. So
    that particular one is no longer referring to the default value.
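    A short sketch of that difference, assuming a fresh Hash.new([]) (key
    names are illustrative): "h[k] += [x]" expands to "h[k] = h[k] + [x]",
    so it reads the default, builds a new Array and stores it under the key,
    while "<<" only mutates the shared default.

```ruby
# Assumes a fresh Hash.new([]); key names are illustrative.
h = Hash.new([])

h['key1'] += [1]   # same as h['key1'] = h['key1'] + [1]: a new Array is stored
h['key2'] << 2     # mutates the shared default; nothing is stored for 'key2'

p h.keys           # => ["key1"]
p h['key1']        # => [1]
p h.default        # => [2]
p h['key3']        # => [2]  (the polluted default shows up for any missing key)
```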

    The thread referred to above mentions other ways to get what you'd
    expect with a default empty array. Still, I gotta admit that I simply
    don't understand why Hash.new([]) works the way it does. Who would want
    to create a Hash table where changing a single value can potentially
    change all other values, past, present, and to come? Talk about side
    effects gone wild!

    If anyone can explain the rationale for this behavior, I'd really
    appreciate it. I'm probably just missing something.

    --
    Posted via http://www.ruby-forum.com/.
     
    Mark Beek, Mar 9, 2011
    #3
  4. Mark Beek wrote in post #986380:
    > Which would lead you to expect the following behavior when you
    > initialize a hash with an empty array, then append:
    >
    > Code
    > h = Hash.new([])
    > puts "h['key1']: #{h['key1']}"
    > puts "h['key2']: #{h['key2']}"
    > h['key1'] << 1
    > puts "after updating key1"
    > puts "h['key1']: #{h['key1']}"
    > puts "h['key2']: #{h['key2']}"
    >
    > Result
    > h['key1']: []
    > h['key2']: []
    > after updating key1
    > h['key1']: [1]
    > h['key2']: [] #<-- what you'd expect, but NOT what you get


    To get that behaviour, you need the Hash to create a *new* empty array
    for every unknown element. What I do is:

    h = Hash.new { |o,k| o[k] = [] }
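    Applied to Joey's original program, the block form gives the expected
    result. In this sketch the contents of page.txt are inlined as a string
    (an assumption, so it runs without the file); the logic is otherwise the
    same.

```ruby
# Joey's program with the block-form default. The page.txt contents are
# inlined here as a string (assumption) so the sketch runs without the file.
data = "Aword:1\nBword:2\nCword:3\nAword:4\nDword:5\n"

get_page_hash = Hash.new { |hash, key| hash[key] = [] }  # new Array per missing key

data.each_line do |line|
  word, page = line.chomp.split(':')
  get_page_hash[word] << page   # appends to this word's own Array
end

p get_page_hash['Aword']  # => ["1", "4"]
p get_page_hash['Bword']  # => ["2"]
p get_page_hash.default   # => nil (there is a default proc, not a default value)
```

    One side effect of this idiom: merely reading a missing key, such as
    get_page_hash['Zword'], now stores an empty Array under it. Use
    get_page_hash.fetch(word, []) or key? when you only want to look.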

    > The problem is that when you initialize a hash with a mutable default
    > value, all of the defaults are actually references to THE SAME OBJECT.

    ...
    > If anyone can explain the rationale for this behavior, I'd really
    > appreciate it. I'm probably just missing something.


    The question is, how else could it work in the general case?

    Perhaps you could pass a prototype object, and the Hash constructor would call
    dup on that object every time it needs a new distinct instance? No,
    that doesn't work, because .dup is only a shallow copy. Check out:

    a = [[1,2],[3,4]]
    b = a.dup
    b[0] << 3
    a   # => [[1, 2, 3], [3, 4]]  (a[0] changed too)
    b   # => [[1, 2, 3], [3, 4]]
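    Brian's point about dup can be checked directly. This sketch emulates the
    hypothetical "prototype plus dup" design with the block form; prototype is
    an illustrative name, not a Hash feature. Each key gets its own outer
    Array, but the nested Arrays are still shared.

```ruby
# Hypothetical "prototype + dup" default, emulated with the block form.
prototype = [[], []]   # e.g. a pair of lists per key (illustrative)

h = Hash.new { |hash, key| hash[key] = prototype.dup }

h[:a][0] << 'x'   # the outer Array is a fresh copy...
h[:b][0] << 'y'   # ...but the inner Arrays are the prototype's own

p h[:a]           # => [["x", "y"], []]
p h[:b]           # => [["x", "y"], []]
p prototype       # => [["x", "y"], []]
```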

    Perhaps you could pass a Class, and then Hash would call your class's
    new method every time it wanted an instance? Sure, you could pass Array
    in this case, but it's quite restrictive. And the simple case of
    Hash.new(0) wouldn't work.

    So to work in the general case you have to give it some code to execute
    to create a new object every time one is needed - a factory block.
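    A small sketch of the factory block in action (the names counting and
    forgetful are illustrative). Ruby calls the block for each lookup of a
    missing key; the block has to store the new object itself, otherwise
    every lookup gets a throwaway Array and appended values are lost.

```ruby
# The block runs on every lookup of a key that is not stored yet.
calls = 0
counting = Hash.new { |hash, key| calls += 1; hash[key] = [] }

counting[:a] << 1
counting[:a] << 2    # :a is now stored, so the block is not called again
counting[:b] << 3

p calls              # => 2
p counting.to_a      # => [[:a, [1, 2]], [:b, [3]]]

# Without the assignment the block returns a fresh Array each time
# but never stores it, so the appended value is silently dropped.
forgetful = Hash.new { |hash, key| [] }
forgetful[:a] << 1
p forgetful          # => {}
p forgetful[:a]      # => []
```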

    The same applies with arrays: compare

    a = Array.new(5, [])
    b = Array.new(5) { [] }
    puts a.map { |x| x.object_id }
    puts b.map { |x| x.object_id }
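    The same comparison with a mutation makes the difference visible without
    reading object ids: appending to one slot of the shared-object Array
    shows up in every slot.

```ruby
a = Array.new(3, [])    # three slots, all holding the same Array object
b = Array.new(3) { [] } # the block runs once per slot: three distinct Arrays

a[0] << :x
b[0] << :x

p a   # => [[:x], [:x], [:x]]
p b   # => [[:x], [], []]
p a.map(&:object_id).uniq.size  # => 1
p b.map(&:object_id).uniq.size  # => 3
```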

    Regards,

    Brian.

    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Mar 9, 2011
    #4
  5. On Wed, Mar 9, 2011 at 6:35 AM, Mark Beek <> wrote:
    > [...]
    > By contrast, when you update a value with the += construction rather
    > than <<, you're actually creating a new array object for that value. So
    > that particular one is no longer referring to the default value.


    Well, in this case actually the better idiom is this:

    h = Hash.new {|h,k| h[k] = []}
    ...

    h[key] << something

    Reason: Array#+ will create a new object every time you add something
    while the idiom presented above only ever creates one Array per key.
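    Robert's efficiency point can be made visible by recording the Array
    stored under one key after each update. In this sketch plus and push are
    illustrative names: plus uses Hash.new([]) with +=, push uses the block
    form with <<. Both end with the same contents, but += allocated a new
    Array on every append.

```ruby
# Illustrative names: plus uses Hash.new([]) with +=, push uses the block form with <<.
plus = Hash.new([])
push = Hash.new { |hash, key| hash[key] = [] }

plus_seen = []   # the Array stored under 'Aword' after each update
push_seen = []

%w[1 4 7].each do |page|
  plus['Aword'] += [page]   # Array#+ builds a brand-new Array every time
  plus_seen << plus['Aword']
  push['Aword'] << page     # appends in place to the single Array for this key
  push_seen << push['Aword']
end

p plus['Aword'] == push['Aword']          # => true
p plus_seen.map(&:object_id).uniq.size    # => 3
p push_seen.map(&:object_id).uniq.size    # => 1
```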

    > The thread referred to above mentions other ways to get what you'd
    > expect with a default empty array. Still, I gotta admit that I simply
    > don't understand why Hash.new([]) works the way it does. Who would want
    > to create a Hash table where changing a single value can potentially
    > change all other values, past, present, and to come. Talk about side
    > effects gone wild!


    Well, first of all this is the default return value. This does not
    necessarily mean that it will be modified. You might do something
    like

    h = Hash.new("missing".freeze)
    ...

    puts h[key]
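    A sketch of why the frozen default is safe (the key name is illustrative):
    reading it shares one object harmlessly, and an accidental in-place
    mutation raises instead of silently changing the value every missing key
    sees.

```ruby
# A shared default is harmless as long as nobody mutates it; freezing it
# turns an accidental mutation into an immediate error.
h = Hash.new("missing".freeze)

puts h[:some_key]        # prints missing; nothing is stored

begin
  h[:some_key] << "!"    # would otherwise change the default for every key
rescue RuntimeError => e # FrozenError (Ruby 2.5+) is a subclass of RuntimeError
  puts "refused: #{e.class}"
end

p h.empty?               # => true
p h.default              # => "missing"
```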

    And then of course there is a very common idiom

    counters = Hash.new 0
    ...
    counters[key] += 1
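    The counting idiom applied to the page data from the first post, as a
    sketch with the page.txt lines inlined as an array (an assumption). It is
    safe because += stores a new Integer under the key; the shared default 0
    is never modified.

```ruby
# Counting how many pages each word appears on, using the Hash.new(0) idiom.
# The page.txt lines are inlined as an array (assumption for the sketch).
lines = %w[Aword:1 Bword:2 Cword:3 Aword:4 Dword:5]

counters = Hash.new(0)
lines.each do |line|
  word, _page = line.split(':')
  counters[word] += 1   # Integers cannot be mutated, so += stores a new value
end

p counters['Aword']     # => 2
p counters['Zword']     # => 0 (the default; nothing stored)
p counters.size         # => 4
```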

    > If anyone can explain the rationale for this behavior, I'd really
    > appreciate it. I'm probably just missing something.


    Hopefully that explanation helps.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Mar 9, 2011
    #5
  6. Joey Zhou

    Joey Zhou Guest

    Re: Unexpected problem: hash[key] << value

    Haruka YAGNI wrote in post #986377:
    > On Wed, Mar 9, 2011 at 12:45 PM, Joey Zhou <> wrote:
    > Hash.default refers to the same Array at each element.
    > You cannot use the first code, because "<<" changes the default array,
    > not each element.
    >
    > Here is a good example from the manual, but it is Japanese... sorry
    > http://www.ruby-lang.org/ja/man/html/trap_Hash.html


    Thank you. I can read the code :)

    --
    Posted via http://www.ruby-forum.com/.
     
    Joey Zhou, Mar 9, 2011
    #6
  7. Joey Zhou

    Joey Zhou Guest

    Re: Unexpected problem: hash[key] << value

    Robert Klemme wrote in post #986401:
    > Well, in this case actually the better idiom is this:
    >
    > h = Hash.new {|h,k| h[k] = []}

    This is actually what I need. Thank you.

    --
    Posted via http://www.ruby-forum.com/.
     
    Joey Zhou, Mar 9, 2011
    #7
