Newbie question > Hash workings

A

AD60

Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

h = Hash.new([])
h[0] = [1,2,3]
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20] ???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ????????


Grtz. André
 
S

Stefan Lang

2009/2/28 AD60 said:
Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

There's no bug involved.
h = Hash.new([])

Here you create Hash with an empty array as
default value. For every lookup of a key that is
not in the hash, this single array instance is
returned, but not automatically stored in the hash.
Again: It's a single object, not a new one on every
lookup.
h[0] = [1,2,3]

Here you associate 0 with an array. The hash default
value is not involved.
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}

This is a syntactic shortcut for "h[0] = h[0] + [4]".
It looks up the value of 0 in the hash, which is
an array. array + array creates a new array,
which overwrites the previous value of h[0].
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}

Again, this is a shortcut for "h[1] = h[1] + [10, 11]".
The hash doesn't have a value for 1 yet, so
the default value ([]) is returned. [] + [10, 11] is
the new array [10, 11] which is then assigned
to h[1].
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}

Look up the existing array stored in h[1] and append
12. No new array created.
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12] ?

The hash doesn't have a value for 2, so the default
value, [], is returned. The default value is destructively
modified by appending 20. _No new hash entry is
created (there is no assignment!)._

Future lookups of non-existent keys will return
the modified default value!
p h[2] #> [20] ???

The hash still doesn't have an entry for 2, so
the default value is returned, which by now is
our modified array.
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]} ?

Still no entry for 2, default value is returned and modified
by appending 21.
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20, 21, 22]} ????????

Again, this is equivalent to "h[2] = h[2] + [22]".
Still no entry for h[2], so the default value is returned,
which by now is [20, 21]. Then the two arrays
[20, 21] and [22] are concatenated, returning the
new array [20, 21, 22] which is then finally associated
with 2 in the hash (see: here is an assignment!).

What you probably want to achieve can be
done using Hash's block constructor:

h = Hash.new { |hash, key| hash[key] = [] }

This automatically stores a new empty array
on every lookup of a non-existent key.

HTH,
Stefan
 
R

Robert Klemme

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

p works correctly but...
h = Hash.new([])
h[0] = [1,2,3]
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20] ???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ????????

.... you are modifying the default value of the Hash in those cases where
you place the question marks. The default value is the one returned in
case a key does not exist. But there is no automatic insertion, that's
why you do not see key 2 after doing h[2] << 20.

You probably rather want this famous idiom:

h = Hash.new {|ha,ke| ha[ke] = []}

Now run your tests again. :)

Kind regards

robert
 
A

AD60

Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

By doing "h[2] << 20" I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
[]}" and just doing "p h[2]" before any assigment to h[2] has the
surprising/logical side effect of creating a new key "2", thank god
with value "[]" so no problems there, but still... I am now thinking
that stating "h = Hash.new([])" is problematic idiom under all
circumstances. Am I correct?

Kind regards,
André
 
R

Robert Klemme

2009/3/1 AD60 said:
Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;
Good!

By doing "h[2] << 20" I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Yes. Although "assignment to h" is a bit ambiguous. I guess you
meant the correct thing but keep in mind that normally "assignment to
h" means "h = something" and not "h[key] = something". The latter is
actually a method defined in class Hash.
Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
[]}" and just doing "p h[2]" before any assigment to h[2] has the
surprising/logical side effect of creating a new key "2", thank god
with value "[]" so no problems there, but still... I am now thinking
that stating "h = Hash.new([])" is problematic idiom under all
circumstances. Am I correct?

No, because it works well in other cases - the most popular probably being

counters = Hash.new 0
...
counters[foo] += 1

Kind regards

robert
 
A

AD60

I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Just to detail the difference:


hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]
p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

hiv = Hash.new 0
hiv[0] += 1
p hiv # {0=>1}
p hiv[1] # 0
p hiv # {0=>1}

it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Kind regards,
André
 
R

Robert Klemme

I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about "idiom" I thought you were referring to the
general pattern.
Just to detail the difference:


hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.
p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

hiv = Hash.new 0
hiv[0] += 1
p hiv # {0=>1}
p hiv[1] # 0
p hiv # {0=>1}

it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Good.

Kind regards

robert
 
A

Adam Gardner

Robert said:
I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about "idiom" I thought you were referring to the
general pattern.
Just to detail the difference:


hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.
it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Good.

Kind regards

robert

The method I would use is this:

hav = Hash.new {[]}

This gives you the benefit of creating a new array when necessary
(rather than a single default array which can be destructively modified
in troubling ways), but means that you only get permanent effects if you
explicitly use assignment, so you don't get this

p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

Of course, it means you can't use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.
 
R

Robert Klemme

Robert said:
I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.
Ah. Since you talked about "idiom" I thought you were referring to the
general pattern.
Just to detail the difference:

hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]
Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.
it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now. Good.

The method I would use is this:

hav = Hash.new {[]}

This gives you the benefit of creating a new array when necessary
(rather than a single default array which can be destructively modified
in troubling ways), but means that you only get permanent effects if you
explicitly use assignment, so you don't get this

p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

Of course, it means you can't use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.

You probably should get rid of that habit. And here's why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage - or you have to explicitly test for
presence of the key, which is less efficient than letting Hash's C
implementation do it. It is much more elegant to just always do
hash[key] << value.

If you just want to make sure that you always get an Array back, you
could as well use Hash.new([].freeze) to avoid accidentally appending to
the single default all the time.

But it's a free country so if you like to stick with your habits...

Cheers

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top