Hash#collate

Phrogz

I wanted a method like Hash#update, but that preserved the values from
both the original and argument Hash. A little searching failed to find
it. (I did find that someone somewhere wrote a Hash#collate that's in
my ri docs, but who knows where it came from. Its description appears
not to do at all what I wanted, anyhow.)

So, I wrote my own. Comments welcome. Efficiency patches particularly
welcome. Under a different name, perhaps Trans might consider it for
inclusion in Facets.

class Hash
  # Merge the values of this hash with those from another, setting all values
  # to be arrays representing the values from both hashes.
  #   { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
  #   #=> { :a=>[1,3], :b=>[2,4], :c=>[5] }
  #
  # The 'uniq' option allows you to ensure all values are unique:
  #   { :a=>1, :b=>2 }.collate( { :a=>1, :b=>3 }, :uniq=>true )
  #   #=> { :a=>[1], :b=>[2,3] }
  #
  # By default, array values in either side are merged:
  #   foo = { :a=>[1,2], :b=>[3] }
  #   bar = { :a=>[4,5], :c=>[6,7] }
  #   foo.collate( bar )
  #   #=> { :a=>[1,2,4,5], :b=>[3], :c=>[6,7] }
  #
  # Use the 'preserve_arrays' option to prevent them from being merged:
  #   foo = { :a=>[1,2], :b=>[3] }
  #   bar = { :a=>[4,5], :c=>[6,7] }
  #   foo.collate( bar, :preserve_arrays=>true )
  #   #=> { :a=>[[1,2],[4,5]], :b=>[[3]], :c=>[[6,7]] }
  #
  # Note that, as shown above, preserving arrays will cause array values
  # to be wrapped up in another array.
  def collate( other_hash, options={} )
    dup.collate!( other_hash, options )
  end

  # The same as #collate, but modifies the receiver in place.
  def collate!( other_hash, options={} )
    # Prepare, ensuring every existing value is already an Array.
    # Arrays are copied so that #collate (which works on a shallow dup)
    # never appends to an array still held by the original hash.
    each{ |key, value|
      if value.is_a?( Array ) && !options[ :preserve_arrays ]
        self[key] = value.dup
      else
        self[key] = [ value ]
      end
    }

    # Collate with values from other_hash
    other_hash.each{ |key, value|
      if self[ key ]
        if value.is_a?( Array ) && !options[ :preserve_arrays ]
          self[ key ].concat( value )
        else
          self[ key ] << value
        end
      elsif value.is_a?( Array ) && !options[ :preserve_arrays ]
        # Copy so that later changes never touch other_hash's own array
        self[ key ] = value.dup
      else
        self[ key ] = [ value ]
      end
    }

    each{ |key, value| value.uniq! } if options[ :uniq ]

    self
  end
end

if __FILE__ == $0
  require 'test/unit'
  class TestHashCollation < Test::Unit::TestCase
    def setup
      $a = { :a=>1, :b=>2, :z=>26, :all=>%w|a b z|,
             :stuff1=>%w|foo bar|, :whee=>%w|a b| }
      $b = { :a=>1, :b=>4, :c=>9, :all=>%w|a b c|,
             :stuff2=>%w|jim jam|, :whee=>%w|a b| }
      $c = { :a=>1, :b=>8, :c=>27 }
    end
    def test1_defaults
      collated = $a.collate( $b )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1,1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( %w|a b z a b c|, collated[ :all ],
                    "Arrays are merged by default." )
      assert_equal( %w|foo bar|, collated[ :stuff1 ] )
      assert_equal( %w|jim jam|, collated[ :stuff2 ] )
      assert_equal( %w|a b a b|, collated[ :whee ] )
    end
    def test2_uniq
      collated = $a.collate( $b, :uniq=>true )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( %w|a b z c|, collated[ :all ],
                    "Arrays are merged by default." )
      assert_equal( %w|foo bar|, collated[ :stuff1 ] )
      assert_equal( %w|jim jam|, collated[ :stuff2 ] )
      assert_equal( %w|a b|, collated[ :whee ] )
    end
    def test3_preserve_arrays
      collated = $a.collate( $b, :preserve_arrays=>true )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1,1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( [ %w|a b z|, %w|a b c|], collated[ :all ],
                    "Two arrays are not merged." )
      assert_equal( [%w|foo bar|], collated[ :stuff1 ],
                    "Arrays unique to one side are wrapped" )
      assert_equal( [%w|jim jam|], collated[ :stuff2 ],
                    "Arrays unique to one side are wrapped" )
      assert_equal( [%w|a b|, %w|a b|], collated[ :whee ] )
    end
    def test4_preserve_and_uniq
      collated = $a.collate( $b, :preserve_arrays=>true, :uniq=>true )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( [ %w|a b z|, %w|a b c|], collated[ :all ],
                    "Two arrays are not merged." )
      assert_equal( [%w|foo bar|], collated[ :stuff1 ],
                    "Arrays unique to one side are wrapped" )
      assert_equal( [%w|jim jam|], collated[ :stuff2 ],
                    "Arrays unique to one side are wrapped" )
      assert_equal( [%w|a b|], collated[ :whee ],
                    "Preserve arrays + uniq == duplicate arrays are removed" )
    end
    def test5_multi_collate
      collated = $a.collate( $b ).collate( $c )
      assert_equal( [1,1,1], collated[ :a ] )
      assert_equal( [2,4,8], collated[ :b ] )
      assert_equal( [9,27], collated[ :c ] )
    end
    def test6_multi_collate_with_preserve
      collated = $a.collate( $b, :preserve_arrays=>1 ).collate( $c )
      assert_equal( [1,1,1], collated[ :a ] )
      assert_equal( [2,4,8], collated[ :b ] )
      assert_equal( [9,27], collated[ :c ] )

      collated = $a.collate( $b ).collate( $c, :preserve_arrays=>1 )
      assert_equal( [[1,1],1], collated[ :a ] )
      assert_equal( [[2,4],8], collated[ :b ] )
      assert_equal( [[9],27], collated[ :c ] )

      collated =
        $a.collate( $b, :preserve_arrays=>1 ).collate( $c, :preserve_arrays=>1 )
      assert_equal( [[1,1],1], collated[ :a ] )
      assert_equal( [[2,4],8], collated[ :b ] )
      assert_equal( [[9],27], collated[ :c ] )
    end
  end
end
 
Phrogz

Phrogz wrote:
> I wanted a method like Hash#update, but that preserved the values from
> both the original and argument Hash. A little searching failed to find
> it. (I did find that someone somewhere wrote a Hash#collate that's in
> my ri docs, but who knows where it came from. Its description appears
> not to do at all what I wanted, anyhow.)
>
> So, I wrote my own. Comments welcome. Efficiency patches particularly
> welcome. Under a different name, perhaps Trans might consider it for
> inclusion in Facets.

<snip stupidly-wrapped code>

Please find properly-formatted code @ http://pastie.caboo.se/130291
Sorry for the extra noise.
 
Joel VanderWerf

Phrogz wrote:
> ...
> # { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
> # #=> { :a=>[1,3], :b=>[2,4], :c=>[5] }

Do these two give the same result? Does it matter?

{ :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
{ :a=>1, :b=>2, :c=>5 }.collate :a=>3, :b=>4
 
Phrogz

Joel VanderWerf wrote:
> Phrogz wrote:
> > # { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
> > # #=> { :a=>[1,3], :b=>[2,4], :c=>[5] }
>
> Do these two give the same result? Does it matter?
>
> { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
> { :a=>1, :b=>2, :c=>5 }.collate :a=>3, :b=>4

They don't. In my particular use case today, I only used the result as
a set, so a proper Set might have been more appropriate. But I don't
know; I think that preserving the order is probably useful, at least
when not using the #uniq option. (I'm thinking perhaps of a case where
you're specifying a series of fallback results for a variety of
options.)

Totally up for grabs, though, if there's a faster, more elegant
solution that doesn't use that.
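
To make the ordering point concrete, here is a minimal sketch. `collate_sketch` is a hypothetical stand-in that only imitates the default behaviour described in the first post (no options); it is not the posted Hash#collate itself. It shows that swapping receiver and argument changes the order of the collected values.

```ruby
# Hypothetical stand-in for the default (option-free) collation:
# every value ends up in an array, the left hash's values first.
def collate_sketch(left, right)
  result = {}
  left.each do |key, value|
    result[key] = value.is_a?(Array) ? value.dup : [value]
  end
  right.each do |key, value|
    target = (result[key] ||= [])
    value.is_a?(Array) ? target.concat(value) : target << value
  end
  result
end

first  = { :a => 1, :b => 2 }
second = { :a => 3, :b => 4, :c => 5 }

forward  = collate_sketch(first, second)   # :a collects 1 then 3
backward = collate_sketch(second, first)   # :a collects 3 then 1
```

If the values are only ever treated as a set, the two results are interchangeable; if they are a list of fallbacks, the order matters.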
 
Trans

Phrogz wrote:
> I wanted a method like Hash#update, but that preserved the values from
> both the original and argument Hash. A little searching failed to find
> it. (I did find that someone somewhere wrote a Hash#collate that's in
> my ri docs, but who knows where it came from. Its description appears
> not to do at all what I wanted, anyhow.)

That's from Facets, probably. But the latest version of Facets renamed
it to #mash, for "map hash", which is more descriptive of what it
does. (#collate remains an alias for the time being).

I like your definition --actually I'm surprised I haven't worked this
functionality into Facets yet. I guess I thought #weave took care of
it, but that's slightly different b/c it only combines arrays if the
value is already an array. So I'm going to add this to Facets. A
couple thoughts though...

The options don't feel quite right. Maybe it would be more versatile to
define #uniq on Hash? So then

{ :a=>1, :b=>2 }.collate( { :a=>1, :b=>3 } ).uniq
#=> { :a=>[1], :b=>[2,3] }

As for preserving the arrays, I'm not sure. Is that really all that
useful? Well, if it is it seems like a better definition for Hash#zip.

T.
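
A rough sketch of the "uniq as a separate step" idea follows. Rather than reopening Hash, it uses a standalone helper named `uniq_values`, which is a hypothetical name chosen for illustration and not part of Facets or of the posted code. It assumes every value in the collated hash is already an array.

```ruby
# Hypothetical helper: returns a new hash whose array values have had
# duplicates removed. The input hash and its arrays are left untouched.
def uniq_values(collated)
  result = {}
  collated.each { |key, values| result[key] = values.uniq }
  result
end

collated = { :a => [1, 1], :b => [2, 3] }
deduped  = uniq_values(collated)
# deduped is { :a => [1], :b => [2, 3] }
```

Keeping this outside the collation method means the collation itself needs no :uniq option.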
 
Phrogz

Trans wrote:
> > I wanted a method like Hash#update, but that preserved the values from
> > both the original and argument Hash. A little searching failed to find
> > it. (I did find that someone somewhere wrote a Hash#collate that's in
> > my ri docs, but who knows where it came from. Its description appears
> > not to do at all what I wanted, anyhow.)
>
> That's from Facets, probably. But the latest version of Facets renamed
> it to #mash, for "map hash", which is more descriptive of what it
> does. (#collate remains an alias for the time being).
>
> I like your definition --actually I'm surprised I haven't worked this
> functionality into Facets yet. I guess I thought #weave took care of
> it, but that's slightly different b/c it only combines arrays if the
> value is already an array. So I'm going to add this to Facets. A
> couple thoughts though...
>
> The options don't feel quite right. Maybe it would be more versatile to
> define #uniq on Hash? So then
>
> { :a=>1, :b=>2 }.collate( { :a=>1, :b=>3 } ).uniq
> #=> { :a=>[1], :b=>[2,3] }

That's an excellent point. I needed this functionality today and so I
included it in the script; however, since it's a simple one-line (as
seen in the implementation) post-process step, perhaps it's
appropriate to keep it out of this method.

> As for preserving the arrays, I'm not sure. Is that really all that
> useful? Well, if it is it seems like a better definition for Hash#zip.

The reason I made the arrays not be preserved by default is to enable
chained collation of 3 or more hashes. (test5_multi_collate in the unit
tests.) I was actually collating hundreds today. However, I put in the
'preserve arrays' because it seemed almost arbitrary to treat them
differently from every other type of value. I don't personally have a
use case that needs it now, but I know from experience (like #flatten
versus #flatten_once) how sometimes arrays of arrays can suddenly crop
up and need to be preserved.

I would dearly love to get rid of the options hash altogether,
though. :)
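
The chaining argument can be sketched as below. `collate_pair` is a hypothetical helper that imitates only the default array-merging behaviour, not the posted method. Because array values are merged rather than wrapped, the intermediate result of each step can be fed straight into the next one with inject.

```ruby
# Hypothetical helper imitating the default behaviour: array values are
# merged, everything else is appended as a single element.
def collate_pair(left, right)
  result = {}
  [left, right].each do |hash|
    hash.each do |key, value|
      target = (result[key] ||= [])
      value.is_a?(Array) ? target.concat(value) : target << value
    end
  end
  result
end

hashes = [
  { :a => 1, :b => 2 },
  { :a => 3, :c => [4, 5] },
  { :a => 6, :b => 7 }
]

# Fold any number of hashes into one, starting from an empty hash.
combined = hashes.inject({}) { |memo, hash| collate_pair(memo, hash) }
# combined is { :a => [1, 3, 6], :b => [2, 7], :c => [4, 5] }
```

If arrays were preserved at every step instead, each pass would add another level of nesting, as the [[1,1],1] results in the unit tests show.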
 
Phrogz

One alternative would be to drop the idea of preserving collation
order altogether, and instead accumulate the results as a Set.
Although the method would still need to branch on value type (since
set1 << set2 isn't the same as set1.merge set2), it seems far less
likely that someone would have a Hash whose values were Sets and
wanted to maintain each set as a distinct 'value' during collation.
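
One way the Set-based alternative might look is sketched here. `collate_sets` is a hypothetical name, and treating only Set values (not arrays) as "merge the members" is an assumption made for the example.

```ruby
require 'set'

# Hypothetical sketch: collect the values for each key into a Set.
# Set values are merged member by member; anything else is added as a
# single element, which is the branch on value type mentioned above.
def collate_sets(left, right)
  result = {}
  [left, right].each do |hash|
    hash.each do |key, value|
      set = (result[key] ||= Set.new)
      if value.is_a?(Set)
        set.merge(value)   # add each member of the incoming set
      else
        set << value       # add the value itself
      end
    end
  end
  result
end

a = { :a => 1, :b => 2 }
b = { :a => 1, :b => 3, :c => Set[4, 5] }
as_sets = collate_sets(a, b)
# as_sets is { :a => Set[1], :b => Set[2, 3], :c => Set[4, 5] }
```

Duplicates disappear automatically, so no separate uniq step is needed, at the cost of losing the collation order.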
 
Nobuyoshi Nakada

Hi,

At Wed, 19 Dec 2007 09:05:11 +0900,
Phrogz wrote in [ruby-talk:284104]:
> I wanted a method like Hash#update, but that preserved the values from
> both the original and argument Hash. A little searching failed to find
> it. (I did find that someone somewhere wrote a Hash#collate that's in
> my ri docs, but who knows where it came from. Its description appears
> not to do at all what I wanted, anyhow.)

{:a=>1, :b=>2 }.update(:a=>3, :b=>4, :c=>5) {|key, *values| values}
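
For anyone who has not used the block form, a small sketch of what it does. It uses Hash#merge, the non-destructive counterpart of Hash#update, so the receiver is left alone; the variable names are only for illustration.

```ruby
left  = { :a => 1, :b => 2 }
right = { :a => 3, :b => 4, :c => 5 }

# The block is called only for keys present in both hashes, with the key,
# the receiver's value and the argument's value. Its result is stored.
merged = left.merge(right) { |key, old_value, new_value| [old_value, new_value] }
# merged is { :a => [1, 3], :b => [2, 4], :c => 5 }
```

Note that :c appears in only one hash, so the block never runs for it and its value is not wrapped in an array.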
 
Trans

> Hi,
>
> At Wed, 19 Dec 2007 09:05:11 +0900,
> Phrogz wrote in [ruby-talk:284104]:
> > I wanted a method like Hash#update, but that preserved the values from
> > both the original and argument Hash. A little searching failed to find
> > it. (I did find that someone somewhere wrote a Hash#collate that's in
> > my ri docs, but who knows where it came from. Its description appears
> > not to do at all what I wanted, anyhow.)
>
> {:a=>1, :b=>2 }.update(:a=>3, :b=>4, :c=>5) {|key, *values| values}

Woh! Little known is this karate!

You can even do:

{:a=>1, :b=>2 }.update(:a=>[1,3], :b=>4, :c=>5) {|key, *values| values.flatten.uniq}
=> {:a=>[1, 3], :b=>[2, 4], :c=>5}

T.
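
If every value should end up as an array, including keys that appear on one side only, one possible variation is to wrap both hashes first and then merge with a block. `wrap_values` is a hypothetical helper introduced for this sketch.

```ruby
# Hypothetical helper: copy a hash so that every value is an array.
# Existing arrays are duplicated, other values are wrapped.
def wrap_values(hash)
  wrapped = {}
  hash.each do |key, value|
    wrapped[key] = value.is_a?(Array) ? value.dup : [value]
  end
  wrapped
end

left  = { :a => 1, :b => 2 }
right = { :a => [1, 3], :b => 4, :c => 5 }

# Both sides now hold arrays, so the block can simply join them.
merged = wrap_values(left).merge(wrap_values(right)) do |key, old_values, new_values|
  (old_values + new_values).uniq
end
# merged is { :a => [1, 3], :b => [2, 4], :c => [5] }
```

Joining with + instead of flatten also avoids flattening arrays nested deeper inside the values.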
 
