Finding shared elements between to arrays.

Sebastian probst Eide · Oct 9, 2007

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Well... here are the two solutions that seemed the most obvious to me. I
profiled them and the second one performed a little better:

a1 = ['a', 'b', 'c', 'd']
a2 = ['c', 'd', 'e', 'f']
a3 = []

#Solution 1
a1.each { |e| a3 << e if a2.include?(e) }

#Solution 2
a3 = a1.map { |e| e if a2.include?(e) }
a3.compact!

Info from the profiler:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
41.56 29.29 29.29 400000 0.07 0.11 Array#include?
28.31 49.24 19.95 100000 0.20 0.68 Array#each
22.06 64.79 15.55 1100000 0.01 0.01 String#==
4.19 67.74 2.95 200000 0.01 0.01 Array#<<
3.89 70.48 2.74 1 2740.00 70480.00 Integer#upto
0.00 70.48 0.00 1 0.00 70480.00 #toplevel

% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.82 28.24 28.24 400000 0.07 0.11 Array#include?
23.32 43.27 15.03 1100000 0.01 0.01 String#==
22.25 57.61 14.34 100000 0.14 0.58 Array#map
8.50 63.09 5.48 1 5480.00 64450.00 Integer#upto
2.11 64.45 1.36 100000 0.01 0.01 Array#compact!
0.00 64.45 0.00 1 0.00 64450.00 #toplevel

Thanks in advance!

Best regards
Sebastian

Carl Porth · Oct 9, 2007

I think the operation you are looking for is a set intersection. In
Ruby, it is the & method.

['a', 'b', 'c', 'd'] & ['c', 'd', 'e', 'f']

Click to expand...

=> ["c", "d"]

Carl

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Well... here are the two solutions that seemed the most obvious to me. I
profiled them and the second one performed a little better:

a1 = ['a', 'b', 'c', 'd']
a2 = ['c', 'd', 'e', 'f']
a3 = []

#Solution 1
a1.each { |e| a3 << e if a2.include?(e) }

#Solution 2
a3 = a1.map { |e| e if a2.include?(e) }
a3.compact!

Info from the profiler:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
41.56 29.29 29.29 400000 0.07 0.11 Array#include?
28.31 49.24 19.95 100000 0.20 0.68 Array#each
22.06 64.79 15.55 1100000 0.01 0.01 String#==
4.19 67.74 2.95 200000 0.01 0.01 Array#<<
3.89 70.48 2.74 1 2740.00 70480.00 Integer#upto
0.00 70.48 0.00 1 0.00 70480.00 #toplevel

% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.82 28.24 28.24 400000 0.07 0.11 Array#include?
23.32 43.27 15.03 1100000 0.01 0.01 String#==
22.25 57.61 14.34 100000 0.14 0.58 Array#map
8.50 63.09 5.48 1 5480.00 64450.00 Integer#upto
2.11 64.45 1.36 100000 0.01 0.01 Array#compact!
0.00 64.45 0.00 1 0.00 64450.00 #toplevel

Thanks in advance!

Best regards
Sebastian

Eric I. · Oct 9, 2007

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

The & operator, as defined on Arrays, performs an intersection
operation.

From the RDoc:

array & other_array
------------------------------------------------------------------------
Set Intersection---Returns a new array containing elements common
to the two arrays, with no duplicates.

[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Best,

Eric

Sebastian probst Eide · Oct 9, 2007

Nice Carl!
Thank you very much!

Sebastian

Sebastian probst Eide · Oct 9, 2007

Yeah, I was wondering if there wasn't a better way than using map and
compact!
Ruby has so many functions that it's often hard for me to know which one
to chose. I guess I'll learn as I use Ruby more!

Thanks for the reply Eric!

Sebastian

Robert Klemme · Oct 10, 2007

2007/10/10 said:
Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Click to expand...

The & operator, as defined on Arrays, performs an intersection
operation.

From the RDoc:

Click to expand...

array & other_array
------------------------------------------------------------------------
Set Intersection---Returns a new array containing elements common
to the two arrays, with no duplicates.

[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]

Click to expand...

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Just a caveat: for large Arrays it might be inefficient (I don't know
how it's implemented). You might want to look at class Set:

irb(main):001:0> require 'set'
=> true
irb(main):002:0> [ 1, 1, 3, 5 ].to_set.intersection [ 1, 2, 3 ]
=> #<Set: {1, 3}>

Kind regards

robert

Sebastian probst Eide · Oct 10, 2007

I'll check it out!
Thanks Robert.

By the way: What a great forum this is! So many helpful, friendly and
knowledgeable people!

Thanks to all of you!

Best regards
Sebastian

Robert Dober · Oct 10, 2007

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Click to expand...

The & operator, as defined on Arrays, performs an intersection
operation.

From the RDoc:

Click to expand...

array & other_array
------------------------------------------------------------------------
Set Intersection---Returns a new array containing elements common
to the two arrays, with no duplicates.

[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]

Click to expand...

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Best,

Eric

----

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is *not* a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is *not*
[1,1,2] & [1,1,3]

Well so much for being clever, I believe it was Kent Back, "clever is
an insult".

Cheers
Robert

Rick DeNatale · Oct 10, 2007

----

Click to expand...

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is *not* a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is *not*
[1,1,2] & [1,1,3]

The difference being that & eliminates duplicates from the result.

Another semantic equivalent for a1.select{ |e| a2.include?(e) }

is either
a1 - (a1 - a2)
or
a2 - (a2 - a1)

Whether or not one of these performs better than the select depends on
the 'statistical' properites of the arrays:

require "benchmark"

TESTS = 10_000
a1 = [1, 2, 3, 1, 1, 4, 5] * 100
a2 = [1, 3, 5]

r1 = a1.select { |e| a2.include?(e) }
r2 = a2.select { |e| a1.include?(e) }
r3 = a1 - (a1 - a2)
r4 = a2 - (a2 - a1)

puts r1 = r2 ? "Good" : "Bad"
puts r2 = r3 ? "Good" : "Bad"
puts r3 = r4 ? "Good" : "Bad"

Benchmark.bmbm do |results|
results.report("select a1 in a2:") { TESTS.times { a1.select { |e|
a2.include?(e) } } }
results.report("select a2 in a1:") { TESTS.times { a2.select { |e|
a1.include?(e) } } }
results.report("a1 - (a1 - a2) :") { TESTS.times { a1 - (a1 - a2 ) } }
results.report("a2 - (a2 - a1) :") { TESTS.times { a2 - (a2 - a1) } }
end

Produces:

Good
Good
Good
Rehearsal ----------------------------------------------------
select a1 in a2: 3.360000 0.020000 3.380000 ( 3.417788)
select a2 in a1: 0.030000 0.000000 0.030000 ( 0.026397)
a1 - (a1 - a2) : 0.670000 0.000000 0.670000 ( 0.680342)
a2 - (a2 - a1) : 0.240000 0.010000 0.250000 ( 0.250795)
------------------------------------------- total: 4.330000sec

user system total real
select a1 in a2: 3.360000 0.010000 3.370000 ( 3.386324)
select a2 in a1: 0.020000 0.000000 0.020000 ( 0.024740)
a1 - (a1 - a2) : 0.660000 0.000000 0.660000 ( 0.670034)
a2 - (a2 - a1) : 0.240000 0.000000 0.240000 ( 0.242316)

Robert Dober · Oct 10, 2007

----

Click to expand...

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is *not* a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is *not*
[1,1,2] & [1,1,3]

Click to expand...

The difference being that & eliminates duplicates from the result.

Another semantic equivalent for a1.select{ |e| a2.include?(e) }

is either
a1 - (a1 - a2)
or
a2 - (a2 - a1)

ah interesting
I do however feel very bad about the duplicate elimination of Array#&
this really seems to be worth a RCR, Arrays are just not sets from a
conceptional POV and from a practical POV we have uniq if we want to
be the result unique.

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it

Maybe Array#intersect would be nice to have too?

Cheers
Robert

Rick DeNatale · Oct 10, 2007

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it

Heck, I use it so infrequently that I just thought of it. <G>

Eric I. · Oct 11, 2007

I do however feel very bad about the duplicate elimination of Array#&
this really seems to be worth a RCR, Arrays are just not sets from a
conceptional POV and from a practical POV we have uniq if we want to
be the result unique.

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it
Maybe Array#intersect would be nice to have too?

Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
The difference is whether the duplicates in a1 are maintained (the
former) or the duplicates is a2 are maintained.

For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

If you wanted the & operator to maintain duplicates, would it maintain
duplicates on the left-hand side, the right-hand side, or both? And
if it is the LHS or RHS, then we've lost commuatativity (a1 & a2 is
not necessarily equal to a2 & a1).

Eric

P.S. What is "RCR"?

Wolfgang NÃ¡dasi-Donner · Oct 11, 2007

Eric said:
P.S. What is "RCR"?

A "Ruby change request" - see http://rcrchive.net/

Wolfgang NÃ¡dasi-Donner

Rick DeNatale · Oct 12, 2007

Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
The difference is whether the duplicates in a1 are maintained (the
former) or the duplicates is a2 are maintained.

For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

Yep, you're right, not enough test cases! <G>

I realize now that the OPs request for "an array which has the
elements shared by two other arrays" is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.

a1 = [1,1,2,3,4]
a2 = [1,2,2,3,5]
a1 - (a1-a2) => [1, 1, 2, 3]
a2 - (a2-a1)=> [1, 2, 2, 3]
a1.select {|e| a2.include? e} => [1, 1, 2, 3]
a2.select {|e| a1.include? e}=> [1, 2, 2, 3]

Should the elements shared between [1,1,2,3,4] and [1,2,2,3,5] be
different than the elements shared between [1,2,2,3,5] and
[1,1,2,3,4]?

Sebastian probst Eide · Oct 12, 2007

I realize now that the OPs request for "an array which has the

elements shared by two other arrays" is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.

Well, I never really thought about the possibility of duplicates. In my
case the arrays wouldn't contain any dupes, but I wouldn't want my
resulting array to have any anyway!

S

Robert Klemme · Oct 13, 2007

Well, I never really thought about the possibility of duplicates. In my
case the arrays wouldn't contain any dupes, but I wouldn't want my
resulting array to have any anyway!

In that case you should probably use Set anyway as typical set
operations are more efficient with Set. It is just the right data type
to model a set, which seems what you are trying to do.

Kind regards

robert

ruby profile gave different result in Windows and Linux	2	Nov 29, 2006
Performance question: Rational	4	Apr 29, 2005
Can't get ZenProfile to work	2	Nov 30, 2005
A problem while using RadRails to debug	0	Jan 2, 2010
ruby -r profile gives wrong percentage?	2	Dec 20, 2005
Profiling question	3	Jan 22, 2005
Profiler bug in 1.8.2?	4	Jan 7, 2005
Performance and Profiling help	3	Dec 24, 2007

Finding shared elements between to arrays.

Sebastian probst Eide

Carl Porth

Eric I.

Sebastian probst Eide

Sebastian probst Eide

Robert Klemme

Sebastian probst Eide

Robert Dober

Rick DeNatale

Robert Dober

Rick DeNatale

Eric I.

Wolfgang NÃ¡dasi-Donner

Rick DeNatale

Sebastian probst Eide

Robert Klemme

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads