Finding shared elements between to arrays.

  • Thread starter Sebastian probst Eide
  • Start date
S

Sebastian probst Eide

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Well... here are the two solutions that seemed the most obvious to me. I
profiled them and the second one performed a little better:

a1 = ['a', 'b', 'c', 'd']
a2 = ['c', 'd', 'e', 'f']
a3 = []

#Solution 1
a1.each { |e| a3 << e if a2.include?(e) }

#Solution 2
a3 = a1.map { |e| e if a2.include?(e) }
a3.compact!


Info from the profiler:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
41.56 29.29 29.29 400000 0.07 0.11 Array#include?
28.31 49.24 19.95 100000 0.20 0.68 Array#each
22.06 64.79 15.55 1100000 0.01 0.01 String#==
4.19 67.74 2.95 200000 0.01 0.01 Array#<<
3.89 70.48 2.74 1 2740.00 70480.00 Integer#upto
0.00 70.48 0.00 1 0.00 70480.00 #toplevel

% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.82 28.24 28.24 400000 0.07 0.11 Array#include?
23.32 43.27 15.03 1100000 0.01 0.01 String#==
22.25 57.61 14.34 100000 0.14 0.58 Array#map
8.50 63.09 5.48 1 5480.00 64450.00 Integer#upto
2.11 64.45 1.36 100000 0.01 0.01 Array#compact!
0.00 64.45 0.00 1 0.00 64450.00 #toplevel

Thanks in advance!

Best regards
Sebastian
 
C

Carl Porth

I think the operation you are looking for is a set intersection. In
Ruby, it is the & method.
['a', 'b', 'c', 'd'] & ['c', 'd', 'e', 'f']
=> ["c", "d"]

Carl

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Well... here are the two solutions that seemed the most obvious to me. I
profiled them and the second one performed a little better:

a1 = ['a', 'b', 'c', 'd']
a2 = ['c', 'd', 'e', 'f']
a3 = []

#Solution 1
a1.each { |e| a3 << e if a2.include?(e) }

#Solution 2
a3 = a1.map { |e| e if a2.include?(e) }
a3.compact!

Info from the profiler:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
41.56 29.29 29.29 400000 0.07 0.11 Array#include?
28.31 49.24 19.95 100000 0.20 0.68 Array#each
22.06 64.79 15.55 1100000 0.01 0.01 String#==
4.19 67.74 2.95 200000 0.01 0.01 Array#<<
3.89 70.48 2.74 1 2740.00 70480.00 Integer#upto
0.00 70.48 0.00 1 0.00 70480.00 #toplevel

% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.82 28.24 28.24 400000 0.07 0.11 Array#include?
23.32 43.27 15.03 1100000 0.01 0.01 String#==
22.25 57.61 14.34 100000 0.14 0.58 Array#map
8.50 63.09 5.48 1 5480.00 64450.00 Integer#upto
2.11 64.45 1.36 100000 0.01 0.01 Array#compact!
0.00 64.45 0.00 1 0.00 64450.00 #toplevel

Thanks in advance!

Best regards
Sebastian
 
E

Eric I.

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

The & operator, as defined on Arrays, performs an intersection
operation.
From the RDoc:
array & other_array
------------------------------------------------------------------------
Set Intersection---Returns a new array containing elements common
to the two arrays, with no duplicates.

[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Best,

Eric
 
S

Sebastian probst Eide

Yeah, I was wondering if there wasn't a better way than using map and
compact!
Ruby has so many functions that it's often hard for me to know which one
to chose. I guess I'll learn as I use Ruby more!

Thanks for the reply Eric!

Sebastian
 
R

Robert Klemme

2007/10/10 said:
Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

The & operator, as defined on Arrays, performs an intersection
operation.
From the RDoc:
array & other_array
------------------------------------------------------------------------
Set Intersection---Returns a new array containing elements common
to the two arrays, with no duplicates.

[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Just a caveat: for large Arrays it might be inefficient (I don't know
how it's implemented). You might want to look at class Set:

irb(main):001:0> require 'set'
=> true
irb(main):002:0> [ 1, 1, 3, 5 ].to_set.intersection [ 1, 2, 3 ]
=> #<Set: {1, 3}>

Kind regards

robert
 
S

Sebastian probst Eide

I'll check it out!
Thanks Robert.

By the way: What a great forum this is! So many helpful, friendly and
knowledgeable people!

Thanks to all of you!

Best regards
Sebastian
 
R

Robert Dober

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
"solutions" myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

The & operator, as defined on Arrays, performs an intersection
operation.
From the RDoc:
array & other_array
------------------------------------------------------------------------
Set Intersection---Returns a new array containing elements common
to the two arrays, with no duplicates.

[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Best,

Eric

----
Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is *not* a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is *not*
[1,1,2] & [1,1,3]

Well so much for being clever, I believe it was Kent Back, "clever is
an insult".

Cheers
Robert
 
R

Rick DeNatale

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is *not* a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is *not*
[1,1,2] & [1,1,3]


The difference being that & eliminates duplicates from the result.

Another semantic equivalent for a1.select{ |e| a2.include?(e) }

is either
a1 - (a1 - a2)
or
a2 - (a2 - a1)

Whether or not one of these performs better than the select depends on
the 'statistical' properites of the arrays:

require "benchmark"

TESTS = 10_000
a1 = [1, 2, 3, 1, 1, 4, 5] * 100
a2 = [1, 3, 5]

r1 = a1.select { |e| a2.include?(e) }
r2 = a2.select { |e| a1.include?(e) }
r3 = a1 - (a1 - a2)
r4 = a2 - (a2 - a1)

puts r1 = r2 ? "Good" : "Bad"
puts r2 = r3 ? "Good" : "Bad"
puts r3 = r4 ? "Good" : "Bad"


Benchmark.bmbm do |results|
results.report("select a1 in a2:") { TESTS.times { a1.select { |e|
a2.include?(e) } } }
results.report("select a2 in a1:") { TESTS.times { a2.select { |e|
a1.include?(e) } } }
results.report("a1 - (a1 - a2) :") { TESTS.times { a1 - (a1 - a2 ) } }
results.report("a2 - (a2 - a1) :") { TESTS.times { a2 - (a2 - a1) } }
end

Produces:

Good
Good
Good
Rehearsal ----------------------------------------------------
select a1 in a2: 3.360000 0.020000 3.380000 ( 3.417788)
select a2 in a1: 0.030000 0.000000 0.030000 ( 0.026397)
a1 - (a1 - a2) : 0.670000 0.000000 0.670000 ( 0.680342)
a2 - (a2 - a1) : 0.240000 0.010000 0.250000 ( 0.250795)
------------------------------------------- total: 4.330000sec

user system total real
select a1 in a2: 3.360000 0.010000 3.370000 ( 3.386324)
select a2 in a1: 0.020000 0.000000 0.020000 ( 0.024740)
a1 - (a1 - a2) : 0.660000 0.000000 0.660000 ( 0.670034)
a2 - (a2 - a1) : 0.240000 0.000000 0.240000 ( 0.242316)
 
R

Robert Dober

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is *not* a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is *not*
[1,1,2] & [1,1,3]


The difference being that & eliminates duplicates from the result.

Another semantic equivalent for a1.select{ |e| a2.include?(e) }

is either
a1 - (a1 - a2)
or
a2 - (a2 - a1)
ah interesting
I do however feel very bad about the duplicate elimination of Array#&
this really seems to be worth a RCR, Arrays are just not sets from a
conceptional POV and from a practical POV we have uniq if we want to
be the result unique.

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it :(
Maybe Array#intersect would be nice to have too?

Cheers
Robert
 
R

Rick DeNatale

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it :(

Heck, I use it so infrequently that I just thought of it. <G>
 
E

Eric I.

I do however feel very bad about the duplicate elimination of Array#&
this really seems to be worth a RCR, Arrays are just not sets from a
conceptional POV and from a practical POV we have uniq if we want to
be the result unique.

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it :(
Maybe Array#intersect would be nice to have too?

Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
The difference is whether the duplicates in a1 are maintained (the
former) or the duplicates is a2 are maintained.

For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

If you wanted the & operator to maintain duplicates, would it maintain
duplicates on the left-hand side, the right-hand side, or both? And
if it is the LHS or RHS, then we've lost commuatativity (a1 & a2 is
not necessarily equal to a2 & a1).

Eric

P.S. What is "RCR"?
 
R

Rick DeNatale

Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
The difference is whether the duplicates in a1 are maintained (the
former) or the duplicates is a2 are maintained.

For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

Yep, you're right, not enough test cases! <G>

I realize now that the OPs request for "an array which has the
elements shared by two other arrays" is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.

a1 = [1,1,2,3,4]
a2 = [1,2,2,3,5]
a1 - (a1-a2) => [1, 1, 2, 3]
a2 - (a2-a1)=> [1, 2, 2, 3]
a1.select {|e| a2.include? e} => [1, 1, 2, 3]
a2.select {|e| a1.include? e}=> [1, 2, 2, 3]

Should the elements shared between [1,1,2,3,4] and [1,2,2,3,5] be
different than the elements shared between [1,2,2,3,5] and
[1,1,2,3,4]?
 
S

Sebastian probst Eide

I realize now that the OPs request for "an array which has the
elements shared by two other arrays" is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.
Well, I never really thought about the possibility of duplicates. In my
case the arrays wouldn't contain any dupes, but I wouldn't want my
resulting array to have any anyway!

S
 
R

Robert Klemme

Well, I never really thought about the possibility of duplicates. In my
case the arrays wouldn't contain any dupes, but I wouldn't want my
resulting array to have any anyway!

In that case you should probably use Set anyway as typical set
operations are more efficient with Set. It is just the right data type
to model a set, which seems what you are trying to do.

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top