Finding shared elements between to arrays.

Discussion in 'Ruby' started by Sebastian probst Eide, Oct 9, 2007.

  1. Hi
    I am wondering if there is a really smart built in way to get an array
    which has the elements shared by two other arrays? I made two
    "solutions" myself, but I am wondering what other people are using, and
    if there is a preferred solution to the task? array.union(other_array)
    or something like that?

    Well... here are the two solutions that seemed the most obvious to me. I
    profiled them and the second one performed a little better:

    a1 = ['a', 'b', 'c', 'd']
    a2 = ['c', 'd', 'e', 'f']
    a3 = []

    #Solution 1
    a1.each { |e| a3 << e if a2.include?(e) }

    #Solution 2
    a3 = a1.map { |e| e if a2.include?(e) }
    a3.compact!


    Info from the profiler:
    % cumulative self self total
    time seconds seconds calls ms/call ms/call name
    41.56 29.29 29.29 400000 0.07 0.11 Array#include?
    28.31 49.24 19.95 100000 0.20 0.68 Array#each
    22.06 64.79 15.55 1100000 0.01 0.01 String#==
    4.19 67.74 2.95 200000 0.01 0.01 Array#<<
    3.89 70.48 2.74 1 2740.00 70480.00 Integer#upto
    0.00 70.48 0.00 1 0.00 70480.00 #toplevel

    % cumulative self self total
    time seconds seconds calls ms/call ms/call name
    43.82 28.24 28.24 400000 0.07 0.11 Array#include?
    23.32 43.27 15.03 1100000 0.01 0.01 String#==
    22.25 57.61 14.34 100000 0.14 0.58 Array#map
    8.50 63.09 5.48 1 5480.00 64450.00 Integer#upto
    2.11 64.45 1.36 100000 0.01 0.01 Array#compact!
    0.00 64.45 0.00 1 0.00 64450.00 #toplevel

    Thanks in advance!

    Best regards
    Sebastian
     
    Sebastian probst Eide, Oct 9, 2007
    #1
    1. Advertisements

  2. Sebastian probst Eide

    Carl Porth Guest

    I think the operation you are looking for is a set intersection. In
    Ruby, it is the & method.
    => ["c", "d"]

    Carl

     
    Carl Porth, Oct 9, 2007
    #2
    1. Advertisements

  3. Sebastian probst Eide

    Eric I. Guest

    The & operator, as defined on Arrays, performs an intersection
    operation.
    Also, another tip: rather than using a combination of map and
    compact!, you could instead use select, as in:

    a3 = a1.select { |e| a2.include?(e) }

    Best,

    Eric
     
    Eric I., Oct 9, 2007
    #3
  4. Nice Carl!
    Thank you very much!

    Sebastian
     
    Sebastian probst Eide, Oct 9, 2007
    #4
  5. Yeah, I was wondering if there wasn't a better way than using map and
    compact!
    Ruby has so many functions that it's often hard for me to know which one
    to chose. I guess I'll learn as I use Ruby more!

    Thanks for the reply Eric!

    Sebastian
     
    Sebastian probst Eide, Oct 9, 2007
    #5
  6. Just a caveat: for large Arrays it might be inefficient (I don't know
    how it's implemented). You might want to look at class Set:

    irb(main):001:0> require 'set'
    => true
    irb(main):002:0> [ 1, 1, 3, 5 ].to_set.intersection [ 1, 2, 3 ]
    => #<Set: {1, 3}>

    Kind regards

    robert
     
    Robert Klemme, Oct 10, 2007
    #6
  7. I'll check it out!
    Thanks Robert.

    By the way: What a great forum this is! So many helpful, friendly and
    knowledgeable people!

    Thanks to all of you!

    Best regards
    Sebastian
     
    Sebastian probst Eide, Oct 10, 2007
    #7
  8. Sebastian probst Eide

    Robert Dober Guest

    Oops I just made a fool out of myself in a different thread, funny
    nobody hollered at me so far, please let me repair the damage here
    a3 = a1.select { |e| a2.include?(e) }
    is *not* a1 & a2
    [1,1,2].select{ |x| [1,1,3].include? x }
    is *not*
    [1,1,2] & [1,1,3]

    Well so much for being clever, I believe it was Kent Back, "clever is
    an insult".

    Cheers
    Robert
     
    Robert Dober, Oct 10, 2007
    #8

  9. The difference being that & eliminates duplicates from the result.

    Another semantic equivalent for a1.select{ |e| a2.include?(e) }

    is either
    a1 - (a1 - a2)
    or
    a2 - (a2 - a1)

    Whether or not one of these performs better than the select depends on
    the 'statistical' properites of the arrays:

    require "benchmark"

    TESTS = 10_000
    a1 = [1, 2, 3, 1, 1, 4, 5] * 100
    a2 = [1, 3, 5]

    r1 = a1.select { |e| a2.include?(e) }
    r2 = a2.select { |e| a1.include?(e) }
    r3 = a1 - (a1 - a2)
    r4 = a2 - (a2 - a1)

    puts r1 = r2 ? "Good" : "Bad"
    puts r2 = r3 ? "Good" : "Bad"
    puts r3 = r4 ? "Good" : "Bad"


    Benchmark.bmbm do |results|
    results.report("select a1 in a2:") { TESTS.times { a1.select { |e|
    a2.include?(e) } } }
    results.report("select a2 in a1:") { TESTS.times { a2.select { |e|
    a1.include?(e) } } }
    results.report("a1 - (a1 - a2) :") { TESTS.times { a1 - (a1 - a2 ) } }
    results.report("a2 - (a2 - a1) :") { TESTS.times { a2 - (a2 - a1) } }
    end

    Produces:

    Good
    Good
    Good
    Rehearsal ----------------------------------------------------
    select a1 in a2: 3.360000 0.020000 3.380000 ( 3.417788)
    select a2 in a1: 0.030000 0.000000 0.030000 ( 0.026397)
    a1 - (a1 - a2) : 0.670000 0.000000 0.670000 ( 0.680342)
    a2 - (a2 - a1) : 0.240000 0.010000 0.250000 ( 0.250795)
    ------------------------------------------- total: 4.330000sec

    user system total real
    select a1 in a2: 3.360000 0.010000 3.370000 ( 3.386324)
    select a2 in a1: 0.020000 0.000000 0.020000 ( 0.024740)
    a1 - (a1 - a2) : 0.660000 0.000000 0.660000 ( 0.670034)
    a2 - (a2 - a1) : 0.240000 0.000000 0.240000 ( 0.242316)
     
    Rick DeNatale, Oct 10, 2007
    #9
  10. Sebastian probst Eide

    Robert Dober Guest

    ah interesting
    I do however feel very bad about the duplicate elimination of Array#&
    this really seems to be worth a RCR, Arrays are just not sets from a
    conceptional POV and from a practical POV we have uniq if we want to
    be the result unique.

    a1 - (a1 - a2 )
    which is the best effort one can make ( I guess ) just seems not so
    readable anymore, and it is not an idiom frequently used so one does
    not get easily acquainted to it :(
    Maybe Array#intersect would be nice to have too?

    Cheers
    Robert
     
    Robert Dober, Oct 10, 2007
    #10
  11. Heck, I use it so infrequently that I just thought of it. <G>
     
    Rick DeNatale, Oct 10, 2007
    #11
  12. Sebastian probst Eide

    Eric I. Guest

    Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
    The difference is whether the duplicates in a1 are maintained (the
    former) or the duplicates is a2 are maintained.

    For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
    The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

    If you wanted the & operator to maintain duplicates, would it maintain
    duplicates on the left-hand side, the right-hand side, or both? And
    if it is the LHS or RHS, then we've lost commuatativity (a1 & a2 is
    not necessarily equal to a2 & a1).

    Eric

    P.S. What is "RCR"?
     
    Eric I., Oct 11, 2007
    #12
  13. A "Ruby change request" - see http://rcrchive.net/

    Wolfgang Nádasi-Donner
     
    Wolfgang Nádasi-Donner, Oct 11, 2007
    #13
  14. Yep, you're right, not enough test cases! <G>

    I realize now that the OPs request for "an array which has the
    elements shared by two other arrays" is a little ambiguous as to
    duplications.
    Note that the OPs solutions and the solutions using select produce the
    same different results if you reverse the two arrays.

    a1 = [1,1,2,3,4]
    a2 = [1,2,2,3,5]
    a1 - (a1-a2) => [1, 1, 2, 3]
    a2 - (a2-a1)=> [1, 2, 2, 3]
    a1.select {|e| a2.include? e} => [1, 1, 2, 3]
    a2.select {|e| a1.include? e}=> [1, 2, 2, 3]

    Should the elements shared between [1,1,2,3,4] and [1,2,2,3,5] be
    different than the elements shared between [1,2,2,3,5] and
    [1,1,2,3,4]?
     
    Rick DeNatale, Oct 12, 2007
    #14
  15. I realize now that the OPs request for "an array which has the
    Well, I never really thought about the possibility of duplicates. In my
    case the arrays wouldn't contain any dupes, but I wouldn't want my
    resulting array to have any anyway!

    S
     
    Sebastian probst Eide, Oct 12, 2007
    #15
  16. In that case you should probably use Set anyway as typical set
    operations are more efficient with Set. It is just the right data type
    to model a set, which seems what you are trying to do.

    Kind regards

    robert
     
    Robert Klemme, Oct 13, 2007
    #16
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.