Set operations on arrays of non-trivial objects

Discussion in 'Ruby' started by Jon Egil Stand, Mar 7, 2007.

  1. # MAIN QUESTION

    # Is there a nice and fast way to do set operations on arrays of
    # non-trivial objects.


    # BACKGROUND

    # Set operations are beatiful:

    a = [1,2,3,4,5]
    b = [2,4,6,8,10]

    (a & b)
    # => [2,4]


    # When comparing lists, which is more or less what administration of
    # pension schemes is all about, this is very, very nice.

    # I very often use stuff like
    (a - (a&b))

    # => [1,3,5]

    # The array library is blazingly fast, I'm really happy with it.


    # The challenge arises when my lists don't comprise of fixnums, but instead of
    # more specified objects. To simplify:

    class Person
    def initialize(name, ssid)
    @name = name
    @ssid = ssid
    end

    attr_reader :name, :ssid
    end

    list = []
    list << Person.new('Peter Zapffe', 1)
    list << Person.new('Peter Pan', 2)
    list << Person.new('Saint Peter', 3)

    # Let's say I want to UNION that to (b = [2,4,6,8,10]) from above, using ssid
    # as key.It should in that case return the 'Peter Pan'-person, since he is the
    # only one with an ssid included i the list.

    (list & b)
    # => []

    # I can do stuff like:

    list_ssid = list.collect{|p| p.ssid}
    # => [1,2,3]

    union = list_ssid & b
    # => [2]

    list.select{|p| union.include? p.ssid}
    # => [#<Person:0x2b8b300 @name="Peter Pan", @ssid=2>]


    # This works, and correctness is always nice, but it doesn't scale very nice,
    # basicly because union.include? scans the union-list from scratch every time.

    # I have implemented this before, by sorting both lists and stepping through
    # them one at a time. That's still correct and much faster, but I have
    a feeling
    # it's possible to do it in a much more elegant way, using some sort of
    # set-operations.

    # I would appreciate any hints and or pointers.

    # Thank's for reading.
    # JE
    Jon Egil Stand, Mar 7, 2007
    #1
    1. Advertising

  2. On 07.03.2007 08:56, Jon Egil Stand wrote:
    > # MAIN QUESTION
    >
    > # Is there a nice and fast way to do set operations on arrays of
    > # non-trivial objects.
    >
    >
    > # BACKGROUND
    >
    > # Set operations are beatiful:
    >
    > a = [1,2,3,4,5]
    > b = [2,4,6,8,10]
    >
    > (a & b)
    > # => [2,4]
    >
    >
    > # When comparing lists, which is more or less what administration of
    > # pension schemes is all about, this is very, very nice.
    >
    > # I very often use stuff like
    > (a - (a&b))
    >
    > # => [1,3,5]
    >
    > # The array library is blazingly fast, I'm really happy with it.
    >
    >
    > # The challenge arises when my lists don't comprise of fixnums, but
    > instead of
    > # more specified objects. To simplify:
    >
    > class Person
    > def initialize(name, ssid)
    > @name = name
    > @ssid = ssid
    > end
    >
    > attr_reader :name, :ssid
    > end
    >
    > list = []
    > list << Person.new('Peter Zapffe', 1)
    > list << Person.new('Peter Pan', 2)
    > list << Person.new('Saint Peter', 3)
    >
    > # Let's say I want to UNION that to (b = [2,4,6,8,10]) from above, using
    > ssid
    > # as key.It should in that case return the 'Peter Pan'-person, since he
    > is the
    > # only one with an ssid included i the list.
    >
    > (list & b)
    > # => []
    >
    > # I can do stuff like:
    >
    > list_ssid = list.collect{|p| p.ssid}
    > # => [1,2,3]
    >
    > union = list_ssid & b
    > # => [2]
    >
    > list.select{|p| union.include? p.ssid}
    > # => [#<Person:0x2b8b300 @name="Peter Pan", @ssid=2>]


    You can do this simpler:

    list.select {|p| b.include? p.ssid}

    Note, make b a Set for more efficiency (see below).

    > # This works, and correctness is always nice, but it doesn't scale very
    > nice,
    > # basicly because union.include? scans the union-list from scratch
    > every time.
    >
    > # I have implemented this before, by sorting both lists and stepping
    > through
    > # them one at a time. That's still correct and much faster, but I have
    > a feeling
    > # it's possible to do it in a much more elegant way, using some sort of
    > # set-operations.
    >
    > # I would appreciate any hints and or pointers.
    >
    > # Thank's for reading.
    > # JE


    First, when you work with sets you should probably use Set - if your
    sets get large you will notice the speed difference.

    irb(main):002:0> a = [1,2,3,4,5].to_set
    => #<Set: {5, 1, 2, 3, 4}>
    irb(main):003:0> b = [2,4,6,8,10].to_set
    => #<Set: {6, 2, 8, 4, 10}>
    irb(main):004:0> a & b
    => #<Set: {2, 4}>

    Secondly, it seems you're basically creating subsets based on some
    criteria. If you just had a single key, you could define your class's
    equivalence (#hash, #eql?, #==) to reflect that and use the approach you
    tried, i.e. using set unions.

    The ad hoc solution is to actually use #select as you did. But if the
    criteria change, you rather need a structure that resembles an RDBMS
    table which can have any number of indexes. If you have to do this
    multiple times probably also with varying criteria you might want to
    create a wrapper around a Set that can support arbitrary number of
    indexes (Hashes) and keeps those consistent much like an RDBMS does.

    You might find some more information in the archives of this list.

    Kind regards

    robert
    Robert Klemme, Mar 7, 2007
    #2
    1. Advertising

  3. Robert

    Thank you, I will heed your advice. Set is a nice library, and it's
    use of Hash for blazingly fast lookup scales very well.
    Jon Egil Stand, Mar 8, 2007
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jesus M. Salvo Jr.
    Replies:
    2
    Views:
    4,043
    robert
    Feb 11, 2006
  2. William Payne
    Replies:
    5
    Views:
    454
    William Payne
    Oct 12, 2004
  3. Michal Raburski

    wxPtyhon 2.4 non trivial problems :D

    Michal Raburski, Mar 11, 2005, in forum: Python
    Replies:
    0
    Views:
    266
    Michal Raburski
    Mar 11, 2005
  4. baibaichen

    trivial or non-trivial object

    baibaichen, Jan 12, 2006, in forum: C++
    Replies:
    3
    Views:
    885
    osmium
    Jan 12, 2006
  5. Eric Lilja
    Replies:
    2
    Views:
    325
    Eric Lilja
    Aug 14, 2007
Loading...

Share This Page