remove duplicates of array of object based on a attribute

Discussion in 'Ruby' started by senthil, Mar 6, 2007.

  1. senthil

    senthil Guest

    Hi all,
    How can I remove duplicates from an array of objects based on an
    attribute of the objects? For example, I have an array of Ruby beans
    named diagnoses, and I want to remove duplicates from it based on the
    diagnosis id. Assume each diagnosis has the attributes id and
    weightage. For two diagnoses with the same id and different weightage,
    the one with the lower weightage should be removed.
    Can anyone help me?

    --
    Posted via http://www.ruby-forum.com/.
    senthil, Mar 6, 2007
    #1

  2. senthil

    Phrogz Guest

    Re: remove duplicates of array of object based on a attribute

    On Mar 6, 7:03 am, senthil <> wrote:
    > hi all,
    > how to remove duplicates of an array of objects based a
    > attribute of the object. For ex
    > i am having an array of ruby beans named diagnoses . i want
    > remove duplicates from the based on the diagnoses id. assume diagnoses
    > have attributes id and weightage .So for two diagnoses with same id and
    > different weightage , the diagnoses with lower weightage should be
    > removed.


    Here's my best shot at it:

    require 'set'
    class Array
      def uniq_by
        seen = Set.new
        select{ |x| seen.add?( yield( x ) ) }
      end
    end

    a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3} ]
    p a, a.uniq, a.uniq_by{ |h| h[:a] }
    #=> [{:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3}]
    #=> [{:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3}]
    #=> [{:a=>1, :d=>1}, {:b=>2}]

    (Note how :b=>2 and :c=>3 have the same value for :a (nil), so only
    one is included.)
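
For readers on a newer Ruby: `Array#uniq` has accepted a block since Ruby 1.9.2, so the first-seen behaviour above is available without patching Array. The sketch below assumes such a version. The helper name `uniq_by_key` is hypothetical and not from the thread; it shows one possible way to avoid the nil collapse noted above by keeping every element that lacks the key.

```ruby
# Built-in block form: keeps the first element seen for each block value.
a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3} ]
p a.uniq { |h| h[:a] }

# Hypothetical helper (not from the thread): de-duplicate only the elements
# that actually have the key, and keep all the others.
# Note that elements without the key are moved to the end of the result.
def uniq_by_key(items, key)
  with_key, without_key = items.partition { |h| h.key?(key) }
  with_key.uniq { |h| h[key] } + without_key
end

p uniq_by_key(a, :a)
```

Whether elements without the attribute should be kept or collapsed depends on the data; the original `uniq_by` collapses them, this variant does not.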

    Here's another (assumedly slower) version that doesn't rely on Set:

    class Array
      def uniq_by
        seen = {}
        select{ |x|
          v = yield(x)
          !seen[v] && (seen[v]=true)
        }
      end
    end
    Phrogz, Mar 6, 2007
    #2

  3. senthil

    Guest

    Re: remove duplicates of array of object based on a attribute

    On 3/6/07, senthil <> wrote:
    > i am having an array of ruby beans named diagnoses . i want
    > remove duplicates from the based on the diagnoses id. assume diagnoses
    > have attributes id and weightage .So for two diagnoses with same id and
    > different weightage , the diagnoses with lower weightage should be
    > removed.
    > Can anyone help me??


    From: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/228538

    module Enumerable
      def group_by(&b)
        h = Hash.new{|h,k| h[k] = []}
        each{|x| h[x.instance_eval(&b)] << x}
        h.values
      end
    end

    old_diagnoses = [
      {:id => 1, :w => 30},
      {:id => 2, :w => 20},
      {:id => 3, :w => 10},
      {:id => 1, :w => 10},
      {:id => 1, :w => 40},
      {:id => 2, :w => 50},
      {:id => 4, :w => 60},
      {:id => 4, :w => 30},
      {:id => 2, :w => 20},
      {:id => 3, :w => 10}
    ]
    new_diagnoses = []

    groups = old_diagnoses.group_by{ |d| d[:id] }

    groups.each do |group|
      new_diagnoses << group.sort_by{ |g| g[:w] }.last
    end

    p old_diagnoses
    p new_diagnoses

    [{:w=>30, :id=>1}, {:w=>20, :id=>2}, {:w=>10, :id=>3}, {:w=>10, :id=>1},
    {:w=>40, :id=>1}, {:w=>50, :id=>2}, {:w=>60, :id=>4}, {:w=>30, :id=>4},
    {:w=>20, :id=>2}, {:w=>10, :id=>3}]

    [{:w=>40, :id=>1}, {:w=>50, :id=>2}, {:w=>10, :id=>3}, {:w=>60, :id=>4}]
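
On Ruby 1.8.7 and later, `Enumerable#group_by` and `Enumerable#max_by` are built in, so the same idea can probably be written without defining `group_by` by hand. Note that the built-in `group_by` returns a Hash (key => members) rather than an array of groups. A minimal sketch under that assumption; the method name `heaviest_per_id` is made up for illustration.

```ruby
# Hypothetical helper: keep, for each :id, the entry with the highest :w.
def heaviest_per_id(diagnoses)
  diagnoses
    .group_by { |d| d[:id] }                    # { id => [entries with that id] }
    .values
    .map { |group| group.max_by { |d| d[:w] } } # heaviest entry of each group
end

old_diagnoses = [
  {:id => 1, :w => 30}, {:id => 2, :w => 20}, {:id => 3, :w => 10},
  {:id => 1, :w => 10}, {:id => 1, :w => 40}, {:id => 2, :w => 50},
  {:id => 4, :w => 60}, {:id => 4, :w => 30}, {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]

p heaviest_per_id(old_diagnoses)
```

Using `max_by` instead of `sort_by{...}.last` avoids sorting each group just to pick one element.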
    , Mar 6, 2007
    #3
  4. senthil

    Phrogz Guest

    Re: remove duplicates of array of object based on a attribute

    On Mar 6, 7:27 am, "Phrogz" <> wrote:
    > Here's another (assumedly slower) version that doesn't rely on Set:


    Huh...actually, the hash-based one seems faster than the Set-based
    one:

    require 'set'
    class Array
      def uniq_by1
        seen = Set.new
        select{ |x| seen.add?( yield( x ) ) }
      end
      def uniq_by2
        seen = {}
        select{ |x| !seen[v=yield(x)] && (seen[v]=true) }
      end
    end

    require 'benchmark'
    a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3},
          {:a=>2, :e=>7}, {:a=>3, :b=>2}, {:a=>1}, {:a=>4}, {:f=>6} ]
    N = 10_000
    Benchmark.bmbm{ |x|
      x.report( 'with_set' ){
        N.times{
          a.uniq_by1{ |h| h[:a] }
          a.uniq_by1{ |h| h[:b] }
        }
      }
      x.report( 'with_hash' ){
        N.times{
          a.uniq_by2{ |h| h[:a] }
          a.uniq_by2{ |h| h[:b] }
        }
      }
    }

    #=> Rehearsal ---------------------------------------------
    #=> with_set    1.840000   0.030000   1.870000 (  2.401238)
    #=> with_hash   1.270000   0.030000   1.300000 (  1.701307)
    #=> ------------------------------------ total: 3.170000sec
    #=>
    #=>                 user     system      total        real
    #=> with_set    1.820000   0.020000   1.840000 (  2.187477)
    #=> with_hash   1.250000   0.020000   1.270000 (  1.555490)

    (Yes, my laptop is rather old and slow.)
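
Before reading much into timings like these, it may be worth confirming that the two variants return the same result, since a benchmark of two methods that disagree is not comparing like with like. A small sketch of such a check; the standalone method names `uniq_with_set` and `uniq_with_hash` are hypothetical stand-ins for `uniq_by1` and `uniq_by2`, written as plain methods so that nothing is patched.

```ruby
require 'set'

# Stand-in for uniq_by1: Set#add? returns nil when the key was already present.
def uniq_with_set(items)
  seen = Set.new
  items.select { |x| seen.add?(yield(x)) }
end

# Stand-in for uniq_by2: a Hash used as a "seen" table.
def uniq_with_hash(items)
  seen = {}
  items.select { |x| v = yield(x); !seen[v] && (seen[v] = true) }
end

a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3},
      {:a=>2, :e=>7}, {:a=>3, :b=>2}, {:a=>1}, {:a=>4}, {:f=>6} ]

[:a, :b].each do |key|
  via_set  = uniq_with_set(a)  { |h| h[key] }
  via_hash = uniq_with_hash(a) { |h| h[key] }
  puts "#{key}: #{via_set == via_hash ? 'same result' : 'DIFFERENT'}"
end
```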
    Phrogz, Mar 6, 2007
    #4
  5. senthil

    Pit Capitain Guest

    senthil, please don't take this personally, your question is OK, but the
    following sounds so very wrong:

    > i am having an array of ruby beans (...)


    All we have in Ruby are objects. No beans, POROs, ERBs, and all this cruft.

    Regards,
    Pit
    Pit Capitain, Mar 6, 2007
    #5
  6. Re: remove duplicates of array of object based on a attribute

    And here's the inevitable one-liner... :}

    (But I do prefer the group_by version...)

    gegroet,
    Erik V. - http://www.erikveen.dds.nl/

    ----------------------------------------------------------------

    ################################################################

    arr = [
      {:id => 1, :w => 30},
      {:id => 2, :w => 20},
      {:id => 3, :w => 10},
      {:id => 1, :w => 10},
      {:id => 1, :w => 40},
      {:id => 2, :w => 50},
      {:id => 4, :w => 60},
      {:id => 4, :w => 30},
      {:id => 2, :w => 20},
      {:id => 3, :w => 10}
    ]

    ################################################################

    res1=arr.inject({}){|h,o|(h[o[:id]]||=[])<<o;h}.values.map{|a|
    a.sort_by{|o|o[:w]}.pop}

    ################################################################

    res2 =
      arr.inject({}) do |h,o|
        (h[o[:id]] ||= []) << o ; h
      end.values.collect do |a|
        a.sort_by do |o|
          o[:w]
        end.pop
      end

    ################################################################

    module Enumerable
      def hash_by(&block)
        inject({}){|h, o| (h[block.call(o)] ||= []) << o ; h}
      end

      def group_by(&block)
        hash_by(&block).sort.transpose.pop
      end
    end

    res3 =
      arr.group_by do |o|
        o[:id]
      end.collect do |a|
        a.sort_by do |o|
          o[:w]
        end.pop
      end

    ################################################################

    p res1
    p res2
    p res3

    ################################################################

    ----------------------------------------------------------------
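
All three variants above collect every entry per id and then sort each group. If only the heaviest entry per id is needed, a single pass that remembers the best entry seen so far should be enough. This is a sketch rather than anything posted in the thread; `max_per_id` is a made-up name, and `each_with_object` assumes Ruby 1.8.7 or later.

```ruby
# Hypothetical helper: one pass, no grouping and no sorting.
def max_per_id(items)
  best = items.each_with_object({}) do |o, h|
    current = h[o[:id]]
    # Replace the remembered entry only when the new one is strictly heavier.
    h[o[:id]] = o if current.nil? || o[:w] > current[:w]
  end
  best.values
end

arr = [
  {:id => 1, :w => 30}, {:id => 2, :w => 20}, {:id => 3, :w => 10},
  {:id => 1, :w => 10}, {:id => 1, :w => 40}, {:id => 2, :w => 50},
  {:id => 4, :w => 60}, {:id => 4, :w => 30}, {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]

p max_per_id(arr)
```

On Ruby 1.9 and later the result follows the order in which each id first appears, because hashes keep insertion order there.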
    Erik Veenstra, Mar 6, 2007
    #6
  7. senthil

    Phrogz Guest

    Re: remove duplicates of array of object based on a attribute

    Erik Veenstra wrote:
    > And here's the inevitable one-liner... :}


    Not that we're golfing, but I like this one better in terms of one-
    linedness:
    Hash[ *map{ |o| [ o[:id], o ] }.flatten ].values
    Phrogz, Mar 6, 2007
    #7
  8. senthil

    Phrogz Guest

    Re: remove duplicates of array of object based on a attribute

    On Mar 6, 1:47 pm, "Phrogz" <> wrote:
    > Erik Veenstra wrote:
    > > And here's the inevitable one-liner... :}

    >
    > Not that we're golfing, but I like this one better in terms of one-
    > linedness:
    > Hash[ *map{ |o| [ o[:id], o ] }.flatten ].values


    Oops, I meant:
    Hash[ *a.map{ |o| [ o[:id], o ] }.flatten ].values
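
A note on the `.flatten` step: it works here because the elements are hashes, but it would also flatten elements that are themselves arrays. On Ruby 2.1 and later, `Array#to_h` builds the hash straight from the `[key, value]` pairs, so the splat and flatten can be dropped. A sketch assuming such a version; `last_per_id` is a hypothetical name.

```ruby
# Hypothetical helper: later pairs overwrite earlier ones with the same key,
# so the LAST element seen for each :id is the one that survives.
def last_per_id(items)
  items.map { |o| [ o[:id], o ] }.to_h.values
end

a = [ {:id => 1, :w => 30}, {:id => 2, :w => 20}, {:id => 1, :w => 10} ]
p last_per_id(a)
```

Like the one-liner above, this keeps whichever entry comes last for an id, not necessarily the one with the highest weight.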
    Phrogz, Mar 6, 2007
    #8
  9. senthil

    Phrogz Guest

    Re: remove duplicates of array of object based on a attribute

    On Mar 6, 7:40 am, "Phrogz" <> wrote:
    > On Mar 6, 7:27 am, "Phrogz" <> wrote:
    >
    > > Here's another (assumedly slower) version that doesn't rely on Set:

    >
    > Huh...actually, the hash-based one seems faster than the Set-based
    > one:


    And faster still, by a hair, is a last-in approach. Upon reflection,
    all these techniques rely only on methods already in Enumerable, so
    they can be put there instead of being Array-specific.

    module Enumerable
      require 'set'
      def uniq_by1
        seen = Set.new
        select{ |x| seen.add?( yield( x ) ) }
      end
      def uniq_by2
        seen = {}
        select{ |x| !seen[v=yield(x)] && (seen[v]=true) }
      end
      def uniq_by3
        Hash[ *map{ |x| [ yield(x), x ] }.flatten ].values
      end

      def uniq_by4
        # fastest, preserves last-seen value for a key
        h = {}
        each{ |x| h[yield(x)] = x }
        h.values
      end

      def uniq_by5
        # near-fastest, preserves first-seen value for a key
        h = {}
        each{ |x| v=yield(x); h[v]=x unless h.include?(v) }
        h.values
      end
    end

    a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3},
          {:a=>2, :e=>7}, {:a=>3, :b=>2}, {:a=>1}, {:a=>4}, {:f=>6} ]

    require 'benchmark'
    N = 20_000
    Benchmark.bmbm{ |x|
      x.report( 'with set' ){
        N.times{
          a.uniq_by1{ |h| h[:a] }
          a.uniq_by1{ |h| h[:b] }
        }
      }
      x.report( 'with hash' ){
        N.times{
          a.uniq_by2{ |h| h[:a] }
          a.uniq_by2{ |h| h[:b] }
        }
      }
      x.report( 'Hash.[].values' ){
        N.times{
          a.uniq_by3{ |h| h[:a] }
          a.uniq_by3{ |h| h[:b] }
        }
      }
      x.report( '#values (last in)' ){
        N.times{
          a.uniq_by4{ |h| h[:a] }
          a.uniq_by4{ |h| h[:b] }
        }
      }
      x.report( '#values (first in)' ){
        N.times{
          a.uniq_by5{ |h| h[:a] }
          a.uniq_by5{ |h| h[:b] }
        }
      }
    }

    #=> Rehearsal ------------------------------------------------------
    #=> with set             2.500000   0.016000   2.516000 (  2.547000)
    #=> with hash            1.312000   0.000000   1.312000 (  1.313000)
    #=> Hash.[].values       2.453000   0.000000   2.453000 (  2.453000)
    #=> #values (last in)    1.110000   0.000000   1.110000 (  1.109000)
    #=> #values (first in)   1.296000   0.000000   1.296000 (  1.297000)
    #=> --------------------------------------------- total: 8.687000sec
    #=>
    #=>                          user     system      total        real
    #=> with set             2.000000   0.000000   2.000000 (  1.999000)
    #=> with hash            1.297000   0.000000   1.297000 (  1.297000)
    #=> Hash.[].values       2.531000   0.000000   2.531000 (  2.532000)
    #=> #values (last in)    1.125000   0.015000   1.140000 (  1.140000)
    #=> #values (first in)   1.344000   0.000000   1.344000 (  1.344000)
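
The difference between the last-in and first-in variants matters for correctness as well as speed, since they can return different elements for the same key. A small sketch to make that visible; `uniq_last` and `uniq_first` are hypothetical standalone versions of `uniq_by4` and `uniq_by5`, written as plain methods so that Enumerable is not patched.

```ruby
# Last-in: each later element overwrites the stored one for its key.
def uniq_last(items)
  h = {}
  items.each { |x| h[yield(x)] = x }
  h.values
end

# First-in: an element is stored only if its key has not been seen yet.
def uniq_first(items)
  h = {}
  items.each { |x| k = yield(x); h[k] = x unless h.include?(k) }
  h.values
end

rows = [ {:id => 1, :w => 30}, {:id => 2, :w => 20}, {:id => 1, :w => 40} ]

p uniq_first(rows) { |r| r[:id] }   # id 1 keeps the :w => 30 entry
p uniq_last(rows)  { |r| r[:id] }   # id 1 keeps the :w => 40 entry
```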
    Phrogz, Mar 6, 2007
    #9
  10. Re: remove duplicates of array of object based on a attribute

    > Hash[ *a.map{ |o| [ o[:id], o ] }.flatten ].values

    Not bad...

    How does this ensure that the maximum :w is used?

    gegroet,
    Erik V. - http://www.erikveen.dds.nl/
    Erik Veenstra, Mar 6, 2007
    #10
  11. senthil

    Guest

    Re: remove duplicates of array of object based on a attribute

    On 3/6/07, Erik Veenstra <> wrote:
    > > Hash[ *a.map{ |o| [ o[:id], o ] }.flatten ].values

    >
    > Not bad...
    >
    > How does this ensure that the maximum :w is used?


    Hash[ *a.map{ |o| [ o[:id], o ] }.flatten ].values
    => [{:id=>1, :w=>40}, {:id=>2, :w=>20}, {:id=>3, :w=>10}, {:id=>4, :w=>30}]

    Hash[*(a.sort_by{|z|z[:id]}).map{|o|[o[:id],o]}.flatten].values
    => [{:id=>1, :w=>40}, {:id=>2, :w=>50}, {:id=>3, :w=>10}, {:id=>4, :w=>60}]
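
A caution on the second line: sorting by `:id` only puts equal ids next to each other. `sort_by` is not guaranteed to be stable, so nothing in that expression decides which entry of an id comes last, and the maximum `:w` may come out by luck of the ordering. One way to make the intent explicit is to sort by weight first, so that the last entry stored for every id is its heaviest. A sketch under that assumption; `heaviest_by_sorting` is a made-up name.

```ruby
# Hypothetical helper: sort ascending by :w, then let later (heavier) entries
# overwrite earlier ones that share the same :id.
def heaviest_by_sorting(items)
  items.sort_by { |o| o[:w] }
       .each_with_object({}) { |o, h| h[o[:id]] = o }
       .values
end

a = [
  {:id => 1, :w => 30}, {:id => 2, :w => 20}, {:id => 3, :w => 10},
  {:id => 1, :w => 10}, {:id => 1, :w => 40}, {:id => 2, :w => 50},
  {:id => 4, :w => 60}, {:id => 4, :w => 30}, {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]

# The order of the result depends on how ties in :w are sorted,
# so sort by :id again for display.
p heaviest_by_sorting(a).sort_by { |o| o[:id] }
```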
    , Mar 6, 2007
    #11
