Array::uniq { block } ?

Discussion in 'Ruby' started by Belorion, Jan 26, 2005.

  1. Belorion

    Belorion Guest

    I have an array of arrays. I want to be able to do a uniq operation
    on my array, and have it accept a block so I can specify what to
    examine for uniqueness. For example, in IRB

    a = [ [1,2], [3,4], [1,3] ]
    a.sort { |x,y| x[0] <=> y[0] }

    returns => [ [1,2], [1,3], [3,4] ]

    However, it does not appear uniq accepts blocks:

    a = [ [1,2], [3,4], [1,3] ]
    a.uniq { |x,y| x[0] == y[0] }

    returns => [ [1,2], [3,4], [1,3] ]
    and a.uniq! with the above block returns nil.

    Am I stuck writing my own deep_uniq? Granted, it would be hard, but
    certainly not as clean as the above.
    Belorion, Jan 26, 2005
    #1
    1. Advertising

  2. Belorion wrote:
    > I have an array of arrays. I want to be able to do a uniq operation
    > on my array, and have it accept a block so I can specify what to
    > examine for uniqueness. For example, in IRB
    >
    > a = [ [1,2], [3,4], [1,3] ]
    > a.sort { |x,y| x[0] <=> y[0] }
    >
    > returns => [ [1,2], [1,3], [3,4] ]
    >
    > However, it does not appear uniq accepts blocks:
    >
    > a = [ [1,2], [3,4], [1,3] ]
    > a.uniq { |x,y| x[0] == y[0] }
    >
    > returns => [ [1,2], [3,4], [1,3] ]
    > and a.uniq! with the above block returns nil.
    >
    > Am I stuck writing my own deep_uniq? Granted, it would be hard, but
    > certainly not as clean as the above.


    Not that this is a bad idea, but why do I get this feeling that we're
    eventually going to find an excuse to allow blocks to virtually every
    method in the core library?

    Where does it all end?!

    Dan
    Daniel Berger, Jan 26, 2005
    #2
    1. Advertising

  3. "Belorion" <> schrieb im Newsbeitrag
    news:...
    >I have an array of arrays. I want to be able to do a uniq operation
    > on my array, and have it accept a block so I can specify what to
    > examine for uniqueness. For example, in IRB
    >
    > a = [ [1,2], [3,4], [1,3] ]
    > a.sort { |x,y| x[0] <=> y[0] }
    >
    > returns => [ [1,2], [1,3], [3,4] ]
    >
    > However, it does not appear uniq accepts blocks:
    >
    > a = [ [1,2], [3,4], [1,3] ]
    > a.uniq { |x,y| x[0] == y[0] }
    >
    > returns => [ [1,2], [3,4], [1,3] ]
    > and a.uniq! with the above block returns nil.
    >
    > Am I stuck writing my own deep_uniq? Granted, it would be hard, but
    > certainly not as clean as the above.


    The block is accepted but apparently not used:

    >> %w{a b c d a d}.uniq {|*x| p x}

    => ["a", "b", "c", "d"]

    Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

    How about using a Hash in the meantime?

    >> h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

    => {1=>[1, 2], 3=>[3, 4]}
    >> h.values

    => [[1, 2], [3, 4]]

    Or

    >> h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

    => {1=>[1, 2], 3=>[3, 4]}
    >> h.values

    => [[1, 2], [3, 4]]

    Kind regards

    robert
    Robert Klemme, Jan 26, 2005
    #3
  4. "Robert Klemme" <> schrieb im Newsbeitrag
    news:...
    >
    > "Belorion" <> schrieb im Newsbeitrag
    > news:...
    >>I have an array of arrays. I want to be able to do a uniq operation
    >> on my array, and have it accept a block so I can specify what to
    >> examine for uniqueness. For example, in IRB
    >>
    >> a = [ [1,2], [3,4], [1,3] ]
    >> a.sort { |x,y| x[0] <=> y[0] }
    >>
    >> returns => [ [1,2], [1,3], [3,4] ]
    >>
    >> However, it does not appear uniq accepts blocks:
    >>
    >> a = [ [1,2], [3,4], [1,3] ]
    >> a.uniq { |x,y| x[0] == y[0] }
    >>
    >> returns => [ [1,2], [3,4], [1,3] ]
    >> and a.uniq! with the above block returns nil.
    >>
    >> Am I stuck writing my own deep_uniq? Granted, it would be hard, but
    >> certainly not as clean as the above.

    >
    > The block is accepted but apparently not used:
    >
    >>> %w{a b c d a d}.uniq {|*x| p x}

    > => ["a", "b", "c", "d"]
    >
    > Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)


    Delete that: you can provide a block for *every* method - it's simply
    ignored. So no bug, but unique obviously is not built to use it. Stupid
    me...

    robert


    >
    > How about using a Hash in the meantime?
    >
    >>> h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

    > => {1=>[1, 2], 3=>[3, 4]}
    >>> h.values

    > => [[1, 2], [3, 4]]
    >
    > Or
    >
    >>> h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

    > => {1=>[1, 2], 3=>[3, 4]}
    >>> h.values

    > => [[1, 2], [3, 4]]
    >
    > Kind regards
    >
    > robert
    >
    Robert Klemme, Jan 26, 2005
    #4
  5. Belorion

    ts Guest

    >>>>> "i" == itsme213 <> writes:

    i> I agree, #uniq is certainly not the only candidate. I think it would be very
    i> useful to compile a list of all such methods that could use a block to
    i> override some default internal method (#== in the case of uniq), and
    i> consider these for 2.0.

    #uniq don't use #==, it use #hash and #eql?


    Guy Decoux
    ts, Jan 27, 2005
    #5
  6. "ts" <> schrieb im Newsbeitrag
    news:...
    > >>>>> "i" == itsme213 <> writes:

    >
    > i> I agree, #uniq is certainly not the only candidate. I think it would

    be very
    > i> useful to compile a list of all such methods that could use a block

    to
    > i> override some default internal method (#== in the case of uniq), and
    > i> consider these for 2.0.
    >
    > #uniq don't use #==, it use #hash and #eql?


    Hm...

    class Dummy
    def eql?(x) puts "eql?"; super end
    def equal?(x) puts "equal?"; super end
    def ==(x) puts "=="; super end
    def hash() puts "hash"; super end
    def id() puts "id"; super end
    end

    >> h={Dummy.new => 1, Dummy.new =>2}

    hash
    hash
    => {#<Dummy:0x10187428>=>2, #<Dummy:0x10187440>=>1}
    >> h.keys.concat(h.keys).uniq

    hash
    hash
    hash
    hash
    hash
    hash
    hash
    hash
    => [#<Dummy:0x10187428>, #<Dummy:0x10187440>]

    I don't see eql? called here - maybe it's an internal optimization that
    short circuits identity. Or did I miss something?

    Kind regards

    robert
    Robert Klemme, Jan 27, 2005
    #6
  7. ts <> wrote:
    > >>>>> "i" == itsme213 <> writes:

    >
    > i> I agree, #uniq is certainly not the only candidate. I think it would be very
    > i> useful to compile a list of all such methods that could use a block to
    > i> override some default internal method (#== in the case of uniq), and
    > i> consider these for 2.0.
    >
    > #uniq don't use #==, it use #hash and #eql?


    Which raises the problem that it works globally, unlike the unix uniq
    which only uniqs consecutive elements. Passing in a block would lead to
    a performance hit since you'd need to compare every pair of elements
    without the benefit of a hash function. A uniq_by would probably cover
    most of the use cases anyway.

    martin
    Martin DeMello, Jan 27, 2005
    #7
  8. Belorion

    ts Guest

    >>>>> "R" == Robert Klemme <> writes:

    R> I don't see eql? called here - maybe it's an internal optimization that
    R> short circuits identity. Or did I miss something?

    If 2 object have different hash value, it don't need to call eql?

    Now if they have the same hash value :
    * if eql? return true, this is the "same" object (i.e. it will use the
    same key)
    * otherwise there is collision

    Object#hash return the id of object



    Guy Decoux
    ts, Jan 27, 2005
    #8
  9. "ts" <> schrieb im Newsbeitrag
    news:...
    > >>>>> "R" == Robert Klemme <> writes:

    >
    > R> I don't see eql? called here - maybe it's an internal optimization

    that
    > R> short circuits identity. Or did I miss something?
    >
    > If 2 object have different hash value, it don't need to call eql?


    Ah, yes of course. That's it! Thanks for reminding me.

    > Now if they have the same hash value :
    > * if eql? return true, this is the "same" object (i.e. it will use the
    > same key)
    > * otherwise there is collision


    Of all what you said I don't understand your last point: Why is there a
    collision?

    >> class Foo; def hash() 0 end end

    => nil
    >> h = {Foo.new => 1, Foo.new => 2}

    => {#<Foo:0x101a9808>=>2, #<Foo:0x101a9898>=>1}
    >> h.keys.uniq

    => [#<Foo:0x101a9808>, #<Foo:0x101a9898>]

    And indeed a redefined Dummy clarifies this:

    class Dummy
    def eql?(x) puts "eql?"; super end
    def equal?(x) puts "equal?"; super end
    def ==(x) puts "=="; super end
    def hash() puts "hash"; 0 end
    def id() puts "id"; super end
    end

    >> h={Dummy.new => 1, Dummy.new => 2}

    hash
    hash
    eql?
    => {#<Dummy:0x1018da28>=>2, #<Dummy:0x1018da88>=>1}
    >> h.keys.uniq

    hash
    hash
    eql?
    => [#<Dummy:0x1018da28>, #<Dummy:0x1018da88>]

    Again a reminder that it's always a good idea to override #hash, #== and
    #eql? in one go for classes that should be used as hash keys or in sets.

    > Object#hash return the id of object


    Yeah, but that might be overridden in an arbitrary way. Normally one
    can't rely on that #hash tells anything about the identity of an instance
    other than probably that instances with different hash are not identical
    and not equivalent.

    Kind regards

    robert
    Robert Klemme, Jan 27, 2005
    #9
  10. Belorion

    ts Guest

    >>>>> "R" == Robert Klemme <> writes:

    R> Of all what you said I don't understand your last point: Why is there a
    R> collision?

    Array#uniq just store the element of the array as the key of a hash.

    For an hash, 2 values are stored in differents keys if they have a
    different hash value, or if eql? return false.

    In hash "terminology", there is collision when 2 elements have the same hash
    value but are different (eql? return false) (see st.c)



    Guy Decoux
    ts, Jan 27, 2005
    #10
  11. "ts" <> schrieb im Newsbeitrag
    news:...
    > >>>>> "R" == Robert Klemme <> writes:

    >
    > R> Of all what you said I don't understand your last point: Why is there

    a
    > R> collision?
    >
    > Array#uniq just store the element of the array as the key of a hash.
    >
    > For an hash, 2 values are stored in differents keys if they have a


    I think the term is "bucket" and not "key". Maybe that was confusing me.

    > different hash value, or if eql? return false.
    >
    > In hash "terminology", there is collision when 2 elements have the same

    hash
    > value but are different (eql? return false) (see st.c)


    Ok, I now see what "collision" you meant. Of course there is a collision
    but still both keys are stored.

    Thx for clarifying!

    robert

    >
    >
    >
    > Guy Decoux
    >
    >
    >
    Robert Klemme, Jan 27, 2005
    #11
  12. Hi,

    Am Donnerstag, 27. Jan 2005, 20:11:35 +0900 schrieb ts:
    > >>>>> "R" == Robert Klemme <> writes:

    >
    > R> I don't see eql? called here - maybe it's an internal optimization that
    > R> short circuits identity. Or did I miss something?
    >
    > If 2 object have different hash value, it don't need to call eql?
    >
    > Now if they have the same hash value :
    > * if eql? return true, this is the "same" object (i.e. it will use the
    > same key)
    > * otherwise there is collision


    What a look at the interpreter source code will validate.
    The elements addes as hash keys.

    Strings are treated special, by the way.

    Bertram

    --
    Bertram Scharpf
    Stuttgart, Deutschland/Germany
    http://www.bertram-scharpf.de
    Bertram Scharpf, Jan 27, 2005
    #12
  13. itsme213 <> wrote:
    > "Martin DeMello" <> wrote in message
    >
    > > A uniq_by would probably cover
    > > most of the use cases anyway.

    >
    > Why a different method? Is that the way it is done elsewhere? Why not
    > def uniq
    > if block_given?
    > ... less_efficient
    > else default_way
    > end


    By analogy with sort and sort_by

    a.uniq_by {|i| i[1]} would work like uniq, but hashing i rather than
    i. a.uniq {|i,j| f(i,j) == true } would be the generalised method, but
    as noted that's O(n^2) rather than O(n). (Also I'm not sure it'd be
    algorithmically sound - you could introduce order dependencies with a
    sufficiently adversarial comparison function.)

    martin
    Martin DeMello, Jan 28, 2005
    #13
  14. Belorion

    Trans Guest

    > By analogy with sort and sort_by

    > a.uniq_by {|i| i[1]} would work like uniq, but hashing i
    > rather than i. a.uniq {|i,j| f(i,j) == true } would be the
    > generalised method, but as noted that's O(n^2) rather
    > than O(n). (Also I'm not sure it'd be algorithmically sound
    > - you could introduce order dependencies with a sufficiently
    > adversarial comparison function.)


    Has anyone, could anyone, write a Ruby version of this method? I'm not
    sure what its supposed to do exaclty. Id like to see source.
    Thanks,
    T.
    Trans, Jan 30, 2005
    #14
  15. Belorion

    Pit Capitain Guest

    Trans schrieb:
    > Has anyone, could anyone, write a Ruby version of this method? I'm not
    > sure what its supposed to do exaclty. Id like to see source.


    No problem:

    require 'set'

    class Array
    def uniq_by
    result = []
    values = Set.new
    each do |elem|
    value = yield elem
    unless values.include? value
    values << value
    result << elem
    end
    end
    result
    end
    end

    p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

    I bet Robert will transform this into a version using inject ;-)

    Regards,
    Pit
    Pit Capitain, Jan 30, 2005
    #15
  16. Trans <> wrote:
    >
    > Has anyone, could anyone, write a Ruby version of this method? I'm not
    > sure what its supposed to do exaclty. Id like to see source.


    module Enumerable
    def uniq_by(*args, &blk)
    blk = lambda {|i| i.send(*args)} unless args.empty?
    raise "needs args or block" unless blk
    h = {}
    a = []
    each {|i|
    k = blk.call(i)
    a << i unless h[k]
    h[k] = true
    }
    a
    end
    end

    p (-5..5).to_a.uniq_by {|i| i*i}
    p [[1, 2], [3, 4], [5, 6], [7, 2]].uniq_by:)at, 1)

    martin
    Martin DeMello, Jan 30, 2005
    #16
  17. Belorion

    Trans Guest

    Trans, Jan 30, 2005
    #17
  18. Pit Capitain wrote:

    > Trans schrieb:
    >
    >> Has anyone, could anyone, write a Ruby version of this method? I'm not
    >> sure what its supposed to do exaclty. Id like to see source.

    >
    > No problem:
    >
    > require 'set'
    >
    > class Array
    > def uniq_by
    > result = []
    > values = Set.new
    > each do |elem|
    > value = yield elem
    > unless values.include? value
    > values << value
    > result << elem
    > end
    > end
    > result
    > end
    > end
    >
    > p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]
    >
    > I bet Robert will transform this into a version using inject ;-)


    module Enumerable
    def uniq_by()
    inject([]) do |state, item|
    value = yield(item)
    state.include?(value) ? state : state + [item]
    end
    end
    end
    Florian Gross, Jan 30, 2005
    #18
  19. "Pit Capitain" <> schrieb im Newsbeitrag
    news:...
    > Trans schrieb:
    >> Has anyone, could anyone, write a Ruby version of this method? I'm not
    >> sure what its supposed to do exaclty. Id like to see source.

    >
    > No problem:
    >
    > require 'set'
    >
    > class Array
    > def uniq_by
    > result = []
    > values = Set.new
    > each do |elem|
    > value = yield elem
    > unless values.include? value
    > values << value
    > result << elem
    > end
    > end
    > result
    > end
    > end
    >
    > p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]
    >
    > I bet Robert will transform this into a version using inject ;-)


    Since you asked... :))

    module Enumerable
    def uniq_by
    inject({}) {|h,x| h[yield(x)] ||= x; h}.values
    end
    end

    >> a=(1..10).map {rand 10}

    => [2, 7, 4, 1, 9, 0, 0, 2, 5, 0]
    >> a.uniq_by {|x| x%3}

    => [9, 7, 2]

    Pro:
    - shorter
    - needs one less temp instace

    Con:
    - not stable, original order is not maintained

    A stable version:

    module Enumerable
    def uniq_by
    h = {}; inject([]) {|a,x| h[yield(x)] ||= a << x}
    end
    end

    >> a.uniq_by {|x| x%3}

    => [2, 7, 9]

    Beautiful about this version is that it does not need the "; h" at the end
    of the inject block and no ".values", too... :)

    Kind regards

    robert
    Robert Klemme, Jan 30, 2005
    #19
  20. Belorion

    Trans Guest

    Thanks, now an offical part of Facets (as of next release).

    T
    Trans, Jan 30, 2005
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Markus

    Array#uniq

    Markus, Sep 29, 2004, in forum: Ruby
    Replies:
    1
    Views:
    98
    Yukihiro Matsumoto
    Sep 30, 2004
  2. Marcel Molina Jr.

    Customizing Array#uniq by defining eql?

    Marcel Molina Jr., Mar 7, 2005, in forum: Ruby
    Replies:
    4
    Views:
    114
    Yukihiro Matsumoto
    Mar 8, 2005
  3. Wolfgang Nádasi-Donner

    Array#uniq - Comparison doesn't use 'eql?' and 'hash'

    Wolfgang Nádasi-Donner, Feb 20, 2007, in forum: Ruby
    Replies:
    0
    Views:
    91
    Wolfgang Nádasi-Donner
    Feb 20, 2007
  4. Wolfgang Nádasi-Donner

    Array#uniq - Comparison doesn't use 'eql?' and 'hash'

    Wolfgang Nádasi-Donner, Feb 20, 2007, in forum: Ruby
    Replies:
    3
    Views:
    130
    Daniel Finnie
    Feb 20, 2007
  5. Bil Kleb
    Replies:
    5
    Views:
    120
Loading...

Share This Page