# Array::uniq { block } ?

Discussion in 'Ruby' started by Belorion, Jan 26, 2005.

1. ### BelorionGuest

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Belorion, Jan 26, 2005

2. ### Daniel BergerGuest

Belorion wrote:
> I have an array of arrays. I want to be able to do a uniq operation
> on my array, and have it accept a block so I can specify what to
> examine for uniqueness. For example, in IRB
>
> a = [ [1,2], [3,4], [1,3] ]
> a.sort { |x,y| x[0] <=> y[0] }
>
> returns => [ [1,2], [1,3], [3,4] ]
>
> However, it does not appear uniq accepts blocks:
>
> a = [ [1,2], [3,4], [1,3] ]
> a.uniq { |x,y| x[0] == y[0] }
>
> returns => [ [1,2], [3,4], [1,3] ]
> and a.uniq! with the above block returns nil.
>
> Am I stuck writing my own deep_uniq? Granted, it would be hard, but
> certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan

Daniel Berger, Jan 26, 2005

3. ### Robert KlemmeGuest

"Belorion" <> schrieb im Newsbeitrag
news:...
>I have an array of arrays. I want to be able to do a uniq operation
> on my array, and have it accept a block so I can specify what to
> examine for uniqueness. For example, in IRB
>
> a = [ [1,2], [3,4], [1,3] ]
> a.sort { |x,y| x[0] <=> y[0] }
>
> returns => [ [1,2], [1,3], [3,4] ]
>
> However, it does not appear uniq accepts blocks:
>
> a = [ [1,2], [3,4], [1,3] ]
> a.uniq { |x,y| x[0] == y[0] }
>
> returns => [ [1,2], [3,4], [1,3] ]
> and a.uniq! with the above block returns nil.
>
> Am I stuck writing my own deep_uniq? Granted, it would be hard, but
> certainly not as clean as the above.

The block is accepted but apparently not used:

>> %w{a b c d a d}.uniq {|*x| p x}

=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

How about using a Hash in the meantime?

>> h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

=> {1=>[1, 2], 3=>[3, 4]}
>> h.values

=> [[1, 2], [3, 4]]

Or

>> h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

=> {1=>[1, 2], 3=>[3, 4]}
>> h.values

=> [[1, 2], [3, 4]]

Kind regards

robert

Robert Klemme, Jan 26, 2005
4. ### Robert KlemmeGuest

"Robert Klemme" <> schrieb im Newsbeitrag
news:...
>
> "Belorion" <> schrieb im Newsbeitrag
> news:...
>>I have an array of arrays. I want to be able to do a uniq operation
>> on my array, and have it accept a block so I can specify what to
>> examine for uniqueness. For example, in IRB
>>
>> a = [ [1,2], [3,4], [1,3] ]
>> a.sort { |x,y| x[0] <=> y[0] }
>>
>> returns => [ [1,2], [1,3], [3,4] ]
>>
>> However, it does not appear uniq accepts blocks:
>>
>> a = [ [1,2], [3,4], [1,3] ]
>> a.uniq { |x,y| x[0] == y[0] }
>>
>> returns => [ [1,2], [3,4], [1,3] ]
>> and a.uniq! with the above block returns nil.
>>
>> Am I stuck writing my own deep_uniq? Granted, it would be hard, but
>> certainly not as clean as the above.

>
> The block is accepted but apparently not used:
>
>>> %w{a b c d a d}.uniq {|*x| p x}

> => ["a", "b", "c", "d"]
>
> Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

Delete that: you can provide a block for *every* method - it's simply
ignored. So no bug, but unique obviously is not built to use it. Stupid
me...

robert

>
> How about using a Hash in the meantime?
>
>>> h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h}

> => {1=>[1, 2], 3=>[3, 4]}
>>> h.values

> => [[1, 2], [3, 4]]
>
> Or
>
>>> h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h}

> => {1=>[1, 2], 3=>[3, 4]}
>>> h.values

> => [[1, 2], [3, 4]]
>
> Kind regards
>
> robert
>

Robert Klemme, Jan 26, 2005
5. ### tsGuest

>>>>> "i" == itsme213 <> writes:

i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Guy Decoux

ts, Jan 27, 2005
6. ### Robert KlemmeGuest

"ts" <> schrieb im Newsbeitrag
news:...
> >>>>> "i" == itsme213 <> writes:

>
> i> I agree, #uniq is certainly not the only candidate. I think it would

be very
> i> useful to compile a list of all such methods that could use a block

to
> i> override some default internal method (#== in the case of uniq), and
> i> consider these for 2.0.
>
> #uniq don't use #==, it use #hash and #eql?

Hm...

class Dummy
def eql?(x) puts "eql?"; super end
def equal?(x) puts "equal?"; super end
def ==(x) puts "=="; super end
def hash() puts "hash"; super end
def id() puts "id"; super end
end

>> h={Dummy.new => 1, Dummy.new =>2}

hash
hash
=> {#<Dummy:0x10187428>=>2, #<Dummy:0x10187440>=>1}
>> h.keys.concat(h.keys).uniq

hash
hash
hash
hash
hash
hash
hash
hash
=> [#<Dummy:0x10187428>, #<Dummy:0x10187440>]

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

Kind regards

robert

Robert Klemme, Jan 27, 2005
7. ### Martin DeMelloGuest

ts <> wrote:
> >>>>> "i" == itsme213 <> writes:

>
> i> I agree, #uniq is certainly not the only candidate. I think it would be very
> i> useful to compile a list of all such methods that could use a block to
> i> override some default internal method (#== in the case of uniq), and
> i> consider these for 2.0.
>
> #uniq don't use #==, it use #hash and #eql?

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function. A uniq_by would probably cover
most of the use cases anyway.

martin

Martin DeMello, Jan 27, 2005
8. ### tsGuest

>>>>> "R" == Robert Klemme <> writes:

R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

Object#hash return the id of object

Guy Decoux

ts, Jan 27, 2005
9. ### Robert KlemmeGuest

"ts" <> schrieb im Newsbeitrag
news:...
> >>>>> "R" == Robert Klemme <> writes:

>
> R> I don't see eql? called here - maybe it's an internal optimization

that
> R> short circuits identity. Or did I miss something?
>
> If 2 object have different hash value, it don't need to call eql?

Ah, yes of course. That's it! Thanks for reminding me.

> Now if they have the same hash value :
> * if eql? return true, this is the "same" object (i.e. it will use the
> same key)
> * otherwise there is collision

Of all what you said I don't understand your last point: Why is there a
collision?

>> class Foo; def hash() 0 end end

=> nil
>> h = {Foo.new => 1, Foo.new => 2}

=> {#<Foo:0x101a9808>=>2, #<Foo:0x101a9898>=>1}
>> h.keys.uniq

=> [#<Foo:0x101a9808>, #<Foo:0x101a9898>]

And indeed a redefined Dummy clarifies this:

class Dummy
def eql?(x) puts "eql?"; super end
def equal?(x) puts "equal?"; super end
def ==(x) puts "=="; super end
def hash() puts "hash"; 0 end
def id() puts "id"; super end
end

>> h={Dummy.new => 1, Dummy.new => 2}

hash
hash
eql?
=> {#<Dummy:0x1018da28>=>2, #<Dummy:0x1018da88>=>1}
>> h.keys.uniq

hash
hash
eql?
=> [#<Dummy:0x1018da28>, #<Dummy:0x1018da88>]

Again a reminder that it's always a good idea to override #hash, #== and
#eql? in one go for classes that should be used as hash keys or in sets.

> Object#hash return the id of object

Yeah, but that might be overridden in an arbitrary way. Normally one
can't rely on that #hash tells anything about the identity of an instance
other than probably that instances with different hash are not identical
and not equivalent.

Kind regards

robert

Robert Klemme, Jan 27, 2005
10. ### tsGuest

>>>>> "R" == Robert Klemme <> writes:

R> Of all what you said I don't understand your last point: Why is there a
R> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a
different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)

Guy Decoux

ts, Jan 27, 2005
11. ### Robert KlemmeGuest

"ts" <> schrieb im Newsbeitrag
news:...
> >>>>> "R" == Robert Klemme <> writes:

>
> R> Of all what you said I don't understand your last point: Why is there

a
> R> collision?
>
> Array#uniq just store the element of the array as the key of a hash.
>
> For an hash, 2 values are stored in differents keys if they have a

I think the term is "bucket" and not "key". Maybe that was confusing me.

> different hash value, or if eql? return false.
>
> In hash "terminology", there is collision when 2 elements have the same

hash
> value but are different (eql? return false) (see st.c)

Ok, I now see what "collision" you meant. Of course there is a collision
but still both keys are stored.

Thx for clarifying!

robert

>
>
>
> Guy Decoux
>
>
>

Robert Klemme, Jan 27, 2005
12. ### Bertram ScharpfGuest

Hi,

Am Donnerstag, 27. Jan 2005, 20:11:35 +0900 schrieb ts:
> >>>>> "R" == Robert Klemme <> writes:

>
> R> I don't see eql? called here - maybe it's an internal optimization that
> R> short circuits identity. Or did I miss something?
>
> If 2 object have different hash value, it don't need to call eql?
>
> Now if they have the same hash value :
> * if eql? return true, this is the "same" object (i.e. it will use the
> same key)
> * otherwise there is collision

What a look at the interpreter source code will validate.
The elements addes as hash keys.

Strings are treated special, by the way.

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

Bertram Scharpf, Jan 27, 2005
13. ### Martin DeMelloGuest

itsme213 <> wrote:
> "Martin DeMello" <> wrote in message
>
> > A uniq_by would probably cover
> > most of the use cases anyway.

>
> Why a different method? Is that the way it is done elsewhere? Why not
> def uniq
> if block_given?
> ... less_efficient
> else default_way
> end

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i rather than
i. a.uniq {|i,j| f(i,j) == true } would be the generalised method, but
as noted that's O(n^2) rather than O(n). (Also I'm not sure it'd be
algorithmically sound - you could introduce order dependencies with a

martin

Martin DeMello, Jan 28, 2005
14. ### TransGuest

> By analogy with sort and sort_by

> a.uniq_by {|i| i[1]} would work like uniq, but hashing i
> rather than i. a.uniq {|i,j| f(i,j) == true } would be the
> generalised method, but as noted that's O(n^2) rather
> than O(n). (Also I'm not sure it'd be algorithmically sound
> - you could introduce order dependencies with a sufficiently

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.
Thanks,
T.

Trans, Jan 30, 2005
15. ### Pit CapitainGuest

Trans schrieb:
> Has anyone, could anyone, write a Ruby version of this method? I'm not
> sure what its supposed to do exaclty. Id like to see source.

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

Regards,
Pit

Pit Capitain, Jan 30, 2005
16. ### Martin DeMelloGuest

Trans <> wrote:
>
> Has anyone, could anyone, write a Ruby version of this method? I'm not
> sure what its supposed to do exaclty. Id like to see source.

module Enumerable
def uniq_by(*args, &blk)
blk = lambda {|i| i.send(*args)} unless args.empty?
raise "needs args or block" unless blk
h = {}
a = []
each {|i|
k = blk.call(i)
a << i unless h[k]
h[k] = true
}
a
end
end

p (-5..5).to_a.uniq_by {|i| i*i}
p [[1, 2], [3, 4], [5, 6], [7, 2]].uniq_byat, 1)

martin

Martin DeMello, Jan 30, 2005
17. ### TransGuest

Trans, Jan 30, 2005
18. ### Florian GrossGuest

Pit Capitain wrote:

> Trans schrieb:
>
>> Has anyone, could anyone, write a Ruby version of this method? I'm not
>> sure what its supposed to do exaclty. Id like to see source.

>
> No problem:
>
> require 'set'
>
> class Array
> def uniq_by
> result = []
> values = Set.new
> each do |elem|
> value = yield elem
> unless values.include? value
> values << value
> result << elem
> end
> end
> result
> end
> end
>
> p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]
>
> I bet Robert will transform this into a version using inject ;-)

module Enumerable
def uniq_by()
inject([]) do |state, item|
value = yield(item)
state.include?(value) ? state : state + [item]
end
end
end

Florian Gross, Jan 30, 2005
19. ### Robert KlemmeGuest

"Pit Capitain" <> schrieb im Newsbeitrag
news:...
> Trans schrieb:
>> Has anyone, could anyone, write a Ruby version of this method? I'm not
>> sure what its supposed to do exaclty. Id like to see source.

>
> No problem:
>
> require 'set'
>
> class Array
> def uniq_by
> result = []
> values = Set.new
> each do |elem|
> value = yield elem
> unless values.include? value
> values << value
> result << elem
> end
> end
> result
> end
> end
>
> p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]
>
> I bet Robert will transform this into a version using inject ;-)

module Enumerable
def uniq_by
inject({}) {|h,x| h[yield(x)] ||= x; h}.values
end
end

>> a=(1..10).map {rand 10}

=> [2, 7, 4, 1, 9, 0, 0, 2, 5, 0]
>> a.uniq_by {|x| x%3}

=> [9, 7, 2]

Pro:
- shorter
- needs one less temp instace

Con:
- not stable, original order is not maintained

A stable version:

module Enumerable
def uniq_by
h = {}; inject([]) {|a,x| h[yield(x)] ||= a << x}
end
end

>> a.uniq_by {|x| x%3}

=> [2, 7, 9]

of the inject block and no ".values", too...

Kind regards

robert

Robert Klemme, Jan 30, 2005
20. ### TransGuest

Thanks, now an offical part of Facets (as of next release).

T

Trans, Jan 30, 2005