Array::uniq { block } ?

Belorion · Jan 26, 2005

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Daniel Berger · Jan 26, 2005

Belorion said:
I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan

Robert Klemme · Jan 26, 2005

Belorion said:
I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:
=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

How about using a Hash in the meantime?

h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h} => {1=>[1, 2], 3=>[3, 4]}
h.values

Click to expand...

=> [[1, 2], [3, 4]]

Or

h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h} => {1=>[1, 2], 3=>[3, 4]}
h.values

Click to expand...

=> [[1, 2], [3, 4]]

Kind regards

robert

Robert Klemme · Jan 26, 2005

Robert Klemme said:
Belorion said:

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Click to expand...

The block is accepted but apparently not used:
=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

Delete that: you can provide a block for *every* method - it's simply
ignored. So no bug, but unique obviously is not built to use it. Stupid
me...

robert

How about using a Hash in the meantime?

h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h} => {1=>[1, 2], 3=>[3, 4]}
h.values

Click to expand...

Click to expand...

=> [[1, 2], [3, 4]]

Or

h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h} => {1=>[1, 2], 3=>[3, 4]}
h.values

Click to expand...

Click to expand...

=> [[1, 2], [3, 4]]

Kind regards

robert

ts · Jan 27, 2005

i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Guy Decoux

Robert Klemme · Jan 27, 2005

ts said:
i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Hm...

class Dummy
def eql?(x) puts "eql?"; super end
def equal?(x) puts "equal?"; super end
def ==(x) puts "=="; super end
def hash() puts "hash"; super end
def id() puts "id"; super end
end
hash
hash
hash
hash
hash
hash
hash
hash
hash
hash
=> [#<Dummy:0x10187428>, #<Dummy:0x10187440>]

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

Kind regards

robert

Martin DeMello · Jan 27, 2005

ts said:
i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function. A uniq_by would probably cover
most of the use cases anyway.

martin

ts · Jan 27, 2005

R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

Object#hash return the id of object

Guy Decoux

Robert Klemme · Jan 27, 2005

ts said:
R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Ah, yes of course. That's it! Thanks for reminding me.

Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

Of all what you said I don't understand your last point: Why is there a
collision?
=> [#<Foo:0x101a9808>, #<Foo:0x101a9898>]

And indeed a redefined Dummy clarifies this:

class Dummy
def eql?(x) puts "eql?"; super end
def equal?(x) puts "equal?"; super end
def ==(x) puts "=="; super end
def hash() puts "hash"; 0 end
def id() puts "id"; super end
end
hash
hash
eql?
hash
hash
eql?
=> [#<Dummy:0x1018da28>, #<Dummy:0x1018da88>]

Again a reminder that it's always a good idea to override #hash, #== and
#eql? in one go for classes that should be used as hash keys or in sets.

Object#hash return the id of object

Yeah, but that might be overridden in an arbitrary way. Normally one
can't rely on that #hash tells anything about the identity of an instance
other than probably that instances with different hash are not identical
and not equivalent.

Kind regards

robert

ts · Jan 27, 2005

R> Of all what you said I don't understand your last point: Why is there a
R> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a
different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)

Guy Decoux

Robert Klemme · Jan 27, 2005

ts said:
R> Of all what you said I don't understand your last point: Why is there a
R> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a

I think the term is "bucket" and not "key". Maybe that was confusing me.

different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)

Ok, I now see what "collision" you meant. Of course there is a collision
but still both keys are stored.

Thx for clarifying!

robert

Bertram Scharpf · Jan 27, 2005

Hi,

Am Donnerstag, 27. Jan 2005, 20:11:35 +0900 schrieb ts:

R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

What a look at the interpreter source code will validate.
The elements addes as hash keys.

Strings are treated special, by the way.

Bertram

Martin DeMello · Jan 28, 2005

itsme213 said:
Why a different method? Is that the way it is done elsewhere? Why not
def uniq
if block_given?
... less_efficient
else default_way
end

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i rather than
i. a.uniq {|i,j| f(i,j) == true } would be the generalised method, but
as noted that's O(n^2) rather than O(n). (Also I'm not sure it'd be
algorithmically sound - you could introduce order dependencies with a
sufficiently adversarial comparison function.)

martin

Trans · Jan 30, 2005

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i
rather than i. a.uniq {|i,j| f(i,j) == true } would be the
generalised method, but as noted that's O(n^2) rather
than O(n). (Also I'm not sure it'd be algorithmically sound
- you could introduce order dependencies with a sufficiently
adversarial comparison function.)

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.
Thanks,
T.

Pit Capitain · Jan 30, 2005

Trans said:
Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

Regards,
Pit

Martin DeMello · Jan 30, 2005

Trans said:
Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

module Enumerable
def uniq_by(*args, &blk)
blk = lambda {|i| i.send(*args)} unless args.empty?
raise "needs args or block" unless blk
h = {}
a = []
each {|i|
k = blk.call(i)
a << i unless h[k]
h[k] = true
}
a
end
end

p (-5..5).to_a.uniq_by {|i| i*i}
p [[1, 2], [3, 4], [5, 6], [7, 2]].uniq_by

at, 1)

martin

Trans · Jan 30, 2005

LOL

*http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/128868
Oh, that's rich. I think I need a vacation.

Florian Gross · Jan 30, 2005

Pit said:
Trans said:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

Click to expand...

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

module Enumerable
def uniq_by()
inject([]) do |state, item|
value = yield(item)
state.include?(value) ? state : state + [item]
end
end
end

Robert Klemme · Jan 30, 2005

Pit Capitain said:
Trans said:

Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

Click to expand...

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

Since you asked...

)

module Enumerable
def uniq_by
inject({}) {|h,x| h[yield(x)] ||= x; h}.values
end
end

a=(1..10).map {rand 10} => [2, 7, 4, 1, 9, 0, 0, 2, 5, 0]
a.uniq_by {|x| x%3}

Click to expand...

=> [9, 7, 2]

Pro:
- shorter
- needs one less temp instace

Con:
- not stable, original order is not maintained

A stable version:

module Enumerable
def uniq_by
h = {}; inject([]) {|a,x| h[yield(x)] ||= a << x}
end
end
=> [2, 7, 9]

Beautiful about this version is that it does not need the "; h" at the end
of the inject block and no ".values", too...

Kind regards

robert

Trans · Jan 30, 2005

Thanks, now an offical part of Facets (as of next release).

T

Trouble creating multi dimensional array. 0 to 26 in 3 dimensions.	1	Oct 12, 2022
Hello guys, how do I do arithmetics with a certain index from an array ? JavaScript	3	Dec 7, 2022
compact or uniq that takes a block like sort?	5	Jan 7, 2008
array and hash combine methods	7	Dec 14, 2006
Grid Stacking Game Issues	5	Jun 30, 2023
[ANN] Multidimensional Array - MDArray 0.5.5	0	Nov 19, 2013
About 1.8.7 Array#map! without a block	0	Jun 3, 2008
Array.index with block not working	1	Jul 18, 2008

Array::uniq { block } ?

Belorion

Daniel Berger

Robert Klemme

Robert Klemme

ts

Robert Klemme

Martin DeMello

ts

Robert Klemme

ts

Robert Klemme

Bertram Scharpf

Martin DeMello

Trans

Pit Capitain

Martin DeMello

Trans

Florian Gross

Robert Klemme

Trans

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads