Array::uniq { block } ?

B

Belorion

I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.
 
D

Daniel Berger

Belorion said:
I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

Not that this is a bad idea, but why do I get this feeling that we're
eventually going to find an excuse to allow blocks to virtually every
method in the core library?

Where does it all end?!

Dan
 
R

Robert Klemme

Belorion said:
I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:
=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

How about using a Hash in the meantime?
h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h} => {1=>[1, 2], 3=>[3, 4]}
h.values
=> [[1, 2], [3, 4]]

Or
h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h} => {1=>[1, 2], 3=>[3, 4]}
h.values
=> [[1, 2], [3, 4]]

Kind regards

robert
 
R

Robert Klemme

Robert Klemme said:
Belorion said:
I have an array of arrays. I want to be able to do a uniq operation
on my array, and have it accept a block so I can specify what to
examine for uniqueness. For example, in IRB

a = [ [1,2], [3,4], [1,3] ]
a.sort { |x,y| x[0] <=> y[0] }

returns => [ [1,2], [1,3], [3,4] ]

However, it does not appear uniq accepts blocks:

a = [ [1,2], [3,4], [1,3] ]
a.uniq { |x,y| x[0] == y[0] }

returns => [ [1,2], [3,4], [1,3] ]
and a.uniq! with the above block returns nil.

Am I stuck writing my own deep_uniq? Granted, it would be hard, but
certainly not as clean as the above.

The block is accepted but apparently not used:
=> ["a", "b", "c", "d"]

Could be a bug in the std lib. (Btw, I'm on 1.8.1 here)

Delete that: you can provide a block for *every* method - it's simply
ignored. So no bug, but unique obviously is not built to use it. Stupid
me...

robert

How about using a Hash in the meantime?
h = a.inject({}){|h,(k,v)| h[k] ||= [k,v];h} => {1=>[1, 2], 3=>[3, 4]}
h.values
=> [[1, 2], [3, 4]]

Or
h = a.inject({}){|h,pair| h[pair[0]] ||= pair;h} => {1=>[1, 2], 3=>[3, 4]}
h.values
=> [[1, 2], [3, 4]]

Kind regards

robert
 
T

ts

i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?


Guy Decoux
 
R

Robert Klemme

ts said:
i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Hm...

class Dummy
def eql?(x) puts "eql?"; super end
def equal?(x) puts "equal?"; super end
def ==(x) puts "=="; super end
def hash() puts "hash"; super end
def id() puts "id"; super end
end
hash
hash
hash
hash
hash
hash
hash
hash
hash
hash
=> [#<Dummy:0x10187428>, #<Dummy:0x10187440>]

I don't see eql? called here - maybe it's an internal optimization that
short circuits identity. Or did I miss something?

Kind regards

robert
 
M

Martin DeMello

ts said:
i> I agree, #uniq is certainly not the only candidate. I think it would be very
i> useful to compile a list of all such methods that could use a block to
i> override some default internal method (#== in the case of uniq), and
i> consider these for 2.0.

#uniq don't use #==, it use #hash and #eql?

Which raises the problem that it works globally, unlike the unix uniq
which only uniqs consecutive elements. Passing in a block would lead to
a performance hit since you'd need to compare every pair of elements
without the benefit of a hash function. A uniq_by would probably cover
most of the use cases anyway.

martin
 
T

ts

R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

Object#hash return the id of object



Guy Decoux
 
R

Robert Klemme

ts said:
R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Ah, yes of course. That's it! Thanks for reminding me.
Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

Of all what you said I don't understand your last point: Why is there a
collision?
=> [#<Foo:0x101a9808>, #<Foo:0x101a9898>]

And indeed a redefined Dummy clarifies this:

class Dummy
def eql?(x) puts "eql?"; super end
def equal?(x) puts "equal?"; super end
def ==(x) puts "=="; super end
def hash() puts "hash"; 0 end
def id() puts "id"; super end
end
hash
hash
eql?
hash
hash
eql?
=> [#<Dummy:0x1018da28>, #<Dummy:0x1018da88>]

Again a reminder that it's always a good idea to override #hash, #== and
#eql? in one go for classes that should be used as hash keys or in sets.
Object#hash return the id of object

Yeah, but that might be overridden in an arbitrary way. Normally one
can't rely on that #hash tells anything about the identity of an instance
other than probably that instances with different hash are not identical
and not equivalent.

Kind regards

robert
 
T

ts

R> Of all what you said I don't understand your last point: Why is there a
R> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a
different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)



Guy Decoux
 
R

Robert Klemme

ts said:
R> Of all what you said I don't understand your last point: Why is there a
R> collision?

Array#uniq just store the element of the array as the key of a hash.

For an hash, 2 values are stored in differents keys if they have a

I think the term is "bucket" and not "key". Maybe that was confusing me.
different hash value, or if eql? return false.

In hash "terminology", there is collision when 2 elements have the same hash
value but are different (eql? return false) (see st.c)

Ok, I now see what "collision" you meant. Of course there is a collision
but still both keys are stored.

Thx for clarifying!

robert
 
B

Bertram Scharpf

Hi,

Am Donnerstag, 27. Jan 2005, 20:11:35 +0900 schrieb ts:
R> I don't see eql? called here - maybe it's an internal optimization that
R> short circuits identity. Or did I miss something?

If 2 object have different hash value, it don't need to call eql?

Now if they have the same hash value :
* if eql? return true, this is the "same" object (i.e. it will use the
same key)
* otherwise there is collision

What a look at the interpreter source code will validate.
The elements addes as hash keys.

Strings are treated special, by the way.

Bertram
 
M

Martin DeMello

itsme213 said:
Why a different method? Is that the way it is done elsewhere? Why not
def uniq
if block_given?
... less_efficient
else default_way
end

By analogy with sort and sort_by

a.uniq_by {|i| i[1]} would work like uniq, but hashing i rather than
i. a.uniq {|i,j| f(i,j) == true } would be the generalised method, but
as noted that's O(n^2) rather than O(n). (Also I'm not sure it'd be
algorithmically sound - you could introduce order dependencies with a
sufficiently adversarial comparison function.)

martin
 
T

Trans

By analogy with sort and sort_by
a.uniq_by {|i| i[1]} would work like uniq, but hashing i
rather than i. a.uniq {|i,j| f(i,j) == true } would be the
generalised method, but as noted that's O(n^2) rather
than O(n). (Also I'm not sure it'd be algorithmically sound
- you could introduce order dependencies with a sufficiently
adversarial comparison function.)


Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.
Thanks,
T.
 
P

Pit Capitain

Trans said:
Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

Regards,
Pit
 
M

Martin DeMello

Trans said:
Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

module Enumerable
def uniq_by(*args, &blk)
blk = lambda {|i| i.send(*args)} unless args.empty?
raise "needs args or block" unless blk
h = {}
a = []
each {|i|
k = blk.call(i)
a << i unless h[k]
h[k] = true
}
a
end
end

p (-5..5).to_a.uniq_by {|i| i*i}
p [[1, 2], [3, 4], [5, 6], [7, 2]].uniq_by:)at, 1)

martin
 
F

Florian Gross

Pit said:
Trans said:
Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

module Enumerable
def uniq_by()
inject([]) do |state, item|
value = yield(item)
state.include?(value) ? state : state + [item]
end
end
end
 
R

Robert Klemme

Pit Capitain said:
Trans said:
Has anyone, could anyone, write a Ruby version of this method? I'm not
sure what its supposed to do exaclty. Id like to see source.

No problem:

require 'set'

class Array
def uniq_by
result = []
values = Set.new
each do |elem|
value = yield elem
unless values.include? value
values << value
result << elem
end
end
result
end
end

p ( 0 .. 9 ).to_a.uniq_by { |x| x % 4 } # => [0, 1, 2, 3]

I bet Robert will transform this into a version using inject ;-)

Since you asked... :))

module Enumerable
def uniq_by
inject({}) {|h,x| h[yield(x)] ||= x; h}.values
end
end
a=(1..10).map {rand 10} => [2, 7, 4, 1, 9, 0, 0, 2, 5, 0]
a.uniq_by {|x| x%3}
=> [9, 7, 2]

Pro:
- shorter
- needs one less temp instace

Con:
- not stable, original order is not maintained

A stable version:

module Enumerable
def uniq_by
h = {}; inject([]) {|a,x| h[yield(x)] ||= a << x}
end
end
=> [2, 7, 9]

Beautiful about this version is that it does not need the "; h" at the end
of the inject block and no ".values", too... :)

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top