Detecting duplicates in an array, anything in the standard library ?

Thibaut Barrère

Hi!

Just wondering if there is something simple already built in the std
library to remove duplicates from an array (or an enumerable). I've
seen and used various approaches, like:

module Enumerable
  def dups
    inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
  end
end

which will give:
%w(a b c c).dups
=> ["c"]
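
For readers who find the one-liner dense, the same counting idea can be spelled out step by step. This is only a sketch: the name `duplicates_by_count` is made up for illustration, and it is written as a plain method rather than a patch to Enumerable.

```ruby
# Hypothetical helper (not from the thread): count how often each value
# occurs, then keep the values that were seen more than once.
def duplicates_by_count(enum)
  counts = enum.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
  counts.select { |_value, count| count > 1 }.keys
end

p duplicates_by_count(%w(a b c c))               # => ["c"]
p duplicates_by_count([1, 2, 3, 4, 5, 4, 2, 2])  # => [2, 4]
```

The result follows the order in which values first appear, because Ruby hashes keep insertion order.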

Anything more elegant ?

cheers

Thibaut
 
Wolfgang Nádasi-Donner

Thibaut said:
Anything more elegant ?

No! :)) - I tried it only using Arrays...

a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]

Wolfgang Nádasi-Donner
 
David A. Black

Hi --

Thibaut said:
Anything more elegant ?

No! :)) - I tried it only using Arrays...

a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]

How about:
a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]
a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]


David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 
Wolfgang Nádasi-Donner

David said:
Hi --

: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:
a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]
a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]


David

The problem is that he wants all the non-unique elements. Unfortunately, the
difference of two arrays removes every occurrence of a matching element, not
just one; otherwise...

irb(main):004:0> a
=> [1, 2, 3, 4, 5, 4, 2, 2]
irb(main):005:0> b
=> [1, 2, 3, 4, 5]
irb(main):006:0> a-b
=> []

...would work. My solution is not recommended at all - it's Sunday after
lunchtime, and I had to decide between washing the dishes and doing
something nice first...
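
The irb session above comes out empty because Array#- removes every occurrence of each element found in the right-hand array; it is not a multiset difference. A rough sketch of what a count-aware difference could look like follows. The name `multiset_difference` is hypothetical and not part of the standard library.

```ruby
# Array#- drops *all* occurrences of anything present in the other array.
a = [1, 2, 3, 4, 5, 4, 2, 2]
p a - a.uniq   # => []

# Hypothetical helper: remove only one occurrence per element of `other`.
def multiset_difference(array, other)
  remaining = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  array.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1   # consume one matching occurrence
      true
    else
      false
    end
  end
end

extras = multiset_difference(a, a.uniq)
p extras        # => [4, 2, 2]
p extras.uniq   # => [4, 2]
```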

Wolfgang Nádasi-Donner
 
David A. Black

Hi --

David said:
Hi --

: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:
a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]
a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]


David

The problem is, that he wants all non unique elements. Unfortunately the
difference of two arrays doesn't care about double elements,

Sorry, just ignore me. I've reinvented Array#uniq :) /me reaches for
coffee....


David

 
Ari Brown

Couldn't you also just do a union with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :))
~ Ari
English is like a pseudo-random number generator - there are a
bajillion rules to it, but nobody cares.
 
David A. Black

Hi --

Couldn't you also just do a union with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :))

I think that just reinvents uniq (see my previous reinvention :)
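
A small aside on terminology: `&` is Array's intersection and `|` is its union. Applied to an array and itself, both return the distinct elements, which is why the suggestion ends up equivalent to uniq. A quick check:

```ruby
a = %w(a b c b a)

p a & a    # => ["a", "b", "c"]  (intersection)
p a | a    # => ["a", "b", "c"]  (union)
p a.uniq   # => ["a", "b", "c"]
```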

For what it's worth, here's a nice-looking but probably very
inefficient version:

module ArrayStuff
  def count(e)
    select {|f| f == e }.size
  end

  def dups
    select {|e| count(e) > 1 }.uniq
  end
end

a = [1,2,3,3,4,5,2].extend(ArrayStuff)

p a.dups # [2,3]


David

 
Ari Brown

I think that just reinvents uniq (see my previous reinvention :)

The only reason I'll accept that

is because you wrote the book I'm reading.

---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est
man alive
 
William James

Here's a modification of a technique used by
Simon Kroger:

class Array
  def dups
    values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
  end
end
==>nil

%w(a b a c c d).dups
==>["a", "c"]
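
One detail worth noting about this index-based approach: it keeps every occurrence after the first, so an element that appears three times shows up twice in the result. The sketch below restates the idea as a standalone method (the name `later_occurrences` is made up here) and adds uniq for cases where each duplicate should be listed once.

```ruby
# Hypothetical standalone version of the technique above: take all indexes,
# subtract the index of each value's first occurrence, and fetch the rest.
def later_occurrences(array)
  first_indexes = array.uniq.map { |x| array.index(x) }
  array.values_at(*((0...array.size).to_a - first_indexes))
end

p later_occurrences(%w(a b a c c d))    # => ["a", "c"]
p later_occurrences([1, 2, 2, 2])       # => [2, 2]
p later_occurrences([1, 2, 2, 2]).uniq  # => [2]
```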
 
Trans

Here's a modification of a technique used by
Simon Kroger:

class Array
  def dups
    values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
  end
end
==>nil


Does everyone agree that #dups is the best name for this? I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?

(Facets already had #nonuniq, btw.)

T.
 
Robert Klemme

Actually you are not deleting duplicates as far as I can see. Here's
another one

irb(main):012:0> a.inject(Hash.new(0)) {|h,x|
h[x]+=1;h}.inject([]){|h,(k,v)|h<<k if v>1;h}
=> ["c"]

You could even change that to need just one iteration through the
original array but it's too late and I'm too lazy. :)
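
The single-pass variant hinted at here might look like the following sketch. It is an assumption about what was meant, not Robert's code, and `duplicates_single_pass` is a made-up name. A value is recorded at the moment its count reaches two, so no second pass over the hash is needed.

```ruby
# Hypothetical one-pass version: remember counts while iterating and
# collect a value exactly once, when it is seen for the second time.
def duplicates_single_pass(enum)
  counts = Hash.new(0)
  dups = []
  enum.each do |x|
    counts[x] += 1
    dups << x if counts[x] == 2
  end
  dups
end

p duplicates_single_pass(%w(a b c c))               # => ["c"]
p duplicates_single_pass([1, 2, 3, 4, 5, 4, 2, 2])  # => [4, 2]
```

Note that the result is ordered by when each value was first repeated, not by first appearance.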

Kind regards

robert
 
Robert Klemme

Actually you are not deleting duplicates as far as I can see.

Did I say it's too late? Man, I should've worn my glasses...

Cheers

robert
 
thomas.macklin

or...

require 'set'

new_ary = ary.to_set.to_a #set strips dups.
 
Robert Klemme

2007/8/20 said:
On 19.08.2007 12:38, Thibaut Barrère wrote:
Hi!
Just wondering if there is something simple already built in the std
library to remove duplicates from an array (or an enumerable). I've
seen and used various approaches, like:
module Enumerable
def dups
inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
end
end
which will give:
%w(a b c c).dups
=> ["c"]
Actually you are not deleting duplicates as far as I can see.

Did I say it's too late? Man, I should've worn my glasses...
Here's another one
irb(main):012:0> a.inject(Hash.new(0)) {|h,x|
h[x]+=1;h}.inject([]){|h,(k,v)|h<<k if v>1;h}
=> ["c"]
You could even change that to need just one iteration through the
original array but it's too late and I'm too lazy. :)

Cheers

robert

or...

require 'set'

new_ary = ary.to_set.to_a #set strips dups.

It does, but as far as I can see OP wanted exactly the duplicates back.
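
Set can still help with getting the duplicates themselves. `Set#add?` returns nil when the element is already present, so it can serve as the "seen before" test. A minimal sketch, with `duplicates_via_set` as a hypothetical name:

```ruby
require 'set'

# Hypothetical helper: drop each element the first time it is seen
# (add? succeeds), keep it on later sightings (add? returns nil).
def duplicates_via_set(enum)
  seen = Set.new
  enum.reject { |e| seen.add?(e) }.uniq
end

p duplicates_via_set([1, 2, 3, 2, 4, 5, 5, 6])  # => [2, 5]
```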

Cheers

robert
 
Gabriel Dragffy

I'm a n00b, sorry if I'm poking nose in. Couldn't the op do something
using &, like so:

[1,2,3] & [2,3,4] == [2,3]


?

Regards Gabe
 
David A. Black

Hi --

The original question was how to get all dups occurring in one array:

[1,2,3,2,4,5,5,6] => [2,5]


David

 
Gabriel Dragffy

how about calling the uniq method:

[1,2,2,3].uniq

or did I miss the point again? ;)
 
Peña, Botp

From: Thibaut Barrère [mailto:[email protected]]
# inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys

sshhh, in ruby1.9, i think you just do

group_by{|e|e}.select{|_,v| v.size>1}.keys

yes, yes, hash#select now hopefully returns hash.
can't we have group_by now ? :)
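
On current Ruby versions both wishes hold: Enumerable#group_by is available and Hash#select returns a Hash. Ruby 2.7 also added Enumerable#tally, which does the counting directly. A short sketch, using `ary` as an example variable:

```ruby
ary = %w(a b c c)

# group_by approach from the post, written with Object#itself (Ruby 2.2+)
p ary.group_by(&:itself).select { |_, v| v.size > 1 }.keys  # => ["c"]

# tally approach (Ruby 2.7+): tally returns a Hash of value => count
p ary.tally.select { |_, count| count > 1 }.keys            # => ["c"]
```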

kind regards -botp
 
