Detecting duplicates in an array, anything in the standard library?

Jimmy Kofler · Aug 21, 2007

Duplicates can also be extracted from an array like this:

class Array
  def find_dups
    uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
  end
end

(The faster, the better; http://snippets.dzone.com/posts/show/4148 )

Cheers,

j.k.

Peña, Botp · Aug 21, 2007

From: Jimmy Kofler [mailto:[email protected]]
# uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

kind regards -botp

Peña, Botp · Aug 21, 2007

From: Peña, Botp [mailto:[email protected]]
# irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
# => [1, 2]

oops,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Robert Klemme · Aug 21, 2007

2007/8/21 said:
From: Jimmy Kofler [mailto:[email protected]]
# uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

Nice! But I'd think this is more efficient:

irb(main):001:0> a = [1, 1, 2, 2, 2, 4, 3]
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Kind regards

robert

Peña, Botp · Aug 21, 2007

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]
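
A caveat that may be worth checking before relying on the reject! variant: b shrinks on every pass, so the size test looks at how many elements are left rather than at how often e occurred. The sketch below wraps both variants in helper methods (the method names are made up for illustration) and compares them on an array without any duplicates.

```ruby
# Hypothetical helper names, used only for this comparison.

# reject! variant: mutates a copy of the array while selecting.
def dups_by_reject(a)
  b = a.dup
  b.uniq.select { |e| b.reject! { |f| f == e }.size > 1 }
end

# Subtraction variant: compares sizes after removing e from the array.
def dups_by_subtraction(a)
  a.uniq.select { |e| (a - [e]).size < a.size - 1 }
end

with_dups    = [1, 1, 2, 2, 2, 4, 3]
without_dups = [1, 2, 3, 4, 5]

p dups_by_reject(with_dups)          # [1, 2]
p dups_by_subtraction(with_dups)     # [1, 2]
p dups_by_reject(without_dups)       # [1, 2, 3]
p dups_by_subtraction(without_dups)  # []
```

On the sample array the two agree, but that seems to be a coincidence of where the non-duplicated elements sit.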

Ryan Davis · Aug 21, 2007

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I came up with something vaguely similar:

class Array
  def dupes
    a = self.dup
    self.partition { |o| a.delete(o) }.last
  end
end

[1,2,2,3,4,4].dupes
=> [2, 4]

David A. Black · Aug 21, 2007

Hi --

Duplicates can also be extracted from an array like this:

class Array
  def find_dups
    uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
  end
end

It's buggy, though:

[nil,1,2,2,3,nil].find_dups

=> [2]
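
The nil case goes wrong because find_dups uses nil as its "not a duplicate" marker, so a duplicated nil looks the same as a rejected element and compact throws it away. A minimal sketch that prints the intermediate result (the variable name marked is only for illustration):

```ruby
a = [nil, 1, 2, 2, 3, nil]

# Same expression as in find_dups, stopped before compact.
marked = a.uniq.map { |v| (a - [v]).size < (a.size - 1) ? v : nil }

p marked          # [nil, nil, 2, nil]  (the first nil is the duplicated nil itself)
p marked.compact  # [2]
```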

David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

David A. Black · Aug 21, 2007

Hi --

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I came up with something vaguely similar:

class Array
  def dupes
    a = self.dup
    self.partition { |o| a.delete(o) }.last
  end
end

[1,2,2,3,4,4].dupes
=> [2, 4]

You'd want to throw a .uniq on there; otherwise, non-consecutive dupes
get processed twice:

[1,2,2,3,4,4,2].dupes
=> [2, 4, 2]
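
A minimal sketch of that suggestion, with .uniq appended to the partition result. The method is named uniq_dupes here only to keep it apart from the original dupes. Note that nil and false elements would still always land in the second partition, because Array#delete returns a falsy value for them.

```ruby
class Array
  # Hypothetical name; same idea as dupes, with repeats collapsed.
  def uniq_dupes
    a = dup
    # Array#delete removes every occurrence and returns the item the
    # first time, then nil on later calls for the same value.
    partition { |o| a.delete(o) }.last.uniq
  end
end

p [1, 2, 2, 3, 4, 4].uniq_dupes     # [2, 4]
p [1, 2, 2, 3, 4, 4, 2].uniq_dupes  # [2, 4]
```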

David


Jimmy Kofler · Aug 21, 2007

Posted by Peña, Botp (Guest) on 21.08.2007 10:31

could we simplify it like

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Sure.

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p a.uniq.select{|e| (a-[e]).size <
a.size - 1}'
=> [nil, 2]

So we do not need a fix like the following to make the original version handle nil correctly:

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p (a.size - a.nitems > 1) ? ([nil] +
a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact) :
(a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact)'
=> [nil, 2]
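
Array#nitems was dropped after Ruby 1.8, so the one-liner above will not run on later versions. A rough equivalent, assuming a Ruby where Array#count is available, counts the nils directly:

```ruby
a = [nil, 1, 2, 2, 3, nil]

# Non-nil duplicates, exactly as in the original version.
dups = a.uniq.map { |v| (a - [v]).size < (a.size - 1) ? v : nil }.compact

# a.count(nil) stands in for a.size - a.nitems.
dups = [nil] + dups if a.count(nil) > 1

p dups  # [nil, 2]
```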

Cheers,

j.k.

Ari Brown · Aug 21, 2007

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it's easier just to union itself...

a = [1,2,3,2,1]
b = a & a
b = [1,2,3]
---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est
man alive

Phrogz · Aug 21, 2007

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it's easier just to union itself...

a = [1,2,3,2,1]
b = a & a
b = [1,2,3]

...but that's not what the OP wanted. What you've written is the same
as the #uniq method.

Don't feel bad, this thread has been filled with people answering the
wrong question.

The original question was roughly "How do I find
out all the elements in the array that are duplicates?"

Solutions to that question would not include '3' in the above results.
It's unclear to me if %w| a b b b | should include 'b' once or twice
in the output, though, and the original poster has not clarified that,
that I can see.
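
The two readings can be made concrete with a small sketch (variable names are illustrative only, and Ruby 1.9 or later is assumed so that Hash#select returns a hash). It also checks that a & a gives the same result as #uniq.

```ruby
a = [1, 2, 3, 2, 1]
same_as_uniq = (a & a) == a.uniq
p same_as_uniq  # true

words = %w[a b b b]

# Reading 1: each duplicated value is reported once.
counts = Hash.new(0)
words.each { |w| counts[w] += 1 }
once = counts.select { |_, n| n > 1 }.keys
p once  # ["b"]

# Reading 2: every occurrence after the first is reported.
seen = {}
repeats = words.select do |w|
  already_seen = seen.key?(w)
  seen[w] = true
  already_seen
end
p repeats  # ["b", "b"]
```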

David A. Black · Aug 21, 2007

Hi --

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it's easier just to union itself...

a = [1,2,3,2,1]
b = a & a
b = [1,2,3]

...but that's not what the OP wanted. What you've written is the same
as the #uniq method.

Don't feel bad, this thread has been filled with people answering the
wrong question. The original question was roughly "How do I find
out all the elements in the array that are duplicates?"

Solutions to that question would not include '3' in the above results.
It's unclear to me if %w| a b b b | should include 'b' once or twice
in the output, though, and the original poster has not clarified that,
that I can see.

I think once, since it's just the quality of being non-unique in
the array that qualifies an object for inclusion. At least, that's my
understanding, though as one of the people who reimplemented
Array#uniq, I may not be the right person to listen to.

David


Jeremy Woertink · Aug 21, 2007

I just thought I would put in my 2 cents. I actually had to create a
script that would run through a file and find all the duplicate account
numbers and the number of times they were duplicated and write that to a
new file.

@lines = Hash.new(0)
@group = Array.new
IO.readlines("C:/test/" + @file).each { |line|
@lines[line.split(';')[5].chomp] += 1 }
@lines.each_pair { |k,v| @group << k.to_s + " => " + v.to_s if v > 1 }

This is the part of the script that reads the file and grabs the duplicates.
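
A self-contained sketch of the same counting idea, using an in-memory array of lines instead of a file so it can be run as-is. The sample records and the field layout (account number in the sixth ';'-separated field) are assumptions made up for illustration.

```ruby
# Made-up sample records; the account number is field index 5.
lines = [
  "a;b;c;d;e;1001\n",
  "a;b;c;d;e;1002\n",
  "a;b;c;d;e;1001\n",
  "a;b;c;d;e;1003\n",
  "a;b;c;d;e;1001\n",
  "a;b;c;d;e;1003\n"
]

# Hash.new(0) makes every missing key start at zero.
counts = Hash.new(0)
lines.each { |line| counts[line.split(';')[5].chomp] += 1 }

# Keep only the account numbers seen more than once.
group = []
counts.each_pair { |k, v| group << "#{k} => #{v}" if v > 1 }

puts group.sort
```

With the sample data, this should report 1001 three times and 1003 twice.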

~Jeremy

Jimmy Kofler · Aug 21, 2007

Jeremy said:
I actually had to ... find all the duplicate account
numbers and the number of times they were duplicated and ... .
...
~Jeremy

A much less verbose 'nil' fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]

And with this fixed version it's also possible to count & grab duplicate
array items in one go:

a = [nil,1,2,2,3,nil,nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").sort_by { rand }

p a.uniq.map {|v| diff = (a.size - (a-[v]).size); (diff > 1) ? [v, diff] :
nil}.compact

=> [[2, 10], [3, 5], [nil, 15], [1, 5]]
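
For comparison, on Ruby 2.7 or later the same count-and-grab can be sketched with Enumerable#tally, which was not available when this thread was written. This is an alternative technique, not the subtraction approach above; it makes a single pass over the array instead of one subtraction per unique element.

```ruby
a = [nil, 1, 2, 2, 3, nil, nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").shuffle

# tally returns a hash of element => occurrence count.
dup_counts = a.tally.select { |_, n| n > 1 }.to_a

# Sorted by string form only to make the printed order stable.
sorted = dup_counts.sort_by { |v, _| v.to_s }
p sorted  # [[nil, 15], [1, 5], [2, 10], [3, 5]]
```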

Cheers,

j.k.

Jimmy Kofler · Aug 22, 2007

Jimmy said:
Jeremy Woertink wrote:
I actually had to ... find all the duplicate account
numbers and the number of times they were duplicated and ... .
...
~Jeremy

A much less verbose 'nil' fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]

This fix does not work for a = [nil,1,2,[7],2,[7],3,nil], because flatten
also unwraps the nested [7]; the previous version using
"(a.size - a.nitems > 1) ? ..." does work. Ruby 1.9, though, is said to
introduce an Array#flatten that takes a depth argument:

# Ruby 1.9
a = [nil,1,[7],2,2,[7],3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten(1)
=> [nil, [7], 2]
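
The difference between the two flatten calls can be seen by looking at the wrapped intermediate result. A short sketch, assuming a Ruby version where Array#flatten accepts a depth argument (the variable name wrapped is only for illustration):

```ruby
a = [nil, 1, [7], 2, 2, [7], 3, nil]

# Each duplicate is wrapped in a one-element array so compact keeps nil.
wrapped = a.uniq.map { |v| (a - [v]).size < (a.size - 1) ? [v] : nil }.compact

p wrapped             # [[nil], [[7]], [2]]
p wrapped.flatten     # [nil, 7, 2]
p wrapped.flatten(1)  # [nil, [7], 2]
```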

Cheers,

j.k.
