Detecting duplicates in an array: anything in the standard library?

Peña, Botp

From: Jimmy Kofler [mailto:[email protected]]
# uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

kind regards -botp
 
Peña, Botp

From: Peña, Botp [mailto:[email protected]]
# irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
# => [1, 2]

oops,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]
 
Robert Klemme

2007/8/21 said:
From: Jimmy Kofler [mailto:[email protected]]
# uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

Nice! But I'd think this is more efficient:

irb(main):001:0> a = [1, 1, 2, 2, 2, 4, 3]
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]
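
A minimal sketch of why the size comparison works, assuming nothing beyond core Array methods: Array#- removes every occurrence of the listed elements, so the drop in size equals the number of times e occurs in a. The lambda name `occurrences` is made up for illustration and does not appear in the thread.

```ruby
a = [1, 1, 2, 2, 2, 4, 3]

# a - [e] drops every occurrence of e, so the size difference is the
# number of times e appears in a.
occurrences = lambda { |e| a.size - (a - [e]).size }

p occurrences.call(2)  # prints 3
p occurrences.call(4)  # prints 1

# (a - [e]).size < a.size - 1 is the same test as "e occurs more than once".
dups = a.uniq.select { |e| occurrences.call(e) > 1 }
p dups                 # prints [1, 2]
```

Each test builds a new array, so calling uniq first means one subtraction per distinct value instead of one per element.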

Kind regards

robert
 
Peña, Botp

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]
 
Ryan Davis

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I came up with something vaguely similar:

class Array
  def dupes
    a = self.dup
    self.partition { |o| a.delete(o) }.last
  end
end
=> [2, 4]
 
David A. Black

Hi --

Duplicates can also be extracted from an array like this:


class Array

  def find_dups
    uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
  end

end

It's buggy, though:
[nil,1,2,2,3,nil].find_dups
=> [2]


David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 
David A. Black

Hi --

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I came up with something vaguely similar:

class Array
  def dupes
    a = self.dup
    self.partition { |o| a.delete(o) }.last
  end
end
=> [2, 4]

You'd want to throw a .uniq on there; otherwise, non-consecutive dupes
get processed twice:
[1,2,2,3,4,4,2].dupes
=> [2, 4, 2]
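
One way to apply that suggestion is sketched below: the dupes method from the quoted post with .uniq appended. One assumption worth flagging: Array#delete returns nil when the deleted item is itself nil (and false when it is false), so this sketch is only meant for arrays without nil or false elements.

```ruby
class Array
  # Same approach as the quoted dupes, plus a trailing uniq.
  # Assumes the array holds no nil or false elements: a.delete(o) returns
  # the deleted item, which would be falsy for those values.
  def dupes
    a = dup
    # The first time a value is seen, delete removes all of its occurrences
    # from the copy and returns it (truthy). Later occurrences find nothing
    # to delete, get nil, and land in the second partition.
    partition { |o| a.delete(o) }.last.uniq
  end
end

p [1, 2, 2, 3, 4, 4, 2].dupes  # prints [2, 4]
```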


David

 
Jimmy Kofler

Posted by Peña, Botp (Guest) on 21.08.2007 10:31
could we simplify it like

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]


Sure.

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p a.uniq.select{|e| (a-[e]).size < a.size - 1}'
=> [nil, 2]

So the select version needs no special handling for nil. The original map/compact version, by contrast, needs a fix along these lines to handle nil correctly:

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p (a.size - a.nitems > 1) ? ([nil] + a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact) : (a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact)'
=> [nil, 2]


Cheers,

j.k.
 
Ari Brown

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it's easier just to union itself...

a = [1,2,3,2,1]
b = a & a
b = [1,2,3]
---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive
 
Phrogz

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
compare also,
irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it's easier just to union itself...

a = [1,2,3,2,1]
b = a & a
b = [1,2,3]

...but that's not what the OP wanted. What you've written is the same
as the #uniq method.

Don't feel bad, this thread has been filled with people answering the
wrong question. :p The original question was roughly "How do I find
out all the elements in the array that are duplicates?"

Solutions to that question would not include '3' in the above results.
It's unclear to me if %w| a b b b | should include 'b' once or twice
in the output, though, and the original poster has not clarified that,
that I can see.
 
David A. Black

Hi --

From: Robert Klemme [mailto:[email protected]]
# irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
compare also,
irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it's easier just to union itself...

a = [1,2,3,2,1]
b = a & a
b = [1,2,3]

...but that's not what the OP wanted. What you've written is the same
as the #uniq method.

Don't feel bad, this thread has been filled with people answering the
wrong question. :p The original question was roughly "How do I find
out all the elements in the array that are duplicates?"

Solutions to that question would not include '3' in the above results.
It's unclear to me if %w| a b b b | should include 'b' once or twice
in the output, though, and the original poster has not clarified that,
that I can see.

I think once, since it's just the quality of being non-unique in
the array that qualifies an object for inclusion. At least, that's my
understanding, though as one of the people who reimplemented
Array#uniq, I may not be the right person to listen to :)
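
A sketch of the "report each duplicate once" reading, assuming a counting Hash is acceptable. The method name `duplicates` is hypothetical and is not taken from any post in the thread.

```ruby
# Returns each element that occurs more than once, listed a single time,
# in order of first appearance. Works for nil elements too, since nil is
# a valid Hash key.
def duplicates(array)
  counts = Hash.new(0)                       # element => occurrences
  array.each { |item| counts[item] += 1 }
  array.uniq.select { |item| counts[item] > 1 }
end

p duplicates(%w| a b b b |)    # prints ["b"]
p duplicates([1, 2, 3, 2, 1])  # prints [1, 2]
```

The array is walked once to count and once more through uniq, rather than building a new array per distinct element.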


David

 
Jeremy Woertink

I just thought I would put in my 2 cents. I actually had to create a
script that would run through a file and find all the duplicate account
numbers and the number of times they were duplicated and write that to a
new file.

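@lines = Hash.new(0)   # account number => number of occurrences
@group = Array.new     # report entries for account numbers seen more than once
# The account number is the sixth ';'-separated field of each line.
IO.readlines("C:/test/" + @file).each { |line| @lines[line.split(';')[5].chomp] += 1 }
@lines.each_pair { |k,v| @group << k.to_s + " => " + v.to_s if v > 1 }


This is the part of the script that reads the file and grabs the duplicates.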



~Jeremy
 
Jimmy Kofler

Jeremy said:
I actually had to ... find all the duplicate account
numbers and the number of times they were duplicated and ... .
...
~Jeremy


A much less verbose 'nil' fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] : nil}.compact.flatten
=> [nil, 2]


And with this fixed version it's also possible to count & grab duplicate
array items in one go:

a = [nil,1,2,2,3,nil,nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").sort_by { rand }

p a.uniq.map {|v| diff = (a.size - (a-[v]).size); (diff > 1) ? [v, diff] : nil}.compact

=> [[2, 10], [3, 5], [nil, 15], [1, 5]]


Cheers,

j.k.
 
Jimmy Kofler

Jimmy said:
Jeremy Woertink wrote:
I actually had to ... find all the duplicate account
numbers and the number of times they were duplicated and ... .
...
~Jeremy


A much less verbose 'nil' fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] : nil}.compact.flatten
=> [nil, 2]

This fix does not work for a = [nil,1,2,[7],2,[7],3,nil], because flatten
also unwraps the nested [7]; the previous version using
"(a.size - a.nitems > 1) ? ..." does work. Ruby 1.9, though, is said to
introduce a non-greedy Array#flatten that takes a depth argument:

# Ruby 1.9
a = [nil,1,[7],2,2,[7],3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] : nil}.compact.flatten(1)
=> [nil, [7], 2]
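
For comparison, a sketch that sidesteps flatten altogether, assuming a Ruby version that provides Array#count with an argument: selecting from uniq by occurrence count leaves nil and nested arrays untouched, so no wrapping in [v] is needed.

```ruby
a = [nil, 1, [7], 2, 2, [7], 3, nil]

# Keep each distinct value that occurs more than once.
dups = a.uniq.select { |v| a.count(v) > 1 }
p dups              # prints [nil, [7], 2]

# Pair each duplicate with the number of times it occurs.
dups_with_counts = dups.map { |v| [v, a.count(v)] }
p dups_with_counts  # prints [[nil, 2], [[7], 2], [2, 2]]
```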


Cheers,

j.k.
 
