Duplicate elements in array

Shuaib Zahda

Hello

I am trying to output the duplicate elements in an array. I looked into
the Ruby API and found the uniq method, which returns the array without
duplicates. What I want to know is which elements are duplicated.
For example:

array = ["apple", "banana", "apple", "orange"]
=> ["apple", "banana", "apple", "orange"]
array.uniq
=> ["apple", "banana", "orange"]

I want the method to tell me that apple is the duplicated element.

I tried this, but it does not work:

array - array.uniq

Any ideas?

Regards
Shuaib
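
A side note on why the subtraction attempt comes back empty: Array#- removes every element of the left-hand array that occurs anywhere in the right-hand array, and every element of array also occurs in array.uniq. A minimal sketch (the variable name difference is only for illustration):

```ruby
array = ["apple", "banana", "apple", "orange"]

# Array#- drops every element of the left operand that appears anywhere
# in the right operand, no matter how many times it occurs on the left.
difference = array - array.uniq
p difference   # prints []

# uniq, by contrast, keeps the first occurrence of each element.
p array.uniq   # prints ["apple", "banana", "orange"]
```

So the subtraction cannot be used to isolate the repeated items; some form of counting or "seen before" bookkeeping is needed.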
 
Mohit Sindhwani

I don't know a good way to do it, but one way to get the result would be
to force it into a hash since that eliminates duplicates.


I'm sure there's a better way to do it, but here's what I got.

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 #it doesn't matter what we store
  end
}

puts duplicates

Cheers
Mohit.
 
Mohit Sindhwani

Sean said:
think of it right now):

array = ["apple", "banana", "apple", "orange"]
counts = array.inject(Hash.new {|h,k| h[k] = 0 }) { |hash, item| hash[item] += 1; hash }
p counts #=> {"apple"=>2, "banana"=>1, "orange"=>1}
p counts.select { |k,v| v > 1 }.map{ |k, v| k}.flatten #=> ["apple"]


Regards,
Sean

I so have to get the hang of inject, flatten and map.

Cheers,
Mohit.
10/28/2007 | 9:16 PM.
 
Robert Klemme

irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e|
ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]

Kind regards

robert
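
For readers who find the trailing ";ha" in the inject block easy to forget, the same counting idea can be sketched with each_with_object, which passes the accumulator along automatically. This assumes Ruby 1.9 or later (for each_with_object and for Hash#select returning a Hash); the method name duplicates_by_count is made up for this sketch.

```ruby
# Count how often each element occurs, then keep the elements seen more
# than once. Hash.new(0) makes a missing key read as 0.
def duplicates_by_count(array)
  counts = array.each_with_object(Hash.new(0)) { |item, tally| tally[item] += 1 }
  # On Ruby 1.9+ Hash#select returns a Hash, so .keys yields the elements.
  counts.select { |_, count| count > 1 }.keys
end

p duplicates_by_count(%w[apple banana apple orange])   # prints ["apple"]
```

The logic is the same as the inject version above; only the accumulator handling differs.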
 
Harry Kakueki

arr,dup = ["apple", "banana", "apple", "orange"],[]
(arr.length-1).times do
  a = arr.shift
  dup << a if arr.include?(a)
end
p dup.uniq

Harry
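
One thing to be aware of with the loop above: shift removes elements from arr as it goes, so the original array is consumed. If the array is still needed afterwards, a non-destructive variant of the same "does it occur more than once?" idea might look like the sketch below. The method name is hypothetical, and like the loop above it is quadratic in the array length.

```ruby
# Keep the elements that occur more than once, without modifying the input.
def duplicates_without_mutation(array)
  # Array#count(obj) counts the elements equal to obj, so this scans the
  # array once per element (O(n^2) overall).
  array.select { |item| array.count(item) > 1 }.uniq
end

fruits = ["apple", "banana", "apple", "orange"]
p duplicates_without_mutation(fruits)   # prints ["apple"]
p fruits.length                         # prints 4, the input is untouched
```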
 
Sean O'Halpin

I so have to get the hang of inject, flatten and map.

Cheers,
Mohit.
10/28/2007 | 9:16 PM.

Hi,

They are definitely worth looking into - inject in particular is a
powerful tool (Robert Klemme can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).

Regards,
Sean

# Mohit Sindhwani (with slight adjustment)
def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true :
    seen[item] = true}
  duplicates.keys
end

# Robert Klemme
def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
end

# from facets
def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end

require 'benchmark'

def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end

# get some data (Ubuntu specific I guess - YMMV)
array = File.read('/etc/dictionaries-common/words').split(/\n/)

# test w/o dups
do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2,
  :duplicates_3], array)

# create some duplicates
array = array[0..999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2,
  :duplicates_3], array)
__END__
$ ruby bm-duplicates.rb
 
Mohit Sindhwani

Sean said:
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
duplicates_3    6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
duplicates_3    2.590000   0.000000   2.590000 (  2.595189)

Thanks Sean! Makes me feel quite nice about it.

So, hashes are faster than arrays?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.
 
Sean O'Halpin

Thanks Sean! Makes me feel quite nice about it.

So, hashes are faster than arrays?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.

It depends what you're doing with them and how big they are. But in
this instance, I changed your solution to use a hash because you were
appending the duplicates to an array which resulted in adding an entry
to that array every time you detected a duplicate. This didn't show up
in your example because your data contained at most two instances of
an item. If you change your example to:

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 #it doesn't matter what we store
  end
}

puts duplicates

it outputs

apple
cow
apple
apple

which is probably not what you want.

Regards,
Sean
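
If each duplicated item should be reported only once, the "seen" and "duplicates" bookkeeping described above can also be sketched with two sets from the standard library instead of two hashes. The method name duplicates_with_sets is invented for this example.

```ruby
require 'set'

# Report each duplicated element once, however many times it repeats.
def duplicates_with_sets(array)
  seen = Set.new
  duplicates = Set.new
  array.each do |item|
    # Set#add? returns nil when the item is already in the set, so an
    # item only reaches duplicates from its second occurrence onwards.
    duplicates << item unless seen.add?(item)
  end
  duplicates.to_a
end

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
p duplicates_with_sets(array)   # prints ["apple", "cow"]
```

Because duplicates is a set, the third and fourth "apple" do not add further entries.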
 
Jimmy Kofler

Duplicate elements in array
Posted by Shuaib Zahda (shuaib85) on 28.10.2007 13:47
Hello

I am trying to output the duplicate elements in an array. I looked into
the api of ruby I found uniq method which outputs the array with no
duplication. What i want is to know which elements is duplicated.

Here's yet another way to do it:
http://snippets.dzone.com/posts/show/4148

Cheers,

j.k.
 
Mohit Sindhwani

Thanks for the explanation, Sean. Actually, I guess it's not clear if
the OP wants to know each occurrence of the duplicates or just the list
of duplicates. But, there are now solutions for both cases!

Cheers,
Mohit.
10/29/2007 | 11:44 AM.
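
To cover both readings at once (which elements are duplicated, and where each occurrence sits), one possible sketch is to record the indexes of every element and keep only the elements with more than one index. The method name duplicate_positions is hypothetical, and this assumes Ruby 1.9 or later, where Hash#select returns a Hash.

```ruby
# Map each duplicated element to the list of indexes where it occurs.
def duplicate_positions(array)
  # The block gives every new key its own empty array.
  positions = Hash.new { |hash, key| hash[key] = [] }
  array.each_with_index { |item, index| positions[item] << index }
  positions.select { |_, indexes| indexes.size > 1 }
end

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
occurrences = duplicate_positions(array)
p occurrences["apple"]   # prints [0, 2, 7, 8]
p occurrences["cow"]     # prints [5, 6]
p occurrences.keys       # prints ["apple", "cow"]
```

The keys give the plain list of duplicates, while the values keep every occurrence.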
 
Peña, Botp

From: Sean O'Halpin

I just tested this using Ruby 1.9 on a P4 box running Windows XP. I
included Ruby's group_by and got surprising results.

C:\ruby1.9\bin>diff test-old.rb test.rb
19a20,24
> #1.9's group_by
> def duplicates_4(array)
> array.group_by{|e|e}.select{|_,k| k.size>1}.keys
> end
26c31
< Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
---
> Benchmark.bmbm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
34c39
< array = File.read('/etc/dictionaries-common/words').split(/\n/)
---
> array = File.read('american-english').split(/\n/)
38c43
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)
43c48
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)

C:\ruby1.9\bin>


C:\ruby1.9\bin>ruby test.rb
----------------------------------------
no duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    7.609000   0.094000   7.703000 (  7.984000)
duplicates_2   10.438000   0.109000  10.547000 ( 11.608000)
duplicates_3   14.609000   0.219000  14.828000 ( 14.874000)
duplicates_4   11.422000   0.141000  11.563000 ( 14.201000)
--------------------------------------- total: 44.641000sec

                    user     system      total        real
duplicates_1    7.219000   0.125000   7.344000 (  8.109000)
duplicates_2    9.844000   0.078000   9.922000 ( 10.374000)
duplicates_3   14.391000   0.172000  14.563000 ( 18.498000)
duplicates_4   11.172000   0.172000  11.344000 ( 12.998000)
----------------------------------------
duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    3.375000   0.000000   3.375000 (  3.765000)
duplicates_2    3.218000   0.000000   3.218000 (  3.828000)
duplicates_3    3.250000   0.000000   3.250000 (  3.672000)
duplicates_4    2.032000   0.047000   2.079000 (  2.077000)
--------------------------------------- total: 11.922000sec

                    user     system      total        real
duplicates_1    3.375000   0.000000   3.375000 (  3.437000)
duplicates_2    3.188000   0.000000   3.188000 (  3.218000)
duplicates_3    3.219000   0.015000   3.234000 (  3.281000)
duplicates_4    1.844000   0.000000   1.844000 (  1.859000)

C:\ruby1.9\bin>

kind regards -botp
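
A related sketch for much newer Rubies: group_by{|e|e} collects whole arrays of equal elements only to measure their size, whereas Enumerable#tally counts occurrences directly. Note the assumptions: tally requires Ruby 2.7 or later, far newer than the 1.9 build tested above, the method name duplicates_by_tally is made up here, and this variant was not part of the benchmark, so no timing claim is made for it.

```ruby
# Requires Ruby 2.7+ for Enumerable#tally.
def duplicates_by_tally(array)
  # tally returns a Hash mapping each element to its number of occurrences.
  array.tally.select { |_, count| count > 1 }.keys
end

p duplicates_by_tally(%w[apple banana apple orange])   # prints ["apple"]
```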
 
Robert Klemme

2007/10/28 said:
irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e|
ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]
Succinct ~and~ efficient!
Thanks!

Do you have a mail filter checking for any posts
containing 'inject'? :)

I don't need that since most of them were written by me. :) (slight
exaggeration)
*chuckle*

Kind regards

robert
 
