How to get non-unique elements from an array?

Mark Van Holstyn


Zach, if an element repeats more than once, it'll double up in the output
array. You need to add a .uniq on the end:

irb(main):165:0> a = [1,2,3,4,5,3,5,3,5]
=> [1, 2, 3, 4, 5, 3, 5, 3, 5]
irb(main):166:0> t=[]; a.delete_if{ |e| r=(not t.include? e); t.push(e); r }
=> [3, 5, 3, 5]

That aside, this one is a little shorter, and seems to benchmark a touch
faster as well.

irb(main):180:0> a=[1,2,3,4,5,3,5,3,5]
=> [1, 2, 3, 4, 5, 3, 5, 3, 5]
irb(main):181:0> t=[]; a.select{ |e| r=t.include?(e); t<<e; r }.uniq
=> [3, 5]
irb(main):182:0> a
=> [1, 2, 3, 4, 5, 3, 5, 3, 5]

Mark

 
Martin DeMello

Sam Kong said:
Hello!

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
element.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

Without counting:

h1 = {}
h2 = {}
a.each {|i|
  h2[i] = true if h1[i]
  h1[i] = true
}

p h2.keys

martin
 
Relm

I need to get non-unique elements from an array.
The best I came up with was using a hash as a counter for each unique
element.

a = [0,1,2,3,4,5,2,3]

#What I want to get is [2,3] as 2,3 are non-unique elements.

Two variants using regular expressions:

eval(a.inspect.gsub(/\b(\d+),(?!.*\b\1\b)/,'')).uniq

a.sort.inspect.scan(/\b(\d+)(, \1\b)+/).map{|x|x[0].to_i}
 
pauldacus

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]

Sure... it returns the elements as strings, but a quick "to_i" call in
the inevitable iterator that is coming solves that. This thing is also
subject to a Schwartzian Transform of the "map-sort-map" type.

Paul Dacus
 
Simon Strandgaard

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]


no need to sort.. :)

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]
 
pauldacus

Well, until someone catches the horrible gaffe in my last try by solving
the EXACT OPPOSITE PROBLEM WHILE BEING CONDESCENDING (my wife is used
to this), I'll keep trying:

Of course, the following is what works, finding duplicate ELEMENTS:

a.sort.to_s.scan(/(.)(\1)+/).flatten.uniq

Yeah... I guess it's clear this finds individual duplicate ELEMENTS,
not numbers. Works on single digits/letters/etc, blows up on all else.
How's that for useless?

What would do the trick is a 'non-greedy' array subtraction:

Normal(greedy): [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5] = []
Non-greedy: [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5] = [2,3]

Seems like there has got to be a way of solving this w/o counters or
blocks...
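For what it's worth, the non-greedy subtraction described above can be sketched directly; the method name `sub_once` below is invented for illustration, not part of standard Ruby:

```ruby
# A hypothetical "non-greedy" subtraction: remove only one occurrence
# of each element of the right-hand array, instead of all occurrences
# as the built-in Array#- does.
class Array
  def sub_once(other)
    result = dup
    other.each do |e|
      idx = result.index(e)
      result.delete_at(idx) if idx
    end
    result
  end
end

p [0,1,2,3,4,5,2,3].sub_once([0,1,2,3,4,5])  # => [2, 3]
p [0,1,2,3,4,5,2,3] - [0,1,2,3,4,5]          # => [] (greedy)
```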
 
Ryan Leavengood

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]


no need to sort.. :)

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]

Both are interesting solutions, but you are returning the unique
elements of the array, not the non-unique ones, which was the original
problem. I say this not to be pedantic, but to see how you would solve
it.

Ryan
 
Simon Strandgaard

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]


no need to sort.. :)

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]

Both are interesting solutions, but you are returning the unique
elements of the array, not the non-unique ones, which was the original
problem. I say this not to be pedantic, but to see how you would solve
it.

Sorry, it should be without the delete..

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
#=> [2, 3]
 
Jeff Wood

How 'bout

a = [ 0,1,2,3,4,5,2,3 ]
a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b }

Young grasshoppers... let us never forget the power of the REGEX! Let
us not bother with counters or blocks... let us GSUB this bad boy!

a = [0,1,2,3,4,5,2,3]

a.sort.to_s.gsub(/(.)(\1)+/, '').split(//)

=>["0", "1", "4", "5"]


no need to sort.. :)

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.delete(a.to_s.gsub(/(.)(?!.*\1)/,'')).split('').map{|i|i.to_i}
#=> [0, 1, 4, 5]

Both are interesting solutions, but you are returning the unique
elements of the array, not the non-unique ones, which was the original
problem. I say this not to be pedantic, but to see how you would solve
it.

Sorry should be without the delete..

a = [0, 1, 2, 3, 4, 5, 2, 3]
a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i}
#=> [2, 3]
 
pauldacus

That '?!' zero width lookahead regex is nice, Simon.

A workable alternative, with nary a block or hash counter to offend the
eye:

a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq

This works for all numbers, not just single digits. Returns strings,
which you will 'to_i' in an iterator. Replace the \d with \w for
something to find dup strings... I think.
 
Sam Kong

Hi!

I'm the OP.
I thank those who replied to my message.
I learned a lot and seeing all different approaches was fun.

Somebody emailed me with his own solution which I like very much.
I want to share it with the group.

a = [0,1,2,3,4,5,2,3]
a.select{|e| a.index(e) != a.rindex(e)}.uniq

It is very brief and still very easy to understand.
Don't you think so?

I wonder if my slightly modified version works.

a.select{|e| a.grep(e).length > 1}.uniq

grep is for patterns but seems to work fine for integers.
I'm not sure though. (Simple tests show it works.)
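The reason it works: Array#grep keeps every element for which `pattern === element` holds, and for an Integer argument === is plain equality. A quick check:

```ruby
a = [0, 1, 2, 3, 4, 5, 2, 3]

# grep(2) keeps every element for which 2 === element, i.e. every 2
p a.grep(2)        # => [2, 2]

# the same mechanism accepts Ranges and classes, too
p a.grep(2..3)     # => [2, 3, 2, 3]
p a.grep(Integer)  # => [0, 1, 2, 3, 4, 5, 2, 3]
```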


In my opinion, using regex is too much for this problem (maybe
slower?).
Array#inject seems to be good but I get confused whenever I come across
it (maybe I'm not smart enough).

What do you think?

Thanks all!

Sam
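For readers on Ruby 2.7 or later (well after this thread), Enumerable#tally makes the hash-counter idea a one-liner:

```ruby
a = [0, 1, 2, 3, 4, 5, 2, 3]

# tally builds the {element => count} hash in a single pass
p a.tally                               # => {0=>1, 1=>1, 2=>2, 3=>2, 4=>1, 5=>1}

# keep only the elements seen more than once
p a.tally.select { |_, c| c > 1 }.keys  # => [2, 3]
```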
 
Ryan Leavengood

I wonder if my slightly modified version works.

It works, but is considerably slower (I've included a few other recent
submissions as well):

require 'benchmark'
include Benchmark

a = [0,1,2,3,4,5,2,3]

BMN = 50000
bm(5) do |bx|
  bx.report('index ') { BMN.times { a.select{|e| a.index(e) != a.rindex(e)}.uniq } }
  bx.report('grep  ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
  bx.report('scan  ') { BMN.times { a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
  bx.report('inject') { BMN.times { a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b } } }
  bx.report('gsub  ') { BMN.times { a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i} } }
end

Results on my laptop:

             user     system      total        real
index    1.281000   0.000000   1.281000 (  1.281000)
grep     3.813000   0.015000   3.828000 (  3.827000)
scan     3.515000   0.000000   3.515000 (  3.531000)
inject   1.594000   0.000000   1.594000 (  1.593000)
gsub     3.063000   0.016000   3.079000 (  3.078000)

Whoever created that index and rindex based solution created a very
fast one. I'll leave it as an exercise for the reader to benchmark
all the other solutions.

Ryan
 
pauldacus

That is a good one.

Personally, when you said you thought regex-ing it was slower or 'too
much', I was curious whether it really would be on huge arrays.

And so a test:

(REGEX) a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq
vs.
(INDEX) a.select{|e| a.index(e) != a.rindex(e)}.uniq

I ran these on an array with 11,000 elements, 2,000 of which were
duplicates, as follows:

(1..10000).to_a + (2000..3000).to_a

The scan (REGEX) script finished in 0.21 seconds
The (INDEX) script ran in 28.77 seconds

The regex was over 100X faster!

And I actually was trying to run this with 110,000 elements but the
INDEX script seemed to just lock up. The regex was done in 2.63 seconds,
and the INDEX script was still going 10 minutes later....

Have no doubt... a scan is darn fast!
 
Mark Van Holstyn


Hey Sam,

I too tried a solution similar to...
a.select{|e| a.index(e) != a.rindex(e)}.uniq

Then i tried the one which i wrote...
t=[]; a.select{ |e| r=t.include?(e); t<<e; r }.uniq

The second runs MUCH faster. The first one must search the array twice for
each element in it, and therefore runs something on the order of n^3 worst
case. The second only searches once for each and runs n^2 worst case. As for
the version with grep... a.grep seems to run somewhere on the order of
n^4.
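Taking that reasoning one step further, the linear `t.include?` scan can itself be replaced with constant-time Set lookups, bringing the whole pass down to roughly O(n); a sketch:

```ruby
require 'set'

a = [0, 1, 2, 3, 4, 5, 2, 3]

seen = Set.new
dups = Set.new
a.each do |e|
  # Set#add? returns nil when e is already present, so duplicates
  # are detected with a single O(1) lookup per element
  dups.add(e) unless seen.add?(e)
end
p dups.to_a  # => [2, 3]
```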


Adding mine to Ryan's benchmark I come out with...

a = [0,1,2,3,4,5,2,3]

BMN = 50000
bm(5) do |bx|
  bx.report('index  ') { BMN.times { a.select{|e| a.index(e) != a.rindex(e)}.uniq } }
  bx.report('grep   ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
  bx.report('scan   ') { BMN.times { a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
  bx.report('inject ') { BMN.times { a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b } } }
  bx.report('gsub   ') { BMN.times { a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i} } }
  bx.report('include') { BMN.times { t=[]; a.select{ |e| r=t.include?(e); t<<e; r }.uniq } }
end

              user     system      total        real
index     1.312000   0.000000   1.312000 (  1.328000)
grep      3.594000   0.000000   3.594000 (  3.625000)
scan      3.094000   0.000000   3.094000 (  3.109000)
inject    1.406000   0.000000   1.406000 (  1.407000)
gsub      2.844000   0.000000   2.844000 (  2.843000)
include   1.109000   0.000000   1.109000 (  1.110000)



Mark

Hi!

I'm the OP.
I thank those who replied to my message.
I learned a lot and seeing all different approaches was fun.

Somebody emailed me with his own solution which I like very much.
I want to share it with the group.

a = [0,1,2,3,4,5,2,3]
a.select{|e| a.index(e) != a.rindex(e)}.uniq

It is very brief and still very easy to understand.
Don't you think so?

I wonder if my slightly modified version works.

a.select{|e| a.grep(e).length > 1}.uniq

grep is for pattern but seems to work fine for integers.
I'm not sure though. (Simple tests show it works.)


In my opinion, using regex is too much for this problem (maybe
slower?).
Array#inject seems to be good but I get confused whenever I come across
it (maybe I'm not smart enough).

What do you think?

Thanks all!

Sam


--
Mark Van Holstyn
(e-mail address removed)
http://lotswholetime.com

 
Ryan Leavengood

And so a test:

(REGEX) a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq
vs.
(INDEX) a.select{|e| a.index(e) != a.rindex(e)}.uniq

I ran these on an array with 11,000 elements, 2,000 of which were
duplicates, as follows:

(1..10000).to_a + (2000..3000).to_a

The scan (REGEX) script finished in 0.21 seconds
The (INDEX) script ran in 28.77 seconds

The regex was over 100X faster!

Thanks for this. I was wondering if my method of benchmarking might be
flawed, and it sort of is in this case. But it makes sense here
because the scan just has to make one run through the string, whereas
each iteration of the select has multiple calls to an index which
might potentially traverse the entire array. As my benchmark shows
though, a lot of searches in a smaller array is faster using the
index.

Also another problem with the string scanning is that it depends on
the array elements being numbers (at least in the case above.)

I decided to benchmark again using your method and adding some more
implementations. Here we go (beware of wrapped lines):

require 'benchmark'
include Benchmark

[[[0,1,2,3,4,5,2,3], 50000], [(1..5000).to_a + (2000..2500).to_a, 1]].each do |a, iterations|
  bm(12) do |bx|
    bx.report('simonk') { iterations.times { a.select{|e| a.index(e) != a.rindex(e)}.uniq } }
    bx.report('samk  ') { iterations.times { a.select{|e| a.grep(e).length > 1}.uniq } }
    bx.report('paul  ') { iterations.times { a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
    bx.report('jeff  ') { iterations.times { a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b } } }
    bx.report('paolo ') { iterations.times { h = {}; u = a.inject([]) {|res, x| h[x] ? res + [x] : h[x] = res}.uniq } }
    bx.report('jegII ') { iterations.times { seen = Hash.new(0); a.select { |e| (seen[e] += 1) > 1 }.uniq } }
    bx.report('simons') { iterations.times { uary = a.uniq; a.map.delete_if{|i| x = uary.member?(i); uary.delete(i); x} } }
    bx.report('robert') { iterations.times { a.inject(Hash.new(0)) {|h,e| h[e] += 1; h}.select {|k,v| v > 1}.map {|x,y| x} } }
    bx.report('david ') { iterations.times { a.uniq.find_all {|x| a.find_all {|y| y == x }.size > 1 } } }
    bx.report('zach  ') { iterations.times { t=[]; a.delete_if{ |e| r=(not t.include? e); t.push(e); r }.uniq } }
    bx.report('markvh') { iterations.times { t=[]; a.select{ |e| r=t.include?(e); t<<e; r }.uniq } }
    bx.report('martin') { iterations.times { h1 = {}; h2 = {}; a.each {|i| h2[i] = true if h1[i]; h1[i] = true }; h2.keys } }
  end
end

The results:

Small array, lots of iterations:
              user     system      total        real
simonk    1.344000   0.000000   1.344000 (  1.344000)
samk      3.859000   0.000000   3.859000 (  3.876000)
paul      3.532000   0.000000   3.532000 (  3.547000)
jeff      1.640000   0.000000   1.640000 (  1.641000)
paolo     2.656000   0.000000   2.656000 (  2.672000)
jegII     1.704000   0.016000   1.720000 (  1.719000)
simons    2.109000   0.000000   2.109000 (  2.110000)
robert    3.172000   0.015000   3.187000 (  3.203000)
david     5.078000   0.000000   5.078000 (  5.079000)
zach      0.328000   0.000000   0.328000 (  0.328000)
markvh    0.391000   0.000000   0.391000 (  0.391000)
martin    0.390000   0.000000   0.390000 (  0.484000)

Big array, one iteration:
              user     system      total        real
simonk    4.125000   0.000000   4.125000 (  4.453000)
samk     17.125000   0.000000  17.125000 ( 17.283000)
paul      0.063000   0.000000   0.063000 (  0.062000)
jeff      0.109000   0.000000   0.109000 (  0.110000)
paolo     0.031000   0.000000   0.031000 (  0.031000)
jegII     0.016000   0.000000   0.016000 (  0.016000)
simons    2.219000   0.000000   2.219000 (  2.265000)
robert    0.031000   0.000000   0.031000 (  0.032000)
david    34.188000   0.016000  34.204000 ( 35.127000)
zach      2.156000   0.000000   2.156000 (  2.203000)
markvh    0.015000   0.000000   0.015000 (  0.016000)
martin    0.000000   0.000000   0.000000 (  0.000000)

Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

But it is interesting to see the varying results for each algorithm
based on how they are used.

Ryan
 
Ryan Leavengood

The results:

Small array, lots of iterations:
              user     system      total        real
markvh    0.391000   0.000000   0.391000 (  0.391000)
martin    0.390000   0.000000   0.390000 (  0.484000)

Big array, one iteration:
              user     system      total        real
markvh    0.015000   0.000000   0.015000 (  0.016000)
martin    0.000000   0.000000   0.000000 (  0.000000)

Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

On second thought, Mark's may have been the best overall. Either way,
both Mark's and Martin's are good algorithms. Many of the others were
good as well.

Ryan
 
pauldacus

Awesome! What seemed like a fairly simple question... not so! I'm
going to look over all the answers again. The benchmarking is awesome.
 
Lyndon Samson


              user     system      total        real
index     1.344000   0.000000   1.344000 (  1.437000)
grep      3.875000   0.000000   3.875000 (  4.266000)
scan      2.703000   0.000000   2.703000 (  3.188000)
inject    1.625000   0.000000   1.625000 (  1.812000)
gsub      3.078000   0.000000   3.078000 (  3.547000)
bits      0.969000   0.000000   0.969000 (  1.062000)
include   1.250000   0.000000   1.250000 (  1.469000)

BMN = 50000
bm(5) do |bx|
  bx.report('index  ') { BMN.times { a.select{|e| a.index(e) != a.rindex(e)}.uniq } }
  bx.report('grep   ') { BMN.times { a.select{|e| a.grep(e).length > 1}.uniq } }
  bx.report('scan   ') { BMN.times { a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq } }
  bx.report('inject ') { BMN.times { a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b } } }
  bx.report('gsub   ') { BMN.times { a.to_s.gsub(/(.)(?!.*\1)/,'').split('').map{|i|i.to_i} } }
  bx.report('bits   ') { BMN.times {
    bn = bn2 = 0
    a.each {|el| (bn2 |= (1<<el) if bn[el] == 1) | bn |= (1<<el) }
    # 0.upto(a.length) { |bitNdx| puts bitNdx if bn2[bitNdx] == 1 }
  } }
  bx.report('include') { BMN.times { t=[]; a.select{ |e| r=t.include?(e); t<<e; r }.uniq } }
end

Nyah nyah, unlimited length bitstrings win :) No output array of course but
the results are encoded in Fixnum/Bignum bits.
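To get an actual output array back from the bit encoding, the commented-out decode loop can be completed along these lines (a sketch assuming non-negative integer elements):

```ruby
a = [0, 1, 2, 3, 4, 5, 2, 3]

bn = bn2 = 0
a.each do |el|
  bn2 |= (1 << el) if bn[el] == 1  # bit already set: el is a duplicate
  bn  |= (1 << el)                 # mark el as seen
end

# walk the duplicate bitmask and collect the set bit positions
dups = (0..a.max).select { |i| bn2[i] == 1 }
p dups  # => [2, 3]
```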

 
Zach Dennis

Ryan said:
On second thought, Mark's may have been the best overall. Either way,
both Mark's and Martin's are good algorithms. Many of the others were
good as well.

Mine isn't the shortest, but it's the fastest I've seen thus far! Twice
as fast as the next closest! Why does everyone leave me out of the benchmarks? ;)

---my code
t=[]; a.delete_if{ |e| r=(not t.include? e); t.push(e); r }
---end my code
"c:\ruby\bin\ruby.exe" find_non-uniques.rb -d
              user     system      total        real
delete    0.218000   0.000000   0.218000 (  0.218000) <---- MINE!!!
index     0.500000   0.000000   0.500000 (  0.547000)
grep      0.500000   0.000000   0.500000 (  0.516000)
scan      1.078000   0.000000   1.078000 (  1.172000)
inject    0.656000   0.000000   0.656000 (  0.672000)
gsub      1.250000   0.000000   1.250000 (  1.250000)
include   0.672000   0.000000   0.672000 (  0.687000)

Zach
 
Zach Dennis

Oh wait... I just saw Ryan's other email... he didn't forget and I'm not
necessarily the fastest depending on the route you go. =)

Zach
 
