Best way to check if all elements in a List are unique

L

laredotornado

Hi,

I"m using Java 1.5. Given a java.util.List that I know to have at
least one element, what is the best way to check that all elements in
the list are unique ?

Thanks, - Dave
 
L

Lew

laredotornado said:
I"m using Java 1.5. Given a java.util.List that I know to have at
least one element, what is the best way to check that all elements in
the list are unique ?

That depends largely on what you mean by "best", and somewhat on what you mean
by "check".

If the List contains exactly one element, then you already know that it's
unique, so the real problem doesn't come about until it has at least two elements.

One way is to dump the List into a Set as you iterate through the List, and
for each element see if it's already in the Set. (This is indicated by the
Set#add() returning false.)

If all you need to do is ensure that each element is unique, you can dump the
contents into a Set and use the Set. (The Set implementations with which I'm
familiar have constructors that take a Collection as the argument.)

If you have a choice about whether to use a List in the first place, just
don't, and use a Set instead.

Whether any of these constitute "best" I cannot tell from your question.
 
M

Mike Schilling

laredotornado said:
Hi,

I"m using Java 1.5. Given a java.util.List that I know to have at
least one element, what is the best way to check that all elements in
the list are unique ?

Thanks, - Dave

boolean areListElementsUnique(List<?> l)
{
return l.size() == new HashSet<Object>(l).size();
}
 
J

John B. Matthews

laredotornado said:
I'm using Java 1.5. Given a java.util.List that I know to have at
least one element, what is the best way to check that all elements in
the list are unique ?

You should be able to construct a Set, say TreeSet, of the List elements
and see if the sizes match.
 
L

Lew

Mike said:
boolean areListElementsUnique(List<?> l)
{
return l.size() == new HashSet<Object>(l).size();
}

Sweet.

You could also generify that:

<T> boolean areListElementsUnique( List<T> list )
{
if (list == null || list.size() == 1 )
{
return true;
}
return list.size() == new HashSet <T> (list).size();
}

In this case the generics add nothing except a sense for the programmer that
they're being clever. The 'null' check is important, I swan.
 
T

Tom Anderson

I"m using Java 1.5. Given a java.util.List that I know to have at least
one element, what is the best way to check that all elements in the list
are unique ?

List<T> l;
boolean allUnique = l.size() == new HashSet<T>(l).size();

tom
 
M

Mike Schilling

John said:
You should be able to construct a Set, say TreeSet, of the List
elements and see if the sizes match.

TreeSet requires the elements be comparable. HashSet will work for any
element type.
 
T

Tom Anderson

You should be able to construct a Set, say TreeSet, of the List elements
and see if the sizes match.

I am tickled that we've all thought of exactly the same solution! Are
there any other interesting ways to do this?

The only improvement, of sorts, i'd offer is:

boolean areAllUnique(List<T> l) {
Set<T> seen = new HashSet<T>();
for (T o: l) if (!seen.add(0)) return false;
return true;
}

Which doesn't need to examine any more elements of the list than it
absolutely has to to decide if the list is all unique. Except for zero-
and one-element lists, that is.

The way i'd do this in shell script is (not tested!):

l="space separated list"
[[ $(echo $l | wc -w) -eq $(echo $l | tr ' ' '\n' | sort -u | wc -w) ]]

Which suggests the following java:

boolean areAllUnique(List<T> l) {
l = new ArrayList<T>(l);
Collections.sort(l);
T prev = null;
for (T cur: l) {
if (cur.equals(prev)) return false;
prev = cur;
}
return true;
}

Which is pretty different, but still built around the idea of sorting and
then detecting duplicates by comparing adjacent elements, as sort -u does.

tom
 
J

Joshua Cranmer

if (cur.equals(prev)) return false;

What if there are null entries in the list?

This snippet also, of course, assumes that T is a comparable object (I
believe Collections.sort fails if there's no comparator and T can't be
cast to Comparable).

The HashSets also assume that you're checking equality via .equals (as
opposed to using ==), and that .hashCode() is correctly implemented,
i.e., consistent with equals.
 
M

Mike Schilling

Tom said:
boolean areAllUnique(List<T> l) {
l = new ArrayList<T>(l);
Collections.sort(l);
T prev = null;
for (T cur: l) {
if (cur.equals(prev)) return false;
prev = cur;
}
return true;
}

Which is pretty different, but still built around the idea of sorting
and then detecting duplicates by comparing adjacent elements, as sort
-u does.

This assumes the elements can be ordered, so it fails to be general for the
same reason that the TreeSort solution does.
 
R

Roedy Green

I"m using Java 1.5. Given a java.util.List that I know to have at
least one element, what is the best way to check that all elements in
the list are unique ?

http://mindprod.com/jgloss/hashset.html
--
Roedy Green Canadian Mind Products
http://mindprod.com

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
~ Douglas Adams (born: 1952-03-11 died: 2001-05-11 at age: 49)
 
J

John B. Matthews

"Mike Schilling said:
TreeSet requires the elements be comparable. HashSet will work for
any element type.

Good points; but if the elements are Comparable (or admit a Comparator),
TreeMap won't require re-hashing. The Set interface promises "no
duplicate elements," but there are a number of implementations that I've
never used.

Dave: Any requirements that might suggest a preference?
 
M

Mike Schilling

John said:
Good points; but if the elements are Comparable (or admit a
Comparator), TreeMap won't require re-hashing.

But the HashSet shouldn't require expansion, since it knows the maximum
number of elements it will hold, while the TreeSet may require rebalancing.
And rehashing is more or less free if the objects precompute their hash
(like Strings) or use the default "identity" hash. I'd like to see numbers
before concluding that TreeMap is cheaper.
 
J

John B. Matthews

"Mike Schilling said:
But the HashSet shouldn't require expansion, since it knows the
maximum number of elements it will hold,

That makes sense. The API even promises "an initial capacity sufficient
to contain the elements in the specified collection."
while the TreeSet may require rebalancing. And rehashing is more or
less free if the objects precompute their hash (like Strings) or use
the default "identity" hash.

Or the OP may need some feature of another implementation, making it the
preferred choice.
I'd like to see numbers before concluding that TreeMap is cheaper.

I'd like to see a problem before resorting to that!
 
E

Eric Sosman

Hi,

I"m using Java 1.5. Given a java.util.List that I know to have at
least one element, what is the best way to check that all elements in
the list are unique ?

As others have said, it depends on what you mean by "best."
It *also* depends on what you mean by "unique!" For example,
how many "unique" elements are in

List<String> list = new ArrayList<String>();
for (int i = 0; i < 100; ++i)
list.add(new String("unique"));

? Both "one" and "one hundred" are reasonable answers.
 
J

John B. Matthews

Tom Anderson said:
The method already does that - if it's null, you get a
NullPointerException.

I thought the idea was to treat empty and unitary lists as unique, not
to mention skirting the size() of null.

Here's a primitive test:

import java.util.*;

public class SetTest {

private static final int MAX = 10000;

public static void main(String[] args) {
List<String> names = new ArrayList<String>();
List<Integer> numbers = new ArrayList<Integer>();
for (int i = 0; i < MAX; ++i) {
names.add(String.valueOf(i + MAX));
numbers.add(Integer.valueOf(i + MAX));
}
for (int i = 0; i < 3; i++) {
test(names);
test(numbers);
}
}

private static <T> void test(List<T> list) {
if (list == null) return;
int s1 = list.size();
long start = System.nanoTime();
int s2 = new HashSet<T>(list).size();
long stop = System.nanoTime();
System.out.println("HashSet: " + s1 + " " + s2
+ " " + (s1 == s2) + dt(stop - start));
start = System.nanoTime();
s2 = new TreeSet<T>(list).size();
stop = System.nanoTime();
System.out.println("TreeSet: " + s1 + " " + s2
+ " " + (s1 == s2) + dt(stop - start));
}

private static String dt(long nanos) {
return String.format("%7.3f ms.", nanos / 1000000d);
}
}
 
V

Volker Borchert

Lew said:
<T> boolean areListElementsUnique( List<T> list )
{
if (list == null || list.size() == 1 )
<=
{
return true;
}
return list.size() == new HashSet <T> (list).size();
}

And, I'd prefer to call list.size() only once and reuse the result,
for there might be lists out there whose size() is not O(1).

Anyway, the question sounds awfully like a homework assignment...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top