How to check variables for uniqueness ?

krislioe · Dec 21, 2006

Hi all,

I have eight variables: var1, var2, ... var8, all of type String.
How can I check that every variable has a unique value?

Thank you for your help,
xtanto

Andrew Thompson · Dec 21, 2006

(e-mail address removed) wrote:
....

I have eight variables: var1, var2, ... var8, all of type String.
How can I check that every variable has a unique value?

One way would be to create a Map, iterate the
var's and if not present in the map, add the value
as a key, else return false.

Andrew T.

Patricia Shanahan · Dec 21, 2006

Andrew said:
(e-mail address removed) wrote:
...

One way would be to create a Map, iterate the
var's and if not present in the map, add the value
as a key, else return false.

Andrew T.

Any particular reason for Map, rather than Set?

Note that the result of a Set add call is true if, and only if, the
value is not already in the Set.
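
A minimal sketch of how that return value could be used, assuming the eight strings are passed as arguments. The class name UniqueCheck and the method allUnique are made up for illustration and are not from the thread.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
final class UniqueCheck {
    // Returns true only if no two arguments are equal according to equals().
    static boolean allUnique(String... values) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            // add() returns false when an equal value is already in the set.
            if (!seen.add(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique("a", "b", "c"));
        System.out.println(allUnique("a", "b", "a"));
    }
}
```

The loop stops at the first repeated value, so it never needs to compare the set size against the number of variables.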

Patricia

Andrew Thompson · Dec 21, 2006

Patricia said:
....
Any particular reason for Map, rather than Set?

You mean besides, 'lack of enough consultation
of the relevant docs.'? ;-)

Note that the result of a Set add call is true if, and only if, the
value is not already in the Set.

A Set sounds the go - it is just right for this task.

Andrew T.

John Ersatznom · Dec 21, 2006

Andrew said:
You mean besides, 'lack of enough consultation
of the relevant docs.'? ;-)

A Set sounds the go - it is just right for this task.

HashSet<String> foo = new HashSet<String>();
foo.add(var1);
foo.add(var2);
foo.add(var3);
foo.add(var4);
foo.add(var5);
foo.add(var6);
foo.add(var7);
foo.add(var8);
if (foo.size() < 8)
    duplicateExists();
else
    duplicateDoesNotExist();

If you actually need to identify the specific duplicate pairs, you need
to compare them one by one -- 1 with all the others, 2 with all the
higher-numbered ones, and so on up to 7 and 8, using equals().
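
The pairwise comparison described above might look roughly like this, assuming the variables have been copied into an array and none of them is null. PairwiseDuplicates and duplicatePairs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class PairwiseDuplicates {
    // Returns every index pair {i, j} with i < j whose strings are equal.
    // Assumes no element of vars is null.
    static List<int[]> duplicatePairs(String[] vars) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < vars.length; i++) {
            // Only look at higher-numbered elements so each pair is seen once.
            for (int j = i + 1; j < vars.length; j++) {
                if (vars[i].equals(vars[j])) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] vars = {"a", "b", "a", "b", "c"};
        for (int[] pair : duplicatePairs(vars)) {
            System.out.println(pair[0] + " duplicates " + pair[1]);
        }
    }
}
```

For eight variables this is 28 comparisons at most, so the quadratic cost is not a concern at this size.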

If you want case insensitivity, use e.g.

foo.add(var3.toLowerCase());

or equalsIgnoreCase().
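
Another option, not mentioned in the thread so far, is a TreeSet built with String.CASE_INSENSITIVE_ORDER, which avoids lower-casing each value by hand. This is only a sketch: the class and method names are invented, and the comparator does not accept null elements.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; names are illustrative only.
final class CaseInsensitiveUnique {
    // Returns true only if no two arguments are equal when case is ignored.
    // Assumes no argument is null (the comparator would throw).
    static boolean allUniqueIgnoringCase(String... values) {
        // The set uses the comparator, not equals(), to decide what is a duplicate.
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String value : values) {
            if (!seen.add(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUniqueIgnoringCase("Foo", "bar"));
        System.out.println(allUniqueIgnoringCase("Foo", "FOO"));
    }
}
```

The comparator ignores locale, so it has the same limits for non-English text as equalsIgnoreCase().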

Patricia Shanahan · Dec 21, 2006

John said:
HashSet<String> foo = new HashSet<String>();
foo.add(var1);
foo.add(var2);
foo.add(var3);
foo.add(var4);
foo.add(var5);
foo.add(var6);
foo.add(var7);
foo.add(var8);
if (foo.size() < 8)
    duplicateExists();
else
    duplicateDoesNotExist();

If you actually need to identify the specific duplicate pairs, you need
to compare them one by one -- 1 with all the others, 2 with all the
higher-numbered ones, and so on up to 7 and 8, using equals().

To save repetitious writing, I'm going to assume the strings are in an
array. The equivalent of your code would be:

HashSet<String> foo = new HashSet<String>();
for (String v : vars) {
    foo.add(v);
}
if (foo.size() < vars.length)
    duplicateExists();
else
    duplicateDoesNotExist();

You can simplify finding specific duplicates by checking the foo.add
results:

HashSet<String> foo = new HashSet<String>();
for (int i = 0; i < vars.length; i++) {
    if (!foo.add(vars[i])) {
        // vars[i] was already in the set, so an earlier element equals it.
        for (int j = 0; j < i; j++) {
            if (vars[i].equals(vars[j])) {
                reportDuplicate(i, j);
            }
        }
    }
}

A true result from foo.add means the string was actually added to the
set, so it has no duplicate with a lower index.

Patricia

Ed Kirwan · Dec 21, 2006

Patricia said:
You can simplify finding specific duplicates by checking the foo.add
results:

HashSet<String> foo = new HashSet<String>();
for (int i = 0; i < vars.length; i++) {
    if (!foo.add(vars[i])) {
        // vars[i] was already in the set, so an earlier element equals it.
        for (int j = 0; j < i; j++) {
            if (vars[i].equals(vars[j])) {
                reportDuplicate(i, j);
            }
        }
    }
}

A true result from foo.add means the string was actually added to the
set, so it has no duplicate with a lower index.

Patricia

Perhaps using a List would obviate the need for the nested loop?

List<String> list = new ArrayList<String>();
for (int i = 0, n = vars.length; i < n; i++) {
    int duplicateIndex = list.indexOf(vars[i]);
    if (duplicateIndex != -1) {
        reportDuplicate(i, duplicateIndex);
    }
    // Always add, so that list positions stay aligned with indexes in vars.
    list.add(vars[i]);
}

..ed

Remon van Vliet · Dec 21, 2006

Perhaps using a List would obviate the need for the nested loop?

List<String> list = new ArrayList<String>();
for (int i = 0, n = vars.length; i < n; i++) {
    int duplicateIndex = list.indexOf(vars[i]);
    if (duplicateIndex != -1) {
        reportDuplicate(i, duplicateIndex);
    }
    // Always add, so that list positions stay aligned with indexes in vars.
    list.add(vars[i]);
}

.ed

The nested loop is only needed to allow reporting of a specific duplicate
pair. I cannot think of many practical examples where that is required
rather than simply reporting that the element to be added is a duplicate. If
it is required then I'd say you're right, using a List does result in
slightly more readable code.

That said, if the collection must not contain duplicate elements then at
least from a design and correctness perspective you should use a Set. I'd
personally do so even if that decision would result in a few extra lines of
code here and there.

Remon

Oliver Wong · Dec 21, 2006

John Ersatznom said:
If you want case insensitivity, use e.g.

foo.add(var3.toLowerCase());

This might not actually work, because of the fickleness of certain human
languages.

or equalsIgnoreCase().

Yeah, I'd essentially wrap the String in a custom class which overrides
equals to call equalsIgnoreCase, and give that to the Set.
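
A rough sketch of such a wrapper, under the assumption that the wrapped string is never null. CaseInsensitiveKey is an invented name. The hashCode() folds each char the same way equalsIgnoreCase() compares them (upper-case, then lower-case), because a HashSet needs equal keys to have equal hash codes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical wrapper; not part of java.util.
final class CaseInsensitiveKey {
    private final String value;

    CaseInsensitiveKey(String value) {
        this.value = Objects.requireNonNull(value);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CaseInsensitiveKey
                && value.equalsIgnoreCase(((CaseInsensitiveKey) other).value);
    }

    @Override
    public int hashCode() {
        // Fold each char so strings that differ only in case hash alike.
        int hash = 0;
        for (int i = 0; i < value.length(); i++) {
            char folded = Character.toLowerCase(Character.toUpperCase(value.charAt(i)));
            hash = 31 * hash + folded;
        }
        return hash;
    }

    public static void main(String[] args) {
        Set<CaseInsensitiveKey> seen = new HashSet<>();
        System.out.println(seen.add(new CaseInsensitiveKey("Foo")));
        System.out.println(seen.add(new CaseInsensitiveKey("FOO")));
    }
}
```

Overriding equals() alone would not be enough here: "Foo" and "FOO" would usually land in different hash buckets and both be accepted.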

- Oliver

Hemal Pandya · Dec 22, 2006

Ed said:
Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?

It will, but will be a lot more expensive. You can use a
Map<String,Integer> to both avoid nested loop and report indexes. Yes,
it will take more memory.
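
One way the Map<String,Integer> idea could be sketched, assuming the map stores the index of the first occurrence of each string. IndexedDuplicates and findDuplicates are hypothetical names, and the result pairs stand in for the reportDuplicate(i, j) calls used earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
final class IndexedDuplicates {
    // Returns {laterIndex, firstIndex} for every element that repeats an earlier one.
    static List<int[]> findDuplicates(String[] vars) {
        Map<String, Integer> firstIndex = new HashMap<>();
        List<int[]> duplicates = new ArrayList<>();
        for (int i = 0; i < vars.length; i++) {
            Integer earlier = firstIndex.get(vars[i]);
            if (earlier != null) {
                // Seen before: report against the first occurrence.
                duplicates.add(new int[] {i, earlier});
            } else {
                firstIndex.put(vars[i], i);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String[] vars = {"a", "b", "a", "c", "b", "a"};
        for (int[] pair : findDuplicates(vars)) {
            System.out.println(pair[0] + " duplicates " + pair[1]);
        }
    }
}
```

Each element costs one hash lookup, so there is a single pass with no inner loop. The extra memory is one Integer per distinct string.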

[....]

Patricia Shanahan · Dec 22, 2006

Hemal said:
Ed said:

Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?

Note that I did NOT write that.

It will, but will be a lot more expensive. You can use a
Map<String,Integer> to both avoid nested loop and report indexes. Yes,
it will take more memory.

[....]

Hemal Pandya · Dec 22, 2006

Patricia Shanahan wrote:
[....]

Note that I did NOT write that.

No, you did not. Your lines would have had one more '>' at the
beginning-of-line. I apologize if I caused confusion.

John Ersatznom · Dec 22, 2006

Oliver said:
This might not actually work, because of the fickleness of certain human
languages.
?

Yeah, I'd essentially wrap the String in a custom class which overrides
equals to call equalsIgnoreCase, and give that to the Set.

What is obviously missing from java.util is an Equalizer:

public interface Equalizer<T> {
    public boolean areEqual (T foo, T bar);
    public int getHash (T foo);  // an int, as with hashCode()
}

and the ability to pass these to collection constructors to use, the way
those that use order comparison can already be handed a custom comparator.

Problems caused by comparators not consistent with an object's equals
method could be avoided by supplying an Equalizer that is consistent
with the comparator, as well as it obviating the need you perceive to
wrap the String class. (Either way, by the way, you need to replace
hashCode() with a case-insensitive version too, or you'll have strings
that compare equal and have different hash codes, at least potentially.
That at least can't happen if you use add(var.toFooCase()) or similar.)

Oliver Wong · Dec 22, 2006

John Ersatznom said:
?

I'm not a linguist, so this may be linguistically incorrect, but it
illustrates the type of problems you can run into:

assert locale is German; // pseudocode
assert "BEISSEN".toLowerCase().equals("beissen");
assert "BEISSEN".toLowerCase().equals("beißen");

- Oliver

John Ersatznom · Dec 23, 2006

Oliver said:
I'm not a linguist, so this may be linguistically incorrect, but it
illustrates the type of problems you can run into:

assert locale is German; // pseudocode
assert "BEISSEN".toLowerCase().equals("beissen");
assert "BEISSEN".toLowerCase().equals("beißen");

Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).
Whenever there's multiple legitimate spellings for the same word,
there's going to be trouble if you try to make the computer "smart
enough" to treat them as equal.

Mind you, there ARE lexicographical "distance" measures that are useful
for "fuzzy-matching", such as spell-checker "suggestions" use. (Google
now suggests an alternate if it thinks you've misspelled a query term,
for example.) But you can't use those as an equality test, since they
don't define an equivalence relation -- they aren't transitive, since
you can have a.isCloseTo(b), a.isCloseTo(c), and !b.isCloseTo(c) (e.g.
where the distance is 1 from c to a, 1 from a to b, and 2 from c to b,
and 1 is the threshold). Even a threshold of 1 is too high if the result
is not only to equate "color" with "colour" but also with "colon".
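
To make the non-transitivity concrete, here is a sketch using Levenshtein edit distance as the "distance" measure. That choice is an assumption, since the post does not name a specific metric, and EditDistance, levenshtein and isCloseTo are invented names.

```java
// Hypothetical helper; names are illustrative only.
final class EditDistance {
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        // After the final swap, previous holds the last completed row.
        return previous[b.length()];
    }

    static boolean isCloseTo(String a, String b, int threshold) {
        return levenshtein(a, b) <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(isCloseTo("color", "colour", 1));
        System.out.println(isCloseTo("color", "colon", 1));
        System.out.println(isCloseTo("colour", "colon", 1));
    }
}
```

With a threshold of 1, "color" is close to both "colour" and "colon", but "colour" and "colon" are 2 apart, so a Set keyed on this relation would give results that depend on insertion order.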

Best to treat distinct spellings as distinct, and perhaps use a
fuzzy-match "suggested alternative" if users enter a query with no
results, e.g. if a search for "beissen" comes up empty.

Of course, if you really want to drive yourself mad, try to program the
computer to identify when two different input strings identify the same
thing in general. Good luck having it compare e.g. "Carrie-Anne Moss"
and "Lead actress in The Matrix" as equal. Sure, go ahead, you'll even
solve the NLP while you're at it so you should become rich and famous.
If you succeed.

Of course, all this arose in the context of "foo.equalsIgnoreCase(bar)"
vs. "foo.toLowerCase().equals(bar.toLowerCase())". Those *should* be
equal; both should be transforming words into a canonical
representation. Or else there should be another toFoo() method that
returns a canonical representation that compares equal for words that
compare equalsIgnoreCase, because the usefulness of having such a
representation to use as a key in a hashmap is obvious.
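
Whether the two really agree depends on the locale passed to toLowerCase(). The sketch below uses the Turkish example from the String.toLowerCase documentation, where "TITLE" lower-cases to a string with a dotless i. CaseFoldingDemo and equalViaLowerCase are made-up names.

```java
import java.util.Locale;

// Hypothetical demo class; names are illustrative only.
final class CaseFoldingDemo {
    // Compares two strings after lower-casing both in the given locale.
    static boolean equalViaLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // equalsIgnoreCase() is locale-independent.
        System.out.println("TITLE".equalsIgnoreCase("title"));
        // Locale.ROOT gives locale-neutral case mapping.
        System.out.println(equalViaLowerCase("TITLE", "title", Locale.ROOT));
        // In a Turkish locale, 'I' lower-cases to dotless i (U+0131).
        System.out.println(equalViaLowerCase("TITLE", "title", turkish));
    }
}
```

The first two comparisons report the strings as equal and the Turkish one does not, so a canonical key built with toLowerCase() is probably safer with an explicit locale such as Locale.ROOT than with the platform default.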

Oliver Wong · Dec 27, 2006

John Ersatznom said:
Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).

{
    String originalA = "color";
    String a = originalA;  // "color"
    a = a.toUpperCase();   // "COLOR"
    a = a.toLowerCase();   // "color"
    assert a.equals(originalA);  // passes
}
{
    String originalA = "beißen";
    String a = originalA;  // "beißen"
    a = a.toUpperCase();   // "BEISSEN"
    a = a.toLowerCase();   // "beissen"
    assert a.equals(originalA);  // fails: "beissen" is not "beißen"
}

- Oliver

John Ersatznom · Dec 29, 2006

Oliver said:
{
    String originalA = "color";
    String a = originalA;  // "color"
    a = a.toUpperCase();   // "COLOR"
    a = a.toLowerCase();   // "color"
    assert a.equals(originalA);
}

I don't see "colour" (with a U) in there anywhere, Oliver.

Oliver Wong · Dec 29, 2006

John Ersatznom said:
I don't see "colour" (with a U) in there anywhere, Oliver.

You weren't intended to.

- Oliver

Ed · Dec 30, 2006

Hemal Pandya skrev:

Ed said:
Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?

It will, but will be a lot more expensive.
[....]

Thanks for that tip, Hemal. I had no idea that Set-implementations were
so much more efficient (in this case) than List-implementations. The
output from the (no-doubt indent-mashed) code below gives:

522393 duplicated words. Using java.util.HashSet, time = 678ms.
522393 duplicated words. Using java.util.TreeSet, time = 1812ms.
522393 duplicated words. Using java.util.ArrayList, time = 157724ms.
522393 duplicated words. Using java.util.LinkedList, time = 251739ms.

import java.util.*;
import java.io.*;

class Test {
    private static final String TEXT_BOOK_NAME = "war-and-peace.txt";

    public static void main(String[] args) {
        try {
            String text = readText(); // Read text into RAM
            countDuplicateWords(text, new HashSet<String>());
            countDuplicateWords(text, new TreeSet<String>());
            countDuplicateWords(text, new ArrayList<String>());
            countDuplicateWords(text, new LinkedList<String>());
        } catch (Throwable t) {
            System.out.println(t.toString());
        }
    }

    private static String readText() throws IOException {
        BufferedReader reader =
            new BufferedReader(new FileReader(TEXT_BOOK_NAME));
        try {
            String line = null;
            StringBuffer text = new StringBuffer();
            while ((line = reader.readLine()) != null) {
                text.append(line + " ");
            }
            return text.toString();
        } finally {
            reader.close(); // release the file handle even if reading fails
        }
    }

    // Counts words already present in the collection and times the whole pass.
    private static void countDuplicateWords(String text,
                                            Collection<String> listOfWords) {
        int numDuplicatedWords = 0;
        long startTime = System.currentTimeMillis();
        for (StringTokenizer i = new StringTokenizer(text);
             i.hasMoreElements();) {
            String word = i.nextToken();
            if (listOfWords.contains(word)) {
                numDuplicatedWords++;
            } else {
                listOfWords.add(word);
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println(numDuplicatedWords + " duplicated words. " +
            "Using " + listOfWords.getClass().getName() +
            ", time = " + (endTime - startTime) + "ms.");
    }
}

..ed

Lew · Dec 30, 2006

(Please do not embed TAB characters in newsgroup postings.)

You could use a HashMap if you wanted to know how many times each word occurred:

Map< String, Integer > concordance = new HashMap< String, Integer > ();
for ( StringTokenizer tok = new StringTokenizer(text);
      tok.hasMoreElements(); )
{
    String word = tok.nextToken();
    Integer kt = concordance.get( word );
    if ( kt == null )
    {
        // First occurrence counts as 1, so a value above 1 means a repeat.
        concordance.put( word, Integer.valueOf( 1 ));
    }
    else
    {
        concordance.put( word, Integer.valueOf( kt.intValue() + 1 ));
    }
}

then get total dupes by analyzing the concordance:

int totalDupes = 0;
for ( Map.Entry< String, Integer > entry : concordance.entrySet() )
{
    if ( entry.getValue().intValue() > 1 )
    {
        ++totalDupes;
    }
}
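
On Java 8 or later (newer than anything available when this thread was written), the same concordance could probably be built with Map.merge. The sketch below assumes whitespace-separated words, and the names Concordance, countWords and wordsOccurringMoreThanOnce are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper; names are illustrative only.
final class Concordance {
    // Maps each whitespace-separated word to the number of times it occurs.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            // Stores 1 for a new word, otherwise adds 1 to the existing count.
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Number of distinct words that occur more than once.
    static int wordsOccurringMoreThanOnce(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            if (count > 1) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("a b a c b a");
        System.out.println(counts);
        System.out.println(wordsOccurringMoreThanOnce(counts));
    }
}
```

Note that this counts distinct repeated words, which is a different figure from the number of repeated occurrences printed by the benchmark.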

- Lew
