How to check variables for uniqueness ?

krislioe

Hi all,

I have eight variables: var1, var2, ... var8, all of type String.
How can I check that each variable has a unique value?

Thank you for your help,
xtanto
 
Andrew Thompson

(e-mail address removed) wrote:
....
I have eight variables: var1, var2, ... var8, all of type String.
How can I check that each variable has a unique value?

One way would be to create a Map, iterate the
var's and if not present in the map, add the value
as a key, else return false.

Andrew T.
 
Patricia Shanahan

Andrew said:
(e-mail address removed) wrote:
...

One way would be to create a Map, iterate the
var's and if not present in the map, add the value
as a key, else return false.

Andrew T.

Any particular reason for Map, rather than Set?

Note that the result of a Set add call is true if, and only if, the
value is not already in the Set.
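
As a minimal sketch of that idea (the class name UniqueCheck and the method allUnique are hypothetical, chosen only for illustration), the boolean returned by Set.add can drive the whole check and stop at the first repeat:

```java
import java.util.HashSet;
import java.util.Set;

class UniqueCheck {
    // Returns true only if no two of the given strings are equal per String.equals().
    static boolean allUnique(String... values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            // add() returns false when v is already in the set
            if (!seen.add(v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique("a", "b", "c"));
        System.out.println(allUnique("a", "b", "a"));
    }
}
```

With the eight variables from the question, the call would be allUnique(var1, var2, var3, var4, var5, var6, var7, var8).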

Patricia
 
Andrew Thompson

Patricia said:
....
Any particular reason for Map, rather than Set?

You mean besides, 'lack of enough consultation
of the relevant docs.'? ;-)
Note that the result of a Set add call is true if, and only if, the
value is not already in the Set.

A Set sounds the go - it is just right for this task.

Andrew T.
 
John Ersatznom

Andrew said:
You mean besides, 'lack of enough consultation
of the relevant docs.'? ;-)




A Set sounds the go - it is just right for this task.

HashSet<String> foo = new HashSet<String>();
foo.add(var1);
foo.add(var2);
foo.add(var3);
foo.add(var4);
foo.add(var5);
foo.add(var6);
foo.add(var7);
foo.add(var8);
if (foo.size() < 8)
    duplicateExists();
else
    duplicateDoesNotExist();

If you actually need to identify the specific duplicate pairs, you need
to compare them one by one -- 1 with all the others, 2 with all the
higher-numbered ones, and so on up to 7 and 8, using equals().
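
A rough sketch of that pairwise comparison, assuming the strings have been collected into an array with no null elements (the class DuplicatePairs and the method findPairs are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

class DuplicatePairs {
    // Returns every index pair {i, j} with i < j whose strings are equal.
    static List<int[]> findPairs(String[] vars) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < vars.length; i++) {
            // Only compare with higher-numbered entries, so each pair is seen once.
            for (int j = i + 1; j < vars.length; j++) {
                if (vars[i].equals(vars[j])) {
                    pairs.add(new int[] { i, j });
                }
            }
        }
        return pairs;
    }
}
```

For eight values this is at most 28 comparisons, so the quadratic cost only matters for much larger inputs.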

If you want case insensitivity, use e.g.

foo.add(var3.toLowerCase());

or equalsIgnoreCase().
 
Patricia Shanahan

John said:
HashSet<String> foo = new HashSet<String>();
foo.add(var1);
foo.add(var2);
foo.add(var3);
foo.add(var4);
foo.add(var5);
foo.add(var6);
foo.add(var7);
foo.add(var8);
if (foo.size() < 8)
    duplicateExists();
else
    duplicateDoesNotExist();

If you actually need to identify the specific duplicate pairs, you need
to compare them one by one -- 1 with all the others, 2 with all the
higher-numbered ones, and so on up to 7 and 8, using equals().

To save repetitious writing, I'm going to assume the strings are in an
array. The equivalent of your code would be:

HashSet<String> foo = new HashSet<String>();
for(String v:vars){
    foo.add(v);
}
if (foo.size() < vars.length)
    duplicateExists();
else
    duplicateDoesNotExist();

You can simplify finding specific duplicates by checking the foo.add
results:

HashSet<String> foo = new HashSet<String>();
for(int i=0; i<vars.length; i++){
    if(!foo.add(vars[i])){
        // vars[i] was already in the set, so look for its earlier matches
        for(int j=0; j<i; j++){
            if(vars[i].equals(vars[j])){
                reportDuplicate(i,j);
            }
        }
    }
}

A true result from foo.add means the string was actually added to the
set, so it has no duplicate with a lower index.

Patricia
 
Ed Kirwan

Patricia said:
You can simplify finding specific duplicates by checking the foo.add
results:

HashSet<String> foo = new HashSet<String>();
for(int i=0; i<vars.length; i++){
    if(!foo.add(vars[i])){
        // vars[i] was already in the set, so look for its earlier matches
        for(int j=0; j<i; j++){
            if(vars[i].equals(vars[j])){
                reportDuplicate(i,j);
            }
        }
    }
}

A true result from foo.add means the string was actually added to the
set, so it has no duplicate with a lower index.

Patricia


Perhaps using a List would obviate the need for the nested loop?

List<String> list = new ArrayList<String>();
for (int i = 0, n = vars.length; i < n; i++) {
    int duplicateIndex = list.indexOf(vars[i]);
    if (duplicateIndex != -1) {
        reportDuplicate(i, duplicateIndex);
    }
    // Always add, so that list indices stay aligned with vars indices.
    list.add(vars[i]);
}

..ed
 
Remon van Vliet

Perhaps using a List would obviate the need for the nested loop?

List<String> list = new ArrayList<String>();
for (int i = 0, n = vars.length; i < n; i++) {
    int duplicateIndex = list.indexOf(vars[i]);
    if (duplicateIndex != -1) {
        reportDuplicate(i, duplicateIndex);
    }
    // Always add, so that list indices stay aligned with vars indices.
    list.add(vars[i]);
}

.ed


The nested loop is only needed to allow reporting of a specific duplicate
pair. I cannot think of many practical examples where that is required
rather than simply reporting that the element to be added is a duplicate. If
it is required then I'd say you're right, using a List does result in
slightly more readable code.

That said, if the collection must not contain duplicate elements then at
least from a design and correctness perspective you should use a Set. I'd
personally do so even if that decision would result in a few extra lines of
code here and there.

Remon
 
Oliver Wong

John Ersatznom said:
If you want case insensitivity, use e.g.

foo.add(var3.toLowerCase());

This might not actually work, because of the fickleness of certain human
languages.
or equalsIgnoreCase().

Yeah, I'd essentially wrap the String in a custom class which overrides
equals to call equalsIgnoreCase, and give that to the Set.
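
A minimal sketch of such a wrapper, under the assumption that a HashSet is the target collection (the class name CaseInsensitiveKey is hypothetical). Because HashSet relies on hashCode() as well as equals(), the sketch also overrides hashCode() with a per-code-point case fold so that keys that are equal ignoring case land in the same bucket:

```java
import java.util.HashSet;
import java.util.Set;

final class CaseInsensitiveKey {
    private final String value;

    CaseInsensitiveKey(String value) {
        if (value == null) {
            throw new NullPointerException("value");
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CaseInsensitiveKey
                && value.equalsIgnoreCase(((CaseInsensitiveKey) other).value);
    }

    @Override
    public int hashCode() {
        // Fold each code point upper-then-lower, the same per-character
        // mapping that equalsIgnoreCase() is documented to compare with.
        int h = 0;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            h = 31 * h + Character.toLowerCase(Character.toUpperCase(cp));
            i += Character.charCount(cp);
        }
        return h;
    }

    public static void main(String[] args) {
        Set<CaseInsensitiveKey> seen = new HashSet<>();
        System.out.println(seen.add(new CaseInsensitiveKey("Foo")));
        System.out.println(seen.add(new CaseInsensitiveKey("FOO")));
    }
}
```

The second add() call in main reports the duplicate by returning false, while the set still keeps the original spelling of the first key.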

- Oliver
 
Hemal Pandya

Ed said:
Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?

It will, but it will be a lot more expensive. You can use a
Map<String,Integer> to both avoid the nested loop and report indexes. Yes,
it will take more memory.
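
One way the Map<String,Integer> idea might look, as a sketch only (the class FirstIndexDuplicates and the method findDuplicates are hypothetical names). The map remembers the index at which each value was first seen, so a repeat can be paired with that first index in a single pass:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FirstIndexDuplicates {
    // Returns {i, firstIndex} for every vars[i] that repeats an earlier value.
    static List<int[]> findDuplicates(String[] vars) {
        Map<String, Integer> firstIndex = new HashMap<>();
        List<int[]> duplicates = new ArrayList<>();
        for (int i = 0; i < vars.length; i++) {
            Integer earlier = firstIndex.get(vars[i]);
            if (earlier == null) {
                firstIndex.put(vars[i], i);   // first time this value is seen
            } else {
                duplicates.add(new int[] { i, earlier });
            }
        }
        return duplicates;
    }
}
```

Note that this pairs each repeat only with the first occurrence of its value, not with every earlier occurrence, which is a narrower report than the nested-loop version gives.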

[....]
 
Patricia Shanahan

Hemal said:
Ed said:
Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?

Note that I did NOT write that.
It will, but it will be a lot more expensive. You can use a
Map<String,Integer> to both avoid the nested loop and report indexes. Yes,
it will take more memory.

[....]
 
Hemal Pandya

Patricia Shanahan wrote:
[....]
Note that I did NOT write that.

No, you did not. Your lines would have had one more '>' at the
beginning-of-line. I apologize if I caused confusion.
 
John Ersatznom

Oliver said:
This might not actually work, because of the fickleness of certain human
languages.
?

Yeah, I'd essentially wrap the String in a custom class which overrides
equals to call equalsIgnoreCase, and give that to the Set.

What is obviously missing from java.util is an Equalizer:

public interface Equalizer<T> {
    boolean areEqual(T foo, T bar);
    int getHash(T foo); // an int, like Object.hashCode()
}

and the ability to pass these to collection constructors to use, the way
those that use order comparison can already be handed a custom comparator.

Problems caused by comparators not consistent with an object's equals
method could be avoided by supplying an Equalizer that is consistent
with the comparator, as well as it obviating the need you perceive to
wrap the String class. (Either way, by the way, you need to replace
hashCode() with a case-insensitive version too, or you'll have strings
that compare equal and have different hash codes, at least potentially.
That at least can't happen if you use add(var.toFooCase()) or similar.)
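
For comparison, here is a sketch of what the existing comparator route already allows: a TreeSet built with String.CASE_INSENSITIVE_ORDER treats strings that differ only in case as the same element, with no wrapper class and no hashCode() involved. The class and method names are hypothetical, and the sketch assumes no null values (this comparator throws NullPointerException on null):

```java
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveUnique {
    // Returns true only if no two values are equal when case is ignored.
    static boolean allUniqueIgnoringCase(String... values) {
        // Membership in this set is decided by the comparator, not by equals().
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String v : values) {
            if (!seen.add(v)) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off is O(log n) per insertion instead of the expected constant time of a hash-based set, and a set whose notion of equality is deliberately inconsistent with String.equals().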
 
Oliver Wong

John Ersatznom said:

I'm not a linguist, so this may be linguistically incorrect, but it
illustrates the type of problems you can run into:

assert locale is German; //pseudocode
assert "BEISSEN".toLowerCase().equals("beissen");
assert "BEISSEN".toLowerCase().equals("beißen");

- Oliver
 
John Ersatznom

Oliver said:
I'm not a linguist, so this may be linguistically incorrect, but it
illustrates the type of problems you can run into:

assert locale is German; //pseudocode
assert "BEISSEN".toLowerCase().equals("beissen");
assert "BEISSEN".toLowerCase().equals("beißen");

Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).
Whenever there's multiple legitimate spellings for the same word,
there's going to be trouble if you try to make the computer "smart
enough" to treat them as equal.

Mind you, there ARE lexicographical "distance" measures that are useful
for "fuzzy-matching", such as spell-checker "suggestions" use. (Google
now suggests an alternate if it thinks you've misspelled a query term,
for example.) But you can't use those as an equality test, since they
don't define an equivalence relation -- they aren't transitive, since
you can have a.isCloseTo(b), a.isCloseTo(c), and !b.isCloseTo(c) (e.g.
where the distance is 1 from c to a, 1 from a to b, and 2 from c to b,
and 1 is the threshold). Even a threshold of 1 is too high if the result
is not only to equate "color" with "colour" but also with "colon". :)
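
To make the non-transitivity concrete, here is a sketch using Levenshtein edit distance as the measure (an assumption; the post does not name a specific distance). The class EditDistance and the method isCloseTo with its threshold of 1 are hypothetical names that follow the a.isCloseTo(b) notation above:

```java
class EditDistance {
    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                       // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                       // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean isCloseTo(String a, String b) {
        return distance(a, b) <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isCloseTo("color", "colour"));  // true, distance 1
        System.out.println(isCloseTo("color", "colon"));   // true, distance 1
        System.out.println(isCloseTo("colour", "colon"));  // false, distance 2
    }
}
```

Since "color" is close to both other words while they are not close to each other, isCloseTo cannot serve as the equals() of a Set element.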

Best to treat distinct spellings as distinct, and perhaps use a
fuzzy-match "suggested alternative" if users enter a query with no
results, e.g. if a search for "beissen" comes up empty.

Of course, if you really want to drive yourself mad, try to program the
computer to identify when two different input strings identify the same
thing in general. Good luck having it compare e.g. "Carrie-Anne Moss"
and "Lead actress in The Matrix" as equal. Sure, go ahead, you'll even
solve the NLP while you're at it so you should become rich and famous.
If you succeed. :)

Of course, all this arose in the context of "foo.equalsIgnoreCase(bar)"
vs. "foo.toLowerCase().equals(bar.toLowerCase())". Those *should* be
equal; both should be transforming words into a canonical
representation. Or else there should be another toFoo() method that
returns a canonical representation that compares equal for words that
compare equalsIgnoreCase, because the usefulness of having such a
representation to use as a key in a hashmap is obvious.
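
In current Java the two approaches do not always agree, which a short sketch can show. The class and method names below are hypothetical; the library calls (equalsIgnoreCase, toLowerCase(Locale), toUpperCase(Locale), Locale.forLanguageTag) are standard. The sharp s is written as the escape \u00df to keep the source ASCII-only:

```java
import java.util.Locale;

class CaseFoldingDemo {
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    static boolean sameAfterLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    static boolean sameAfterUpperCase(String a, String b, Locale locale) {
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Turkish lowercases 'I' to a dotless i, so the canonical forms differ.
        System.out.println(sameIgnoringCase("TITLE", "title"));             // true
        System.out.println(sameAfterLowerCase("TITLE", "title", turkish)); // false

        // Upper-casing the sharp s yields "SS", but equalsIgnoreCase works
        // character by character and never equates strings of different length.
        System.out.println(sameAfterUpperCase("\u00df", "SS", Locale.ROOT)); // true
        System.out.println(sameIgnoringCase("\u00df", "SS"));               // false
    }
}
```

So a canonical form used as a hash key should be produced with an explicit, fixed locale, and it still will not match equalsIgnoreCase() for every input.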
 
Oliver Wong

John Ersatznom said:
Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).

{
    String originalA = "color";
    String a = originalA;  // "color"
    a = a.toUpperCase();   // "COLOR"
    a = a.toLowerCase();   // "color"
    assert a.equals(originalA);
}
{
    String originalA = "beißen";
    String a = originalA;  // "beißen"
    a = a.toUpperCase();   // "BEISSEN"
    a = a.toLowerCase();   // "beissen"
    assert a.equals(originalA); // fails: "beissen" is not "beißen"
}

- Oliver
 
John Ersatznom

Oliver said:
{
    String originalA = "color";
    String a = originalA;  // "color"
    a = a.toUpperCase();   // "COLOR"
    a = a.toLowerCase();   // "color"
    assert a.equals(originalA);
}

I don't see "colour" (with a U) in there anywhere, Oliver.
 
Ed

Hemal Pandya wrote:
Ed said:
Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?

It will, but it will be a lot more expensive.
[....]

Thanks for that tip, Hemal. I had no idea that Set-implementations were
so much more efficient (in this case) than List-implementations. The
output from the (no-doubt indent-mashed) code below gives:

522393 duplicated words. Using java.util.HashSet, time = 678ms.
522393 duplicated words. Using java.util.TreeSet, time = 1812ms.
522393 duplicated words. Using java.util.ArrayList, time = 157724ms.
522393 duplicated words. Using java.util.LinkedList, time = 251739ms.


import java.util.*;
import java.io.*;

class Test {
    private static String TEXT_BOOK_NAME = "war-and-peace.txt";

    public static void main(String[] args) {
        try {
            String text = readText(); // Read text into RAM
            countDuplicateWords(text, new HashSet());
            countDuplicateWords(text, new TreeSet());
            countDuplicateWords(text, new ArrayList());
            countDuplicateWords(text, new LinkedList());
        } catch (Throwable t) {
            System.out.println(t.toString());
        }
    }

    private static String readText() throws Throwable {
        BufferedReader reader =
            new BufferedReader(new FileReader(TEXT_BOOK_NAME));
        String line = null;
        StringBuffer text = new StringBuffer();
        while ((line = reader.readLine()) != null) {
            text.append(line + " ");
        }
        return text.toString();
    }

    private static void countDuplicateWords(String text,
                                            Collection listOfWords) {
        int numDuplicatedWords = 0;
        long startTime = System.currentTimeMillis();
        for (StringTokenizer i = new StringTokenizer(text);
             i.hasMoreElements();) {
            String word = i.nextToken();
            if (listOfWords.contains(word)) {
                numDuplicatedWords++;
            } else {
                listOfWords.add(word);
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println(numDuplicatedWords + " duplicated words. " +
            "Using " + listOfWords.getClass().getName() +
            ", time = " + (endTime - startTime) + "ms.");
    }
}



..ed
 
Lew

Ed said:
Hemal Pandya wrote:
Ed said:
Patricia Shanahan wrote: [...]
Perhaps using a List would obviate the need for the nested loop?
It will, but it will be a lot more expensive.
[....]

Thanks for that tip, Hemal. I had no idea that Set-implementations were
so much more efficient (in this case) than List-implementations. The
output from the (no-doubt indent-mashed) code below gives:

522393 duplicated words. Using java.util.HashSet, time = 678ms.
522393 duplicated words. Using java.util.TreeSet, time = 1812ms.
522393 duplicated words. Using java.util.ArrayList, time = 157724ms.
522393 duplicated words. Using java.util.LinkedList, time = 251739ms.



(Please do not embed TAB characters in newsgroup postings.)

You could use a HashMap if you wanted to know how many times each word occurred:

Map< String, Integer > concordance = new HashMap< String, Integer > ();
for ( StringTokenizer tok = new StringTokenizer(text);
      tok.hasMoreElements(); )
{
    String word = tok.nextToken();
    Integer kt = concordance.get( word );
    if ( kt == null )
    {
        // the first occurrence counts as 1
        concordance.put( word, Integer.valueOf( 1 ));
    }
    else
    {
        concordance.put( word, Integer.valueOf( kt.intValue() + 1 ));
    }
}

then count the words that occur more than once by analyzing the concordance:

int totalDupes = 0;
for ( Map.Entry< String, Integer > entry : concordance.entrySet() )
{
if ( entry.getValue().intValue() > 1 )
{
++totalDupes;
}
}
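
On Java 8 or later (an assumption; the thread predates it), the same counting can be sketched with Map.merge. The class Concordance and its method names are hypothetical. The second method totals the repeat occurrences, which is the quantity Ed's benchmark counts, rather than the number of distinct repeated words:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class Concordance {
    // Maps each whitespace-separated word to its number of occurrences.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tok = new StringTokenizer(text);
        while (tok.hasMoreTokens()) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(tok.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Occurrences beyond the first, summed over all words.
    static int duplicateOccurrences(Map<String, Integer> counts) {
        int total = 0;
        for (int n : counts.values()) {
            total += n - 1;
        }
        return total;
    }
}
```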

- Lew
 
