How to check variables for uniqueness ?

Discussion in 'Java' started by krislioe@gmail.com, Dec 21, 2006.

  1. Guest

    Hi all,

    I have eight variables: var1, var2, ..., var8, all of type String.
    How can I check that each variable has a unique value?

    Thank you for your help,
    xtanto
    , Dec 21, 2006
    #1

  2. wrote:
    ....
    > I have eight variables : var1, var2... var 8. All types String.
    > How to check that each variables has unique values ?


    One way would be to create a Map, iterate the
    var's and if not present in the map, add the value
    as a key, else return false.

    Andrew T.
    Andrew Thompson, Dec 21, 2006
    #2

  3. Andrew Thompson wrote:
    > wrote:
    > ...
    >> I have eight variables : var1, var2... var 8. All types String.
    >> How to check that each variables has unique values ?

    >
    > One way would be to create a Map, iterate the
    > var's and if not present in the map, add the value
    > as a key, else return false.
    >
    > Andrew T.
    >


    Any particular reason for Map, rather than Set?

    Note that the result of a Set add call is true if, and only if, the
    value is not already in the Set.
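
    A minimal sketch of that idea, assuming the values are passed in as
    varargs. The class and method names (UniqueCheck, allUnique) are
    hypothetical, chosen only for illustration:

```java
import java.util.HashSet;
import java.util.Set;

final class UniqueCheck {
    // Returns true only if no two arguments are equal (per String.equals).
    static boolean allUnique(String... values) {
        Set<String> seen = new HashSet<String>();
        for (String v : values) {
            // Set.add returns false when the value is already present,
            // so the first false result means a duplicate was found.
            if (!seen.add(v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique("a", "b", "c"));
        System.out.println(allUnique("a", "b", "a"));
    }
}
```

    Checking the return value lets the loop stop at the first duplicate
    instead of comparing sizes after everything has been added.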

    Patricia
    Patricia Shanahan, Dec 21, 2006
    #3
  4. Patricia Shanahan wrote:
    > Andrew Thompson wrote:
    > > wrote:
    > > ...
    > >> I have eight variables : var1, var2... var 8. All types String.
    > >> How to check that each variables has unique values ?

    > >
    > > One way would be to create a Map, iterate the
    > > var's and if not present in the map, add the value
    > > as a key, else return false.

    ....
    > Any particular reason for Map, rather than Set?


    You mean besides, 'lack of enough consultation
    of the relevant docs.'? ;-)

    > Note that the result of a Set add call is true if, and only if, the
    > value is not already in the Set.


    A Set sounds the go - it is just right for this task.

    Andrew T.
    Andrew Thompson, Dec 21, 2006
    #4
  5. Andrew Thompson wrote:
    > Patricia Shanahan wrote:
    >
    >>Andrew Thompson wrote:
    >>
    >>> wrote:
    >>>...
    >>>
    >>>>I have eight variables : var1, var2... var 8. All types String.
    >>>>How to check that each variables has unique values ?
    >>>
    >>>One way would be to create a Map, iterate the
    >>>var's and if not present in the map, add the value
    >>>as a key, else return false.

    >
    > ...
    >
    >>Any particular reason for Map, rather than Set?

    >
    >
    > You mean besides, 'lack of enough consultation
    > of the relevant docs.'? ;-)
    >
    >
    >>Note that the result of a Set add call is true if, and only if, the
    >>value is not already in the Set.

    >
    >
    > A Set sounds the go - it is just right for this task.


    HashSet<String> foo = new HashSet<String>();
    foo.add(var1);
    foo.add(var2);
    foo.add(var3);
    foo.add(var4);
    foo.add(var5);
    foo.add(var6);
    foo.add(var7);
    foo.add(var8);
    if (foo.size() < 8)
        duplicateExists();
    else
        duplicateDoesNotExist();

    If you actually need to identify the specific duplicate pairs, you need
    to compare them one by one -- 1 with all the others, 2 with all the
    higher-numbered ones, and so on up to 7 and 8, using equals().

    If you want case insensitivity, use e.g.

    foo.add(var3.toLowerCase());

    or equalsIgnoreCase().
    John Ersatznom, Dec 21, 2006
    #5
  6. John Ersatznom wrote:
    > Andrew Thompson wrote:
    >> Patricia Shanahan wrote:
    >>
    >>> Andrew Thompson wrote:
    >>>
    >>>> wrote:
    >>>> ...
    >>>>
    >>>>> I have eight variables : var1, var2... var 8. All types String.
    >>>>> How to check that each variables has unique values ?
    >>>>
    >>>> One way would be to create a Map, iterate the
    >>>> var's and if not present in the map, add the value
    >>>> as a key, else return false.

    >>
    >> ...
    >>
    >>> Any particular reason for Map, rather than Set?

    >>
    >>
    >> You mean besides, 'lack of enough consultation
    >> of the relevant docs.'? ;-)
    >>
    >>
    >>> Note that the result of a Set add call is true if, and only if, the
    >>> value is not already in the Set.

    >>
    >>
    >> A Set sounds the go - it is just right for this task.

    >
    > HashSet<String> foo = new HashSet<String>();
    > foo.add(var1);
    > foo.add(var2);
    > foo.add(var3);
    > foo.add(var4);
    > foo.add(var5);
    > foo.add(var6);
    > foo.add(var7);
    > foo.add(var8);
    > if (foo.size() < 8)
    > duplicateExists();
    > else
    > duplicateDoesNotExist();
    >
    > If you actually need to identify the specific duplicate pairs, you need
    > to compare them one by one -- 1 with all the others, 2 with all the
    > higher-numbered ones, and so on up to 7 and 8, using equals().


    To save repetitious writing, I'm going to assume the strings are in an
    array. The equivalent of your code would be:

    HashSet<String> foo = new HashSet<String>();
    for (String v : vars) {
        foo.add(v);
    }
    if (foo.size() < vars.length)
        duplicateExists();
    else
        duplicateDoesNotExist();

    You can simplify finding specific duplicates by checking the foo.add
    results:

    HashSet<String> foo = new HashSet<String>();
    for (int i = 0; i < vars.length; i++) {
        if (!foo.add(vars[i])) {
            // add returned false: vars[i] duplicates a lower-indexed entry
            for (int j = 0; j < i; j++) {
                if (vars[i].equals(vars[j])) {
                    reportDuplicate(i, j);
                }
            }
        }
    }

    A true result from foo.add means the string was actually added to the
    set, so it has no duplicate with a lower index.

    Patricia
    Patricia Shanahan, Dec 21, 2006
    #6
  7. Ed Kirwan Guest

    Patricia Shanahan wrote:

    >
    > You can simplify finding specific duplicates by checking the foo.add
    > results:
    >
    > HashSet<String> foo = new HashSet<String>();
    > for(int i=0; i<vars.length; i++){
    > if(!foo.add(vars[i])){
    > for(int j=0; j<i; j++){
    > if(vars[i].equals(vars[j])){
    > reportDuplicate(i,j);
    > }
    > }
    > }
    > }
    >
    > A true result from foo.add means the string was actually added to the
    > set, so it has no duplicate with a lower index.
    >
    > Patricia


    Perhaps using a List would obviate the need for the nested loop?

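    List<String> list = new ArrayList<String>();
    for (int i = 0, n = vars.length; i < n; i++) {
        // duplicateIndex is a position in list, which matches the
        // position in vars only until the first duplicate is skipped
        int duplicateIndex = list.indexOf(vars[i]);
        if (duplicateIndex != -1) {
            reportDuplicate(i, duplicateIndex);
        } else {
            list.add(vars[i]);
        }
    }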

    ..ed

    --
    www.EdmundKirwan.com - Home of The Fractal Class Composition.

    Download Fractality, free Java code analyzer:
    www.EdmundKirwan.com/servlet/fractal/frac-page130.html
    Ed Kirwan, Dec 21, 2006
    #7

  8. > Perhaps using a List would obviate the need for the nest loop?
    >
    > List list = new ArrayList();
    > for (int i = 0, n = vars.length; i < n; i++) {
    > int duplicateIndex = list.indexOf(vars[i]);
    > if (duplicateIndex != -1) {
    > reportDuplicate(i, duplicateIndex);
    > } else {
    > list.add(vars[i]);
    > }
    > }
    >
    > .ed


    The nested loop is only needed to allow reporting of a specific duplicate
    pair. I cannot think of many practical examples where that is required
    rather than simply reporting that the element to be added is a duplicate. If
    it is required then I'd say you're right, using a List does result in
    slightly more readable code.

    That said, if the collection must not contain duplicate elements then at
    least from a design and correctness perspective you should use a Set. I'd
    personally do so even if that decision would result in a few extra lines of
    code here and there.

    Remon
    Remon van Vliet, Dec 21, 2006
    #8
  9. Oliver Wong Guest

    "John Ersatznom" <> wrote in message
    news:emd9s6$cns$...
    >
    > If you want case insensitivity, use e.g.
    >
    > foo.add(var3.toLowerCase());


    This might not actually work, because of the fickleness of certain human
    languages.

    >
    > or equalsIgnoreCase().


    Yeah, I'd essentially wrap the String in a custom class which overrides
    equals to call equalsIgnoreCase, and give that to the Set.
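
    A minimal sketch of such a wrapper, assuming non-null strings. The
    class name CaseInsensitiveKey is hypothetical. Because a HashSet
    consults hashCode before equals, the sketch overrides both, folding
    each character the same way for every case variant:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper: equality and hash code both ignore case.
final class CaseInsensitiveKey {
    private final String value;

    CaseInsensitiveKey(String value) {
        if (value == null) {
            throw new NullPointerException("value");
        }
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CaseInsensitiveKey
                && value.equalsIgnoreCase(((CaseInsensitiveKey) other).value);
    }

    @Override
    public int hashCode() {
        // Fold every char through upper case and then lower case, so keys
        // that are equal under equalsIgnoreCase share a hash code.
        int h = 0;
        for (int i = 0; i < value.length(); i++) {
            char folded = Character.toLowerCase(Character.toUpperCase(value.charAt(i)));
            h = 31 * h + folded;
        }
        return h;
    }

    public static void main(String[] args) {
        Set<CaseInsensitiveKey> seen = new HashSet<CaseInsensitiveKey>();
        System.out.println(seen.add(new CaseInsensitiveKey("Hello")));
        System.out.println(seen.add(new CaseInsensitiveKey("HELLO")));
    }
}
```

    This comparison is per character and not locale-aware, so it does
    not treat "ß" and "ss" as equal.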

    - Oliver
    Oliver Wong, Dec 21, 2006
    #9
  10. Hemal Pandya Guest

    Ed Kirwan wrote:
    > Patricia Shanahan wrote:

    [...]
    > Perhaps using a List would obviate the need for the nest loop?


    It will, but will be a lot more expensive. You can use a
    Map<String,Integer> to both avoid nested loop and report indexes. Yes,
    it will take more memory.
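
    A rough sketch of the Map<String,Integer> approach, assuming the
    strings are in an array. The names DuplicateIndexes and
    findDuplicates are hypothetical. The map remembers the index of the
    first occurrence of each string, so every later duplicate is paired
    with that first index in a single pass:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateIndexes {
    // Returns {i, j} pairs where vars[i] equals the earlier vars[j],
    // j being the index of the first occurrence of that string.
    static List<int[]> findDuplicates(String[] vars) {
        Map<String, Integer> firstIndex = new HashMap<String, Integer>();
        List<int[]> pairs = new ArrayList<int[]>();
        for (int i = 0; i < vars.length; i++) {
            Integer earlier = firstIndex.get(vars[i]);
            if (earlier == null) {
                // First time this string is seen: remember where.
                firstIndex.put(vars[i], Integer.valueOf(i));
            } else {
                pairs.add(new int[] { i, earlier.intValue() });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] vars = { "a", "b", "a", "c", "b", "a" };
        for (int[] p : findDuplicates(vars)) {
            System.out.println(p[0] + " duplicates " + p[1]);
        }
    }
}
```

    Unlike the nested loop, this reports each duplicate only against its
    first occurrence, not against every lower-indexed match.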

    [....]
    Hemal Pandya, Dec 22, 2006
    #10
  11. Hemal Pandya wrote:
    > Ed Kirwan wrote:
    >> Patricia Shanahan wrote:

    > [...]
    >> Perhaps using a List would obviate the need for the nest loop?


    Note that I did NOT write that.

    >
    > It will, but will be a lot more expensive. Use can use a
    > Map<String,Integer> to both avoid nested loop and report indexes. Yes,
    > it will take more memory.
    >
    > [....]
    >
    Patricia Shanahan, Dec 22, 2006
    #11
  12. Hemal Pandya Guest

    Patricia Shanahan wrote:
    [....]
    > Note that I did NOT write that.


    No, you did not. Your lines would have had one more '>' at the
    beginning-of-line. I apologize if I caused confusion.
    Hemal Pandya, Dec 22, 2006
    #12
  13. Oliver Wong wrote:
    > "John Ersatznom" <> wrote in message
    > news:emd9s6$cns$...
    >
    >>If you want case insensitivity, use e.g.
    >>
    >>foo.add(var3.toLowerCase());

    >
    > This might not actually work, because of the fickleness of certain human
    > languages.


    ?

    > Yeah, I'd essentially wrap the String in a custom class which overrides
    > equals to call equalsIgnoreCase, and give that to the Set.


    What is obviously missing from java.util is an Equalizer:

    public interface Equalizer<T> {
        public boolean areEqual (T foo, T bar);
        public int getHash (T foo);
    }

    and the ability to pass these to collection constructors to use, the way
    those that use order comparison can already be handed a custom comparator.

    Problems caused by comparators not consistent with an object's equals
    method could be avoided by supplying an Equalizer that is consistent
    with the comparator, as well as it obviating the need you perceive to
    wrap the String class. (Either way, by the way, you need to replace
    hashCode() with a case-insensitive version too, or you'll have strings
    that compare equal and have different hash codes, at least potentially.
    That at least can't happen if you use add(var.toFooCase()) or similar.)
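
    For the ordered collections, the existing comparator hook already
    covers the case-insensitive check. A minimal sketch, assuming
    non-null strings; the class and method names are hypothetical, while
    String.CASE_INSENSITIVE_ORDER is the comparator the JDK provides:

```java
import java.util.Set;
import java.util.TreeSet;

final class CaseInsensitiveUnique {
    // Returns true if no two values are equal when case is ignored.
    static boolean allUniqueIgnoringCase(String... values) {
        // A TreeSet decides what counts as a duplicate with its
        // comparator, not with equals/hashCode.
        Set<String> seen = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        for (String v : values) {
            if (!seen.add(v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUniqueIgnoringCase("foo", "bar", "baz"));
        System.out.println(allUniqueIgnoringCase("foo", "bar", "FOO"));
    }
}
```

    The strings are stored unmodified, so no wrapper class or
    toLowerCase() call is needed, at the cost of O(log n) adds.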
    John Ersatznom, Dec 22, 2006
    #13
  14. Oliver Wong Guest

    "John Ersatznom" <> wrote in message
    news:emg90r$ruf$...
    > Oliver Wong wrote:
    >> "John Ersatznom" <> wrote in message
    >> news:emd9s6$cns$...
    >>
    >>>If you want case insensitivity, use e.g.
    >>>
    >>>foo.add(var3.toLowerCase());

    >>
    >> This might not actually work, because of the fickleness of certain
    >> human languages.

    >
    > ?


    I'm not a linguist, so this may be linguistically incorrect, but it
    illustrates the type of problems you can run into:

    assert locale is German; //pseudocode
    assert "BEISSEN".toLowerCase().equals("beissen");
    assert "BEISSEN".toLowerCase().equals("beißen");

    - Oliver
    Oliver Wong, Dec 22, 2006
    #14
  15. Oliver Wong wrote:
    > "John Ersatznom" <> wrote in message
    > news:emg90r$ruf$...
    >
    >>Oliver Wong wrote:
    >>
    >>>"John Ersatznom" <> wrote in message
    >>>news:emd9s6$cns$...
    >>>
    >>>
    >>>>If you want case insensitivity, use e.g.
    >>>>
    >>>>foo.add(var3.toLowerCase());
    >>>
    >>> This might not actually work, because of the fickleness of certain
    >>>human languages.

    >>
    >>?

    >
    >
    > I'm not a linguist, so this may be linguistically incorrect, but it
    > illustrates the type of problems you can run into:
    >
    > assert locale is German; //pseudcode
    > assert "BEISSEN".toLowerCase().equals("beissen");
    > assert "BEISSEN".toLowerCase().equals("beißen");


    Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).
    Whenever there's multiple legitimate spellings for the same word,
    there's going to be trouble if you try to make the computer "smart
    enough" to treat them as equal.

    Mind you, there ARE lexicographical "distance" measures that are useful
    for "fuzzy-matching", such as spell-checker "suggestions" use. (Google
    now suggests an alternate if it thinks you've misspelled a query term,
    for example.) But you can't use those as an equality test, since they
    don't define an equivalence relation -- they aren't transitive, since
    you can have a.isCloseTo(b), a.isCloseTo(c), and !b.isCloseTo(c) (e.g.
    where the distance is 1 from c to a, 1 from a to b, and 2 from c to b,
    and 1 is the threshold). Even a threshold of 1 is too high if the result
    is not only to equate "color" with "colour" but also with "colon". :)
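
    A small sketch of that non-transitivity, using plain Levenshtein edit
    distance as one possible "distance" measure (an assumption; real
    spell-checkers may use something else). The class FuzzyMatch and its
    isCloseTo helper with a threshold of 1 are hypothetical:

```java
final class FuzzyMatch {
    // Classic dynamic-programming Levenshtein edit distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // "Close" means at most one edit apart.
    static boolean isCloseTo(String a, String b) {
        return distance(a, b) <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isCloseTo("color", "colour"));  // one insertion
        System.out.println(isCloseTo("color", "colon"));   // one substitution
        System.out.println(isCloseTo("colour", "colon"));  // two edits apart
    }
}
```

    "color" is close to both "colour" and "colon", yet those two are not
    close to each other, so isCloseTo cannot serve as equals().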

    Best to treat distinct spellings as distinct, and perhaps use a
    fuzzy-match "suggested alternative" if users enter a query with no
    results, e.g. if a search for "beissen" comes up empty.

    Of course, if you really want to drive yourself mad, try to program the
    computer to identify when two different input strings identify the same
    thing in general. Good luck having it compare e.g. "Carrie-Anne Moss"
    and "Lead actress in The Matrix" as equal. Sure, go ahead, you'll even
    solve the NLP while you're at it so you should become rich and famous.
    If you succeed. :)

    Of course, all this arose in the context of "foo.equalsIgnoreCase(bar)"
    vs. "foo.toLowerCase().equals(bar.toLowerCase())". Those *should* be
    equal; both should be transforming words into a canonical
    representation. Or else there should be another toFoo() method that
    returns a canonical representation that compares equal for words that
    compare equalsIgnoreCase, because the usefulness of having such a
    representation to use as a key in a hashmap is obvious.
    John Ersatznom, Dec 23, 2006
    #15
  16. Oliver Wong Guest

    "John Ersatznom" <> wrote in message
    news:emja3c$80k$...
    > Oliver Wong wrote:
    >> "John Ersatznom" <> wrote in message
    >> news:emg90r$ruf$...
    >>
    >>>Oliver Wong wrote:
    >>>
    >>>>"John Ersatznom" <> wrote in message
    >>>>news:emd9s6$cns$...
    >>>>
    >>>>
    >>>>>If you want case insensitivity, use e.g.
    >>>>>
    >>>>>foo.add(var3.toLowerCase());
    >>>>
    >>>> This might not actually work, because of the fickleness of certain
    >>>> human languages.
    >>>
    >>>?

    >>
    >>
    >> I'm not a linguist, so this may be linguistically incorrect, but it
    >> illustrates the type of problems you can run into:
    >>
    >> assert locale is German; //pseudcode
    >> assert "BEISSEN".toLowerCase().equals("beissen");
    >> assert "BEISSEN".toLowerCase().equals("beißen");

    >
    > Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).


    {
        String originalA = "color";
        String a = originalA; // "color"
        a = a.toUpperCase(); // "COLOR"
        a = a.toLowerCase(); // "color"
        assert a.equals(originalA);
    }
    {
        String originalA = "beißen";
        String a = originalA; // "beißen"
        a = a.toUpperCase(); // "BEISSEN"
        a = a.toLowerCase(); // "beissen"
        assert a.equals(originalA); // fails: "beissen" is not "beißen"
    }

    - Oliver
    Oliver Wong, Dec 27, 2006
    #16
  17. Oliver Wong wrote:
    >>>assert locale is German; //pseudcode
    >>>assert "BEISSEN".toLowerCase().equals("beissen");
    >>>assert "BEISSEN".toLowerCase().equals("beißen");

    >>
    >>Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).

    >
    >
    > {
    > String originalA = "color";
    > String a = originalA; // "color"
    > a = a.toUpperCase(); // "COLOR"
    > a = a.toLowerCase(); // "color"
    > assert a.equals(originalA);
    > }


    I don't see "colour" (with a U) in there anywhere, Oliver.
    John Ersatznom, Dec 29, 2006
    #17
  18. Oliver Wong Guest

    "John Ersatznom" <> wrote in message
    news:en3as3$p8d$...
    > Oliver Wong wrote:
    >>>>assert locale is German; //pseudcode
    >>>>assert "BEISSEN".toLowerCase().equals("beissen");
    >>>>assert "BEISSEN".toLowerCase().equals("beißen");
    >>>
    >>>Yeah, and assert "Color".toLowerCase().equals("Colour".toLowerCase()).

    >>
    >>
    >> {
    >> String originalA = "color";
    >> String a = originalA; // "color"
    >> a = a.toUpperCase(); // "COLOR"
    >> a = a.toLowerCase(); // "color"
    >> assert a.equals(originalA);
    >> }

    >
    > I don't see "colour" (with a U) in there anywhere, Oliver.


    You weren't intended to.

    - Oliver
    Oliver Wong, Dec 29, 2006
    #18
  19. Ed Guest

    Hemal Pandya wrote:

    > Ed Kirwan wrote:
    > > Patricia Shanahan wrote:

    > [...]
    > > Perhaps using a List would obviate the need for the nest loop?

    >
    > It will, but will be a lot more expensive.
    > [....]


    Thanks for that tip, Hemal. I had no idea that Set-implementations were
    so much more efficient (in this case) than List-implementations. The
    output from the (no-doubt indent-mashed) code below gives:

    522393 duplicated words. Using java.util.HashSet, time = 678ms.
    522393 duplicated words. Using java.util.TreeSet, time = 1812ms.
    522393 duplicated words. Using java.util.ArrayList, time = 157724ms.
    522393 duplicated words. Using java.util.LinkedList, time = 251739ms.


    import java.util.*;
    import java.io.*;

    class Test {
        private static String TEXT_BOOK_NAME = "war-and-peace.txt";

        public static void main(String[] args) {
            try {
                String text = readText(); // Read text into RAM
                countDuplicateWords(text, new HashSet());
                countDuplicateWords(text, new TreeSet());
                countDuplicateWords(text, new ArrayList());
                countDuplicateWords(text, new LinkedList());
            } catch (Throwable t) {
                System.out.println(t.toString());
            }
        }

        private static String readText() throws Throwable {
            BufferedReader reader =
                new BufferedReader(new FileReader(TEXT_BOOK_NAME));
            String line = null;
            StringBuffer text = new StringBuffer();
            while ((line = reader.readLine()) != null) {
                text.append(line + " ");
            }
            return text.toString();
        }

        private static void countDuplicateWords(String text,
                                                Collection listOfWords) {
            int numDuplicatedWords = 0;
            long startTime = System.currentTimeMillis();
            for (StringTokenizer i = new StringTokenizer(text);
                 i.hasMoreElements();) {
                String word = i.nextToken();
                if (listOfWords.contains(word)) {
                    numDuplicatedWords++;
                } else {
                    listOfWords.add(word);
                }
            }
            long endTime = System.currentTimeMillis();
            System.out.println(numDuplicatedWords + " duplicated words. " +
                "Using " + listOfWords.getClass().getName() +
                ", time = " + (endTime - startTime) + "ms.");
        }
    }



    ..ed

    --

    www.EdmundKirwan.com - Home of The Fractal Class Composition
    Ed, Dec 30, 2006
    #19
  20. Lew Guest

    Ed wrote:
    > Hemal Pandya wrote:
    >
    >> Ed Kirwan wrote:
    >>> Patricia Shanahan wrote:

    >> [...]
    >>> Perhaps using a List would obviate the need for the nest loop?

    >> It will, but will be a lot more expensive.
    >> [....]

    >
    > Thanks for that tip, Hemal. I had no idea that Set-implementations were
    > so much more efficient (in this case) than List-implementations. The
    > output from the (no-doubt indent-mashed) code below gives:
    >
    > 522393 duplicated words. Using java.util.HashSet, time = 678ms.
    > 522393 duplicated words. Using java.util.TreeSet, time = 1812ms.
    > 522393 duplicated words. Using java.util.ArrayList, time = 157724ms.
    > 522393 duplicated words. Using java.util.LinkedList, time = 251739ms.
    >
    >
    > import java.util.*;
    > import java.io.*;
    >
    > class Test {
    > private static String TEXT_BOOK_NAME = "war-and-peace.txt";
    >
    > public static void main(String[] args) {
    > try {
    > String text = readText(); // Read text into RAM
    > countDuplicateWords(text, new HashSet());
    > countDuplicateWords(text, new TreeSet());
    > countDuplicateWords(text, new ArrayList());
    > countDuplicateWords(text, new LinkedList());
    > } catch (Throwable t) {
    > System.out.println(t.toString());
    > }
    > }
    >
    > private static String readText() throws Throwable {
    > BufferedReader reader =
    > new BufferedReader(new FileReader(TEXT_BOOK_NAME));
    > String line = null;
    > StringBuffer text = new StringBuffer();
    > while ((line = reader.readLine()) != null) {
    > text.append(line + " ");
    > }
    > return text.toString();
    > }
    >
    > private static void countDuplicateWords(String text,
    > Collection listOfWords) {
    > int numDuplicatedWords = 0;
    > long startTime = System.currentTimeMillis();
    > for (StringTokenizer i = new StringTokenizer(text);
    > i.hasMoreElements();) {
    > String word = i.nextToken();
    > if (listOfWords.contains(word)) {
    > numDuplicatedWords++;
    > } else {
    > listOfWords.add(word);
    > }
    > }
    > long endTime = System.currentTimeMillis();
    > System.out.println(numDuplicatedWords + " duplicated words. " +
    > "Using " + listOfWords.getClass().getName() +
    > ", time = " + (endTime - startTime) + "ms.");
    > }
    > }


    (Please do not embed TAB characters in newsgroup postings.)

    You could use a HashMap if you wanted to know how many times each word occurred:

    Map< String, Integer > concordance = new HashMap< String, Integer > ();
    for ( StringTokenizer tok = new StringTokenizer(text);
          tok.hasMoreElements(); )
    {
        String word = tok.nextToken();
        Integer kt = concordance.get( word );
        if ( kt == null )
        {
            // first occurrence counts as 1
            concordance.put( word, Integer.valueOf( 1 ));
        }
        else
        {
            concordance.put( word, Integer.valueOf( kt.intValue() + 1 ));
        }
    }

    then get total dupes by analyzing the concordance:

    int totalDupes = 0;
    for ( Map.Entry< String, Integer > entry : concordance.entrySet() )
    {
        if ( entry.getValue().intValue() > 1 )
        {
            ++totalDupes;
        }
    }

    - Lew
    Lew, Dec 30, 2006
    #20