Find duplicates

Discussion in 'Java' started by Stern, Oct 20, 2003.

  1. Stern

    Stern Guest

    Hello,

    I have a Vector of objects, each of them having a name. Now I need to
    find in this Vector objects with the same names, get the list of them
    and remove them from the vector. I don't know which names these are.
    Is there any quick solution for the problem?

    Thanx,

    Stern.
     
    Stern, Oct 20, 2003
    #1
    1. Advertising

  2. Hi,

    I cannot give you a quicksort for your problem but maybe you can use
    java.util.Set. A Set does not allow duplicate entries.

    Michael


    "Stern" <> schrieb im Newsbeitrag
    news:...
    > Hello,
    >
    > I have a Vector of objects, each of them having a name. Now I need to
    > find in this Vector objects with the same names, get the list of them
    > and remove them from the vector. I don't know which names these are.
    > Is there any quick solution for the problem?
    >
    > Thanx,
    >
    > Stern.
     
    Michael Meyer, Oct 20, 2003
    #2
    1. Advertising

  3. "Stern" <> wrote in message
    news:...
    > Hello,
    >
    > I have a Vector of objects, each of them having a name. Now I need to
    > find in this Vector objects with the same names, get the list of them
    > and remove them from the vector. I don't know which names these are.
    > Is there any quick solution for the problem?


    I don't think there's a quick way, unless your pre-index your vector so that
    you already have the objects indexed by name (e.g. a Map that maps a name to
    a list of those items that share that name.) Then you just drop the lists
    that have > 1 item.

    To do it on-the-fly, create the following data structures:
    1) a hash map "uniqueNames" that will map a string to an item for an item
    that has been seen for the first time.
    2) a set "multiNames" that will contain the names of items that have been
    seen more than once
    3) a list "deleteItems" that will contain items to be deleted.

    For each item in the vector {
    if the item's name is in the multiNames set,
    add the item to the deleteItems list.
    else if the item's name is a key in the uniqueNames hash map,
    add the item to the deleteItems list
    remove the item name from the uniqueNames hash table
    add the item's name ot the multiNames set
    else // the item has never been seen
    add the name / item to the uniqueNames hash set
    }
    Walk the deleteItems list to delete the items.

    If you have to do this alot I would really recommend just keeping a
    name-based index to the list of shared items as outlined above.

    Cheers,
    Matt Humphrey http://www.iviz.com/
     
    Matt Humphrey, Oct 20, 2003
    #3
  4. Stern

    Chris Smith Guest

    Matt Humphrey wrote:
    > To do it on-the-fly, create the following data structures:
    > 1) a hash map "uniqueNames" that will map a string to an item for an item
    > that has been seen for the first time.
    > 2) a set "multiNames" that will contain the names of items that have been
    > seen more than once
    > 3) a list "deleteItems" that will contain items to be deleted.
    >
    > For each item in the vector {
    > if the item's name is in the multiNames set,
    > add the item to the deleteItems list.
    > else if the item's name is a key in the uniqueNames hash map,
    > add the item to the deleteItems list
    > remove the item name from the uniqueNames hash table
    > add the item's name ot the multiNames set
    > else // the item has never been seen
    > add the name / item to the uniqueNames hash set
    > }
    > Walk the deleteItems list to delete the items.


    To make this a but simpler, recall that Vector implements List, and
    provides an Iterator. An Iterator can be used to remove elements of the
    Vector without disrupting the traversal. So the process becomes:

    Create the following data structures:

    1) A set "names" that will contain the names that have been seen

    For each item in the vector (using an iterator) {
    if the item's name is in the "names" set,
    remove the item using the iterator
    else
    add the item's name to the "names" set
    }

    --
    www.designacourse.com
    The Easiest Way to Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Oct 20, 2003
    #4
  5. Stern

    Chris Smith Guest

    Chris Smith wrote:
    > To make this a but simpler, recall that Vector implements List, and
    > provides an Iterator. An Iterator can be used to remove elements of the
    > Vector without disrupting the traversal. So the process becomes:

    [...]

    Actually, I didn't realize what you were doing. You were deleting *all*
    copies of duplicates, where I assumed that the task was to delete the
    duplicate copies but leave one copy in. I guess the OP needs to decide
    which of us has it right, and implement the appropriate task either way.

    --
    www.designacourse.com
    The Easiest Way to Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Oct 20, 2003
    #5
  6. Stern:

    >I have a Vector of objects, each of them having a name. Now I need to
    >find in this Vector objects with the same names, get the list of them
    >and remove them from the vector. I don't know which names these are.
    >Is there any quick solution for the problem?


    Sort the Vector with Collections.sort, then iterate over the entries
    and remove duplicates. It's easy with the sorted Vector because
    duplicates will obviously be adjacent. Something like this (v is your
    Vector):

    Iterator iter = v.iterator();
    Object prev = null;
    while (iter.hasNext()) {
    Object obj = iter.next();
    if (prev != null && obj.equals(prev)) {
    iter.remove();
    } else {
    prev = obj;
    }
    }

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Oct 20, 2003
    #6
  7. Stern

    Filip Larsen Guest

    Stern wrote

    > I have a Vector of objects, each of them having a name. Now I need to
    > find in this Vector objects with the same names, get the list of them
    > and remove them from the vector. I don't know which names these are.
    > Is there any quick solution for the problem?


    If we assume you want to remove duplicated entries only (that is, if an
    object with same name appears twice, only the second object is removed), and
    that the equals and hashCode method of the objects are defined mapped to the
    name of the object, then you should be able to use

    Vector x = ...
    Vector xWithoutDuplicates = new Vector(new LinkedHashSet(x));

    If you don't want a copy, but rather have the x vector be replaced you could
    do

    Vector x = ...
    Collection uniques = new LinkedHashSet(x);
    x.clear();
    x.addAll(uniques);

    Both of these should execute reasonably fast (comparable to the other
    methods proposed in this thread, although I haven't made an actual test
    run), but uses slightly more memory than needed due to use of LinkedHashSet.
    If the order of the elements in the vector doesn't matter, you can use
    HashSet instead and save some space.


    Regards,
    --
    Filip Larsen
     
    Filip Larsen, Oct 21, 2003
    #7
  8. Stern

    Roedy Green Guest

    On 20 Oct 2003 04:12:16 -0700, (Stern) wrote or
    quoted :

    >I don't know which names these are.
    >Is there any quick solution for the problem?


    deDupKeepingFirst or deDupKeepingLast

    See http://mindprod.com/products.html#SORTED

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Oct 21, 2003
    #8
  9. Stern

    Stern Guest

    > Actually, I didn't realize what you were doing. You were deleting *all*
    > copies of duplicates, where I assumed that the task was to delete the
    > duplicate copies but leave one copy in. I guess the OP needs to decide
    > which of us has it right, and implement the appropriate task either way.


    I really want to leave one copy in, so your approach is OK. I'll try
    yours and that one of Filip Larsen. Thanx.
     
    Stern, Oct 21, 2003
    #9
  10. Stern

    Phil... Guest

    How do you determine duplicate? Is it the name
    is a string and a duplicate entry has the same string?
    Or is it that the name references of the two entries
    are equal?

    "Michael Meyer" <> wrote in message
    news:bn0jcj$rrf$01$-online.com...
    > Hi,
    >
    > I cannot give you a quicksort for your problem but maybe you can use
    > java.util.Set. A Set does not allow duplicate entries.
    >
    > Michael
    >
    >
    > "Stern" <> schrieb im Newsbeitrag
    > news:...
    > > Hello,
    > >
    > > I have a Vector of objects, each of them having a name. Now I need to
    > > find in this Vector objects with the same names, get the list of them
    > > and remove them from the vector. I don't know which names these are.
    > > Is there any quick solution for the problem?
    > >
    > > Thanx,
    > >
    > > Stern.

    >
    >
     
    Phil..., Oct 22, 2003
    #10
  11. Phil... wrote:
    > How do you determine duplicate?


    By using the equals() method that every object has. Note that depending
    on the Set implementation class, a consistent implementation of the
    compareTo() or hashCode() methods may also be necessary.

    So if your objects implement these methods correctly (by delegating
    them to the name String object), it should work as intended.
     
    Michael Borgwardt, Oct 22, 2003
    #11
  12. Stern

    Roedy Green Guest

    On Wed, 22 Oct 2003 10:20:06 +0200, Michael Borgwardt
    <> wrote or quoted :

    >By using the equals() method that every object has.


    Deduping in not quite the same as removing equal elements. Sometimes
    your definition for deduping would require a different equal method,
    ..e.g. Dedup files with equal names, equal names ignoring case, or
    dedup files with equal names and timestamp,
    or equal sizes ...

    The way I handle it in deDupKeepingFirst is to use a variable
    comparator, the same one I use to sort the elements before checking
    adjacents for equality.

    see http://mindprod.com/jgloss/products.html#SORTED
    for source.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Oct 22, 2003
    #12
  13. Stern

    jyotidayal

    Joined:
    Dec 5, 2010
    Messages:
    1
    I ran into a problem of duplicate files and someone told me about duplicates-finder.com
     
    jyotidayal, Dec 5, 2010
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. da_rod_father
    Replies:
    3
    Views:
    6,790
    Hal Rosser
    Mar 18, 2006
  2. D'Arcy J.M. Cain
    Replies:
    1
    Views:
    596
    Paul Rubin
    Mar 26, 2009
  3. Replies:
    1
    Views:
    483
    John Machin
    Mar 26, 2009
  4. MRAB
    Replies:
    0
    Views:
    744
  5. Wybo Dekker
    Replies:
    1
    Views:
    389
    Yukihiro Matsumoto
    Nov 15, 2005
Loading...

Share This Page