HashSet is strange

A

Anony!

Hi

I have a list (linkedlist) that contains duplicates.

List list = new LinkedList();

I want to remove the redundancy and record it, so I decided to move it into
a set (a set does not contain duplicates). I chose HashSet because order is
not important.

Set set = new HashSet(list);


Questions:

1. This removes the redundancy for simple objects like Strings, but how what
if I want to remove objects that have the same identifier? I figured you
would have to over-ride the write an equals method, but i cant find it in
the HashSet API.

2. Second, how do i retrieve the objects that were not added to the set
(because of redundancy)?


Any help appreciated.

AAA
 
M

Murray

Re: HashSet is strange

I respectfully disagree :p
1. This removes the redundancy for simple objects like Strings, but how what
if I want to remove objects that have the same identifier? I figured you
would have to over-ride the write an equals method, but i cant find it in
the HashSet API.

Look at the doc for Set, in particular the add() method. The requirement of
overriding equals and hashCode is not specific to HashSet.
2. Second, how do i retrieve the objects that were not added to the set
(because of redundancy)?

If you're doing this:

Set set = new HashSet(list);

then you'll probably have to manually iterate over your list and check which
elements exist in the Set. Alternatively, if you're adding elements to the
Set in a loop, the add() method returns true if the object didn't already
exist in the Set. Either way, there's no in-built API call to do exactly
what you want.

Commons Collections has set operations like intersect, union, disjunct etc
that you may find helpful, but it would be fairly trivial to write your own.
http://jakarta.apache.org/commons/collections/
 
A

Anony!

Re: HashSet is strange
I respectfully disagree :p


Look at the doc for Set, in particular the add() method. The requirement of
overriding equals and hashCode is not specific to HashSet.

This is what makes HashSet strange.

I assume it creates a hashcode for each object and uses an equals method
(somewhere) to check if the objects are equal.

Which implementation of the Set interface allows for you to specify the
semantics of objects being equal?
If you're doing this:

Set set = new HashSet(list);

then you'll probably have to manually iterate over your list and check which
elements exist in the Set. Alternatively, if you're adding elements to the
Set in a loop, the add() method returns true if the object didn't already
exist in the Set. Either way, there's no in-built API call to do exactly
what you want.

So your saying i have to write my own method to remove redundancy from a
linked-list?

AAA
 
M

Murray

This is what makes HashSet strange.

I assume it creates a hashcode for each object and uses an equals method
(somewhere) to check if the objects are equal.

Which implementation of the Set interface allows for you to specify the
semantics of objects being equal?

Any Set. I think you misunderstand. You need to override the equals() method
on your object to define when your objects should be considered equal. Set
will then use your equals() method to determine equality, and hence
"redundancy".

http://mindprod.com/jgloss/equals.html (Roedy where are you?)

You should also override hashCode http://mindprod.com/jgloss/hashcode.html
So your saying i have to write my own method to remove redundancy from a
linked-list?

No. You asked how to determine which elements were NOT added to the set.
You'd have to compare your new set with your original list to see which
elements weren't added.

You're on the right track using HashSet. You just need to override equals
and hashCode to define what makes your object equal.
 
C

Christian Schuhegger

John said:
If I understand you correctly, you are perpetrating an ugly hack by
doing as you describe. In particular, consider this excerpt of the docs
for the SortedSet interface (which is implemented by TreeSet): "Note
that the ordering maintained by a sorted set (whether or not an explicit
comparator is provided) must be consistent with equals if the sorted set
is to correctly implement the Set interface." The Comparator docs
discuss the issue more fully.

i have to admit that i did not read the documentation that carefully.

actually what i would like to have is an interface as you have in ANSI
Common Lisp. for example have a look at the following two links:

14. Sequences:
http://www-2.cs.cmu.edu/Groups/AI/html/cltl/clm/node141.html
15.5. Using Lists as Sets:
http://www-2.cs.cmu.edu/Groups/AI/html/cltl/clm/node152.html

by letting all of the methods given in the above two references take
addional parameters like :test and :key you actually do not care which
datastructure you have as the underlying, a list, a hashmap, a tree, ...
the functions just do what you want them to do on any datastructure.

you do not have to think about a clever way to do something like:
usermap.keySet().removeAll(idSet);
i would have never guessed that the removal of the keys from the keySet
would also remove the elements from the map?? i would have guessed that
keySet returns a new data-structure without any connection to the
original map.

but by reading the documentation carefully enough i could have found out:
public Set keySet()
Returns a set view of the keys contained in this map. The set is backed
by the map, so changes to the map are reflected in the set, and vice-versa.
One sign of a mature programmer is that he or she understands that
clever hacks generally aren't.

i agree with you :) but another sign of a mature programmer is that he
not only knows one way to achieve a result ;)
 
J

John C. Bollinger

Anony! wrote:
[I wrote:]
I tried your method and it does indeed removes duplicates from a list.

Glad to hear that I still know how to code :) .
However, I don't understand how it works. I've been racking my brains over
it this morning.

Again, the same problem. READ THE API DOCS. You will have no end of
trouble trying to produce or understand code if you don't know what
particular methods do or the details of what particular classes
represent. You can find the J2SE API docs online at
http://java.sun.com/j2se/1.4.2/docs/api/index.html. If you look around
java.sun.com a little you will even find a version that you can download.
I gather it compares index positions of elements in the list. It appears to
be removing elements 1 index ahead of the current index all the time, but
thats not whats it's doing.

At each iteration, it obtains the index of the next element that it will
consider, then it retrieves that element. It finds the first index of
that element in the List, and if it is less than the index of the
current element under consideration (thus indicating that the current
one is a duplicate) then it removes the current element via the
iterator. The iterator keeps track of the position in the List, and
does not get confused if it is used to modify the List during iteration.

Part of your specific confusion in this case appears to relate to what
an Iterator is / does: in particular, that an Iterator never has a
"current" element. It has a next element (except at the end of the
iteration) and a previous element (except at the start of the
iteration), but if you think of it as a marker in the List then it
always marks a spot _between_ elements.


John Bollinger
(e-mail address removed)
 
J

John C. Bollinger

Christian said:
i have to admit that i did not read the documentation that carefully.

The same remark appears in TreeSet's class documentation as well; I
cited SortedSet instead to emphasize that the requirement it is not a
quirk of TreeSet specifically. And as I wrote before, the issue is
discussed at some length in the Comparator docs too. I'm not looking to
pick a fight, but I have only limited sympathy for people who do not
RT<expletive/>M. At least the class-level docs and the docs for the
specific methods used, implemented, or overridden. How can you expect
to subclass a class or implement an interface correctly if you do not
know how your class' parent(s) indicate it should behave? How can you
expect to use an object's methods appropriately if you don't know
exactly what they're supposed to do?
actually what i would like to have is an interface as you have in ANSI
Common Lisp. for example have a look at the following two links:

14. Sequences:
http://www-2.cs.cmu.edu/Groups/AI/html/cltl/clm/node141.html
15.5. Using Lists as Sets:
http://www-2.cs.cmu.edu/Groups/AI/html/cltl/clm/node152.html

I don't speak Lisp, but I get the idea.
by letting all of the methods given in the above two references take
addional parameters like :test and :key you actually do not care which
datastructure you have as the underlying, a list, a hashmap, a tree, ...
the functions just do what you want them to do on any datastructure.

If you read the docs for the various Collections classes (or at least
those for the basic interfaces: Collection, List, Set, and maybe Map)
then you might find that Java Collections do much of what you're looking
for. List and Set both extend Collection in such a way that you can use
them fairly interchangeably. The Collections class (note the "s")
offers a variety of methods for Collection instances, some more generic
than others, but none more specific than the basic collection
interfaces. Collection itself defines methods for several of the
operations described for Lisp Sequences, and List defines a few more.
Those defined on Lists and not on Sets are inapplicable to a Java Set
because in Java generic Sets have no defined order.
you do not have to think about a clever way to do something like:


i would have never guessed that the removal of the keys from the keySet
would also remove the elements from the map?? i would have guessed that
keySet returns a new data-structure without any connection to the
original map.

And therein you have touched one of my pet peeves: why are you
_guessing_ about such details? As I have been exhorting our friend
Anony (who started this thread), read the API docs.
but by reading the documentation carefully enough i could have found out:
public Set keySet()
Returns a set view of the keys contained in this map. The set is backed
by the map, so changes to the map are reflected in the set, and vice-versa.

One way a person might come to such a discovery without carefully
studying all the method descriptions of every class and interface might
be something like this:

() I conceive a problem: I want to remove from a Map all of the entries
with keys that appear in some Collection.
() I look to the Map interface for a solution, and find no method that
performs this specific task.
() It seems an entirely reasonable thing to do, however, so perhaps
there is still an easy way to do it. I start looking at the methods that
have to do with keys.
() The keySet() method stands out because it returns a Collection (a Set
in fact). I read the docs for the keySet method, and aha!

If you approach the platform API with the view that it is fairly
complete and flexible then you will be more likely to find what you're
looking for. The Collections classes are an especially good example,
IMO; they have their flaws (some of which are inescapable consequences
of the Java language design), but they are well designed, highly
flexible, and written at very appropriate levels of abstraction.


John Bollinger
(e-mail address removed)
 
C

Christian Schuhegger

John said:
And therein you have touched one of my pet peeves: why are you
_guessing_ about such details? As I have been exhorting our friend
Anony (who started this thread), read the API docs.

i think we can stop this here and i agree that you're right. i am also
annoyed by people not reading the documentation properly.

but after having seen serveral different programming languages,
programming apis, ... for me the natural approach for the task was to
look for something that takes a comparator and therefore i did not even
look further.

for example even in c++ the <algorithm>
http://www.dinkumware.com/manuals/reader.aspx?b=p/&h=algorith.html
package gives you the possibility to use all of the algorithms with a
predicate instead of having to implement a custom equals/hashCode method
like in java. besides that this is the first time that i see a keySet()
method that acts like the one in the java HashMap.

whenever possible i try to stick to a functional programming style and
use "higher order functions". there are good papers comparing functional
programming and object oriented programming in the context of c++ (for
people not used to functional programming), e.g.:
http://citeseer.ist.psu.edu/383622.html
http://okmij.org/ftp/Computation/Subtyping/

have a nice evening,
 
T

Timo Kinnunen

2. Second, how do i retrieve the objects that were not added to the set
(because of redundancy)?

List list = /* original list */

List copy = new ArrayList(list);
Set set = new HashSet(list);
copy.removeAll(set);
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,077
Latest member
SangMoor21

Latest Threads

Top