Comparing elements from one hashset to another

F

farah

Hi,

The following code was repeated three times to store 3 files into 3
individual hash sets. I now need to compare file A to file B, i want to
see how many of the words in set A are also on set B. (Each set
contains the same number of words). Im not sure how to implement this,
can anyone help??

public FileIn() {

try {
// make sure there is a text file in the java directory where the java
// code is saved
in = new BufferedReader(new FileReader("C:\\Program
Files\\Java\\example.txt"));
set = new HashSet();

int Len = 1;
while(Len>0) {
String line = in.readLine();
try {
Len = line.length();
System.out.println(line);
set.add(line);
} catch(NullPointerException npe){
Len = 0; //no more file to read
}
}
 
P

Patricia Shanahan

farah said:
Hi,

The following code was repeated three times to store 3 files into 3
individual hash sets. I now need to compare file A to file B, i want to
see how many of the words in set A are also on set B. (Each set
contains the same number of words). Im not sure how to implement this,
can anyone help??

Take a look at the method retainAll that HashSet inherits from
AbstractCollection.

Patricia
 
R

Remon van Vliet

farah said:
Hi,

The following code was repeated three times to store 3 files into 3
individual hash sets. I now need to compare file A to file B, i want to
see how many of the words in set A are also on set B. (Each set
contains the same number of words). Im not sure how to implement this,
can anyone help??

public FileIn() {

try {
// make sure there is a text file in the java directory where the java
// code is saved
in = new BufferedReader(new FileReader("C:\\Program
Files\\Java\\example.txt"));
set = new HashSet();

int Len = 1;
while(Len>0) {
String line = in.readLine();
try {
Len = line.length();
System.out.println(line);
set.add(line);
} catch(NullPointerException npe){
Len = 0; //no more file to read
}
}

Hello, what you want to do is make a union of the two collections (e.g. make
a 3rd collection that has all elements that are contained by both of your
source collections, a .size() on this 3rd collection gives you the answer to
your problem). The fastest way to do this is (using A, B and C to match your
question) :

HashSet C = new HashSet(A);
A.retainAll(B);
C.size();

This obviously works just as well if you have more than 2 source collections

Remon van Vliet
 
F

farah

Im not too sure i understand........how would this enable me to compare
the values from two individual sets? would i be checking for
duplicates??


farah
 
O

Oliver Wong

farah said:
Im not too sure i understand........how would this enable me to compare
the values from two individual sets? would i be checking for
duplicates??

The retainAll() method essentially performs the intersection operation
on two sets, which is (I think) what you want (i.e. A intersect B = {all x :
(x is an element of A) and (x is an element of B)} ).

- Oliver
 
R

Roedy Green

The following code was repeated three times to store 3 files into 3
individual hash sets. I now need to compare file A to file B, i want to
see how many of the words in set A are also on set B. (Each set
contains the same number of words). Im not sure how to implement this,
can anyone help??

there are two basic ways:

extract the two sets as arrays, sort and compare linearly

Enumerate Set A and look each element up in Set B to see if it exists.

http://mindprod.com/jgloss/hashset.html#COMPARING
 
P

Patricia Shanahan

Oliver said:
The retainAll() method essentially performs the intersection
operation on two sets, which is (I think) what you want (i.e. A
intersect B = {all x : (x is an element of A) and (x is an element of
B)} ).

- Oliver

In particular, this does a non-destructive intersection using clone and
retainAll:

HashSet ab = (HashSet)a.clone();
ab.retainAll(b);

The elements of ab are those that appear in both a and b.

Of course, you can always do it manually by iterating over one of the
sets and asking the other set whether it contains each element. But why
write code yourself if it is already there and tested in java.util?

Patricia
 
R

Roedy Green

HashSet C = new HashSet(A);
A.retainAll(B);
C.size();

the retainAll is easiest to code of the three algorithms, but I
suspect that the sort method will be fastest for very large sets,
which is the bulkiest to code.

retainAll has the over head of creating a duplicate set and the
overhead of looking up each element and removing nearly all the
elements.

sort has the overhead of loading the sort class, exporting two arrays,
and sorting.

lookup has the overhead of a hashed lookup on every element of one of
the sets in the other.


Code for all three algorithms is posted at
http://mindprod.com/jgloss/hashset.html#COMPARING
 
P

Patricia Shanahan

Roedy said:
the retainAll is easiest to code of the three algorithms, but I
suspect that the sort method will be fastest for very large sets,
which is the bulkiest to code.

retainAll has the over head of creating a duplicate set and the
overhead of looking up each element and removing nearly all the
elements.

sort has the overhead of loading the sort class, exporting two arrays,
and sorting.

lookup has the overhead of a hashed lookup on every element of one of
the sets in the other.


Code for all three algorithms is posted at
http://mindprod.com/jgloss/hashset.html#COMPARING

In terms of theoretical complexity, the sort method is O(n log(n)), but
the retainAll method, and its manual equivalent, are O(n), assuming
HashSet lookup is constant time. Of course, for any reasonably small
problem size, the sort method might be faster.

On the other hand, we were not asked for the fastest method, and the
retainAll approach is really easy to code.

Patricia
 
F

farah

i think in this instance i might try the easier of the methods.....im
pretty new to java and not very competent with the language
 
H

Hendrik Maryns

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

Patricia Shanahan schreef:
In particular, this does a non-destructive intersection using clone and
retainAll:

HashSet ab = (HashSet)a.clone();
ab.retainAll(b);

This also works if a is declared of type Set (don't code against
implementations...):

Set ab = new HashSet(a);
ab.retainAll(b);

This is what is recommended in Sun's Java tutorials, too.

H.

--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFELPuge+7xMGD3itQRAnM2AJ4vY6vgIh4qlsQ4Q8ObcSASvN34xwCeMeqH
3Ry72y9X1xuEaaEMwXBRgiI=
=axhv
-----END PGP SIGNATURE-----
 
E

Ed Kirwan

Hendrik said:
This also works if a is declared of type Set (don't code against
implementations...):

Set ab = new HashSet(a);
ab.retainAll(b);

This is what is recommended in Sun's Java tutorials, too.

H.

:)

Collection ab = new HashSet(a);
 
F

farah

Set ab = new HashSet(a);
ab.retainAll(b);


how can i check that this code works? it compiles but how can i check
that it has done its task?
 
F

farah

Set ab = new HashSet(a);
ab.retainAll(b);

ok so the code compares the words from two individual sets.....how can
i check the output of this so that it shows me which of the words
within the two sets are identical???


Farah
 
F

farah

Set ab = new HashSet(a);
ab.retainAll(b);

ok so the code compares the words from two individual sets.....how can
i check that it has identified any words that were contained in BOTH
sets?

(Id like the code to place any words that are contained in both sets
int a seperate store and than return the number of words contained in
that store (this being the total number of identical words that appear
in both sets))

Farah
 
W

Wibble

farah said:
Set ab = new HashSet(a);
ab.retainAll(b);

ok so the code compares the words from two individual sets.....how can
i check that it has identified any words that were contained in BOTH
sets?

(Id like the code to place any words that are contained in both sets
int a seperate store and than return the number of words contained in
that store (this being the total number of identical words that appear
in both sets))

Farah
ab.retainAll(b).retainAll(a).size()
 
F

farah

Roedy,

For the retainAll method, a list of fruits that had been entered
individually as strings were used. My code hoever uses info from files
that are read into the program. Im having trouble modifying my code to
work around this, can you help??

Farah
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top