How can I insert a text file into a HashSet?

farah

Hi, I've written some code to feed a text file into my program, as shown
below. It currently displays the first line of the text file (this is
not needed, it was just a check to make sure the code worked).

I'm not sure how I would go about storing the text file (a list of
words) in a HashSet so that I can later compare two text files. I'm
thinking I may need to use a StreamTokenizer? Can anyone start me off
with some code, or just give me some pointers please?

import java.io.*;

public class FileIn {
    // assigns BufferedReader an instance name to be used in the code
    BufferedReader in;

    public FileIn() {
        try {
            // make sure there is a text file in the java directory where the java
            // code is saved
            in = new BufferedReader(new FileReader("\\C:\\Program Files\\java.txt\\"));
            String line = in.readLine();
            System.out.println(line);
            // i think this code only allows for the first line to be read
            // because it closes after one line is read and printed
            in.close();

        // if any error occurred during the try statement the system will display
        // the error that it encountered
        } catch(IOException ioe) {
            System.out.println(ioe.toString());
        }
    }

    public static void main(String args[]) {
        // this gets executed when the java file is run
        // this then (starting from the top) does what you tell it
        FileIn newFile = new FileIn();
    }
}
 
Rhino

I'm not clear on what you're trying to accomplish here, or why. Are you
trying to put an entire text file in a single slot of a HashSet? Or are you
trying to put each unique word of the text file into a different slot, so
that you have one slot for "the", another slot for "house", another slot for
"dog", etc.?

Why are you trying to do this? What do you want to compare?

Also, have you read the chapter on Collections in the Java Tutorial? I think
it might answer your questions. Here's the link:
http://java.sun.com/docs/books/tutorial/collections/index.html.
 
Thomas Fritsch

You'll need 2 nested loops:
an outer loop for reading the file line by line until EOF,
and an inner loop for processing each line word by word.
Roughly like this:

BufferedReader in = ...;
String line = in.readLine();  // read first line
while (line != null) {        // terminates on EOF
    StringTokenizer tokens = new StringTokenizer(line, " \t");
    while (tokens.hasMoreTokens()) {
        String word = tokens.nextToken();
        ...
    }
    line = in.readLine();     // read next line
}
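
To make the two nested loops concrete, here is a minimal sketch that fills in the elided parts with one possible choice: each word is added to a HashSet. The class name WordSetDemo and the method readWords are made up for illustration, and a StringReader stands in for the FileReader so the sketch runs without a file on disk. It assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class WordSetDemo {

    // Hypothetical helper: reads every space- or tab-separated word into a set.
    static Set<String> readWords(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine();               // read first line
            while (line != null) {                     // null signals EOF
                StringTokenizer tokens = new StringTokenizer(line, " \t");
                while (tokens.hasMoreTokens()) {
                    words.add(tokens.nextToken());     // a set ignores repeated words
                }
                line = in.readLine();                  // read next line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Swap the StringReader for a FileReader to read a real file.
        Set<String> words = readWords(new StringReader("mary had a\nlittle lamb a"));
        System.out.println(words.size());              // prints 5
    }
}
```

The second "a" is dropped because a Set keeps only one copy of each element.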
 
tom fredriksen

That depends on whether you want only one occurrence of each word or
many, and whether you want one slot for an entire file or the file spread
over the entire data structure. Sets are for single occurrences, while a
List or Map can handle several occurrences. Read the suggested tutorial on
collections.
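
As a rough illustration of that difference, the sketch below stores the same words in a HashSet and in a HashMap of counts. The class name OccurrenceDemo and the method countWords are invented for the example, and Map.merge assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OccurrenceDemo {

    // Hypothetical helper: counts how often each word occurs.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);   // start at 1, then add 1 per repeat
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "dog", "the", "house");

        Set<String> unique = new HashSet<>(words);
        System.out.println(unique.size());                  // prints 3: "the" is stored once

        System.out.println(countWords(words).get("the"));   // prints 2: the map keeps the count
    }
}
```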

BTW, the code is of no importance to the question, so you could have
left it out.

/tom
 
Ed Kirwan

StreamTokenizers can be a little off-putting; for an example, see:
http://www.codeguru.com/java/tij/tij0113.shtml

Or you could insert the engine from the following, simpler(?) code
into your own snippet:

import java.util.StringTokenizer;

class Crap {
    public static void main(String[] args) {
        new Crap().go();
    }

    private void go() {
        String test = "Mary had a little lamb";
        for (StringTokenizer st = new StringTokenizer(test); st.hasMoreTokens();) {
            String word = st.nextToken();
            System.out.println("Word = " + word);
        }
    }
}
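
For comparison, a StreamTokenizer version of the same idea might look like the sketch below. The class name StreamTokenizerDemo and the method readWords are invented here. It relies on StreamTokenizer's default syntax table, which treats letters as word characters but also parses numbers and treats apostrophes and double quotes as string delimiters, so it is only a reasonable fit for input that is plain words.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

public class StreamTokenizerDemo {

    // Hypothetical helper: collects every word token from the reader into a set.
    static Set<String> readWords(Reader source) {
        Set<String> words = new HashSet<>();
        StreamTokenizer st = new StreamTokenizer(source);
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    words.add(st.sval);            // sval holds the text of a word token
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readWords(new StringReader("Mary had a little lamb"));
        System.out.println(words.size());          // prints 5
    }
}
```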
 
farah

Here is my revised code, which I think inserts a text file into a
HashSet:
import java.io.*;
import java.util.HashSet;

public class FileIn {

    BufferedReader in;
    HashSet set;

    public FileIn() {
        try {
            in = new BufferedReader(new FileReader("C:\\Program Files\\Java\\example.txt"));
            set = new HashSet();

            int Len = 1;
            while (Len > 0) {
                String line = in.readLine();
                try {
                    Len = line.length();
                    System.out.println(line);
                    set.add(line);
                } catch (NullPointerException npe) {
                    Len = 0; // no more file to read
                }
            }
            in.close();
        } catch (IOException ioe) {
            System.out.println(ioe.toString());
        }
    }

    public static void main(String args[]) {
        FileIn newFile = new FileIn();
    }
}

I'm not sure how to modify the code I have (shown above) so that I can
feed three individual text files (A, B and C) into the program and
store them in individual hash sets. I need to do this so that I can
later compare A and C, and A and B. This is what I think I need to do in
pseudocode:

{
readFileIntoSet("my first file", setA);
readFileIntoSet("my second file", setB);
readFileIntoSet("my third file", setC);

double equalityAToB = compareSets(setA, setB);
double equalityAToC = compareSets(setA, setC);
double equalityBToC = compareSets(setB, setC);
}

I figured I need to write a method that takes the file as a parameter,
but I'm not too sure how to do this. I think the following signatures
would be fine, but I'm not sure how to write the methods:

private void readFileIntoSet(String filePath, Set setToReadIn) {...}
private double compareSets(Set firstSet, Set secondSet) {...}
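
One way the readFileIntoSet signature could be filled in is sketched below, under a few assumptions: each file holds one word per line, empty lines are skipped, and Java 8 or later is available. The class name ReadFilesDemo and the writeTempFile helper are invented so that the example can create its own input files instead of depending on real paths.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ReadFilesDemo {

    // Adds every non-empty line of the file to the given set.
    static void readFileIntoSet(String filePath, Set<String> setToReadIn) {
        try (BufferedReader in = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = in.readLine()) != null) {   // null signals end of file
                if (!line.isEmpty()) {
                    setToReadIn.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo-only helper (hypothetical): writes one word per line to a temp file.
    static String writeTempFile(String... words) {
        try {
            Path file = Files.createTempFile("words", ".txt");
            Files.write(file, Arrays.asList(words));
            file.toFile().deleteOnExit();
            return file.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> setA = new HashSet<>();
        Set<String> setB = new HashSet<>();
        Set<String> setC = new HashSet<>();

        // The same method is simply called once per file and set.
        readFileIntoSet(writeTempFile("the", "dog", "house"), setA);
        readFileIntoSet(writeTempFile("dog", "cat"), setB);
        readFileIntoSet(writeTempFile("tree"), setC);

        System.out.println(setA.size() + " " + setB.size() + " " + setC.size()); // prints 3 2 1
    }
}
```

The try-with-resources block closes the reader even if reading fails, so no explicit close() call is needed.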
 
Patricia Shanahan

farah said:
anyone? :eek:)

This thread may have faded out because of lack of clarity about the
ultimate objective. There are a lot of possible interpretations of
"insert a text file into a hashSet".

Your latest code suggests that you want to insert each line in the file
into a Set. Is that correct?

Is the objective to find out if two files each contain the same set of
lines, regardless of order and number of repetitions?

Patricia
 
farah

I'm sorry if this has been difficult to understand.
What I'm trying to do here is to insert three individual text files into
three separate HashSets. I would like to compare two files to check if
they have any of the same words (a word will only ever appear once in
a file, e.g. 'the' will appear no more than once in each file).
 
Patricia Shanahan

That means you need to parse the words out of the files. Are they one
word per line, or can there be multiple words on a line?

Also, do you need to normalize the capitalization? Are "the" and "The"
the same word? Or is the capitalization already normalized?

Can you read the words and output a list of them? That is the first
implementation step.
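
If the capitalization did need normalizing, one possible approach is to trim and lowercase each word before it goes into the set, as in this small sketch. The class name NormalizeDemo and the method normalize are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class NormalizeDemo {

    // Hypothetical helper: strips surrounding whitespace and lowercases the word.
    static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>();
        words.add(normalize("The"));
        words.add(normalize("the "));
        System.out.println(words.size());   // prints 1: both forms map to "the"
    }
}
```

Locale.ROOT keeps the lowercasing independent of the machine's default locale.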

Patricia
 
farah

Before feeding the files into the program they will have been stripped
of all punctuation, capital letters etc. Each word will be in lowercase
and will be on an individual line. So far, the code I have (as
displayed above) displays the output of ONE text file and inserts it
into a HashSet, line by line (or rather, word by word in this case).

I need to change the code so that it can feed THREE text files (A, B and
C) into the program instead of just the one and insert them into
individual hash sets. I'd like to compare files A and B, to see how
many of the words in A also appear in B. I would then like to compare
files A and C to check how many of the words in A also appear in C.

Farah
 
Patricia Shanahan

That clarifies a lot, but now I don't understand the difficulty. You
know how to read one file into one HashSet. Why not just do that three
times, with three different file names and three different HashSet
variables?

Patricia
 
farah

Yes, I have done so; I was wondering if there was a simpler way of doing
this. However, my main problem is that I don't know how to go about
comparing the hash sets.
Here's what I need to do in pseudocode:

double equalityAToB = compareSets(setA, setB);
double equalityAToC = compareSets(setA, setC);
double equalityBToC = compareSets(setB, setC);

I'm not sure how to implement this in Java though.

Farah
 
Roedy Green

farah said:
double equalityAToB = compareSets(setA, setB);
double equalityAToC = compareSets(setA, setC);
double equalityBToC = compareSets(setB, setC);

I'm not sure how to implement this in Java though.


There are two ways.

The classic way is to create an array of each set, sort it and compare
looking for duplicates, the way the various methods in SortedArrayList do.
See http://mindprod.com/products1.html#SORTED

The slower "modern" way would be to iterate over setA with a for-each loop
and do a lookup of each element to see if it exists in setB.

See http://mindprod.com/jgloss/hashset.html
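
Both approaches can be sketched roughly as follows. The class name CompareSetsDemo and the two counting methods are invented for illustration, and compareSets is given one possible meaning here (the fraction of the first set's words that also occur in the second), which is an assumption since the thread never defines it. As for speed, HashSet is documented to offer constant-time contains when the hash function spreads elements well, so it is worth measuring both versions rather than assuming which one is slower.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CompareSetsDemo {

    // Lookup approach: count the words of a that b also contains.
    static int countCommonByLookup(Set<String> a, Set<String> b) {
        int common = 0;
        for (String word : a) {
            if (b.contains(word)) {
                common++;
            }
        }
        return common;
    }

    // Sort-and-merge approach: sort both as arrays, then walk them in step.
    static int countCommonBySorting(Set<String> a, Set<String> b) {
        String[] x = a.toArray(new String[0]);
        String[] y = b.toArray(new String[0]);
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0, j = 0, common = 0;
        while (i < x.length && j < y.length) {
            int cmp = x[i].compareTo(y[j]);
            if (cmp == 0) {          // same word in both arrays
                common++;
                i++;
                j++;
            } else if (cmp < 0) {    // x[i] is smaller, advance in x
                i++;
            } else {                 // y[j] is smaller, advance in y
                j++;
            }
        }
        return common;
    }

    // Assumed meaning: fraction of a's words that also appear in b.
    static double compareSets(Set<String> a, Set<String> b) {
        return a.isEmpty() ? 0.0 : (double) countCommonByLookup(a, b) / a.size();
    }

    public static void main(String[] args) {
        Set<String> setA = new HashSet<>(Arrays.asList("the", "dog", "house", "cat"));
        Set<String> setB = new HashSet<>(Arrays.asList("dog", "cat", "tree"));

        System.out.println(countCommonByLookup(setA, setB));   // prints 2
        System.out.println(countCommonBySorting(setA, setB));  // prints 2
        System.out.println(compareSets(setA, setB));           // prints 0.5
    }
}
```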
 
Andrew McDonagh

Roedy said:
there are two ways.

The classic way is to create an array of each set, sort it and compare
looking for dups, the way the various methods in SortedArrayList do.
See http://mindprod.com/products1.html#SORTED

Classic?

Depends upon how fast the search algorithm needs to be and whether
readability of code is preferred.

Fast-executing code isn't usually the easiest code to read, as it has been
optimized.

Roedy said:
The slower "modern" way would be to iterate over setA with a for:each
and do a lookup of that element to see if it exists in setB

see http://mindprod.com/jgloss/hashset.html

What's 'modern' about that approach?

Iterating over a collection (raw array or Java collection classes) to
find duplicates is not modern. If you count the early 1980s as modern
(as in the history of computing), then using Iterators instead of direct
indexing with a loop may be 'modern'. I'd say it was just good OO design.

YMMV
 
farah

Which of these ways would be simpler? I don't have much experience
with Java and it takes me a while to figure things out, so I'd like to go
with the simpler option.

Farah
 
farah

This is my code as it stands:

import java.io.*;
import java.util.HashSet;

public class FileIn {

    BufferedReader in;
    HashSet set;

    public FileIn() {
        try {
            in = new BufferedReader(new FileReader("C:\\Program Files\\Java\\example.txt"));
            set = new HashSet();

            int Len = 1;
            while (Len > 0) {
                String line = in.readLine();
                try {
                    Len = line.length();
                    System.out.println(line);
                    set.add(line);
                } catch (NullPointerException npe) {
                    Len = 0; // no more file to read
                }
            }
            in.close();
        } catch (IOException ioe) {
            System.out.println(ioe.toString());
        }
    }

    public static void main(String args[]) {
        FileIn newFile = new FileIn();
    }
}


Should I move the code in the constructor into a new method so that
it looks something like this:

class FileIn {
    public FileIn() { }
    public boolean readFile(String thePathToTheFile) { ... }
    public boolean compareTo(FileIn theFileToCompareTo) { ... }
}

Then do the comparison inside the compareTo method?
If so, how would I go about writing a comparison method?
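
A class along those lines might be sketched as below. Everything here is hypothetical: the class is called WordFile, it is built from a collection of words rather than a file path so that the example stays short, and the comparison method returns a count of shared words instead of a boolean. It uses Set.retainAll on a copy, which keeps only the elements that both sets contain.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class WordFile {

    private final Set<String> words = new HashSet<>();

    // Hypothetical constructor: takes the words directly instead of reading a file.
    public WordFile(Collection<String> initialWords) {
        words.addAll(initialWords);
    }

    // Returns how many words this object shares with the other one.
    public int commonWordCount(WordFile other) {
        Set<String> common = new HashSet<>(words);  // copy, so this.words is left untouched
        common.retainAll(other.words);              // keep only the words found in both
        return common.size();
    }

    public static void main(String[] args) {
        WordFile a = new WordFile(Arrays.asList("the", "dog", "house"));
        WordFile b = new WordFile(Arrays.asList("dog", "house", "tree"));
        WordFile c = new WordFile(Arrays.asList("cat"));

        System.out.println(a.commonWordCount(b));   // prints 2
        System.out.println(a.commonWordCount(c));   // prints 0
    }
}
```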
 
Roedy Green

farah said:
String line = in.readLine();
try {
    Len = line.length();
    System.out.println(line);
    set.add(line);
} catch(NullPointerException npe){

That leaves a bad taste in the mouth. You don't want to camouflage
other illegitimate NPEs.

Try something like this:

String line;
while ( ( line = in.readLine() ) != null )
{
    set.add( line );
}
 
