Read a file multiple times

Federico

Hi, I have this code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Main {

    public static void main(String[] args) {
        try {
            FileWriter fstream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(fstream);

            FileReader input = new FileReader("in1.txt");
            BufferedReader bufRead = new BufferedReader(input);

            FileReader input2 = new FileReader("in2.txt");
            BufferedReader bufRead2 = new BufferedReader(input2);

            String line;
            String line2;

            line = bufRead.readLine();
            line2 = bufRead2.readLine();
            bufRead2.mark(7000000);

            while (line != null) {
                while (line2 != null) {
                    out.write(line + line2 + "\n");
                    line2 = bufRead2.readLine();
                }

                line = bufRead.readLine();
                bufRead2.reset();
            }

            bufRead.close();
            bufRead2.close();
        } catch (ArrayIndexOutOfBoundsException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Basically, I want to read the file in1.txt and, for each line found,
combine it with every line of in2.txt and write all the combinations to out.txt.
For example:
in1.txt:

aaa
bbb
ccc
ddd

in2.txt:


eee
fff
ggg
hhh
iii

out.txt:

aaaeee
aaafff
aaaggg
aaahhh
aaaiii
bbbeee
bbbfff
....
dddiii

The idea is that for each line in in1.txt the BufferedReader
for in2.txt is reset.
Obviously, this works only for the first string of in1.txt.
I read the documentation for mark() and reset(), but I can't solve
the problem with these methods.
Do I have to use vectors?
Would it be better to use FileInputStream?

Please help me.

PS: I'm out of school and this is not a trick to get homework solved (I
hate that). I really tried to solve it myself, but I can't.

Cheers,
Federico.
 
Patricia Shanahan

Federico said:
Do I have to use vectors?
Would it be better to use FileInputStream?

The choice depends on the file size, relative to the available memory.
The simplest solution is going to be to read one of the files into an
in-memory data structure, such as an ArrayList. Once you have done that,
you can read the other file a line at a time and output all pairs for
that line.
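
As a rough sketch of that in-memory approach, assuming in2.txt is the file small enough to hold in a list and that both files are UTF-8 text. The class name InMemoryPairs and the method writeAllPairs are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class, not part of the original post.
class InMemoryPairs {

    // Writes every line of 'outer' combined with every line of 'inMemory' to 'out'.
    // Assumption: 'inMemory' is small enough to be loaded completely.
    static void writeAllPairs(Path outer, Path inMemory, Path out) throws IOException {
        // Load the smaller file once.
        List<String> cached = Files.readAllLines(inMemory, StandardCharsets.UTF_8);

        try (BufferedReader reader = Files.newBufferedReader(outer, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            // Stream the other file one line at a time.
            while ((line = reader.readLine()) != null) {
                for (String second : cached) {
                    writer.write(line + second);
                    writer.newLine();
                }
            }
        } // try-with-resources closes (and flushes) the writer
    }

    public static void main(String[] args) throws IOException {
        writeAllPairs(Paths.get("in1.txt"), Paths.get("in2.txt"), Paths.get("out.txt"));
    }
}
```

Only one pass is made over each file, so no mark(), reset() or seeking is needed. Memory use is proportional to the cached file only.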

The fact that outputting all pairs seems reasonable to you suggests that
at least one of the files is reasonably small. For example, if the
smaller file contains a million lines, the list of pairs has at least
10**12 elements.

However, if neither file fits in memory, you are going to have to open
one of them as a RandomAccessFile. For each line of the other file, you
need to seek(0) in the RandomAccessFile and use readLine to advance
through it a line at a time.
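
A minimal sketch of the seek(0) approach, assuming in1.txt drives the outer loop and in2.txt is the file that gets rewound. The class name RewindPairs and the method writeAllPairs are hypothetical:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper class, not part of the original post.
class RewindPairs {

    // For each line of the outer file, rewinds the inner file to its start
    // and writes the outer line combined with every inner line.
    static void writeAllPairs(String outerPath, String innerPath, String outPath)
            throws IOException {
        try (BufferedReader outer = new BufferedReader(new FileReader(outerPath));
             RandomAccessFile inner = new RandomAccessFile(innerPath, "r"); // read-only
             BufferedWriter out = new BufferedWriter(new FileWriter(outPath))) {
            String line;
            while ((line = outer.readLine()) != null) {
                inner.seek(0); // back to the first byte of the inner file
                String line2;
                while ((line2 = inner.readLine()) != null) {
                    out.write(line + line2);
                    out.newLine();
                }
            }
        } // closing the writer also flushes it
    }

    public static void main(String[] args) throws IOException {
        writeAllPairs("in1.txt", "in2.txt", "out.txt");
    }
}
```

Two caveats: RandomAccessFile.readLine() turns each byte into one char, so it only handles single-byte text such as ASCII correctly, and it is unbuffered, so it may be noticeably slower than a BufferedReader on large files.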

Patricia
 
Federico

First, thanks for your response.
No, the file is relatively large and does not fit in memory :(
Yes, I had in mind opening it as a random access file.
If you have the time, could you show me a small general example of how to do
it?

Again thanks!

Federico.
 
Jim Korman

Federico, using a RandomAccessFile as Patricia mentioned is easy:

// Open for read only ("r")
RandomAccessFile myFile = new RandomAccessFile("myfile.dat", "r");

// Read some data
String line = myFile.readLine();

// If you want to "mark" your current position
long markPos = myFile.getFilePointer();

// And to go back to that position
myFile.seek(markPos);

Jim
 
Mark Space

Federico said:
First, thanks for your response.
No, the file is relatively large and does not fit in memory :(


Yuck.

What kind of madness is this program? Are you just testing or did
someone actually think this is a good idea and pay you for it?
 
Martin Gregorie

There's a simple solution to this problem: use a relational database.
Load each list into a separate table and then do a join. The result is
exactly what you want.

Here's a test script I ran using Postgres:
==========================================
create table atab ( a char(3) );
create table btab ( b char(3) );
insert into atab ( a ) values ( 'aaa' );
insert into atab ( a ) values ( 'bbb' );
insert into atab ( a ) values ( 'ccc' );
insert into atab ( a ) values ( 'ddd' );
select a from atab;

insert into btab ( b ) values ( 'eee' );
insert into btab ( b ) values ( 'fff' );
insert into btab ( b ) values ( 'ggg' );
insert into btab ( b ) values ( 'hhh' );
insert into btab ( b ) values ( 'iii' );

select b from btab;

select a,b from atab, btab;

drop table atab;
drop table btab;

and here is the output from the three SELECT statements:
========================================================
a
-----
aaa
bbb
ccc
ddd
(4 rows)

b
-----
eee
fff
ggg
hhh
iii
(5 rows)

a | b
-----+-----
aaa | eee
aaa | fff
aaa | ggg
aaa | hhh
aaa | iii
bbb | eee
bbb | fff
bbb | ggg
bbb | hhh
bbb | iii
ccc | eee
ccc | fff
ccc | ggg
ccc | hhh
ccc | iii
ddd | eee
ddd | fff
ddd | ggg
ddd | hhh
ddd | iii
(20 rows)

...and the beauty of this approach is that the file sizes are limited
only by the disk space available for the database. It's fast too:

$ time psql -f dbtest.sql >dbtest.txt


real 0m0.088s
user 0m0.002s
sys 0m0.011s

This was run on a 256 MB, 866 MHz NetVista running Postgres 8.2.5 under
Linux (Fedora Core 7).

Obligatory Java content: If this is written in Java, you'd use JDBC to
load the files into the tables and to write the file containing the
output. If you use the Derby database, you can have an all-Java solution.
I ran the test under Postgres because that's what I have installed.

However, there may be faster approaches. It's likely that the database
has table loading utilities that will be at least as fast as anything
you can write (e.g. the Postgres COPY command). If the output must simply
go to a file, a script may handle this too: I'd probably just pipe the
SELECT output through gawk to adjust the line format and remove the
header and trailer lines.
 
