Text File vs Database reading performance


antoine

** problem with news sender, apologies if message appears twice **

Hello,

I have developed an application that does some "backtesting" on market
data.
I have a series of text files that contain the data, one meaningful data
point per line.
My app opens a file, reads a line, processes it, reads the next one,
processes it, and so on.

Each file is around 50,000 lines long, and I have more than 200 files
(and growing).

The whole processing takes quite some time, and I'm trying to find ways
to make it faster.

I chose the "file" approach a while ago, when I was lazy and the "speed"
problem had not appeared yet. However, I know I/O is one important thing
to look at if I want to improve performance.

In particular, I'm wondering: wouldn't it be faster to go through a
database? Enter all my data in a DB once, and access the DB with JDBC.

Do any of you have comments on ways to make things faster, and on
database performance in particular? Or is my file solution good enough?

thanks for your insight...

-Antoine
 

Robert Klemme

antoine said:
Do any of you have comments on ways to make things faster, and on
database performance in particular? Or is my file solution good enough?

First make sure that your problem is actually an IO problem. You should
profile your app to find out what parts are slow. *Then* you can start
thinking about ways to improve performance.

I wouldn't be so sure that a DB will make this faster. Reading from a
flat file can be pretty fast as well if you just read from beginning to end.

Kind regards

robert
 

Boris Stumm

antoine said:
My app opens a file, reads a line, processes it, reads the next one,
processes it, and so on.

What exactly do you mean by "processing"?

antoine said:
Each file is around 50,000 lines long, and I have more than 200 files
(and growing).

The whole processing takes quite some time, and I'm trying to find ways
to make it faster.

What does "some time" mean in terms of seconds?

antoine said:
I chose the "file" approach a while ago, when I was lazy and the "speed"
problem had not appeared yet. However, I know I/O is one important thing
to look at if I want to improve performance.

I assume you DO use a BufferedReader or BufferedInputStream?

antoine said:
In particular, I'm wondering: wouldn't it be faster to go through a
database? Enter all my data in a DB once, and access the DB with JDBC.

Do any of you have comments on ways to make things faster, and on
database performance in particular? Or is my file solution good enough?

If you just sequentially go over all elements, and need to process all of
them, using a database will not help you.

Depending on the type of processing, you can let the DB do it for you,
which can be much faster.
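
For example, if the processing turned out to be a simple aggregation, a
single query could do all the work on the server side. A rough, untested
sketch (the JDBC URL, table and column names are all made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbAggregateSketch {
    public static void main(String[] args) throws Exception {
        // hypothetical JDBC URL and schema: market_data(param, value)
        Connection con =
            DriverManager.getConnection("jdbc:yourdb://localhost/market");
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery(
            "SELECT param, AVG(value) FROM market_data GROUP BY param");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
        }
        rs.close();
        st.close();
        con.close();
    }
}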
 

Andy Dingley

antoine said:
I have a series of text files that contain the data, one meaningful data
point per line. My app opens a file, reads a line, processes it, reads
the next one, processes it, and so on.

This should always (in general) be faster than a database, provided
that:

- You're reading this serial file in order and never need a "search"
or "sort" operation.

- Your implementation isn't grossly inefficient.

I wouldn't look at a database as a solution here. If it's slow, then
work on optimizing your file-reading solution, not changing it
completely.
 

Jeffrey H. Coffield

antoine said:
** problem with news sender, apologies if message appears twice **

Hello,

I have developed an application that does some "backtesting" on market
data.
I have a series of text files that contain the data, one meaningful data
point per line.
My app opens a file, reads a line, processes it, reads the next one,
processes it, and so on.
thanks for your insight...

-Antoine

You should determine whether the slowness comes from actually reading the
data or from the amount of processing. Try looping x thousand times over
the same data without reading a new record.

If this takes about the same amount of time, then you need to either
optimize the processing (spend time) or move to a faster platform
("throw money at the problem").

If this test is much faster (which I doubt, unless you are using a
compiled form of Java), then there are ways to speed up reading the file,
such as using a zip library or a database that implements compression.
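
The timing test could be as simple as something like this (untested
sketch; processOneRecord() is only a stand-in for whatever the per-record
work is, and the sample record and repeat count are made up):

// Time the per-record work alone, with no disk reads at all.
String sampleRecord = "SOME_PARAM=42.0";  // a typical line from one of the files
int repetitions = 50000;                  // roughly one file's worth of lines
long start = System.currentTimeMillis();
for (int i = 0; i < repetitions; i++) {
    processOneRecord(sampleRecord);       // stand-in for the real processing
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(repetitions + " records took " + elapsed + " ms");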

Jeff Coffield
 

antoine

Thanks for all the input, I will try to implement some of the tests
suggested.

1. processing = each line is basically a parameter/value pair; depending
on the parameter type, I use the value in some specific computation.

2. some time = around 5 seconds per file (around 40,000 lines per file);
I have more than 200 files, so I'm looking at more than 15 minutes per
"run".

3. I am using BufferedReader in the following way:

private void readMarketDataFile(String file) throws
        FileNotFoundException, IOException {
    String line;
    int numberLines = 0;
    BufferedReader in = new BufferedReader(new FileReader(file));
    try {
        while ((line = in.readLine()) != null) {
            numberLines++;
            try {
                analyseLine(line);
            } catch (Exception e) {
                // skip lines that fail to parse, but keep reading
                e.printStackTrace(System.out);
            }
        }
    } finally {
        // release the file handle even if reading fails
        in.close();
    }
    System.out.println("number of lines read: " + numberLines);
}

My problem is that I feel I'm reading the file "in pieces" and
alternating between reading and processing. I'm wondering if there's a
faster way: read the whole file at once and store the elements in a data
structure, then analyse each entry of that data structure...

Any take on that?

thanks again

-Antoine
 

EJP

antoine said:
My problem is that I feel I'm reading the file "in pieces" and
alternating between reading and processing. I'm wondering if there's a
faster way: read the whole file at once and store the elements in a data
structure, then analyse each entry of that data structure...

Any take on that?

Yes. Don't. What's wrong with 'reading the file in pieces'? The 2nd
technique you propose doesn't scale to arbitrarily large files, and
gives you an I/O-bound first phase followed by a compute-bound 2nd
phase. The way you're doing it now is better from all points of view,
especially that of other users of the system.

You could try a larger buffer size when constructing the BufferedReader,
say 64k, but in my experience the default (8k *chars* or 16kB) is pretty
well chosen. If you make this buffer size *really* large you will
effectively get something like your 2nd suggestion anyway.
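
For example (just a sketch; the file name is a placeholder):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LargeBufferDemo {
    public static void main(String[] args) throws IOException {
        // 64K chars instead of the default 8K chars
        BufferedReader in =
            new BufferedReader(new FileReader("data.txt"), 64 * 1024);
        String line;
        while ((line = in.readLine()) != null) {
            // process the line here
        }
        in.close();
    }
}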
 

Robert Klemme

EJP said:
Yes. Don't. What's wrong with 'reading the file in pieces'? The 2nd
technique you propose doesn't scale to arbitrarily large files, and
gives you an I/O-bound first phase followed by a compute-bound 2nd
phase. The way you're doing it now is better from all points of view,
especially that of other users of the system.

A compromise approach would be to have a reader thread and a processor
thread connected by a bounded (!) queue. This gives you parallel reading
and processing, or, put differently, processing is not stalled by slow
reading. This is actually only worth the effort if processing is slower
than reading; if not, your processing thread is slowed down by the
reader. I doubt, though, that it's worth the effort - especially since
we haven't seen evidence of which part of the system is slow.
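
If you do want to try it, a rough, untested sketch could look like this
(the queue capacity, file name and poison-pill marker are arbitrary;
analyseLine() is the method from antoine's post):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedReader {
    private static final String EOF = new String(); // poison pill, compared by identity

    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1000);

        Thread reader = new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader in = new BufferedReader(new FileReader("data.txt"));
                    String line;
                    while ((line = in.readLine()) != null) {
                        queue.put(line);      // blocks while the queue is full
                    }
                    in.close();
                } catch (Exception e) {
                    e.printStackTrace(System.out);
                } finally {
                    try {
                        queue.put(EOF);       // tell the processor we are done
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        });

        Thread processor = new Thread(new Runnable() {
            public void run() {
                try {
                    String line;
                    while ((line = queue.take()) != EOF) {
                        // analyseLine(line) would go here
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace(System.out);
                }
            }
        });

        reader.start();
        processor.start();
        reader.join();
        processor.join();
    }
}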
EJP said:
You could try a larger buffer size when constructing the BufferedReader,
say 64k, but in my experience the default (8k *chars* or 16kB) is pretty
well chosen. If you make this buffer size *really* large you will
effectively get something like your 2nd suggestion anyway.

Yep.

Kind regards

robert
 

JRH

You might also want to profile/instrument the code to tell you what
proportion of time is spent reading and processing the data.
Obviously, I don't know your application, but even reading millions of
small rows shouldn't eat up the hundreds of seconds you say you are
seeing. On the other hand, an inefficient data structure (unbalanced
map or a list being searched) in an app with millions of data points
easily could.
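
For instance, a crude way to split the time between reading and
processing (untested sketch, reusing the readMarketDataFile() /
analyseLine() names from antoine's post):

private void readMarketDataFileTimed(String file) throws IOException {
    long readNanos = 0;
    long processNanos = 0;
    BufferedReader in = new BufferedReader(new FileReader(file));
    try {
        while (true) {
            long t0 = System.nanoTime();
            String line = in.readLine();
            readNanos += System.nanoTime() - t0;
            if (line == null) {
                break;
            }
            long t1 = System.nanoTime();
            try {
                analyseLine(line);
            } catch (Exception e) {
                e.printStackTrace(System.out);
            }
            processNanos += System.nanoTime() - t1;
        }
    } finally {
        in.close();
    }
    System.out.println("reading:    " + (readNanos / 1000000) + " ms");
    System.out.println("processing: " + (processNanos / 1000000) + " ms");
}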
 
