record size tracking

M

MileHighCelt

I have a CSV file with 100 columns and an unknown number of rows that I
am going to parse and insert into a database. I can't just fill up an
ArrayList with potentially 65K objects because performance just tanks.
At every 1000 records, I thought I would just shot that ArrayList over
to my DAO to do the insert of those row objects.

However, I am trying to think of a better way to handle that break than
comparing a counter to "1000" like this

if (aa ==1000) {
// call DAO
aa =0;
arraylist.clear;
}

because that will obviously fail near the end if there are not exactly
1000 records left to insert. Does anyone have any suggestions on how
to do this? I know this should be pretty basic (didn't we learn this in
school), but for the life of me I can't recall how to do it.
 
R

Rhino

MileHighCelt said:
I have a CSV file with 100 columns and an unknown number of rows that I
am going to parse and insert into a database. I can't just fill up an
ArrayList with potentially 65K objects because performance just tanks.
At every 1000 records, I thought I would just shot that ArrayList over
to my DAO to do the insert of those row objects.

However, I am trying to think of a better way to handle that break than
comparing a counter to "1000" like this

if (aa ==1000) {
// call DAO
aa =0;
arraylist.clear;
}

because that will obviously fail near the end if there are not exactly
1000 records left to insert. Does anyone have any suggestions on how
to do this? I know this should be pretty basic (didn't we learn this in
school), but for the life of me I can't recall how to do it.
If I had to do something similar in my own code, my logic would have a loop
to read all of the records in the CSV until you hit end of file. As you
retrieve each record, you store it in the array and increment the counter.
If the counter hits 1000, you call DAO, verify that the inserts worked, then
reset the counter back to zero. Once you're out of the loop, you call DAO
one last time, passing it the partially-filled array, and insert those
records. That should ensure that _all_ of the records get stored.

Rhino
 
R

Roedy Green

I have a CSV file with 100 columns and an unknown number of rows that I
am going to parse and insert into a database.

see http://mindprod.com/jgloss/csv.htm

You can use a CSV reader to read a row at a pop. You pretty well have
to feed them a row at a time to the database don't you? That way you
don't need that many objects at once.
 
R

Roedy Green

because that will obviously fail near the end if there are not exactly
1000 records left to insert. Does anyone have any suggestions on how
to do this?

that's a common problem, fixed sized chunks with one possibly empty
short one at the end. I have longed for slick pattern to avoid
repeating any code, but in the end I have always just handled the last
chunk specially.
 
R

Robert Klemme

MileHighCelt said:
I have a CSV file with 100 columns and an unknown number of rows that
I am going to parse and insert into a database. I can't just fill up
an ArrayList with potentially 65K objects because performance just
tanks. At every 1000 records, I thought I would just shot that
ArrayList over to my DAO to do the insert of those row objects.

However, I am trying to think of a better way to handle that break
than comparing a counter to "1000" like this

if (aa ==1000) {
// call DAO
aa =0;
arraylist.clear;
}

because that will obviously fail near the end if there are not exactly
1000 records left to insert. Does anyone have any suggestions on how
to do this? I know this should be pretty basic (didn't we learn this
in school), but for the life of me I can't recall how to do it.

I'd use List.size() instead of an extra counter. This is error prone.

private static final int COUNT = 1000;

List records = new ArrayList(COUNT);

while ( more to read ) {
records.add( read record );

if ( records.size() == COUNT ) {
send data(records);
records.clear();
}
}

if ( !records.isEmpty() ) {
send data(records);
}

Kind regards

robert
 
H

HalcyonWild

MileHighCelt said:
I have a CSV file with 100 columns and an unknown number of rows that I
am going to parse and insert into a database. I can't just fill up an
ArrayList with potentially 65K objects because performance just tanks.
At every 1000 records, I thought I would just shot that ArrayList over
to my DAO to do the insert of those row objects.

However, I am trying to think of a better way to handle that break than
comparing a counter to "1000" like this

if (aa ==1000) {
// call DAO
aa =0;
arraylist.clear;
}

because that will obviously fail near the end if there are not exactly
1000 records left to insert. Does anyone have any suggestions on how
to do this? I know this should be pretty basic (didn't we learn this in
school), but for the life of me I can't recall how to do it.


Instead of adding to an arrayList, is it possible to read one record
off the CSV, insert it, and then go to the 2nd row on the CSV. Keep a
list of all failed inserts instead, in the array list.
 
M

MileHighCelt

Thank you everyone for ideas I hadn't considered (was really thinking
counter mod something) and it gives me lots of things to think about.
I really like the idea of inserting a record as its parsed, except I
have to wonder about that performance. I have database pooling
working, so that wouldn't be an issue, but I wonder about other hits.
I suspect that each file will have about 80,000 records so what kind of
issues should I look out for when I get an existing connection, prepare
a statement and update a table?
 
J

Jimi Hullegård

MileHighCelt said:
Thank you everyone for ideas I hadn't considered (was really thinking
counter mod something) and it gives me lots of things to think about.
I really like the idea of inserting a record as its parsed, except I
have to wonder about that performance. I have database pooling
working, so that wouldn't be an issue, but I wonder about other hits.
I suspect that each file will have about 80,000 records so what kind of
issues should I look out for when I get an existing connection, prepare
a statement and update a table?

What database are you using? If all of the data in this file should be
inserted into the database, then maybe the database itself has some
functionallity to insert the data from the csv-file? Or some third party
program (search with google)....
I suspect this would be much faster then parsing the file in a javaprogram,
and inserting it into the database row by row.
Even if you want to manipulate the data this approach could still be valid.
Just insert it into a temporary table, then use sql-functions to manipulate
the data, and then move it to the real target table.

/Jimi
 
R

Roedy Green

What database are you using? If all of the data in this file should be
inserted into the database, then maybe the database itself has some
functionallity to insert the data from the csv-file?

Sybase has a load format like CSV, but with ' instead of ". It is very
quick. In database load mode it does not worry about transactions.
 
M

MileHighCelt

I am using mySQL 5.1 and a CSVparser from Ostemiller. Since there is
some manipulation of data (scrubbing mostly) and the uploads only occur
every 3 months or so, I think I will continue on this path. Besides, I
have it mostly coded now ;)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,431
Messages
2,571,677
Members
48,796
Latest member
Greg L.

Latest Threads

Top