Best way to process a file with 20,000 records

Me

Hi all

What is the best way to upload and then process a CSV file that
contains around 20,000 records?
The data needs to be validated and then added to the database. All
records that don't satisfy the validation requirements will need to be
recorded; however, the update can still go ahead for those that do.

I am using .NET 1.1 with SQL Server 2000 as the data source.

Is it a good idea to do this in a different thread, or to have a
Windows service perform this task?

Any ideas/suggestions anyone?
 
Otis Mukinfus

If this is a daily scheduled job, you can write a console application
that parses the file and validates the data, and have the scheduler on
the installation machine call it. Pass the file name on the command
line if it changes frequently.

I'm afraid there is no way to do automatic validation of the data. You
will have to code the validation process.

Rows that fail validation can be written out to another file with a
statement describing why they failed to validate.

I do this a lot, and much larger files can be handled. In fact, the
environment I work in (mainframe and AS/400) has files like this with
millions of rows.

Here is the basic methodology:

// get the file name from the command line
// open the file
// read a line
// split it into an array with the String.Split method
// loop through the array elements validating each one
// if the row is valid pass it to a method that does the insert
// if not, tag the row with a statement describing the reason it failed
// write the failed line out to the error file
// read another row and repeat

That's a pretty high-level overview, but you can get the idea.
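
Here is a minimal C# sketch of that outline for .NET 1.1. The file
layout (three columns: Id, Name, Amount), the Customers table, the
column names, and the connection string are all assumptions for
illustration; the validation shown is only a placeholder for your own
rules.

using System;
using System.Data.SqlClient;
using System.IO;

class CsvLoader
{
    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("Usage: CsvLoader <file.csv>");
            return;
        }

        // Placeholder connection string; adjust for your server.
        string connString =
            "Server=myServer;Database=myDb;Integrated Security=SSPI;";

        using (SqlConnection conn = new SqlConnection(connString))
        using (StreamReader reader = new StreamReader(args[0]))
        using (StreamWriter errors = new StreamWriter("NotImported.csv"))
        {
            conn.Open();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // split the line into an array of fields
                string[] fields = line.Split(',');

                // example validation only: expect 3 fields and a numeric amount
                string reason = null;
                if (fields.Length != 3)
                {
                    reason = "wrong number of fields";
                }
                else
                {
                    try { Decimal.Parse(fields[2]); }
                    catch (FormatException) { reason = "amount is not numeric"; }
                }

                if (reason != null)
                {
                    // tag the bad row with the reason and write it to the error file
                    errors.WriteLine(line + ",\"" + reason + "\"");
                    continue;
                }

                // insert the valid row; parameters avoid quoting problems
                SqlCommand cmd = new SqlCommand(
                    "INSERT INTO Customers (Id, Name, Amount) " +
                    "VALUES (@id, @name, @amount)", conn);
                cmd.Parameters.Add("@id", fields[0]);
                cmd.Parameters.Add("@name", fields[1]);
                cmd.Parameters.Add("@amount", Decimal.Parse(fields[2]));
                cmd.ExecuteNonQuery();
            }
        }
    }
}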


Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
 
DKode

I would do what Otis suggested. I process files with millions of rows
as well. Parse out the bad lines to a "NotImported.csv" file with an
error description. Then, with the good records, I issue a BULK INSERT
statement and provide the path to the CSV file to import it quickly
into SQL Server.
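
A BULK INSERT can be issued straight from the client with a SqlCommand.
The staging table, file path, and connection string below are
assumptions; note that the path is resolved on the SQL Server machine,
not on the client, and the login running the statement needs bulk-load
permissions.

using System;
using System.Data.SqlClient;

class BulkLoader
{
    static void Main()
    {
        // Placeholder connection string, staging table, and server-side path.
        string connString =
            "Server=myServer;Database=myDb;Integrated Security=SSPI;";
        string sql =
            "BULK INSERT dbo.CustomerStaging " +
            "FROM 'C:\\Import\\GoodRows.csv' " +
            "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(sql, conn);
            cmd.CommandTimeout = 0; // large loads can exceed the default 30 seconds
            cmd.ExecuteNonQuery();
        }
    }
}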
 
shriop

You can try my parser, which I sell, that will handle all the CSV file
parsing and creating for you. I would basically do what the other
people have said, only I would use DTS for the bulk insert rather than
the BULK INSERT statement, as DTS can handle true CSV files whereas
BULK INSERT and bcp cannot.
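
The point about "true" CSV files is that fields can contain quoted
commas, which a plain String.Split(',') breaks apart. A minimal
quote-aware splitter (just an illustration of the problem, not the
product above) might look like this:

using System;
using System.Collections;
using System.Text;

class CsvSplit
{
    // Splits one CSV line, honoring double-quoted fields that may
    // contain commas and doubled ("") quote characters.
    static string[] SplitCsvLine(string line)
    {
        ArrayList fields = new ArrayList();
        StringBuilder field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // escaped quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;  // closing quote
                    }
                }
                else
                {
                    field.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
            }
            else if (c == ',')
            {
                fields.Add(field.ToString());
                field.Length = 0;
            }
            else
            {
                field.Append(c);
            }
        }
        fields.Add(field.ToString());
        return (string[])fields.ToArray(typeof(string));
    }

    static void Main()
    {
        // "Smith, John" stays a single field despite the embedded comma.
        string[] parts = SplitCsvLine("42,\"Smith, John\",100.50");
        foreach (string p in parts)
            Console.WriteLine(p);
    }
}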

Bruce Dunwiddie
http://www.csvreader.com
 
