large data file manipulation

Roland Hall

I'm looking for information on working with large data files using FSO or XML.

I have a program that creates a large CSV file, over 7 MB. It's a rate
table of freight shipping costs.
There are certain fields I do not need, and some are blank. A typical line
would be:

Raw data:

" ", "30142", "GA", "01001"," ", "MA"," ","100",018609,000000,000000,000000,014435,013181,010622,009022,007125,006569,006569,006569,006569,000000,000000,000000,000000

structure:

blank, fromzip, fromstate, tozip, blank, tostate, blank, class, mc, blank,
blank, blank, l5c, m5c, m1m, m2m, m5m, mxm, mxxm, mxxxm, mxlm, blank, blank,
blank, blank

I don't need the double quotes or spaces or any field determined to be blank
in the structure. It is my understanding I can read this file in 3 ways:

read(b)
readLine
readAll

I chose readLine because I didn't want all 7 MB in memory at once, and
reading a fixed number of bytes won't work because the lines are not a fixed
length. I manipulate each line and append the results to a new file every
1000 lines, finishing up with however many lines are left upon reaching the
end.
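To make the approach above concrete, here is a rough sketch of the same idea. It is written in Python purely as an illustration, not in the ASP/FSO code the post refers to, and it is not my actual script. The names LAYOUT, clean_line and convert are made up for the example; the field order comes from the structure list above, and the sample line is assumed to be a single physical line in the real file.

```python
import csv
import io

# Field layout copied from the "structure" list in the post.
# None marks a column the post calls blank and wants dropped.
LAYOUT = [None, "fromzip", "fromstate", "tozip", None, "tostate", None,
          "class", "mc", None, None, None, "l5c", "m5c", "m1m", "m2m",
          "m5m", "mxm", "mxxm", "mxxxm", "mxlm", None, None, None, None]

def clean_line(line):
    """Return the wanted fields of one raw line, unquoted and trimmed."""
    # skipinitialspace lets the parser accept a space before an opening quote.
    row = next(csv.reader([line], skipinitialspace=True))
    return [value.strip() for name, value in zip(LAYOUT, row) if name]

def convert(src, dst, batch_size=1000):
    """Stream src line by line and write cleaned rows to dst in batches."""
    batch = []
    count = 0
    for line in src:
        if not line.strip():
            continue  # ignore empty lines
        batch.append(",".join(clean_line(line)))
        count += 1
        if len(batch) == batch_size:
            dst.write("\n".join(batch) + "\n")
            batch.clear()
    if batch:
        # Flush whatever is left after the last full batch.
        dst.write("\n".join(batch) + "\n")
    return count

# Demo on the sample line from the post, using in-memory files.
raw = ('" ", "30142", "GA", "01001"," ", "MA"," ","100",018609,000000,'
       '000000,000000,014435,013181,010622,009022,007125,006569,006569,'
       '006569,006569,000000,000000,000000,000000')
out = io.StringIO()
convert(io.StringIO(raw + "\n"), out)
print(out.getvalue())
```

With real files, src and dst would be the opened input and output files instead of the in-memory ones used in the demo.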

My result file is a little over 3mb [41380 lines of raw data]. It takes
seconds to process and will only be used if shipping rates change. The 3mb
file is still too large to work with and I have decided to split it up in
one of two ways, either by state or zip code ranges. "By state" gives me 50
and zip range gives me 10. Not sure what the difference in size will be or
if it will be a noticeable difference. The rate table, or part of it, will
only be in memory long enough to get the rate and then released.
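The two ways of splitting could be sketched roughly like this, again in Python only as an illustration. It assumes each row has already been reduced to the wanted fields (fromzip, fromstate, tozip, tostate, class, mc, ...), and it assumes "zip range" means the first digit of the ship-to zip, which is what would give 10 groups. The function name split_rows and the second and third rows are hypothetical, made up for the example.

```python
from collections import defaultdict

def split_rows(rows, key):
    """Group rows into buckets; each bucket would become one output file."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row)
    return dict(buckets)

def by_state(row):
    # Ship-to state, e.g. "MA" (index 3 in a cleaned row).
    return row[3]

def by_zip_range(row):
    # First digit of the ship-to zip (assumed meaning of "zip range").
    return row[2][0]

# First row is cut down from the sample in the post; the others are made up.
rows = [
    ["30142", "GA", "01001", "MA", "100", "018609"],
    ["30142", "GA", "02101", "MA", "100", "000001"],
    ["30142", "GA", "10001", "NY", "100", "000002"],
]

for name, bucket in split_rows(rows, by_state).items():
    print(name, len(bucket))
```

Counting the rows per bucket on the real data, as the loop at the end does, would show how uneven the file sizes are under each scheme before committing to one.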

I have printing to the screen turned on during the debug process. You can
see it here:
http://kiddanger.com/dev/freight.asp

My questions are:

Since I have to use data files, would using XML instead of CSV be drastically
different as a lookup format for my new file?
How much more efficient is it to retrieve information from XML than from CSV
that has been read in? To make a true comparison: the result will eventually
be multiple files, read in with readAll [if used as CSV], and then I would
search an array for the rate I needed.
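For the CSV side of that comparison, the lookup I have in mind is roughly the following, sketched in Python just for illustration. It assumes the cleaned layout fromzip, fromstate, tozip, tostate, class, mc, l5c, m5c, ... and assumes a rate is found by ship-to zip plus class; the name find_rate and the choice of key are hypothetical.

```python
def find_rate(text, tozip, freight_class, column):
    """Scan CSV text for the first row matching tozip and class.

    Returns the value in the given column, or None if no row matches.
    """
    for line in text.splitlines():
        fields = line.split(",")
        # Index 2 is tozip and index 4 is class in the cleaned layout.
        if fields[2] == tozip and fields[4] == freight_class:
            return fields[column]
    return None

# The sample line from the post after the unwanted fields are removed.
text = ("30142,GA,01001,MA,100,018609,014435,013181,010622,"
        "009022,007125,006569,006569,006569,006569\n")

print(find_rate(text, "01001", "100", 6))  # column 6 is l5c
```

This is a straight linear scan, so the cost grows with the size of whichever file gets read in, which is the reason for wanting smaller files in the first place.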

If I used XML, would it be necessary to split the file up, as I would with
the CSV [by ship to state] or could I use the single file?

Yes, I know SQL is better but I have to also have a version that does not
use a database.

TIA...

--
Roland Hall
/* This information is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of merchantability
or fitness for a particular purpose. */
Technet Script Center - http://www.microsoft.com/technet/scriptcenter/
WSH 5.6 Documentation - http://msdn.microsoft.com/downloads/list/webdev.asp
MSDN Library - http://msdn.microsoft.com/library/default.asp
 
