ivowel
dear perl experts---
I have a 300MB .csv data file (60MB compressed) that I need to read,
but not write to:
key-part1,key-part2,data1,data2,data3
ibm,2003/01,0.2,0.3,0.4
ibm,1972/01,0.5,0.3,NaN
sunw,2003/01,0.3,NaN,0.1
....
the key-part1+key-part2 combination is unique, but neither key alone
is unique.
my first idea to use this data in perl was a bit naive: create a hash
of hashes, so that I can find data or iterate over all data items that
match only one of the two keys. Something like $data1->{ibm}->{192601}
and $data1->{192601}->{ibm} would have been nice. great idea indeed,
except that after it gobbled up about 4GB of RAM, my perl program died.
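(for concreteness, a sketch of what my naive version did, with the
sample rows from above inlined instead of the real 300MB file, and
illustrative variable names:)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sample rows inlined for the sketch; the real program read the
# 300MB data.csv line by line the same way
my $csv = <<'CSV';
key-part1,key-part2,data1,data2,data3
ibm,2003/01,0.2,0.3,0.4
ibm,1972/01,0.5,0.3,NaN
sunw,2003/01,0.3,NaN,0.1
CSV

my (%by_k1, %by_k2);
open my $fh, '<', \$csv or die $!;
<$fh>;   # skip the header line
while (my $line = <$fh>) {
    chomp $line;
    my ($k1, $k2, @data) = split /,/, $line;
    my $rec = \@data;          # share one array ref between both indexes
    $by_k1{$k1}{$k2} = $rec;   # $by_k1{ibm}{'2003/01'} = [0.2, 0.3, 0.4]
    $by_k2{$k2}{$k1} = $rec;   # $by_k2{'2003/01'}{ibm} = same ref
}

# iterate over everything filed under one of the two keys:
for my $k2 (sort keys %{ $by_k1{ibm} }) {
    print "ibm $k2: @{ $by_k1{ibm}{$k2} }\n";
}
```

sharing one array ref between the two indexes halves the data
storage, but the two layers of hashes still carry far too much
per-key overhead at this scale.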
I can think of a couple of methods that I could use. I could read the
data with a C program, and then have perl query my C program (e.g.,
through a socket). yikes. I could copy (yikes) the data into a data
base and access it through a data base module, though I am not sure
what data base I should use for this purpose. (I need not one-key
access, but two-key multiple-record access.) or I could do the
combination, and put the data into an SQL data base and learn SQL just
so that I can quickly access my data file. yikes and yikes. maybe
perl6 could do better, but perl6 isn't around yet. is there a way to
code so that perl5 becomes more memory efficient?
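the only half-measure I can think of myself is to flatten everything
into a single hash with a joined key, keeping each record as its
unparsed tail string (nested hashes and array refs each carry sizable
per-structure overhead). a sketch, with the sample rows inlined:

```perl
use strict;
use warnings;

# sample rows inlined for the sketch
my $csv = <<'CSV';
ibm,2003/01,0.2,0.3,0.4
ibm,1972/01,0.5,0.3,NaN
sunw,2003/01,0.3,NaN,0.1
CSV

# one flat hash: "key-part1,key-part2" => unparsed data tail.
# no inner hashes, no array refs, so much less per-record overhead.
my %data;
open my $fh, '<', \$csv or die $!;
while (my $line = <$fh>) {
    chomp $line;
    my ($k1, $k2, $rest) = split /,/, $line, 3;
    $data{"$k1,$k2"} = $rest;
}

# exact two-key lookup, parsing only on demand:
my @vals = split /,/, $data{'ibm,2003/01'};

# one-key access degrades to a scan over key prefixes:
my @ibm = grep { /^ibm,/ } keys %data;
```

but the one-key access turns into a full scan of the keys, which is
exactly the part I wanted to be fast.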
This can't be an obscure problem. What is the recommended
lightweight way of dealing with such large-data situations in perl5?
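for the record, if I did bite the bullet on the SQL route, it
apparently needs only a handful of lines (a sketch assuming the
DBD::SQLite module is installed; an in-memory db and inlined sample
rows here, a file on disk and the real data.csv in practice):

```perl
use strict;
use warnings;
use DBI;   # needs DBD::SQLite installed

# sample rows inlined; the real thing would stream data.csv instead
my $csv = <<'CSV';
ibm,2003/01,0.2,0.3,0.4
ibm,1972/01,0.5,0.3,NaN
sunw,2003/01,0.3,NaN,0.1
CSV

# use dbname=data.db instead of :memory: for a file-backed db
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 0 });
$dbh->do('CREATE TABLE d (k1 TEXT, k2 TEXT, d1 REAL, d2 REAL, d3 REAL)');
$dbh->do('CREATE INDEX d_k1 ON d (k1)');   # one-key access on key-part1
$dbh->do('CREATE INDEX d_k2 ON d (k2)');   # one-key access on key-part2

my $ins = $dbh->prepare('INSERT INTO d VALUES (?,?,?,?,?)');
open my $fh, '<', \$csv or die $!;
while (my $line = <$fh>) {
    chomp $line;
    $ins->execute(split /,/, $line);
}
$dbh->commit;

# exact two-key lookup:
my $row = $dbh->selectrow_arrayref(
    'SELECT d1, d2, d3 FROM d WHERE k1 = ? AND k2 = ?',
    undef, 'ibm', '2003/01');

# all records under one key:
my $rows = $dbh->selectall_arrayref(
    'SELECT k2, d1, d2, d3 FROM d WHERE k1 = ?', undef, 'ibm');
```

the data would then live on disk instead of in RAM, with both
one-key and two-key access served by the indexes.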
advice appreciated...
sincerely,
/iaw