Harsh Jha said:
I have a huge CSV file and I want to read stuff from it again and again. Is it
useful to pickle it, keep the pickle, and then unpickle it whenever I need to
use that data? Is it faster than accessing the file simply by opening it again
and again? Please explain why.
Thank you.
It can be. I did a project a bunch of years ago which involved reading
(and parsing) SNMP MIBs before you could do any work. Startup took
something like 10-20 seconds. If I pre-parsed the MIBs and wrote out
the data structures as pickles, I could cut startup time to a couple of
seconds.
But that's because the parsing I was doing was pretty complicated.
Parsing a CSV file is much easier, so I wouldn't expect you to see much
improvement from reading a pickle file vs. reading the original CSV.
The bottom line is, you should try it. Pickling a data structure is
about one line of code (not counting the 'import cPickle'). Try it and
see what happens. Time how long it takes to read the original file, and
how long it takes to read the pickle. Let us know your results.
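
Here is a rough sketch of that experiment, not a definitive benchmark. It
assumes Python 3, where the plain `pickle` module already uses the C
implementation when it is available, so there is no separate `cPickle`
import. The helper names (`read_csv_rows`, `write_pickle`, `read_pickle`,
`timed`, `compare`) and the generated sample file are made up for
illustration; point `compare()` at your real CSV instead.

```python
import csv
import os
import pickle
import tempfile
import time


def read_csv_rows(path):
    """Parse the CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def write_pickle(obj, path):
    """Pickling really is about one line: the pickle.dump() call."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_pickle(path):
    """Load the previously pickled object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare(csv_path, pickle_path):
    """Time one CSV parse and one unpickle of the same data."""
    rows, csv_seconds = timed(read_csv_rows, csv_path)
    write_pickle(rows, pickle_path)
    unpickled, pickle_seconds = timed(read_pickle, pickle_path)
    assert unpickled == rows  # sanity check: both paths give the same data
    return csv_seconds, pickle_seconds


if __name__ == "__main__":
    # Hypothetical sample data so the script runs on its own;
    # replace csv_path with the path to your real file.
    tmpdir = tempfile.mkdtemp()
    csv_path = os.path.join(tmpdir, "sample.csv")
    pickle_path = os.path.join(tmpdir, "sample.pickle")
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows([i, i * 2, "row %d" % i] for i in range(100000))

    csv_seconds, pickle_seconds = compare(csv_path, pickle_path)
    print("CSV parse: %.3f s" % csv_seconds)
    print("Unpickle:  %.3f s" % pickle_seconds)
```

A single run like this is noisy (disk caching alone can swing the numbers),
so repeat it a few times before drawing conclusions. Also note that this
pickles plain lists of strings; if you convert columns to numbers or build
other structures after parsing, pickle the converted result, since that is
the work you would be saving.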
Also, let us know what "huge" means. 1000 rows? A million? 100
million?