Making a file-like object for manipulating a large file

Sean Davis

This should be a relatively simple problem, but I haven't quite got
the idea of how to go about it. I have a VERY large file that I would
like to load a line at a time, do some manipulations on, and then
make available as a file-like object for use as input to a
database module (psycopg2) that wants a file-like object (with read
and readlines methods). I could write the manipulated file out to
disk and then read it back in, but that seems wasteful. So it seems
like I need a buffer, a way to fill the buffer, and a way to have read
and readlines use the buffer. What I can't do is load the ENTIRE
file into a StringIO object, as the file is much too large. Any
suggestions?

Thanks,
Sean
 
Steve Holden

The general approach would be (something like the following untested code):

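    def filter_lines(f):
        # Yield only the lines that pass the caller-supplied
        # to_be_included() test; nothing else is kept in memory.
        for line in f:
            if to_be_included(line):
                yield line

    fil = open("somefile.big.txt", "r")

    filegen = filter_lines(fil)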

You can then iterate over the filegen generator, or write your own class
that makes it file-like. At least the generator manages to throw away
the unwanted content without buffering the whole file in memory.
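
For the "write your own class" part, here is a minimal sketch of such a wrapper. The class name LineIteratorFile and the sample filter are made up for illustration, and it assumes the generator yields text lines that each end in a newline. It provides read() and readline(), which as far as I know is what psycopg2's copy_from asks of its file argument, and it never holds more than the current read request plus a partial line in memory.

```python
class LineIteratorFile:
    """Read-only, non-seekable file-like wrapper around an iterator of text lines.

    Hypothetical helper for illustration; not part of psycopg2 or the stdlib.
    """

    def __init__(self, lines):
        self._lines = iter(lines)
        self._buf = ""  # text pulled from the iterator but not yet handed out

    def read(self, size=-1):
        if size is None or size < 0:
            # Unbounded read: drains the iterator, so avoid it on huge inputs.
            data = self._buf + "".join(self._lines)
            self._buf = ""
            return data
        # Pull lines until the buffer can satisfy the request or input ends.
        while len(self._buf) < size:
            try:
                self._buf += next(self._lines)
            except StopIteration:
                break
        data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self, size=-1):
        # Pull more text until the buffer holds a newline or input ends.
        while "\n" not in self._buf:
            try:
                self._buf += next(self._lines)
            except StopIteration:
                break
        end = self._buf.find("\n") + 1 or len(self._buf)
        if size is not None and size >= 0:
            end = min(end, size)
        line, self._buf = self._buf[:end], self._buf[end:]
        return line


def filter_lines(f):
    # Stand-in filter for the demo: drop comment lines.
    for line in f:
        if not line.startswith("#"):
            yield line


source = ["# header\n", "1\tfoo\n", "2\tbar\n"]  # stands in for an open file
wrapped = LineIteratorFile(filter_lines(source))
print(repr(wrapped.readline()))
print(repr(wrapped.read(3)))
```

With a real connection, the wrapped object would be passed where psycopg2 expects the file, for example cursor.copy_from(wrapped, "some_table"), where some_table is a placeholder name. Calling read() with no size still materialises everything that is left, so this only helps if the consumer reads in bounded chunks or line by line.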

regards
Steve
 
Lawrence D'Oliveiro

Sean Davis said:
I have a VERY large file that I would
like to load a line at a time, do some manipulations on, and then
make available as a file-like object for use as input to a
database module (psycopg2) that wants a file-like object (with read
and readlines methods). I could write the manipulated file out to
disk and then read it back in, but that seems wasteful.

If your consumer doesn't need to seek, how about having it read from a pipe?
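
A rough sketch of the pipe idea, assuming a POSIX-style os.pipe() and a background thread doing the manipulation. The names open_transformed and transform are invented for the example, and upper-casing stands in for whatever per-line processing is really needed. The reader gets an ordinary file object with read() and readline(), and the pipe's own buffer limits how far the writer can run ahead.

```python
import os
import threading


def transform(line):
    # Placeholder manipulation; substitute the real per-line processing.
    return line.upper()


def open_transformed(path):
    """Return a readable text file object fed by a background writer thread."""
    read_fd, write_fd = os.pipe()

    def pump():
        # The sink is opened first so the write end is always closed, even if
        # opening the source fails; closing it is what signals EOF to the reader.
        with os.fdopen(write_fd, "w") as sink, open(path, "r") as src:
            for line in src:
                sink.write(transform(line))

    threading.Thread(target=pump, daemon=True).start()
    return os.fdopen(read_fd, "r")
```

The returned object can be handed to the consumer like any other open file, but it cannot seek, and if the consumer closes it early the writer thread will stop with a broken-pipe error.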
 
