Making a file-like object for manipulating a large file

Discussion in 'Python' started by Sean Davis, Aug 24, 2007.

  1. Sean Davis

    This should be a relatively simple problem, but I haven't quite got
    the idea of how to go about it. I have a VERY large file that I would
    like to load a line at a time, do some manipulations on it, and then
    make it available as a file-like object for use as input to a
    database module (psycopg2) that wants a file-like object (with read
    and readlines methods). I could write the manipulated file out to
    disk and then read it back in, but that seems wasteful. So, it seems
    like I need a buffer, a way to fill the buffer and a way to have read
    and readlines use the buffer. What I can't do is to load the ENTIRE
    file into a StringIO object, as the file is much too large. Any
    suggestions?

    Thanks,
    Sean
     
    Sean Davis, Aug 24, 2007
    #1

  2. Steve Holden

    Sean Davis wrote:
    > This should be a relatively simple problem, but I haven't quite got
    > the idea of how to go about it. I have a VERY large file that I would
    > like to load a line at a time, do some manipulations on it, and then
    > make it available as a file-like object for use as input to a
    > database module (psycopg2) that wants a file-like object (with read
    > and readlines methods). I could write the manipulated file out to
    > disk and then read it back in, but that seems wasteful. So, it seems
    > like I need a buffer, a way to fill the buffer and a way to have read
    > and readlines use the buffer. What I can't do is to load the ENTIRE
    > file into a StringIO object, as the file is much too large. Any
    > suggestions?
    >

    The general approach would be (something like the following untested code):

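    def to_be_included(line):
        # Placeholder predicate: replace with your own test.
        return True

    def filter_lines(f):
        # Yield matching lines one at a time; the file is never read whole.
        for line in f:
            if to_be_included(line):
                yield line

    fil = open("somefile.big.txt", "r")
    filegen = filter_lines(fil)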

    You can then iterate over the filegen generator, or write your own class
    that makes it file-like. At least the generator manages to throw away
    the unwanted content without buffering the whole file in memory.
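    The "write your own class" route can be sketched with the standard io
    module instead of by hand: a minimal io.RawIOBase subclass only has to
    implement readinto(), and wrapping it in io.BufferedReader (plus
    io.TextIOWrapper for text) supplies read(), readline() and readlines().
    This is a modern Python 3 sketch, not code from the thread; IterStream,
    file_like and manipulate are hypothetical names, and upper-casing each
    line merely stands in for the real manipulation. Only the current chunk
    and the wrappers' small internal buffers are held in memory.

```python
import io


class IterStream(io.RawIOBase):
    """Read-only raw stream fed lazily from an iterator of bytes chunks.

    Hypothetical helper: chunks are pulled from the iterator on demand.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""  # unread remainder of the current chunk

    def readable(self):
        return True

    def readinto(self, buf):
        # Pull the next non-empty chunk only when the current one is used up.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # end of stream
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


def manipulate(lines):
    # Stand-in for the real per-line manipulation (assumption).
    for line in lines:
        yield line.upper()


def file_like(lines, encoding="utf-8"):
    """Wrap an iterator of text lines as a read-only text file object."""
    raw = IterStream(line.encode(encoding) for line in lines)
    # BufferedReader adds read()/readline()/readlines();
    # TextIOWrapper decodes the bytes back to str.
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)


if __name__ == "__main__":
    f = file_like(manipulate(["a\tb\n", "c\td\n"]))
    print(repr(f.readline()))  # 'A\tB\n'
    print(repr(f.read()))      # 'C\tD\n'
```

    psycopg2's cursor.copy_from() documents that it needs a file-like object
    with read() and readline(), so an object like this should in principle be
    usable there; treat that pairing as an assumption to verify. Note that
    calling readlines() with no size hint collects every remaining line into
    a list, which would bring the whole file back into memory.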

    regards
    Steve
     
    Steve Holden, Aug 24, 2007
    #2

  3. Lawrence D'Oliveiro

    Sean Davis wrote:

    > I have a VERY large file that I would
    > like to load a line at a time, do some manipulations on it, and then
    > make it available as a file-like object for use as input to a
    > database module (psycopg2) that wants a file-like object (with read
    > and readlines methods). I could write the manipulated file out to
    > disk and then read it back in, but that seems wasteful.


    If your consumer doesn't need to seek, how about having it read from a pipe?
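    A pipe keeps memory bounded because a blocking writer stalls whenever the
    OS pipe buffer is full, so the manipulated data never piles up. A minimal
    sketch, assuming the producer runs as a thread in the same process (a
    separate process would work the same way); start_pipe_producer and
    manipulate are hypothetical names, and replacing commas with tabs is only
    a placeholder manipulation.

```python
import os
import threading


def manipulate(lines):
    # Stand-in for the real per-line manipulation (assumption).
    for line in lines:
        yield line.replace(",", "\t")


def start_pipe_producer(lines, encoding="utf-8"):
    """Return (reader, thread): a readable file fed through an OS pipe.

    Hypothetical helper; the consumer must read sequentially, since
    pipes cannot seek.
    """
    read_fd, write_fd = os.pipe()

    def produce():
        # Closing the write end (end of the with-block) delivers EOF.
        with os.fdopen(write_fd, "w", encoding=encoding) as out:
            for line in lines:
                out.write(line)  # blocks while the pipe buffer is full

    worker = threading.Thread(target=produce, daemon=True)
    worker.start()
    return os.fdopen(read_fd, "r", encoding=encoding), worker


if __name__ == "__main__":
    reader, worker = start_pipe_producer(manipulate(["1,2\n", "3,4\n"]))
    with reader:
        print(repr(reader.readline()))  # '1\t2\n'
        print(repr(reader.read()))      # '3\t4\n'
    worker.join()
```

    The reader is an ordinary file object, so read() and readline() work but
    seek() does not, which is exactly the restriction mentioned above. In this
    sketch an exception inside the producer thread is not passed on to the
    reader; the reader simply sees end-of-file early.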
     
    Lawrence D'Oliveiro, Aug 26, 2007
    #3

Similar Threads
  1. Manipulating large arrays very fast
     E. Naubauer, Jan 24, 2006, in forum: Java
     Replies: 8, Views: 595, last post: E. Naubauer, Jan 25, 2006
  2. Manipulating large blobs in Python
     Tim Stone, Apr 22, 2005, in forum: Python
     Replies: 0, Views: 312, last post: Tim Stone, Apr 22, 2005
  3. Manipulating with large numbers in C
     Halid Umar A M, Apr 25, 2006, in forum: C Programming
     Replies: 29, Views: 3,684, last post: John Bode, May 2, 2006
  4. Patrick Kowalzick
     Replies: 5, Views: 475, last post: Patrick Kowalzick, Mar 14, 2006
  5. Aaron Watters
     Replies: 2, Views: 292, last post: Istvan Albert, Nov 16, 2007