Break large file down into multiple files

Discussion in 'Python' started by brianrpsgt1, Feb 13, 2009.

  1. brianrpsgt1

    brianrpsgt1 Guest

    New to Python. I have a large file that I need to break up into
    multiple smaller files. I need to break the large file into sections
    of 65535 lines each and then write those sections to separate
    files. I am familiar with opening and writing files; however, I am
    struggling with creating the sections and writing the different
    sections to their own files.

    Thanks

    B
    brianrpsgt1, Feb 13, 2009
    #1

  2. On Fri, 13 Feb 2009 04:44:54 -0200, brianrpsgt1 <>
    wrote:

    > New to python.... I have a large file that I need to break up into
    > multiple smaller files. I need to break the large file into sections
    > where there are 65535 lines and then write those sections to seperate
    > files. I am familiar with opening and writing files, however, I am
    > struggling with creating the sections and writing the different
    > sections to their own files.


    This function copies at most n lines from fin to fout:

    def copylines(fin, fout, n):
        for i, line in enumerate(fin):
            fout.write(line)
            if i+1>=n: break

    Now you have to open the source file, create new files as needed and
    repeatedly call the above function until the end of the source file. You'll
    have to enhance it a bit, to know whether there are remaining lines or not.
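
One property that makes the repeated calls work is that a file object remembers its position, so a second call picks up where the first one stopped. The sketch below is only an illustration under assumptions: it targets Python 3, swaps the enumerate/break loop for itertools.islice, adds a return value, and uses in-memory io.StringIO objects as stand-ins for real files.

```python
import io
from itertools import islice


def copylines(fin, fout, n):
    """Copy at most n lines from fin to fout and return how many were copied."""
    count = 0
    for line in islice(fin, n):  # islice stops after n items or at end of input
        fout.write(line)
        count += 1
    return count


# Stand-ins for real files (assumption: any iterable of lines behaves the same).
source = io.StringIO("a\nb\nc\nd\ne\n")
first, second = io.StringIO(), io.StringIO()

copied_first = copylines(source, first, 3)    # consumes lines a, b, c
copied_second = copylines(source, second, 3)  # continues with d, e, then hits the end
```

A count smaller than n tells the caller that the source ran out during that call.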

    --
    Gabriel Genellina
    Gabriel Genellina, Feb 13, 2009
    #2

  3. brianrpsgt1

    brianrpsgt1 Guest

    On Feb 12, 11:02 pm, "Gabriel Genellina" <>
    wrote:
    > On Fri, 13 Feb 2009 04:44:54 -0200, brianrpsgt1 <>
    > wrote:
    >
    > > New to python.... I have a large file that I need to break up into
    > > multiple smaller files. I need to break the large file into sections
    > > where there are 65535 lines and then write those sections to separate
    > > files. I am familiar with opening and writing files, however, I am
    > > struggling with creating the sections and writing the different
    > > sections to their own files.

    >
    > This function copies at most n lines from fin to fout:
    >
    > def copylines(fin, fout, n):
    >    for i, line in enumerate(fin):
    >      fout.write(line)
    >      if i+1>=n: break
    >
    > Now you have to open the source file, create new files as needed and
    > repeatedly call the above function until the end of source file. You'll
    > have to enhance it a bit, to know whether there are remaining lines or not.
    >
    > --
    > Gabriel Genellina


    Gabriel ::

    Thanks for the direction. Do I simply define fin, fout and n as
    variables above the def copylines(fin, fout, n): line?

    Would it look like this?

    fin = open('C:\Path\file')
    fout = 'C:\newfile.csv')
    n = 65535

    def copylines(fin, fout, n):
        for i, line in enumerate(fin):
            fout.write(line)
            if i+1>=n: break

    Thanks

    B
    brianrpsgt1, Feb 13, 2009
    #3
  4. redbaron

    redbaron Guest

    > New to python.... I have a large file that I need to break up into
    > multiple smaller files. I need to break the large file into sections
    > where there are 65535 lines and then write those sections to seperate
    > files.


    If your lines are variable-length, then look at itertools recipes.

    from itertools import izip_longest  # Python 2 name; Python 3 calls it zip_longest

    def grouper(n, iterable, fillvalue=None):
        "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
        args = [iter(iterable)] * n
        return izip_longest(fillvalue=fillvalue, *args)

    with open("/file","r") as f:
        for lines in grouper(65535,f,""):
            # each line already ends with "\n", so join on the empty string
            data_to_write = ''.join(lines)
            ...
            # write data_to_write where you need it here
            ...
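
For anyone on Python 3, the same recipe might look like the sketch below. It is only an illustration: split_with_grouper, its parameters, and the output name pattern are made-up names, and each group of n lines is held in memory before it is written.

```python
from itertools import zip_longest  # Python 3 name for izip_longest


def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n  # n references to the SAME iterator
    return zip_longest(*args, fillvalue=fillvalue)


def split_with_grouper(source_path, n=65535, pattern="chunk_%04d.txt"):
    """Write each group of n lines to its own file (hypothetical helper).

    pattern is an assumed naming scheme; it receives the 1-based chunk number.
    """
    created = []
    with open(source_path) as f:
        for part, lines in enumerate(grouper(n, f, ""), start=1):
            out_path = pattern % part
            with open(out_path, "w") as fout:
                # Lines keep their own "\n"; the "" padding writes nothing.
                fout.writelines(lines)
            created.append(out_path)
    return created
```

Padding the last group with empty strings means the final file simply ends early instead of needing special handling.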
    redbaron, Feb 13, 2009
    #4
  5. On Fri, 13 Feb 2009 05:43:02 -0200, brianrpsgt1 <>
    wrote:

    > On Feb 12, 11:02 pm, "Gabriel Genellina" <>
    > wrote:
    >> On Fri, 13 Feb 2009 04:44:54 -0200, brianrpsgt1 <>
    >> wrote:
    >>
    >> > New to python.... I have a large file that I need to break up into
    >> > multiple smaller files. I need to break the large file into sections
    >> > where there are 65535 lines and then write those sections to separate
    >> > files. I am familiar with opening and writing files, however, I am
    >> > struggling with creating the sections and writing the different
    >> > sections to their own files.

    >>
    >> This function copies at most n lines from fin to fout:
    >>
    >> def copylines(fin, fout, n):
    >>    for i, line in enumerate(fin):
    >>      fout.write(line)
    >>      if i+1>=n: break
    >>
    >> Now you have to open the source file, create new files as needed and
    >> repeatedly call the above function until the end of source file. You'll
    >> have to enhance it a bit, to know whether there are remaining lines or not.
    >>
    >> --
    >> Gabriel Genellina

    >
    > Gabriel ::
    >
    > Thanks for the direction. Do I simply define fin, fout and n as
    > variables above the def copylines(fin, fout, n): line?
    >
    > Would it look like this?
    >
    > fin = open('C:\Path\file')
    > fout = 'C:\newfile.csv')
    > n = 65535


    Warning: the backslashes in those paths start escape sequences (the \n in
    'C:\newfile.csv' is a newline). See this FAQ entry:
    http://www.python.org/doc/faq/general/#why-can-t-raw-strings-r-strings-end-with-a-backslash
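
To make the warning concrete, here is a small sketch of how the quoted path is actually read and three common ways to spell it safely. The variable names are illustrative only; the path is the one from the question.

```python
# In an ordinary string literal, \n is a newline (and \f, as in '\file', is a form feed).
plain = 'C:\newfile.csv'

# Three spellings that keep the backslash, or avoid needing it:
raw = r'C:\newfile.csv'      # raw string: backslashes are kept literally
doubled = 'C:\\newfile.csv'  # escaped backslash
forward = 'C:/newfile.csv'   # forward slashes are accepted by Windows file APIs

# A raw string cannot end in a single backslash (the point of the FAQ entry),
# so a trailing separator has to be added another way:
folder = r'C:\Path' + '\\'
```

Here plain ends up one character shorter than raw, because the two characters \n collapse into a single newline.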
    >
    > def copylines(fin, fout, n):
    >     for i, line in enumerate(fin):
    >         fout.write(line)
    >         if i+1>=n: break


    Almost. You have to *call* the copylines function, not just define it.
    After calling it with: copylines(fin, fout, 65535), you'll have the
    *first* chunk of lines copied. So you'll need to create a second file,
    call the copylines function again, create a third file, call... You'll
    need some kind of loop, and a way to detect when to stop and break out of
    it.

    The copylines function already knows what happened (whether there are more
    lines to copy or not) so you should enhance it to return that information
    to the caller.
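
One possible shape for the finished program is sketched below, in Python 3. It is not the only way to do it: split_file, the part_%04d.txt naming pattern, and the choice to return a line count are all assumptions made for illustration.

```python
import os


def copylines(fin, fout, n):
    """Copy at most n lines from fin to fout; return how many were copied."""
    copied = 0
    for i, line in enumerate(fin):
        fout.write(line)
        copied = i + 1
        if copied >= n:
            break
    return copied


def split_file(source_path, n=65535, pattern="part_%04d.txt"):
    """Split source_path into files of at most n lines (hypothetical helper).

    pattern is an assumed naming scheme; it receives the 1-based part number.
    """
    created = []
    with open(source_path) as fin:
        part = 1
        while True:
            out_path = pattern % part
            with open(out_path, "w") as fout:
                copied = copylines(fin, fout, n)
            if copied == 0:
                # The source ended exactly on a chunk boundary: drop the empty file.
                os.remove(out_path)
                break
            created.append(out_path)
            if copied < n:
                break  # a short chunk means the source is exhausted
            part += 1
    return created


# Usage (file name is hypothetical):
# parts = split_file("big.csv")
```

The returned count is what lets the loop stop: a full chunk means there may be more lines, a short or empty one means the source is finished.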

    It isn't that hard: after working through the tutorial (linked from
    http://wiki.python.org/moin/BeginnersGuide ) you'll know enough Python to
    finish this program.

    If you have some previous programming experience, Dive into Python (linked
    from the Beginners Guide above) is a good online book.

    Feel free to come back when you're stuck with something.

    --
    Gabriel Genellina
    Gabriel Genellina, Feb 13, 2009
    #5
  6. Chris

    Chris Guest

    On Feb 13, 10:02 am, redbaron <> wrote:
    > > New to python.... I have a large file that I need to break up into
    > > multiple smaller files. I need to break the large file into sections
    > > where there are 65535 lines and then write those sections to seperate
    > > files.

    >
    > If your lines are variable-length, then look at itertools recipes.
    >
    > from itertools import izip_longest
    >
    > def grouper(n, iterable, fillvalue=None):
    >     "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    >     args = [iter(iterable)] * n
    >     return izip_longest(fillvalue=fillvalue, *args)
    >
    > with open("/file","r") as f:
    >     for lines in grouper(65535,f,""):
    >         data_to_write = '\n'.join(lines).rstrip("\n")
    >         ...
    >         <write data where you need it here>
    >         ...


    I really would not recommend joining a large number of lines; that will
    take some time.

    fIn = open(input_filename, 'rb')
    chunk_size = 65535

    for i,line in enumerate(fIn):
        if not i: # First line in the file, create a file to start writing to
            filenum = '%04d'%((i%chunk_size)+1)
            fOut = open('%s.txt'%filenum, 'wb')
        if i and not i % chunk_size: # Once at the chunk_size close the old file object and create a new one
            fOut.close()
            filenum = '%04d'%((i%chunk_size)+1)
            fOut = open('%s.txt'%filenum, 'wb')
        if not i % 1000:
            fOut.flush()
        fOut.write(line)

    fOut.close()
    fIn.close()
    Chris, Feb 13, 2009
    #6
  7. Chris

    Chris Guest

    On Feb 13, 1:19 pm, Chris <> wrote:
    > On Feb 13, 10:02 am, redbaron <> wrote:
    >
    >
    >
    > > > New to python.... I have a large file that I need to break up into
    > > > multiple smaller files. I need to break the large file into sections
    > > > where there are 65535 lines and then write those sections to seperate
    > > > files.

    >
    > > If your lines are variable-length, then look at itertools recipes.

    >
    > > from itertools import izip_longest

    >
    > > def grouper(n, iterable, fillvalue=None):
    > >     "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    > >     args = [iter(iterable)] * n
    > >     return izip_longest(fillvalue=fillvalue, *args)

    >
    > > with open("/file","r") as f:
    > >     for lines in grouper(65535,f,""):
    > >         data_to_write = '\n'.join(lines).rstrip("\n")
    > >         ...
    > >         <write data where you need it here>
    > >         ...

    >
    > I really would not recommend joining a large number of lines; that will
    > take some time.
    >
    > fIn = open(input_filename, 'rb')
    > chunk_size = 65535
    >
    > for i,line in enumerate(fIn):
    >     if not i:   # First Line in the File, create a file to start writing to
    >         filenum = '%04d'%(i%chunk_size)+1
    >         fOut = open('%s.txt'%filenum, 'wb')
    >     if i and not i % chunk_size:   # Once at the chunk_size close the old file object and create a new one
    >         fOut.close()
    >         filenum = '%04d'%(i%chunk_size)+1
    >         fOut = open('%s.txt'%filenum, 'wb')
    >     if not i % 1000:
    >         fOut.flush()
    >     fOut.write(line)
    >
    > fOut.close()
    > fIn.close()


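    Whoops, day-dreaming mistake. Use "filenum = '%04d' % (i // chunk_size + 1)"
    and not i % chunk_size. (The + 1 has to go inside the parentheses, otherwise
    Python tries to add an integer to a string.)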
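
Putting the correction together, a Python 3 sketch of the same line-counting idea could look like this. The function name, its parameters, and the %04d.txt pattern default are assumptions for illustration; the periodic flush from the original is left out for brevity.

```python
def split_by_line_number(input_filename, chunk_size=65535, pattern="%04d.txt"):
    """Start a new output file every chunk_size lines (hypothetical helper)."""
    created = []
    f_out = None
    try:
        with open(input_filename, "rb") as f_in:
            for i, line in enumerate(f_in):
                if i % chunk_size == 0:
                    # First line of a chunk: close the previous file, open the next.
                    if f_out is not None:
                        f_out.close()
                    # Integer division gives the 0-based chunk index; + 1 makes it 1-based.
                    name = pattern % (i // chunk_size + 1)
                    f_out = open(name, "wb")
                    created.append(name)
                f_out.write(line)
    finally:
        # Close the last output file even if reading or writing failed.
        if f_out is not None:
            f_out.close()
    return created
```

With chunk_size lines per file, line index i lands in file number i // chunk_size + 1, so indexes 0 to 65534 go to 0001.txt and index 65535 starts 0002.txt.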
    Chris, Feb 13, 2009
    #7
  8. Tim Chase

    Tim Chase Guest

    > New to python.... I have a large file that I need to break up
    > into multiple smaller files. I need to break the large file
    > into sections where there are 65535 lines and then write those
    > sections to seperate files. I am familiar with opening and
    > writing files, however, I am struggling with creating the
    > sections and writing the different sections to their own
    > files.


    While this thread has offered many nice Python solutions, the
    "split" command is pretty standard on most Linux boxes:

    bash$ split -l 65535 infile.txt

    which will do what you describe. You can read the man page for
    more details.

    So if you just want a fast route to a goal rather than going
    through the learning process (I'm all for learning how to do it
    too), this may be a quick answer.
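
If the split has to be triggered from a Python script anyway, the command can be wrapped with subprocess. This is only a sketch: split_with_coreutils and its parameters are made-up names, and it assumes a split binary is available on PATH.

```python
import shutil
import subprocess


def split_with_coreutils(source_path, lines_per_file=65535, prefix="x"):
    """Run `split -l N source prefix` (hypothetical wrapper around the command)."""
    if shutil.which("split") is None:
        raise RuntimeError("the split command was not found on PATH")
    # check=True raises CalledProcessError if split exits with an error.
    subprocess.run(
        ["split", "-l", str(lines_per_file), source_path, prefix],
        check=True,
    )


# Usage (file name as in the shell example):
# split_with_coreutils("infile.txt")
```

By default split names its output files by appending aa, ab, ac and so on to the prefix, so the call above would produce xaa, xab, and so on in the current directory.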

    -tkc
    Tim Chase, Feb 13, 2009
    #8