Split single file into multiple files based on patterns

Discussion in 'Python' started by satyam, Oct 24, 2012.

  1. satyam

    satyam Guest

    I have a text file like this

    A1980JE39300007 2732 4195 12.527000
    A1980JE39300007 3465 9720 22.000000
    A1980JE39300007 1853 3278 12.500000
    A1980JE39300007 2732 2732 187.500000
    A1980JE39300007 19 4688 3.619000
    A1980JE39300007 2995 9720 6.667000
    A1980JE39300007 1603 9720 30.000000
    A1980JE39300007 234 4195 42.416000
    A1980JE39300007 2732 9720 18.000000
    A1980KK18700010 130 303 4.985000
    A1980KK18700010 7 4915 0.435000
    A1980KK18700010 25 1620 1.722000
    A1980KK18700010 25 186 0.654000
    A1980KK18700010 50 130 3.199000
    A1980KK18700010 186 3366 4.780000
    A1980KK18700010 30 186 1.285000
    A1980KK18700010 30 185 4.395000
    A1980KK18700010 185 186 9.000000
    A1980KK18700010 25 30 3.493000

    I want to split the file and get multiple files like A1980JE39300007.txt and A1980KK18700010.txt, where each file will contain column2, 3 and 4.
    Thanks
    Satyam
     
    satyam, Oct 24, 2012
    #1

  2. On Tue, Oct 23, 2012 at 9:01 PM, satyam <> wrote:
    > I have a text file like this
    >
    > A1980JE39300007 2732 4195 12.527000
    > A1980JE39300007 3465 9720 22.000000
    > A1980JE39300007 1853 3278 12.500000
    > A1980JE39300007 2732 2732 187.500000
    > A1980JE39300007 19 4688 3.619000
    > A1980KK18700010 30 186 1.285000
    > A1980KK18700010 30 185 4.395000
    > A1980KK18700010 185 186 9.000000
    > A1980KK18700010 25 30 3.493000
    >
    > I want to split the file and get multiple files like A1980JE39300007.txt and A1980KK18700010.txt, where each file will contain column2, 3 and 4.


    Unless your source file is very large this should be sufficient:

    $ cat source
    A1980JE39300007 2732 4195 12.527000
    A1980JE39300007 3465 9720 22.000000
    A1980JE39300007 1853 3278 12.500000
    A1980JE39300007 2732 2732 187.500000
    A1980JE39300007 19 4688 3.619000
    A1980JE39300007 2995 9720 6.667000
    A1980JE39300007 1603 9720 30.000000
    A1980JE39300007 234 4195 42.416000
    A1980JE39300007 2732 9720 18.000000
    A1980KK18700010 130 303 4.985000
    A1980KK18700010 7 4915 0.435000
    A1980KK18700010 25 1620 1.722000
    A1980KK18700010 25 186 0.654000
    A1980KK18700010 50 130 3.199000
    A1980KK18700010 186 3366 4.780000
    A1980KK18700010 30 186 1.285000
    A1980KK18700010 30 185 4.395000
    A1980KK18700010 185 186 9.000000
    A1980KK18700010 25 30 3.493000

    $ python3
    Python 3.2.3 (default, Sep 10 2012, 18:14:40)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> for line in open("source"):
    ...     file_name, remainder = line.strip().split(None, 1)
    ...     with open(file_name + ".txt", "a") as writer:
    ...         print(remainder, file=writer)
    ...
    >>>


    $ ls *txt
    A1980JE39300007.txt A1980KK18700010.txt

    $ cat A1980JE39300007.txt
    2732 4195 12.527000
    3465 9720 22.000000
    1853 3278 12.500000
    2732 2732 187.500000
    19 4688 3.619000
    2995 9720 6.667000
    1603 9720 30.000000
    234 4195 42.416000
    2732 9720 18.000000
     
    Jason Friedman, Oct 24, 2012
    #2

  3. David Hutto

    David Hutto Guest

    On Tue, Oct 23, 2012 at 11:01 PM, satyam <> wrote:
    > I have a text file like this
    >
    > A1980JE39300007 2732 4195 12.527000
    > A1980JE39300007 3465 9720 22.000000
    > A1980JE39300007 1853 3278 12.500000
    > A1980JE39300007 2732 2732 187.500000
    > A1980JE39300007 19 4688 3.619000
    > A1980JE39300007 2995 9720 6.667000
    > A1980JE39300007 1603 9720 30.000000
    > A1980JE39300007 234 4195 42.416000
    > A1980JE39300007 2732 9720 18.000000
    > A1980KK18700010 130 303 4.985000
    > A1980KK18700010 7 4915 0.435000
    > A1980KK18700010 25 1620 1.722000
    > A1980KK18700010 25 186 0.654000
    > A1980KK18700010 50 130 3.199000
    > A1980KK18700010 186 3366 4.780000
    > A1980KK18700010 30 186 1.285000
    > A1980KK18700010 30 185 4.395000
    > A1980KK18700010 185 186 9.000000
    > A1980KK18700010 25 30 3.493000
    >
    > I want to split the file and get multiple files like A1980JE39300007.txt and A1980KK18700010.txt, where each file will contain column2, 3 and 4.
    > Thanks
    > Satyam





    # Parse through the lines.
    turn_text_to_txt = ['A1980JE39300007 2732 4195 12.527000',
                        'A1980JE39300007 3465 9720 22.000000',
                        'A1980JE39300007 1853 3278 12.500000',
                        'A1980JE39300007 2732 2732 187.500000',
                        'A1980JE39300007 19 4688 3.619000',
                        'A1980KK18700010 30 186 1.285000',
                        'A1980KK18700010 30 185 4.395000',
                        'A1980KK18700010 185 186 9.000000',
                        'A1980KK18700010 25 30 3.493000']

    # Then split each line and open a file for writing to create the file.

    # Then start a count to add an extra number, because some of the files
    # you're opening have the same name, which will cause Python to
    # overwrite the last file with that name.

    # So I added an extra integer count after an underscore to keep all
    # files, even if they have the same base number.

    count = 0

    for file_data in turn_text_to_txt:

        # Open the file in 'w' mode so it creates the file and adds in the
        # appropriate data, including the extra count integer just in case
        # there are files with the same name.
        f = open('/home/david/files/%s_%s.txt' % (file_data.split(' ')[0], count), 'w')

        # Write the data to the file. This is in list format; I could go
        # further, but need a little time for a few other things.
        f.write(str(file_data.split(' ')[1:]))

        # Close the file.
        f.close()

        # Increment the count for the next iteration. Again, this is just
        # in case the files have the same name and need an additive.
        count += 1


    Full code from above, without comments:

    turn_text_to_txt = ['A1980JE39300007 2732 4195 12.527000',
                        'A1980JE39300007 3465 9720 22.000000',
                        'A1980JE39300007 1853 3278 12.500000',
                        'A1980JE39300007 2732 2732 187.500000',
                        'A1980JE39300007 19 4688 3.619000',
                        'A1980KK18700010 30 186 1.285000',
                        'A1980KK18700010 30 185 4.395000',
                        'A1980KK18700010 185 186 9.000000',
                        'A1980KK18700010 25 30 3.493000']

    count = 0

    for file_data in turn_text_to_txt:

        print('/home/david/files/%s.txt' % (file_data.split(' ')[0]))

        f = open('/home/david/files/%s_%s.txt' % (file_data.split(' ')[0], count), 'w')

        f.write(str(file_data.split(' ')[1:]))

        f.close()

        count += 1
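
On the "list format" remark above: a small sketch of the difference between writing str() of the split list and joining the columns back together. The sample line is taken from the data posted at the top of the thread; the variable names are only illustrative.

```python
# Sample line from the data posted at the top of the thread.
line = "A1980JE39300007 2732 4195 12.527000"

fields = line.split()

# str() on a list produces the list's repr, brackets and quotes included.
as_list_repr = str(fields[1:])

# " ".join() rebuilds the plain space-separated columns instead.
as_columns = " ".join(fields[1:])

print(as_list_repr)
print(as_columns)
```

Writing as_columns plus a newline to the file would give the plain "column 2, 3 and 4" layout the original question asks for.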




    --
    Best Regards,
    David Hutto
    CEO: http://www.hitwebdevelopment.com
     
    David Hutto, Oct 24, 2012
    #3
  4. On 2012-10-23, at 10:24 PM, David Hutto <> wrote:

    > count = 0

    Don't use count.

    > for file_data in turn_text_to_txt:

    Use enumerate:

    for count, file_data in enumerate(turn_text_to_txt):

    > f = open('/home/david/files/%s_%s.txt' % (file_data.split(' ')[0], count), 'w')

    Use with:

    with open('file path', 'w') as f:
        f.write('data')

    Not only is it shorter, but it automatically closes the file once you've come out of the inner block, whether successfully or erroneously.
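
Putting the two suggestions together might look roughly like the sketch below. The function name write_numbered_files and the out_dir parameter are made up for illustration (the quoted code hard-codes /home/david/files), and the columns are joined with spaces rather than written as a list repr.

```python
import os


def write_numbered_files(lines, out_dir):
    """Write one file per input line, named <first column>_<index>.txt.

    Hypothetical helper: the name and the out_dir parameter are
    assumptions, not part of the code quoted above.
    """
    paths = []
    # enumerate() supplies the running index, so no manual counter is needed.
    for count, file_data in enumerate(lines):
        name, *columns = file_data.split()
        path = os.path.join(out_dir, "%s_%s.txt" % (name, count))
        # The with block closes the file even if write() raises.
        with open(path, "w") as f:
            f.write(" ".join(columns) + "\n")
        paths.append(path)
    return paths
```

Note that this still produces one file per line, as the quoted code does; it does not group lines that share the same first column.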


    Demian Brecht
    @demianbrecht
    http://demianbrecht.github.com
     
    Demian Brecht, Oct 24, 2012
    #4
  5. On Tue, 23 Oct 2012 21:43:21 -0600, Jason Friedman <>
    declaimed the following in gmane.comp.python.general:

    > $ python3
    > Python 3.2.3 (default, Sep 10 2012, 18:14:40)
    > [GCC 4.6.3] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> for line in open("source"):
    > ...     file_name, remainder = line.strip().split(None, 1)
    > ...     with open(file_name + ".txt", "a") as writer:
    > ...         print(remainder, file=writer)


    That's a lot of OS file open/closing operations...

    I'd be more likely to configure the code as a "standard" "report
    control break".

    control = None
    fin = open("source")
    for line in fin:
        newControl, data = line.split(None, 1)  # leave new-line for output
        if control != newControl:  # only open/close files on
                                   # change of control break
            if control:
                fout.close()
            fout = open(newControl + ".txt", "a")
            # I'd prefer using "w" IF the input is already sorted
            # that way one knows a new file is created on each run
            # instead of having to delete any existing files from
            # previous runs
            control = newControl
        fout.write(data)
    if control:
        fout.close()
    fin.close()
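
If the input is not guaranteed to be sorted, one alternative (a different technique from the control break above, and only a sketch) is to collect the rows per key in memory first and then write each file exactly once in "w" mode. It assumes the whole file fits in memory; the function name split_by_first_column and its parameters are hypothetical.

```python
import os
from collections import defaultdict


def split_by_first_column(source_path, out_dir):
    """Group lines by their first column, then write one file per key.

    Hypothetical helper; assumes the input fits in memory.
    """
    groups = defaultdict(list)
    with open(source_path) as fin:
        for line in fin:
            parts = line.split(None, 1)
            if len(parts) < 2:
                # Skip blank lines and lines with only one column.
                continue
            key, data = parts  # data keeps its trailing newline
            groups[key].append(data)

    # Each output file is opened once, in "w" mode, so reruns start fresh.
    for key, rows in groups.items():
        with open(os.path.join(out_dir, key + ".txt"), "w") as fout:
            fout.writelines(rows)
    return sorted(groups)
```

This trades memory for simplicity: there is one open/close per output file regardless of input order, and no leftover data from earlier runs.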

    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Oct 24, 2012
    #5
  6. satyam <> writes:

    > I have a text file like this
    >
    > A1980JE39300007 2732 4195 12.527000
    > A1980JE39300007 3465 9720 22.000000
    > A1980JE39300007 2732 9720 18.000000
    > A1980KK18700010 130 303 4.985000
    > A1980KK18700010 7 4915 0.435000

    [...]
    > I want to split the file and get multiple files like
    > A1980JE39300007.txt and A1980KK18700010.txt, where each file will
    > contain column2, 3 and 4.


    Sorry for being completely off-topic here, but awk has a very convenient
    feature to deal with this. Simply use:

    awk '{ print $2,$3,$4 > $1".txt"; }' /path/to/your/file
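
For comparison, a rough Python sketch of what that one-liner does: awk truncates each output file the first time it is opened with ">" and keeps it open for the rest of the run. The function name split_like_awk is made up for this example, and unlike awk it skips lines with fewer than four fields instead of printing empty columns.

```python
import os
from contextlib import ExitStack


def split_like_awk(source_path, out_dir="."):
    """Write columns 2-4 of each line to <column 1>.txt.

    Hypothetical helper mimicking the awk one-liner; every output
    file stays open until the input has been read completely.
    """
    writers = {}
    # ExitStack closes every file registered with it when the block ends.
    with ExitStack() as stack, open(source_path) as fin:
        for line in fin:
            fields = line.split()
            if len(fields) < 4:
                continue  # assumption: ignore short or blank lines
            key = fields[0]
            if key not in writers:
                path = os.path.join(out_dir, key + ".txt")
                # "w" truncates on first open, like awk's ">" redirection.
                writers[key] = stack.enter_context(open(path, "w"))
            print(*fields[1:4], file=writers[key])
    return sorted(writers)
```

With a very large number of distinct keys this could run into the operating system's limit on open files, which is worth keeping in mind for either version.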

    -- Alain.
     
    Alain Ketterlin, Oct 24, 2012
    #6
  7. On 24/10/2012 06:46, Alain Ketterlin wrote:
    > satyam <> writes:
    >
    >> I have a text file like this
    >>
    >> A1980JE39300007 2732 4195 12.527000
    >> A1980JE39300007 3465 9720 22.000000
    >> A1980JE39300007 2732 9720 18.000000
    >> A1980KK18700010 130 303 4.985000
    >> A1980KK18700010 7 4915 0.435000

    > [...]
    >> I want to split the file and get multiple files like
    >> A1980JE39300007.txt and A1980KK18700010.txt, where each file will
    >> contain column2, 3 and 4.

    >
    > Sorry for being completely off-topic here, but awk has a very convenient
    > feature to deal with this. Simply use:
    >
    > awk '{ print $2,$3,$4 > $1".txt"; }' /path/to/your/file
    >
    > -- Alain.
    >


    Although practicality beats purity :)

    --
    Cheers.

    Mark Lawrence.
     
    Mark Lawrence, Oct 24, 2012
    #7
  8. On Tue, 23 Oct 2012 20:01:03 -0700, satyam wrote:

    > I have a text file like this
    >
    > A1980JE39300007 2732 4195 12.527000

    [...]

    > I want to split the file and get multiple files like A1980JE39300007.txt
    > and A1980KK18700010.txt, where each file will contain column2, 3 and 4.


    Are you just excited and want to tell everyone, or do you actually have a
    question?

    Have you tried to write some code, or do you just expect others to do
    your work for you?

    If so, I see that your expectation was correct.



    --
    Steven
     
    Steven D'Aprano, Oct 24, 2012
    #8
  9. David Hutto

    David Hutto Guest

    On Wed, Oct 24, 2012 at 3:52 AM, Steven D'Aprano
    <> wrote:
    > On Tue, 23 Oct 2012 20:01:03 -0700, satyam wrote:
    >
    >> I have a text file like this
    >>
    >> A1980JE39300007 2732 4195 12.527000

    > [...]
    >
    >> I want to split the file and get multiple files like A1980JE39300007.txt
    >> and A1980KK18700010.txt, where each file will contain column2, 3 and 4.

    >
    > Are you just excited and want to tell everyone, or do you actually have a
    > question?
    >
    > Have you tried to write some code, or do you just expect others to do
    > your work for you?
    >
    > If so, I see that your expectation was correct.
    >
    >
    >
    > --
    > Steven


    Some learn better with a full example than with any small challenge
    that can be thrown in at certain times.

    I think it should be a little of both, especially if you (an
    algorithmist for the OP) only have enough time to throw out untested
    pseudo code.

    --
    Best Regards,
    David Hutto
    CEO: http://www.hitwebdevelopment.com
     
    David Hutto, Oct 24, 2012
    #9
  10. Peter Otten

    Peter Otten Guest

    satyam wrote:

    > I have a text file like this
    >
    > A1980JE39300007 2732 4195 12.527000
    > A1980JE39300007 3465 9720 22.000000
    > A1980JE39300007 1853 3278 12.500000
    > A1980JE39300007 2732 2732 187.500000
    > A1980JE39300007 19 4688 3.619000
    > A1980JE39300007 2995 9720 6.667000
    > A1980JE39300007 1603 9720 30.000000
    > A1980JE39300007 234 4195 42.416000
    > A1980JE39300007 2732 9720 18.000000
    > A1980KK18700010 130 303 4.985000
    > A1980KK18700010 7 4915 0.435000
    > A1980KK18700010 25 1620 1.722000
    > A1980KK18700010 25 186 0.654000
    > A1980KK18700010 50 130 3.199000
    > A1980KK18700010 186 3366 4.780000
    > A1980KK18700010 30 186 1.285000
    > A1980KK18700010 30 185 4.395000
    > A1980KK18700010 185 186 9.000000
    > A1980KK18700010 25 30 3.493000
    >
    > I want to split the file and get multiple files like A1980JE39300007.txt
    > and A1980KK18700010.txt, where each file will contain column2, 3 and 4.
    > Thanks Satyam


    import os
    from itertools import groupby
    from operator import itemgetter

    get_key = itemgetter(0)
    get_value = itemgetter(1)

    output_folder = "tmp"
    with open("infile.txt") as instream:
        pairs = (line.split(None, 1) for line in instream)
        for key, group in groupby(pairs, key=get_key):
            path = os.path.join(output_folder, key + ".txt")
            with open(path, "a") as outstream:
                outstream.writelines(get_value(line) for line in group)

    If you are running the code more than once make sure that you remove the
    files from the previous run first.
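
One way to do that cleanup step, as a minimal sketch: the helper name clear_previous_output is hypothetical, and it assumes the output folder holds nothing but generated files, since it deletes every file matching the pattern.

```python
from pathlib import Path


def clear_previous_output(output_folder, pattern="*.txt"):
    """Delete files left over from an earlier run.

    Hypothetical helper. Assumption: everything matching the pattern
    in output_folder was generated by the script and is safe to delete.
    """
    folder = Path(output_folder)
    removed = []
    if folder.is_dir():
        for path in sorted(folder.glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.name)
    return removed
```

Calling clear_previous_output("tmp") before the with block would let the "a" mode start from empty files on each run.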
     
    Peter Otten, Oct 24, 2012
    #10
