Break large file down into smaller parts

Discussion in 'Perl Misc' started by Brian F., Nov 16, 2004.

  1. Brian F.

    Brian F. Guest

    Greets,

    I have a 2 million+ line file that gets generated twice a day, and was
    wondering if there is a way to count the lines and split the file into
    several (say 5) parts with different file names?

    So instead of having list.txt with 2 million lines, I'd end up with
    file1.txt, file2.txt, file3.txt, and so on, each with an equal (or nearly
    equal) amount of data from the original.

    Brian F.
     
    Brian F., Nov 16, 2004
    #1

  2. Toni Erdmann

    Toni Erdmann Guest

    Brian F. wrote:
    > Greets,
    >
    > I have a 2million+ line file that gets generated twice a day, and was
    > wondering if there would be a way to read in the amount of lines and
    > split the file into several (say 5) parts with different file names?
    >
    > so instead of having list.txt with 2 million lines, i'd end up with
    > file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
    > equal) amount of data from the original.


    man split

    split --lines=NUMBER
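
Since split wants a line count rather than a number of parts, NUMBER has to be worked out from the total first. A minimal Perl sketch of that arithmetic follows; the helper names lines_per_part and split_command and the output prefix 'file' are made up for illustration, and it assumes a split that accepts the --lines long option shown above. Note that split names its output with suffixes such as fileaa, fileab rather than file1.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lines per output file so that $total lines fit into $parts files.
# Rounds up, so the last file may be shorter than the others.
sub lines_per_part {
    my ($total, $parts) = @_;
    die "Number of parts must be at least 1\n" if $parts < 1;
    return int(($total + $parts - 1) / $parts) || 1;
}

# Build the argument list for split: split --lines=NUMBER FILE PREFIX
sub split_command {
    my ($file, $total, $parts, $prefix) = @_;
    return ('split', '--lines=' . lines_per_part($total, $parts), $file, $prefix);
}

# Usage (hypothetical script name): perl run_split.pl list.txt 5
if (@ARGV == 2) {
    my ($file, $parts) = @ARGV;

    # Count the lines with one pass over the file.
    open my $in, '<', $file or die "Can't open $file: $!\n";
    my $total = 0;
    $total++ while <$in>;
    close $in;

    my @cmd = split_command($file, $total, $parts, 'file');
    system(@cmd) == 0 or die "split failed: $?\n";
}
```

Passing the command to system as a list avoids going through the shell, so file names with spaces need no extra quoting.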

    Toni
     
    Toni Erdmann, Nov 16, 2004
    #2

  3. If you are using Unix or the like, there is a command called split that
    will do it for you.
     
    Peter Hickman, Nov 16, 2004
    #3
  4. Tore Aursand

    Tore Aursand Guest

    On Tue, 16 Nov 2004 08:10:45 -0800, Brian F. wrote:
    > I have a 2million+ line file that gets generated twice a day, and was
    > wondering if there would be a way to read in the amount of lines and
    > split the file into several (say 5) parts with different file names?
    >
    > so instead of having list.txt with 2 million lines, i'd end up with
    > file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
    > equal) amount of data from the original.


    1. Count the number of lines in the file; 'perldoc -q lines'
    2. Decide on how many parts you want.
    3. Iterate through the file, opening, writing to and closing
    each file as appropriate.
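
The three steps above can be sketched roughly as follows. This is only one way to do it, not necessarily what the poster had in mind: the sub names count_lines and split_into_parts are invented for the example, and the output names follow the file1.txt, file2.txt pattern from the question. The part size is rounded up, so a small input can end up in fewer files than requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: count the lines with a first pass over the file.
sub count_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    my $count = 0;
    $count++ while <$in>;
    close $in;
    return $count;
}

# Steps 2 and 3: split $source into at most $parts files named
# "${prefix}1.txt", "${prefix}2.txt", ... and return the names written.
sub split_into_parts {
    my ($source, $parts, $prefix) = @_;
    die "Number of parts must be at least 1\n" if $parts < 1;

    my $total    = count_lines($source);
    my $per_part = int(($total + $parts - 1) / $parts) || 1;    # round up

    open my $in, '<', $source or die "Can't open $source: $!\n";
    my (@written, $out);
    while (my $line = <$in>) {
        # Open the next output file on the first line of each part.
        if (($. - 1) % $per_part == 0) {
            if ($out) { close $out or die "Can't close $written[-1]: $!\n"; }
            push @written, sprintf '%s%d.txt', $prefix, scalar(@written) + 1;
            open $out, '>', $written[-1]
                or die "Can't open $written[-1] for writing: $!\n";
        }
        print {$out} $line;
    }
    close $in;
    if ($out) { close $out or die "Can't close $written[-1]: $!\n"; }
    return @written;
}

# Usage (hypothetical script name): perl split_parts.pl list.txt 5 file
if (@ARGV == 3) {
    print "$_\n" for split_into_parts(@ARGV);
}
```

Reading the file twice costs an extra pass, but it keeps memory use flat regardless of how large the input is.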


    --
    Tore Aursand
    "A car is not the only thing that can be recalled by its maker."
    (Unknown)
     
    Tore Aursand, Nov 16, 2004
    #4
  5. On Tue, 16 Nov 2004 08:10:45 -0800, Brian F. wrote:

    > Greets,
    >
    > I have a 2million+ line file that gets generated twice a day, and was
    > wondering if there would be a way to read in the amount of lines and
    > split the file into several (say 5) parts with different file names?
    >
    > so instead of having list.txt with 2 million lines, i'd end up with
    > file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
    > equal) amount of data from the original.


    If the file you're reading isn't being written to as this script runs,
    then the example should do what you want. If the file you want to read
    *is* being written to while you're reading it, that opens up a whole host
    of other issues (like losing information while reading).

    (example - may need work)
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $prefix_for_chunks = '/tmp/testing';
    my $chunk_count       = 1;
    my $chunk_size        = 100000;    # lines per output file
    my $file_to_read      = '/var/log/messages';

    open my $in, '<', $file_to_read
        or die "Can't open $file_to_read: $!\n";

    my $out;
    while (my $line = <$in>) {
        # Start a new chunk on the first line and after every $chunk_size lines.
        if (($. - 1) % $chunk_size == 0) {
            close $out if $out;
            my $current_output_file = sprintf "%s%04d.txt", $prefix_for_chunks,
                $chunk_count++;
            open $out, '>', $current_output_file
                or die "Can't open $current_output_file for writing: $!\n";
        }
        print {$out} $line;
    }

    close $in;
    close $out if $out;


    HTH

    Jim
     
    James Willmore, Nov 16, 2004
    #5
