Break large file down into smaller parts

Brian F.

Greets,

I have a 2 million+ line file that gets generated twice a day, and was
wondering if there would be a way to read in the number of lines and
split the file into several (say 5) parts with different file names.

So instead of having list.txt with 2 million lines, I'd end up with
file1.txt, file2.txt, file3.txt... each with an equal (or nearly
equal) amount of data from the original.

Brian F.
 
Toni Erdmann

Brian said:
Greets,

I have a 2 million+ line file that gets generated twice a day, and was
wondering if there would be a way to read in the number of lines and
split the file into several (say 5) parts with different file names.

So instead of having list.txt with 2 million lines, I'd end up with
file1.txt, file2.txt, file3.txt... each with an equal (or nearly
equal) amount of data from the original.

man split

split --lines=NUMBER

For example, 'split --lines=400000 list.txt part_' gives part_aa,
part_ab, ... each with 400000 lines.

Toni
 
Peter Hickman

If you are using Unix or the like, there is a command called split that
will do it for you.
 
Tore Aursand

Brian said:

I have a 2 million+ line file that gets generated twice a day, and was
wondering if there would be a way to read in the number of lines and
split the file into several (say 5) parts with different file names.

So instead of having list.txt with 2 million lines, I'd end up with
file1.txt, file2.txt, file3.txt... each with an equal (or nearly
equal) amount of data from the original.

1. Count the number of lines in the file; see 'perldoc -q lines'.
2. Decide how many parts you want.
3. Iterate through the file, opening, writing to, and closing
each output file as appropriate. A rough sketch follows.
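
Something along these lines (untested) ought to do it; the file names
and the part count below are just the ones from your post, so adjust to
taste:

#!/usr/bin/perl
# Rough sketch of the three steps above.

use strict;
use warnings;
use POSIX qw(ceil);

my $input = 'list.txt';
my $parts = 5;

# Step 1: count the lines (see 'perldoc -q lines' for other ways).
open my $in, '<', $input or die "Can't open $input: $!\n";
my $total = 0;
$total++ while <$in>;
close $in;

# Step 2: work out how many lines belong in each part.
my $per_part = ceil($total / $parts);

# Step 3: reread the file, switching output files every $per_part lines.
open $in, '<', $input or die "Can't open $input: $!\n";
my $out;
my $part = 0;
while (my $line = <$in>) {
    if (($. - 1) % $per_part == 0) {
        close $out if $out;
        $part++;
        open $out, '>', "file$part.txt"
            or die "Can't open file$part.txt: $!\n";
    }
    print {$out} $line;
}
close $out if $out;
close $in;

Counting first costs an extra pass over the file, but it keeps the parts
evenly sized; with a fixed lines-per-part number you could do it in a
single pass instead.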
 
James Willmore

Brian said:

Greets,

I have a 2 million+ line file that gets generated twice a day, and was
wondering if there would be a way to read in the number of lines and
split the file into several (say 5) parts with different file names.

So instead of having list.txt with 2 million lines, I'd end up with
file1.txt, file2.txt, file3.txt... each with an equal (or nearly
equal) amount of data from the original.

If the file you're reading isn't being written to as this script runs,
then the example should do what you want. If the file you want to read
*is* being written to while you're reading it, that opens up a whole host
of other issues (like losing information while reading).

(example - may need work)
#!/usr/bin/perl

use strict;
use warnings;

my $prefix_for_chunks = '/tmp/testing';
my $chunk_count       = 1;
my $chunk_size        = 100000;  # lines per chunk; ~400000 gives about
                                 # five chunks from a 2 million line file
my $file_to_read      = '/var/log/messages';

open my $in, '<', $file_to_read
    or die "Can't open $file_to_read: $!\n";

my $current_output_file = sprintf "%s%04d.txt", $prefix_for_chunks, $chunk_count;

open my $out, '>', $current_output_file
    or die "Can't open $current_output_file for writing: $!\n";

while (<$in>) {
    # Start a new chunk every $chunk_size lines.
    if ( $. > 1 and ( $. - 1 ) % $chunk_size == 0 ) {
        close $out;
        $current_output_file =
            sprintf "%s%04d.txt", $prefix_for_chunks, ++$chunk_count;
        open $out, '>', $current_output_file
            or die "Can't open $current_output_file for writing: $!\n";
    }
    print {$out} $_;
}

close $in;
close $out;

HTH

Jim
 
