Split line into an array vs multiple strings

scottmf · Jul 27, 2005

Can anyone explain why when I am reading in a file and saving the data
to a 2-d array it is faster if I split each line into an array rather
than a group of strings? Also why with each subsequent line I read in
does it take longer to process with the strings, whereas with the array
it takes the same amount of time for each line?

Thanks,
Scott

#!/usr/local/bin/perl
#
use Benchmark;
use strict;

# Create Sample File (sample.txt) and Array (@sample)
open(SAMPLE,'>sample.txt');
for (my $i=0;$i<20000;$i++) {
    my $line = "abc"." "."def"." ".rand()." ".rand()." ".rand()." ".rand()." ".rand()."\n";
    print SAMPLE $line;
}
close(SAMPLE);

# Count how long it takes to run each version
my $count = 10;
timethese $count, {
    'string_test' => \&string_test,
    'array_test' => \&array_test
};

sub string_test{
    my @array;
    my $i;
    open(SAMPLE, "sample.txt");
    while(my $line = <SAMPLE>){
        chomp($line);
        my($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split/\s+/,$line;
        $array[$i][0] = $el1;
        $array[$i][1] = $el2;
        $array[$i][2] = $el3;
        $array[$i][3] = $el4;
        $array[$i][4] = $el5;
        $array[$i][5] = $el6;
        $array[$i][6] = $el7;
        $i++;
    }
    close(SAMPLE);
}

sub array_test{
    my @array;
    my $i;
    open(SAMPLE, "sample.txt");
    while(my $line = <SAMPLE>){
        chomp($line);
        my @line_data = split/\s+/, $line;
        $array[$i][0] = $line_data[0];
        $array[$i][1] = $line_data[1];
        $array[$i][2] = $line_data[2];
        $array[$i][3] = $line_data[3];
        $array[$i][4] = $line_data[4];
        $array[$i][5] = $line_data[5];
        $array[$i][6] = $line_data[6];
        $i++;
    }
    close(SAMPLE);
}

returns:

Benchmark: timing 10 iterations of array_test, string_test...
array_test: 4 wallclock secs ( 4.30 usr + 0.00 sys = 4.30 CPU) @ 2.33/s (n=10)
string_test: 18 wallclock secs (18.00 usr + 0.00 sys = 18.00 CPU) @ 0.56/s (n=10)

John W. Krahn · Jul 28, 2005

scottmf said:
Can anyone explain why when I am reading in a file and saving the data
to a 2-d array it is faster if I split each line into an array rather
than a group of strings?

I can't explain it because on my computer the "string" version runs faster.

Also why with each subsequent line I read in
does it take longer to process with the strings, whereas with the array
it takes the same amount of time for each line?

The usual way to do something like that in perl is:

sub some_test {
    my @array;
    open SAMPLE, '<', 'sample.txt' or die "Cannot open 'sample.txt' $!";
    while ( <SAMPLE> ) {
        push @array, [ split ];
    }
    close SAMPLE;
}

Which is a bit faster than your two examples.

And if you need to limit it to only the first seven fields:

sub some_test {
    my @array;
    open SAMPLE, '<', 'sample.txt' or die "Cannot open 'sample.txt' $!";
    while ( <SAMPLE> ) {
        push @array, [ ( split )[ 0 .. 6 ] ];
    }
    close SAMPLE;
}
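
For anyone who wants to try the two idioms without creating sample.txt, here is a minimal self-contained sketch. The in-memory @lines data and the helper names rows_all and rows_first7 are made up for illustration and are not part of the original benchmark; the second sample line deliberately carries an eighth field so the effect of the slice is visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory sample standing in for sample.txt.
my @lines = (
    "abc def 0.11 0.22 0.33 0.44 0.55\n",
    "abc def 0.66 0.77 0.88 0.99 0.10 extra\n",
);

# One anonymous array per line, keeping every field.
# A bare split works on $_ and splits on whitespace, so the
# trailing newline never ends up in the last field.
sub rows_all {
    my @rows;
    for (@_) {
        push @rows, [ split ];
    }
    return @rows;
}

# Same idea, but a list slice keeps only the first seven fields.
sub rows_first7 {
    my @rows;
    for (@_) {
        push @rows, [ ( split )[ 0 .. 6 ] ];
    }
    return @rows;
}

my @all    = rows_all(@lines);
my @first7 = rows_first7(@lines);

printf "row 1: %d fields kept, %d after slicing\n",
    scalar @{ $all[1] }, scalar @{ $first7[1] };
print "last field of row 0: '$all[0][-1]'\n";
```

Because the bare split already discards the newline, no chomp is needed in either version.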

John

scottmf · Jul 28, 2005

I ran some more tests, starting with an input file of 10000 lines and
increasing the file size by 10000 lines for each benchmark, and I get
the following.
At this rate, if my input file had 80000 lines, the string method would
take almost 30 times longer than the array method just to grab the
data. Also, does anyone know why, in the benchmark comparison, the first
column changes from iterations per second to seconds per iteration?

Benchmark: timing 10 iterations of array_test, string_test...
array_test: 2 wallclock secs ( 2.09 usr + 0.02 sys = 2.11 CPU) @ 4.74/s (n=10)
string_test: 6 wallclock secs ( 5.17 usr + 0.01 sys = 5.19 CPU) @ 1.93/s (n=10)
              Rate string_test  array_test
string_test 1.93/s          --        -59%
array_test  4.74/s        146%          --

Benchmark: timing 10 iterations of array_test, string_test...
array_test: 4 wallclock secs ( 4.20 usr + 0.03 sys = 4.23 CPU) @ 2.36/s (n=10)
string_test: 17 wallclock secs (16.52 usr + 0.02 sys = 16.53 CPU) @ 0.60/s (n=10)
            s/iter string_test  array_test
string_test   1.65          --        -74%
array_test   0.423        290%          --

Benchmark: timing 10 iterations of array_test, string_test...
array_test: 6 wallclock secs ( 6.31 usr + 0.02 sys = 6.33 CPU) @ 1.58/s (n=10)
string_test: 39 wallclock secs (39.33 usr + 0.11 sys = 39.44 CPU) @ 0.25/s (n=10)
            s/iter string_test  array_test
string_test   3.94          --        -84%
array_test   0.633        523%          --

Benchmark: timing 10 iterations of array_test, string_test...
array_test: 8 wallclock secs ( 8.39 usr + 0.03 sys = 8.42 CPU) @ 1.19/s (n=10)
string_test: 84 wallclock secs (83.25 usr + 0.05 sys = 83.30 CPU) @ 0.12/s (n=10)
            s/iter string_test  array_test
string_test   8.33          --        -90%
array_test   0.842        889%          --
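
A scaling run like the one described above could be scripted roughly as follows. This is only a sketch: the helper names (write_sample, string_rows, array_rows), the scratch file name, the line counts, and the iteration count are all assumptions, and the two readers are condensed versions of the thread's string_test and array_test rather than exact copies. On the Rate versus s/iter question: as far as I can tell, Benchmark's cmpthese switches the first column to seconds per iteration once the rates drop to around one per second or lower, which would match the output above, but check the module's documentation before relying on that.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: write $n lines in the thread's format to $file.
sub write_sample {
    my ($file, $n) = @_;
    open my $out, '>', $file or die "Cannot open '$file': $!";
    for (1 .. $n) {
        print {$out} join(' ', 'abc', 'def', map { rand() } 1 .. 5), "\n";
    }
    close $out or die "Cannot close '$file': $!";
}

# Condensed "string" version: split into seven named scalars first.
sub string_rows {
    my ($file) = @_;
    my @array;
    open my $in, '<', $file or die "Cannot open '$file': $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split /\s+/, $line;
        push @array, [ $el1, $el2, $el3, $el4, $el5, $el6, $el7 ];
    }
    close $in;
    return scalar @array;    # number of rows read
}

# Condensed "array" version: split into a temporary array first.
sub array_rows {
    my ($file) = @_;
    my @array;
    open my $in, '<', $file or die "Cannot open '$file': $!";
    while (my $line = <$in>) {
        chomp $line;
        my @line_data = split /\s+/, $line;
        push @array, [ @line_data[ 0 .. 6 ] ];
    }
    close $in;
    return scalar @array;    # number of rows read
}

my $file = 'sample_scaling.txt';    # hypothetical scratch file
for my $n (5000, 10000, 20000) {
    write_sample($file, $n);
    print "--- $n lines ---\n";
    # Five iterations per size keeps the run short; raise it for steadier numbers.
    cmpthese(5, {
        string_test => sub { string_rows($file) },
        array_test  => sub { array_rows($file) },
    });
}
unlink $file;
```

If the time per line really grows with file size, the ratio between the two rows should widen from one block of output to the next; if it stays roughly constant, the slowdown is probably specific to the Perl build or machine. With runs this short, Benchmark may warn that there were too few iterations for a reliable count.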
