Split line into an array vs multiple strings

Discussion in 'Perl Misc' started by scottmf, Jul 27, 2005.

  1. scottmf

    Can anyone explain why, when I read in a file and save the data to a
    2-D array, it is faster to split each line into an array than into a
    group of scalar variables (strings)? Also, why does each subsequent
    line take longer to process with the scalars, whereas with the array
    every line takes the same amount of time?

    Thanks,
    Scott

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Benchmark;

    # Create a sample file (sample.txt): 20000 lines of 7 fields each
    open(SAMPLE, '>', 'sample.txt') or die "Cannot open 'sample.txt': $!";
    for (my $i = 0; $i < 20000; $i++) {
        my $line = "abc def " . join(" ", map { rand() } 1 .. 5) . "\n";
        print SAMPLE $line;
    }
    close(SAMPLE);

    # Count how long it takes to run each version
    my $count = 10;
    timethese($count, {
        'string_test' => \&string_test,
        'array_test'  => \&array_test,
    });

    # Split each line into seven scalar variables, then copy them into the row
    sub string_test {
        my @array;
        my $i = 0;
        open(SAMPLE, '<', 'sample.txt') or die "Cannot open 'sample.txt': $!";
        while (my $line = <SAMPLE>) {
            chomp($line);
            my ($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split /\s+/, $line;
            $array[$i][0] = $el1;
            $array[$i][1] = $el2;
            $array[$i][2] = $el3;
            $array[$i][3] = $el4;
            $array[$i][4] = $el5;
            $array[$i][5] = $el6;
            $array[$i][6] = $el7;
            $i++;
        }
        close(SAMPLE);
    }

    # Split each line into a temporary array, then copy its elements into the row
    sub array_test {
        my @array;
        my $i = 0;
        open(SAMPLE, '<', 'sample.txt') or die "Cannot open 'sample.txt': $!";
        while (my $line = <SAMPLE>) {
            chomp($line);
            my @line_data = split /\s+/, $line;
            $array[$i][0] = $line_data[0];
            $array[$i][1] = $line_data[1];
            $array[$i][2] = $line_data[2];
            $array[$i][3] = $line_data[3];
            $array[$i][4] = $line_data[4];
            $array[$i][5] = $line_data[5];
            $array[$i][6] = $line_data[6];
            $i++;
        }
        close(SAMPLE);
    }


    This prints:

    Benchmark: timing 10 iterations of array_test, string_test...
    array_test:   4 wallclock secs ( 4.30 usr + 0.00 sys =  4.30 CPU) @ 2.33/s (n=10)
    string_test: 18 wallclock secs (18.00 usr + 0.00 sys = 18.00 CPU) @ 0.56/s (n=10)
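    One way to check the second observation, that each later line takes longer with the scalar version, is to time the loop in fixed-size chunks: if the per-chunk times keep growing, the cost per line is rising. This is a minimal sketch under stated assumptions, not the benchmark above: it uses Time::HiRes for sub-second timing, builds the lines in memory instead of reading sample.txt, and the helpers make_lines and time_chunks and the 5000-line chunk size are hypothetical choices made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: build $n lines shaped like sample.txt
# ("abc def" followed by five random numbers).
sub make_lines {
    my ($n) = @_;
    my @lines;
    for (1 .. $n) {
        push @lines, join(' ', 'abc', 'def', map { rand() } 1 .. 5);
    }
    return @lines;
}

# Hypothetical helper: load the lines with the "split into scalars"
# approach from string_test, printing the elapsed time for every
# $chunk lines. Returns a reference to the 2-D array.
sub time_chunks {
    my ($lines, $chunk) = @_;
    my @array;
    my $i  = 0;
    my $t0 = time;
    for my $line (@$lines) {
        my ($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split /\s+/, $line;
        $array[$i][0] = $el1;
        $array[$i][1] = $el2;
        $array[$i][2] = $el3;
        $array[$i][3] = $el4;
        $array[$i][4] = $el5;
        $array[$i][5] = $el6;
        $array[$i][6] = $el7;
        $i++;
        if ($i % $chunk == 0) {
            my $t1 = time;
            printf "lines %5d-%5d: %.4f s\n", $i - $chunk + 1, $i, $t1 - $t0;
            $t0 = $t1;
        }
    }
    return \@array;
}

my $rows = time_chunks([ make_lines(20000) ], 5000);
```

    Replacing the split line and the seven assignments with the @line_data version from array_test gives the matching numbers for the array approach. Timings will differ between machines and perl builds.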
     
    scottmf, Jul 27, 2005

  2. John W. Krahn

    scottmf wrote:
    > Can anyone explain why when I am reading in a file and saving the data
    > to a 2-d array it is faster if I split each line into an array rather
    > than a group of strings?


    I can't explain it because on my computer the "string" version runs faster.

    > Also why with each subsequent line I read in
    > does it take longer to process with the strings, whereas with the array
    > it takes the same amount of time for each line?


    The usual way to do something like that in Perl is:

    sub some_test {
        my @array;
        open SAMPLE, '<', 'sample.txt' or die "Cannot open 'sample.txt': $!";
        while ( <SAMPLE> ) {
            push @array, [ split ];
        }
        close SAMPLE;
    }

    which is a bit faster than either of your two examples.
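    Here is a self-contained sketch of the same push @array, [ split ] idiom. To keep it runnable without sample.txt, it reads from an in-memory filehandle (open on a scalar reference) and returns the array reference; the helper name load_rows and the two sample records are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated records from $fh
# into an array of array references, one per line.
sub load_rows {
    my ($fh) = @_;
    my @array;
    while ( <$fh> ) {
        # split with no arguments splits $_ on whitespace and skips
        # leading whitespace; the trailing empty field left by the
        # newline is dropped, so no chomp is needed.
        push @array, [ split ];
    }
    return \@array;
}

my $data = "abc def 0.1 0.2 0.3 0.4 0.5\nabc def 0.6 0.7 0.8 0.9 1.0\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $rows = load_rows($fh);
close $fh;

print scalar(@$rows), " rows, ", scalar(@{ $rows->[0] }), " fields in the first row\n";
# prints "2 rows, 7 fields in the first row"
```

    Each push creates a fresh anonymous array per line, so there is no index variable to maintain.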

    And if you need to limit it to only the first seven fields:

    sub some_test {
        my @array;
        open SAMPLE, '<', 'sample.txt' or die "Cannot open 'sample.txt': $!";
        while ( <SAMPLE> ) {
            push @array, [ ( split )[ 0 .. 6 ] ];
        }
        close SAMPLE;
    }
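    A small runnable check of the list-slice version, again reading from an in-memory filehandle instead of sample.txt; the helper name load_first_seven and the sample data (a second record with two extra fields) are assumptions for illustration. The slice ( split )[ 0 .. 6 ] keeps the first seven fields and discards the rest.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first seven whitespace-separated
# fields of each line.
sub load_first_seven {
    my ($fh) = @_;
    my @array;
    while ( <$fh> ) {
        push @array, [ ( split )[ 0 .. 6 ] ];
    }
    return \@array;
}

# The second record has two extra fields that should be dropped.
my $data = "abc def 1 2 3 4 5\nabc def 1 2 3 4 5 extra1 extra2\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $rows = load_first_seven($fh);
close $fh;

print join(' ', @{ $rows->[1] }), "\n";   # prints "abc def 1 2 3 4 5"
```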





    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Jul 28, 2005

  3. scottmf

    I ran some more tests, starting with an input file of 10,000 lines and
    increasing the file size by 10,000 lines for each benchmark, and got
    the following results. At this rate, with an 80,000-line input file the
    string method would take almost 30 times as long as the array method
    just to read in the data. Also, does anyone know why the first column
    of the benchmark comparison changes from iterations per second to
    seconds per iteration?

    10,000 lines:
    Benchmark: timing 10 iterations of array_test, string_test...
    array_test:   2 wallclock secs ( 2.09 usr + 0.02 sys =  2.11 CPU) @ 4.74/s (n=10)
    string_test:  6 wallclock secs ( 5.17 usr + 0.01 sys =  5.19 CPU) @ 1.93/s (n=10)
                  Rate string_test array_test
    string_test 1.93/s          --       -59%
    array_test  4.74/s        146%         --

    20,000 lines:
    Benchmark: timing 10 iterations of array_test, string_test...
    array_test:   4 wallclock secs ( 4.20 usr + 0.03 sys =  4.23 CPU) @ 2.36/s (n=10)
    string_test: 17 wallclock secs (16.52 usr + 0.02 sys = 16.53 CPU) @ 0.60/s (n=10)
                s/iter string_test array_test
    string_test   1.65          --       -74%
    array_test   0.423        290%         --

    30,000 lines:
    Benchmark: timing 10 iterations of array_test, string_test...
    array_test:   6 wallclock secs ( 6.31 usr + 0.02 sys =  6.33 CPU) @ 1.58/s (n=10)
    string_test: 39 wallclock secs (39.33 usr + 0.11 sys = 39.44 CPU) @ 0.25/s (n=10)
                s/iter string_test array_test
    string_test   3.94          --       -84%
    array_test   0.633        523%         --

    40,000 lines:
    Benchmark: timing 10 iterations of array_test, string_test...
    array_test:   8 wallclock secs ( 8.39 usr + 0.03 sys =  8.42 CPU) @ 1.19/s (n=10)
    string_test: 84 wallclock secs (83.25 usr + 0.05 sys = 83.30 CPU) @ 0.12/s (n=10)
                s/iter string_test array_test
    string_test   8.33          --       -90%
    array_test   0.842        889%         --
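    The comparison charts in these runs are the kind Benchmark's cmpthese prints, whereas the script in the first post only calls timethese. A minimal sketch of getting both follows; string_version and array_version are hypothetical stand-ins that split one hard-coded line rather than reading sample.txt, and a count of -1 asks Benchmark to run each for at least one CPU second. Judging from the four runs above, Benchmark appears to show the first column as a rate (/s) while the slower test still manages more than one iteration per second, and to switch to seconds per iteration (s/iter) once it falls below that.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);   # cmpthese must be imported explicitly

# Stand-in workload (hypothetical): one line shaped like sample.txt.
my $line = "abc def 0.1 0.2 0.3 0.4 0.5";

# Split into seven scalars, as in string_test.
sub string_version {
    my ($el1, $el2, $el3, $el4, $el5, $el6, $el7) = split /\s+/, $line;
    return [ $el1, $el2, $el3, $el4, $el5, $el6, $el7 ];
}

# Split into a temporary array, as in array_test.
sub array_version {
    my @line_data = split /\s+/, $line;
    return [ @line_data[0 .. 6] ];
}

# timethese() returns a hash ref of Benchmark results; cmpthese()
# prints the comparison chart and returns its rows (header included).
my $results = timethese(-1, {
    'string_test' => \&string_version,
    'array_test'  => \&array_version,
});
my $rows = cmpthese($results);
```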
     
    scottmf, Jul 28, 2005