Performance question

Discussion in 'Perl Misc' started by Dave Sill, Mar 29, 2005.

  1. Dave Sill

    Dave Sill Guest

    One of my users made the following observation. I'm only an
    occasional, lightweight Perl user, so I can't explain what he's
    seeing. Can anyone shed some light on it? H/W is a pretty large/fast
    Dell server running RHEL 3.

    ----
    I manufactured a 401x401 [ linearly =160801 element] array [@judy]
    each element having string values like

    01000000001110000000000000000001

    I needed to make a comma delimited ascii file of this data.

    I decided a single I/O write of a string would be the fastest, so I
    made a string:

    $str = "";
    foreach $i (0 .. $#judy - 1)
    {
        $str = $str . "$judy[$i],";
    }
    $str = $str . "$judy[$#judy]";
    open(OUT, ">$output_file");
    print OUT $str;
    close(OUT);
    `gzip -f $output_file`;

    this took 16 minutes.

    I tried it the slow way:

    open(OUT, ">$output_file");
    foreach $i (0 .. $#judy - 1)
    {
        print OUT "$judy[$i],";
    }
    print OUT "$judy[$#judy]";
    close(OUT);
    `gzip -f $output_file`;

    with 160K IOs, this took about 3 seconds.

    The .gz files were different, but diff said that, uncompressed, they
    were the same.
    ----

    Thanks.

    --
    Dave Sill Oak Ridge National Lab, Workstation Support
    Author, The qmail Handbook <http://web.infoave.net/~dsill>
    <http://lifewithqmail.org/>: Almost everything you always wanted to know.
    Dave Sill, Mar 29, 2005
    #1

  2. Paul Lalli

    Paul Lalli Guest

    Dave Sill wrote:
    > One of my users made the following observation. I'm only an
    > occasional, lightweight Perl user, so I can't explain what he's
    > seeing. Can anyone shed some light on it? H/W is a pretty large/fast
    > Dell server running RHEL 3.
    >
    > ----
    > I manufactured a 401x401 [ linearly =160801 element] array [@judy]
    > each element having string values like
    >
    > 01000000001110000000000000000001
    >
    > I needed to make a comma delimited ascii file of this data.
    >
    > I decided a single IO write of a string would be the fastest, so i
    > made a string
    > $str="";
    > foreach $i(0..$#judy-1)
    > {
    > $str=$str."$judy[$i],"
    > }
    > $str=$str."$judy[$#judy]"; open(OUT,">$output_file");print OUT
    > $str;close(OUT);
    > `gzip -f $output_file`;
    >
    > this took 16 minutes.
    >
    > i tried it the slow way,
    >
    > open(OUT,">$output_file");
    > foreach $i(0..$#judy-1)
    > {
    > print OUT "$judy[$i],";
    > }
    > print OUT "$judy[$#judy]"; close(OUT);
    > `gzip -f $output_file`;
    >
    > with 160K IOs, this took about 3 seconds.
    >
    > the gz files were different, but diff said uncompressed they were the
    > same.


    Your user has an odd definition of "faster" and "slower". I don't know
    what would make the user think that storing the entire 160,801 element
    array in memory TWICE would be faster than just printing what's needed
    when it's needed.

    In the first algorithm, the user is storing one large string, and each
    time through the loop, appending to that string. Towards the end, this
    means storing over (160,000 x 32) bytes in a single scalar, and asking
    perl to append to the end of that string. Then finally you ask perl to
    make one absurdly large I/O access.

    In the second algorithm, you're simply printing 32 bytes repeatedly.

    Neither of those ways is especially good Perl code, of course. The
    first would be better written:

    my $str = join (',', @judy);
    open my $out, '>', $output_file or die "Can't open output: $!";
    print $out $str;
    close $out;

    The second would be better written

    open my $out, '>', $output_file or die "Can't open output: $!";
    {
        local $, = ',';   # output field separator, restored when the block exits
        print $out @judy;
    }
    close $out;

    I would suggest your user use the Benchmark module to determine which of
    these is actually faster.
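
    A minimal Benchmark sketch along those lines (a smaller stand-in for
    the 160,801-element @judy so it runs quickly; the three label names
    are illustrative, not from the original posts):

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Smaller stand-in for the real 160,801-element @judy
    my @judy = ('01000000001110000000000000000001') x 10_000;

    cmpthese( -1, {                     # -1: run each sub for at least 1 CPU second
        copy_append => sub {            # $str = $str . "..." (copies the string)
            my $str = '';
            $str = $str . "$_," for @judy;
        },
        in_place    => sub {            # $str .= "..." (appends in place)
            my $str = '';
            $str .= "$_," for @judy;
        },
        join_once   => sub {            # single join, no explicit append loop
            my $str = join ',', @judy;
        },
    } );
    ```

    cmpthese prints a rate-comparison table, so the user can see directly
    how the three approaches rank on his own hardware.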

    Paul Lalli
    Paul Lalli, Mar 29, 2005
    #2

  3. Guest

    Guest

    Dave Sill <> wrote:
    > One of my users made the following observation. I'm only an
    > occasional, lightweight Perl user, so I can't explain what he's
    > seeing. Can anyone shed some light on it? H/W is a pretty large/fast
    > Dell server running RHEL 3.
    >
    > ----
    > I manufactured a 401x401 [ linearly =160801 element] array [@judy]
    > each element having string values like
    >
    > 01000000001110000000000000000001
    >
    > I needed to make a comma delimited ascii file of this data.
    >
    > I decided a single IO write of a string would be the fastest, so i
    > made a string
    > $str="";
    > foreach $i(0..$#judy-1)
    > {
    > $str=$str."$judy[$i],"


    This has to copy the contents of $str (which towards the end is quite
    huge) each time through the loop. Maybe even twice. Using:

    $str .= "$judy[$i],";

    is tremendously faster, because it just tacks something onto the end of the
    string when possible, rather than copying the entire string each time. (I
    would have thought perl would have optimized the first into the second, but
    apparently it doesn't. Maybe such optimization would cause overloading to
    break.)
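
    A quick self-contained check (tiny stand-in data) that the in-place
    append produces exactly the same output as the copy-and-assign
    version from the original post:

    ```perl
    use strict;
    use warnings;

    my @judy = ('01000000001110000000000000000001') x 5;  # tiny stand-in

    my $slow = '';                                   # copy-and-assign, as posted
    $slow = $slow . "$judy[$_]," for 0 .. $#judy - 1;
    $slow = $slow . $judy[$#judy];

    my $fast = '';                                   # in-place append
    $fast .= "$judy[$_]," for 0 .. $#judy - 1;
    $fast .= $judy[$#judy];

    print $slow eq $fast ? "identical\n" : "different\n";  # prints "identical"
    ```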


    > }
    > $str=$str."$judy[$#judy]";


    Of course, I do have to wonder why you just don't use
    $str = join ",", @judy;

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Mar 29, 2005
    #3
  4. Guest

    Guest

    Paul Lalli <> wrote:
    >
    > In the first algorithm, the user is storing one large string, and each
    > time through the loop, appending to that string. Towards the end, this
    > means storing over (160,000 x 32) bytes in a single scalar, and asking
    > perl to append to the end of that string.


    If he were doing that, it wouldn't be so bad. Perl is pretty good at
    handling that. But he isn't asking Perl to append to the end of that
    string, but rather to copy that string and then append to the end of that
    copy.


    > Then finally you ask perl to
    > make one absurdly large I/O access.


    There is nothing absurd about the size of the I/O.

    Xho

    , Mar 29, 2005
    #4
  5. Guest

    Guest

    On Tue, 29 Mar 2005 13:25:59 -0500, Dave Sill
    <> wrote:

    >One of my users made the following observation. I'm only an
    >occasional, lightweight Perl user, so I can't explain what he's
    >seeing. Can anyone shed some light on it? H/W is a pretty large/fast
    >Dell server running RHEL 3.
    >
    >----
    >I manufactured a 401x401 [ linearly =160801 element] array [@judy]
    >each element having string values like
    >

    <snip>

    I won't be critical of anything beyond this point in your description.
    The fact of the matter is that the idea of pre-defining and allocating
    large, multi-dimensional arrays is strictly the mental masturbation of
    college professors and has nothing at all to do with real-world
    programming!

    If the black box idea is to get data, perform an operation on it,
    then put the results somewhere, then this is done on the micro level,
    not the macro level. Imagine a CPU holding an entire "exe" in its
    cache before it is written to memory and executed.
    , Apr 1, 2005
    #5
