m// on very long lines leaks memory

Discussion in 'Perl Misc' started by ShaunJ, Mar 13, 2008.

  1. ShaunJ

    ShaunJ Guest

    The following snippet leaks memory until it breaks and falls down when
    m// is used on a very long line. It works fine if the line lengths are
    short. Try
    ./test.pl /usr/share/dict/words /usr/share/dict/words
    Depending on your dictionary, you'll see that compiling the regex
    takes about 200 MB. However the following matching loop leaks memory
    at an alarming rate. Start up `top` and watch it run. I'm using Perl
    5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
    or deny this behaviour for other architectures or versions of Perl,
    that would be interesting too.

    Cheers,
    Shaun

    #!/usr/bin/perl
    use strict;
    use English;
    open REFILE, '<' . shift;
    chomp (my @restrings = <REFILE>);
    close REFILE;
    my @re = map { qr/$_/ } @restrings;

    open TEXTFILE, '<' . shift;
    chomp (my @text = <TEXTFILE>);
    close TEXTFILE;
    my $text = join '', @text;

    foreach my $re (@re) {
        if ($text =~ m/$re/) {
            print $LAST_MATCH_START[0], "\n";
        }
    }
    ShaunJ, Mar 13, 2008
    #1

  2. ShaunJ wrote:
    > The following snippet leaks memory until it breaks and falls down when
    > m// is used on a very long line. It works fine if the line lengths are
    > short.
    > [...]


    I tested it and if I remove the English module it works fine.
    (So don't use English.pm!)



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Mar 13, 2008
    #2

  3. John W. Krahn wrote:
    > ShaunJ wrote:
    >> [...]

    >
    > I tested it and if I remove the English module it works fine.
    > (So don't use English.pm!)


    Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:

    use English qw( -no_match_vars );
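
    As a rough sketch of how that option fits the original script: the
    version below keeps the readable @LAST_MATCH_START name, which English
    still exports with -no_match_vars, but never pulls in $PREMATCH, $MATCH
    or $POSTMATCH. The match_offsets helper and the sample strings are made
    up for illustration; they stand in for the two files read by test.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Suppress $PREMATCH, $MATCH and $POSTMATCH; @LAST_MATCH_START is still exported.
use English qw( -no_match_vars );

# Hypothetical helper: return the start offset of the first match of each
# pattern in $text. Patterns that do not match are skipped.
sub match_offsets {
    my ($text, @patterns) = @_;
    my @offsets;
    foreach my $re (map { qr/$_/ } @patterns) {
        if ($text =~ $re) {
            push @offsets, $LAST_MATCH_START[0];
        }
    }
    return @offsets;
}

# Sample data standing in for the dictionary files.
print join(',', match_offsets('applebananacherry', 'banana', 'cherry', 'kiwi')), "\n";
# prints 5,11
```

    To try it against the real data, read the pattern and text files as in
    the original script and pass them to match_offsets.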



    John
    John W. Krahn, Mar 13, 2008
    #3
  4. ShaunJ

    Guest

    ShaunJ wrote:
    > The following snippet leaks memory until it breaks and falls down when
    > m// is used on a very long line. It works fine if the line lengths are
    > short. Try
    > ./test.pl /usr/share/dict/words /usr/share/dict/words
    > Depending on your dictionary, you'll see that compiling the regex
    > takes about 200 MB. However the following matching loop leaks memory
    > at an alarming rate. Start up `top` and watch it run. I'm using Perl
    > 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
    > or deny this behaviour for other architectures or version of Perl,
    > that would be interesting too.


    Technically, this does not seem to be a leak. If I throw an infinite
    loop around your foreach my $re (@re) loop, then memory only grows
    up to 15.5Gig when the inner loop completes. Upon the next iteration of
    the outer loop, memory stops growing. So it seems like it is an
    inefficiency rather than a leak. With idle speculation, I'd say that each
    $re maintains some kind of independent state, that that state is
    proportional to the size of the string it was last used on, and that that
    storage is reused next time that $re gets invoked, but not before then.
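
    A minimal sketch of that experiment, with a bounded number of passes
    instead of an infinite loop so that it terminates. The run_passes helper
    and the sample patterns are made up for illustration. To reproduce the
    memory growth described in this thread, add "use English;" at the top,
    use the large inputs from the original script, and watch `top` across
    passes.

```perl
use strict;
use warnings;

# Hypothetical helper: run the matching loop $passes times over the same
# compiled patterns and return the number of successful matches per pass.
sub run_passes {
    my ($text, $res, $passes) = @_;
    my @counts;
    for my $pass (1 .. $passes) {
        my $hits = 0;
        foreach my $re (@$res) {
            $hits++ if $text =~ $re;
        }
        push @counts, $hits;
    }
    return @counts;
}

# Sample data; the same qr// objects are reused on every pass.
my @re = map { qr/$_/ } qw(foo bar baz);
print join(' ', run_passes('xxfooxxbarxx', \@re, 3)), "\n";
# prints 2 2 2
```

    If the speculation above is right, memory use would level off after
    the first pass, because every pattern has already been run once against
    the long string.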

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Mar 13, 2008
    #4
  5. ShaunJ

    ShaunJ Guest

    On Mar 13, 2:53 pm, "John W. Krahn" wrote:
    ...
    > > I tested it and if I remove the English module it works fine.
    > > (So don't use English.pm!)

    >
    > Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
    >
    > use English qw( -no_match_vars );


    Wow, thanks! If I use either English.pm or $& (even without
    English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
    10.4.11). If I use neither English.pm nor $& it works fine.

    If I use Perl 5.10.0 built from source it works for every case.

    Cheers,
    Shaun
    ShaunJ, Mar 13, 2008
    #5
  6. ShaunJ

    Uri Guttman Guest

    >>>>> "S" == ShaunJ writes:

    S> On Mar 13, 2:53 pm, "John W. Krahn" wrote:
    S> ...
    >> > I tested it and if I remove the English module it works fine.
    >> > (So don't use English.pm!)

    >>
    >> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
    >>
    >> use English qw( -no_match_vars );


    S> Wow, thanks! If I use either English.pm or $& (even without
    S> English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
    S> 10.4.11). If I use neither English.pm or $& it works fine.

    i was going to mention that but didn't want to get into this thread. $&
    (which is used in english.pm without that option) is a known memory hog
    (not a leak). since $& is global it must copy the entire match string
    for each regex in case it might be used later anywhere in the
    program. this is a well known issue and you should google for more about
    it or find the points in perldoc perlvar.
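
    For anyone wanting the matched text without $&, here is a small sketch
    of two alternatives documented in perlvar: the @- and @+ offset arrays
    combined with substr, and the /p modifier with ${^MATCH}, which needs
    Perl 5.10 or later. The helper names matched_text and matched_text_p
    are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # the /p modifier and ${^MATCH} were added in Perl 5.10

# Return the matched substring using the @- and @+ offset arrays
# instead of $&. Returns undef when there is no match.
sub matched_text {
    my ($text, $re) = @_;
    return undef unless $text =~ $re;
    return substr($text, $-[0], $+[0] - $-[0]);
}

# Same result using the /p modifier and ${^MATCH}.
sub matched_text_p {
    my ($text, $re) = @_;
    return undef unless $text =~ /$re/p;
    return ${^MATCH};
}

print matched_text('the quick brown fox', qr/qu\w+/), "\n";      # prints quick
print matched_text_p('the quick brown fox', qr/br\w+/), "\n";    # prints brown
```

    Neither version mentions $&, $` or $' anywhere in the program, which is
    what avoids the copying described above.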

    S> If I use Perl 5.10.0 built from source it works for every case.

    they seem to have fixed this problem (partially from what i heard but i
    could be wrong) in 5.10. i still recommend never using $& and no one who
    knows perl uses english.pm.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Architecture, Development, Training, Support, Code Review ------
    ----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 13, 2008
    #6
