Need workaround for regex bug in 5.8.6

Discussion in 'Perl Misc' started by James Marshall, Nov 9, 2005.

  1. I found a weird bug in Perl 5.8.6: If a variable in a CGI script (only)
    is long enough, the script dies when it matches the variable against the
    pattern /(.|ab)*/ . The critical length seems to vary by machine, or even
    by data size or other environmental conditions-- memory or heap problem,
    maybe? Here's an NPH CGI script that demonstrates the bug on my machine:

    ---------------------------------------

    #!/usr/bin/perl

    use strict ;

    my($s)= ' ' x 15881 ; # 15880 is fine, but 15881 crashes
    $s=~ /(.|ab)*/ ; # dies here with no warning
    &HTTPdie('got here') ; # never gets here


    # Die, outputting full HTTP response.
    sub HTTPdie {
        my($msg)= @_ ;

        print <<EOF ;
    HTTP/1.0 200 OK
    Cache-Control: no-cache
    Pragma: no-cache
    Content-Type: text/plain

    $msg
    EOF

        exit ;
    }

    ---------------------------------------

    This bug doesn't happen if the script is run from the command line, no
    matter how large $s is.

    Have you seen this bug, and if so do you know a good workaround? Do you
    know if it's fixed in 5.8.7? Even if so, I'd like a workaround for 5.8.6,
    since the software will be used in many environments where the user has no
    control over the Perl version.

    I'm running this on Linux, kernel 2.6.11 (SuSE 9.3).

    If it helps, running this script with Perl 5.8.4 results in a segmentation
    fault, even when run from the command line. (The critical length of $s is
    smaller.)

    Thanks a lot for any help!

    James
    .............................................................................
    James Marshall Berkeley, CA @}-'-,--
    "Teach people what you know."
    .............................................................................
     
    James Marshall, Nov 9, 2005
    #1

  2. I would not call it a bug. Rather, you are getting what you deserve.

    ....
    Yeah, but did you ever read the server logs?
    Now you are lying.

    D:\Home\asu1\UseNet\clpmisc> cat r.pl
    use strict;
    use warnings;

    my($s)= ' ' x 100000 ; # 15880 is fine, but 15881 crashes
    $s=~ /(.|ab)*/ ; # dies here with no warning

    D:\Home\asu1\UseNet\clpmisc> perl r.pl
    Complex regular subexpression recursion limit (32766) exceeded at r.pl
    line 5.
    It is not a bug, and the workaround is not to do something this stupid.
    By the way, your signature is formatted incorrectly. It should be around
    70 characters wide, and there should be a sig separator on the line
    above it. A sig separator is dash-dash-space-newline.

    Sinan
     
    A. Sinan Unur, Nov 9, 2005
    #2
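The message shown above is a run-time warning rather than a fatal error, so a script can capture it and compare how much of the string actually matched. The following is a minimal sketch, not something posted in the thread: `probe_match` is a hypothetical helper name, and whether any warning appears at a given length depends on the Perl version and build.

```perl
use strict;
use warnings;

# Hypothetical helper: match the test pattern against $len spaces and report
# how many characters matched, plus any warnings Perl emitted while matching.
sub probe_match {
    my ($len) = @_;
    my $s = ' ' x $len;
    my @warnings;
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $ok      = $s =~ /(.|ab)*/;
    my $matched = $ok ? $+[0] - $-[0] : 0;   # length of the overall match
    return ($matched, \@warnings);
}

my ($matched, $warnings) = probe_match(100_000);
print "matched $matched of 100000 characters\n";
print "warning: $_" for @$warnings;
```

If the matched length is shorter than the string, the engine gave up early, which in a CGI context would otherwise only be visible in the server's error log.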

  3. Sisyphus Guest

    "A. Sinan Unur"
    ..
    ..
    I get the same on Windows 2000, perl5.8.4 - but on Windows 2000, perl5.8.7
    all I get is an "Unknown software exception ..." Windows popup - which in
    the past has usually meant that the stack overflowed.
    On linux, perl 5.8.7, it just outputs "Segmentation fault". Seems that
    somewhere along the way, perl has lost the capability of handling the error,
    and it's now left up to the operating system to deal with.

    Something else has changed, too. On my Win32 box, using perl 5.8.7, the
    "Unknown software exception..." occurs with just 5207 spaces assigned to
    $s. Using perl 5.8.4 (on the same box/os) there's no problem until at least
    32767 spaces are assigned to $s (when the perl error occurs).

    Cheers,
    Rob
     
    Sisyphus, Nov 10, 2005
    #3
  4. Interesting because I am using AS Perl 5.8.7 on Windows, and I cannot
    observe the behavior.

    Sinan
     
    A. Sinan Unur, Nov 10, 2005
    #4
  5. Sisyphus Guest

    Aaah ... my perl 5.8.7 was built using gcc (MinGW port), whereas my perl
    5.8.4 is AS build 810. So it looks like the compiler used has a bearing.

    In fact, I also have a perl 5.8.7 built using MSVC++ 7.0 (.NET), and I now
    find it exhibits the same behaviour as my perl 5.8.4 (and your AS perl
    5.8.7).

    That's notable in that I can't recall ever coming across a situation where
    the compiler used to build a native Win32 perl has had such a marked effect
    as we're seeing here.

    Cheers,
    Rob
     
    Sisyphus, Nov 10, 2005
    #5
  6. [A complimentary Cc of this posting was sent to James Marshall]

    This is a very old limitation of the Perl REx engine: it uses the C stack
    for backtracking-data storage; since the C stack is a very scarce
    resource, and running out of stack is catastrophic (as
    opposed to running out of heap), this makes things very restrictive.

    Actually, about 5 years ago I added the necessary infrastructure to
    the REx engine to keep these data on Perl stacks (as opposed to C
    stacks, Perl stacks can grow, and running out of stack can be caught -
    at least in some situations); moreover, I converted one part of the
    REx engine (out of 4 or 5 different parts) to use this infrastructure.

    At the time I did not have time to convert the remaining constructs. I
    hoped that "everybody" would be able to continue and "copy" the
    provided modification to the other constructs. Apparently, nobody
    volunteered.

    =======================================================

    Meanwhile, you have several alternatives:

    a) Make sure that your Perl is compiled with "stack checking code",
    so that running out of stack is not catastrophic (will not help
    with data processing :-(, but will help with bookkeeping ;-);

    b) Increase the amount of stack so that your data can be processed (not
    always feasible);

    c) Do not use ()* on complicated constructs (likewise).

    Sorry to be the bearer of sad news,
    Ilya
     
    Ilya Zakharevich, Nov 10, 2005
    #6
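One way to act on alternative (c) is to move the repetition out of the regex and into Perl code, matching one token per iteration with `\G` and `/gc`. This is only a sketch under assumptions: `match_repeated` is a hypothetical helper, not anything from the thread, and unlike `(.|ab)*` inside a larger pattern it never backtracks into tokens it has already consumed, so it is a substitute only where that backtracking is not needed.

```perl
use strict;
use warnings;

# Hypothetical helper: consume as many tokens as possible from the start of
# the string, using one small regex match per token instead of a single
# /(.|ab)*/ match. Returns the token count and the offset where it stopped.
sub match_repeated {
    my ($str) = @_;
    my $count = 0;
    pos($str) = 0;
    # \G anchors each match where the previous one ended; /c keeps pos()
    # intact when the final attempt fails.
    $count++ while $str =~ /\G(?:.|ab)/gc;
    return ($count, pos($str));
}

my ($tokens, $end) = match_repeated(' ' x 100_000);
print "consumed $tokens tokens, stopped at offset $end\n";
```

Because each regex match covers a single token, the depth of any recursion inside the engine no longer grows with the length of the input.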
  7. xhoster Guest

    Why match against that in the first place? Is there any case in which that
    pattern match will fail?

    Xho
     
    xhoster, Nov 10, 2005
    #7
  8. The actual pattern I'm using is much longer and more complex. The pattern
    above was the result of reducing it to a simple test case.


    James
     
    James Marshall, Nov 10, 2005
    #8
  9. OK, thanks very much for the explanation. Unfortunately, none of the
    alternatives are possible in this situation, except maybe the third-- I'll
    have to think about it some more. Thanks for fixing part of it five years
    ago; if I knew more about Perl internals I'd finish it myself.

    Thanks also to Rob for his experimentation and feedback under Windows.

    Cheers,
    James
    .............................................................................
    James Marshall Berkeley, CA @}-'-,--
    "Teach people what you know."
    .............................................................................


     
    James Marshall, Nov 10, 2005
    #9
  10. [A complimentary Cc of this posting was sent to James Marshall]

    Actually, there is

    d) Use ()* only on the construct I fixed 5 years ago. It may have been
    the "constant length of the group" case (I do not remember...); so
    if you could use
    (..|ab)*
    instead of your
    (.|ab)*

    this may be crucial. (Or maybe it was the length=1 case only?)

    You need to experiment,
    Ilya
     
    Ilya Zakharevich, Nov 10, 2005
    #10
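The experiment suggested above can be run safely by trying each candidate pattern in a child perl process, so that a segmentation fault in the child cannot kill the script doing the testing. A minimal sketch follows; `try_pattern` and the chosen lengths are illustrative assumptions, and the two patterns are not equivalent in what they match, so this only shows which ones survive on a given build.

```perl
use strict;
use warnings;

# Hypothetical helper: run one pattern against $len spaces in a separate perl
# process and describe how that process ended.
sub try_pattern {
    my ($pattern, $len) = @_;
    my $code = q{my ($p, $n) = @ARGV; my $s = ' ' x $n; $s =~ /$p/; exit 0};
    # List form of system(): no shell is involved, $^X is the running perl.
    system($^X, '-e', $code, $pattern, $len);
    return 'could not run child'     if $? == -1;
    return 'signal ' . ($? & 127)    if $? & 127;   # e.g. killed by SIGSEGV
    return 'exit '   . ($? >> 8)     if $? >> 8;
    return 'ok';
}

for my $pattern ('(.|ab)*', '(..|ab)*') {
    for my $len (1_000, 16_000, 100_000) {
        printf "%-10s %7d  %s\n", $pattern, $len, try_pattern($pattern, $len);
    }
}
```

Any warnings the child prints go to STDERR as usual; only the exit status is interpreted here.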
