Endless loop in Text::Format::reformat()

G

Gunnar Hjalmarsson

It has happened recently (twice) that invoking CGI::ContactForm at my
web site has resulted in endless loops, which have made the web server
error log grow to about 700 Mb in a couple of seconds, thus consuming
the remaining space of my VPS. :(

This is what was written repeatedly to the error log: "Use of
uninitialized value in concatenation (.) or string at
/usr/lib/perl5/site_perl/5.8.1/CGI/ContactForm.pm line 600."

Line 600 of ContactForm.pm consists of:

push @output, "$1 ";

and to see it in context, a URL to the CVS repository:
http://gunnar.cc/cgi-bin/cvsweb.cgi/contactform/lib/CGI/ContactForm.pm?annotate=1.45;cvsroot=gunnar

The code in question is the reformat() subroutine, which was copied from
the module Text::Format. It's used to format the body nicely before
sending a message.

I have failed to figure out what kind of data that might cause
reformat() to go into those endless loops. Therefore, any idea of what
the problem is would be much appreciated.
 
B

Brian McCauley

Gunnar said:
It has happened recently (twice) that invoking CGI::ContactForm at my
web site has resulted in endless loops, which have made the web server
error log grow to about 700 Mb in a couple of seconds, thus consuming
the remaining space of my VPS. :(

This is what was written repeatedly to the error log: "Use of
uninitialized value in concatenation (.) or string at
/usr/lib/perl5/site_perl/5.8.1/CGI/ContactForm.pm line 600."

Line 600 of ContactForm.pm consists of:

push @output, "$1 ";

and to see it in context,

if ( $line =~ /^(.{$min,$opt1}) (.*)/ ||
$line =~ /^(.{$min,$max1}) (.*)/ || $line =~ /^(.{$min,})? (.*)/ )
{
push @output, "$1 ";
$line = $2;
}

The way that $1 can be undef is if the first to two patterns don't
match and the third pattern matches but the optional subexpression does
not. This will happen if the first character of $line is a space.

I can't immediately see the cause of the endless looping but there's
your warning.
 
B

Bob Walton

Gunnar said:
It has happened recently (twice) that invoking CGI::ContactForm at my
web site has resulted in endless loops, which have made the web server
error log grow to about 700 Mb in a couple of seconds, thus consuming
the remaining space of my VPS. :(

This is what was written repeatedly to the error log: "Use of
uninitialized value in concatenation (.) or string at
/usr/lib/perl5/site_perl/5.8.1/CGI/ContactForm.pm line 600."

Line 600 of ContactForm.pm consists of:

push @output, "$1 ";

and to see it in context, a URL to the CVS repository:
http://gunnar.cc/cgi-bin/cvsweb.cgi/contactform/lib/CGI/ContactForm.pm?annotate=1.45;cvsroot=gunnar


The code in question is the reformat() subroutine, which was copied from
the module Text::Format. It's used to format the body nicely before
sending a message.

I have failed to figure out what kind of data that might cause
reformat() to go into those endless loops. Therefore, any idea of what
the problem is would be much appreciated.

Hmmm...almost certainly this is happening because $1 is undef at line
600, since that is the only concatenation (interpolation actually does a
concatenation). Looking at your code:

....
595:} elsif ( $line =~ /^(.{$min,$opt1}) (.*)/ ||
596: $line =~ /^(.{$min,$max1}) (.*)/ ||
$line =~ /^(.{$min,})? (.*)/ ) {
597: # 1. Try to find a string as long as opt_length.
598: # 2. Try to find a string as long as max_length.
599: # 3. Take the first word.
600: push @output, "$1 ";
....

it looks like the third alternative could match with $1 undef if $line
started with a space character and there were no other space characters
in $line (in which case the first two alternatives would fail, assuming
$min>0). Is that a possibility?
 
B

Brian McCauley

Brian said:
if ( $line =~ /^(.{$min,$opt1}) (.*)/ ||
$line =~ /^(.{$min,$max1}) (.*)/ || $line =~ /^(.{$min,})? (.*)/ )
{
push @output, "$1 ";
$line = $2;
}

The way that $1 can be undef is if the first to two patterns don't
match and the third pattern matches but the optional subexpression does
not. This will happen if the first character of $line is a space.

I can't immediately see the cause of the endless looping but there's
your warning.

Got it...

Look at _stuff.

sub _stuff {
my ($text, $num_quotes) = @_;
if ($text =~ /^ / || $text =~ /^>/ || $text =~ /^From / ||
$num_quotes > 0) {
return " $text";
}
$text;
}

If $num_quotes=0 and $line starts with a space then the line

$line = '>' x $num_quotes . _stuff($line, $num_quotes);

Simply prepends a space to $line and then the code quoted in my
previous message simply strips of a leading space.

There's your infinite loop.
 
G

Gunnar Hjalmarsson

Brian said:
Got it...

Look at _stuff.

sub _stuff {
my ($text, $num_quotes) = @_;
if ($text =~ /^ / || $text =~ /^>/ || $text =~ /^From / ||
$num_quotes > 0) {
return " $text";
}
$text;
}

If $num_quotes=0 and $line starts with a space then the line

$line = '>' x $num_quotes . _stuff($line, $num_quotes);

Simply prepends a space to $line and then the code quoted in my
previous message simply strips of a leading space.

No, in that case it strips off both the leading spaces, since $min is 1,
so the contents of $line won't be identical at the next iteration of the
while loop.
There's your infinite loop.

Unfortunately I think it's more subtle than that. Please note that
CGI::ContactForm has worked just fine for about 3.5 years, and the
infinite loop problem has just appeared twice. I suspect it happened
when some spam robot invoked the script. Maybe I'd better concentrate on
preventing such robots from successfully submitting their crap...

Anyway, thanks for trying!
 
G

Gunnar Hjalmarsson

Bob said:
Hmmm...almost certainly this is happening because $1 is undef at line
600, since that is the only concatenation (interpolation actually does a
concatenation). Looking at your code:

...
595:} elsif ( $line =~ /^(.{$min,$opt1}) (.*)/ ||
596: $line =~ /^(.{$min,$max1}) (.*)/ ||
$line =~ /^(.{$min,})? (.*)/ ) {
597: # 1. Try to find a string as long as opt_length.
598: # 2. Try to find a string as long as max_length.
599: # 3. Take the first word.
600: push @output, "$1 ";
...

it looks like the third alternative could match with $1 undef if $line
started with a space character and there were no other space characters
in $line (in which case the first two alternatives would fail, assuming
$min>0). Is that a possibility?

Don't think so. Because of this line:

$line = '>' x $num_quotes . _stuff($line, $num_quotes);

$line will not begin with a single space when the regex at line 596 is run.

See also reply to Brian in this thread.

Thanks for trying, Bob!
 
B

Bob Walton

Gunnar said:
Bob Walton wrote:
....
Aha. Here is an input line that causes an infinite loop complete with
your warning :

$line=' >'.('a' x 114);

Try input to reformat() that contains a line like that. The number of
a's isn't important as long as there are enough of them.

Before the while loop, _unstuff gets rid of the leading space, then in
the while loop, _stuff puts the space back in front because of the >,
and then the third alternative in the elseif matches with $1 undef and
$2 the string starting with >. Then the while repeats on exactly the
same $line forever.
 
G

Gunnar Hjalmarsson

Bob said:
Aha. Here is an input line that causes an infinite loop complete with
your warning :

$line=' >'.('a' x 114);

Try input to reformat() that contains a line like that. The number of
a's isn't important as long as there are enough of them.

Before the while loop, _unstuff gets rid of the leading space, then in
the while loop, _stuff puts the space back in front because of the >,
and then the third alternative in the elseif matches with $1 undef and
$2 the string starting with >. Then the while repeats on exactly the
same $line forever.

I can't believe it was that simple!! Thought I'd tried everything and
anything; obviously not. :(

Thank you _very_ much, Bob!

Since I can't figure out why the first capturing parenthesis is
optional, I guess I'll simply remove the question mark.

$line =~ /^(.{$min,})? (.*)/
-------------------------^

Thanks again!
 
B

Brian McCauley

Gunnar said:
No, in that case it strips off both the leading spaces, since $min is 1,
so the contents of $line won't be identical at the next iteration of the
while loop.

Ah, well since $1 is null we know that /^(.+)? (.*)/ is matching but
the first subexpression is not. As far as I can see this can only be is
there's only one space. Which as you said means I can't explain the
infinite loop.
 
G

Gunnar Hjalmarsson

Brian said:
Ah, well since $1 is null we know that /^(.+)? (.*)/ is matching but
the first subexpression is not. As far as I can see this can only be is
there's only one space. Which as you said means I can't explain the
infinite loop.

Right, but you were close - see Bob's explanation in another sub-thread.

I'm about to update CGI::ContactForm, and have just reported the bug in
Text::Flowed: http://rt.cpan.org/Ticket/Display.html?id=21739
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top