Script dumps core. Any suggestions?

Gancy

Here is the snippet of the Perl script. I have Perl v5.8.5 built
for sun4-solaris. I have run this script on thousands of C and C++
headers and source files, and it runs as smoothly as my new ESTEEM car.
But I have one source file, toke.c, in my test case; as soon as the
script hits this file, it dumps core. I have tried, and am still trying,
to debug it, but have found no solution so far. If anybody can help me
with this, it would be greatly appreciated. I can upload the source file
(toke.c) as well as the core file frames, if needed.

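#!/usr/bin/perl
use strict;
use warnings;

# $np matches one balanced (...) group; (??{ $np }) recurses into
# nested groups.
my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )     # non-parentheses, without backtracking
      |
        (??{ $np })      # a nested balanced group
    )*
    \)
}x;

# Optional non-word character, an identifier optionally prefixed by
# * or **, then a balanced argument list (captured in $4).
my $funpat = qr/((\W)?(\*?\*?\w+)\s*($np))/;

sub get_fn_call {
    my ($cur_str) = @_;
    while ( $cur_str =~ m/$funpat/g ) {
        # Strip the outer parentheses and recurse into the arguments.
        get_fn_call($1) if $4 =~ /^\((.*)\)$/s;
    }
}

open my $fh, '<', 'toke.c' or die "Cannot open toke.c: $!";
my $tstring = do { local $/; <$fh> };    # slurp the whole file
close $fh;

get_fn_call($tstring);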

xhoster

Gancy said:
Here is the snippet of the Perl script. I have Perl v5.8.5 built
for sun4-solaris. I have run this script on thousands of C and C++
headers and source files, and it runs as smoothly as my new ESTEEM car.
But I have one source file, toke.c, in my test case; as soon as the
script hits this file, it dumps core.

You are using a highly experimental feature of Perl, the (??{ })
construct. I guess the experiment failed in your case. Can you monitor
memory usage and see whether it becomes exorbitant somewhere? Perhaps
from excessive nesting of parentheses in the toke.c file?

I can't make it dump core with the simple test case below, but it
doesn't seem to give the right answer, either. It seems like it should
always match the outermost set of parentheses, but it doesn't:

my $re = qr{\((?:(?>[^()]+)|(??{$re}))*\)};
my $x='(j)';
foreach (1..10) {
    print "Starting loop $_ $x\n";
    print "matched $1\n" if $x=~/($re)/;
    $x = "($x)";
};
__END__
Starting loop 1 (j)
matched (j)
Starting loop 2 ((j))
matched ((j))
Starting loop 3 (((j)))
matched ((j))
Starting loop 4 ((((j))))
matched ((j))
Starting loop 5 (((((j)))))
matched ((j))
Starting loop 6 ((((((j))))))
matched ((j))
Starting loop 7 (((((((j)))))))
matched ((j))
Starting loop 8 ((((((((j))))))))
matched ((j))
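For comparison, here is a sketch of the same test with the variable declared in its own statement before the qr// is built, as Anno does in his version further down the thread. In `my $re = qr{...(??{$re})...}` the `$re` inside the code block is not yet the lexical being declared (a `my` variable only comes into scope after the statement that declares it), so it refers to a different, presumably undefined, `$re`. Whether that alone explains the output above is an assumption that has not been checked here.

```perl
use strict;
use warnings;

# Declare first, then assign: the (??{ $re }) code block now refers to
# this same lexical $re when the match runs.
my $re;
$re = qr{ \( (?: (?> [^()]+ ) | (??{ $re }) )* \) }x;

my $x = '(j)';
foreach my $n (1 .. 8) {
    print "Starting loop $n $x\n";
    print "matched $1\n" if $x =~ /($re)/;
    $x = "($x)";    # wrap in one more pair of parentheses
}
```

With the declaration split this way, each iteration should report the whole string as matched.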

Xho

xhoster

Gancy said:
Here is the snippet of the Perl script. I have Perl v5.8.5 built
for sun4-solaris. I have run this script on thousands of C and C++
headers and source files, and it runs as smoothly as my new ESTEEM car.
But I have one source file, toke.c, in my test case; as soon as the
script hits this file, it dumps core.

Is the (??{ }) feature still considered highly experimental in 5.8.5? I
guess the experiment failed in your case. Can you monitor memory usage
and see whether it becomes exorbitant somewhere? Perhaps from excessive
nesting of parentheses in the toke.c file?

I can't make it dump core with the simple test case below, but it
doesn't seem to give the right answer, either. It seems like it should
always match the outermost set of parentheses, but it doesn't:

(I should mention, I'm using 5.8.0)

my $re = qr{\((?:(?>[^()]+)|(??{$re}))*\)};
my $x='(j)';
foreach (1..10) {
    print "Starting loop $_ $x\n";
    print "matched $1\n" if $x=~/($re)/;
    $x = "($x)";
};
__END__
Starting loop 1 (j)
matched (j)
Starting loop 2 ((j))
matched ((j))
Starting loop 3 (((j)))
matched ((j))
Starting loop 4 ((((j))))
matched ((j))
Starting loop 5 (((((j)))))
matched ((j))
Starting loop 6 ((((((j))))))
matched ((j))
Starting loop 7 (((((((j)))))))
matched ((j))
Starting loop 8 ((((((((j))))))))
matched ((j))

Xho

Anno Siegel

xhoster said:
Is the (??{ }) feature still considered highly experimental in 5.8.5? I
guess the experiment failed in your case. Can you monitor memory usage
and see whether it becomes exorbitant somewhere? Perhaps from excessive
nesting of parentheses in the toke.c file?

I can't make it dump core with the simple test case below, but it doesn't

[...]

I have replied to this in the other thread the OP started about it. To
summarize: it does indeed segfault with toke.c from the Perl source
(v5.8.6) as the input. The reason is simple: the recursive regex goes
into deep recursion. Perl doesn't warn about that; that's the bug, if
there is one. Otherwise, it's just that the regex is broken.

I am only slightly curious what bit of legal C in toke.c is throwing
it off. The OP claims it ran fine with many other sources.
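To look for that spot, one could count parentheses directly instead of running the regex. This is a minimal sketch, not something proposed in the thread: the hypothetical `busiest_outer_group` counts, for each outermost "(", how many "(" and ")" characters occur before it is closed (or until end of file if it never is). It assumes that this count is a rough proxy for how deep the regex has to recurse, and, like the regex, it ignores comments and quoting.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for each outermost '(' count
# the '(' and ')' characters up to its matching ')', or to end of input.
sub busiest_outer_group {
    my ($text) = @_;
    my ($depth, $line)     = (0, 1);
    my ($count, $start)    = (0, 0);   # current outermost group
    my ($best, $best_line) = (0, 0);   # largest group seen so far
    while ($text =~ /([()\n])/g) {
        my $ch = $1;
        if ($ch eq "\n") { $line++; next }
        if ($ch eq '(') {
            ($count, $start) = (0, $line) if $depth == 0;
            $depth++;
        }
        elsif ($depth > 0) {
            $depth--;
        }
        else {
            next;                      # stray ')' at top level: ignore
        }
        $count++;
        ($best, $best_line) = ($count, $start) if $count > $best;
    }
    return ($best, $best_line, $depth);   # $depth > 0: '(' left open
}

if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    my ($n, $at, $open) = busiest_outer_group($text);
    print "largest outer group: $n parens, starting on line $at; ",
          "$open '(' still unclosed at end of file\n";
}
```

Run it as, for example, `perl busiest.pl toke.c` (the script name is hypothetical). A large count together with a nonzero number of unclosed "(" would point at an opener that the regex treats as enclosing the rest of the file.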

Anno

xhoster

Anno Siegel said:
I have replied to this in the other thread the OP started about it.

Multiposting strikes again! Gancy, shame on you.


To summarize: it does indeed segfault with toke.c from the Perl source
(v5.8.6) as the input. The reason is simple: the recursive regex goes
into deep recursion. Perl doesn't warn about that; that's the bug, if
there is one. Otherwise, it's just that the regex is broken.

His regex seems to come right out of the documentation (it serves as an
example of the (??{ }) feature in perldoc perlre), which doesn't
necessarily mean it isn't broken, but it does give it enough exposure
that I was interested in looking into it. But I looked into it, was
confused, and gave up.

Xho

Anno Siegel

xhoster said:
His regex seems to come right out of the documentation (it serves as an
example of the (??{ }) feature in perldoc perlre), which doesn't
necessarily mean it isn't broken, but it does give it enough exposure
that I was interested in looking into it. But I looked into it, was
confused, and gave up.

Ah, that's where it's from.

It's fragile, not broken per se. Too many parentheses inside one outer
pair throw it off:

my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;

$_ = 'a (' . '(many)' x 5000 . ')';
/$re/;

An initial non-parenthesized part ("a ") must be present to trigger
recursion on the outer parentheses. From then on, recursion goes one
level deeper for every "(" and every ")". Too much is too much.
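Since the depth grows with every parenthesis, one alternative (a swapped-in technique, not something suggested in the thread) is to find outermost balanced groups with a plain depth counter, which uses no extra stack per parenthesis. The helper name `outer_groups` is hypothetical; it is a sketch under the assumption that raw "(" and ")" characters are all that matter, exactly as in the regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect every outermost
# balanced (...) group using a depth counter instead of (??{ }) recursion,
# so the stack used does not grow with the number of parentheses.
sub outer_groups {
    my ($text) = @_;
    my ($depth, $start, @groups) = (0, 0);
    while ($text =~ /([()])/g) {
        if ($1 eq '(') {
            $start = pos($text) - 1 if $depth == 0;
            $depth++;
        }
        elsif ($depth > 0 && --$depth == 0) {
            push @groups, substr($text, $start, pos($text) - $start);
        }
        # a ')' with no open '(' is ignored
    }
    return @groups;    # a '(' that is never closed produces no group
}

# Anno's input: one outer pair around 5000 "(many)" groups.
my $big = 'a (' . '(many)' x 5000 . ')';
my @g = outer_groups($big);
printf "%d group(s), first is %d chars long\n", scalar @g, length $g[0];
```

The trade-off is that this only locates group boundaries; any further parsing of the contents has to be done separately.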

How does this happen in toke.c? It's C, not Lisp, after all. Early on
(line 111 of 8090) there is this part of a comment:

* TOKEN : generic token (used for '(', DOLSHARP, etc)

The regex knows nothing about comments or quoting, so it tries to parse
the rest of the source as if it were enclosed in the spurious "'('".
Too much is too much.

The error isn't in the regex, as I prematurely assumed, but in its
careless application.
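One way to apply the regex less carelessly, sketched here as an assumption rather than anything proposed in the thread, is to blank out comments and string/character literals before matching, so that a "'('" inside a comment can no longer open a group. The hypothetical `strip_c_noise` below is a rough approximation of C lexing, not a full tokenizer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread), a rough approximation only:
# overwrite C comments and string/character literals with spaces so that
# parentheses inside them are invisible to a paren-matching regex.
# Newlines are kept, so line numbers stay the same. It does not try to
# handle the preprocessor, trigraphs or other corners of C lexing.
sub strip_c_noise {
    my ($src) = @_;
    $src =~ s{
        (   /\* .*? \*/            # block comment
          | //[^\n]*               # line comment (C99 / C++)
          | "(?:[^"\\\n]|\\.)*"    # string literal
          | '(?:[^'\\\n]|\\.)*'    # character literal
        )
    }{
        my $m = $1;
        $m =~ tr/\n/ /c;           # every char except newline -> space
        $m;
    }gsex;
    return $src;
}

# Usage with the names from the original script:
#   get_fn_call(strip_c_noise($tstring));
```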

Anno