Regex: Why is overreaching necessary?

Shannon Jacobs

I'm not sure how to take your curiosity, but as I noted, here is a
minimal example of a test case for the latest (but not greatest)
problem.

http://shanenj.tripod.com/cgi-bin/p...orttype=none&datetype=comp&numorname=authnums

http://shanenj.tripod.com/cgi-bin/p...orttype=none&datetype=comp&numorname=authnums

The first one properly returns exactly two books, and the second one
somehow returns a third book. These two queries came out of a webpage
that uses a fair amount of JavaScript, but I feel like that's not
related to the problem, since as near as I can see, the resulting
queries are identical except that the two numbers are reversed.

For comparison, the following query produces the same three hits as
the second query, but that is the correct result given the code branch
it follows...

http://shanenj.tripod.com/cgi-bin/p...orttype=none&datetype=comp&numorname=authnums

The code where the regex is used has already been posted (and another
tip of the hat to anno4), but I can't really figure out how to snip it
down more than that in line with your suggestion. It is obvious that
something slightly odd is going on. However, I'm pretty sure Perl is
doing exactly what I told it to do, and the problem is that what I
told it is not quite what I wanted...

Obviously my original hypothesis about the leading space is wrong...
(Unless I'm also wrong about it being a single problem. It certainly
could be two...) Right now I'm classifying it as a problem I don't
understand well enough to even ask about, though I will probably have
an opportunity to discuss it with some heavy Perlers at the next Linux
meeting.

Shannon Jacobs said:
Having fixed one problem, my latest testing discovered yet another
peculiarity which could easily consume much more time than it's
worth... I'm only going to mention it as an example of the peculiarity
of my code... I have discovered that using the search target 2471|2396
returns different results from the search target 2396|2471. I don't
think this can really be Perl's fault. Fortunately, however, every
problem that I've discovered (so far) is in the direction of false
positives, and that is not very troublesome for this application...
My current belief is that this newly discovered flaw is somewhere on
the HTML side, possibly in my JavaScript. However if I can't find it
there, and if it seems to be in the Perl, I may be back. Thanks again.

Can you post an example demonstrating this remarkable discovery?
Something along the lines of the following:

#!/usr/local/bin/perl
use strict;
use warnings;

my $target1 = '2471|2396';
my $target2 = '2396|2471';

# Test every line of the DATA section against both targets.
while (<DATA>) {
    chomp;
    if (/$target1/) {
        print "$_ matches $target1\n";
    } else {
        print "$_ doesn't match $target1\n";
    }
    if (/$target2/) {
        print "$_ matches $target2\n";
    } else {
        print "$_ doesn't match $target2\n";
    }
}

__DATA__
2471
2396
247
239

... which produces...

2471 matches 2471|2396
2471 matches 2396|2471
2396 matches 2471|2396
2396 matches 2396|2471
247 doesn't match 2471|2396
247 doesn't match 2396|2471
239 doesn't match 2471|2396
239 doesn't match 2396|2471

Thanks.
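
Since the problems reported so far are all false positives, one thing that may be worth ruling out is substring matching. The sketch below is not a diagnosis of the code in this thread. It only shows that an unanchored alternation such as 2471|2396 also matches longer numbers that contain one of the ids, while an anchored one does not. The helper name matches_whole_id and the extra test ids 12471 and 23960 are assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): true only if $id is
# exactly one of the '|'-separated alternatives in $target.
sub matches_whole_id {
    my ($id, $target) = @_;
    # Treat each alternative as literal text, then anchor the whole group.
    my $alternation = join '|', map { quotemeta($_) } split /\|/, $target;
    return $id =~ /\A(?:$alternation)\z/ ? 1 : 0;
}

# 12471 and 23960 are invented ids that merely contain a real id.
for my $id (qw(2471 2396 12471 23960)) {
    my $loose  = $id =~ /2471|2396/ ? 'match' : 'no match';
    my $strict = matches_whole_id($id, '2471|2396') ? 'match' : 'no match';
    print "$id: unanchored=$loose anchored=$strict\n";
}
```

With this data, 12471 and 23960 match only the unanchored pattern. The order of the alternatives makes no difference to either result.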
 
shanen

Thank you for your interest, but I'm trying to make it clear that I'm
willing to leave the issue at this time. It's not actually a piece of
production code...

I do think the new problem might be slightly interesting, but it also
seems complicated and not worth much effort. My latest hypothesis is
that it involves an ordering assumption which is normally preserved.
The original design of the program assumes that the values in
the search target are in sorted order, and that is still true in the
actual execution of the program. However, my special test data
violates that ordering assumption, and the interesting thing is that
this may somehow allow the flow of control to bypass a critical
branch, or possibly to reenter a later code branch that is normally
skipped.
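
If the ordering assumption does turn out to matter, one way to take it out of the picture would be to sort the values before the rest of the program sees them. This is only a sketch under stated assumptions: the helper name normalize_target is hypothetical, and it assumes the search target is a '|'-separated list of purely numeric ids, as in the test data above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): return the search
# target with its ids in ascending numeric order.
sub normalize_target {
    my ($target) = @_;
    # Drop empty fields, e.g. from a leading or doubled '|'.
    my @ids = grep { length } split /\|/, $target;
    return join '|', sort { $a <=> $b } @ids;
}

# Both orderings of the test target normalize to the same string.
print normalize_target('2471|2396'), "\n";
print normalize_target('2396|2471'), "\n";
```

Both calls print 2396|2471, so a later code branch that relies on sorted input would see the same value either way.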
 
