Hard regexp problem

A

Aaron Sherman

I was stumped for a while today, when I ran into a regular expression
problem. Now, I'm a pretty good regexp hacker, but this problem was so
constrained and seemed at first glance to be so wrong for a regexp
that I didn't think it could be done. However, in specifying it to a
friend, I realized that it could.

I'm curious to see if there's a better solution.

The problem is this: You have a function call in a library that takes
a regexp, anchors it on both ends by adding "^" and "$" to the
beginning end, and then applies that to an input string.

sub foo {
my $re = shift;
my $in = <STDIN>;
die "'$in' is bad" unless $in =~ /^$re$/;
}

It exits with an error if the string does not match the regexp. I
wanted to accept strings that did NOT contain a particular substring.
Normally, I would have written:

!/xyz/

But a) my regexp is going to have to match the whole line, which the
above does not, and b) I can't tell the function to negate the regexp.

In the end, this is what I came up with:

qr{((?=[^x]|x[^y]|xy[^z]).)*}

That was the final version. I had done this previously:

qr{([^x]|x[^y]|xy[^z])*}

but that fails on:

xyxyz

which it allows to slip through because it consumes the leading "xyx"
on the first pass.
 
B

Brian McCauley

Abigail said:
Aaron Sherman ([email protected]) wrote on MMMMLXX September
MCMXCIII in <URL:() I was stumped for a while today, when I ran into a regular expression
() problem. Now, I'm a pretty good regexp hacker, but this problem was so
() constrained and seemed at first glance to be so wrong for a regexp
() that I didn't think it could be done. However, in specifying it to a
() friend, I realized that it could.
()
() I'm curious to see if there's a better solution.
()
() The problem is this: You have a function call in a library that takes
() a regexp, anchors it on both ends by adding "^" and "$" to the
() beginning end, and then applies that to an input string.
()
() sub foo {
() my $re = shift;
() my $in = <STDIN>;
() die "'$in' is bad" unless $in =~ /^$re$/;
() }
()
() It exits with an error if the string does not match the regexp. I
() wanted to accept strings that did NOT contain a particular substring.
() Normally, I would have written:
()
() !/xyz/

/^[^x]*(?:x(?!yx)[^x]*)*$/

Or:

/^(?:(?!xyz).)*$/

Er no,

/^(?!.*xyz).*$/

There have been numerous previous threads on this question.
 
J

Jeff 'japhy' Pinyan

I was stumped for a while today, when I ran into a regular expression
problem. Now, I'm a pretty good regexp hacker, but this problem was so
constrained and seemed at first glance to be so wrong for a regexp
that I didn't think it could be done. However, in specifying it to a
friend, I realized that it could.

I'm curious to see if there's a better solution.

The problem is this: You have a function call in a library that takes
a regexp, anchors it on both ends by adding "^" and "$" to the
beginning end, and then applies that to an input string.

sub foo {
my $re = shift;
my $in = <STDIN>;
die "'$in' is bad" unless $in =~ /^$re$/;
}

It exits with an error if the string does not match the regexp. I
wanted to accept strings that did NOT contain a particular substring.
Normally, I would have written:

!/xyz/

But a) my regexp is going to have to match the whole line, which the
above does not, and b) I can't tell the function to negate the regexp.

So make your regex consist of a negative look-ahead:

(?!.*xyz).*

When that gets put into your program, it'll be:

/^(?!.*xyz).*$/

which says "at the beginning of the string, if we CAN'T match '.*xyz',
then match .* and then the end of the string" which, as long as there
aren't embedded newlines, will work just fine.

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
Senior Dean, Fall 2004 % have long ago been overpaid?
RPI Corporation Secretary %
http://japhy.perlmonk.org/ % -- Meister Eckhart
 
L

Lukas Mai

Aaron Sherman schrob:
[...]
The problem is this: You have a function call in a library that takes
a regexp, anchors it on both ends by adding "^" and "$" to the
beginning end, and then applies that to an input string.
sub foo {
my $re = shift;
my $in = <STDIN>;
die "'$in' is bad" unless $in =~ /^$re$/;
}

Note that this doesn't work with foo('this|that'); you need /^(?:$re)$/.

HTH, Lukas
 
B

Brian McCauley

Abigail said:
Brian McCauley ([email protected]) wrote on MMMMLXXI September MCMXCIII in
<URL://
// Abigail wrote:
// > /^(?:(?!xyz).)*$/
//
// Er no,

No? You mean, any of my regexen are wrong?

For certain values of 'wrong' :)
Could you give an example of a
string were it gives the wrong result?

No. But not giving the right result is not the only way to be wrong.
Your solution is more complex than necessary both from a human and a
computational point of view.

use Benchmark;

my $s ='dshidahsidahsdias' x 1000;

timethese 1000 => {
Abigail => sub { $s =~ /^(?:(?!xyz).)*$/ },
Usual => sub { $s =~ /^(?!.*xyz).*$/ },
};
__END__
Benchmark: timing 1000 iterations of Abigail, Usual...
Abigail: 4 wallclock secs ( 3.85 usr + 0.00 sys = 3.85 CPU) @
259.74/s (n=1000)
Usual: 1 wallclock secs ( 0.60 usr + 0.00 sys = 0.60 CPU) @
1666.67/s (n=1000)
// /^(?!.*xyz).*$/

A trailing .*, always very useful in a regexp.

Given that the problem specification required the regex start with ^ and
end with $ the trailing .* seemed the best way way to fulfil this
requirement.
The disadvantage of '/^(?!.*xyz).*$/' is that it isn't easy to use it as
part of a larger expression - for instance if only part of your string
should not contain "xyz". My expression OTOH can easily be taken and
put in a larger expression.

This is completely true. And had the ability to be included in a larger
expression been part of the requirement then your solution would
probably have been the best one. Except of course if the ability to
be included in a larger expression _had_ been part of the requirement
then the problem specifiaction would have had to said what was to be
done with xyz that crossed the endpoint of the matched substring.
Depending on what was desired in that situation your solution may not
have met the requirement.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,899
Latest member
RodneyMcAu

Latest Threads

Top