Aaron Sherman
I was stumped for a while today when I ran into a regular expression
problem. Now, I'm a pretty good regexp hacker, but this problem was so
constrained and seemed at first glance to be so wrong for a regexp
that I didn't think it could be done. However, in specifying it to a
friend, I realized that it could.
I'm curious to see if there's a better solution.
The problem is this: you have a library function that takes a regexp,
anchors it at both ends by adding "^" to the beginning and "$" to the
end, and then applies it to an input string:
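sub foo {
    my $re = shift;        # caller-supplied regexp (e.g. a qr// object)
    my $in = <STDIN>;      # one line, trailing newline included
    die "'$in' is bad" unless $in =~ /^$re$/;   # anchored at both ends
}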
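Because foo() reads STDIN and dies, it is awkward to experiment with directly. The sketch below uses a hypothetical stand-in, accepts() (not part of the library), that applies the same /^$re$/ anchoring to a string passed as an argument and returns 1 or 0. It also shows that Perl's $ matches just before a trailing newline, so a line fresh from <STDIN> can still match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for foo(): same anchoring, but the input is an
# argument and the result is returned instead of calling die.
sub accepts {
    my ($re, $in) = @_;
    return $in =~ /^$re$/ ? 1 : 0;
}

print accepts(qr{\d+}, "123\n"), "\n";   # 1: $ matches before the trailing newline
print accepts(qr{\d+}, "12a\n"), "\n";   # 0: "a" is not consumed by \d+
```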
It dies with an error if the string does not match the regexp. I
wanted it to accept only strings that did NOT contain a particular substring.
Normally, I would have written:
!/xyz/
But (a) my regexp has to match the whole line, which the above does
not, and (b) I can't tell the function to negate the match.
In the end, this is what I came up with:
qr{((?=[^x]|x[^y]|xy[^z]).)*}
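A minimal sketch for checking the final pattern, assuming input lines arrive with their trailing newline as <STDIN> returns them; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# The final pattern; foo() interpolates it as /^$re$/.
my $re = qr{((?=[^x]|x[^y]|xy[^z]).)*};

# Sample lines as <STDIN> would return them, trailing newline included.
my %result;
for my $line ("abc\n", "xyz\n", "axyzb\n", "xyxyz\n", "xxyz\n", "abxy\n") {
    chomp(my $shown = $line);   # copy without the newline, for display only
    $result{$shown} = $line =~ /^$re$/ ? "accepted" : "rejected";
    print "$shown: $result{$shown}\n";
}
```

It reports "abc" and "abxy" as accepted and the other four as rejected. The newline matters for one edge case: on a string without it (for example after chomp), a line ending in "x" or "xy", such as "abx", is rejected, because every lookahead alternative that starts with "x" needs at least one more character; with the newline attached, "\n" satisfies [^y] or [^z] and the line passes.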
That was the final version. Before that, I had tried:
qr{([^x]|x[^y]|xy[^z])*}
but that fails on:
xyxyz
which it lets slip through: the xy[^z] alternative consumes the leading
"xyx" on the first pass, and the remaining "yz" then matches [^x] one
character at a time.
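To see the two versions side by side on the failing input, this sketch runs both against "xyxyz" (again with a trailing newline, as <STDIN> would supply). The earlier version accepts it and the final version rejects it.

```perl
use strict;
use warnings;

my $earlier = qr{([^x]|x[^y]|xy[^z])*};
my $final   = qr{((?=[^x]|x[^y]|xy[^z]).)*};
my $in      = "xyxyz\n";

# Earlier version: xy[^z] eats "xyx" in one step, then [^x] eats the rest.
my $earlier_ok = $in =~ /^$earlier$/ ? 1 : 0;
# Final version: each iteration consumes one character, and at the second "x"
# no lookahead alternative holds, so the anchored match fails.
my $final_ok = $in =~ /^$final$/ ? 1 : 0;

print "earlier: ", ($earlier_ok ? "accepted" : "rejected"), "\n";
print "final:   ", ($final_ok   ? "accepted" : "rejected"), "\n";
```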