Kevin said:
I have a string with comma separated tags:
"a, b, c, d, e, f"
It's rather easy to write something to express a boolean OR:
a OR b OR c = (^|,(\s)+)(a|b|c)[,$]
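Um, I don't think the "[,$]" is doing what you think it should be
doing. Inside a character class, '$' no longer acts as an
end-of-string anchor, so "[,$]" can't match at the end of the string
(in fact, I'm not sure it'll even compile). What you should use in its
place is "(,|$)".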
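To illustrate, here is a minimal sketch of the OR pattern with that
fix applied. The sub name has_a_b_or_c and the sample strings are made
up for this example, and I've written "(\s)+" as "\s+", which matches
the same text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the comma-separated tag list
# contains the tag a, b, or c as a whole element, and 0 otherwise.
sub has_a_b_or_c {
    my ($tags) = @_;
    # (^|,\s+) : start of string, or a comma followed by whitespace
    # (,|$)    : a comma, or the end of the string
    return $tags =~ m/(^|,\s+)(a|b|c)(,|$)/ ? 1 : 0;
}

for my $tags ("x, b, y", "x, y, c", "x, abc, y") {
    printf "%-10s %s\n", $tags, has_a_b_or_c($tags) ? "match" : "no match";
}
```

The first two sample strings should match and the third should not,
because "abc" is not delimited as a, b, or c on its own.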
What I would like to know is if there is a way to express AND or NOT:
1. (a OR b) AND c
2. (a OR b) AND NOT c
3. (a OR b) AND NOT (c OR d)
I imagine there is no nice way to do this without doing something like
writing out your AND clause before and after whatever OR clause you are
using, which would become really messy for more complicated expressions,
but perhaps someone knows of some way to do this.
As far as I know, writing out your AND clause before and after
whatever OR clause you are using will be the simplest and most readable
solution. However, you can do what you want using more complicated
expressions, taking advantage of Perl's extended regular expressions.
(You can read "perldoc perlre" to find out more about them.)
Let me warn you, though, they can be rather "messy." For this
reason, instead of searching for commas (or the beginning/end of the
string) like you have, I'll just leave those out, as if all the
elements were one character long. This won't always be the case, of
course, but I figure that you'll be able to add the delimiter detection
in yourself later; for now, I'll leave it out for simplicity's sake.
1. (a OR b) AND c
For this one, you basically want to search for (a|b), but you also
want to look for c, which may come before or after (a|b). So you can
use this regular expression:
m/(a|b).*c|c.*(a|b)/
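Here is a small sketch of how that expression behaves. The sub name
a_or_b_and_c and the sample strings are only for illustration, and the
terms are still assumed to be one character long with no delimiter
checking.

```perl
use strict;
use warnings;

# (a OR b) AND c: the 'c' may come after or before the (a|b).
sub a_or_b_and_c {
    my ($tags) = @_;
    return $tags =~ m/(a|b).*c|c.*(a|b)/ ? 1 : 0;
}

for my $tags ("a, x, c", "c, x, b", "a, b", "c, d") {
    printf "%-10s %s\n", $tags, a_or_b_and_c($tags) ? "match" : "no match";
}
```

The first two strings contain a 'c' together with an 'a' or a 'b', in
either order, so they match. The last two are each missing one side of
the AND, so they don't.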
2. (a OR b) AND NOT c
This one is trickier, because in order to verify that there is no
'c', you must search the entire line. If 'c' were just one character
long, we could get away with using the [^c] character class, like this:
m/^[^c]*(a|b)[^c]*$/
This searches for 'a' or 'b', but makes sure that ALL the characters
before AND after are NOT 'c'.
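As a rough sketch (the sub name a_or_b_and_not_c and the test strings
are my own, and the terms are still single characters):

```perl
use strict;
use warnings;

# (a OR b) AND NOT c, for a one-character 'c'.
# The ^ and $ anchors force the whole string to be checked, so a 'c'
# anywhere before or after the (a|b) makes the match fail.
sub a_or_b_and_not_c {
    my ($tags) = @_;
    return $tags =~ m/^[^c]*(a|b)[^c]*$/ ? 1 : 0;
}

for my $tags ("a, d", "d, b, e", "a, c", "c, a", "d, e") {
    printf "%-10s %s\n", $tags, a_or_b_and_not_c($tags) ? "match" : "no match";
}
```

Only the first two strings should match. The next two contain a 'c',
and the last one has neither 'a' nor 'b'.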
However, it's likely that your 'c' term won't be one character long.
In that case, you'll probably want to use a "negative look-ahead"
assertion (again, look it up in "perldoc perlre" if you want to read
details about it -- this is one of the extended regular expressions I
mentioned earlier). That way we would have:
m/^((?!c).)*(a|b)((?!c).)*$/
This pattern is essentially the same as the previous one, except
instead of having "[^c]" (which assumes that 'c' is one character
long), we have "((?!c).)". What this pattern matches is any single
character, provided that 'c' does not begin at that spot.
To clarify, if 'c' was actually the string "car", you would write
the term as "((?!car).)". Notice that you still use one '.' even
though "car" is three letters long. That's because the '.' only
matches one character, but with the (?!car) in front of it it'll only
match if that character is not a 'c' that is followed by an 'a' and an
'r'.
(If you put three '.' instead of just one, then the "((?!car)...)*"
expression would match multiples of three characters, which is not what
you want.)
Of course, just as a '*' follows "[^c]", one should also follow
"((?!c).)", because you will usually need to scan past more than one
character before and after "(a|b)" (and since '*' also allows zero
characters, "(a|b)" may still sit at the very start or end of the
string).
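To make the "car" case concrete, here is a minimal sketch. The sub
name a_or_b_and_not_car and the sample strings are invented for the
example, and delimiters are still ignored, so a tag like "cart" would
also count as containing "car".

```perl
use strict;
use warnings;

# (a OR b) AND NOT "car", where the excluded term is three characters.
# ((?!car).)* consumes one character at a time, but only at positions
# where the text "car" does not begin.
sub a_or_b_and_not_car {
    my ($tags) = @_;
    return $tags =~ m/^((?!car).)*(a|b)((?!car).)*$/ ? 1 : 0;
}

for my $tags ("a, dog", "b, cat", "a, car", "car, b", "dog, emu") {
    printf "%-10s %s\n", $tags, a_or_b_and_not_car($tags) ? "match" : "no match";
}
```

The first two strings should match: "cat" starts with a 'c', but the
look-ahead only rejects a 'c' that is followed by "ar". The two strings
containing "car" fail, and so does the last one, which has no 'a' or
'b' at all.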
3. (a OR b) AND NOT (c OR d)
This one is pretty much the same as the previous example, except
that instead of using "(?!c)" you'll replace it with "(?!c|d)", like
this:
m/^((?!c|d).)*(a|b)((?!c|d).)*$/
That's pretty much it. Are the expressions messy? Most people
would say yes, so you might want to seriously consider breaking out
each of the above regular expressions into more than one, if only for
readability's sake.
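For comparison, here is one way the third expression might be broken
into two simpler matches. This is only a sketch: the sub names
single_regex and two_regexes are made up, and it keeps the same
simplification as above (one-character terms, no delimiter checking).

```perl
use strict;
use warnings;

# (a OR b) AND NOT (c OR d) as one expression with look-aheads.
sub single_regex {
    my ($tags) = @_;
    return $tags =~ m/^((?!c|d).)*(a|b)((?!c|d).)*$/ ? 1 : 0;
}

# The same test as two matches: one positive (=~), one negated (!~).
sub two_regexes {
    my ($tags) = @_;
    return ($tags =~ m/a|b/ && $tags !~ m/c|d/) ? 1 : 0;
}

for my $tags ("a, e", "b, f, e", "a, c", "d, b", "e, f") {
    printf "%-10s single=%d split=%d\n",
        $tags, single_regex($tags), two_regexes($tags);
}
```

For these single-line sample strings the two subs should agree on
every input, and the split version reads much closer to the original
boolean expression.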
Another tip: Whenever you use a complicated regular expression,
consider putting a comment right above it that clearly states what it's
searching for. For example, you might write your code to look like:
# Look for (a OR b) AND NOT (c OR d):
if ($string =~ m/^((?!c|d).)*(a|b)((?!c|d).)*$/)
This will make your code easier to understand and to debug. Without
the comment, any maintainer that comes after you will have a puzzle to
solve in order to figure out what you really meant. And if for some
reason you (or a future maintainer) introduced a bug in your regular
expression, the comment can serve as a guide to determine whether or
not a bug actually exists in the regular expression (otherwise, it
would be difficult to know for sure).
I hope this helps, Kevin.
-- Jean-Luc