In Perl 5.8.3 I need to match a sequence of 7 to 11 characters that
are all different from each other. The best solution I came up with is
the following:
    sub find711 {
        my ($arg) = shift;
        my ($string);
        my (%chars) = ();
        if ($arg =~ /([a-z]{7,11})/) {
            $string = $1;
            # a hash slice cannot be incremented; assigning to it
            # creates one key per distinct character
            @chars{split(//, $string)} = ();
            return 1 if keys(%chars) == length($string);
        }
        return 0;
    }
Is there a shorter or faster or more elegant solution, especially one
that uses just a single regexp, i.e. without that intermediate hash
array?
You can do it by making sure you can't match a character, followed by
other characters, followed by that character again.
    sub unique_7_11 {
        my $str = shift;
        return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/;
    }
That function returns true if the string is 7 to 11 lowercase letters,
with no letter used more than once. It returns false if the string
contains characters that are NOT lowercase letters, if it contains fewer
than 7 or more than 11 lowercase letters, or if any lowercase letter is
used more than once. (It DOES allow for a newline at the end of the
string, though, because $ also matches just before a trailing newline.
Use \z instead of $ if you don't want that.)
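
To make those cases concrete, here is a small sketch that runs the same regex over a few strings. The sample inputs are made up for illustration, and the "? 1 : 0" is only there so the result prints cleanly.

```perl
use strict;
use warnings;

# Same regex as above: reject any repeated letter, then require
# 7 to 11 lowercase letters between the anchors.
sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ ? 1 : 0;
}

# Sample inputs (made up): valid, repeated letter, too short,
# too long, uppercase letter, trailing newline.
for my $s ("abcdefg", "abcdefga", "abcdef", "abcdefghijkl", "abcDefg", "abcdefg\n") {
    (my $shown = $s) =~ s/\n/\\n/;   # make the newline visible when printing
    printf "%-14s %s\n", $shown, unique_7_11($s) ? "ok" : "rejected";
}
```

Only the first and the last string should come back as "ok"; the last one shows the trailing-newline allowance.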
First, notice that I added ^ and $ anchors to the regex. These anchor to
the beginning and end of the string. Your regex didn't have them, so it
could match any run of 7 to 11 lowercase letters inside a longer string,
which allowed strings of lengths GREATER than 11 to make it through.
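
A quick way to see the difference the anchors make (a sketch only; the 13-letter sample string is made up):

```perl
use strict;
use warnings;

my $long = "abcdefghijklm";   # 13 distinct letters, sample input

# Unanchored: the match succeeds, but $1 holds only the first 11 letters,
# so a uniqueness check on $1 says nothing about the whole string.
if ($long =~ /([a-z]{7,11})/) {
    print "unanchored matched '$1' (", length($1), " chars)\n";
}

# Anchored: the whole string must be 7 to 11 letters, so this fails.
print "anchored: ", ($long =~ /^[a-z]{7,11}$/ ? "matched" : "no match"), "\n";
```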
Second, I've added something to the beginning of the regex. It's a
"negative look-ahead assertion". Here it is:
    (?!.*([a-z]).*\1)
It looks complex, but let's break it down:
    .*        # match any number of characters
    ([a-z])   # match (and capture to $1) a lowercase letter
    .*        # match any number of characters
    \1        # match what was captured into $1
What this is doing is trying to match a character, and then seeing if it
can match that character again later in the string. This is ALL wrapped
inside a (?!...), which I said is a negative look-ahead. This means two
things: first, if the pattern inside it SUCCEEDS, the look-ahead FAILS
(because it's a "negative" look-ahead). Second, it only LOOKS AHEAD, it
doesn't actually consume anything in the string. It's like sending a
scout ahead of you to see if the coast is clear, and then having the scout
return. It doesn't end up changing your position in the string.
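
If you want to convince yourself that a look-ahead really consumes nothing, something like the following sketch should do. It uses the built-in @+ array (end offsets of the last match); the sample string is made up.

```perl
use strict;
use warnings;

my $str = "abcdefg";   # sample input

# The look-ahead inspects "abc" but does not consume it, so the overall
# match is zero characters long and ends at offset 0.
if ($str =~ /^(?=abc)/) {
    my $end = $+[0];
    print "look-ahead succeeded, match ends at offset $end\n";
}

# For the same reason, whatever follows the negative look-ahead still
# starts at the beginning and sees the entire string.
if ($str =~ /^(?!.*([a-z]).*\1)([a-z]+)$/) {
    print "no repeats, rest of pattern matched '$2'\n";
}
```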
You could reorder the regex a bit to make it more efficient:
    sub unique_7_11 {
        my $str = shift;
        return $str =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/;
    }
This time, we use a positive look-ahead at the front, to make sure FIRST
that the string meets our 7-11 lowercase character requirement. THEN we
use the negative look-ahead to make sure no character is repeated.
Or, you could use two regexes:
    sub unique_7_11 {
        my $str = shift;
        # use && here: "return A and B" parses as "(return A) and B"
        return $str =~ /^[a-z]{7,11}$/ && $str !~ /(.).*\1/;
    }
Here, we first make sure the string has 7-11 lowercase characters. THEN,
we make sure the string CANNOT match a character twice.
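
As a sanity check, the three approaches can be run side by side. This is only a sketch: the sub names and sample strings are invented for the comparison, and the two-regex version is written with && so that both tests are part of the returned expression.

```perl
use strict;
use warnings;

# The three variants discussed above, normalized to return 1 or 0.
sub single_regex    { $_[0] =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ ? 1 : 0 }
sub lookahead_first { $_[0] =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/ ? 1 : 0 }
sub two_regexes     { ($_[0] =~ /^[a-z]{7,11}$/ && $_[0] !~ /(.).*\1/) ? 1 : 0 }

# Sample inputs (made up): two valid strings, one repeat, one too short,
# one too long.
for my $s (qw(abcdefg abcdefghijk abcdefga abcdef abcdefghijkl)) {
    my @r = (single_regex($s), lookahead_first($s), two_regexes($s));
    printf "%-13s %d %d %d\n", $s, @r;
}
```

Each row should show the same value three times.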
For more information, please read the regex documentation that comes with
Perl, such as:
    perldoc perlre
    perldoc perlretut
    perldoc perlreref