How to match some characters different to each other?

Georg Wittig · Feb 2, 2004

Hi,

In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
are different to each other. The best solution I came up with is the
following,

sub find711 {
    my ($arg) = shift;
    my ($string);
    my (%chars) = ();
    if ($arg =~ /([a-z]{7,11})/) {
        $string = $1;
        @chars{split(//,$string)}++;
        return 1 if keys (%chars) == length ($string);
    }
    return 0;
}

Is there a shorter or faster or more elegant solution, especially one
that uses just a single regexp, i.e. without that intermediate hash
array?

Thanks for your help,

Jeff 'japhy' Pinyan · Feb 2, 2004

[posted & mailed]

You can do it by making sure you can't match a character, followed by
other characters, followed by that character again.

sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/;
}

That function returns true if the string is 7 to 11 lowercase letters,
with no letter used more than once. It returns false if the string
contains characters that are NOT lowercase letters, if it contains fewer
than 7 or more than 11 lowercase letters, or if any lowercase letter is
used more than once. (It DOES allow for a newline at the end of the
string, though, because $ also matches just before a trailing newline.)
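
As a quick check of that behaviour, here is a small sketch that runs the
pattern against a few sample strings. The test strings are my own, chosen
only for illustration, and I've wrapped the match in scalar() so the sub
returns a plain true/false value whatever context it is called in.

```perl
use strict;
use warnings;

sub unique_7_11 {
    my $str = shift;
    # scalar() forces a true/false result instead of a list of captures.
    return scalar( $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ );
}

# Illustrative inputs: unique, repeated letter, too short, too long,
# uppercase letter, and a trailing newline.
for my $s ('abcdefg', 'abcdefga', 'abcdef', 'abcdefghijkl', 'abcdefG', "abcdefg\n") {
    (my $shown = $s) =~ s/\n/\\n/;
    printf "%-14s %s\n", $shown, unique_7_11($s) ? 'match' : 'no match';
}
```

Only the first and last strings should be reported as a match.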

First, notice that I added ^ and $ anchors to the regex. These anchor to
the beginning and end of the string. Your regex didn't have them, so it
allowed strings of lengths GREATER than 11 to make it through.

Second, I've added something to the beginning of the regex. It's a
"negative look-ahead assertion". Here it is:

(?!.*([a-z]).*\1)

It looks complex, but let's break it down:

.* # match any number of characters
([a-z]) # match (and capture to $1) a lowercase letter
.* # match any number of characters
\1 # match what was captured into $1

What this is doing is trying to match a character, and then seeing if it
can match that character again later in the string. This is ALL wrapped
inside a (?!...), which I said is a negative look-ahead. This means two
things: first, if the pattern inside it SUCCEEDS, the look-ahead FAILS
(because it's a "negative" look-ahead). Second, it only LOOKS AHEAD, it
doesn't actually consume anything in the string. It's like sending a
scout ahead of you to see if the coast is clear, and then having the scout
return. It doesn't end up changing your position in the string.
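
To see the "doesn't consume anything" part in isolation, here is a minimal
sketch. The string and the letters being looked for are made up for the
demonstration.

```perl
use strict;
use warnings;

my $str = 'abcdefg';

# The positive look-ahead checks that a "c" appears somewhere ahead, but it
# consumes nothing, so the following (.) still captures the first character.
if ( $str =~ /^(?=.*c)(.)/ ) {
    print "captured '$1'\n";
}

# The negative form succeeds only when its inner pattern cannot match.
print "no 'z' ahead, so (?!.*z) succeeds\n" if $str =~ /^(?!.*z)/;
print "a 'c' is ahead, so (?!.*c) fails\n"  if $str !~ /^(?!.*c)/;
```

All three lines should print, and the captured character is 'a', not 'c'.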

You could reorder the regex a bit to make it more efficient:

sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/;
}

This time, we use a positive look-ahead at the front, to make sure FIRST
that the string meets our 7-11 lowercase character requirement. THEN we
use the negative look-ahead to make sure no character is repeated.

Or, you could use two regexes:

sub unique_7_11 {
    my $str = shift;
    return $str =~ /^[a-z]{7,11}$/ && $str !~ /(.).*\1/;
}

Here, we first make sure the string has 7-11 lowercase characters. THEN,
we make sure the string CANNOT match a character twice.
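
One thing to watch when combining two tests in a single return statement:
the low-precedence "and" binds more loosely than return, so
"return A and B" returns A without ever evaluating B. Here is a sketch
that uses && instead. The sub name unique_7_11_two_step is only for
illustration.

```perl
use strict;
use warnings;

sub unique_7_11_two_step {
    my $str = shift;
    # && binds tighter than return, so both tests are evaluated.
    return( $str =~ /^[a-z]{7,11}$/ && $str !~ /(.).*\1/ );
}

for my $s ('abcdefg', 'abcdefga') {
    printf "%-10s %s\n", $s, unique_7_11_two_step($s) ? 'ok' : 'rejected';
}
```

'abcdefga' passes the first test (8 lowercase letters) but repeats 'a',
so it is the second test that has to reject it.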

For more information, please read the regex documentation that comes with
Perl, such as:

perldoc perlre
perldoc perlretut
perldoc perlreref

Brian McCauley · Feb 2, 2004

In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
are different to each other.

Well, the way to test that a string has no repeated characters is

!/(.).*\1/

The way to get a sequence of 7 to 11 characters is

/(.{7,11})/

What is annoying is that the (?{ code }) assertion always succeeds
even if code evaluates to false.

I'd like to know what bright spark decided it should work this way
rather than doing the intuitively obvious and much more useful thing
of succeeding iff code returns a true value.

So we can't just do:

/(.{7,11})(?{ $1 !~ m{(.).*\1} })/

In principle I believe you can do:

/(.{7,11})(??{ $1 !~ m{(.).*\1} ? '' : '^' })/

(We know that '' will always match and '^' never ).

But that segfaults on 5.8.0 - I've not tried it on a later version.

The best solution I came up with is the following,

sub find711 {
    my ($arg) = shift;
    my ($string);
    my (%chars) = ();
    if ($arg =~ /([a-z]{7,11})/) {
        $string = $1;
        @chars{split(//,$string)}++;
        return 1 if keys (%chars) == length ($string);
    }
    return 0;
}

That inner bit can be replaced with

return 1 if $1 !~ /(.).*\1/;

But your "solution" considers the first string of 7 to 11 lowercase
letters in $arg. That does not match your problem definition (which
says nothing about letters) and also misses 'abcdefg' in 'fooabcdefg'
and in 'abcdefgfoo'.

Also you forgot to mention that you are only looking for a true/false
answer! You don't want to extract the match. Your problem simplifies to
looking for a string of exactly 7 characters, all different, since any
string of 11 non-repeated characters must start with a string of 7
non-repeated characters.

Is there a shorter or faster or more elegant solution,

for ( $arg =~ /(?=(.{7}))/g ) {
    # each captured 7-character window arrives in $_
    return 1 if $_ !~ /(.).*\1/;
}
return 0;
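
Wrapped up as a complete sub, the sliding-window idea might look like the
sketch below. The name has_unique_run and the sample strings are
hypothetical, and each window is tested through a named loop variable
rather than through $1. Since . does not match a newline, windows never
span a line break.

```perl
use strict;
use warnings;

# Returns 1 if any 7-character window of $arg has no repeated character,
# 0 otherwise.
sub has_unique_run {
    my $arg = shift;
    # In list context, //g with a capturing look-ahead returns one
    # 7-character window per starting position, overlaps included.
    for my $window ( $arg =~ /(?=(.{7}))/g ) {
        return 1 if $window !~ /(.).*\1/;
    }
    return 0;
}

print has_unique_run('fooabcdefg'), "\n";   # contains the window 'abcdefg'
print has_unique_run('aabbccddee'), "\n";   # every window repeats a letter
```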

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\

Tore Aursand · Feb 2, 2004

In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
are different to each other.

What do you mean, actually? Do you want to make sure that the characters
on position 7 through 11 are unique?

sub find711 {
    my ($arg) = shift;
    my ($string);
    my (%chars) = ();
    if ($arg =~ /([a-z]{7,11})/) {
        $string = $1;
        @chars{split(//,$string)}++;
        return 1 if keys (%chars) == length ($string);
    }
    return 0;
}

You don't need all those parentheses. Your sub seems to be better off
written like this, IMO:

sub find711 {
    my $arg = shift;
    my %chars;
    if ( $arg =~ /([a-z]{7,11})/ ) {
        my $string = $1;
        @chars{split(//, $string)}++;
        return 1 if keys %chars == length( $string );
    }
    return 0;
}
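
If the goal is only to check one string for repeated characters, a plain
%seen hash can stop at the first repeat instead of counting keys
afterwards. This is just a sketch; the name all_chars_unique is made up.

```perl
use strict;
use warnings;

sub all_chars_unique {
    my $string = shift;
    my %seen;
    for my $char ( split //, $string ) {
        # Post-increment returns the old count, so this is true on a repeat.
        return 0 if $seen{$char}++;
    }
    return 1;
}

print all_chars_unique('abcdefg'),  "\n";
print all_chars_unique('abcdefga'), "\n";
```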

I still don't understand what your problem _really_ is, so if you could
give us some examples...?

--
Tore Aursand
"Omit needless words. Vigorous writing is concise. A sentence should
contain no unnecessary words, a paragraph no unnecessary sentences,
for the same reason that a drawing should have no unnecessary lines
and a machine no unnecessary parts." -- William Strunk Jr.
