How to match some characters different to each other?

Georg Wittig

Hi,

In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
are different to each other. The best solution I came up with is the
following,

sub find711 {
    my ($arg) = shift;
    my ($string);
    my (%chars) = ();
    if ($arg =~ /([a-z]{7,11})/) {
        $string = $1;
        @chars{split(//,$string)}++;
        return 1 if keys (%chars) == length ($string);
    }
    return 0;
}

Is there a shorter or faster or more elegant solution, especially one
that uses just a single regexp, i.e. without that intermediate hash
array?

Thanks for your help,
 
Jeff 'japhy' Pinyan

[posted & mailed]

You can do it by making sure you can't match a character, followed by
other characters, followed by that character again.

sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/;
}

That function returns true if the string is 7 to 11 lowercase letters,
with no letter used more than once. It returns false if the string
contains characters that are NOT lowercase letters, if it contains fewer
than 7 or more than 11 lowercase characters, or if any lowercase letter is
used more than once. (It DOES allow for a newline at the end of the
string, though, just in case.)
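
To see those cases in action, here is a small sketch that runs the function over a handful of strings. The sample inputs are made up for illustration, and the "? 1 : 0" is an addition so the sub returns a plain 1 or 0 even when called in list context.

```perl
use strict;
use warnings;

# Same regex as unique_7_11 above, repeated so this snippet runs on its own.
# The "? 1 : 0" is an addition: it forces a plain 1/0 result in any context.
sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ ? 1 : 0;
}

# Illustrative inputs: valid, valid, too short, too long, repeated letter,
# uppercase letter, and a valid string with a trailing newline.
for my $s ('abcdefg', 'abcdefghijk', 'abcdef', 'abcdefghijkl',
           'abcdefa', 'abcDefg', "abcdefg\n") {
    (my $shown = $s) =~ s/\n/\\n/;    # make the newline visible when printing
    printf "%-14s %s\n", $shown, unique_7_11($s) ? 'ok' : 'rejected';
}
```

Only the first two strings and the one with the trailing newline should come out as 'ok'.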

First, notice that I added ^ and $ anchors to the regex. These anchor to
the beginning and end of the string. Your regex didn't have them, so it
allowed strings of lengths GREATER than 11 to make it through.

Second, I've added something to the beginning of the regex. It's a
"negative look-ahead assertion". Here it is:

(?!.*([a-z]).*\1)

It looks complex, but let's break it down:

.*       # match any number of characters
([a-z])  # match (and capture to $1) a lowercase letter
.*       # match any number of characters
\1       # match what was captured into $1

What this is doing is trying to match a character, and then seeing if it
can match that character again later in the string. This is ALL wrapped
inside a (?!...), which I said is a negative look-ahead. This means two
things: first, if the pattern inside it SUCCEEDS, the look-ahead FAILS
(because it's a "negative" look-ahead). Second, it only LOOKS AHEAD, it
doesn't actually consume anything in the string. It's like sending a
scout ahead of you to see if the coast is clear, and then having the scout
return. It doesn't end up changing your position in the string.
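
A tiny sketch of both properties, using throwaway patterns that are not part of the solution: the look-ahead scans the whole string, yet the capture that follows still starts at the beginning, and a negative look-ahead whose inner pattern succeeds makes the match fail.

```perl
use strict;
use warnings;

my $str = 'abcdefg';

# The look-ahead scans the whole string for a digit and finds none, so it
# passes. The engine then carries on from where it was: the very start.
if ($str =~ /^(?!.*\d)(..)/) {
    print "captured '$1'\n";    # the first two characters of $str
}

# 'abca' contains a repeated letter, so the inner pattern succeeds and the
# negative look-ahead (and therefore the whole match) fails.
print "repeat found, so the match fails\n"
    unless 'abca' =~ /^(?!.*(.).*\1)/;
```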

You could reorder the regex a bit to make it more efficient:

sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/;
}

This time, we use a positive look-ahead at the front, to make sure FIRST
that the string meets our requirement of 7-11 lowercase characters. THEN we
use the negative look-ahead to make sure no character is repeated.

Or, you could use two regexes:

sub unique_7_11 {
    my $str = shift;
    return $str =~ /^[a-z]{7,11}$/ and $str !~ /(.).*\1/;
}

Here, we first make sure the string has 7-11 lowercase characters. THEN,
we make sure the string CANNOT match a character twice.
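
As a quick sanity check, the three variants can be run side by side. This is only a sketch: the hash keys ('single', 'reordered', 'two_step') and the sample strings are invented for the comparison, and each result is forced to 1 or 0 so the outputs line up.

```perl
use strict;
use warnings;

# The three regex variants from this post, each wrapped in an anonymous sub.
# The key names are labels made up for this comparison.
my %variant = (
    single    => sub { $_[0] =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ ? 1 : 0 },
    reordered => sub { $_[0] =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/ ? 1 : 0 },
    two_step  => sub {
        ($_[0] =~ /^[a-z]{7,11}$/ and $_[0] !~ /(.).*\1/) ? 1 : 0;
    },
);

# Print each variant's verdict per input, in alphabetical order of the labels.
for my $s ('abcdefg', 'abcdefa', 'abcdef', 'abcdefghijkl') {
    my @results = map { $variant{$_}->($s) } sort keys %variant;
    print "$s: @results\n";
}
```

All three columns should agree on every line.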

For more information, please read the regex documentation that comes with
Perl, such as:

perldoc perlre
perldoc perlretut
perldoc perlreref
 
Brian McCauley

> In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
> are different to each other.

Well, the way to test that a string has no repeated characters is

!/(.).*\1/

The way to get a sequence of 7 to 11 characters is

/(.{7,11})/

What is annoying is that the (?{ code }) assertion always succeeds
even if code evaluates to false.

I'd like to know what bright spark decided it should work this way
rather than doing the intuitively obvious and much more useful thing
of succeeding iff code returns a true value.

So we can't just do:

/(.{7,11})(?{ $1 !~ m{(.).*\1} })/

In principle I believe you can do:

/(.{7,11})(??{ $1 !~ m{(.).*\1} ? '' : '^' })/

(We know that '' will always match and '^' never will, since at least
7 characters have already been consumed by that point.)

But that segfaults on 5.8.0 - I've not tried it on a later version.

> The best solution I came up with is the following,
>
> sub find711 {
>     my ($arg) = shift;
>     my ($string);
>     my (%chars) = ();
>     if ($arg =~ /([a-z]{7,11})/) {
>         $string = $1;
>         @chars{split(//,$string)}++;
>         return 1 if keys (%chars) == length ($string);
>     }
>     return 0;
> }

That inner bit can be replaced with

return 1 if $1 !~ /(.).*\1/;

But your "solution" considers only the first string of 7 to 11 lowercase
letters in $arg. That does not match your problem definition (which
says nothing about letters) and also misses 'abcdefg' in 'fooabcdefg'
and in 'abcdefgfoo', because in both cases the greedy match grabs the
whole 10-letter run, which contains repeated letters.
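
A short sketch of that failure, using the same two strings: it prints what /([a-z]{7,11})/ actually captures and whether that capture contains a repeat. The variable names are made up for the demonstration.

```perl
use strict;
use warnings;

for my $arg ('fooabcdefg', 'abcdefgfoo') {
    if ($arg =~ /([a-z]{7,11})/) {
        my $run = $1;    # copy $1 before running another match
        my $verdict = $run =~ /(.).*\1/ ? 'has a repeat' : 'all different';
        print "$arg: first run '$run' $verdict\n";
    }
}
```

In both cases the captured run is the whole string, so the 7 distinct letters inside it are never examined on their own.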

Also, you forgot to mention that you are only looking for true/false!
You don't want to extract the match. Your problem simplifies to
looking for a string of exactly 7 characters, all different, since any
string of 11 non-repeated characters must start with a string of 7
non-repeated characters.

> Is there a shorter or faster or more elegant solution,

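for ( $arg =~ /(?=(.{7}))/g ) {
    # In list context //g returns every captured window, aliased to $_;
    # $1 would hold only the last window by the time the body runs.
    return 1 if $_ !~ /(.).*\1/;
}
return 0;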
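
Here is the same idea wrapped in a sub so it can be run on its own. Treat it as a sketch: the name has_unique_run and the sample strings are made up, and the loop tests each window returned by //g through a named loop variable rather than reading $1, since $1 holds only the last window once the whole list has been built.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $arg contains 7 consecutive characters
# with no repeats among them, 0 otherwise.
sub has_unique_run {
    my $arg = shift;
    # The look-ahead matches at every position without consuming anything,
    # so in list context //g yields every overlapping 7-character window.
    for my $window ( $arg =~ /(?=(.{7}))/g ) {
        return 1 if $window !~ /(.).*\1/;
    }
    return 0;
}

for my $s ('fooabcdefg', 'abcdefgfoo', 'aabbccddee', 'abcdef') {
    printf "%-12s %s\n", $s, has_unique_run($s) ? 'yes' : 'no';
}
```

The first two strings should report 'yes', because each contains a window of 7 distinct characters. The other two should report 'no': one has repeats in every window and the other is too short to have a window at all.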

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Tore Aursand

> In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
> are different to each other.

What do you mean, actually? Do you want to make sure that the characters
on position 7 through 11 are unique?

> sub find711 {
>     my ($arg) = shift;
>     my ($string);
>     my (%chars) = ();
>     if ($arg =~ /([a-z]{7,11})/) {
>         $string = $1;
>         @chars{split(//,$string)}++;
>         return 1 if keys (%chars) == length ($string);
>     }
>     return 0;
> }

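You don't need all those parentheses. Your sub seems to be better off
written like this, IMO:

sub find711 {
    my $arg = shift;
    my %chars;
    # '/' delimiters: a comma delimiter would clash with the comma in {7,11}
    if ( $arg =~ m/([a-z]{7,11})/ ) {
        @chars{split(//, $1)}++;
        # compare against $1; there is no $string variable in this version
        return 1 if keys %chars == length( $1 );
    }
    return 0;
}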

I still don't understand what your problem _really_ is, so if you could
give us some examples...?


--
Tore Aursand
"Omit needless words. Vigorous writing is concise. A sentence should
contain no unnecessary words, a paragraph no unnecessary sentences,
for the same reason that a drawing should have no unnecessary lines
and a machine no unnecessary parts." -- William Strunk Jr.
 
