How to match some characters different to each other?

Discussion in 'Perl Misc' started by Georg Wittig, Feb 2, 2004.

  1. Georg Wittig

    Georg Wittig Guest

    Hi,

    In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
    are different to each other. The best solution I came up with is the
    following,

    sub find711 {
        my ($arg) = shift;
        my ($string);
        my (%chars) = ();
        if ($arg =~ /([a-z]{7,11})/) {
            $string = $1;
            @chars{split(//,$string)}++;
            return 1 if keys (%chars) == length ($string);
        }
        return 0;
    }

    Is there a shorter or faster or more elegant solution, especially one
    that uses just a single regexp, i.e. without that intermediate hash
    array?

    Thanks for your help,

    --
    /"\ ASCII ribbon | Georg Wittig, FhG -
    \ / campain against|
    X HTML e-mail and|
    / \ news | Der Bagger ist der natuerliche Feind des Internet.
    Georg Wittig, Feb 2, 2004
    #1

  2. [posted & mailed]

    On 2 Feb 2004, Georg Wittig wrote:

    >In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
    >are different to each other. The best solution I came up with is the
    >following,


    >sub find711 {
    > my ($arg) = shift;
    > my ($string);
    > my (%chars) = ();
    > if ($arg =~ /([a-z]{7,11})/) {
    > $string = $1;
    > @chars{split(//,$string)}++;
    > return 1 if keys (%chars) == length ($string);
    > }
    > return 0;
    >}
    >
    >Is there a shorter or faster or more elegant solution, especially one
    >that uses just a single regexp, i.e. without that intermediate hash
    >array?


    You can do it by making sure you can't match a character, followed by
    other characters, followed by that character again.

    sub unique_7_11 {
        my $str = shift;
        return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/;
    }

    That function returns true if the string is 7 to 11 lowercase letters,
    with no letter used more than once. It returns false if the string
    contains characters that are NOT lowercase letters, if it contains fewer
    than 7 or more than 11 lowercase letters, or if any lowercase letter is
    used more than once. (It DOES allow a single newline at the end of the
    string, though, because $ also matches just before a trailing newline.)
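
    A quick way to see this behaviour is to run the function over a few
    sample strings. The samples and the \z variant below (unique_7_11_strict,
    a name made up for this sketch) are illustrations only, not part of the
    original post; \z anchors at the very end of the string, so it rejects
    the trailing newline that $ lets through.

```perl
use strict;
use warnings;

# The function from the post: 7-11 lowercase letters, none repeated.
sub unique_7_11 {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/;
}

# Hypothetical variant: \z instead of $, so a trailing newline is rejected too.
sub unique_7_11_strict {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}\z/;
}

for my $s ("abcdefg", "abcdefghijk", "abcdefa", "abcdef",
           "abcdefghijkl", "abcDefg", "abcdefg\n") {
    (my $shown = $s) =~ s/\n/\\n/;    # make the newline visible
    printf "%-14s %-3s %s\n", $shown,
        unique_7_11($s)        ? "yes" : "no",
        unique_7_11_strict($s) ? "yes" : "no";
}
```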

    First, notice that I added ^ and $ anchors to the regex. These anchor to
    the beginning and end of the string. Your regex didn't have them, so it
    allowed strings of lengths GREATER than 11 to make it through.
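
    To see what the anchors change, compare an unanchored check, which
    mirrors the original find711, with the anchored regex on a 12-letter
    string that has no repeats. The names unanchored and anchored are made
    up for this sketch, and the hash is filled with @chars{...} = () (the
    usual idiom for building a set of keys) rather than with ++.

```perl
use strict;
use warnings;

# Mirrors the original find711: checks the FIRST run of 7-11 letters anywhere.
sub unanchored {
    my $arg = shift;
    return 0 unless $arg =~ /([a-z]{7,11})/;
    my $run = $1;
    my %chars;
    @chars{ split(//, $run) } = ();    # one hash key per distinct letter
    return keys(%chars) == length($run) ? 1 : 0;
}

# Anchored: the whole string must be 7-11 distinct lowercase letters.
sub anchored {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ ? 1 : 0;
}

my $twelve = "abcdefghijkl";               # 12 distinct letters
print "unanchored: ", unanchored($twelve), "\n";   # the first 11 letters pass
print "anchored:   ", anchored($twelve),   "\n";   # the whole string is too long
```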

    Second, I've added something to the beginning of the regex. It's a
    "negative look-ahead assertion". Here it is:

    (?!.*([a-z]).*\1)

    It looks complex, but let's break it down:

    .* # match any number of characters
    ([a-z]) # match (and capture to $1) a lowercase letter
    .* # match any number of characters
    \1 # match what was captured into $1

    What this is doing is trying to match a character, and then seeing if it
    can match that character again later in the string. This is ALL wrapped
    inside a (?!...), which I said is a negative look-ahead. This means two
    things: first, if the pattern inside it SUCCEEDS, the look-ahead FAILS
    (because it's a "negative" look-ahead). Second, it only LOOKS AHEAD, it
    doesn't actually consume anything in the string. It's like sending a
    scout ahead of you to see if the coast is clear, and then having the scout
    return. It doesn't end up changing your position in the string.
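
    To watch the scout on its own, the pattern from inside the look-ahead
    can be run as an ordinary match; when it succeeds, $1 holds a letter
    that occurs again later in the string. repeated_letter is a made-up
    helper for this sketch. Because the leading .* is greedy, the letter
    reported is the rightmost one that still has a duplicate after it.

```perl
use strict;
use warnings;

# Runs the look-ahead's inner pattern as a normal match.
# Returns the captured letter, or undef if no lowercase letter occurs twice.
sub repeated_letter {
    my $str = shift;
    return $str =~ /.*([a-z]).*\1/ ? $1 : undef;
}

for my $s ("abcab", "abcdefg") {
    my $r = repeated_letter($s);
    print "$s: ", defined $r ? "'$r' occurs again later" : "no repeated letter", "\n";
}
```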

    You could reorder the regex a bit to make it more efficient:

    sub unique_7_11 {
        my $str = shift;
        return $str =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/;
    }

    This time, we use a positive look-ahead at the front to make sure FIRST
    that the string meets our 7-11 lowercase letter requirement. THEN we
    use the negative look-ahead to make sure no character is repeated.
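
    Both versions should accept and reject exactly the same strings; only
    the order of the checks changes. Presumably the gain is that a string
    of the wrong length, or one containing a non-letter, is rejected by the
    cheap positive look-ahead before the negative look-ahead starts
    backtracking. A sketch comparing the two on a few samples (the _v1/_v2
    names and the samples are made up here):

```perl
use strict;
use warnings;

# Version 1: the negative look-ahead runs first.
sub unique_7_11_v1 {
    my $str = shift;
    return $str =~ /^(?!.*([a-z]).*\1)[a-z]{7,11}$/ ? 1 : 0;
}

# Version 2: the positive look-ahead checks length and charset first.
sub unique_7_11_v2 {
    my $str = shift;
    return $str =~ /^(?=[a-z]{7,11}$)(?!.*([a-z]).*\1)/ ? 1 : 0;
}

my @samples = ("abcdefg", "abcdefghijk", "abcdefa", "abc",
               "abcdefghijkl", "ABCDEFG", "abcdefg\n", "");
for my $s (@samples) {
    my ($v1, $v2) = (unique_7_11_v1($s), unique_7_11_v2($s));
    die "versions disagree on '$s'\n" unless $v1 == $v2;
}
print "both versions agree on all ", scalar(@samples), " samples\n";
```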

    Or, you could use two regexes:

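    sub unique_7_11 {
        my $str = shift;
        return $str =~ /^[a-z]{7,11}$/ && $str !~ /(.).*\1/;
    }

    Here, we first make sure the string has 7-11 lowercase characters. THEN,
    we make sure the string CANNOT match a character twice. (Note the &&
    rather than "and": "and" binds more loosely than return, so
    "return A and B" would return A without ever testing B.)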

    For more information, please read the regex documentation that comes with
    Perl, such as:

    perldoc perlre
    perldoc perlretut
    perldoc perlreref

    --
    Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
    "And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
    years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
    Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
    Jeff 'japhy' Pinyan, Feb 2, 2004
    #2

  3. (Georg Wittig) writes:

    > In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
    > are different to each other.


    Well, the way to test that a string has no repeated characters is

    !/(.).*\1/

    The way to get a sequence of 7 to 11 characters is

    /(.{7,11})/

    What is annoying is that the (?{ code }) assertion always succeeds
    even if code evaluates to false.

    I'd like to know what bright spark decided it should work this way
    rather than doing the intuitively obvious and much more useful thing
    of succeeding iff code returns a true value.

    So we can't just do:

    /(.{7,11})(?{ $1 !~ m{(.).*\1} })/

    In principle I believe you can do:

    /(.{7,11})(??{ $1 !~ m{(.).*\1} ? '' : '^' })/

    (We know that '' will always match, and '^' never will at that point,
    since the match position is already past the start of the string.)

    But that segfaults on 5.8.0 - I've not tried it on a later version.
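
    For comparison, here is a sketch of the same (??{ }) idea written for a
    reasonably modern perl; it has not been checked against 5.8.0. The name
    has_7_unique_re is made up, the window is fixed at 7 characters (as
    argued further down), '(?:)' stands in for the always-matching pattern
    and '(?!)' for the never-matching one, and the repeat check uses a hash
    rather than a nested m// so the regex engine is not re-entered from
    inside the pattern.

```perl
use strict;
use warnings;

# Sketch: true if $str contains any run of 7 characters with no repeats.
# After (.{7}) captures a candidate window into $1, the (??{ ... }) block
# returns a sub-pattern: '(?:)' (matches the empty string) when the 7
# characters are all distinct, '(?!)' (can never match) otherwise. On
# failure the engine retries the match at the next starting position.
sub has_7_unique_re {
    my $str = shift;
    return $str =~ /(.{7})(??{
        my ($run, %seen) = ($1);
        my $dups = grep { $seen{ substr($run, $_, 1) }++ } 0 .. length($run) - 1;
        $dups ? '(?!)' : '(?:)'
    })/s ? 1 : 0;
}

for my $s ("fooabcdefg", "abcdefgfoo", "abcabcabc", "abcdef") {
    print "$s: ", has_7_unique_re($s) ? "yes" : "no", "\n";
}
```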

    > The best solution I came up with is the following,
    >
    > sub find711 {
    > my ($arg) = shift;
    > my ($string);
    > my (%chars) = ();
    > if ($arg =~ /([a-z]{7,11})/) {
    > $string = $1;
    > @chars{split(//,$string)}++;
    > return 1 if keys (%chars) == length ($string);
    > }
    > return 0;
    > }


    That inner bit can be replaced with

    return 1 if $1 !~ /(.).*\1/;
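
    Put back into the original sub, that line gives the sketch below. It
    keeps the original's behaviour of checking only the first run of 7 to 11
    lowercase letters, so 'fooabcdefg' is still rejected.

```perl
use strict;
use warnings;

# The original find711 with its hash check replaced by a single
# "no character occurs twice" regex.
sub find711 {
    my $arg = shift;
    if ( $arg =~ /([a-z]{7,11})/ ) {
        return 1 if $1 !~ /(.).*\1/;
    }
    return 0;
}

print find711("abcdefg"),    "\n";   # a run of 7 distinct letters
print find711("abcdefga"),   "\n";   # the run repeats 'a'
print find711("fooabcdefg"), "\n";   # the first run, "fooabcdefg", repeats 'o'
```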

    But your "solution" considers the first string of 7 to 11 lowercase
    letters in $arg. That does not match your problem definition (which
    says nothing about letters) and also misses 'abcdefg' in 'fooabcdefg'
    and in 'abcdefgfoo'.

    Also, you forgot to mention that you are only looking for a true/false result!
    You don't want to extract the match. Your problem simplifies to
    looking for a string of exactly 7 characters all different since any
    string of 11 non-repeated characters must start with a string of 7
    non-repeated characters.

    > Is there a shorter or faster or more elegant solution,


    # Each 7-character window (captured inside the look-ahead) is in $_.
    for ( $arg =~ /(?=(.{7}))/g ) {
        return 1 if $_ !~ /(.).*\1/;
    }
    return 0;
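
    Wrapped into a complete sub (has_7_unique is a made-up name), the loop
    can be tried on the two examples mentioned earlier. Each window is
    tested through $_, the loop variable, since that is where the for loop
    puts it.

```perl
use strict;
use warnings;

# Hypothetical wrapper: true if any 7-character window of $arg
# has no repeated character.
sub has_7_unique {
    my $arg = shift;
    # The look-ahead lets the windows overlap; in list context //g returns
    # every captured window, and the for loop aliases each one to $_.
    for ( $arg =~ /(?=(.{7}))/g ) {
        return 1 if $_ !~ /(.).*\1/;
    }
    return 0;
}

for my $s ("fooabcdefg", "abcdefgfoo", "abcabcabc") {
    print "$s: ", has_7_unique($s), "\n";
}
```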

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley, Feb 2, 2004
    #3
  4. Tore Aursand

    Tore Aursand Guest

    On Mon, 02 Feb 2004 14:47:57 +0100, Georg Wittig wrote:
    > In perl 5.8.3 I need to match a sequence of 7 to 11 characters that
    > are different to each other.


    What do you mean, actually? Do you want to make sure that the characters
    in positions 7 through 11 are unique?

    > sub find711 {
    > my ($arg) = shift;
    > my ($string);
    > my (%chars) = ();
    > if ($arg =~ /([a-z]{7,11})/) {
    > $string = $1;
    > @chars{split(//,$string)}++;
    > return 1 if keys (%chars) == length ($string);
    > }
    > return 0;
    > }


    You don't need all those parentheses. Your sub seems to be better off
    written like this, IMO:

    sub find711 {
        my $arg = shift;
        my %chars;
        if ( $arg =~ m,([a-z]{7,11}), ) {
            my $string = $1;
            @chars{split(//, $string)} = ();    # one key per distinct letter
            return 1 if keys %chars == length( $string );
        }
        return 0;
    }

    I still don't understand what your problem _really_ is, so if you could
    give us some examples...?


    --
    Tore Aursand <>
    "Omit needless words. Vigorous writing is concise. A sentence should
    contain no unnecessary words, a paragraph no unnecessary sentences,
    for the same reason that a drawing should have no unnecessary lines
    and a machine no unnecessary parts." -- William Strunk Jr.
    Tore Aursand, Feb 2, 2004
    #4
