Match a number of repeated chars, but NO MORE.

Discussion in 'Perl Misc' started by usenet@DavidFilmer.com, Dec 2, 2005.

  1. Guest

    One particular aspect of a question in another newsgroup
    (http://tinyurl.com/cbakx) interested me; I played around with some
    solutions but couldn't come up with one that I thought was elegant. So
    I thought I would introduce the question to this group for further
    enlightenment.

    <paraphrase> of the OP's question:

    Suppose I have a string of characters: "abCCCdefg". I want to match
    three consecutive occurrences of any character in a class. In this
    example, my expression would match 'CCC'. OK, that's easy:

    #!/usr/bin/perl
    use warnings; use strict;
    my $string = "abCCCdefg";
    print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}/;
    __END__

    But, suppose I wanted to constrain the match so that it would match
    three consecutive occurrences, but NO MORE than three. In other words,
    'abCCCCCdefg' would NOT match. </paraphrase>

    I thought I could propose an 'elegant' answer like this:

    print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;

    but that doesn't work (it seems that \1 gets "used up" somehow). Of
    course, I could write a bunch of code to do it... that's trivial to do
    (but ugly, IMHO).

    Are there any elegant ideas?
     
    , Dec 2, 2005
    #1

  2. wrote in news:1133518600.589728.209460
    @g14g2000cwa.googlegroups.com:

    > But, suppose I wanted to constrain the match so that it would match
    > three consecutive occurrences, but NO MORE than three. In other words,
    > 'abCCCCCdefg' would NOT match. </paraphrase>
    >
    > I thought I could propose an 'elegant' answer like this:
    >
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    >
    > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > course, I could write a bunch of code to do it... that's trivial to do
    > (but ugly, IMHO).
    >
    > Are there any elegant ideas?


    I can get you part of the way there. Perhaps someone better at regexes
    can take you the rest of the way.

    First, use a negative lookahead assertion (and use the /x
    modifier!):

    $string =~ / ([\w\d_-]+) \1{2} (?!\1) /x;

    But there's still a problem: Though it won't match the first or second
    "CCC" in your above string, it will match the third "CCC". In other
    words, it'll match the "CCC" that begins after "abCC".
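
    [Editor's sketch, not part of the original post.] The behaviour described
    above can be checked by asking where the match starts. The sub name is
    hypothetical, and the pattern is simplified to a single-character capture
    (\w) rather than the class-with-plus from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: offset at which the lookahead-only pattern
# matches, or -1 if it does not match at all.
sub lookahead_only_offset {
    my ($s) = @_;
    return $s =~ /(\w)\1{2}(?!\1)/ ? $-[0] : -1;
}

# Five C's occupy offsets 2..6; the engine skips the first two
# candidate triples and settles on the last one.
printf "%d\n", lookahead_only_offset("abCCCCCdefg");   # prints 4
printf "%d\n", lookahead_only_offset("abCCCdefg");     # prints 2
```

    So the lookahead alone rejects the early triples but still accepts the
    tail of a longer run.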

    So you'll need to use a negative lookbehind assertion, too:

    $string =~ /([\w\d_-]+) # Your match
    \1{2} # Two more of it
    (?!\1) # But not another one
    (?<!\1{4}) # Not preceded by 4 of \1 at this point
    /x;

    But there's a problem: since your match is variable-length (due to the +
    quantifier), the negative lookbehind is variable-length, and that is
    unfortunately not yet implemented in Perl.

    I'm not sure where to take it from here, sorry.

    --
    Eric
    `$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
    $!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
    $_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
    ;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`
     
    Eric J. Roode, Dec 2, 2005
    #2

  3. wrote:
    > One particular aspect of a question in another newsgroup
    > (http://tinyurl.com/cbakx) interested me; I played around with some
    > solutions but couldn't come up with one that I thought was elegant. So
    > I thought I would introduce the question to this group for further
    > enlightenment.
    >
    > <paraphrase> of the OP's question:
    >
    > Suppose I have a string of characters: "abCCCdefg". I want to match
    > three consecutive occurrences of any character in a class. In this
    > example, my expression would match 'CCC'. OK, that's easy:
    >
    > #!/usr/bin/perl
    > use warnings; use strict;
    > my $string = "abCCCdefg";
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}/;
    > __END__
    >
    > But, suppose I wanted to constrain the match so that it would match
    > three consecutive occurrences, but NO MORE than three. In other words,
    > 'abCCCCCdefg' would NOT match. </paraphrase>
    >
    > I thought I could propose an 'elegant' answer like this:
    >
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    >
    > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > course, I could write a bunch of code to do it... that's trivial to do
    > (but ugly, IMHO).
    >
    > Are there any elegant ideas?


    i believe that \w includes \d as well as '_', so [\w-] would be the char
    class you want.
     
    it_says_BALLS_on_your_forehead, Dec 2, 2005
    #3
  4. Eric J. Roode wrote:
    > wrote in news:1133518600.589728.209460
    > @g14g2000cwa.googlegroups.com:
    >
    > > But, suppose I wanted to constrain the match so that it would match
    > > three consecutive occurrences, but NO MORE than three. In other words,
    > > 'abCCCCCdefg' would NOT match. </paraphrase>
    > >
    > > I thought I could propose an 'elegant' answer like this:
    > >
    > > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    > >
    > > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > > course, I could write a bunch of code to do it... that's trivial to do
    > > (but ugly, IMHO).
    > >
    > > Are there any elegant ideas?

    >
    > I can get you part of the way there. Perhaps someone better at regexes
    > can take you the rest of the way.
    >
    > First, use a "negative lookahead assertion": (and use the x
    > modifier!)
    >
    > $string =~ / ([\w\d_-]+) \1{2} (?!\1) /x;
    >
    > But there's still a problem: Though it won't match the first or second
    > "CCC" in your above string, it will match the third "CCC". In other
    > words, it'll match the "CCC" that begins after "abCC".
    >
    > So you'll need to use a negative lookbehind assertion, too:
    >
    > $string =~ /([\w\d_-]+) # Your match
    > \1{2} # Two more of it
    > (?!\1) # But not another one
    > (?<!\1{4}) # Not preceeded by 4 of \1 at this point
    > /x;
    >
    > But there's a problem: since your match is variable-length (due to the +
    > quantifier), the negative lookbehind is variable-length, and that is
    > unfortunately not yet implemented in Perl.
    >
    > I'm not sure where to take it from here, sorry.


    hmm, i'm aware of that constraint with lookbehinds. maybe it's too
    early in the morning, but would you need lookbehinds? don't the matches
    on the string occur from left to right, so you only need the negative
    lookahead?
     
    it_says_BALLS_on_your_forehead, Dec 2, 2005
    #4
  5. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > One particular aspect of a question in another newsgroup
    > (http://tinyurl.com/cbakx) interested me; I played around with some
    > solutions but couldn't come up with one that I thought was elegant. So
    > I thought I would introduce the question to this group for further
    > enlightenment.
    >
    > <paraphrase> of the OP's question:
    >
    > Suppose I have a string of characters: "abCCCdefg". I want to match
    > three consecutive occurrences of any character in a class. In this
    > example, my expression would match 'CCC'. OK, that's easy:
    >
    > #!/usr/bin/perl
    > use warnings; use strict;
    > my $string = "abCCCdefg";
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}/;
    > __END__


    That regex isn't quite correct: it should only capture one occurrence
    of the repeated character, not more. Also, \w already matches digits
    and underscore:

    /(\w)\1{2}/;

    > But, suppose I wanted to constrain the match so that it would match
    > three consecutive occurrences, but NO MORE than three. In other words,
    > 'abCCCCCdefg' would NOT match. </paraphrase>
    >
    > I thought I could propose an 'elegant' answer like this:
    >
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    >
    > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > course, I could write a bunch of code to do it... that's trivial to do
    > (but ugly, IMHO).


    \1 doesn't get consumed. Inside a character class it is parsed as a
    character escape, not a backreference, so [^\1] matches all characters
    except chr(1).
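
    [Editor's sketch, not part of the original post.] A quick way to see the
    character-escape reading is to feed the pattern a string where the
    character after the triple really is chr(1). The sub name is hypothetical
    and the capture is simplified to (\w):

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the question,
# with a single-character capture.
sub class_escape_match {
    my ($s) = @_;
    return $s =~ /(\w)\1{2}[^\1]/ ? 1 : 0;
}

# The fourth C is "not chr(1)", so a longer run still matches.
print class_escape_match("abCCCCCdefg"), "\n";   # prints 1
# Here the only triple is followed by chr(1), so the class rejects it.
print class_escape_match("abCCC\x01"), "\n";     # prints 0
```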

    A negative lookahead works as intended, but still doesn't solve the
    problem:

    qr/([\w\d_-])\1{2}(?!\1)/;

    This forces the following character to be different from \1, but
    then the regex just moves on and matches the last three "C" in
    "abCCCCCdefg". I don't see a way to force it to match only if
    the preceding character is different from the repeated one.

    Following this vein leads to something like this:

    my $re = qr/
    (.) # any character
    (?!\1) # ...followed by a different character
    (\w) # ...which is a word character
    \2{2} # ...followed by exactly two copies of itself
    (?!\2) # ...followed by a different character
    /x;

    That works with the given examples, but only if there is at least one
    character before the repeated group; a run at the very beginning of
    the string is missed. (A run at the very end is fine, since (?!\2)
    succeeds at end of string.) Not to mention elegance...
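
    [Editor's sketch, not part of the original post.] One possible way to
    drop the need for a preceding character is to make the first group
    optional: either the match starts at the beginning of the string, or the
    previous character is captured and must differ. This relies on Perl
    treating a backreference to a group that did not participate as a
    failed match. The sub name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the character that occurs exactly
# three times in a row, or undef if there is no such run.
sub exact_triple_char {
    my ($s) = @_;
    return $s =~ /
        (?: ^ | (.) )   # start of string, or capture the previous character
        (?! \1 )        # what follows must differ from that character
        (\w)            # the repeated word character
        \2{2}           # two more copies of it
        (?! \2 )        # and no fourth copy
    /xs ? $2 : undef;
}

for my $s ("abCCCdefg", "abCCCCCdefg", "CCCdef", "abCCC") {
    my $c = exact_triple_char($s);
    printf "%-12s %s\n", $s, defined $c ? "matched $c" : "no match";
}
```

    Whether this counts as elegant is another matter; the two-step check
    is easier to read.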

    Conclusion: It probably can be done in a single regex, but I doubt it
    is worth the effort.

    /((\w)\2{2,})/ and length( $1) == 3
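
    [Editor's sketch, not part of the original post.] The one-liner above
    inspects only the first run of three or more. A hedged extension,
    assuming the goal is to find a run of exactly N anywhere in the string,
    walks every run with /g and checks its length. The sub name and the $n
    parameter are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of all runs of a word character
# whose length is exactly $n.
sub exact_run_offsets {
    my ($s, $n) = @_;
    my @offsets;
    # Each iteration consumes one maximal run, so a long run can
    # never be mistaken for a short one.
    while ($s =~ /((\w)\2*)/g) {
        push @offsets, $-[1] if length($1) == $n;
    }
    return @offsets;
}

print join(",", exact_run_offsets("asCCCCCCChwCCCsad", 3)), "\n";   # prints 11
```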

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Dec 2, 2005
    #5
  6. Anno Siegel Guest

    Eric J. Roode <> wrote in comp.lang.perl.misc:
    > wrote in news:1133518600.589728.209460
    > @g14g2000cwa.googlegroups.com:
    >
    > > But, suppose I wanted to constrain the match so that it would match
    > > three consecutive occurrences, but NO MORE than three. In other words,
    > > 'abCCCCCdefg' would NOT match. </paraphrase>


    [...]

    > First, use a "negative lookahead assertion": (and use the x
    > modifier!)
    >
    > $string =~ / ([\w\d_-]+) \1{2} (?!\1) /x;
    >
    > But there's still a problem: Though it won't match the first or second
    > "CCC" in your above string, it will match the third "CCC". In other
    > words, it'll match the "CCC" that begins after "abCC".
    >
    > So you'll need to use a negative lookbehind assertion, too:
    >
    > $string =~ /([\w\d_-]+) # Your match
    > \1{2} # Two more of it
    > (?!\1) # But not another one
    > (?<!\1{4}) # Not preceeded by 4 of \1 at this point
    > /x;
    >
    > But there's a problem: since your match is variable-length (due to the +
    > quantifier), the negative lookbehind is variable-length, and that is
    > unfortunately not yet implemented in Perl.


    Capturing multiple characters isn't right anyway; the "+" ought to
    be outside the parentheses. (With 6 or more "C", the difference shows.)
    But that doesn't solve the problem with variable-length lookbehind.
    Perl complains if you try to use a backreference there, even if the
    backreference can logically only have one definite length.
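
    [Editor's sketch, not part of the original post.] The difference with
    six "C" can be seen by comparing what each pattern actually matches.
    The sub name is hypothetical, and [\w-] is used as the simplified form
    of the class from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: the text matched by $re in $s, or undef.
sub matched_text {
    my ($re, $s) = @_;
    return $s =~ $re ? substr($s, $-[0], $+[0] - $-[0]) : undef;
}

my $six = "abCCCCCCd";
# With "+" inside the parentheses the group captures "CC",
# and the whole match swallows all six.
print matched_text(qr/([\w-]+)\1{2}/, $six), "\n";   # prints CCCCCC
# With a single-character capture only three are matched.
print matched_text(qr/(\w)\1{2}/, $six), "\n";       # prints CCC
```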

    Anno
     
    Anno Siegel, Dec 2, 2005
    #6
  7. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > One particular aspect of a question in another newsgroup
    > (http://tinyurl.com/cbakx) interested me; I played around with some
    > solutions but couldn't come up with one that I thought was elegant. So
    > I thought I would introduce the question to this group for further
    > enlightenment.
    >
    > <paraphrase> of the OP's question:
    >
    > Suppose I have a string of characters: "abCCCdefg". I want to match
    > three consecutive occurrences of any character in a class. In this
    > example, my expression would match 'CCC'. OK, that's easy:
    >
    > #!/usr/bin/perl
    > use warnings; use strict;
    > my $string = "abCCCdefg";
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}/;
    > __END__


    That regex isn't quite correct: it should only capture one occurrence
    of the repeated character, not more. Also, \w already matches digits
    and underscore:

    [Later correction: It doesn't match underscore. I'm not correcting the
    code; it doesn't matter to the discussion]

    /(\w)\1{2}/;

    > But, suppose I wanted to constrain the match so that it would match
    > three consecutive occurrences, but NO MORE than three. In other words,
    > 'abCCCCCdefg' would NOT match. </paraphrase>
    >
    > I thought I could propose an 'elegant' answer like this:
    >
    > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    >
    > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > course, I could write a bunch of code to do it... that's trivial to do
    > (but ugly, IMHO).


    \1 doesn't get consumed. Inside a character class it is parsed as a
    character escape, not a backreference, so [^\1] matches all characters
    except chr(1).

    A negative lookahead works as intended, but still doesn't solve the
    problem:

    qr/([\w\d_-])\1{2}(?!\1)/;

    This forces the following character to be different from \1, but
    then the regex just moves on and matches the last three "C" in
    "abCCCCCdefg". I don't see a way to force it to match only if
    the preceding character is different from the repeated one.

    Following this vein leads to something like this:

    my $re = qr/
    (.) # any character
    (?!\1) # ...followed by a different character
    (\w) # ...which is a word character
    \2{2} # ...followed by exactly two copies of itself
    (?!\2) # ...followed by a different character
    /x;

    That works with the given examples, but only if there is at least one
    character before the repeated group; a run at the very beginning of
    the string is missed. (A run at the very end is fine, since (?!\2)
    succeeds at end of string.) Not to mention elegance...

    Conclusion: It probably can be done in a single regex, but I doubt it
    is worth the effort.

    /((\w)\2{2,})/ and length( $1) == 3

    Anno

     
    Anno Siegel, Dec 2, 2005
    #7
  8. Anno Siegel wrote:
    > <> wrote in comp.lang.perl.misc:
    > > One particular aspect of a question in another newsgroup
    > > (http://tinyurl.com/cbakx) interested me; I played around with some
    > > solutions but couldn't come up with one that I thought was elegant. So
    > > I thought I would introduce the question to this group for further
    > > enlightenment.
    > >
    > > <paraphrase> of the OP's question:
    > >
    > > Suppose I have a string of characters: "abCCCdefg". I want to match
    > > three consecutive occurrences of any character in a class. In this
    > > example, my expression would match 'CCC'. OK, that's easy:
    > >
    > > #!/usr/bin/perl
    > > use warnings; use strict;
    > > my $string = "abCCCdefg";
    > > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}/;
    > > __END__

    >
    > That regex isn't quite correct, it should only capture one occurrence
    > of the repeated character, not more. Also, \w already matches digits
    > and underscore:
    >
    > [Later correction: It doesn't match underscore. I'm not correcting the
    > code, id doesn't matter to the discussion]


    are you sure it doesn't match underscore?

    my $string2 = '_';
    if ( $string2 =~ m/\w/ ) {
    print "underscore matched.\n";
    }
    else {
    print "underscore did not match.\n";
    }

    __OUTPUT__
    underscore matched.
     
    it_says_BALLS_on_your forehead, Dec 2, 2005
    #8
  9. Anno Siegel wrote:
    > <> wrote in comp.lang.perl.misc:
    > > One particular aspect of a question in another newsgroup
    > > (http://tinyurl.com/cbakx) interested me; I played around with some
    > > solutions but couldn't come up with one that I thought was elegant. So
    > > I thought I would introduce the question to this group for further
    > > enlightenment.
    > >
    > > <paraphrase> of the OP's question:
    > >
    > > Suppose I have a string of characters: "abCCCdefg". I want to match
    > > three consecutive occurrences of any character in a class. In this
    > > example, my expression would match 'CCC'. OK, that's easy:
    > >
    > > #!/usr/bin/perl
    > > use warnings; use strict;
    > > my $string = "abCCCdefg";
    > > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}/;
    > > __END__

    >
    > That regex isn't quite correct, it should only capture one occurrence
    > of the repeated character, not more. Also, \w already matches digits
    > and underscore:
    >
    > [Later correction: It doesn't match underscore. I'm not correcting the
    > code, id doesn't matter to the discussion]
    >
    > /(\w)\1{2}/;
    >
    > > But, suppose I wanted to constrain the match so that it would match
    > > three consecutive occurrences, but NO MORE than three. In other words,
    > > 'abCCCCCdefg' would NOT match. </paraphrase>
    > >
    > > I thought I could propose an 'elegant' answer like this:
    > >
    > > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    > >
    > > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > > course, I could write a bunch of code to do it... that's trivial to do
    > > (but ugly, IMHO).

    >
    > \1 doesn't get consumed, it is interpolated as a character escape, not
    > a backreference. [^\1] matches all characters except chr(1).
    >
    > A negative lookahead works as intended, but still doesn't solve the
    > problem:
    >
    > qr/([\w\d_-])\1{2}(?!\1)/;
    >
    > This forces the following character to be different from \1, but
    > then the regex just moves on and matches the last three "C" in
    > "abCCCCCdefg". I don't see a way to force it to match only if
    > the preceding character is different from the repeated one.
    >


    actually, does the negative lookahead even work? it doesn't seem to. i
    appear to get the same results as the OP, although for a different
    reason perhaps, since you say that in the context of a character class,
    \1 simply is an escaped 1, which is the same as the number 1. when
    using the negative lookahead, it appears that the \1 is 'consumed'
    already.
    (in the example below, it would be \2).

    my $testString = "abCCCCd";
    if ($testString =~ m/((\w)\2{2})(?!\2)/) {
    print "$1. matched\n";
    }
    else {
    print "no match\n";
    }

    __OUTPUT__
    CCC. matched
     
    it_says_BALLS_on_your forehead, Dec 2, 2005
    #9
  10. Anno Siegel Guest

    it_says_BALLS_on_your forehead <> wrote in comp.lang.perl.misc:
    >
    > Anno Siegel wrote:
    > > <> wrote in comp.lang.perl.misc:


    [...]

    > > > <paraphrase> of the OP's question:
    > > >
    > > > Suppose I have a string of characters: "abCCCdefg". I want to match
    > > > three consecutive occurrences of any character in a class. In this
    > > > example, my expression would match 'CCC'. OK, that's easy:


    [...]

    > > > But, suppose I wanted to constrain the match so that it would match
    > > > three consecutive occurrences, but NO MORE than three. In other words,
    > > > 'abCCCCCdefg' would NOT match. </paraphrase>
    > > >
    > > > I thought I could propose an 'elegant' answer like this:
    > > >
    > > > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    > > >
    > > > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > > > course, I could write a bunch of code to do it... that's trivial to do
    > > > (but ugly, IMHO).

    > >
    > > \1 doesn't get consumed, it is interpolated as a character escape, not
    > > a backreference. [^\1] matches all characters except chr(1).
    > >
    > > A negative lookahead works as intended, but still doesn't solve the
    > > problem:
    > >
    > > qr/([\w\d_-])\1{2}(?!\1)/;
    > >
    > > This forces the following character to be different from \1, but
    > > then the regex just moves on and matches the last three "C" in
    > > "abCCCCCdefg". I don't see a way to force it to match only if
    > > the preceding character is different from the repeated one.
    > >

    >
    > actually, does the negative lookahead even work? it doesn't seem to. i
    > appear to get the same results as the OP, although for a different
    > reason perhaps, since you say that in the context of a character class,
    > \1 simply is an escaped 1, which is the same as the number 1. when


    No, it is a character escape. In a non-regex double-quotish string, as in
    the interior of [] in a regex, "\1" is the character chr(1), etc.

    > using the negative lookahead, it appears that the \1 is 'consumed'
    > already.
    > (in the example below, it would be \2).
    >
    > my $testString = "abCCCCd";
    > if ($testString =~ m/((\w)\2{2})(?!\2)/) {
    > print "$1. matched\n";
    > }
    > else {
    > print "no match\n";
    > }
    >
    > __OUTPUT__
    > CCC. matched


    So? It matched the last three "C" before "d", as enforced by the
    lookahead:

    my $testString = "abCCCCd";
    if ($testString =~ m/((\w)\2{2})(?!\2)(.*)/) {
    print "$1. matched before $3\n";
    }
    else {
    print "no match\n";
    }

    CCC. matched before d

    Anno
     
    Anno Siegel, Dec 2, 2005
    #10
  11. Anno Siegel wrote:
    > it_says_BALLS_on_your forehead <> wrote in comp.lang.perl.misc:
    > >
    > > Anno Siegel wrote:
    > > > <> wrote in comp.lang.perl.misc:

    >
    > [...]
    >
    > > > > <paraphrase> of the OP's question:
    > > > >
    > > > > Suppose I have a string of characters: "abCCCdefg". I want to match
    > > > > three consecutive occurrences of any character in a class. In this
    > > > > example, my expression would match 'CCC'. OK, that's easy:

    >
    > [...]
    >
    > > > > But, suppose I wanted to constrain the match so that it would match
    > > > > three consecutive occurrences, but NO MORE than three. In other words,
    > > > > 'abCCCCCdefg' would NOT match. </paraphrase>
    > > > >
    > > > > I thought I could propose an 'elegant' answer like this:
    > > > >
    > > > > print "Match!\n" if $string =~ /([\w\d_-]+)\1{2}[^\1]/;
    > > > >
    > > > > but that doesn't work (it seems that \1 gets "used up" somehow). Of
    > > > > course, I could write a bunch of code to do it... that's trivial to do
    > > > > (but ugly, IMHO).
    > > >
    > > > \1 doesn't get consumed, it is interpolated as a character escape, not
    > > > a backreference. [^\1] matches all characters except chr(1).
    > > >
    > > > A negative lookahead works as intended, but still doesn't solve the
    > > > problem:
    > > >
    > > > qr/([\w\d_-])\1{2}(?!\1)/;
    > > >
    > > > This forces the following character to be different from \1, but
    > > > then the regex just moves on and matches the last three "C" in
    > > > "abCCCCCdefg". I don't see a way to force it to match only if
    > > > the preceding character is different from the repeated one.
    > > >

    > >
    > > actually, does the negative lookahead even work? it doesn't seem to. i
    > > appear to get the same results as the OP, although for a different
    > > reason perhaps, since you say that in the context of a character class,
    > > \1 simply is an escaped 1, which is the same as the number 1. when

    >
    > No, it is a character escape. In a non-regex double-quotish string as the
    > interior of [] in a regex, "\1" is the character chr( 1), etc.
    >
    > > using the negative lookahead, it appears that the \1 is 'consumed'
    > > already.
    > > (in the example below, it would be \2).
    > >
    > > my $testString = "abCCCCd";
    > > if ($testString =~ m/((\w)\2{2})(?!\2)/) {
    > > print "$1. matched\n";
    > > }
    > > else {
    > > print "no match\n";
    > > }
    > >
    > > __OUTPUT__
    > > CCC. matched

    >
    > So? It matched the last three "C" before "d", as enforced by the
    > lookahead:
    >
    > my $testString = "abCCCCd";
    > if ($testString =~ m/((\w)\2{2})(?!\2)(.*)/) {
    > print "$1. matched before $3\n";
    > }
    > else {
    > print "no match\n";
    > }
    >
    > CCC. matched before d
    >


    ahh, i suspected that was happening, but hadn't pursued it further--my
    fault for being lazy. thanks for the illumination Anno.
     
    it_says_BALLS_on_your forehead, Dec 2, 2005
    #11
  12. Guest

    A test string for some proposed solutions:
    $_="asCCCCCCChwCCCsad";
    ------------------------------------------------------
    # it_says_BALLS_on_your forehead's solution:
    print $` if/((\w)\2{2})(?!\2)/;
    #asCCCC ==> no
    -------------------------------------------------------
    #Anno's solution:
    print $` if(/((\w)\2{2,})/ and length( $1) == 3);
    #empty ==> no
    #Anno's thought:
    while(/((\w)\2{2,})/g) {
    print $` if(length( $1) == 3);
    }
    #asCCCCCCChw => ok
    --------------------------------------------------------
    #Steven's solution:
    print $` if /
    (\w) # a char
    (??{ '(?<=' . ("$1" x 3) .')' }) # as the third in a series
    (??{ '(?<!' . ("$1" x 4) .')' }) # but not the fourth
    (?!\1)/x; # not followed by the same
    --------------------------------------------------------------
    #empty
    Best,
    XC
     
    , Dec 2, 2005
    #12
  13. wrote:
    > A test string for some proposed solutions:
    > $_="asCCCCCCChwCCCsad";
    > ------------------------------------------------------
    > # it_says_BALLS_on_your forehead's solution:
    > print $` if/((\w)\2{2})(?!\2)/;
    > #asCCCC ==> no


    actually, that was NOT my solution. i stated that the above regex did
    NOT work.
     
    it_says_BALLS_on_your forehead, Dec 2, 2005
    #13
  14. Guest

    wrote:
    > Agreed. The latter is much better than this:
    >
    > print if /
    > (\w) # a char
    > (??{ '(?<=' . ("$1" x 3) .')' }) # as the third in a series
    > (??{ '(?<!' . ("$1" x 4) .')' }) # but not the fourth
    > (?!\1)/x; # not followd by same char


    I also agree that Anno's solution is probably the most practical
    solution that's been proposed, but this is a VERY interesting approach
    (and I learned something today!) Thanks!
     
    , Dec 2, 2005
    #14
  15. robic0 Guest

    On 2 Dec 2005 12:55:23 -0800, wrote:

    > wrote:
    >> Agreed. The latter is much better than this:
    >>
    >> print if /
    >> (\w) # a char
    >> (??{ '(?<=' . ("$1" x 3) .')' }) # as the third in a series
    >> (??{ '(?<!' . ("$1" x 4) .')' }) # but not the fourth
    >> (?!\1)/x; # not followd by same char

    >
    >I also agree that Anno's solution is probably the most practical
    >solution that's been proposed, but this is a VERY interesting approach
    >(and I learned something today!) Thanks!

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Perl sucks?
     
    robic0, Dec 3, 2005
    #15
  16. Guest

    robic0 wrote:
    > On 2 Dec 2005 12:55:23 -0800, wrote:
    > >(and I learned something today!) Thanks!

    > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > Perl sucks?


    Actually, my appreciation of Perl was raised a little. But I wasn't
    really thinking about Perl at all when I wrote those comments. I was
    thinking of the logic of Steven's algorithm, which is independent of
    the programming language which expresses it. I really do believe that
    Donald Knuth himself would admire Steven's approach.
     
    , Dec 3, 2005
    #16
  17. Guest

    wrote:
    > robic0 wrote:
    > > On 2 Dec 2005 12:55:23 -0800, wrote:
    > > >(and I learned something today!) Thanks!

    > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > > Perl sucks?

    >
    > Actually, my appreciation of Perl was raised a little. But I wasn't
    > really thinking about Perl at all when I wrote those comments. I was
    > thinking of the logic of Steven's algorithm, which is independent of
    > the programming language which expresses it. I really do believe that
    > Donald Knuth himself would admire Steven's approach.




    Credit should go to Anno, who analyzed the problem in a
    succinct and logical way. I just worked the problem a little
    bit further.

    To allow the regex to match a sequence of any character, one
    should add the \Q escape sequence -- that's something that I
    previously neglected:

    print if /(.)
    (??{ '(?<=' . ("\Q$1\E" x 3) . ')' })
    (??{ '(?<!' . ("\Q$1\E" x 4) . ')' })
    (?!\1)/x;
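
    [Editor's sketch, not part of the original post.] A minimal harness
    around the pattern above, assuming a perl in which $1 is visible inside
    (??{ ... }) while the match is in progress. The sub name is hypothetical,
    and quotemeta() stands in for the \Q...\E form:

```perl
use strict;
use warnings;

# Hypothetical helper: true if some character occurs exactly three
# times in a row. The pattern matches the LAST character of such a
# run and builds its lookbehinds at run time from that character.
sub has_exact_triple {
    my ($s) = @_;
    return $s =~ /
        (.)
        (??{ '(?<=' . (quotemeta($1) x 3) . ')' })
        (??{ '(?<!' . (quotemeta($1) x 4) . ')' })
        (?!\1)
    /x ? 1 : 0;
}

for my $s ("abCCCdefg", "abCCCCCdefg", "ab***cd") {
    printf "%-12s %s\n", $s, has_exact_triple($s) ? "match" : "no match";
}
```

    The third test string is there to exercise the quoting: without
    quotemeta() the generated lookbehind for "*" would not be a valid
    pattern.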


    And if I ever were to make the same offer as Knuth -- to pay
    a small finder's fee to others who could find a bug in my programs
    -- I'd end up owing more than the U.S. National Debt. :)

    --
    Regards,
    Steven
     
    , Dec 3, 2005
    #17
