Match on x instances of a character

Discussion in 'Perl Misc' started by John Burgess, Feb 4, 2006.

  1. John Burgess

    Hi,
    I am having some trouble with regexps and hope someone can help.

    Problem: Iterating through a list of newsgroups and matching only those
    with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
    comp.lang.perl.misc would not.

    (Broken) Solution: I have got something like this

    $test = "comp.lang.perl";
    if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
    {print STDERR "$test is not 2\n";}

    Clearly this doesn't work. I can't see what I'm doing wrong. Tips
    appreciated.

    John
    John Burgess, Feb 4, 2006

  2. Brian Wakem

    John Burgess wrote:

    > Hi,
    > I am having some trouble with regexps and hope someone can help.
    >
    > Problem: Iterating through a list of newsgroups and matching only those
    > with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
    > comp.lang.perl.misc would not.
    >
    > (Broken) Solution: I have got something like this
    >
    > $test = "comp.lang.perl";
    > if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
    > {print STDERR "$test is not 2\n";}
    >
    > Clearly this doesn't work. I can't see what I'm doing wrong. Tips
    > appreciated.
    >
    > John



    #!/usr/bin/perl

    use strict;
    use warnings;

    while(<DATA>){
    chomp;
    my $dots = tr/.//;
    print "$_ has $dots dots\n";
    }


    __DATA__
    comp.lang
    comp.lang.perl
    comp.lang.perl.misc

    ###########

    $ perl scripts/tmp/tmp72.pl
    comp.lang has 1 dots
    comp.lang.perl has 2 dots
    comp.lang.perl.misc has 3 dots


    See perldoc -q count
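    A minimal sketch of two common ways to count one character in Perl, run over the thread's three example names. The first is tr/// as above. The second assigns a /g match in list context to an empty list and then takes that assignment in scalar context. In the regex version the dot has to be escaped, because there it is a metacharacter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @groups = qw(comp.lang comp.lang.perl comp.lang.perl.misc);

my %by_tr;
my %by_match;
for my $name (@groups) {
    # tr/// returns how many characters it matched; with an empty
    # replacement list it leaves the string itself unchanged.
    $by_tr{$name} = ($name =~ tr/.//);

    # A list assignment in scalar context yields the number of elements
    # on its right-hand side, i.e. the number of /g matches.
    # The dot is escaped because this IS a regex.
    $by_match{$name} = () = $name =~ /\./g;

    print "$name: tr=$by_tr{$name} match=$by_match{$name}\n";
}
```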


    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
    Brian Wakem, Feb 4, 2006

  3. Anno Siegel

    John Burgess <> wrote in comp.lang.perl.misc:
    > Hi,
    > I am having some trouble with regexps and hope someone can help.
    >
    > Problem: Iterating through a list of newsgroups and matching only those
    > with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
    > comp.lang.perl.misc would not.
    >
    > (Broken) Solution: I have got something like this
    >
    > $test = "comp.lang.perl";
    > if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else


    What is the /g for? It makes no sense; you're not looking for multiple
    occurrences of anything. Second, in a character class a dot is not
    special, so the "\" is not needed. Third, you forgot an asterisk after
    each character class that matches non-dots, so it can never match more
    than one non-dot in a row. Fourth, you are using capturing parentheses
    where you only need grouping; (?:...) groups without capturing. Fifth,
    you didn't anchor your match to the beginning and the end of the string,
    so even with the other corrections it would match anything with two or
    more dots in it.

    > {print STDERR "$test is not 2\n";}


    Applying all of this to your regex, it becomes

    /^(?:[^.]*\.[^.]*){2}$/

    which does indeed match what you want.

    However, the easiest (and fastest) way of counting characters is the
    tr/// operator:

    if ( tr/.// == 2 ) { #...
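    To check that the corrected regex and the tr/// test select the same strings, here is a minimal sketch run over the thread's three names plus a few made-up edge cases (empty components, leading and trailing dots, no dots, an empty string).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Anno's corrected regex: exactly two dots, with any non-dots around them.
my $two_dots = qr/^(?:[^.]*\.[^.]*){2}$/;

# The thread's examples plus hypothetical edge cases.
my @samples = ('comp.lang', 'comp.lang.perl', 'comp.lang.perl.misc',
               'a..b', '.a.', '..', 'no-dots', '');

my @disagree;
for my $s (@samples) {
    my $by_regex = $s =~ $two_dots ? 1 : 0;
    my $by_tr    = ($s =~ tr/.//) == 2 ? 1 : 0;
    push @disagree, $s if $by_regex != $by_tr;
    printf "%-22s regex=%d tr=%d\n", "'$s'", $by_regex, $by_tr;
}
print @disagree ? "Disagreements: @disagree\n"
                : "regex and tr/// agree on every sample\n";
```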

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
    Anno Siegel, Feb 4, 2006
  4. John Burgess

    Thanks Brian, I was aware that tr/// would do it. However, I was
    planning to use the match in a grep, so I don't think tr is as
    economical there. I am also testing these options for speed, and that's
    part of the reason for finding a working match: to see which is fastest.
    Thanks very much for your input, though!

    Regards,
    John

    John Burgess, Feb 4, 2006
  5. John Burgess

    Seems I really was off the track a bit. I am no regexp pro. I'm trying
    though. Your example does indeed work. Your comment about speed is
    interesting. Part of the reason for finding the correct match regexp was
    to test for speed, which I will still test. The other thing is I want to
    use this in a grep and I'm not sure the tr can be used economically in
    this context? Thanks for your help. I'll be sure and go over where you
    say I've got it wrong. Your comments make a lot of sense.

    Regards,
    John

    John Burgess, Feb 4, 2006
  6. MikeGee

    John Burgess wrote:
    > Seems I really was off the track a bit. I am no regexp pro. I'm trying
    > though. Your example does indeed work. Your comment about speed is
    > interesting. Part of the reason for finding the correct match regexp was
    > to test for speed, which I will still test. The other thing is I want to
    > use this in a grep and I'm not sure the tr can be used economically in
    > this context? Thanks for your help. I'll be sure and go over where you
    > say I've got it wrong. Your comments make a lot of sense.


    Why don't you think you can use tr/// in a grep?

    @two_dotted = grep { tr/.// == 2 } @newsgroups;
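    A self-contained version of that grep, with a made-up @newsgroups list standing in for the real one. It also shows the grep EXPR, LIST form, which gives the same result. Because the replacement list is empty, tr/// only counts, so the original list is not modified even though grep aliases $_ to each element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample list; substitute the real list of newsgroups.
my @newsgroups = qw(comp.lang comp.lang.perl comp.lang.perl.misc
                    alt.test news.announce.newgroups);

# Block form: tr/// counts the dots in $_, which grep aliases to each element.
my @two_dotted = grep { tr/.// == 2 } @newsgroups;

# Expression form (grep EXPR, LIST); selects the same elements.
my @also_two_dotted = grep tr/.// == 2, @newsgroups;

print "$_\n" for @two_dotted;
```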
    MikeGee, Feb 4, 2006
  7. Uri Guttman

    >>>>> "JB" == John Burgess <> writes:

    JB> Seems I really was off the track a bit. I am no regexp pro. I'm
    JB> trying though. Your example does indeed work. Your comment about
    JB> speed is interesting. Part of the reason for finding the correct
    JB> match regexp was to test for speed, which I will still test. The
    JB> other thing is I want to use this in a grep and I'm not sure the
    JB> tr can be used economically in this context? Thanks for your
    JB> help. I'll be sure and go over where you say I've got it
    JB> wrong. Your comments make a lot of sense.

    please stop top posting. read the frequently posted group guidelines for
    more about that.

    what does 'used economically in this context' mean? what context? why
    are you so speed conscious about this? have you found it to be a major
    bottleneck and you need more speed? and tr/// isn't a regex so don't
    confuse it with them. and tr/// *IS* the fastest way to count chars in a
    string. there is no way a regex can beat it for something as simple as
    that. tr/// is designed for character oriented operations.
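    If you do want measurements rather than rules of thumb, the core Benchmark module can compare the two approaches. This is a minimal sketch assuming a made-up list of 3000 names; timings depend on the perl version, machine and data, so no results are shown here. It checks that both approaches select the same elements before timing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: the thread's three names repeated 1000 times.
my @newsgroups = (qw(comp.lang comp.lang.perl comp.lang.perl.misc)) x 1000;

my $two_dots = qr/^(?:[^.]*\.[^.]*){2}$/;

# Each sub filters the list and returns how many elements it kept.
my %candidates = (
    tr_count => sub { my @r = grep { tr/.// == 2 } @newsgroups; scalar @r },
    regex    => sub { my @r = grep { /$two_dots/ } @newsgroups; scalar @r },
);

# Sanity check before timing: both approaches must agree.
my $n_tr = $candidates{tr_count}->();
my $n_re = $candidates{regex}->();
die "approaches disagree: $n_tr vs $n_re\n" unless $n_tr == $n_re;

# A negative count means "run each for at least that many CPU seconds";
# cmpthese then prints a rate comparison table.
cmpthese(-1, \%candidates);
```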

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Feb 4, 2006
  8. [ Please do not top-post.
    Text rearranged into a more sensible order.
    ]


    John Burgess <> wrote:
    > Anno Siegel wrote:
    >> John Burgess <> wrote in comp.lang.perl.misc:


    >>>Problem: Iterating through a list of newsgroups and matching only those
    >>>with 2 .'s in the name.


    >> However, the easiest (and fastest) way of counting characters is the
    >> tr/// operator:
    >>
    >> if ( tr/.// == 2 ) { #...



    Note that there *are no* regular expressions used in Anno's suggestion.
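    One practical consequence of tr/// not being a regex: in its search list a dot is just a dot. A short sketch contrasting it with a regex, where an unescaped dot matches any character except a newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'comp.lang.perl';

# In tr///, the search list is a plain list of characters, so '.' is a
# literal dot and needs no backslash.
my $tr_dots = ($name =~ tr/.//);

# In a regex, an unescaped '.' matches any character except newline, so
# counting /./g matches counts every character in this string.
my $regex_any = () = $name =~ /./g;

# Escaping the dot makes the regex count literal dots, as tr/// does.
my $regex_literal = () = $name =~ /\./g;

print "tr/.//: $tr_dots, /./g: $regex_any, /\\./g: $regex_literal\n";
```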


    > Part of the reason for finding the correct match regexp was
    > to test for speed, which I will still test.



    Sounds like premature optimization to me...


    > The other thing is I want to
    > use this in a grep and I'm not sure the tr can be used economically in
    > this context?



    The docs for grep() say that it can take any EXPRession.

    tr/// is an expression.

    my @two_dot_groups = grep tr/.// == 2, @newsgroups;


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Feb 5, 2006

Similar Threads
  1. hiwa
    Replies:
    0
    Views:
    624
  2. Victor
    Replies:
    2
    Views:
    625
    Victor
    May 17, 2004
  3. John Wohlbier
    Replies:
    2
    Views:
    349
    Josiah Carlson
    Feb 22, 2004
  4. ekzept
    Replies:
    0
    Views:
    351
    ekzept
    Aug 10, 2007
  5. Replies:
    8
    Views:
    439
    James Stroud
    Jan 29, 2009