Match on x instances of a character

John Burgess · Feb 4, 2006

Hi,
I am having some trouble with regexps and hope someone can help.

Problem: Iterating through a list of newsgroups and matching only those
with exactly two dots in the name. So comp.lang.perl would match, but
comp.lang or comp.lang.perl.misc would not.

(Broken) Solution: I have got something like this

$test = "comp.lang.perl";
if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";}
else {print STDERR "$test is not 2\n";}

Clearly this doesn't work. I can't see what I'm doing wrong. Tips
appreciated.

John

Brian Wakem · Feb 4, 2006

#!/usr/bin/perl

use strict;
use warnings;

while(<DATA>){
chomp;
my $dots = tr/.//;
print "$_ has $dots dots\n";
}

__DATA__
comp.lang
comp.lang.perl
comp.lang.perl.misc

###########

$ perl scripts/tmp/tmp72.pl
comp.lang has 1 dots
comp.lang.perl has 2 dots
comp.lang.perl.misc has 3 dots

See perldoc -q count
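If you would rather count with a pattern match, a common idiom is to run a global match in list context and assign the result to a scalar, which gives the number of matches. A minimal sketch comparing it with the tr/// count from the script above; the hash names are just for illustration:

```perl
use strict;
use warnings;

# Sample names, the same three as in the script above.
my @groups = qw(comp.lang comp.lang.perl comp.lang.perl.misc);

my (%by_match, %by_tr);
for my $group (@groups) {
    # A global match in list context returns every match; assigning that
    # list to an empty list () in scalar context yields how many there were.
    $by_match{$group} = () = $group =~ /\./g;

    # tr/// with an empty replacement list just counts the dots.
    $by_tr{$group} = ($group =~ tr/.//);

    print "$group: $by_match{$group} by match, $by_tr{$group} by tr\n";
}
```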

Anno Siegel · Feb 4, 2006

John Burgess said:
(Broken) Solution: I have got something like this

$test = "comp.lang.perl";
if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
{print STDERR "$test is not 2\n";}

What is the /g for? It makes no sense; you're not looking for multiple
occurrences of anything. Second, in a character class a dot is not
special, so the "\" is not needed. Third, you forgot an asterisk after
each character class that matches non-dots, so it can never match more
than one non-dot in a row. Fourth, you are using capturing parentheses
for grouping, where non-capturing (?:...) parentheses would do. Fifth,
you didn't anchor your match to the beginning and the end of the
string, so, even with the other corrections, it would match anything
with two or more dots in it.
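To see the third and fifth points in action, here is a small sketch that runs the original pattern, the corrected but unanchored pattern, and the anchored one against the three sample names. The labels and the %result hash are only for illustration.

```perl
use strict;
use warnings;

my @names = qw(comp.lang comp.lang.perl comp.lang.perl.misc);

my %patterns = (
    # Original: no quantifier on the classes, so each side of a dot
    # can only be a single non-dot character.
    original   => qr/([^\.]\.[^\.]){2}/,
    # Quantifiers added and a non-capturing group, but not anchored.
    unanchored => qr/(?:[^.]*\.[^.]*){2}/,
    # Anchored at both ends: exactly two dots.
    anchored   => qr/^(?:[^.]*\.[^.]*){2}$/,
);

my %result;
for my $label (sort keys %patterns) {
    for my $name (@names) {
        my $hit = $name =~ $patterns{$label} ? 1 : 0;
        $result{$label}{$name} = $hit;
        printf "%-10s %-20s %s\n", $label, $name, $hit ? 'match' : 'no match';
    }
}
```

The original pattern matches none of the names, the unanchored one also accepts comp.lang.perl.misc, and only the anchored one picks out comp.lang.perl alone.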

Applying all of this to your regex, it becomes

/^(?:[^.]*\.[^.]*){2}$/

which does indeed match what you want.

However, the easiest (and fastest) way of counting characters is the
tr/// operator:

if ( tr/.// == 2 ) { #...
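Applied to the original snippet, a minimal sketch might look like this. tr/// works on $_ by default, so with a named variable you bind it with =~; because the replacement list is empty, the string itself is not modified.

```perl
use strict;
use warnings;

my $test = "comp.lang.perl";

# tr/// works on $_ by default; =~ binds it to a named variable instead.
# With an empty replacement list (and no /d modifier) it only counts
# matching characters, so $test is left unchanged.
my $dots = ($test =~ tr/.//);

if ($dots == 2) {
    print STDERR "$test is 2\n";
}
else {
    print STDERR "$test is not 2\n";
}
```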

Anno

John Burgess · Feb 4, 2006

Thanks Brian, I was aware the tr function would do it. However, I was
planning to use the match in a grep, and so I don't think the tr is so
economical there. I am also testing these options for speed, and that's
part of the reason for looking for the match version: to see which is
fastest. Thanks very much for your input, though!

Regards,
John

John Burgess · Feb 4, 2006

Seems I really was off the track a bit. I am no regexp pro. I'm trying
though. Your example does indeed work. Your comment about speed is
interesting. Part of the reason for finding the correct match regexp was
to test for speed, which I will still test. The other thing is I want to
use this in a grep and I'm not sure the tr can be used economically in
this context? Thanks for your help. I'll be sure and go over where you
say I've got it wrong. Your comments make a lot of sense.

Regards,
John

MikeGee · Feb 4, 2006

John said:
The other thing is I want to use this in a grep and I'm not sure the tr
can be used economically in this context?

Why don't you think you can use tr/// in a grep?

@two_dotted = grep { tr/.// == 2 } @newsgroups;
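A self-contained sketch of that grep, with a made-up list of newsgroups, plus Anno's anchored regex in the same position for comparison. Both select the same names.

```perl
use strict;
use warnings;

# Made-up sample list; in practice @newsgroups comes from your own data.
my @newsgroups = qw(comp.lang comp.lang.perl comp.lang.perl.misc
                    alt.perl comp.os.linux);

# grep aliases $_ to each element in turn, so tr/// counts its dots.
my @two_dotted = grep { tr/.// == 2 } @newsgroups;

# Anno's anchored regex selects the same names.
my @two_dotted_re = grep { /^(?:[^.]*\.[^.]*){2}$/ } @newsgroups;

print "$_\n" for @two_dotted;
```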

Uri Guttman · Feb 4, 2006

JB> Seems I really was off the track a bit. I am no regexp pro. I'm
JB> trying though. Your example does indeed work. Your comment about
JB> speed is interesting. Part of the reason for finding the correct
JB> match regexp was to test for speed, which I will still test. The
JB> other thing is I want to use this in a grep and I'm not sure the
JB> tr can be used economically in this context? Thanks for your
JB> help. I'll be sure and go over where you say I've got it
JB> wrong. Your comments make a lot of sense.

please stop top posting. read the frequently posted group guidelines for
more about that.

what does 'used economically in this context' mean? what context? why
are you so speed conscious about this? have you found it to be a major
bottleneck and you need more speed? and tr/// isn't a regex so don't
confuse it with them. and tr/// *IS* the fastest way to count chars in a
string. there is no way a regex can beat it for something as simple as
that. tr/// is designed for character oriented operations.
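If you want to check the speed claim on your own perl, here is a rough sketch using the core Benchmark module. The data set is made up (the three sample names repeated), and the rates you see will depend on your perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: the thread's three sample names repeated
# 1000 times, so there is enough work to time.
my @newsgroups = (qw(comp.lang comp.lang.perl comp.lang.perl.misc)) x 1000;

# Each sub filters the list and returns how many names have two dots.
my %filter = (
    tr_count => sub {
        my @hits = grep { tr/.// == 2 } @newsgroups;
        return scalar @hits;
    },
    regex => sub {
        my @hits = grep { /^(?:[^.]*\.[^.]*){2}$/ } @newsgroups;
        return scalar @hits;
    },
);

# A negative count means "run each sub for at least this many CPU
# seconds"; cmpthese then prints a table of relative rates.
cmpthese(-1, \%filter);
```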

uri

Tad McClellan · Feb 5, 2006

[ Please do not top-post.
Text rearranged into a more sensible order.
]

Note that there *are no* regular expressions used in Anno's suggestion.

> Part of the reason for finding the correct match regexp was
> to test for speed, which I will still test.

Sounds like premature optimization to me...

> The other thing is I want to
> use this in a grep and I'm not sure the tr can be used economically in
> this context?

The docs for grep() say that it can take any EXPRession.

tr/// is an expression.

my @two_dot_groups = grep tr/.// == 2, @newsgroups;

Similar threads:

Replace an occurrence of a regexp with a function call on a substring of the match, multiple times on (4 replies, Sep 16, 2013)
Regex to match a numerical IP range (7 replies, Dec 11, 2010)
How to disregard the first match of a loop? (22 replies, Aug 9, 2011)
Working on mobile css menu with plenty of frustration! (2 replies, Dec 29, 2022)
FAQ 6.23 How can I match strings with multibyte characters? (0 replies, Jan 11, 2011)
RegEx - matching previous match (4 replies, Feb 27, 2008)
Store a single character AFTER a match (3 replies, Jan 15, 2005)
Trying to parse/match a C string literal (12 replies, Sep 24, 2009)
