Match on x instances of a character

John Burgess

Hi,
I am having some trouble with regexps and hope someone can help.

Problem: Iterating through a list of newsgroups and matching only those
with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
comp.lang.perl.misc would not.

(Broken) Solution: I have got something like this

$test = "comp.lang.perl";
if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
{print STDERR "$test is not 2\n";}

Clearly this doesn't work. I can't see what I'm doing wrong. Tips
appreciated.

John
 
Brian Wakem

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    my $dots = tr/.//;    # tr/// returns the number of characters it matched
    print "$_ has $dots dots\n";
}


__DATA__
comp.lang
comp.lang.perl
comp.lang.perl.misc

###########

$ perl scripts/tmp/tmp72.pl
comp.lang has 1 dots
comp.lang.perl has 2 dots
comp.lang.perl.misc has 3 dots


See perldoc -q count
 
Anno Siegel

John Burgess said:
Problem: Iterating through a list of newsgroups and matching only those
with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
comp.lang.perl.misc would not.

(Broken) Solution: I have got something like this

$test = "comp.lang.perl";
if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
{print STDERR "$test is not 2\n";}

What is the /g for? It makes no sense; you're not looking for multiple
occurrences of anything. Further, in a character class a dot is not
special, so the "\" is not needed. Third, you forgot an asterisk after
each character class that matches non-dots, so it can never match more
than one non-dot in a row. Fourth, you are using capturing parentheses
where you only need grouping. Fifth, you didn't anchor your match to the
beginning and the end of the string, so even with the other corrections
it would match anything with two or more dots in it.

Applying all of this to your regex, it becomes

/^(?:[^.]*\.[^.]*){2}$/

which does indeed match what you want.
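
To see what the corrections change, here is a small sketch that runs the pattern from the question (minus the /g) and the corrected pattern over the three sample names. The helper name check and the extra sample a.bc.d are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the question (without /g) and the corrected, anchored pattern.
my $broken = qr/([^\.]\.[^\.]){2}/;
my $fixed  = qr/^(?:[^.]*\.[^.]*){2}$/;

# Hypothetical helper: report which of the two patterns match a name.
sub check {
    my ($name) = @_;
    return sprintf "%-20s broken=%-3s fixed=%s",
        $name,
        ( $name =~ $broken ? 'yes' : 'no' ),
        ( $name =~ $fixed  ? 'yes' : 'no' );
}

# a.bc.d is an invented sample: "a.b" directly followed by "c.d".
print check($_), "\n" for qw(comp.lang comp.lang.perl comp.lang.perl.misc a.bc.d);
```

Of these four names, the original pattern matches only a.bc.d, because it demands exactly one non-dot on each side of each dot, twice in a row. The corrected pattern matches comp.lang.perl and a.bc.d, the two names with exactly two dots.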

However, the easiest (and fastest) way of counting characters is the
tr/// operator:

if ( tr/.// == 2 ) { #...
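
The one-liner above works on $_. As a minimal sketch of the same test on a named variable, the following binds tr/// with =~; the sub name has_two_dots is hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name contains exactly two dots.
sub has_two_dots {
    my ($name) = @_;
    # With an empty replacement list and no /d, tr/// leaves the string
    # unchanged and simply returns how many characters it matched.
    return ( $name =~ tr/.// ) == 2;
}

for my $name (qw(comp.lang comp.lang.perl comp.lang.perl.misc)) {
    printf "%s: %s\n", $name, has_two_dots($name) ? "match" : "no match";
}
```

Note that the dot needs no escaping here: tr/// takes a list of characters, not a regex.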

Anno
 
John Burgess

Thanks Brian, I was aware the tr function would do it. However, I was
planning to use the match in a grep, and so I don't think tr is so
economical. I am also testing these options for speed, and that's part of
the reason for finding the match function: to see which is fastest.
Thanks very much for your input though!

Regards,
John

 
John Burgess

Seems I really was off the track a bit. I am no regexp pro. I'm trying
though. Your example does indeed work. Your comment about speed is
interesting. Part of the reason for finding the correct match regexp was
to test for speed, which I will still test. The other thing is I want to
use this in a grep and I'm not sure the tr can be used economically in
this context? Thanks for your help. I'll be sure and go over where you
say I've got it wrong. Your comments make a lot of sense.

Regards,
John

 
MikeGee

John said:
Seems I really was off the track a bit. I am no regexp pro. I'm trying
though. Your example does indeed work. Your comment about speed is
interesting. Part of the reason for finding the correct match regexp was
to test for speed, which I will still test. The other thing is I want to
use this in a grep and I'm not sure the tr can be used economically in
this context? Thanks for your help. I'll be sure and go over where you
say I've got it wrong. Your comments make a lot of sense.

Regards,
John

Why don't you think you can use tr/// in a grep?

@two_dotted = grep { tr/.// == 2 } @newsgroups;
 
Uri Guttman

JB> Seems I really was off the track a bit. I am no regexp pro. I'm
JB> trying though. Your example does indeed work. Your comment about
JB> speed is interesting. Part of the reason for finding the correct
JB> match regexp was to test for speed, which I will still test. The
JB> other thing is I want to use this in a grep and I'm not sure the
JB> tr can be used economically in this context? Thanks for your
JB> help. I'll be sure and go over where you say I've got it
JB> wrong. Your comments make a lot of sense.

please stop top posting. read the frequently posted group guidelines for
more about that.

what does 'used economically in this context' mean? what context? why
are you so speed conscious about this? have you found it to be a major
bottleneck and you need more speed? and tr/// isn't a regex so don't
confuse it with them. and tr/// *IS* the fastest way to count chars in a
string. there is no way a regex can beat it for something as simple as
that. tr/// is designed for character oriented operations.
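
For anyone who wants to measure this rather than take it on faith, here is a rough sketch using the core Benchmark module. The sub names by_tr and by_regex and the three-element list are assumptions for illustration; real timings depend on the perl version and on the data, so no numbers are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @names = qw(comp.lang comp.lang.perl comp.lang.perl.misc);

# Hypothetical helpers: both return the names with exactly two dots.
sub by_tr    { return grep { tr/.// == 2 } @names }
sub by_regex { return grep { /^(?:[^.]*\.[^.]*){2}$/ } @names }

# A negative count tells Benchmark to run each sub for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese( -1, {
    tr_count => sub { my @hits = by_tr() },
    regex    => sub { my @hits = by_regex() },
} );
```

cmpthese prints a comparison table of rates, so the two approaches can be judged on the actual list of newsgroups rather than on this toy sample.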

uri
 
Tad McClellan

[ Please do not top-post.
Text rearranged into a more sensible order.
]




Note that there *are no* regular expressions used in Anno's suggestion.

John said:
Part of the reason for finding the correct match regexp was
to test for speed, which I will still test.


Sounds like premature optimization to me...

John said:
The other thing is I want to
use this in a grep and I'm not sure the tr can be used economically in
this context?


The docs for grep() say that it can take any EXPRession.

tr/// is an expression.

my @two_dot_groups = grep tr/.// == 2, @newsgroups;
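
As a small sketch of the two grep forms side by side: the EXPR form takes a comma after the expression, the BLOCK form does not. The list contents, including the invented name alt.test.one, are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample list; alt.test.one is a made-up name added for illustration.
my @newsgroups = qw(comp.lang comp.lang.perl comp.lang.perl.misc alt.test.one);

# EXPR form: comma between the expression and the list.
my @expr_form = grep tr/.// == 2, @newsgroups;

# BLOCK form: no comma after the block.
my @block_form = grep { tr/.// == 2 } @newsgroups;

print "EXPR:  @expr_form\n";
print "BLOCK: @block_form\n";
```

Both forms set $_ to each element in turn, which is why a bare tr/.// works without naming a variable, and both select comp.lang.perl and alt.test.one here.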
 
