Regular Expression confusion

Christie Taylor

I'm new to regular expressions and am trying to create a fairly simple
one to validate input. The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i to get started (ignoring my optional
part) but this is not working, it's matching on way too much. What am I
missing?


Thanks!
 
Matt Garrish

Christie Taylor said:
I'm new to regular expressions and am trying to create a fairly simple
one to validate input. The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i to get started (ignoring my optional
part) but this is not working, it's matching on way too much. What am I
missing?

"matching on way too much" is not a very good description of your problem.
The only thing I see that is obviously wrong compared to your description is
the \s{0,1}. This will match on any whitespace, not just spaces (i.e., tabs,
newlines, etc.). You might want to try writing it as:

/^word ?\d{1,7}/i

That or provide examples of what it is matching compared to what you were
expecting.

Matt
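
The difference between \s{0,1} and a literal optional space can be checked by running both patterns over inputs that contain a space, a tab, and a newline. This is only a sketch: the helper names matches_any_ws and matches_space are made up for the example, and both patterns keep the {1,7} from the original post.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns discussed above.
sub matches_any_ws { return $_[0] =~ /^word\s{0,1}\d{1,7}/i ? 1 : 0 }
sub matches_space  { return $_[0] =~ /^word ?\d{1,7}/i      ? 1 : 0 }

for my $input ("word 123", "word\t123", "word\n123", "word123") {
    # Make tabs and newlines visible in the report.
    (my $shown = $input) =~ s/\t/\\t/g;
    $shown =~ s/\n/\\n/g;
    printf "%-10s  \\s{0,1}: %d   ' ?': %d\n",
        $shown, matches_any_ws($input), matches_space($input);
}
```

The tab and newline inputs are accepted only by the \s version; the plain space and no-space inputs are accepted by both.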
 
John Bokma

Christie said:
I'm new to regular expressions and am trying to create a fairly simple
one to validate input. The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i

Three things: your word is not optional; ? is a shortcut for {0,1}
(zero or one); and you wrote {1,7} where your description says 1 to 8
digits.

to get started (ignoring my optional
part) but this is not working, it's matching on way too much. What am I
missing?

*Always* post real code and real examples. Perl is shorter and clearer
than (your) English.
 
John Bokma

Christie said:
I'm new to regular expressions and am trying to create a fairly simple
one to validate input. The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i to get started (ignoring my optional
part) but this is not working, it's matching on way too much. What am I
missing?

Also note that your digit match is not anchored at the end. As written you
could replace \d{1,7} with a plain \d, since the pattern also matches
strings with 8, 9, ... digits.
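
The effect of the missing end anchor can be seen by comparing the posted pattern with a variant that uses a single \d and with one that adds $. This is a sketch under assumptions: the variable names are invented for the example, and the $ variant is one possible fix, not something proposed at this point in the thread.

```perl
use strict;
use warnings;

my $loose    = qr/^word\s{0,1}\d{1,7}/i;   # pattern from the original post
my $single   = qr/^word\s{0,1}\d/i;        # same, with the count dropped
my $anchored = qr/^word\s{0,1}\d{1,7}$/i;  # assumption: end anchor added

for my $input ("word1", "word1234567", "word123456789", "word12junk") {
    printf "%-14s loose: %-8s single: %-8s anchored: %s\n",
        $input,
        ($input =~ $loose    ? "match" : "no match"),
        ($input =~ $single   ? "match" : "no match"),
        ($input =~ $anchored ? "match" : "no match");
}
```

Without an end anchor the first two patterns accept exactly the same inputs, because nothing forces the match to stop after the counted digits.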
 
Christie Taylor

John said:
Also note that your digit match is not anchored at the end. As written you
could replace \d{1,7} with a plain \d, since the pattern also matches
strings with 8, 9, ... digits.

Ok, here's what I have.

if ($mystring =~ m/(^word)?\s?\d{1,8}$/i) {
    print "it matches!\n";
}

word should be optional, but must be first if present.
a number of 1 to 8 digits must be present.
a space may or may not be between the word and the number.

Unfortunately it's matching on a lot more than I want it to :(
Thanks!
 
Paul Lalli

Christie said:
To be more specific it is matching on any number of digits and any letters
instead of just _word_. :(

Most likely this is because you've included the 'start of string' anchor
within the optional match. So there's no requirement for the match to
be anchored to the beginning of the string. Try putting the ^ at the
beginning of the pattern, outside the parentheses.

Paul Lalli
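
The effect of the anchor placement can be checked with a short comparison. This is a sketch: the variable names are invented, and the second pattern is just one way of moving ^ outside the optional group (using a non-capturing group).

```perl
use strict;
use warnings;

my $anchor_inside  = qr/(^word)?\s?\d{1,8}$/i;    # as posted: ^ is optional too
my $anchor_outside = qr/^(?:word)?\s?\d{1,8}$/i;  # ^ applies to every match

for my $input ("word 123", "123", "hello 123", "word 123456789") {
    printf "%-16s inside: %-8s outside: %s\n",
        $input,
        ($input =~ $anchor_inside  ? "match" : "no match"),
        ($input =~ $anchor_outside ? "match" : "no match");
}
```

With the anchor inside the group, the engine may skip the group entirely and start matching anywhere, so "hello 123" matches on its trailing " 123" and "word 123456789" matches on its last eight digits.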
 
Christie Taylor

Bob said:
Well, one thing you are missing is following the posting guidelines for
this newsgroup where it says something along the lines of "include a
short but complete (with data) program that anyone can copy/paste/run
which illustrates your difficulty". Also "this is not working, it's
matching on way too much" is as vague as "it doesn't work" -- it tells
us nothing -- be specific -- what *exactly* does it match that you don't
think it should?

The regexp you included doesn't appear to match your stated criteria
very well: the "word" isn't optional; the digits matched are one
through seven, not eight -- and, since the trailing end is not anchored,
it will also match on a string with 8, 9, 10, or 100000 digits at that
location. Going from your stated criteria, I would think:

m/^(?:word)?\s?\d{1,8}(?:\D|$)/i

That looks like what I'm trying to do! I'm trying to understand how
everything works. It looks like the ?: inside the first parenthesis makes
word optional. I'm not sure how the final parenthesis works though. Why is
\D (non-digit) used?

Thanks!
 
Christie Taylor

Bob said:
No, it is the "?" after the parens that makes "word" optional. The ?:
inside the parens makes the parens non-capturing. Don't guess at
syntax, refer to the truly wonderful reference material present in:

perldoc perlre


Well, in brief explanation:

^          <-- causes the rest of the regexp to start
               matching at the beginning of the string.
(?:word)   <-- is just like (word) except it doesn't
               capture -- it looks like you're not
               capturing.
(?:word)?  <-- the ? makes the presence of "word"
               optional (zero or one occurrences).
\s?        <-- matches zero or one whitespace characters
\d{1,8}    <-- matches one to eight digits
(?:\D|$)   <-- matches without capturing either a
               non-digit or the end of the string.
               Same as (\D|$) except it doesn't capture.

The non-digit is used to permit trailing non-digit characters after the
last of the digits, so something like:

word 123 blah blah blah

will match (with \D matching a space character). I still don't know if
that is part of what you desire, but if optional content starting with a
non-digit is to be permitted after the digits, this is one of I'm sure
many ways of making that happen.
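
The piece-by-piece explanation above can also be written into the pattern itself with the /x modifier, which lets whitespace and comments sit inside the regex. This is a sketch: the variable name $re and the test strings other than "word 123 blah blah blah" are assumptions added for illustration.

```perl
use strict;
use warnings;

my $re = qr/
    ^            # start of the string
    (?:word)?    # optional literal "word", grouped but not captured
    \s?          # zero or one whitespace character
    \d{1,8}      # one to eight digits
    (?:\D|$)     # then a non-digit, or the end of the string
/xi;

for my $input ("word 123 blah blah blah", "12345678", "WORD42", "word123456789") {
    printf "%-26s %s\n", "[$input]", ($input =~ $re ? "matches" : "doesn't match");
}
```

Only the last string, with nine digits, fails: after at most eight digits the next character would have to be a non-digit or the end of the string.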

For additional detail, please refer to:

perldoc perlre
perldoc perlretut
perldoc perlreref

etc etc. perlre is the "bible" of regular expressions -- you'll need to
master it. A good book on the subject, like "Mastering Regular
Expressions", would probably help a lot too.

Ok, thanks so much! I have been reading "Mastering Regular Expressions" from
O'Reilly, although I've been overwhelmed by it so far. I'll try experimenting
with some more practice regexes.
 
Lukas Mai

Bob Walton wrote:
Christie Taylor wrote:
I'm new to regular expressions and am trying to create a fairly simple
one to validate input. The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i to get started (ignoring my optional
part) but this is not working, it's matching on way too much. What am I
missing?
[...]

m/^(?:word)?\s?\d{1,8}(?:\D|$)/i

Note that you can replace (?:\D|$) by a negative look-ahead assertion:

m/^(?:word)?\s?\d{1,8}(?!\d)/i

This looks more intuitive to me (... 1 to 8 digits, not followed by
another digit).

[...]

HTH, Lukas
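
The two endings accept the same inputs, but they differ in what the match consumes: (?:\D|$) swallows the following non-digit, while (?!\d) only peeks at it. The sketch below shows this through $&, the text of the last successful match. The helper matched_text and the test strings are assumptions made up for the example.

```perl
use strict;
use warnings;

my $consuming = qr/^(?:word)?\s?\d{1,8}(?:\D|$)/i;  # consumes the next non-digit
my $lookahead = qr/^(?:word)?\s?\d{1,8}(?!\d)/i;    # only checks what follows

# Hypothetical helper: returns the matched text in brackets, or "no match".
sub matched_text {
    my ($re, $input) = @_;
    return $input =~ $re ? "[$&]" : "no match";
}

for my $input ("word 123 blah", "word 123", "word 123456789") {
    printf "%-16s alternation: %-12s look-ahead: %s\n",
        $input, matched_text($consuming, $input), matched_text($lookahead, $input);
}
```

For a plain yes/no validation the difference is invisible; it matters once the matched text is used, for example in a substitution.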
 
Christie Taylor

Lukas said:
Bob Walton wrote:
Christie Taylor wrote:
I'm new to regular expressions and am trying to create a fairly simple
one to validate input. The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i to get started (ignoring my optional
part) but this is not working, it's matching on way too much. What am I
missing?
[...]

m/^(?:word)?\s?\d{1,8}(?:\D|$)/i

Note that you can replace (?:\D|$) by a negative look-ahead assertion:

m/^(?:word)?\s?\d{1,8}(?!\d)/i

This looks more intuitive to me (... 1 to 8 digits, not followed by
another digit).

Ok, I'm having trouble. When I do this and type as my test string:

word444 it works
444 it works
word 444 it doesn't work; no match

if ($teststring =~ m/^(?:word)?\s?\d{1,8}(?!\d)/i) {
    print "matches\n";
}

What am I doing wrong? thanks.
 
Lukas Mai

Christie Taylor wrote:
[...]
Ok, I'm having trouble. When I do this and type as my test string:
word444 it works
444 it works
word 444 it doesn't work; no match
if ($teststring =~ m/^(?:word)?\s?\d{1,8}(?!\d)/i) {
    print "matches\n";
}

~/programming/perl $ cat try.pl
#!/usr/local/bin/perl
use warnings;
use strict;

my @strings = ('word444', '444', 'word 444');
for my $teststring (@strings) {
    if ($teststring =~ m/^(?:word)?\s?\d{1,8}(?!\d)/i) {
        print "[$teststring] matches\n";
    } else {
        print "[$teststring] doesn't match\n";
    }
}
__END__
~/programming/perl $ perl try.pl
[word444] matches
[444] matches
[word 444] matches
~/programming/perl $

What am I doing wrong? thanks.

No idea; it works here.

HTH, Lukas
 
Brian McCauley

Bob said:
Also, you are using capturing parentheses. Did you really intend that,
since you're not making use of the captured values (or if you are,
you're doing it wrong, since the capture isn't inside the if block)? If
you don't intend a capture, use (?:regexp) instead, which is identical
to (regexp) except it doesn't capture. That makes your intent clearer.

To those of us who learnt Perl regex before there were non-capturing
parentheses that is not necessarily the case.

In my mind capturing is a natural side-effect of the grouping
parentheses. If I see the programmer explicitly suppress this
side-effect then I start (at least subconsciously) looking for a positive
reason why. Bear in mind that even a successful match by a regex with no
captures will reset the "most recent match" buffer that is accessed via
the $1 etc variables.

Hence replacing () with (?:) will make the intent clearer to some
readers and less clear to others and overall is a clarity-neutral change.

Given the rule "always write less code unless this will make your code
less clear" I would say don't use (?:).
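
The remark about the match buffer being reset can be checked directly. The sketch below records $1 after a capturing match, after a successful match with only a non-capturing group, and after a failed match. The array @report and the test strings are assumptions made up for the example.

```perl
use strict;
use warnings;

my @report;

# 1. A capturing group sets $1.
"word 444" =~ /(word)\s?\d+/ or die "unexpected: no match";
push @report, (defined $1 ? $1 : "undef");

# 2. A successful match with no capturing groups leaves $1 undefined.
"word 444" =~ /(?:word)\s?\d+/ or die "unexpected: no match";
push @report, (defined $1 ? $1 : "undef");

# 3. A failed match does not touch $1 from the previous successful match.
"word 444" =~ /(word)/ or die "unexpected: no match";
my $failed_matched = ("no digits here" =~ /(\d+)/) ? 1 : 0;
push @report, (defined $1 ? $1 : "undef");

print join(", ", @report), "\n";
```

So a stale $1 from an earlier match does not survive a later successful match, whether or not that later regex captures anything.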
 
John Bokma

Brian McCauley wrote:

[ snip ]
Hence replacing () with (?:) will make the intent clearer to some
readers and less clear to others and overall is a clarity-neutral
change.

The ones that don't know (?: ... ) should learn it. Capturing and not
using it is way more confusing.
 
Brian McCauley

John said:
Brian McCauley wrote:

[ snip ]
Hence replacing () with (?:) will make the intent clearer to some
readers and less clear to others and overall is a clarity-neutral
change.


The ones that don't know (?: ... ) should learn it.

I did not speak of people who did not know about (?:...). I spoke of
people who use or have used regex in some context other than current
versions of Perl. The normal way to do grouping in most dialects of RE
is simply ().

As I said before I learnt Perl before it had (?:...). I also learnt RE
before I learnt Perl and continue to use RE in other environments too.

Capturing and not using it is way more confusing.

Your bland assertion that this is the case will not make it true.

I will admit (and indeed have already done so in this thread) that
_some_ people will be confused by the failure to suppress the redundant
capturing side effect of grouping.

Personally when I see /I said "(foo|bar)"/ I am not at all confused by
the fact that it's not being used as a capture.
 
