Regexp small question

Shai

Hi,

I'm trying to check that a string does not contain certain characters,
and I've bumped into some problems. The code is:

if (!($str =~ /((\w+)|(\.)|(\_)|(\-))$/))
{
    print "\nString: $str contains wrong characters!!!\n";
}
else
{
    print "\nString is OK.\n";
}

The string can be composed of the following chars: All letters and
digits, "-"(minus), "."(dot) and "_"(underscore).

Any idea how to fix the condition???

Thanks in advance,
Shai.
 
Brian McCauley

Bernard said:
if ($str =~ m/^[\w.-]+$/) {

That works but it's also a common idiom to simplify this by inverting
the char-class and the condition.

if ($str !~ /[^\w.-]/) {
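
As a minimal sketch of the two forms side by side (the sub names and the sample strings below are made up for illustration, not taken from the OP's code), both conditions agree on ordinary inputs like these:

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
# Positive form: the whole string must consist of allowed characters.
sub is_ok_positive { my ($str) = @_; return $str =~ m/^[\w.-]+$/ ? 1 : 0 }

# Inverted form: the string must contain no character outside the set.
sub is_ok_inverted { my ($str) = @_; return $str !~ /[^\w.-]/ ? 1 : 0 }

for my $str ('file_name-1.txt', 'bad name!', 'a/b') {
    printf "%-18s positive=%d inverted=%d\n",
        "'$str'", is_ok_positive($str), is_ok_inverted($str);
}
```

The first sample string passes both checks; the other two fail both, because of the space, the "!" and the "/".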
 
Anno Siegel

Bernard El-Hagin said:
Brian McCauley said:
Bernard said:
if ($str =~ m/^[\w.-]+$/) {

That works but it's also a common idiom to simplify this by
inverting the char-class and the condition.

if ($str !~ /[^\w.-]/) {


Really? That's a common idiom? Personally, I think that is absolutely horrid.
I would *never* use it and I most certainly wouldn't call it a
simplification. I guess it's a matter of what one is used to, but
inverting *two* things to get a result one can get without inverting
*any*thing seems...perverse to me. :)

I agree with Brian here (hey, it happens :). I find it perfectly natural
to go from "consists entirely of ..." to "contains nothing outside of ...".
Since the latter doesn't need anchoring and a quantifier, I often prefer
it.

Anno
 
Arndt Jonasson

Bernard El-Hagin said:
Brian McCauley said:
Bernard said:
if ($str =~ m/^[\w.-]+$/) {

That works but it's also a common idiom to simplify this by
inverting the char-class and the condition.

if ($str !~ /[^\w.-]/) {


Really? That's a common idiom? Personally, I think that is absolutely horrid.
I would *never* use it and I most certainly wouldn't call it a
simplification. I guess it's a matter of what one is used to, but
inverting *two* things to get a result one can get without inverting
*any*thing seems...perverse to me. :)

You can switch the following clauses if the "!~" offends you:

if ($str =~ /[^\w.-]/) {
# bad string
} else {
# good string
}

(To me, it _is_ a simplification in that the ^...+$ makes the other
construction more error-prone.)

However, what about empty strings? The two constructions don't treat
empty strings the same way. Replacing the '+' with '*' would make them
equivalent.
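
A quick way to see the difference, as a sketch that feeds an empty string to each variant (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $str = '';

# '+' demands at least one allowed character, so '' fails.
my $plus     = $str =~ /^[\w.-]+$/ ? 'accepted' : 'rejected';
# '*' permits zero characters, so '' passes.
my $star     = $str =~ /^[\w.-]*$/ ? 'accepted' : 'rejected';
# No disallowed character can be found in '', so '' passes.
my $inverted = $str !~ /[^\w.-]/   ? 'accepted' : 'rejected';

print "plus: $plus, star: $star, inverted: $inverted\n";
```

Which behaviour is right depends on whether an empty string should count as valid input, which the OP did not say.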
 
Brian McCauley

Anno said:
Bernard El-Hagin said:
Bernard El-Hagin wrote:


if ($str =~ m/^[\w.-]+$/) {

That works but it's also a common idiom to simplify this by
inverting the char-class and the condition.

if ($str !~ /[^\w.-]/) {


Really? That's a common idiom? Personally, I think that is absolutely horrid.
I would *never* use it and I most certainly wouldn't call it a
simplification. I guess it's a matter of what one is used to, but
inverting *two* things to get a result one can get without inverting
*any*thing seems...perverse to me. :)


I agree with Brian here (hey, it happens :). I find it perfectly natural
to go from "consists entirely of ..." to "contains nothing outside of ...".
Since the latter doesn't need anchoring and a quantifier, I often prefer
it.

The OP's problem was expressed as "check a string that will not contain
some characters", so, in fact, it is Bernard's solution that is a double
inversion relative to the OP.
 
Brian McCauley

Arndt said:
Bernard El-Hagin said:
Bernard El-Hagin wrote:


if ($str =~ m/^[\w.-]+$/) {

That works but it's also a common idiom to simplify this by
inverting the char-class and the condition.

if ($str !~ /[^\w.-]/) {

However, what about empty strings? The two constructions don't treat
empty strings the same way. Replacing the '+' with '*' would make them
equivalent.

Yes, I hadn't spotted that Bernard's solution did the wrong thing with
respect to empty strings. For that matter it also does the wrong thing
with respect to strings with a terminal newline.
 
Brian McCauley

Bernard said:
The OP stated that he wants to identify strings which contain *only* \w
. and -. My solution will not match an empty string (since it doesn't
contain any of those characters)

The null string may not contain any of those characters but in a strict
logical sense it _does_ contain *only* those characters. And when it
comes to programming it pays to express things strictly.

nor will it match a string with a
newline (terminal or otherwise) since a newline is *not* one of \w, .
or -.

You didn't test this, did you?

$ perl -e'print "Bernard is wrong\n" if "A\n" =~ m/^[\w.-]+$/'
Bernard is wrong
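
The reason is that "$" also matches just before a newline at the very end of the string. A sketch of one possible remedy, assuming a trailing newline should be rejected, is to anchor with \z instead; the inverted test rejects it anyway, because the newline is itself a disallowed character:

```perl
use strict;
use warnings;

my $str = "A\n";

# '$' may match before a final newline, so this succeeds.
my $dollar   = $str =~ /^[\w.-]+$/  ? 1 : 0;
# '\z' matches only at the very end of the string, so this fails.
my $strict   = $str =~ /^[\w.-]+\z/ ? 1 : 0;
# The inverted test finds "\n" as a disallowed character.
my $inverted = $str !~ /[^\w.-]/    ? 1 : 0;

print "dollar=$dollar strict=$strict inverted=$inverted\n";
```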
 
Brian McCauley

Bernard said:
The OP said

"The string can be composed of the following chars: All letters and
digits, "-"(minus), "."(dot) and "_"(underscore)."

Yes, you are right the OP first expressed the general problem one way,
then expressed a particular example of the problem the other way round.

Since the OP had already noted the equivalence of the two ways of
expressing the problem, neither solution could be considered a double
inversion of the OP.

I stand corrected.
 
