unexplained warning message in m{...} regexp


Klaus

I am trying to match a literal string '{0,0}' using the syntax m{...}.
I know that I have to escape both the '{' and '}' characters.

Here is my program
========================
use strict;
use warnings;

$_ = '{0,0}';
if (m{\A\{0,0\}\z}) {
    print "yes\n";
}
else {
    print "no\n";
}
========================

The regexp works as intended and prints "yes", but there is an
unexplained warning message:

========================
Quantifier unexpected on zero-length expression in regex; marked by
<-- HERE in m/\A{0,0}\z <-- HERE / at Testregexp.pl line 5.
yes
========================

The message does not appear if I use /\A\{0,0\}\z/.

It seems to me that Perl is confused about using '{' and '}' inside a
match of the form m{...}

I am using Activestate Perl 5.10 on Windows XP.

C:\>perl -v

This is perl, v5.10.0 built for MSWin32-x86-multi-thread
(with 5 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 1004 [287188] provided by ActiveState http://www.ActiveState.com
Built Sep 3 2008 13:16:37
 
Frank Seitz

Klaus said:
It seems to me that Perl is confused about using '{' and '}' inside a
match of the form m{...}

Perl is not confused. It's a syntax error, because { and } have a special
meaning in regexes. See perldoc perlre (Section "Quantifiers").

Frank
 
Klaus

Frank Seitz said:
Perl is not confused. It's a syntax error, because { and } have a special
meaning in regexes. See perldoc perlre (Section "Quantifiers").

Please note that I have escaped '\{' and '\}' inside m{\A\{0,0\}\z}...

...and why does the message disappear if I use /\A\{0,0\}\z/ ?
 
Frank Seitz

Klaus said:
Please note that I have escaped '\{' and '\}' inside m{\A\{0,0\}\z}...

Here, the \-escape eliminates the meaning as delimiter.
The { } become metacharacters.

Klaus said:
...and why does the message disappear if I use /\A\{0,0\}\z/. ?

Here, the \-escape eliminates the meaning as metacharacter.
The { } become normal characters.

Frank
 
Teo

Dear Frank,

Frank Seitz said:
Perl is not confused. It's a syntax error, because { and } have a special
meaning in regexes. See perldoc perlre (Section "Quantifiers").

No, it is not: the curly brackets are correctly escaped. The problem only
occurs if {} are used: other bracketing delimiters (e.g., m(\A\{0,0\}\z) )
do not provoke the warning.

I can reproduce the problem with both 5.8.9 and 5.10.0

Matteo
 
Frank Seitz

Teo said:
No, it is not: the curly brackets are correctly escaped. The problem only
occurs if {} are used: other bracketing delimiters (e.g., m(\A\{0,0\}\z) )
do not provoke the warning.

I can reproduce the problem with both 5.8.9 and 5.10.0

See <[email protected]>

Frank
 
Helmut Wollmersdorfer

Klaus said:
I am trying to match a literal string '{0,0}' using the syntax m{...}.
I know that I have to escape both the '{' and '}' characters.

Here is my program
========================
use strict;
use warnings;

$_ = '{0,0}';
if (m{\A\{0,0\}\z}) {
print "yes\n";
} [...]
========================
Quantifier unexpected on zero-length expression in regex; marked by
<-- HERE in m/\A{0,0}\z <-- HERE / at Testregexp.pl line 5.
yes
========================

Same here - Perl 5.19 on Debian/Linux.

Seems to be a bug.

Workarounds:

m{\A[{]0,0[}]\z}
m{\A\{0\,0\}\z}

Helmut Wollmersdorfer
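
The two workarounds above can be checked side by side. The following is only a minimal sketch: the variable names and the warning-collecting handler are illustrative assumptions, not part of the original post. It reports how many warnings were raised rather than assuming a number, since that may depend on the perl in use.

```perl
use strict;
use warnings;

# Collect warnings (including compile-time ones) instead of printing them.
# The handler is installed in BEGIN so it is active while the patterns compile.
our @warnings;
BEGIN { $SIG{__WARN__} = sub { push @warnings, $_[0] } }

my $target = '{0,0}';

# Workaround 1: put each brace in a character class.
my $class_ok = $target =~ m{\A[{]0,0[}]\z} ? 1 : 0;

# Workaround 2: escape the comma so {0,0} cannot be read as a quantifier.
my $comma_ok = $target =~ m{\A\{0\,0\}\z} ? 1 : 0;

print "character class: ", ($class_ok ? "yes" : "no"), "\n";
print "escaped comma:   ", ($comma_ok ? "yes" : "no"), "\n";
print "warnings seen:   ", scalar(@warnings), "\n";
```

Both forms keep the m{...} delimiters, which is the point of the workaround; switching to m/.../ or m(...) is the other option already mentioned in this thread.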
 
Teo

Frank Seitz said:
Here, the \-escape eliminates the meaning as delimiter.
The { } become metacharacters.

[...]

Here, the \-escape eliminates the meaning as metacharacter.
The { } become normal characters.

Ok I see but this is rather confusing:

* in a // delimited regex the literal '/' has to be escaped
* in a {} delimited regex the literal '{' must *not* be escaped

am I getting it right?

But then when I look at perlop:

    When searching for single-character delimiters, escaped delimiters
    and "\\" are skipped. For example, while searching for terminating
    "/", combinations of "\\" and "\/" are skipped. If the delimiters
    are bracketing, nested pairs are also skipped. For example, while
    searching for closing "]" paired with the opening "[", combinations
    of "\\", "\]", and "\[" are all skipped, and nested "[" and "]" are
    skipped as well. However, when backslashes are used as the
    delimiters (like "qq\\" and "tr\\\"), nothing is skipped. During
    the search for the end, backslashes that escape delimiters are
    removed (exactly speaking, they are not copied to the safe
    location).

it gets more confusing. If I understand correctly, the difference
only arises if the { } are paired.
In m{ aaa{bbb } the { is not escaped, and it is understood as the
beginning of a quantifier.

In fact I get:

Search pattern not terminated at ./test.pl line 6.

So to have a literal '{' I should escape it if it is not paired and not
escape it if it is paired.

Did I get it wrong? (I sincerely hope so :)

Matteo
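
The unpaired case can be reproduced without aborting the script by compiling the pattern in a string eval. This is a rough sketch with illustrative variable names. The \x7B hex escape for '{' is one assumption-free way to write a lone literal brace inside m{...}, because no brace character appears in the pattern source at all.

```perl
use strict;
use warnings;

# An unpaired '{' inside m{...} is counted as a nested opening delimiter,
# so the final '}' only closes that nested pair and the pattern never ends.
# A string eval turns the compile-time error into a value we can inspect.
my $compiled = eval 'm{ aaa{bbb }; 1';
my $error    = $compiled ? '' : $@;
print "compile error: $error";

# Writing the brace as a hex escape keeps it out of the delimiter count.
my $lone_ok = 'aaa{bbb' =~ m{\Aaaa\x7Bbbb\z} ? 1 : 0;
print "lone brace via \\x7B: ", ($lone_ok ? "yes" : "no"), "\n";
```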
 
Klaus

Frank Seitz said:
Here, the \-escape eliminates the meaning as delimiter.

I see, thanks for the explanation.

Frank Seitz said:
The { } become metacharacters.

I find it unfortunate that they become metacharacters, particularly so
because there is no reason to quote metacharacters { } in the first
place, as they always come in pairs and are handled naturally by
m{...{ }...} as a nested pair of curlies, for example m{a{1,2}}
 
Klaus

Klaus said:
I am trying to match a literal string '{0,0}' using the syntax m{...}.
I know that I have to escape both the '{' and '}' characters.
Here is my program
========================
use strict;
use warnings;
$_ = '{0,0}';
if (m{\A\{0,0\}\z}) {
    print "yes\n";
} [...]
========================
Quantifier unexpected on zero-length expression in regex; marked by
<-- HERE in m/\A{0,0}\z <-- HERE / at Testregexp.pl line 5.
yes
========================

Helmut Wollmersdorfer said:
Same here - Perl 5.19 on Debian/Linux.

Seems to be a bug.

Workarounds:

m{\A[{]0,0[}]\z}
m{\A\{0\,0\}\z}

Thanks for the workarounds, they work ok. But I agree that the
original problem is a bug. Where can I post a bug report ?
 
Jürgen Exner

Klaus said:
I am trying to match a literal string '{0,0}' using the syntax m{...}.

If you don't mind me asking: why? I mean why do you use REs if you don't
want their functionality ("match a literal string")?
A simple index() is so much easier to use.

jue
 
Klaus

Jürgen Exner said:
If you don't mind me asking: why? I mean why do you use REs if you don't
want their functionality ("match a literal string")?
A simple index() is so much easier to use.

You are right, in fact my regexp is anchored with \A and \z, so a
simple $_ eq '{0,0}' should suffice, that's easier to read and
probably much faster.
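
For reference, the two regex-free checks mentioned here look roughly like this. It is a small sketch; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $input = '{0,0}';

# Whole-string comparison: same result as a literal match anchored
# with \A and \z, with no metacharacters to worry about.
my $is_exact = $input eq '{0,0}' ? 1 : 0;

# Substring search: index() returns the 0-based position, or -1 if absent.
my $found_at = index('point {0,0} here', '{0,0}');
my $missing  = index('point (0,0) here', '{0,0}');

print "exact match: ", ($is_exact ? "yes" : "no"), "\n";
print "found at:    $found_at\n";
print "missing:     $missing\n";
```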

However, I maintain that a regexp m{\A\{0,0\}\z} should not emit a
warning message and I have filed a bug report.
 
Willem

Klaus wrote:
) You are right, in fact my regexp is anchored with \A and \z, so a
) simple $_ eq '{0,0}' should suffice, that's easier to read and
) probably much faster.
)
) However, I maintain that a regexp m{\A\{0,0\}\z} should not emit a
) warning message and I have filed a bugreport.

Then how are you supposed to put a quantifier in an m{...} expression ?


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Klaus

Klaus wrote:

) You are right, in fact my regexp is anchored with \A and \z, so a
) simple $_ eq '{0,0}' should suffice, that's easier to read and
) probably much faster.
)
) However, I maintain that a regexp m{\A\{0,0\}\z} should not emit a
) warning message and I have filed a bugreport.

Willem said:
Then how are you supposed to put a quantifier in an m{...} expression ?

By not escaping the curlies { }, for example

m{a{1,2}} matches 'a' or 'aa'
m{a\{1,2\}} matches 'a{1,2}'
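
The first of these two lines can be verified directly. The second one is exactly the behaviour under dispute in this thread, so the sketch below does not rely on it and uses character classes for the literal braces instead. Variable names are illustrative only.

```perl
use strict;
use warnings;

# Unescaped, balanced braces inside m{...} nest and act as a quantifier.
my @hits;
for my $candidate ('', 'a', 'aa', 'aaa') {
    push @hits, $candidate if $candidate =~ m{\Aa{1,2}\z};
}
print "quantifier matched: @hits\n";

# To match the literal text a{1,2} without depending on how backslashed
# delimiters are handled, wrap each brace in a character class.
my $literal_ok = 'a{1,2}' =~ m{\Aa[{]1,2[}]\z} ? 1 : 0;
my $plain_a_ok = 'a'      =~ m{\Aa[{]1,2[}]\z} ? 1 : 0;
print "literal a{1,2} matched: ", ($literal_ok ? "yes" : "no"), "\n";
print "plain 'a' matched literal pattern: ", ($plain_a_ok ? "yes" : "no"), "\n";
```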
 
Ilya Zakharevich

Teo said:
Ok I see but this is rather confusing:

* in a // delimited regex the literal '/' has to be escaped
* in a {} delimited regex the literal '{' must *not* be escaped

am I getting it right?

No. Let me try (untested):

* in a {}-delimited regex escaping '{' won't make it into a literal.
(AND unescaped '{' should properly nest).

[There are two different mechanisms of unescaping in the lifetime of
a REx.

a) First, the parser removes delimiters (and unescapes escaped
delimiters) (it may also remove certain other escapes - do not
remember details).

b) The result is passed to REx engine. It processes all the
remaining special-for-REx escapes.

I did not have time to document it when I was working on Perl
RExes. I doubt the docs improved from that time...]

The difference is kinda subtle. E.g., variables interpolated in RExes
are subject ONLY to "b"-unescaping. Also, one can see the result of
"a" in debugging output of

use re 'debugcolor';

Hope this helps,
Ilya
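
A lighter-weight way to look at the two steps, offered only as a sketch: a pattern that arrives through a variable skips step "a" entirely, and stringifying a qr object shows the pattern text that was handed to the engine. The variable names are illustrative. No expected output is given for the two qr lines because the text may differ between perl versions, and the brace-delimited one may emit the warning discussed in this thread.

```perl
use strict;
use warnings;

# Step "a" only touches text typed between the delimiters. A pattern held
# in a variable never goes through it, so its backslashes reach the engine.
my $literal = quotemeta('{0,0}');    # backslash-escapes '{', ',' and '}'
my $matched = '{0,0}' =~ m{\A$literal\z} ? 1 : 0;
print "interpolated pattern matched: ", ($matched ? "yes" : "no"), "\n";

# Stringifying a qr object shows what the engine received after step "a".
my $with_slashes = qr/\A\{0,0\}\z/;
my $with_braces  = qr{\A\{0,0\}\z};
print "slash-delimited: $with_slashes\n";
print "brace-delimited: $with_braces\n";
```

Comparing the last two printed lines on a given perl shows whether the backslashes before the braces survived delimiter processing there.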

P.S. If one tries to use \ as a delimiter, one can get yet funnier
quirks of this 2-step semantic... ;-)
 
Xho Jingleheimerschmidt

Klaus said:
I am trying to match a literal string '{0,0}' using the syntax m{...}.
I know that I have to escape both the '{' and '}' characters.

Here is my program
========================
use strict;
use warnings;

$_ = '{0,0}';
if (m{\A\{0,0\}\z}) {
print "yes\n";
}
else {
print "no\n";
}
========================

The regexp works as intended and prints "yes",

Just because it prints yes when you expect it to doesn't mean it works
as intended. If you intend to do addition, then 2*2 gives the expected
answer, yet doesn't work as intended.

Klaus said:
It seems to me that Perl is confused about using '{' and '}' inside a
match of the form m{...}

Turn the m into a q and print the result:

/home/user> perl -wle 'print q{\A\{0,0\}\Z}'
\A{0,0}\Z
/home/user> perl -wle 'print q/\A\{0,0\}\Z/'
\A\{0,0\}\Z

This is a generic property of quote like operators, not peculiar to the
regex variety of them.

Xho
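
The same comparison as a small script, extended with parentheses as a third delimiter. This is a sketch with illustrative variable names; it only uses the single-quote operator q, so no regex is involved.

```perl
use strict;
use warnings;

# With { } as delimiters, \{ and \} lose their backslash.
my $braces  = q{\A\{0,0\}\Z};
# With / / or ( ) as delimiters, braces are ordinary characters and the
# backslashes in front of them are kept as typed.
my $slashes = q/\A\{0,0\}\Z/;
my $parens  = q(\A\{0,0\}\Z);

printf "%-7s %s (%d chars)\n", 'q{...}', $braces,  length $braces;
printf "%-7s %s (%d chars)\n", 'q/.../', $slashes, length $slashes;
printf "%-7s %s (%d chars)\n", 'q(...)', $parens,  length $parens;
```

The brace-delimited string comes out two characters shorter than the other two, which is the delimiter unescaping at work.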
 
Xho Jingleheimerschmidt

Klaus said:
I find it unfortunate that they become metacharacters,

I wouldn't say they become metacharacters, they are metacharacters.
That is what they started as, and that is what they return to when their
backwhacks get eaten.

Klaus said:
particularly so
because there is no reason to quote metacharacters { } in the first
place,

Of course there is. If they were not quoted, they would be either hash
constructors or code blocks, rather than either literal characters or
regex special characters.

Klaus said:
as they always come in pairs and are handled naturally by
m{...{ }...} as a nested pair of curlies, for example m{a{1,2}}

They don't always come in pairs. What if the literal string you wanted
to match were '{0,0' ?

Xho
 
Klaus

Ilya Zakharevich said:
  [There are two different mechanisms of unescaping in the lifetime of
   a REx.

   a) First, the parser removes delimiters

Ok, so far I am with you.

   (and unescapes escaped delimiters)
   (it may also remove certain other escapes - do not remember details).

Why unescape escaped delimiters?

   b) The result is passed to REx engine.  It processes all the
      remaining special-for-REx escapes.

I don't know how the REx engine works, but I would be surprised if it
could not handle an escaped delimiter (such as '\{' in my case),
whereas at the same time it can handle an escaped non-delimiter (such
as '\[' for example).

   I did not have time to document it when I was working on Perl
   RExes.  I doubt the docs improved from that time...]

That's why I was confused when I came across this case.

   The difference is kinda subtle.  E.g., variables interpolated in RExes
   are subject ONLY to "b"-unescaping.

This information would be useful in the documentation.

   Also, one can see the result of
   "a" in debugging output of

      use re 'debugcolor';

   Hope this helps,
   Ilya

   P.S.  If one tries to use \ as a delimiter, one can get yet funnier
         quirks of this 2-step semantic...  ;-)

This information would also be useful in the documentation.
 
