unexplained warning message in m{...} regexp

Klaus

 > as they always come in pairs and are handled naturally by m{...


They don't always come in pairs.  What if the literal string you wanted
to match were '{0,0' ?

I was talking about '{' and '}' as metacharacters. A '{' in /a{0,0}/
is a metacharacter and always comes paired with a '}'. Those { }
metacharacters are never escaped in a regexp.

The '{' in '{0,0' is a literal curly, and they, of course, can come in
any number or combination. Any { } literal has to be always escaped in
a regexp.
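
The metacharacter/literal distinction above can be checked with a small sketch. It uses plain / delimiters so the delimiter question stays out of the picture; the helper name hits() is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): 1 if $text matches $re, else 0.
sub hits { my ($re, $text) = @_; return $text =~ $re ? 1 : 0 }

# Unescaped braces after an atom act as a quantifier: one or two 'a's.
my $quant = qr/\Aa{1,2}\z/;

# Escaped braces are literal characters.
my $literal = qr/\Aa\{1,2\}\z/;

for my $text ('a', 'aa', 'a{1,2}') {
    printf "%-8s quantifier=%d literal=%d\n",
        $text, hits($quant, $text), hits($literal, $text);
}
```

With / as the delimiter, only 'a' and 'aa' should match the first pattern and only 'a{1,2}' the second.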
 
sln

Dear Franz,



No, it is not: the curly brackets are correctly escaped. The problem only occurs if {} are used:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wrong! same problem, different side effects
other bracketing delimiters (e.g., m(\A\{0,0\}
^^^^^^^^^^
The problem is with all paired delimiters, including {} and ().

Perl doesn't look like it has had unit testing.
What about that, Ilya?

See if you can spot the error using () as delimiters. Consider this:

use strict;
use warnings;

$_ = '(0,0)';

my $Ax = qr (\(0,0\));
print "Ax (bad) = $Ax \n";

if ( /$Ax/ ) {
    print "yes - \$Ax matched '$1'\n";
}
else {
    print "no\n";
}

print "\n";

my $Bx = qr /(\(0,0\))/;
print "Bx (good) = $Bx \n";

if ( /$Bx/ ) {
    print "yes - \$Bx matched '$1'\n";
}
else {
    print "no\n";
}
__END__

Output:

Ax (bad) = (?-xism:(0,0))
yes - $Ax matched '0,0'

Bx (good) = (?-xism:(\(0,0\)))
yes - $Bx matched '(0,0)'
 
sln

By not escaping the curlies { }, for example

m{a{1,2}} matches 'a' or 'aa'
m{a\{1,2\}} matches 'a{1,2}'
^^^^^^^^^^^^^
These look identical, matching 'a' or 'aa'.
I think that's the point: escapes are stripped from paired delimiters.
The same thing happens when using () as delimiters.
This is not the case with single-character delimiters.

-sln

-----------------------
Output:
(?-xism:a{1,2})
(?-xism:a{1,2})
(?-xism:(a{1,2}))
(?-xism:(a{1,2}))
(?-xism:/a{1,2}/)
------------------------
use strict;
use warnings;

my $rx1 = qr {a{1,2}};
my $rx2 = qr {a\{1,2\}};
print $rx1,"\n";
print $rx2,"\n";

my $rx3 = qr ((a{1,2}));
my $rx4 = qr (\(a{1,2}\));
print $rx3,"\n";
print $rx4,"\n";

my $rx5 = qr /\/a{1,2}\//;
print $rx5,"\n";
 
Ilya Zakharevich

  [There are two different mechanisms of unescaping in the lifetime of
   a REx.

   a) First, the parser removes delimiters

Ok, so far I am with you.
(and unescapes escaped delimiters)
(it may also remove certain other escapes - do not remember details).

Why unescape escaped delimiters?

The step "b" knows nothing about delimiters (it operates on strings).
The "native operation" of step "b" is as in

my $foo = WHATEVER;
/$foo/;

Yours,
Ilya
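
A minimal sketch of that "native operation", assuming the pattern text arrives in an ordinary variable: the backslashes in $foo were never next to a delimiter in the source, so the choice of delimiter around the interpolation should make no difference. The variable names other than $foo are made up for illustration.

```perl
use strict;
use warnings;

# The pattern text lives in a plain string; single quotes keep the
# backslashes exactly as typed.
my $foo = 'a\{1\}';

# The same variable interpolated inside two different delimiter styles.
my $with_slashes = 'a{1}' =~ /\A$foo\z/  ? 1 : 0;
my $with_braces  = 'a{1}' =~ m{\A$foo\z} ? 1 : 0;

print "slashes: $with_slashes, braces: $with_braces\n";
```

Both matches should succeed, because the delimiter handling of step "a" only sees the characters that appear literally between the delimiters, not the contents of $foo.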
 
Ilya Zakharevich

A '{' in /a{0,0}/
is a metacharacter and always comes paired with a '}'. Those { }
metacharacters are never escaped in a regexp.
The '{' in '{0,0' is a literal curly, and they, of course, can come in
any number or combination. Any { } literal has to be always escaped in
a regexp.

I do not think this makes a lot of sense (especially "has to be"). I
can read it (because I knew already the ideas you wanted to express),
but I'm pretty sure it would confuse many other readers...

[The semantic of `{' in a REx is nothing to be proud of (I mean the
fact that it DWIMs, so one can never be sure). I think this
semantic is inherited from Henri's old implementation.]

=======================================================

In a saner Perl, /a{0,0/ would be a syntax error. But it is not, and
it is accepted as a literal string match.

So "unless one is really-really sure" they know what they are doing,
a literal `{' had better be escaped, since SOMETIMES it may be
interpreted as a metachar...

Ilya
 
sln

^^^^^^^^^^^^^
These look identical, matching 'a' or 'aa'.
I think that's the point: escapes are stripped from paired delimiters.
The same thing happens when using () as delimiters.
This is not the case with single-character delimiters.
Looking at it again, all escaped delimiters are unescaped by the parser
before the pattern gets to the regex engine (Ilya's step a).

So, if you will be looking for their literals,
I guess the trick is to *not* use a delimiter that is a regex metachar: not the escape itself '\',
nor {},+*.?, nor the grouping parentheses (), nor the character class brackets [], etc.

As luck would have it there is the forward slash '/' and many more.

qr {a{1,2}} ~ (?-xism:a{1,2})
qr {a\{1,2\}} ~ (?-xism:a{1,2})
qr ((a{1,2})) ~ (?-xism:(a{1,2}))
qr (\(a{1,2}\)) ~ (?-xism:(a{1,2}))
qr [\[a{1,2}\]] ~ (?-xism:[a{1,2}])
qr /\/a{1,2}\// ~ (?-xism:/a{1,2}/)
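
A short sketch of that advice, assuming the target text is the literal 'a{1,2}': with delimiters that are not regex metacharacters, \{ and \} are not escaped delimiters, so the backslashes should reach the regex engine untouched. The choice of /, ! and % is just for illustration.

```perl
use strict;
use warnings;

my $text = 'a{1,2}';

# Three delimiters that are not regex metacharacters. None of them is a
# brace, so the parser has no reason to strip the backslashes before { and }.
my @results = (
    ($text =~ m/\Aa\{1,2\}\z/ ? 1 : 0),
    ($text =~ m!\Aa\{1,2\}\z! ? 1 : 0),
    ($text =~ m%\Aa\{1,2\}\z% ? 1 : 0),
);

print "literal match per delimiter: @results\n";
```

All three should report a match. Note that a comma would be a poor delimiter here, since the pattern itself contains one.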

Thanks for this exercise.
-sln
 
John W. Krahn

hymie! said:
In our last episode, the evil Dr. Lacto had captured our hero,


I've read through about half of this thread, and not that I'm an expert,
but the problem seems to be three-fold:

(1) using { } as your RE delimiters
(2) using { } as part of your RE
(3) { } already having their own meaning as RE meta-characters.

Since (3) is part of the language and (2) is part of your data, altering
(1) is probably your best bet.

(3) is irrelevant because it does the same thing with
non-meta-characters as well:

$ perl -le'print qr{\A\{oops\}\z}'
(?-xism:\A{oops}\z)
$ perl -le'print qr<\A\<oops\>\z>'
(?-xism:\A<oops>\z)



John
 
John W. Krahn

Ben said:
It's not, since /(?-xism:\A<oops>\z)/ matches "<oops>", whereas the
equivalent with {} doesn't.

$ perl -le'print "<oops>" =~ m<\A\<oops\>\z> ? "Match" : "Oops"'
Match
$ perl -le'print "{oops}" =~ m{\A\{oops\}\z} ? "Match" : "Oops"'
Match



John
 
Frank Seitz

Ben said:
~% perl -E'say "a<1>" =~ m<\Aa\<1\>\z>; say "a{1}" =~ m{\Aa\{1\}\z}'
1

~% perl -E'say "a<1>" =~ m<\Aa<1>\z>; say "a{1}" =~ m{\Aa{1}\z}'
1

~%

So it's impossible to (reliably) match a literal "{" in a {}-delimited
regex.

I disagree, it is possible with quotemeta (\Q...\E):

$ perl -E'say "a<1>" =~ m<\Aa<1>\z>; say "a{1}" =~ m{\Aa\Q{\E1\Q}\E\z}'
1
1

Frank
 
Uri Guttman

FS> I disagree, it is possible with quotemeta (\Q...\E):

FS> $ perl -E'say "a<1>" =~ m<\Aa<1>\z>; say "a{1}" =~ m{\Aa\Q{\E1\Q}\E\z}'
FS> 1
FS> 1

you don't have to quote each of {} separately. one \Q\E works as 1 won't
get quoted.

perl -E'say "a{1}" =~ m{\Aa\Q{1}\E\z}'
1

also you can put the {1} in a variable and \Q it:

perl -E'$x ="{1}"; say "a{1}" =~ m{\Aa\Q$x\E\z}'
1

so there are ways around it. look at these:

perl -E'say qr{\Aa\Q{1}\E\z}'
(?-xism:\Aa\{1\}\z)

perl -E'say qr{\Aa\{1\}\z}'
(?-xism:\Aa{1}\z)

the latter doesn't have escaped braces since the string parser removed
them so the regex parser sees a quantifier. in the former the \Q is done
in the regex parser so you get a literal {1}.

uri
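
The same idea as a small script rather than one-liners, as a sketch: quotemeta is the function form of \Q...\E, and it backslashes the braces but leaves the digit alone. The variable names other than $x are made up for illustration.

```perl
use strict;
use warnings;

my $x = '{1}';

# quotemeta backslash-escapes the non-word characters, here the two braces.
my $quoted = quotemeta $x;
print "quotemeta: $quoted\n";

# \Q...\E inside a {}-delimited match, inline and via a variable.
my $inline   = 'a{1}' =~ m{\Aa\Q{1}\E\z}  ? 1 : 0;
my $from_var = 'a{1}' =~ m{\Aa\Q$x\E\z}   ? 1 : 0;

# If {1} were still a quantifier, a lone 'a' would match; it should not.
my $lone_a   = 'a'    =~ m{\Aa\Q$x\E\z}   ? 1 : 0;

print "inline: $inline, variable: $from_var, lone 'a': $lone_a\n";
```

The quoting happens after the parser has already found the end of the pattern, which is why the escapes it adds are not stripped as escaped delimiters.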
 
Frank Seitz

Uri said:
> FS> I disagree, it is possible with quotemeta (\Q...\E):
>
> FS> $ perl -E'say "a<1>" =~ m<\Aa<1>\z>; say "a{1}" =~ m{\Aa\Q{\E1\Q}\E\z}'
> FS> 1
> FS> 1
>
> you don't have to quote each of {} separately. one \Q\E works as 1 won't
> get quoted.
>
> perl -E'say "a{1}" =~ m{\Aa\Q{1}\E\z}'
> 1
>
> also you can put the {1} in a variable and \Q it:
>
> perl -E'$x ="{1}"; say "a{1}" =~ m{\Aa\Q$x\E\z}'
> 1
>
> so there are ways around it. look at these:
>
> perl -E'say qr{\Aa\Q{1}\E\z}'
> (?-xism:\Aa\{1\}\z)
>
> perl -E'say qr{\Aa\{1\}\z}'
> (?-xism:\Aa{1}\z)
>
> the latter doesn't have escaped braces since the string parser removed
> them so the regex parser sees a quantifier. in the former the \Q is done
> in the regex parser so you get a literal {1}.

I know all that. I wanted to show how to put a literal "{" or "}"
in a {}-delimited regex. "\{" doesn't work but "\Q{\E" does.

Frank
 
Uri Guttman

FS> I know all that. I wanted to show how to put a literal "{" or "}"
FS> in a {}-delimited regex. "\{" doesn't work but "\Q{\E" does.

then consider my post an explanation to others why \Q\E works and \'s
don't. but you deserve credit for posting the correct solution.

uri
 
Frank Seitz

Uri said:
FS> I know all that. I wanted to show how to put a literal "{" or "}"
FS> in a {}-delimited regex. "\{" doesn't work but "\Q{\E" does.

then consider my post an explanation to others why \Q\E works and \'s
don't.

Your explanation sounds plausible. But why doesn't it help
to double the backslashes then:

say "a{1}" =~ m{\Aa\\{1\\}\z}? 1: 0;
__END__
0

Frank
 
sln

> Your explanation sounds plausible. But why doesn't it help
> to double the backslashes then:
>
> say "a{1}" =~ m{\Aa\\{1\\}\z}? 1: 0;
> __END__
> 0
>
> Frank

Why don't you print it and find out?
The answer is that \\ is the escaped escape, not the escaped delimiters {}.

my $rx = qr {\Aa\\{1\\}\z};
print $rx,"\n";

-sln
 
sln

The *parser* un-escapes all delimiters before the string goes to the
regex engine. Below are one to five backslashes before each brace. Check
the output: an even number of backslashes always remains after parsing.
This makes it *impossible* to escape the delimiter in the normal fashion.

my $rx = qr {\Aa\{1\}\z};
print $rx,"\n";
$rx = qr {\Aa\\{1\\}\z};
print $rx,"\n";
$rx = qr {\Aa\\\{1\\\}\z};
print $rx,"\n";
$rx = qr {\Aa\\\\{1\\\\}\z};
print $rx,"\n";
$rx = qr {\Aa\\\\\{1\\\\\}\z};
print $rx,"\n";

__END__

(?-xism:\Aa{1}\z)
(?-xism:\Aa\\{1\\}\z)
(?-xism:\Aa\\{1\\}\z)
(?-xism:\Aa\\\\{1\\\\}\z)
(?-xism:\Aa\\\\{1\\\\}\z)

-sln
 
