Regex: Exact semantics of ^ and $ when using /m

Wolfgang Thomas

Hi,

I am afraid that this question has been asked before, but I could not
find the answer in the FAQ nor in the "Programming Perl" book, nor by
googling.

My question refers to the /m modifier for regular expressions.
According to "Programming Perl" /m lets ^ and $ match next to new lines
within the string instead of considering only the beginning and end of
the string.

Therefore I wonder why the following example does not match:

my $s = "123\n456";
if ($s =~ /3$^4/m) {print "match (4)\n";}

Even more confusing (for me) is that
if ($s =~ /3$4/m) {print "match (2)\n";}
matches, whereas
if ($s =~ /34/m) {print "match (3)\n";}
does not match.

Could someone please point me to an explanation of that behavior?
 
DJ Stunks

Wolfgang said:
Hi,

I am afraid that this question has been asked before, but I could not
find the answer in the FAQ nor in the "Programming Perl" book, nor by
googling.

Are you aware that Perl comes with documentation of its own for all the
functions and syntax that you might ever want to use?

I would suggest perlre (perldoc perlre).

Wolfgang said:
My question refers to the /m modifier for regular expressions.
According to "Programming Perl" /m lets ^ and $ match next to new lines
within the string instead of considering only the beginning and end of
the string.

You have your answer: "next to". ^ and $ are "zero-width assertions",
which means they match at a position, but they do not consume any
characters from the string.

From perlre:
By default, the "^" character is guaranteed to match only the
beginning of the string, the "$" character only the end (or
before the newline at the end), and Perl does certain optimizations
with the assumption that the string contains only one line. Embedded
newlines will not be matched by "^" or "$". You may, however, wish
to treat a string as a multi-line buffer, such that the "^" will
match after any newline within the string, and "$" will match before
any newline. At the cost of a little more overhead, you can do this
by using the /m modifier on the pattern match operator.
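
To make the perlre paragraph concrete, here is a minimal sketch (the helper name offsets is made up for this illustration) that lists the offsets at which ^ and $ match in the string from the question, with and without /m. It uses qr'...' with single-quote delimiters so that nothing in the patterns can be mistaken for a variable.

```perl
use strict;
use warnings;

my $s = "123\n456";

# Return every offset at which the (zero-width) pattern $re matches in $str.
# "offsets" is a hypothetical helper written only for this example.
sub offsets {
    my ($re, $str) = @_;
    my @found;
    while ($str =~ /$re/g) {
        push @found, $-[0];    # $-[0] is the start offset of the last match
    }
    return @found;
}

# qr'...' (single-quote delimiters) disables variable interpolation.
my @cases = (
    [ '^ without /m', qr'^'  ],
    [ '^ with /m',    qr'^'m ],
    [ '$ without /m', qr'$'  ],
    [ '$ with /m',    qr'$'m ],
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    printf "%-13s matches at offsets: %s\n", $label, join(", ", offsets($re, $s));
}
```

With /m, ^ matches at offsets 0 and 4 and $ at offsets 3 and 7. The newline itself sits at offset 3, between the two anchor positions.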

Wolfgang said:
Therefore I wonder why the following example does not match:

my $s = "123\n456";
if ($s =~ /3$^4/m) {print "match (4)\n";}

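This is because there's a character after that $ and before that ^: a
\n. With /m, $ matches just before the newline and ^ just after it, and
since neither of them consumes anything, the pattern itself has to match
the newline in between.

Try: if ($s =~ m'3$.*^4'ms) {print "match (4)\n";}
(The /s lets . match the newline, and the single-quote delimiters stop
Perl from interpolating variables into the pattern.)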
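
As a rough check of that explanation, the sketch below runs a few variants of the pattern against the same string. The report helper is invented for this example. All patterns use m'...' so that $ and ^ reach the regex engine as anchors.

```perl
use strict;
use warnings;

my $s = "123\n456";

# Print the outcome for one pattern and return 1 or 0.
# "report" is a hypothetical helper for this illustration only.
sub report {
    my ($label, $matched) = @_;
    printf "%-12s %s\n", $label, $matched ? "match" : "no match";
    return $matched ? 1 : 0;
}

# Nothing between $ and ^ consumes the newline, so this cannot match.
my $adjacent = report('3$^4 /m',    scalar($s =~ m'3$^4'm));

# Matching the newline explicitly between the two anchors works.
my $newline  = report('3$\n^4 /m',  scalar($s =~ m'3$\n^4'm));

# With /s, .* can match the newline as well.
my $dotall   = report('3$.*^4 /ms', scalar($s =~ m'3$.*^4'ms));

# The anchors are optional here: \n alone is enough.
my $plain    = report('3\n4',       scalar($s =~ m'3\n4'));
```

Only the first variant fails. The other three match because something in the pattern consumes the newline.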

Wolfgang said:
Even more confusing (for me) is that
if ($s =~ /3$4/m) {print "match (2)\n";}
matches,

Did you have warnings enabled? If so, did you notice the complaint
"Use of uninitialized value in concatenation (.) or string at..."? Perl
is not taking that '$' as a regex metacharacter: it is grouping it with
the 4 and assuming you are trying to interpolate $4. $4 is not defined,
so it interpolates as an empty string and the match is now for /3/,
which matches. /34/m has no $ at all, so it looks for the literal "34",
which is not in the string.
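
A small sketch of the interpolation effect, assuming no earlier successful match has set $4. The same pattern text is tried once with ordinary delimiters (where $4 is interpolated) and once with single-quote delimiters (where it is not).

```perl
use strict;
use warnings;

my $s = "123\n456";

my ($interpolated, $literal);
{
    # Silence the "uninitialized value" warning for this demonstration.
    no warnings 'uninitialized';

    # $4 is undef here, so it interpolates as "" and the pattern becomes /3/m.
    $interpolated = $s =~ /3$4/m ? 1 : 0;
}

# m'...' disables interpolation: "3", then end of line, then a literal "4".
# The "4" would have to sit at the position of the newline, so this fails.
$literal = $s =~ m'3$4'm ? 1 : 0;

print "with interpolation:    ", ($interpolated ? "match" : "no match"), "\n";
print "without interpolation: ", ($literal      ? "match" : "no match"), "\n";
```

The first test matches and the second does not, which is why "match (2)" was printed in the original example.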

Wolfgang said:
Could someone please point me to an explanation of that behavior?

HTH,
-jp
 
Mumia W.

Hi Wolfgang.

Wolfgang said:
Therefore I wonder why the following example does not match:

my $s = "123\n456";
if ($s =~ /3$^4/m) {print "match (4)\n";}
[...]

^ only matches the beginning of a line when it appears at the beginning
of the RE, and $ only matches the end of a line when it appears at the
end of the RE.

Use \n inside the RE to match a newline embedded in the string:
if ($s =~ /3\n4/) { ... }
 
Dr.Ruud

Mumia W. wrote:
^ only matches the beginning of a line when it appears at the
beginning of the RE, and $ only matches the end of a line when it
appears at the end of the RE.


No.

perl -wle '"a\nb" =~ /a$(?:\n)^b/m and print 1'

perl -wle '"a\nb" =~ / a $ \n ^ b /mx and print 1'

perl -wle '"a\nb" =~ / a $ \s ^ b /mx and print 1'

perl -wle '"a\nb" =~ / a $ (?:[^^]) ^ b /mx and print 1'

etc.
 
Mumia W.

Dr.Ruud said:
Mumia W. wrote:
^ only matches the beginning of a line when it appears at the
beginning of the RE, and $ only matches the end of a line when it
appears at the end of the RE.


No.

perl -wle '"a\nb" =~ /a$(?:\n)^b/m and print 1'

perl -wle '"a\nb" =~ / a $ \n ^ b /mx and print 1'

perl -wle '"a\nb" =~ / a $ \s ^ b /mx and print 1'

perl -wle '"a\nb" =~ / a $ (?:[^^]) ^ b /mx and print 1'

etc.

Hmm, it seems I was wrong about "^"; thanks.
 
Wolfgang Thomas

DJ said:
you have your answer: "next to". they are called "zero width
assertions" which means they match, but they do not consume any
characters from the string.

That's the key to understanding the behavior.


Thank you guys.
 
