Regex: Exact semantics of ^ and $ when using /m

Discussion in 'Perl Misc' started by Wolfgang Thomas, Jun 24, 2006.

  1. Hi,

    I am afraid that this question has been asked before, but I could not
    find the answer in the FAQ nor in the "Programming Perl" book, nor by
    googling.

    My question refers to the /m modifier for regular expressions.
    According to "Programming Perl" /m lets ^ and $ match next to new lines
    within the string instead of considering only the beginning and end of
    the string.

    Therefore I wonder why the following example does not match:

    my $s = "123\n456";
    if ($s =~ /3$^4/m) {print "match (4)\n";}

    Even more confusing (for me) is that
    if ($s =~ /3$4/m) {print "match (2)\n";}
    matches, whereas
    if ($s =~ /34/m) {print "match (3)\n";}
    does not match.

    Could someone please point me to an explanation of that behavior?
    Wolfgang Thomas, Jun 24, 2006
    #1

  2. DJ Stunks (Guest)

    Wolfgang Thomas wrote:
    > Hi,
    >
    > I am afraid that this question has been asked before, but I could not
    > find the answer in the FAQ nor in the "Programming Perl" book, nor by
    > googling.


    are you aware that Perl comes with documentation of its own for all the
    functions and syntax that you might ever want to use?

    I would suggest perlre (perldoc perlre).

    > My question refers to the /m modifier for regular expressions.
    > According to "Programming Perl" /m lets ^ and $ match next to new lines
    > within the string instead of considering only the beginning and end of
    > the string.


    you have your answer: "next to". they are called "zero width
    assertions" which means they match, but they do not consume any
    characters from the string.

    From perlre:

    By default, the "^" character is guaranteed to match only the
    beginning of the string, the "$" character only the end (or
    before the newline at the end), and Perl does certain optimizations
    with the assumption that the string contains only one line. Embedded
    newlines will not be matched by "^" or "$". You may, however, wish
    to treat a string as a multi-line buffer, such that the "^" will
    match after any newline within the string, and "$" will match before
    any newline. At the cost of a little more overhead, you can do this
    by using the /m modifier on the pattern match operator.
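
    The difference described in that passage can be seen with a minimal
    sketch, assuming the same two-line string used in the question (the
    variable names $plain and @lines are only illustrative):

    ```perl
    use strict;
    use warnings;

    my $s = "123\n456";

    # Without /m: ^ is the start of the string and $ is the end of the
    # string (or just before a trailing newline), so this cannot match.
    my $plain = ($s =~ /^\d+$/) ? 1 : 0;

    # With /m: ^ also matches right after each embedded newline and $
    # right before it, so every line is matched on its own.
    my @lines = $s =~ /^(\d+)$/mg;

    print "without /m: $plain\n";   # prints "without /m: 0"
    print "with /m: @lines\n";      # prints "with /m: 123 456"
    ```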

    > Therefore I wonder why the following example does not match:
    >
    > my $s = "123\n456";
    > if ($s =~ /3$^4/m) {print "match (4)\n";}


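    this is because there's a character after that $ and before that ^: a
    \n. both anchors are zero-width, so something in the pattern still has
    to consume that newline.

    try: if ($s =~ m'3$.*^4'ms) {print "match (4)\n";}

    (the /s lets . match the newline, and the single-quote delimiters keep
    Perl from interpolating $. as a variable.)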
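
    A small sketch of the same point, assuming the string from the
    question. It is written with /x so that spaces separate $ and ^ from
    their neighbours, which also avoids accidental variable interpolation;
    the variable names are only illustrative:

    ```perl
    use strict;
    use warnings;

    my $s = "123\n456";

    # Nothing between $ and ^ consumes the "\n", so ^ is tested at the
    # position right after "3", which is not the start of a line.
    my $adjacent = ($s =~ / 3 $ ^ 4 /mx) ? 1 : 0;

    # Matching the newline explicitly moves past it; ^ then succeeds.
    my $newline = ($s =~ / 3 $ \n ^ 4 /mx) ? 1 : 0;

    # Under /s the dot matches "\n" as well.
    my $dot_s = ($s =~ m' 3 $ . ^ 4 'msx) ? 1 : 0;

    # prints "adjacent=0 newline=1 dot_s=1"
    print "adjacent=$adjacent newline=$newline dot_s=$dot_s\n";
    ```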

    > Even more confusing (for me) is that
    > if ($s =~ /3$4/m) {print "match (2)\n";}
    > matches,


    did you have warnings enabled? if so, did you notice the complaint
    "Use of uninitialized value in concatenation (.) or string at..."? The
    compiler is not taking that '$' as a regex metacharacter - it is
    grouping it with the 4 and interpolating the capture variable $4.
    $4 is not defined, so it interpolates as an empty string; the pattern
    is now just /3/, which matches.
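
    The interpolation effect can be reproduced with a short sketch,
    assuming a fresh script in which no earlier match has set $4. Writing
    the anchor as (?:$) is just one possible way to keep the $ from being
    read together with the 4; the variable names are illustrative:

    ```perl
    use strict;
    use warnings;

    my $s = "123\n456";

    my ($interpolated, $anchored);
    {
        # Silence the "uninitialized value" warning for this demo only.
        no warnings 'uninitialized';

        # "$4" is read as the capture variable $4, which is undef here,
        # so the pattern that actually runs is just /3/.
        $interpolated = ($s =~ /3$4/m) ? 1 : 0;
    }

    # With "$" directly before ")" it stays an end-of-line anchor:
    # 3, end of line, then a literal 4 while the "\n" is still unconsumed.
    $anchored = ($s =~ /3(?:$)4/m) ? 1 : 0;

    # prints "interpolated=1 anchored=0"
    print "interpolated=$interpolated anchored=$anchored\n";
    ```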

    > Could someone please point me to an explanation of that behavior?


    HTH,
    -jp
    DJ Stunks, Jun 24, 2006
    #2

  3. Mumia W. (Guest)

    Wolfgang Thomas wrote:
    > Hi,


    Hi Wolfgang.

    >
    > Therefore I wonder why the following example does not match:
    >
    > my $s = "123\n456";
    > if ($s =~ /3$^4/m) {print "match (4)\n";}
    > [...]


    ^ only matches the beginning of a line when it appears at the beginning
    of the RE, and $ only matches the end of a line when it appears at the
    end of the RE.

    Use \n to match newlines embedded inside an RE:
    if ($s =~ /3\n4/) { ... }
    Mumia W., Jun 24, 2006
    #3
  4. Dr.Ruud (Guest)

    Mumia W. schreef:

    > ^ only matches the beginning of a line when it appears at the
    > beginning of the RE, and $ only matches the end of a line when it
    > appears at the end of the RE.



    No.

    perl -wle '"a\nb" =~ /a$(?:\n)^b/m and print 1'

    perl -wle '"a\nb" =~ / a $ \n ^ b /mx and print 1'

    perl -wle '"a\nb" =~ / a $ \s ^ b /mx and print 1'

    perl -wle '"a\nb" =~ / a $ (?:[^^]) ^ b /mx and print 1'

    etc.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Jun 24, 2006
    #4
  5. Mumia W. (Guest)

    Dr.Ruud wrote:
    > Mumia W. schreef:
    >
    >> ^ only matches the beginning of a line when it appears at the
    >> beginning of the RE, and $ only matches the end of a line when it
    >> appears at the end of the RE.

    >
    >
    > No.
    >
    > perl -wle '"a\nb" =~ /a$(?:\n)^b/m and print 1'
    >
    > perl -wle '"a\nb" =~ / a $ \n ^ b /mx and print 1'
    >
    > perl -wle '"a\nb" =~ / a $ \s ^ b /mx and print 1'
    >
    > perl -wle '"a\nb" =~ / a $ (?:[^^]) ^ b /mx and print 1'
    >
    > etc.
    >


    Hmm, it seems I was wrong about "^"; thanks.
    Mumia W., Jun 25, 2006
    #5
  6. Wolfgang Thomas

    DJ Stunks wrote:

    > you have your answer: "next to". they are called "zero width
    > assertions" which means they match, but they do not consume any
    > characters from the string.


    That's the key to understanding the behavior.


    Thank you guys.
    Wolfgang Thomas, Jun 25, 2006
    #6