Unexpected regex Behavior

Discussion in 'Perl Misc' started by Mark Shelor, May 14, 2006.

  1. Mark Shelor

    Is it true that defining $/ to an integer reference (to read
    fixed-length records) affects the meaning of the end-of-string symbol
    ($) in regex's?

    For example, let's say I'm reading 4096-byte chunks from a file, and
    wish to do special processing if any chunk ends with the carriage-return
    character (\015). So, I start with code that looks like:

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while ($rec =~ /\015$/) {
            # do special processing ...
        }
        ...
    }

    Oddly, this doesn't seem to work. It ends up matching chunks that
    contain, but don't necessarily end with, \015.

    Instead, I have to do this:

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while (substr($rec, -1) eq "\015") {
            # do special processing ...
        }
        ...
    }

    Any idea what's going on?

    Thanks, Mark
     
    Mark Shelor, May 14, 2006
    #1

  2. MSG

    Mark Shelor wrote:
    > Is it true that defining $/ to an integer reference (to read
    > fixed-length records) affects the meaning of the end-of-string symbol
    > ($) in regex's?
    >
    > local $/ = \4096;
    > while (defined (my $rec = <F>)) {
    > while ($rec =~ /\015$/) {
    > # do special processing ...

    ($) is not exactly the end-of-string symbol; it is the end-of-line symbol.
    (\Z) or (\z) is the end-of-string symbol and should serve your purpose.

    Also, I feel that "if" is better than a "while" loop (the second one),
    since you only want to match one \015 at the end of the string.
     
    MSG, May 14, 2006
    #2

  3. John W. Krahn

    Mark Shelor wrote:
    > Is it true that defining $/ to an integer reference (to read
    > fixed-length records) affects the meaning of the end-of-string symbol
    > ($) in regex's?


    No, it is not true.

    > For example, let's say I'm reading 4096-byte chunks from a file, and
    > wish to do special processing if any chunk ends with the carriage-return
    > character (\015). So, I start with code that looks like:
    >
    > local $/ = \4096;
    > while (defined (my $rec = <F>)) {
    > while ($rec =~ /\015$/) {
    > # do special processing ...
    > }
    > ...
    > }
    >
    > Oddly, this doesn't seem to work. It ends up matching chunks that
    > contain, but don't necessarily end with, \015.
    >
    > Instead, I have to do this:
    >
    > local $/ = \4096;
    > while (defined (my $rec = <F>)) {
    > while (substr($rec, -1) eq "\015") {
    > # do special processing ...
    > }
    > ...
    > }
    >
    > Any idea what's going on?


    perldoc perlre
    [snip]
    By default, the "^" character is guaranteed to match only the beginning
    of the string, the "$" character only the end (or before the newline at
    the end), and Perl does certain optimizations with the assumption that
    the string contains only one line. Embedded newlines will not be
    matched by "^" or "$". You may, however, wish to treat a string as a
    multi-line buffer, such that the "^" will match after any newline
    within the string, and "$" will match before any newline. At the cost
    of a little more overhead, you can do this by using the /m modifier on
    the pattern match operator. (Older programs did this by setting $*,
    but this practice is now deprecated.)


    So the regular expression will match with either "\015" or "\015\012" at the
    end of the string. If you want it to only match at the end of the string use
    /\015\z/ or the substr() expression.
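
To make the difference concrete, here is a minimal sketch that applies the three anchors to a few sample strings. The sample strings and the %anchor table are made up for illustration and are not from the original post; the sketch also assumes an ASCII platform, where "\n" is \012.

```perl
use strict;
use warnings;

# The three anchors under discussion, each preceded by a carriage return.
my %anchor = (
    '$'  => qr/\015$/,    # end of string, or before a newline at the end
    '\Z' => qr/\015\Z/,   # same rule as $ when /m is not in effect
    '\z' => qr/\015\z/,   # only the very end of the string
);

# Hypothetical sample chunks (assumed for illustration).
my @samples = (
    [ 'ends with CR'    => "abc\015"     ],
    [ 'ends with CR LF' => "abc\015\012" ],
    [ 'CR in middle'    => "a\015bc"     ],
);

for my $name ('$', '\Z', '\z') {
    for my $sample (@samples) {
        my ($label, $text) = @$sample;
        printf "%-3s %-16s %s\n", $name, $label,
            ($text =~ $anchor{$name} ? 'match' : 'no match');
    }
}
```

With these samples, $ and \Z should report a match for both the CR and the CR LF endings, \z only for the string that really ends in CR, and none of them for the string with a CR in the middle.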



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, May 14, 2006
    #3
  4. Mark Shelor

    John W. Krahn wrote:
    > Mark Shelor wrote:
    >
    >>Is it true that defining $/ to an integer reference (to read
    >>fixed-length records) affects the meaning of the end-of-string symbol
    >>($) in regex's?

    >
    >
    > No, it is not true.
    >
    >
    >>For example, let's say I'm reading 4096-byte chunks from a file, and
    >>wish to do special processing if any chunk ends with the carriage-return
    >>character (\015). So, I start with code that looks like:
    >>
    >>local $/ = \4096;
    >>while (defined (my $rec = <F>)) {
    >> while ($rec =~ /\015$/) {
    >> # do special processing ...
    >> }
    >> ...
    >>}
    >>
    >>Oddly, this doesn't seem to work. It ends up matching chunks that
    >>contain, but don't necessarily end with, \015.
    >>
    >>Instead, I have to do this:
    >>
    >>local $/ = \4096;
    >>while (defined (my $rec = <F>)) {
    >> while (substr($rec, -1) eq "\015") {
    >> # do special processing ...
    >> }
    >> ...
    >>}
    >>
    >>Any idea what's going on?

    >
    >
    > perldoc perlre
    > [snip]
    > By default, the "^" character is guaranteed to match only the beginning
    > of the string, the "$" character only the end (or before the newline at
    > the end), and Perl does certain optimizations with the assumption that
    > the string contains only one line. Embedded newlines will not be
    > matched by "^" or "$". You may, however, wish to treat a string as a
    > multi-line buffer, such that the "^" will match after any newline
    > within the string, and "$" will match before any newline. At the cost
    > of a little more overhead, you can do this by using the /m modifier on
    > the pattern match operator. (Older programs did this by setting $*,
    > but this practice is now deprecated.)
    >
    >
    > So the regular expression will match with either "\015" or "\015\012" at the
    > end of the string. If you want it to only match at the end of the string use
    > /\015\z/ or the substr() expression.



    Now it all makes perfect sense. Thanks for citing the reference, and
    thanks to you and MSG for the helpful replies.

    As a side remark to MSG's response, both $ and \Z match *before* newline
    at the end, so only /\015\z/ will work in this case.
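
For anyone who wants to try this, the following is a rough sketch of the record loop using \z. It is not the original program: it reads from an in-memory filehandle, uses 4-byte records instead of 4096 to keep the data short, and the $data string and @cr_terminated array are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical data: three 4-byte records. The first ends in CR,
# the second ends in CR LF, and the third has a CR in the middle.
my $data = "abc\015" . "ab\015\012" . "a\015bc";

# Read from the string through an in-memory filehandle.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @cr_terminated;
{
    local $/ = \4;    # fixed-length records (4 bytes here, 4096 in the thread)
    while (defined(my $rec = <$fh>)) {
        # \z anchors at the very end, so a trailing CR LF does not count.
        push @cr_terminated, $rec if $rec =~ /\015\z/;
    }
}
close $fh;

print scalar(@cr_terminated), " record(s) end with CR\n";
```

Only the first record should be collected here. Swapping \z for $ in the pattern would also pick up the second record, which is the behavior described in the first post.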

    Regards, Mark
     
    Mark Shelor, May 15, 2006
    #4