Unexpected regex Behavior

Mark Shelor · May 14, 2006

Is it true that setting $/ to an integer reference (to read
fixed-length records) affects the meaning of the end-of-string symbol
($) in regexes?

For example, let's say I'm reading 4096-byte chunks from a file, and
wish to do special processing if any chunk ends with the carriage-return
character (\015). So, I start with code that looks like:

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while ($rec =~ /\015$/) {
            # do special processing ...
        }
        ...
    }

Oddly, this doesn't seem to work. It ends up matching chunks that
contain, but don't necessarily end with, \015.

Instead, I have to do this:

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while (substr($rec, -1) eq "\015") {
            # do special processing ...
        }
        ...
    }

Any idea what's going on?

Thanks, Mark

MSG · May 14, 2006

Mark said:
Is it true that setting $/ to an integer reference (to read
fixed-length records) affects the meaning of the end-of-string symbol
($) in regexes?

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while ($rec =~ /\015$/) {
            # do special processing ...

($) is not exactly the end-of-string symbol; it is the end-of-line symbol.
(\Z) or (\z) is the end-of-string symbol and should serve your purpose.

Also, I feel that "if" is better than the second "while" loop, since
you only want to match one \015 at the end of the string.

John W. Krahn · May 14, 2006

Mark said:
Is it true that setting $/ to an integer reference (to read
fixed-length records) affects the meaning of the end-of-string symbol
($) in regexes?

No, it is not true.

For example, let's say I'm reading 4096-byte chunks from a file, and
wish to do special processing if any chunk ends with the carriage-return
character (\015). So, I start with code that looks like:

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while ($rec =~ /\015$/) {
            # do special processing ...
        }
        ...
    }

Oddly, this doesn't seem to work. It ends up matching chunks that
contain, but don't necessarily end with, \015.

Instead, I have to do this:

    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while (substr($rec, -1) eq "\015") {
            # do special processing ...
        }
        ...
    }

Any idea what's going on?

perldoc perlre
[snip]
By default, the "^" character is guaranteed to match only the beginning
of the string, the "$" character only the end (or before the newline at
the end), and Perl does certain optimizations with the assumption that
the string contains only one line. Embedded newlines will not be
matched by "^" or "$". You may, however, wish to treat a string as a
multi-line buffer, such that the "^" will match after any newline
within the string, and "$" will match before any newline. At the cost
of a little more overhead, you can do this by using the /m modifier on
the pattern match operator. (Older programs did this by setting $*,
but this practice is now deprecated.)

So the regular expression matches when the string ends with either "\015"
or "\015\012". If you want it to match only at the very end of the string,
use /\015\z/ or the substr() expression.

John
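
For anyone who wants to reproduce this, here is a minimal sketch of the
behavior discussed in this thread. It assumes 8-byte records rather than
4096 to keep the sample small, and it reads from an in-memory filehandle.
The sample data and the names $data and @results are made up for
illustration; they are not from the original posts.

```perl
use strict;
use warnings;

# Hypothetical data: three 8-byte records (the thread uses 4096-byte ones).
my $data = "abcdefg\015"       # record 1: ends with CR
         . "abcdef\015\012"    # record 2: ends with CR LF
         . "abc\015defg";      # record 3: CR only in the middle

open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @results;
{
    local $/ = \8;    # read fixed-length 8-byte records
    while (defined(my $rec = <$fh>)) {
        push @results, {
            dollar      => ($rec =~ /\015$/  ? 1 : 0),   # end, or before a final newline
            backslash_z => ($rec =~ /\015\z/ ? 1 : 0),   # very end of string only
            substr_eq   => (substr($rec, -1) eq "\015" ? 1 : 0),
        };
    }
}
close $fh;

# Print one line per record showing which tests matched.
for my $i (0 .. $#results) {
    my $r = $results[$i];
    printf "record %d: dollar=%d backslash_z=%d substr_eq=%d\n",
        $i + 1, @{$r}{qw(dollar backslash_z substr_eq)};
}
```

Under those assumptions, the second record (the one ending in "\015\012")
is where /\015$/ matches but /\015\z/ and substr() do not. Setting $/ to
an integer reference only controls how much is read; it does not change
what $ means.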

Mark Shelor · May 15, 2006

Now it all makes perfect sense. Thanks for citing the reference, and
thanks to you and MSG for the helpful replies.

As a side remark to MSG's response, both $ and \Z can match *before* a
newline at the end of the string, so only /\015\z/ will work in this case.

Regards, Mark
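
The side remark about $, \Z and \z can be checked with a small sketch like
the one below. The sample strings and the names %samples, %anchors and
%match are invented for illustration. The /m variant is included only as a
contrast with the perlre passage John quoted.

```perl
use strict;
use warnings;

# Made-up sample strings covering the cases discussed in the thread.
my %samples = (
    'CR at end'       => "data\015",
    'CR LF at end'    => "data\015\012",
    'CR LF in middle' => "da\015\012ta",
);

# The same CR test with each end anchor.
my %anchors = (
    '$'         => qr/\015$/,     # end of string, or before a final newline
    '\Z'        => qr/\015\Z/,    # same positions as $ without /m
    '\z'        => qr/\015\z/,    # very end of string only
    '$ with /m' => qr/\015$/m,    # before any newline, or at the end
);

my %match;
for my $name (sort keys %samples) {
    for my $anchor (sort keys %anchors) {
        my $hit = ($samples{$name} =~ $anchors{$anchor}) ? 1 : 0;
        $match{$name}{$anchor} = $hit;
        printf "%-16s %-10s %s\n", $name, $anchor, $hit ? 'match' : 'no match';
    }
}
```

Only \z rejects the string that ends in "\015\012", which is why it is the
right anchor when a chunk must end with the carriage return itself.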
