How to change Perl's concept of a newline in regexps?

R Krause · Nov 20, 2006

I know that $/ is the input record separator. But changing it doesn't
seem to have any effect on the behavior of regular expressions in normal
or multi-line mode.

For example, this returns false since we end the string with a CR
instead of a LF:

$var = "Hello\r";
return 1 if( $var =~ m/^Hello$/ );

How does one change the default newline character used in pattern
matching operations? Is there a Perl variable that can be set?

TIA,
--Randall

robertospara · Nov 20, 2006

I think I want to know something similar.
See the discussion "is there '\0' like in C in Perl also".
As I understand it, $ (the right border of the matched text) usually
matches just before the newline: >>>$\n<<<
But with \r it would have to match after it: >>>\r$<<<
What is really in the string when I read $something = <INPUT> and press
Enter?

Is it $something = 'characters'.'\n' and nothing else,

or something like $something = 'characters'.'\0'.'\n'?
Then Perl would know to match $ (end of the string) at the '\0'.

Does anyone understand what I mean?

Ilya Zakharevich · Nov 20, 2006

[A complimentary Cc of this posting was sent to R Krause]

$var = "Hello\r";
return 1 if( $var =~ m/^Hello$/ );

How does one change the default newline character used in pattern
matching operations?

Usually, one won't need to do this. My bet is that you are doing something
in a far from optimal way.

Is there a Perl variable that can be set?

No.

Hope this helps,
Ilya

P.S. Do not forget that (in //m mode) $ is just a shortcut for
(?=\n|\z), and ^ for (?:\A|(?<=\n)). Likewise for no //m.
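The equivalence in the P.S. can be checked directly. The sketch below covers the no-//m case, where $ behaves like (?=\n?\z): it matches at the very end of the string or just before a final "\n", and never before "\r". The sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: one uses the $ anchor, the other spells it out.
sub matches_dollar   { my ($s) = @_; return $s =~ /^Hello$/            ? 1 : 0 }
sub matches_explicit { my ($s) = @_; return $s =~ /\AHello(?=\n?\z)/   ? 1 : 0 }

for my $s ("Hello", "Hello\n", "Hello\r", "Hello\r\n") {
    # Make the control characters visible when printing.
    (my $shown = $s) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    printf "%-10s  \$: %d  explicit: %d\n",
        $shown, matches_dollar($s), matches_explicit($s);
}
```

Both columns agree for every input: only "Hello" and "Hello\n" match, which is why the original test on "Hello\r" fails.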

robertospara · Nov 20, 2006

Thanks, Ilya. Now I know more.

Mumia W. (reading news) · Nov 20, 2006

I know that $/ is the input record separator. But changing it doesn't
seem to have any effect on the behavior of regular expressions in normal
or multi-line mode.

For example, this returns false since we end the string with a CR
instead of a LF:

$var = "Hello\r";
return 1 if( $var =~ m/^Hello$/ );

How does one change the default newline character used in pattern
matching operations? Is there a Perl variable that can be set?

TIA,
--Randall

There is no Perl variable for that; you would have to match the \r yourself:

m/^Hello\r/
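A slightly more general version of the same idea, as a sketch: spell out the line terminator in the pattern and anchor with \z, so that CR, LF, CRLF, or no terminator at all is accepted. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: "Hello" followed by at most one line terminator
# (CR, LF, or CRLF) and then the true end of the string.
sub is_hello_line {
    my ($s) = @_;
    return $s =~ /\AHello(?:\r\n?|\n)?\z/ ? 1 : 0;
}

for my $s ("Hello", "Hello\n", "Hello\r", "Hello\r\n", "Hello\rX") {
    (my $shown = $s) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    printf "%-10s %s\n", $shown, is_hello_line($s) ? "match" : "no match";
}
```

On Perl 5.10 and later, \R can stand in for the (?:\r\n?|\n) alternation, although it also matches a few other vertical-whitespace characters.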

R Krause · Nov 20, 2006

Ilya said:
[A complimentary Cc of this posting was sent to R Krause]

$var = "Hello\r";
return 1 if( $var =~ m/^Hello$/ );

How does one change the default newline character used in pattern
matching operations?

Usually, one won't need to do this. My bet is that you are doing something
in a far from optimal way [..]
P.S. Do not forget that (in //m mode) $ is just a shortcut for
(?=\n|\z), and ^ for (?:\A|(?<=\n)). Likewise for no //m.

Thanks. It's mostly theoretical. After all '$' and '^' and '\Z' do
exist for convenience as you've shown in the hint above. I was testing
my code and it occurred to me that this behavior cannot be changed. As
far as being optimal, it is curious why Perl has certain predefined
variables that can be changed at all since, to be fair, programmers
should never have to modify '$/', '$\', or '$,' to perform any I/O
operations. Yet, they exist for efficiency when needed.

Likewise, Perl's presumption that a newline in pattern matching is
always '\n' is heavily machine dependent which is unusual since I
thought that functions like chomp( ) were created to correct for this
mistaken notion. Yet, now I realize that Perl's regexps still encourage
chop( )-like behavior, which is the poor-man's solution for processing
the end-of-line.

--Randall

Dr.Ruud · Nov 20, 2006

R Krause wrote:

I know that $/ is the input record separator. But changing it doesn't
seem to have any effect on the behavior of regular expressions in normal
or multi-line mode.

That is the wrong way around. When (line oriented) input is read, and
the proper IO-layer is used, the resulting string will have a "\n" at
the end.
See also Encode::Encoding. Compare the :crlf layer.
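A minimal sketch of what the :crlf layer does on input, assuming a throwaway temporary file created with File::Temp: the file holds CRLF line endings on disk, but the strings Perl sees end in a plain "\n", so the ordinary $ anchor works again.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a file with CRLF line endings, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                          # raw bytes, no translation on write
print {$out} "Hello\r\nWorld\r\n";
close $out or die "close: $!";

# Read it back through the :crlf layer, which turns CR LF into "\n".
open my $in, '<:crlf', $path or die "open: $!";
my @lines = <$in>;
close $in;

print "matched\n" if $lines[0] =~ /^Hello$/;   # the line is now "Hello\n"
chomp @lines;
print join(",", @lines), "\n";
```

The conversion happens while reading, so it only helps when the data arrives through a filehandle; a string that already contains "\r" is not affected.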

Ilya Zakharevich · Nov 20, 2006

[A complimentary Cc of this posting was sent to R Krause]

Thanks. It's mostly theoretical. After all '$' and '^' and '\Z' do
exist for convenience as you've shown in the hint above. I was testing
my code and it occurred to me that this behavior cannot be changed. As
far as being optimal, it is curious why Perl has certain predefined
variables that can be changed at all since, to be fair, programmers
should never have to modify '$/', '$\', or '$,' to perform any I/O
operations. Yet, they exist for efficiency when needed.

I think you are wrong. They exist mostly for backward compatibility.
This stuff should be done in channels, not in an interpreter.

Hope this helps,
Ilya

R Krause · Nov 22, 2006

Ilya said:
[A complimentary Cc of this posting was sent to R Krause]

Thanks. It's mostly theoretical. After all '$' and '^' and '\Z' do
exist for convenience as you've shown in the hint above. I was testing
my code and it occurred to me that this behavior cannot be changed. As
far as being optimal, it is curious why Perl has certain predefined
variables that can be changed at all since, to be fair, programmers
should never have to modify '$/', '$\', or '$,' to perform any I/O
operations. Yet, they exist for efficiency when needed.

I think you are wrong. They exist mostly for backward compatibility.
This stuff should be done in channels, not in an interpreter.

I'm not certain what channels have to do with Perl's predefined
variables. If in truth '$/', '$\' and so on exist for backwards
compatibility, then I am curious why that is not mentioned in the perl
docs. It seems these variables were introduced by Larry Wall for some
ultimate purpose. If they are now to be disused, then the interpreter
must be dragging around a lot of legacy code.

--Randall

R Krause · Nov 22, 2006

Dr.Ruud said:
R Krause wrote:

That is the wrong way around. When (line oriented) input is read, and
the proper IO-layer is used, the resulting string will have a "\n" at
the end.
See also Encode::Encoding. Compare the :crlf layer.

Thanks. I've checked the :crlf layer but it seems to only work with
overall reading or writing operations and does not affect Perl's
internal processing of specific regexps (esp. in multi-line mode). In
cases of network communications, it is often valuable to test some
portion of input presuming line-delimiters such as '\r\n' or even '\0'
using regular expressions, but to then leave other portions of the
transmission unchanged.
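As a sketch of that situation, assume a made-up message with CRLF-delimited header lines, an empty line, and a body that must be passed through untouched. The message layout and the variable names are illustrative only, not taken from any particular protocol.

```perl
use strict;
use warnings;

# Hypothetical message: header lines end in CRLF, an empty line follows,
# and the body must stay byte-for-byte intact.
my $message = "Status: ok\r\nLength: 5\r\n\r\nab\r\nc";

# Split once at the first empty line; the body is not examined further.
my ($head, $body) = split /\r\n\r\n/, $message, 2;

# Inside the header, "\r\n" is the line delimiter.
my %header;
for my $line (split /\r\n/, $head) {
    my ($name, $value) = $line =~ /\A([^:]+):\s*(.*)\z/ or next;
    $header{$name} = $value;
}

printf "%s => %s\n", $_, $header{$_} for sort keys %header;
printf "body is %d bytes\n", length $body;
```

Because the delimiter is written into the patterns, neither $/ nor an I/O layer is involved, and the CRLF inside the body survives.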

I'm still looking into Encode::Encoding, but that module seems very
complex just for processing text with alternate line-delimiters. I
guess the convenience of $ and ^ is lost in this situation, so one
needs to just roll his own.

--Randall

Ilya Zakharevich · Nov 22, 2006

[A complimentary Cc of this posting was sent to R Krause]

I'm not certain what channels have to do with Perl's predefined
variables.

Most variables predate objects and channels. Channels (should) make
them obsolete.

If in truth '$/', '$\' and so on exist for backwards
compatibility, then I am curious why that is not mentioned in the perl
docs. It seems these variables were introduced by Larry Wall for some
ultimate purpose.

Yes - 20 years ago.

If they are now to be disused, then the interpreter
must be dragging around a lot of legacy code.

It does.

Hope this helps,
Ilya

Dr.Ruud · Nov 22, 2006

R Krause wrote:

Dr.Ruud:

Thanks. I've checked the :crlf layer but it seems to only work with
overall reading or writing operations and does not affect Perl's
internal processing of specific regexps (esp. in multi-line mode).

The :crlf layer normalizes the data, so that you don't need to use \r\n
in your code (regexp or not) if your text file happens to be one with
CRLF line endings.
Maybe there exists a :cr layer for Mac-type text files?

In cases of network communications, it is often valuable to test some
portion of input presuming line-delimiters such as '\r\n' or even '\0'
using regular expressions, but to then leave other portions of the
transmission unchanged.

Of course, but make a distinction between platform-dependent line
oriented operations, and buffer oriented operations.

I'm still looking into Encode::Encoding, but that module seems very
complex just for processing text with alternate line-delimiters. I
guess the convenience of $ and ^ is lost in this situation, so one
needs to just roll his own.

If you are not reading the data in a line-oriented way, then just use \r
(or \x0D) etc. You can make your own zero-width assertions like (?=\r).
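One way to roll your own anchors, as a sketch: precompile a pair of zero-width assertions with qr// and interpolate them where ^ and $ would go under //m. The names $BOL, $EOL and has_line are made up for this example, and a lone "\r" is assumed as the delimiter.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for ^ and $ under //m, with "\r" as the delimiter.
my $BOL = qr/(?:\A|(?<=\r))/;   # start of string, or just after a CR
my $EOL = qr/(?=\r|\z)/;        # just before a CR, or end of string

# True if $text contains a CR-delimited line exactly equal to $word.
sub has_line {
    my ($text, $word) = @_;
    return $text =~ /$BOL\Q$word\E$EOL/ ? 1 : 0;
}

my $text = "alpha\rbeta\rgamma";
for my $word (qw(alpha beta gamma bet)) {
    printf "%-6s %s\n", $word, has_line($text, $word) ? "found" : "not found";
}
```

For CRLF data the same shape works with (?<=\r\n) and (?=\r\n|\z), since both are fixed-length assertions.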

Ben Morrow · Nov 22, 2006

Quoth "Dr.Ruud":
Maybe there exists a :cr layer for Mac-type text files?

PerlIO::eol, which is IMHO greatly superior to :crlf in all
circumstances.

Ben
