How to change Perl's concept of a newline in regexps?

R Krause

I know that $/ is the input record separator. But changing it doesn't
seem to effect any change in the behavior of regular expressions in
normal or multi-line mode.

For example, this returns false since we end the string with a CR
instead of a LF:

$var = "Hello\r";
return 1 if( $var =~ m/^Hello$/ );

How does one change the default newline character used in pattern
matching operations? Is there a Perl variable that can be set?

TIA,
--Randall
 
robertospara

I think I want to know something similar.
See the discussion "is there '\0' like in C in Perl also".
The $ anchor, the right-hand border of the matched text, usually sits
just before the newline: $\n
But with \r it would have to sit after it: \r$
What is really in the string when I read $something = <INPUT> and press
Enter?

Is it $something = 'characters' . "\n" and nothing else,

or something like $something = 'characters' . "\0" . "\n"?
Then Perl would know to match $ (end of the string) against the "\0".

Does anyone understand what I mean?
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to R Krause]

> $var = "Hello\r";
> return 1 if( $var =~ m/^Hello$/ );
> How does one change the default newline character used in pattern
> matching operations?

Usually, one won't need to do this. My bet is that you are doing
something in a far from optimal way.

> Is there a Perl variable that can be set?

No.

Hope this helps,
Ilya

P.S. Do not forget that (in //m mode) $ is just a shortcut for
(?=\n|\z), and ^ for (?:\A|(?<=\n)). Likewise for no //m.
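
The equivalences in the P.S. can be checked side by side. The following is only a minimal sketch: the helper `matches` and the sample strings are made up for illustration, and the non-/m expansion `\A ... (?=\n?\z)` is one reading of "likewise for no //m", based on the documented behavior that `$` matches at the end of the string or just before a final newline.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $str matches $re, else 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# Built-in anchors next to hand-written expansions of them.
my %pattern = (
    'no /m, built-in' => qr/^Hello$/,
    'no /m, expanded' => qr/\AHello(?=\n?\z)/,
    '/m, built-in'    => qr/^Hello$/m,
    '/m, expanded'    => qr/(?:\A|(?<=\n))Hello(?=\n|\z)/,
);

my @samples = ("Hello", "Hello\n", "Hello\r", "x\nHello\ny");

for my $s (@samples) {
    # Make control characters visible in the report.
    (my $shown = $s) =~ s/\n/\\n/g;
    $shown =~ s/\r/\\r/g;
    for my $name (sort keys %pattern) {
        printf "%-14s %-16s %d\n", $shown, $name, matches($s, $pattern{$name});
    }
}
```

In every row the built-in anchor and its expansion should agree, and none of the four patterns accepts "Hello\r", since only "\n" is hard-wired into the anchors.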
 
robertospara

Thanks, Ilya, now I know more :)
 
Mumia W. (reading news)

There is no Perl variable for that; you would have to match the \r yourself:

m/^Hello\r/
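
If the line terminator is not known in advance, the same idea can be extended to accept any of the common terminators explicitly. This is a sketch only: `$EOL` and `is_hello_line` are hypothetical names, not anything built into Perl.

```perl
use strict;
use warnings;

# Hypothetical pattern: CRLF, bare CR, or bare LF.
# CRLF is listed first so it is consumed as a single terminator.
my $EOL = qr/(?:\r\n|\r|\n)/;

# Return 1 if the string is "Hello" followed by at most one terminator.
sub is_hello_line {
    my ($str) = @_;
    return $str =~ /\AHello$EOL?\z/ ? 1 : 0;
}

for my $s ("Hello", "Hello\n", "Hello\r", "Hello\r\n", "Hello\rX") {
    (my $shown = $s) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    printf "%-10s %d\n", $shown, is_hello_line($s);
}
```

On Perl 5.10 and later, the `\R` escape is a built-in alternative that matches a generic newline, though it also accepts other vertical whitespace such as form feed.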
 
R Krause

Ilya said:
>> $var = "Hello\r";
>> return 1 if( $var =~ m/^Hello$/ );
>> How does one change the default newline character used in pattern
>> matching operations?
>
> Usually, one won't need to do this. My bet is that you do something
> in a very far from optimal way [..]
> P.S. Do not forget that (in //m mode) $ is just a shortcut for
> (?=\n|\z), and ^ for (?:\A|(?<=\n)). Likewise for no //m.

Thanks. It's mostly theoretical. After all '$' and '^' and '\Z' do
exist for convenience as you've shown in the hint above. I was testing
my code and it occurred to me that this behavior cannot be changed. As
far as being optimal, it is curious why Perl has certain predefined
variables that can be changed at all since, to be fair, programmers
should never have to modify '$/', '$\', or '$,' to perform any I/O
operations. Yet, they exist for efficiency when needed.

Likewise, Perl's presumption that a newline in pattern matching is
always '\n' is heavily machine dependent which is unusual since I
thought that functions like chomp( ) were created to correct for this
mistaken notion. Yet, now I realize that Perl's regexps still encourage
chop( )-like behavior, which is the poor-man's solution for processing
the end-of-line.

--Randall
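
The contrast between chomp( ) and the regexp anchors can be made concrete. In this sketch the helper `chomp_with` is a made-up name; it relies only on the documented behavior that chomp removes a trailing copy of the current value of `$/`.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of $str using $sep as the
# input record separator, leaving the global $/ untouched afterwards.
sub chomp_with {
    my ($str, $sep) = @_;
    local $/ = $sep;
    chomp $str;
    return $str;
}

my $var = "Hello\r";

# With $/ set to CR, chomp strips the terminator...
my $stripped = chomp_with($var, "\r");
print "chomped with CR: ", ($stripped =~ m/^Hello$/ ? "match" : "no match"), "\n";

# ...but setting $/ has no influence on what $ means inside a regexp.
{
    local $/ = "\r";
    print "unchomped, \$/ is CR: ", ($var =~ m/^Hello$/ ? "match" : "no match"), "\n";
}
```

So `$/` lets chomp adapt to the delimiter, and the anchors then work on the stripped string, but the anchors themselves keep looking for "\n".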
 
Dr.Ruud

R Krause wrote:
> I know that $/ is the input record separator. But that doesn't seem to
> effect any change in the behavior of regular expressions in normal and
> multi-line mode.

That is the wrong way around. When (line oriented) input is read, and
the proper IO-layer is used, the resulting string will have a "\n" at
the end.
See also Encode::Encoding. Compare the :crlf layer.
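
A minimal sketch of the layer approach, assuming the data comes from a file with CRLF line endings. The sub name `read_lines_crlf` and the temporary file are illustrative only; `:crlf` itself is a standard PerlIO layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read all lines through the :crlf layer,
# which translates CRLF to "\n" on input.
sub read_lines_crlf {
    my ($path) = @_;
    open my $in, '<:crlf', $path or die "Cannot open $path: $!";
    my @lines = <$in>;
    close $in;
    return @lines;
}

# Write a small CRLF-terminated file in raw mode for the demonstration.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "Hello\r\nWorld\r\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my @lines = read_lines_crlf($tmp_path);
for my $line (@lines) {
    # Each line now ends in a plain "\n", so the usual anchors apply.
    print "matched Hello\n" if $line =~ m/^Hello$/;
}
```

The regexp is unchanged from the original question; only the way the data is read differs.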
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to R Krause]

> Thanks. It's mostly theoretical. After all '$' and '^' and '\Z' do
> exist for convenience as you've shown in the hint above. I was testing
> my code and it occurred to me that this behavior cannot be changed. As
> far as being optimal, it is curious why Perl has certain predefined
> variables that can be changed at all since, to be fair, programmers
> should never have to modify '$/', '$\', or '$,' to perform any I/O
> operations. Yet, they exist for efficiency when needed.

I think you are wrong. They exist mostly for backward compatibility.
This stuff should be done in channels, not in the interpreter.

Hope this helps,
Ilya
 
R Krause

Ilya said:
>> As far as being optimal, it is curious why Perl has certain predefined
>> variables that can be changed at all since, to be fair, programmers
>> should never have to modify '$/', '$\', or '$,' to perform any I/O
>> operations. Yet, they exist for efficiency when needed.
>
> I think you are wrong. They exist mostly for backward compatibility.
> These stuff should be made in channels, not in an interpreter.

I'm not certain what channels have to do with Perl's predefined
variables. If in truth '$/', '$\' and so on exist for backwards
compatibility, then I am curious why that is not mentioned in the perl
docs. It seems these variables were introduced by Larry Wall for some
ultimate purpose. If they are now to be disused, then the interpreter
must be dragging around a lot of legacy code.

--Randall
 
R Krause

Dr.Ruud said:
> That is the wrong way around. When (line oriented) input is read, and
> the proper IO-layer is used, the resulting string will have a "\n" at
> the end.
> See also Encode::Encoding. Compare the :crlf layer.

Thanks. I've checked the :crlf layer but it seems to only work with
overall reading or writing operations and does not affect Perl's
internal processing of specific regexps (esp. in multi-line mode). In
cases of network communications, it is often valuable to test some
portion of input presuming line-delimiters such as '\r\n' or even '\0'
using regular expressions, but to then leave other portions of the
transmission unchanged.

I'm still looking into Encode::Encoding, but that module seems very
complex just for processing text with alternate line-delimiters. :) I
guess the convenience of $ and ^ is lost in this situation, so one
needs to just roll his own.

--Randall
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to R Krause]

> I'm not certain what channels have to do with Perl's predefined
> variables.

Most variables predate objects and channels. Channels (should) make
them obsolete.

> If in truth '$/', '$\' and so on exist for backwards
> compatibility, then I am curious why that is not mentioned in the perl
> docs. It seems these variables were introduced by Larry Wall for some
> ultimate purpose.

Yes - 20 years ago.

> If they are now to be disused, then the interpreter
> must be dragging around a lot of legacy code.

It does.

Hope this helps,
Ilya
 
Dr.Ruud

R Krause wrote:
> Thanks. I've checked the :crlf layer but it seems to only work with
> overall reading or writing operations and does not affect Perl's
> internal processing of specific regexps (esp. in multi-line mode).

The :crlf layer normalizes the data, so that you don't need to use \r\n
in your code (regexp or not) if your text file happens to be one with
CRLF line endings.
Maybe there exists a :cr layer for Mac-type text files?

> In cases of network communications, it is often valuable to test some
> portion of input presuming line-delimiters such as '\r\n' or even '\0'
> using regular expressions, but to then leave other portions of the
> transmission unchanged.

Of course, but make a distinction between platform-dependent
line-oriented operations and buffer-oriented operations.

> I'm still looking into Encode::Encoding, but that module seems very
> complex just for processing text with alternate line-delimiters. :) I
> guess the convenience of $ and ^ is lost in this situation, so one
> needs to just roll his own.

If you are not reading the data in a line-oriented way, then just use \r
(or \x0D) etc. You can make your own zero-width assertions like (?=\r).
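
A sketch of such hand-rolled assertions, assuming a buffer whose lines are delimited by bare CR. The names `$BOL`, `$EOL`, and `cr_lines_starting_with`, and the sample buffer, are hypothetical.

```perl
use strict;
use warnings;

# Stand-ins for ^ and $ under /m, with CR as the line delimiter.
my $BOL = qr/(?:\A|(?<=\r))/;   # start of buffer, or just after a CR
my $EOL = qr/(?=\r|\z)/;        # just before a CR, or end of buffer

# Return every CR-delimited line of $buffer that starts with $prefix.
# The buffer itself is left unmodified.
sub cr_lines_starting_with {
    my ($buffer, $prefix) = @_;
    my @hits = $buffer =~ /$BOL(\Q$prefix\E[^\r]*)$EOL/g;
    return @hits;
}

my @found = cr_lines_starting_with("HELO a\rMAIL b\rHELO c", "HELO");
print join("|", @found), "\n";
```

For CRLF-delimited data the same shape should work with `(?<=\r\n)` and `(?=\r\n|\z)`, since both are fixed-length.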
 
Ben Morrow

Quoth "Dr.Ruud":
> Maybe there exists a :cr layer for Mac-type text files?

PerlIO::eol, which is IMHO greatly superior to :crlf in all
circumstances.

Ben
 
