s///x

Dr.Ruud

I was trying the s///x syntax and got unexpected results.
Would somebody care to explain?

Simplified example:

#!/usr/bin/perl

use warnings;
use strict;

local ($,, $\) = ("\t", "\n");
my $x;

$_ = "abc 123 def 123 ghi";

$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/ # by
4 # FOUR
5 # FIVE
6 # SIX
/gsx; # global, single line, extended format

print 'Made', $x, 'replacements.';
print;


This printed:

Made 2 replacements.
abc # by
4 # FOUR
5 # FIVE
6 # SIX
def # by
4 # FOUR
5 # FIVE
6 # SIX
ghi

I expected: abc 456 def 456 ghi
 
Anno Siegel

Dr.Ruud said:
I expected: abc 456 def 456 ghi

The changes by /x only affect the regex proper. The replacement part
is still an ordinary double-quotish string.
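A minimal sketch of that point, reusing the string from the original post. The pattern is spread out and commented under /x, while the replacement is kept as a plain string, because whitespace and '#' are literal there. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $text = "abc 123 def 123 ghi";

# /x lets the pattern be spread over several lines and commented ...
my $count = $text =~ s/
    1   # ONE
    2   # TWO
    3   # THREE
/456/gx;    # ... but the replacement is still an ordinary double-quotish string

print "Made $count replacements: $text\n";
# prints: Made 2 replacements: abc 456 def 456 ghi
```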

Anno
 
Paul Lalli

Dr.Ruud said:
I expected: abc 456 def 456 ghi

Your expectations were incorrect.

The /x modifier causes whitespace in the *pattern match* to be ignored.
The replacement portion of a s/// operation is not a pattern match -
it is a double-quoted string. /x has no effect on this replacement.

Paul Lalli
 
Gunnar Hjalmarsson

Dr.Ruud said:
I expected: abc 456 def 456 ghi

Try:

$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/456/gx; # global, extended format

Note that the /s modifier is redundant (see "perldoc perlre").
 
Dr.Ruud

Gunnar Hjalmarsson wrote:

[s/ 123 #one-two-three / 456 #four-five-six /x]

Try:

$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/456/gx; # global, extended format

Of course, what is in my real, working code looks a lot more like that.
But I like the commented format much better and was really disappointed
that it didn't work.

Note that the /s modifier is redundant (see "perldoc perlre").

I don't consider the /s modifier redundant. It was not needed in my
example, so maybe you meant "redundant here"?
 
Paul Lalli

Dr.Ruud said:
Gunnar Hjalmarsson wrote:

I don't consider the /s modifier redundant. It was not needed in my
example, so maybe you meant "redundant here"?

Redundant would be if you had something in your pattern match like:
/stuff(?:.|\n)stuff/s

Here, I think /s is simply extraneous.
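A small sketch of what /s actually changes, in case it helps: it only affects whether the dot matches a newline, so a pattern that already spells out (?:.|\n) gains nothing from it. The test string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "stuff\nstuff";

my $plain   = $text =~ /stuff.stuff/        ? 1 : 0;  # dot does not match "\n"
my $with_s  = $text =~ /stuff.stuff/s       ? 1 : 0;  # /s lets dot match "\n"
my $spelled = $text =~ /stuff(?:.|\n)stuff/ ? 1 : 0;  # same effect without /s

print "plain=$plain with_s=$with_s spelled=$spelled\n";
# prints: plain=0 with_s=1 spelled=1
```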

Paul Lalli
 
Gunnar Hjalmarsson

Dr.Ruud said:
Gunnar Hjalmarsson wrote:

I don't consider the /s modifier redundant. It was not needed in my
example, so maybe you meant "redundant here"?

Okay, redundant (or extraneous...) here. I mentioned it because people
misunderstand the meaning of it all the time, and I believe one reason
for that is that "perldoc perlre" - unlike e.g. "perldoc perlop" - is
the only place in the docs (to my knowledge) where its meaning is
properly explained.
 
Dr.Ruud

Gunnar Hjalmarsson:
Dr.Ruud:

Okay, redundant (or extraneous...) here. I mentioned it because people
misunderstand the meaning of it all the time, and I believe one reason
for that is that "perldoc perlre" - unlike e.g. "perldoc perlop" - is
the only place in the docs (to my knowledge) where its meaning is
properly explained.

OK. It would be nice to have an educational piece of code about /m and
/s.

Let me make a start:

# a.1: without /s, the .* will match up to the first \n
$ echo 'first
second
third' | perl -pe 's/.*/#/'
#
#
#

# a.2: with /s, the .* will match until the very end
$ echo 'first
second
third' | perl -pe 's/.*/#/s'
###


# b.1: without /s or /m, the .$ will match nothing if there are
# two newlines at the end
$ echo 'first
second
third' | perl -pe '$_.="\n"; s/.$/#/'
first

second

third


# b.2: with /s, the .$ will match anything before the last \n
$ echo 'first
second
third' | perl -pe '$_.="\n"; s/.$/#/s'
first#
second#
third#


# b.3: with /m, the .$ will match anything before the first \n
$ echo 'first
second
third' | perl -pe '$_.="\n"; s/.$/#/m'
firs#

secon#

thir#
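The one-liners above run once per input line because of -p. As a complement, here is a rough sketch on a single multi-line string held in one scalar, without /g, to separate the two modifiers: /m changes where $ can match, /s only changes what the dot can match. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# Copy-then-substitute idiom, so $text itself stays untouched
(my $no_mod = $text) =~ s/.$/#/;    # $ matches only at the end, before the final "\n"
(my $with_m = $text) =~ s/.$/#/m;   # /m: $ also matches before every embedded "\n"
(my $with_s = $text) =~ s/.$/#/s;   # /s: dot may match "\n", but $ is unchanged

print $no_mod;   # first / second / thir#
print $with_m;   # firs# / second / third
print $with_s;   # first / second / thir#  (same as without modifiers here)
```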
 
Dr.Ruud

Anno Siegel wrote:
The changes by /x only affect the regex proper. The replacement part
is still an ordinary double-quotish string.

OK. I am still trying to work out why it was chosen not to affect the
replacement part. I have no doubt that there is a simple explanation why
it is not feasible, but I just can't think of it (tired of working some
very long days, but very satisfied with the results and very happy with
Perl).
 
Gunnar Hjalmarsson

Abigail said:
Gunnar Hjalmarsson wrote on MMMMCDXLVI September MCMXCIII:
** Dr.Ruud wrote:
** > I don't consider the /s modifier redundant. It was not needed in my
** > example, so maybe you meant "redundant here"?
**
** Okay, redundant (or extraneous...) here. I mentioned it because people
** misunderstand the meaning of it all the time, and I believe one reason
** for that is that "perldoc perlre" - unlike e.g. "perldoc perlop" - is
** the only place in the docs (to my knowledge) where its meaning is
** properly explained.

Damian makes a good argument in PBP to always use /s and /m.

What's PBP?
I don't think it's worth raising your finger if someone uses /s or /m
on a regex where it doesn't matter. It's like complaining someone uses
'use warnings' on a piece of code where it didn't matter.

A better parallel IMO is that it's like complaining when someone calls a
function using '&' without knowing the implications of doing so. It
'works' most of the time, but not always...
(Not saying that Dr. Ruud doesn't know the implications of using the /s
modifier. It's now obvious that he does.)
 
Tad McClellan

Dr.Ruud said:
Anno Siegel wrote:


OK. I am still trying to work out why it was chosen not to affect the
replacement part.


Because spaces are _supposed_ to matter when they are in a string.
 
John Bokma

Gunnar Hjalmarsson said:
A better parallel IMO is that it's like complaining when someone calls
a function using '&' without knowing the implications of doing so. It
'works' most of the time, but not always...

Yup, I agree on that one. If I see &sub, I assume that the user requires
the & there. Same with /s or /m. It confuses me if it's just there and adds
line noise.
 
Tad McClellan

Damian makes a good argument in PBP to always use /s and /m.


I'd better go read it.

I don't think it's worth raising your finger if someone uses /s or /m
on a regex where it doesn't matter.


To me, modifiers mean "something out of the ordinary here, pay attention!".

I feel tricked when I try to figure out why the programmer wanted dot
to match newline, only to find that there isn't even a dot in the pattern.

It's like complaining someone uses
'use warnings' on a piece of code where it didn't matter.


'use warnings' always matters.[1] (heh)



[1] Message-ID: <[email protected]>
 
Ala Qumsieh

Dr.Ruud said:
OK. I am still trying to work out why it was chosen not to affect the
replacement part. I have no doubt that there is a simple explanation why
it is not feasible, but I just can't think of it (tired of working some
very long days, but very satisfied with the results and very happy with
Perl).

I don't think the reason is that it's not feasible, but rather that it's
not intuitive. Regular expressions can be messy, so having an option to
add comments, and 'beautify' them is a good idea. The replacement part
of an s/// is simply a string, and won't really benefit much from such
an option.

Moreover, you CAN add comments in the replacement part if you want to.
You just need to modify your code slightly, and use the /e modifier.
From your example:

$_ = "abc 123 def 123 ghi";

$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/ # by
4 . # FOUR
5 . # FIVE
6 # SIX
/gsex; # global sex

But, I think this can be less readable than the alternative if the
replacement part is a simple string.

--Ala
 
Dr.Ruud

Ala Qumsieh:

Regular expressions can be messy, so having an
option to add comments, and 'beautify' them is a good idea. The
replacement part of an s/// is simply a string, and won't really
benefit much from such an option.

I ran into this with some rather lengthy strings of \x{####} in both the
search and the replacement part.

Moreover, you CAN add comments in the replacement part if you want to.
You just need to modify your code slightly, and use the /e modifier.
From your example:

$_ = "abc 123 def 123 ghi";

$x = s/ # Replace
1 # ONE
2 # TWO
3 # THREE
/ # by
4 . # FOUR
5 . # FIVE
6 # SIX
/gsex; # global sex

Thanks, that looks workable. Will the o-modifier make up for any lost
performance? I'll test it.

But, I think this can be less readable than the alternative if the
replacement part is a simple string.

Yes, but in my case it often isn't. The algorithm needs to be checked by
linguists. They would rather read the Unicode character names and such, so
I like to use the \N{name} format, but (without that e-modifier) that
would give very lengthy lines. I was going to store everything in
variables, but I'll test this format too.

Short example (without backreferences):

$x = s/(?<=\x{0020})
\x{0111}\x{0123}\x{0222}\x{02AA}\x{0123}\x{0223}\x{0221}\x{0241}\x{0247}
\x{02E2}\x{0223}(?=\x{0020})
/\x{0117}\x{000D}\x{0223}\x{02AA}\x{000D}\x{0223}\x{0221}\x{0221}\x{0223}/gmsx;

(actual codes munged)
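For the "store everything in variables" route mentioned above, one possible sketch is to build the replacement with join, one commented character per line, and keep the commented /x layout for the pattern in a qr// object. The characters, names and test string here are invented for illustration and are not the real data.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{name} on older perls, harmless on newer ones

my $text = "x \x{0061}\x{0062} y";   # hypothetical input: "x ab y"

# Pattern: /x applies here, so comments and line breaks are fine
my $srch = qr/
    \x{0061}    # LATIN SMALL LETTER A
    \x{0062}    # LATIN SMALL LETTER B
/x;

# Replacement: an ordinary string, assembled piece by piece so each part can be commented
my $repl = join '',
    "\N{GREEK SMALL LETTER ALPHA}",   # U+03B1
    "\N{GREEK SMALL LETTER BETA}";    # U+03B2

my $count = $text =~ s/$srch/$repl/g;

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character" warnings when printing
print "Made $count replacement(s): $text\n";
```

This keeps the replacement a plain string, so no /e is needed.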
 
Tad McClellan

Dr.Ruud said:
Ala Qumsieh:

Thanks, that looks workable. Will the o-modifier make up for any lost
performance?


Of course not.

s///o is a no-op when there are no variables in the pattern part.

s///o has no effect whatsoever on the replacement string part.

but

/gosex; # go have sex

would be cute to have in code. :)
 
Dr.Ruud

Abigail:
/o only matters if you have a variable inside regexp, and then only
if you encounter the regex more than once with a different value in
the variable. And then only if you want to keep using the old value.
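A small sketch of the situation described in that quote, where /o does make a difference: the same match is reached twice with a different value in the variable. The loop and names are made up for illustration.

```perl
use strict;
use warnings;

my (@without_o, @with_o);
for my $prefix ('a', 'b') {
    # Without /o the pattern follows the current value of $prefix
    push @without_o, ('apple' =~ /^$prefix/  ? 1 : 0);
    # With /o the pattern is compiled on the first pass only;
    # the later value 'b' is ignored and /^a/ keeps being used
    push @with_o,    ('apple' =~ /^$prefix/o ? 1 : 0);
}

print "without /o: @without_o\n";   # without /o: 1 0
print "with /o:    @with_o\n";      # with /o:    1 1
```

If the interpolated variable never changes, both versions give the same answers.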


I have a series of substitutions that have to be tried in order on every
line of many files.

To make the code more readable, I can store these substitutions in a
hash (with keys like 'A01' meaning phase A, first substitution).

It is no problem to unroll the loop for speed, so it might look like:

$x = s/$re{'A01'}[SRCH]/$re{'A01'}[REPL]/gsx; # or /gosx
print STDERR $re{'A01'}[NAME], $x if ($x > $re{'A01'}[MIN]);

$x = s/$re{'A02'}[SRCH]/$re{'A02'}[REPL]/gsx;
print STDERR $re{'A02'}[NAME], $x if ($x > $re{'A02'}[MIN]);

(and then dozens more)

If possible, I would like the modifiers to be in $re{'key'}[MODS].
(yes, this is all totally untested code yet)

OK, let me first try and test the alternatives. I still have a few days.
 
Dr.Ruud

Abigail:
does "$re{A01}[SRCH]" change?

No, it's a constant.

If the first
question is answered with 'no', then using /o doesn't matter.

OK. I still doubt that /o really doesn't matter, because I would
expect that a test needs to be done to find out if the variable has
changed or not, but even with such a (fast) test it can hardly matter.

If possible, I would like the modifiers to be in $re{'key'}[MODS].
(yes, this is all totally untested code yet)

s/(?$re{key}[MODS])$re{key}[SRCH]/$re{key}[REPL]/

ought to do the trick.

Ah, nice. Just another thing that I had read about but hadn't used yet.
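A rough, untested-against-the-real-data sketch of how that could look with a table of substitutions. The keys, fields and patterns below are hypothetical. Note that only pattern modifiers such as x, i, s and m can be embedded this way; /g belongs to the operator and has to stay there.

```perl
use strict;
use warnings;

# Hypothetical table: each entry carries its own inline pattern modifiers
my %re = (
    A01 => { mods => 'x', srch => '1 2 3', repl => '456' },
    A02 => { mods => 'i', srch => 'GHI',   repl => 'jkl' },
);

my $text = "abc 123 def 123 ghi";
my %count;

for my $key (sort keys %re) {
    my ($mods, $srch, $repl) = @{ $re{$key} }{qw(mods srch repl)};
    # (?x) or (?i) at the start switches the modifier on for the rest of the pattern
    $count{$key} = ($text =~ s/(?$mods)$srch/$repl/g) || 0;
}

print "$text\n";                               # abc 456 def 456 jkl
print "$_: $count{$_}\n" for sort keys %count; # A01: 2, then A02: 1
```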
 
