Prematch ($`) and the m//g modifier

Mahesh Asolkar · Mar 10, 2006

Greetings,

I am trying to pick parts of a text file, process them and put them
back in place. The parts to be processed are marked by some begin and
end patterns.

Following is the essence of what I have.
-----
#!/usr/bin/perl

use strict;
use warnings;

local $/;
local $_ = <DATA>;

my $post;

while (/\s*(begin|end)\s*/) {
    my ($pre, $mat) = ($`, $&);
    $post = $';
    # The leading "" . keeps print from treating the parenthesised
    # test as its whole argument list.
    print "" . ($mat =~ /begin/)
        ? " $pre\n"
        : "--> " . uc($pre) . "\n";

    #
    # Any alternative to the following?
    #
    $_ = $post;
}
print " $post";

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5
-----

The above script generates the results I want: it picks out the parts between
'begin' and 'end', processes them (upcasing) and prints the data in the
original order. For debugging purposes, the output is not laid out exactly
as in the data source:

-----
% script.pl
other text1
--> PART1
other text2
--> PART2
other text3
--> PART3
other text4
--> PART4
other text5

-----

However, I am wondering if there is a way to do away with the '$_ =
$post' line.

I tried to use the m//g modifier to remember the position of the last match
and resume from there. But then I get a PREMATCH ($`) that starts at the
beginning of the entire string, not where the matching resumed.

-----
#!/usr/bin/perl

use strict;
use warnings;

local $/;
local $_ = <DATA>;

my $post;

while (/\s*(begin|end)\s*/g) {
    my ($pre, $mat) = ($`, $&);
    $post = $';
    print "" . ($mat =~ /begin/)
        ? " $pre\n"
        : "--> " . uc($pre) . "\n";
}
print " $post";

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5
-----
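None of the replies below answers the literal question of getting a "prematch since the last match" under m//g. One way to do it, sketched here assuming the sample data is inlined as a string rather than read from DATA, is to remember where the previous match ended and slice the text up to where the current match starts, using the match-offset arrays @- and @+ (documented in perlvar). The names $input, @out, $last and $tail are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data inlined as a string (an assumption of this sketch).
my $input = "other text1 begin part1 end other text2\n"
          . "begin part2 end other text3 begin part3\n"
          . "end other text4 begin part4 end other text5\n";

my @out;        # collected output lines (hypothetical name)
my $last = 0;   # offset just past the previous match

while ( $input =~ /\s*(begin|end)\s*/g ) {
    # $-[0] is where the current match starts, $+[0] where it ends.
    # The slice between them is the text since the previous match.
    my $pre = substr( $input, $last, $-[0] - $last );
    push @out, $1 eq 'begin' ? " $pre" : '--> ' . uc($pre);
    $last = $+[0];
}

# Whatever follows the last marker plays the role of $post.
my $tail = substr( $input, $last );
chomp $tail;
push @out, " $tail";

print "$_\n" for @out;
```

No $_ = $post reassignment is needed, because the string is never modified; m//g keeps its own position and @-/@+ supply the offsets.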

Any suggestions?

Thanks,
Mahesh.

A. Sinan Unur · Mar 10, 2006


Maybe I am missing something but how about:

#!/usr/bin/perl

use strict;
use warnings;

my $text = do { local $/; <DATA> };

$text =~ s{begin(.+?)end}{uc($1)}egms;

print $text;

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5

Sinan

PS: Thank you for an unusually clear, concise and to-the-point post.

Mahesh Asolkar · Mar 10, 2006

A. Sinan Unur said:
Maybe I am missing something but how about:

#!/usr/bin/perl

use strict;
use warnings;

my $text = do { local $/; <DATA> };

$text =~ s{begin(.+?)end}{uc($1)}egms;

That pretty much does it! Only I used:

$text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;

in order to preserve the begin and end markings.


Thanks,
Mahesh.

A. Sinan Unur · Mar 10, 2006

Mahesh said:
That pretty much does it! Only I used:

$text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;

in order to preserve the begin and end markings.

I couldn't tell if you wanted them or not.

Thanks,

You are most welcome.

Sinan

Uri Guttman · Mar 10, 2006

use File::Slurp for reading in the whole file. cleaner and faster.

MA> That pretty much does it! Only I used:

MA> $text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;

first, you can lose the /e and clean up that replacement like this:

$text =~ s{begin(.+?)end}{begin\U$1\Eend}gms;

then we remove the /m since you don't have any ^ or $ anchors there (the
/s stays: part3 and its end are on different lines, so . must match a newline)

$text =~ s{begin(.+?)end}{begin\U$1\Eend}gs;

then we remove the redundant strings (so if you need to change them you
only change in one place):

$text =~ s{(begin)(.+?)(end)}{$1\U$2\E$3}gs;

then we switch back to the normal / delimiters since there are no /'s
that need to be escaped in there. only use alternate delimiters when
you need to, not because you like their style.

$text =~ s/(begin)(.+?)(end)/$1\U$2\E$3/gs;
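A quick sanity check of the rewrite, assuming the thread's three-line sample as input: Mahesh's /e version and Uri's final version should produce identical text, and dropping /s should leave part3 (whose end marker sits on the next line) lowercase. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The thread's sample data, inlined as a string for this sketch.
my $data = "other text1 begin part1 end other text2\n"
         . "begin part2 end other text3 begin part3\n"
         . "end other text4 begin part4 end other text5\n";

# Mahesh's version: /e evaluates the replacement as Perl code.
( my $with_e = $data ) =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;

# Uri's final version: \U...\E upcases inside a plain replacement string.
( my $plain = $data ) =~ s/(begin)(.+?)(end)/$1\U$2\E$3/gs;

# Without /s, "." cannot cross the newline between "part3" and its "end".
( my $no_s = $data ) =~ s/(begin)(.+?)(end)/$1\U$2\E$3/g;

print $plain;
print "same result\n" if $with_e eq $plain;
```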

uri

John Bokma · Mar 10, 2006

Uri Guttman said:
....

use File::Slurp for that. cleaner and faster.
....

MA> $text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;
....

$text =~ s/(begin)(.+?)(end)/$1\U$2\E$3/gs;

Well written, thanks.

And I am also a File::Slurp fan ;-)

Xicheng · Mar 10, 2006


you don't need to slurp in your whole file when your data are arranged
this regularly. For your sample data, you might change the input record
separator (IRS), $/, to "end":
=======================
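use strict;
use warnings;

# $/ = 'end': each record ends just after the next "end" (chomp strips it);
# $\ = "\n": a newline is appended to every print.
local ($/, $\) = ('end', "\n");

while (<DATA>) {
    chomp;
    m{\A\s*(.*)\s+begin\s+(\S*)}sg and print "$1\n--> ", uc($2) and next;
    s/\G //g; print;
    # if every block has begin-end pairs, you may not need this line....
}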
__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5

=====result=====
other text1
--> PART1
other text2
--> PART2
other text3
--> PART3
other text4
--> PART4
other text5
===============
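To see what the record-separator trick feeds the loop, this sketch reads the sample through an in-memory filehandle (open on a scalar reference) with $/ set to 'end' and collects each record after chomp. Note that $/ is a plain string, not a regex, so any word containing "end" (say, "append") would also end a record; that is why the trick suits this particular data. @records is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data in a string; open() on a scalar ref gives an in-memory filehandle.
my $data = "other text1 begin part1 end other text2\n"
         . "begin part2 end other text3 begin part3\n"
         . "end other text4 begin part4 end other text5\n";

open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @records;
{
    local $/ = 'end';          # each readline stops just after the next "end"
    while ( my $rec = <$fh> ) {
        chomp $rec;            # chomp removes a trailing $/, i.e. the "end"
        push @records, $rec;
    }
}
close $fh;

printf "[%s]\n", $_ for @records;
```

Each record is "text before begin", the word "begin", and the block body, which is why the one match per record in the loop above is enough.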

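or you can do it in slurp mode (the whole file read into $_ with $/
undefined), which prints the same thing as the result above:

s{\sbegin\s(\S+)\send\s}{\n--> \U$1\n}g;   # $1, not \1: \1 in a replacement warns under "use warnings"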
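The slurp-mode line above is shown on its own. A complete version might look like this, assuming the whole input is already in a string (inlined here instead of slurped from DATA). The \S+ capture assumes each block holds a single word, as in the sample; $text is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data inlined; in the thread it would be slurped with $/ undefined.
my $data = "other text1 begin part1 end other text2\n"
         . "begin part2 end other text3 begin part3\n"
         . "end other text4 begin part4 end other text5\n";

# Replace each whitespace-delimited "begin WORD end" with a newline,
# the "--> " prefix and the upcased word, then another newline.
( my $text = $data ) =~ s{\sbegin\s(\S+)\send\s}{\n--> \U$1\n}g;

print $text;
```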

Xicheng

John W. Krahn · Mar 10, 2006


This appears to work (YMMV):

#!/usr/bin/perl

use strict;
use warnings;

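while ( <DATA> ) {
    chomp;
    if ( /begin/ ) {
        print " $1\n--> \U$2\n" while s/(.+?)begin(.+?)end//;
        # Join the next line and reprocess, but stop joining at end of
        # file so undef is never concatenated (that warns under warnings).
        my $next = <DATA>;
        if ( defined $next ) {
            $_ .= ' ' . $next;
            redo;
        }
    }
    print " $_\n";
}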

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5

John

