Prematch ($`) and the m//g modifier

Mahesh Asolkar

Greetings,

I am trying to pick parts of a text file, process them and put them
back in place. The parts to be processed are marked by some begin and
end patterns.

Following is the essence of what I have.
-----
#!/usr/bin/perl

use strict;
use warnings;

local $/;
local $_ = <DATA>;

my $post;

while (/\s*(begin|end)\s*/) {
    my ($pre, $mat) = ($`, $&);
    $post = $';
    print "" . ($mat =~ /begin/)
        ? " $pre\n"
        : "--> " . uc($pre) . "\n";

    #
    # Any alternative to the following?
    #
    $_ = $post;
}
print " $post";

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5
-----

The above script generates results as I want - it picks parts between
'begin' and 'end', processes them (upcasing) and prints the data in the
original flow. The data is not printed exactly as in the data source
for debugging purposes:

-----
% script.pl
other text1
--> PART1
other text2
--> PART2
other text3
--> PART3
other text4
--> PART4
other text5

-----

However, I am wondering if there is a way to do away with the '$_ =
$post' line.

I tried to use the m//g modifier to remember the position of the last
match and resume from there. But then I get a PREMATCH that begins at the
beginning of the entire string, not where the matching resumed.

-----
#!/usr/bin/perl

use strict;
use warnings;

local $/;
local $_ = <DATA>;

my $post;

while (/\s*(begin|end)\s*/g) {
    my ($pre, $mat) = ($`, $&);
    $post = $';
    print "" . ($mat =~ /begin/)
        ? " $pre\n"
        : "--> " . uc($pre) . "\n";

}
print " $post";

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5
-----

Any suggestions?

Thanks,
Mahesh.
 
A. Sinan Unur

Maybe I am missing something but how about:

#!/usr/bin/perl

use strict;
use warnings;

my $text = do { local $/; <DATA> };

$text =~ s{begin(.+?)end}{uc($1)}egms;

print $text;

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5

Sinan

PS: Thank you for an unusually clear, concise and to-the-point post.
 
Mahesh Asolkar

A. Sinan Unur said:
> Maybe I am missing something but how about:
>
> $text =~ s{begin(.+?)end}{uc($1)}egms;

That pretty much does it! Only I used:

$text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;

in order to preserve the begin and end markings.

Thanks,
Mahesh.
 
A. Sinan Unur

Mahesh Asolkar said:
> That pretty much does it! Only I used:
>
> $text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;
>
> in order to preserve the begin and end markings.

I couldn't tell if you wanted them or not.

You are most welcome.

Sinan
 
Uri Guttman

use File::Slurp for that. cleaner and faster.

MA> That pretty much does it! Only I used:

MA> $text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;

first, you can lose the /e and clean up that replacement like this:

$text =~ s{begin(.+?)end}{begin\U$1\Eend}gms;

then we remove the /m since you don't have any anchors there

$text =~ s{begin(.+?)end}{begin\U$1\Eend}gs;

then we remove the redundant strings (so if you need to change them you
only change in one place):

$text =~ s{(begin)(.+?)(end)}{$1\U$2\E$3}gs;

then we switch back to the normal / delimiters since there are no /'s
that need to be escaped in there. only use alternate delimiters when
you need to, not because you like their style.

$text =~ s/(begin)(.+?)(end)/$1\U$2\E$3/gs;

uri
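
A minimal sketch for checking that the /e form and the \U...\E form discussed above give the same result on the sample data from the thread. The sub names (upcase_with_e, upcase_with_escapes) and the $sample variable are made up for this illustration; only the two substitutions come from the posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text copied from the thread's __DATA__ section.
my $sample = <<'END_TEXT';
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5
END_TEXT

# Variant with /e: the replacement is evaluated as Perl code.
sub upcase_with_e {
    my ($text) = @_;
    $text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egs;
    return $text;
}

# Variant without /e: \U ... \E upcases the interpolated capture.
sub upcase_with_escapes {
    my ($text) = @_;
    $text =~ s/(begin)(.+?)(end)/$1\U$2\E$3/gs;
    return $text;
}

print upcase_with_escapes($sample);
print "both variants agree\n"
    if upcase_with_e($sample) eq upcase_with_escapes($sample);
```

Both subs work on a copy of the argument, so $sample itself is left untouched and can be fed to each variant in turn.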
 
John Bokma

Uri Guttman said:
> use File::Slurp for that. cleaner and faster.
....
> MA> $text =~ s{begin(.+?)end}{"begin" . uc($1) . "end"}egms;
....
> $text =~ s/(begin)(.+?)(end)/$1\U$2\E$3/gs;

Well written, thanks.

And I am also a File::Slurp fan ;-)
 
Xicheng

You don't need to slurp in your whole file when your data is arranged
this regularly. For your sample data, you might change the input record
separator $/ to "end":
=======================
use strict;
use warnings;

local ($/, $\) = ('end', "\n");

while (<DATA>) {
    chomp;
    m{\A\s*(.*)\s+begin\s+(\S*)}sg and print "$1\n--> ", uc($2) and next;
    s/\G //g; print;
    # if every block has begin-end pairs, you may not need the line above....
}
__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5

=====result=====
other text1
--> PART1
other text2
--> PART2
other text3
--> PART3
other text4
--> PART4
other text5
===============

Or you can do it in slurp mode, which prints the same result as
above:

s{\sbegin\s(\S+)\send\s}{\n--> \U$1\E\n}g;

Xicheng
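
To make the record-separator idea above concrete, here is a small sketch that reads text in 'end'-terminated records from an in-memory filehandle. The sub name records_by_end and the $sample variable are hypothetical; the sketch only shows how $/ and chomp interact and is not a full replacement for the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text copied from the thread's __DATA__ section.
my $sample = "other text1 begin part1 end other text2\n"
           . "begin part2 end other text3 begin part3\n"
           . "end other text4 begin part4 end other text5\n";

# Split a string into records terminated by the string 'end'
# instead of by newlines.
sub records_by_end {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = 'end';    # input record separator for this scope only
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;   # chomp strips the current $/, here 'end'
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Show each record in brackets so leading and trailing whitespace is visible.
print "[$_]\n" for records_by_end($sample);
```

One caveat with this approach: $/ is a plain string, not a pattern, so any occurrence of the letters "end" (for example inside "ending") also terminates a record.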
 
John W. Krahn

This appears to work (YMMV):

#!/usr/bin/perl

use strict;
use warnings;

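while ( <DATA> ) {
    chomp;
    if ( /begin/ ) {
        print " $1\n--> \U$2\n" while s/(.+?)begin(.+?)end//;
        # Append the next line, if there is one, and reprocess the result.
        # Checking for undef avoids a warning (and an endless redo) at end of data.
        my $next = <DATA>;
        if ( defined $next ) {
            $_ .= ' ' . $next;
            redo;
        }
    }
    print " $_\n";
}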

__DATA__
other text1 begin part1 end other text2
begin part2 end other text3 begin part3
end other text4 begin part4 end other text5



John
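
On the original question about $` under /g: one possible way to get "the text since the previous match" is to remember where the last match ended and slice the string with substr, using the @- and @+ offset arrays. The following is only a sketch under that assumption; the sub name split_on_markers and the variable $last_end are invented here, and the output format (no leading spaces) differs slightly from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk the string with /g and return the pieces between markers.
# Pieces that precede an 'end' marker are upcased and prefixed with '--> '.
sub split_on_markers {
    my ($text) = @_;
    my $last_end = 0;    # offset where the previous match ended
    my @chunks;
    while ( $text =~ /\s*(begin|end)\s*/g ) {
        # $-[0] is the start offset of the current match, so this is the
        # text between the previous match and this one.
        my $pre = substr( $text, $last_end, $-[0] - $last_end );
        push @chunks, ( $1 eq 'begin' ? $pre : '--> ' . uc($pre) );
        $last_end = $+[0];    # end offset of the current match
    }
    # Whatever follows the last marker (what $' held after the final match).
    push @chunks, substr( $text, $last_end );
    return @chunks;
}

my $sample = "other text1 begin part1 end other text2\n"
           . "begin part2 end other text3 begin part3\n"
           . "end other text4 begin part4 end other text5\n";

print "$_\n" for split_on_markers($sample);
```

Because the string is never modified, the '$_ = $post' reassignment is not needed, and the sketch does not touch $`, $& or $' at all.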
 
