[regexp] Changing lines NOT containing a pattern

azrazer · Oct 6, 2009

Hello,
I recently found an interesting issue on fr.comp.lang.perl and thought it
would be good to share [since no answers have been found so far]. So here
it goes.

A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].
This $my_text now contains many lines of this form: <code>;<comments>.

The question is: using a regexp (with the mg flags), how do I do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD], then do not remove the
comments
2/ if <code> does not contain WORD, then remove the comments

I have tried using look-ahead and look-behind regexps, but I guess that is not
the right way of doing it. Also, I wanted to try using extended regexps
like (?(COND)true|false), but I ended up drawing a blank...

Any help appreciated !
Thanks a lot !

azra

azrazer · Oct 6, 2009

On Tue, 06 Oct 2009 17:09:50 -0500, Tad J McClellan wrote:

[snip]

Errrr, there is no need for the m//m flag, since there are no ^ or $
anchors in the pattern...

Well, since the file is slurped, the m flag might help with finding line
boundaries, mightn't it?
[snip]

$my_text =~ s/(.*)(;.*)/$1 . (index($1, 'WORD') == -1 ? '' : $2)/ge;

Wow... so great, thanks a lot...
Much easier [and definitely cleaner] than what I tried...
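
For anyone reading along, here is a minimal sketch of that substitution run on some made-up sample lines (the data is an assumption for illustration, not taken from the original question). It relies on two things: '.' does not match a newline unless /s is given, so each match stays within one line, and the greedy (.*) means $1 runs up to the last ';' on the line.

```perl
use strict;
use warnings;

# Hypothetical sample data in the <code>;<comments> form.
my $my_text = join "\n",
    'mov ax, WORD ; keep this comment',
    'mov bx, 1 ; drop this comment',
    'nop',
    '';

# $1 is everything before the last ';' on the line, $2 is the comment.
# The replacement is Perl code (/e): keep $2 only if $1 contains WORD.
$my_text =~ s/(.*)(;.*)/$1 . (index($1, 'WORD') == -1 ? '' : $2)/ge;

print $my_text;
```

Note that whitespace in front of the removed comment is left in place, and lines without a ';' are never touched.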

Thanks !

azra

Jürgen Exner · Oct 6, 2009

azrazer said:
A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].

And there is your underlying basic problem.

This $my_text now contains many lines of this form: <code>;<comments>.

The question is: using a regexp (with the mg flags), how do I do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD], then do not remove the
comments
2/ if <code> does not contain WORD, then remove the comments

Unless you are interested in an academic exercise or an intellectual mind
twister, it is _MUCH_ better to choose a data structure that fits the
problem description.

You have an abstract concept of "lines" and you want to do something
with each line, or not, depending
on whether that line contains something.
Then for heaven's sake choose a data structure that represents such a
line!!! And convert your mega-string $my_text into an array of such
lines, e.g. using split(). This way your whole problem will collapse
into a simple

s/.../.../ unless m/.../;

Problem trivially solved.
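
One way this could look in practice is sketched below. The sample text and the exact patterns are assumptions filled in for illustration; the post above only gives the s/.../.../ unless m/.../ skeleton.

```perl
use strict;
use warnings;

# Hypothetical sample data in the <code>;<comments> form.
my $my_text = "a WORD b ; keep me\nc d ; drop me\nno comment here\n";

# A limit of -1 keeps trailing empty fields, so the final newline
# survives the split/join round trip.
my @lines = split /\n/, $my_text, -1;

for (@lines) {
    # Strip from the first ';' unless WORD appears before that ';'.
    s/;.*// unless /^[^;]*WORD/;
}

my $result = join "\n", @lines;
print $result;
```

Each line is handled on its own, so neither the /m flag nor any look-around is needed.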

jue

C.DeRykus · Oct 7, 2009

Quoth azrazer <[email protected]>:

Hello,
I recently found an interesting issue on fr.comp.lang.perl and thought it
would be good to share [since no answers have been found so far]. So here
it goes.

A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].
This $my_text now contains many lines of this form: <code>;<comments>.

The question is: using a regexp (with the mg flags), how do I do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD], then do not remove the
comments
2/ if <code> does not contain WORD, then remove the comments

I have tried using look-ahead and look-behind regexps, but I guess that is not
the right way of doing it. Also, I wanted to try using extended regexps
like (?(COND)true|false), but I ended up drawing a blank...

The obvious answer (besides the one Tad suggested, or simply splitting
twice on newlines and then on ';') would be

s/(?<! WORD .*) ; .*//gx

but that doesn't work because perl doesn't do variable-length
look-behind.
...

Hm, late night.. but this does appear to work:

s/ ( (?<!WORD) ) ;. * /$1/gx;

(only tried in 5.10)

sln · Oct 7, 2009

Hello,
I recently found an interesting issue on fr.comp.lang.perl and thought it
would be good to share [since no answers have been found so far]. So here
it goes.

A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].
This $my_text now contains many lines of this form: <code>;<comments>.

The question is: using a regexp (with the mg flags), how do I do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD], then do not remove the
comments
2/ if <code> does not contain WORD, then remove the comments

I have tried using look-ahead and look-behind regexps, but I guess that is not
the right way of doing it. Also, I wanted to try using extended regexps
like (?(COND)true|false), but I ended up drawing a blank...

Any help appreciated !
Thanks a lot !

azra

It's moderately difficult, depending on what the overall conditions are.
A simple lookahead is all this needs, and there are many ways to do this
without extended regexes.

-sln
-------------------------

use strict;
use warnings;

my $string = "
1 this WORD here; this is ok
2 word2 is not here; delete comment
3 word3 is not here either; should not see this WORD, ; delete comment
";

#$string =~ s/^ ( (?:(?! WORD ).)* ; ) .* $ /$1/xmg;

$string =~
s/
^ # start of new line and substitution part

( # Capture group 1
(?: # group
(?! WORD ) # lookahead, not 'WORD' ? Continue else Fail line
. # capture this character
) * # end group, do this zero or more times
; # capture ';'
) # end Capture group 1

.* # get all from ';' to the end of line

$ # end of new line, substitute with $1

/$1/xmg;

print $string,"\n";

__END__

C.DeRykus · Oct 8, 2009

Quoth azrazer <[email protected]>:

Hello,
I recently found an interesting issue on fr.comp.lang.perl and thought it
would be good to share [since no answers have been found so far]. So here
it goes.
A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].
This $my_text now contains many lines of this form: <code>;<comments>.
The question is: using a regexp (with the mg flags), how do I do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD], then do not remove the
comments
2/ if <code> does not contain WORD, then remove the comments
I have tried using look-ahead and look-behind regexps, but I guess that is not
the right way of doing it. Also, I wanted to try using extended regexps
like (?(COND)true|false), but I ended up drawing a blank...

The obvious answer (besides the one Tad suggested, or simply splitting
twice on newlines and then on ';') would be

s/(?<! WORD .*) ; .*//gx

but that doesn't work because perl doesn't do variable-length
look-behind.
...

Hm, late night.. but this does appear to work:

s/ ( (?<!WORD) ) ;. * /$1/gx;

(only tried in 5.10)

This is confusing late-night nonsense since the lookaround
assertion isn't captured and $1 isn't defined. But evidently
there's a regex misfeature/bug and so it appears to work.
At least that's my guess after looking at this output:

perl -M"re debug" -wle "$_=q{xxxx;comment};s/((?<!WORD));.*/$1/gx";
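
A small check along the same lines, using made-up sample strings (they are assumptions, not from the thread), suggests a simpler explanation than a regex bug. As far as I can tell, (?<!WORD) only tests the four characters immediately before the ';', and the empty capture group still takes part in the match, so $1 is defined as an empty string. The comment therefore survives only when WORD sits directly against the semicolon.

```perl
use strict;
use warnings;

my @samples = (
    'xxxx;comment',              # no WORD at all
    'this WORD here; comment',   # WORD present, but not right before ';'
    'thisWORD; comment',         # WORD immediately before ';'
);

my @results;
for my $s (@samples) {
    (my $copy = $s) =~ s/((?<!WORD));.*/$1/g;
    push @results, $copy;
    print "$s => $copy\n";
}

# The empty group participates in a successful match, so $1 is defined.
my $one_is_defined = ('xxxx;comment' =~ /((?<!WORD));.*/ && defined $1) ? 1 : 0;
print "\$1 defined: $one_is_defined\n";
```

On the second sample the comment is removed even though the line contains WORD, so this pattern does not meet the original requirement in general.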

azrazer · Oct 8, 2009

It's moderately difficult, depending on what the overall conditions are.
A simple lookahead is all this needs, and there are many ways to do this
without extended regexes.

-sln
-------------------------

use strict;
use warnings;

my $string = "
1 this WORD here; this is ok
2 word2 is not here; delete comment
3 word3 is not here either; should not see this WORD, ; delete comment
";

#$string =~ s/^ ( (?:(?! WORD ).)* ; ) .* $ /$1/xmg;

$string =~
s/
^ # start of new line and substitution part

( # Capture group 1
(?: # group
(?! WORD ) # lookahead, not 'WORD' ? Continue else Fail line
. # capture this character
) * # end group, do this zero or more times
; # capture ';'
) # end Capture group 1

.* # get all from ';' to the end of line

$ # end of new line, substitute with $1

/$1/xmg;

print $string,"\n";

__END__

Ha! Great, that was what I was struggling with... look-aheads.
I actually forgot to group my pattern like this: (?:(?!word).)* and instead
wrote (?!word).*, which did not work...
Thanks a lot for this answer, I guess I learned a lot today.

Best,
azra.

azrazer · Oct 8, 2009

On Tue, 06 Oct 2009 17:09:50 -0500, Tad J McClellan wrote:

[snip]

The question is : Using a regexp (with mg flags)
Errrr, there is no need for the m//m flag, since there are no ^ or $
anchors in the pattern...

Well, since the file is slurped, the m flag might help with finding line
boundaries, mightn't it?

No.

m//m ONLY affects the meaning of the ^ and $ anchors.

It is useless and does nothing when those anchors are not used.

Argh, sorry, I think I still don't get it...
m//m affects the meaning of ^ and $ and allows them to match at
every line in the scalar variable, doesn't it?
I mean, this way it is possible to treat every single line present
within this variable using patterns like m/^...$/mg, and then apply
changes to every line if the regexp is built correctly.

Am I wrong somewhere, or did you say this because your great pattern
works without the m flag?
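
A short sketch that may help separate the two cases (the three-line sample string is an assumption for illustration). With an anchored pattern, /m changes where ^ can match; with an unanchored pattern, /m changes nothing.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Anchored pattern: /m changes what ^ can match.
my @anchored_plain = $text =~ /^(\w+)/g;    # ^ matches only at the start of the string
my @anchored_multi = $text =~ /^(\w+)/mg;   # ^ matches at the start of every line

# Unanchored pattern: /m makes no difference.
my @free_plain = $text =~ /(\w+)/g;
my @free_multi = $text =~ /(\w+)/mg;

print scalar(@anchored_plain), " vs ", scalar(@anchored_multi), "\n";   # 1 vs 3
print scalar(@free_plain),     " vs ", scalar(@free_multi),     "\n";   # 3 vs 3
```

So both statements hold: /m is what makes m/^...$/mg work line by line, and it has no effect on a pattern such as s/(.*)(;.*)/.../ge, which stays within one line simply because '.' does not match a newline.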

Thanks again for the explanations,

azra

sln · Oct 9, 2009

I actually forgot to group my pattern like this (?:(?!word).)* and did
(?!word).* which did not work...

There is a '\K' option, a sentence from perlre.html docs:
".. it is especially useful in situations where you want to efficiently
remove something following something else in a string."

It would be more efficient to use this in combination with a lookahead.
Compare these:

$string =~ s/^ ( (?:(?! WORD ).)* ; ) .* $ /$1/xmg;
$string =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;

-sln
---------

use strict;
use warnings;

my $string = "
1 this WORD here; this is ok
2 word2 is not here; delete comment
3 word3 is not here either; should not see this WORD, ; delete comment
";

$string =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;
print $string,"\n";

__END__
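
To check that the capture form and the \K form really give the same text, a side-by-side run on the same sample string could look like the sketch below. The variable names $with_capture and $with_keep are made up for this comparison; \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;   # \K was introduced in Perl 5.10

my $string = "
1 this WORD here; this is ok
2 word2 is not here; delete comment
3 word3 is not here either; should not see this WORD, ; delete comment
";

# Capture everything up to and including ';' and put it back via $1.
(my $with_capture = $string) =~ s/^ ( (?:(?! WORD ).)* ; ) .* $ /$1/xmg;

# Same match, but \K keeps the part before it out of the replaced text.
(my $with_keep = $string) =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;

print $with_capture eq $with_keep ? "same result\n" : "results differ\n";
print $with_keep;
```

Line 1 is left alone because WORD comes before its ';'. Lines 2 and 3 keep their text up to and including the first ';'.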
