[regexp] Changing lines NOT containing a pattern

azrazer

Hello,
I recently found an interesting issue on fr.comp.lang.perl and thought it
would be good to share [since no answers have been found so far]. So here
it goes.

A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].
This $my_text now contains many lines of this form: <code>;<comments>.

The question is: using a regexp (with mg flags), how to do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD] then do not remove
comments
2/ if <code> does not contain WORD then remove comments

I have tried using look-ahead and look-behind regexps but I guess it is not
the right way of doing it. Also, I wanted to try using extended regexps
like (?(COND)true|false) but I ended up drawing a blank...

Any help appreciated !
Thanks a lot !

azra
 
azrazer

On Tue, 06 Oct 2009 17:09:50 -0500, Tad J McClellan wrote:

[snip]
Errrr, there is no need for the m//m flag, since there are no ^ or $
anchors in the pattern...

Well, since the file is slurped, the m flag might help find line
boundaries, wouldn't it ... ?

[snip]
$my_text =~ s/(.*)(;.*)/$1 . (index($1, 'WORD') == -1 ? '' : $2)/ge;

Wow... so great, thanks a lot...
Much easier [and definitely cleaner] than what I tried...

Thanks !

azra
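
For readers who want to try the substitution quoted above, here is a minimal self-contained sketch. The sample text is hypothetical, and the non-greedy `(.*?)` is my own variation, not part of the quoted post: with the greedy `(.*)`, `$1` runs up to the last `;` on the line, so a WORD that appears only in the comment would also count as a hit.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "<code>;<comments>" entry per line.
my $my_text = join "\n",
    'keep WORD here; comment stays',
    'nothing special; comment goes',
    'also plain; mentions WORD, ; goes too',
    '';

# /e evaluates the replacement as Perl code, /g repeats it for every match.
# "." never matches a newline (no /s), so each match stays on one line.
# (.*?) stops at the first ';' so only the code part is tested for WORD.
$my_text =~ s/(.*?)(;.*)/$1 . (index($1, 'WORD') == -1 ? '' : $2)/ge;

print $my_text;
```

Note that this variant removes the `;` together with the comment, because `$2` includes it.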
 
Jürgen Exner

azrazer said:
A file is slurped into a scalar variable (let's say $my_text) [NOT AN
ARRAY].

And there is your underlying basic problem.

This $my_text now contains many lines of this form: <code>;<comments>.

The question is: using a regexp (with mg flags), how to do the following
for all lines at once?
1/ if <code> contains a fixed word [let's say WORD] then do not remove
comments
2/ if <code> does not contain WORD then remove comments

Unless you are interested in an academic exercise or an intellectual mind
twister, it is _MUCH_ better to choose a data structure that fits the
problem description.

You have an abstract concept of "lines" and you want to do something
with each line, or not do it, depending on whether that line contains
something.
Then for heaven's sake choose a data structure that represents such a
line!!! Convert your mega-string $my_text into an array of such
lines, e.g. using split(). This way your whole problem will collapse
into a simple

s/.../.../ unless m/.../;

Problem trivially solved.

jue
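
A minimal sketch of the split-based approach described above, under a few assumptions of mine: the sample data is hypothetical, "contains WORD" is read as "WORD appears before the first `;`", and the `;` is removed together with the comment.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped file.
my $my_text = "keep WORD here; comment stays\nnothing special; comment goes\n";

# Limit -1 keeps trailing empty fields, so the final newline survives the join.
my @lines = split /\n/, $my_text, -1;

for my $line (@lines) {
    # Strip the comment unless WORD appears before the first ';'.
    $line =~ s/;.*// unless $line =~ /^[^;]*WORD/;
}

$my_text = join "\n", @lines;
print $my_text;
```

Reading the file line by line in the first place would avoid the split and join entirely.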
 
C.DeRykus

The obvious answer (besides the one Tad suggested, or simply splitting
twice on newlines and then on ';') would be

    s/(?<! WORD .*) ; .*//gx

but that doesn't work because perl doesn't do variable-length
look-behind.
...
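
To see the limitation mentioned above without aborting a whole script, the pattern can be compiled at run time inside an eval. This is only a sketch: the exact error text differs between perl versions (newer perls accept some bounded variable-length look-behind, but as far as I know an unbounded `.*` is still rejected).

```perl
use strict;
use warnings;

# The pattern is held in a string so it is compiled at run time,
# where eval can trap the compile error.
my $pattern  = '(?<! WORD .*) ; .*';
my $compiled = eval { qr/$pattern/x };
my $error    = $@;

print defined $compiled
    ? "pattern compiled\n"
    : "pattern rejected: $error";
```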

Hm, late night.. but this does appear to work:

s/ ( (?<!WORD) ) ;. * /$1/gx;

(only tried in 5.10)
 
sln

It's moderately difficult, depending on what the overall conditions are.
A simple lookahead is all this needs. And there are many ways to do this
without extended regexes.

-sln
-------------------------

use strict;
use warnings;

my $string = "
1 this WORD here; this is ok
2 word2 is not here; delete comment
3 word3 is not here either; should not see this WORD, ; delete comment
";

#$string =~ s/^ ( (?:(?! WORD ).)* ;) .* $ /$1/xmg;

$string =~
s/
^ # start of new line and substitution part

( # Capture group 1
(?: # group
(?! WORD ) # lookahead, not 'WORD' ? Continue else Fail line
. # capture this character
) * # end group, do this zero or more times
; # capture ';'
) # end Capture group 1

.* # get all from ';' to the end of line

$ # end of new line, substitute with $1

/$1/xmg;

print $string,"\n";

__END__
 
C.DeRykus


Hm, late night..  but this does appear to work:

s/ ( (?<!WORD) ) ;. * /$1/gx;

(only tried in 5.10)


This is confusing late-night nonsense since the lookaround
assertion isn't captured and $1 isn't defined. But evidently
there's a regex misfeature/bug and so it appears to work.
At least that's my guess after looking at this output:

perl -M"re debug" -wle "$_=q{xxxx;comment};s/((?<!WORD));.*/$1/gx";
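
A small experiment may help here. As far as I can tell, `(?<!WORD)` placed right before `;` only inspects the four characters immediately preceding the semicolon, so the pattern leaves a line alone only when WORD sits directly in front of the `;`. The two test lines below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical test lines; only the position of WORD differs.
my @lines = (
    '1 this WORD here; this is ok',   # WORD is not directly before ';'
    '2 this WORD; this is ok',        # WORD is directly before ';'
);

my @results;
for my $line (@lines) {
    # Work on a copy so the original line can be printed next to the result.
    (my $copy = $line) =~ s/ ( (?<!WORD) ) ;. * /$1/gx;
    push @results, $copy;
    print "$line\n  => $copy\n";
}
```

In this sketch the first line loses its comment even though its code part contains WORD, which suggests the pattern does not solve the original problem in general.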
 
azrazer

Ha! Great, that was what I was struggling with ... look-aheads.
I actually forgot to group my pattern like this (?:(?!word).)* and did
(?!word).* which did not work...
Thanks a lot for this answer, I guess I learned a lot today :)

Best,
azra.
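
The difference between the two forms mentioned above can be sketched as follows (the input line is hypothetical). In the grouped form the lookahead is re-checked before every character, so the match cannot step past WORD; in the ungrouped form it is checked once, at the start of the line only.

```perl
use strict;
use warnings;

my $line = 'foo WORD bar; comment';   # hypothetical input with WORD in the code part

# Lookahead re-checked before every character: cannot step past WORD.
my $grouped = $line =~ /^(?:(?!WORD).)*;/ ? 'match' : 'no match';

# Lookahead checked once, at the start of the line only.
my $single = $line =~ /^(?!WORD).*;/ ? 'match' : 'no match';

print "grouped: $grouped\nsingle:  $single\n";
```

Here the grouped form reports no match and the ungrouped form reports a match, so only the grouped form protects lines whose code part contains WORD.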
 
azrazer

azrazer said:
On Tue, 06 Oct 2009 17:09:50 -0500, Tad J McClellan wrote:

[snip]
The question is : Using a regexp (with mg flags)
Errrr, there is no need for the m//m flag, since there are no ^ or $
anchors in the pattern...
Well, since the file is slurped, the m flag might help find line
boundaries, wouldn't it ... ?


No.

m//m ONLY affects the meaning of the ^ and $ anchors.

It is useless and does nothing when those anchors are not used.

Argh, sorry, I think I still don't get it...
m//m affects the meaning of ^ and $ ... and allows them to match at
every line in the scalar variable, doesn't it?
I mean, this way it is possible to treat every single line present
within this variable using patterns like m/^...$/mg, then apply
changes to every line if the regexp is correctly built.

Am I wrong somewhere, or did you just mean that your great pattern
works without the m flag? :)

Thanks again for the explanations,

azra
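
A short sketch of the point under discussion, using a hypothetical two-line string: /m changes where ^ and $ may match, and has no effect on a pattern that uses neither anchor.

```perl
use strict;
use warnings;

my $text = "a;1\nb;2\n";   # hypothetical two-line slurped string

# With anchors, /m matters: ^ matches only at the start of the string
# without it, and after every embedded newline with it.
my @without_m = $text =~ /^(\w)/g;    # finds only 'a'
my @with_m    = $text =~ /^(\w)/mg;   # finds 'a' and 'b'

# Without anchors, /m changes nothing: "." already stops at a newline.
(my $plain = $text) =~ s/;.*//g;
(my $multi = $text) =~ s/;.*//mg;

print "without /m: @without_m\n";
print "with /m:    @with_m\n";
print "same result without anchors\n" if $plain eq $multi;
```

So both statements in the exchange hold: an anchored per-line pattern needs /m, while an unanchored one like `s/;.*//g` already works line by line because `.` does not cross a newline.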
 
sln

I actually forgot to group my pattern like this (?:(?!word).)* and did
(?!word).* which did not work...

There is also the '\K' construct. A sentence from the perlre docs:
".. it is especially useful in situations where you want to efficiently
remove something following something else in a string."

It should be more efficient to use it in combination with a lookahead,
since nothing has to be captured and copied back.
Compare these:

$string =~ s/^ ( (?:(?! WORD ).)* ;) .* $ /$1/xmg;
$string =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;

-sln
---------

use strict;
use warnings;

my $string = "
1 this WORD here; this is ok
2 word2 is not here; delete comment
3 word3 is not here either; should not see this WORD, ; delete comment
";

$string =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;
print $string,"\n";

__END__
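
As a quick check that the two forms compared above give the same result, here is a minimal sketch on a shortened, hypothetical version of the sample string. It only compares output, not speed, and `\K` requires perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;   # \K needs perl 5.10 or later

my $string = "1 this WORD here; this is ok\n"
           . "2 word2 is not here; delete comment\n";

# Capture the part to keep and put it back.
(my $captured = $string) =~ s/^ ( (?:(?! WORD ).)* ;) .* $ /$1/xmg;

# \K drops everything matched so far from the replaced span,
# so only the comment is removed and nothing is copied back.
(my $kept = $string) =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;

print $kept;
print "both forms agree\n" if $captured eq $kept;
```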
 
