[regexp] Changing lines NOT containing a pattern

Discussion in 'Perl Misc' started by azrazer, Oct 6, 2009.

  1. azrazer

    azrazer Guest

    Hello,
    I recently found an interesting issue on fr.comp.lang.perl and thought it
    would be good to share [since no answers have been found so far]. So here
    it goes.

    A file is slurped into a scalar variable (let's say $my_text) [NOT AN
    ARRAY].
    $my_text now contains many lines of this form: <code>;<comments>.

    The question is: using a regexp (with the m and g flags), how can the
    following be done for all lines at once?
    1/ if <code> contains a fixed word [let's say WORD], do not remove the
    comments
    2/ if <code> does not contain WORD, remove the comments

    I have tried look-ahead and look-behind regexps, but I guess that is not
    the right way of doing it. I also wanted to try extended regexps
    like (?(COND)true|false), but I ended up drawing a blank...

    Any help appreciated !
    Thanks a lot !

    azra
     
    azrazer, Oct 6, 2009
    #1

  2. azrazer

    azrazer Guest

    On Tue, 06 Oct 2009 17:09:50 -0500, Tad J McClellan wrote:

    [snip]
    >> The question is : Using a regexp (with mg flags)

    > Errrr, there is no need for the m//m flag, since there are no ^ or $
    > anchors in the pattern...

    Well, since the file is slurped, the m flag might help to find line
    boundaries, mightn't it?
    [snip]
    > $my_text =~ s/(.*)(;.*)/$1 . (index($1, 'WORD') == -1 ? '' :
    > $2)/ge;

    Wow... so great, thanks a lot...
    Much easier [and definitely cleaner] than what I tried...
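
    For readers who want to try the quoted substitution, here is a minimal sketch. The sample text and the $result variable are made up for illustration; only the s///ge expression comes from the quoted post.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "<code>;<comments>" entry per line.
my $my_text = join "\n",
    'keep WORD here; this comment stays',
    'nothing special; this comment goes',
    '';

# For each line, $1 is the text before the last ';' and $2 is the rest.
# The e flag evaluates the replacement as Perl code.
(my $result = $my_text) =~
    s/(.*)(;.*)/$1 . (index($1, 'WORD') == -1 ? '' : $2)/ge;

print $result;
```

    Two things are worth noting, as far as I can tell: the ';' itself is removed along with the comment, and because (.*) is greedy, $1 runs up to the last ';' on the line, so a WORD that appears in a comment before a later ';' would also count.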

    Thanks !

    azra
     
    azrazer, Oct 6, 2009
    #2

  3. Jürgen Exner

    azrazer <> wrote:
    >A file is slurped into a scalar variable (let say $my_text) [NOT AN
    >ARRAY].


    And there is your underlying basic problem.

    >This $my_text now contains many lines of this form : <code>;<comments>.
    >
    >The question is : Using a regexp (with mg flags) How to do the following
    >for all lines at once ?
    >1/ if <code> contains a fixed word [let say WORD] then do not remove
    >comments
    >2/ if <code> does nots contain WORD then remove comments


    Unless you are interested in an academic exercise or an intellectual mind
    twister, it is _MUCH_ better to choose a data structure that fits the
    problem description.

    You have an abstract concept of "lines", and you want to do something
    with each line, or not, depending on whether that line contains
    something.
    Then for heaven's sake choose a data structure that represents such a
    line!!! Convert your mega-string $my_text into an array of such
    lines, e.g. using split(). This way your whole problem collapses
    into a simple

    s/.../.../ unless m/.../;

    Problem trivially solved.
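
    A minimal sketch of that split-based approach might look like this. The sample text, the variable names, and the exact patterns filled in for the "..." placeholders are assumptions for illustration, not the only reasonable choice.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped file.
my $my_text = "a WORD b; keep me\nplain code; drop me\n";

# A limit of -1 keeps trailing empty fields, so the final newline
# survives the split and join round trip.
my @lines = split /\n/, $my_text, -1;

for (@lines) {
    # Strip from the first ';' onward unless the code part contains WORD.
    s/;.*// unless /^[^;]*WORD/;
}

my $result = join "\n", @lines;
print $result;
```

    Here the test /^[^;]*WORD/ only looks at the text before the first ';', which is one reading of "<code> contains WORD".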

    jue
     
    Jürgen Exner, Oct 7, 2009
    #3
  4. azrazer

    C.DeRykus Guest

    Re: Changing lines NOT containing a pattern

    On Oct 6, 3:28 pm, Ben Morrow <> wrote:
    > Quoth azrazer <>:
    >
    >
    >
    > > [...]
    >
    > The obvious answer (besides the one Tad suggested, or simply splitting
    > twice on newlines and then on ';') would be
    >
    >     s/(?<! WORD .*) ; .*//gx
    >
    > but that doesn't work because perl doesn't do variable-length
    > look-behind.
    > ...


    Hm, late night.. but this does appear to work:

    s/ ( (?<!WORD) ) ;. * /$1/gx;

    (only tried in 5.10)

    --
    Charles DeRykus
     
    C.DeRykus, Oct 7, 2009
    #4
  5. azrazer

    Guest

    On 06 Oct 2009 21:51:57 GMT, azrazer <> wrote:

    >[...]


    It's moderately difficult, depending on what the overall conditions are.
    Simple lookahead is all this needs. And there are many ways to do this
    without extended regexes.

    -sln
    -------------------------

    use strict;
    use warnings;

    my $string = "
    1 this WORD here; this is ok
    2 word2 is not here; delete comment
    3 word3 is not here either; should not see this WORD, ; delete comment
    ";

    #$string =~ s/^ ( (?:(?! WORD ).)* ;) .* $ /$1/xmg;

    $string =~
    s/
      ^                # start of a line; needs the m flag
      (                # capture group 1: the code part, up to and including ';'
        (?:            #   non-capturing group
          (?! WORD )   #     lookahead: 'WORD' must not start here, else the line fails
          .            #     consume one character
        )*             #   repeat zero or more times
        ;              #   the ';' that ends the code part
      )                # end of capture group 1
      .*               # the comment: everything from ';' to the end of the line
      $                # end of the line
    /$1/xmg;           # replace the whole match with group 1

    print $string,"\n";

    __END__
     
    , Oct 7, 2009
    #5
  6. azrazer

    C.DeRykus Guest

    Re: Changing lines NOT containing a pattern

    On Oct 7, 3:55 am, "C.DeRykus" <> wrote:
    > On Oct 6, 3:28 pm, Ben Morrow <> wrote:
    >
    >
    >
    > > Quoth azrazer <>:

    >
    > > > [...]
    >
    > > The obvious answer (besides the one Tad suggested, or simply splitting
    > > twice on newlines and then on ';') would be

    >
    > >     s/(?<! WORD .*) ; .*//gx

    >
    > > but that doesn't work because perl doesn't do variable-length
    > > look-behind.
    > > ...

    >
    > Hm, late night..  but this does appear to work:
    >
    > s/ ( (?<!WORD) ) ;. * /$1/gx;
    >
    > (only tried in 5.10)



    This is confusing late-night nonsense since the lookaround
    assertion isn't captured and $1 isn't defined. But evidently
    there's a regex misfeature/bug and so it appears to work.
    At least that's my guess after looking at this output:

    perl -M"re debug" -wle "$_=q{xxxx;comment};s/((?<!WORD));.*/$1/gx";
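
    To see what that pattern actually does, here is a small sketch with made-up sample strings. As far as I can tell, group 1 does take part in the match and captures the empty string, and the look-behind only inspects the four characters immediately before the ';'.

```perl
use strict;
use warnings;

# Hypothetical inputs: no WORD, WORD in the middle, WORD right before ';'.
my @samples = (
    'xxxx;comment',
    'has WORD inside;comment',
    'ends with WORD;comment',
);

my @results;
for my $line (@samples) {
    (my $copy = $line) =~ s/((?<!WORD));.*/$1/g;
    push @results, $copy;
}
print "$_\n" for @results;

# Group 1 wraps only a zero-width assertion, so it captures ''.
my $group_defined =
    'xxxx;comment' =~ /((?<!WORD));.*/ && defined $1 && $1 eq '';
print "group 1 is defined but empty\n" if $group_defined;
```

    With these inputs only the third line keeps its comment, so the pattern does not test whether <code> contains WORD in general.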

    --
    Charles DeRykus
     
    C.DeRykus, Oct 8, 2009
    #6
  7. azrazer

    azrazer Guest

    On Wed, 07 Oct 2009 15:39:45 -0700, sln wrote:

    > On 06 Oct 2009 21:51:57 GMT, azrazer <> wrote:

    [...]
    >
    > It's moderately difficult, depending on what the overall conditions are.
    > Simple lookahead is all this needs. And there are many ways to do this
    > without extended regexes.
    >
    > -sln
    > [code snipped]


    Ha! Great, that is what I was struggling with... look-aheads.
    I forgot to group my pattern like this: (?:(?!word).)* and instead wrote
    (?!word).* which did not work...
    Thanks a lot for this answer, I guess I learned a lot today :)
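
    The difference between the two forms can be sketched as follows. The sample line and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'abc WORD def; comment';

# Lookahead tested once: it only checks that WORD does not start at
# the very first position, then .* is free to run across WORD.
my $once = $line =~ /^(?!WORD).*;/ ? 1 : 0;

# Lookahead tested before every character: the repetition cannot step
# onto the position where WORD starts, so ';' is never reached.
my $each = $line =~ /^(?:(?!WORD).)*;/ ? 1 : 0;

print "checked once: $once, checked at each character: $each\n";
```

    On this input the first pattern matches and the second does not, which is why the grouped form is the one that can reject lines whose code part contains WORD.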

    Best,
    azra.
     
    azrazer, Oct 8, 2009
    #7
  8. azrazer

    azrazer Guest

    On Tue, 06 Oct 2009 19:44:20 -0500, Tad J McClellan wrote:

    > azrazer <> wrote:
    >> On Tue, 06 Oct 2009 17:09:50 -0500, Tad J McClellan wrote:
    >>
    >> [snip]
    >>>> The question is : Using a regexp (with mg flags)
    >>> Errrr, there is no need for the m//m flag, since there are no ^ or $
    >>> anchors in the pattern...

    >> Well, since the file is slurped, the m flag might help to find line
    >> boundaries, mightn't it?

    >
    >
    > No.
    >
    > m//m ONLY affects the meaning of the ^ and $ anchors.
    >
    > It is useless and does nothing when those anchors are not used.


    Argh, sorry, I think I still don't get it...
    m//m affects the meaning of ^ and $ ... and allows them to match at
    every line in the scalar variable, doesn't it?
    I mean, this way it is possible to treat every single line present
    within this variable using patterns like m/^...$/mg, then apply
    changes to every line if the regexp is correctly built.

    Am I wrong somewhere, or did you say this because your great pattern
    works without the m flag? :)
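
    A small sketch may help separate the two points. The sample text and variable names are made up; the idea is that the m flag changes where ^ can match, but makes no difference to a pattern that has no anchors.

```perl
use strict;
use warnings;

my $text = "one;a\ntwo;b\n";

# Without the m flag, ^ matches only at the very start of the string.
my @without_m = $text =~ /^(\w+)/g;

# With the m flag, ^ also matches just after each embedded newline.
my @with_m = $text =~ /^(\w+)/mg;

# A pattern with no anchors behaves the same either way, because '.'
# already stops at a newline when the s flag is not used.
(my $plain = $text) =~ s/;.*//g;
(my $multi = $text) =~ s/;.*//mg;

print scalar(@without_m), " match without m, ", scalar(@with_m), " with m\n";
print "unanchored result is identical\n" if $plain eq $multi;
```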

    Thanks again for the explanations,

    azra
     
    azrazer, Oct 8, 2009
    #8
  9. azrazer

    Guest

    On 08 Oct 2009 20:12:04 GMT, azrazer <> wrote:

    >On Wed, 07 Oct 2009 15:39:45 -0700, sln wrote:
    >
    >> On 06 Oct 2009 21:51:57 GMT, azrazer <> wrote:

    >[...]
    >>
    >> It's moderately difficult, depending on what the overall conditions are.
    >> Simple lookahead is all this needs. And there are many ways to do this
    >> without extended regexes.
    >>
    >> -sln
    >> -------------------------
    >>

    <delete old regex>

    >I actually forgot to group my pattern like this (?:(?!word).)* and did
    >(?!word).* which did not work...


    There is a '\K' construct; a sentence from the perlre docs:
    ".. it is especially useful in situations where you want to efficiently
    remove something following something else in a string."

    It would be more efficient to use it in combination with a lookahead.
    Compare these:
    Compare these:

    $string =~ s/^ ( (?:(?! WORD ).)* ;) .* $ /$1/xmg;
    $string =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;
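
    One way to check that the two substitutions agree is to run both on copies of the same text. The variable names and the shortened sample below are assumptions for illustration; \K needs Perl 5.10 or later. Whether the \K form is really faster would have to be measured; this sketch only compares the results.

```perl
use strict;
use warnings;

# Shortened, hypothetical sample in the same "<code>;<comments>" shape.
my $string = "1 this WORD here; this is ok\n"
           . "2 word2 is not here; delete comment\n";

# Version 1: capture the code part and put it back.
(my $with_capture = $string) =~ s/^ ( (?:(?! WORD ).)* ;) .* $ /$1/xmg;

# Version 2: \K keeps everything matched so far, so only the comment
# is replaced (requires Perl 5.10 or later).
(my $with_keep = $string) =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;

print $with_capture eq $with_keep ? "same result\n" : "different results\n";
print $with_keep;
```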

    -sln
    ---------

    use strict;
    use warnings;

    my $string = "
    1 this WORD here; this is ok
    2 word2 is not here; delete comment
    3 word3 is not here either; should not see this WORD, ; delete comment
    ";

    $string =~ s/ ^ (?:(?! WORD ).)* ; \K .* $ //xmg;
    print $string,"\n";

    __END__
     
    , Oct 9, 2009
    #9
