removing paragraphs from text files

Discussion in 'Perl Misc' started by alfonsobaldaserra, Jul 13, 2009.

  1. hello,

    i have a specific paragraph in a bunch of configuration files that i
    want to remove. the lines are as follows

    define service{
    use linux-service
    host_name ninjasrv
    service_description PING
    check_command check_ping!100.0,20%!500.0,60%
    action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
    }

    the 'use' and 'host_name' directives are different in each file. the
    unique string is 'PING'.

    i was just wondering if it is possible to do such a thing in Perl?

    thanks.
    alfonsobaldaserra, Jul 13, 2009
    #1

  2. >     perl -p0777 -i -e 's/define service\{[^}]*PING[^}]*\}\s+//g' *.cf

    that was so amazing, all done in a single shot. could you please also
    help on what exactly is -p0777 and how did this substitution work:
    's/define service\{[^}]*PING[^}]*\}\s+//g'. i have never seen/read such
    regex.

    thanks again.
    alfonsobaldaserra, Jul 13, 2009
    #2

  3. > help on what exactly is -p0777 and how did this substitution work
    > 's/define service\{[^}]*PING[^}]*\}\s+//g'.  i have never seen/read such
    > regex.


    i just found
    -0777
    the separator between records is 777 in octal; this is not a real
    ASCII char so the whole file is slurped in as a single record;
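
    To see what slurping changes in practice, here is a minimal sketch. It assumes that -0777 has the same effect as leaving $/ (the input record separator) undefined, which is how slurp mode is usually written inside a script. The sample text and the count_records helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for one configuration file.
my $text = "define service{\nservice_description PING\n}\n\n"
         . "define service{\nservice_description SSH\n}\n";

# Read $text through an in-memory filehandle using the given record
# separator and report how many records came back.
sub count_records {
    my ($separator) = @_;
    local $/ = $separator;    # input record separator, for this call only
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

my $by_line = count_records("\n");    # default: one record per line
my $slurped = count_records(undef);   # slurp mode: the whole file is one record

print "line mode:  $by_line records\n";
print "slurp mode: $slurped record(s)\n";
```

    With -p alone the substitution would run once per line and could never see a whole block; in slurp mode it runs once against the entire file contents.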

    now my confusion is the regex match.
    it goes like, search for
    define service followed by a { then any characters but not } then PING
    then any characters but not } then a } then at least one whitespace
    character, and replace with nothing. i am just wondering what exactly
    is this [^}]* doing. i tried it with .* like

    s/define service\{.*PING.*\}\s+//g
    but it would not replace.

    my understanding is that it should work because [^}]* (any character
    but not }) is the same as .* in this case since I know there is no }
    before PING string.

    what am i missing?
    alfonsobaldaserra, Jul 13, 2009
    #3
  4. On 2009-07-13 08:52, alfonsobaldaserra <> wrote:
    >> how did this substitution work
    >> 's/define service\{[^}]*PING[^}]*\}\s+//g'.  i have never seen/read such regex.

    [...]
    > now my confusion is the regex match.
    > it goes like, search for
    > define service followed by a { then any characters but not } then PING
    > then any characters but not } then at least one space and replace with
    > nothing. i am just wondering what exactly is this [^}]* doing. i
    > tried it with .* like
    >
    > define service\{.*PING.*\}\s+//g
    > but it would not replace.
    >
    > my understanding is that it should work because [^}]* (any character
    > but not }) is same as .* in this case since I know there is no }
    > before PING string.


    /./ is not "any character" but "any character except newline" unless you
    use the /s modifier. So your substitution would only work if the whole
    section was on a single line.

    s/define service\{.*PING.*\}\s+//sg

    OTOH would match anything from the first "define service{" to the last
    "}" in the file (provided there's a PING somewhere between them) so it
    would probably remove a lot more than you want. The /[^}]*/ in Tad's
    regex is there to keep the match within a single brace-delimited block
    (and it's a bit simple-minded: It won't work if you have a } inside a
    comment, for example, but you probably don't, so that doesn't matter).
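
    The three behaviours described above can be compared side by side. This is only a sketch on made-up input (three service blocks, with the PING one in the middle); it is not taken from the original poster's files.

```perl
use strict;
use warnings;

# Hypothetical file contents: an SSH block, the PING block, then an HTTP block.
my $config = <<'END';
define service{
use linux-service
service_description SSH
}

define service{
use linux-service
service_description PING
}

define service{
use linux-service
service_description HTTP
}
END

# 1. Without /s, '.' cannot cross a newline, so nothing matches.
(my $dot_only = $config) =~ s/define service\{.*PING.*\}\s+//g;

# 2. With /s, the greedy .* runs from the first "define service{"
#    to the last "}" in the file.
(my $dot_s = $config) =~ s/define service\{.*PING.*\}\s+//sg;

# 3. [^}]* cannot step over a closing brace, so the match stays
#    inside a single block.
(my $negated = $config) =~ s/define service\{[^}]*PING[^}]*\}\s+//g;

printf "plain .*  : %s\n", $dot_only eq $config ? "unchanged" : "changed";
printf ".* with /s: %d characters left\n", length $dot_s;
printf "[^}]*     : %d blocks left\n", scalar(() = $negated =~ /define service/g);
```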

    hp
    Peter J. Holzer, Jul 13, 2009
    #4
  5. On Mon, 13 Jul 2009 01:52:14 -0700 (PDT), alfonsobaldaserra <> wrote:

    >> help on what exactly is -p0777 and how did this substitution work
    >> 's/define service\{[^}]*PING[^}]*\}\s+//g'.  i have never seen/read such
    >> regex.

    >
    >i just found
    >-0777
    > the separator between records is 777 in octal; this is not a real
    >ASCII char so the whole file is slurped in as a single record;
    >
    >now my confusion is the regex match.
    >it goes like, search for
    >define service followed by a { then any characters but not } then PING
    >then any characters but not } then at least one space and replace with
    >nothing. i am just wondering what exactly is this [^}]* doing. i
    >tried it with .* like
    >
    >define service\{.*PING.*\}\s+//g
    >but it would not replace.
    >
    >my understanding is that it should work because [^}]* (any character
    >but not }) is same as .* in this case since I know there is no }
    >before PING string.
    >
    >what am i missing?


    If you have never read such a regex, you don't know regex. This is very simple.
    You should visit this group/site more often.

    Assuming a slurped-in file and your test, s/define service\{.*PING.*\}\s+//g:
    as Holzer said, '.' matches any character except the '\n' newline because you don't
    have the /s modifier, so '.*' cannot get past the end of the first line and the
    pattern won't match anything. Try 's/define service\{.*PING.*\}\s+//sg', but note
    that '.*' is greedy and will then grab everything up to the last 'PING' and the
    last '}' in the file.

    Also, using greedy quantifiers with '.' is a tricky prospect. They have their place
    though. Most beginners just throw '.*' in the middle of their regex, when in reality,
    they should only be put in when the regex can already be described without them,
    if at all.

    The reason is that there is no guarantee of the shape of text when it is written to
    a file, none! For this reason, regexes should be molded with at least a certain level
    of built-in error checking (qualification). And while not 100%, 90-95% will do as a
    minimal QA check.

    Thus, Tad used the '[^}]*' character class to describe all characters, but one.
    Specifically NOT '}' which would signify the end of a block. Which leads to the next
    problem:

    How do you know the syntax that the actual parser uses to extract information
    from that file? Even if the form the writer produces is simple, even custom, there
    may be anomalies introduced from the file system, and if the writer changes form,
    then what? Surely you would want a little robustness of QA built into the regex.

    Tad gave you what you wanted from your simple problem statement. Indeed it was stated
    in simple terms, that would not be acceptable in a production environment.

    A lot of times (most of them) here on this group/site, that is the case.
    It just amazes me sometimes that people come back with, 'but it doesn't work if I
    have this condition', that was never stated.

    Tad's regex could have been written (untested) like this:

    s/define\s+service\s*\{[^}]*service_description\s+PING[^}]*\}\s*//g

    and still work, while allowing some of the variability that normal parsers accept.
    But you didn't state information on where it came from or how it is parsed.
    Whether 'use' or 'service_description' or any other var type is there,
    in what order, whether required, etc...
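
    As a rough illustration of why the extra qualification matters, the sketch below runs both patterns over made-up input in which an unrelated block merely mentions PING in its host name, and the real PING block uses slightly different spacing. The host name PINGSRV and the spacing are assumptions invented for the example.

```perl
use strict;
use warnings;

# Hypothetical input: the first block only mentions PING in its host name;
# the second is the real PING check, written with extra whitespace.
my $config = <<'END';
define service{
use linux-service
host_name PINGSRV
service_description SSH
}

define  service {
use linux-service
host_name ninjasrv
service_description   PING
}
END

# Original pattern: keyed on the bare string PING and on exact spacing.
(my $loose = $config) =~ s/define service\{[^}]*PING[^}]*\}\s+//g;

# Qualified pattern: PING must be the value of service_description,
# and the whitespace around "define service {" may vary.
(my $strict = $config)
    =~ s/define\s+service\s*\{[^}]*service_description\s+PING[^}]*\}\s*//g;

print "--- loose pattern leaves ---\n$loose";
print "--- strict pattern leaves ---\n$strict";
```

    On this input the loose pattern deletes the SSH block and leaves the PING block in place, while the qualified one removes only the PING block. It is still not a parser: a description such as PINGER would also match.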

    No, you stated PING, the only constant, is in this form:
    'define service{PING}'

    Not a lot to go on, but don't expect this to be a real parser unless you understand
    the RULES.

    Good luck.

    -sln
    , Jul 13, 2009
    #5
  6. On 2009-07-13, Peter J. Holzer <> wrote:
    *SKIP*
    *skipping alfonsobaldaserra since he skipped Tad anyway*

    > s/define service\{.*PING.*\}\s+//sg
    >
    > OTOH would match anything from the first "define service{" to the last
    > "}" in the file (provided there's a PING somewhere between them) so it
    > would probably remove a lot more than you want. The /[^}]*/ in Tad's
    > regex is there to keep the match within a single brace-delimited block
    > (and it's a bit simple-minded: It won't work if you have a } inside a
    > comment, for example, but you probably don't, so that doesn't matter).


    Then stricter

    qr/\}\n+/

    and stricter

    qr/\}(?:\h*\n)+/ # needs 5.10

    and stricter

    qr/\}\h*\n(?:\h*\n)*/

    Which leads us to

    perldoc -q nesting

    and applying regexes to HTML.
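
    To make the difference between the trailing \s+ and the stricter (?:\h*\n)+ concrete, here is a small sketch on made-up input where the removed block is followed by an indented line. The sample text is an assumption; \h needs Perl 5.10 or later, as noted above.

```perl
use 5.010;    # \h (horizontal whitespace) was added in Perl 5.10
use strict;
use warnings;

# Hypothetical input: the PING block has trailing spaces after its brace
# and is followed by a blank line and an indented comment.
my $config = "define service{\n"
           . "service_description PING\n"
           . "}  \n"
           . "\n"
           . "    # indented comment\n";

# \s+ swallows every kind of whitespace, including the indentation
# at the start of the next non-empty line.
(my $greedy_ws = $config) =~ s/define service\{[^}]*PING[^}]*\}\s+//g;

# (?:\h*\n)+ only swallows whole lines that are empty or blank,
# so the indentation of the following line survives.
(my $line_ws = $config) =~ s/define service\{[^}]*PING[^}]*\}(?:\h*\n)+//g;

print "with \\s+:          [$greedy_ws]\n";
print "with (?:\\h*\\n)+:   [$line_ws]\n";
```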


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Jul 14, 2009
    #6
  7. > Not a lot to go on, but don't expect this to be a real parser unless you understand
    > the RULES.


    that was an excellent explanation. thank you very much guys, i have
    understood it now.
    alfonsobaldaserra, Jul 15, 2009
    #7
  8. On Tue, 14 Jul 2009 16:04:44 +0300, Eric Pozharski <> wrote:

    >On 2009-07-13, Peter J. Holzer <> wrote:
    >*SKIP*
    >*skipping alfonsobaldaserra since he skipped Tad anyway*
    >
    >> s/define service\{.*PING.*\}\s+//sg
    >>
    >> OTOH would match anything from the first "define service{" to the last
    >> "}" in the file (provided there's a PING somewhere between them) so it
    >> would probably remove a lot more than you want. The /[^}]*/ in Tad's
    >> regex is there to keep the match within a single brace-delimited block
    >> (and it's a bit simple-minded: It won't work if you have a } inside a
    >> comment, for example, but you probably don't, so that doesn't matter).

    >
    >Then stricter
    >
    > qr/\}\n+/
    >
    >and stricter
    >
    > qr/\}(?:\h*\n)+/ # needs 5.10

    ^^^^
    5.10 is great, a lot of new stuff in the engine.
    New nesting, etc.

    When you can write a regex without the need for the
    # needs 5.10
    maybe it might be useful.

    Btw, I don't think anybody skipped Tad, who never skips
    anybody.

    -sln
    , Jul 15, 2009
    #8