RegEx replace "bracketed text"

Discussion in 'Perl Misc' started by hrrglburf@hotmail.com, Apr 16, 2006.

  1. Guest

    I have some text from which I need to strip out all tags that start
    with a '{' and end with a '}', including whatever is between the
    braces, but nothing outside them. I tried writing my own regex, but
    I'm not good at it. Can anyone give me an example?
     
    , Apr 16, 2006
    #1

  2. news reader Guest

    Something like

    s/\{.*?\}//g;


    An example of your input would help avoid misunderstandings.

    The situation gets a little more complicated if '\}' may appear inside a tag.

    The current example would read in:
    dasdfsafdsadsa{dsadas}d dasdasd{dasdas} fsddfsf{dsadas}


    and spit out
    dasdfsafdsadsad dasdasd fsddfsf
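A minimal, self-contained script that applies the substitution to the sample line above (the variable names `$line` and `$stripped` are only for illustration). The non-greedy `.*?` makes each match stop at the first '}' after its '{', so the text between tags survives:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'dasdfsafdsadsa{dsadas}d dasdasd{dasdas} fsddfsf{dsadas}';

# Copy $line, then delete every {...} span from the copy.
# .*? is lazy, so each match ends at the first '}' after its '{'.
(my $stripped = $line) =~ s/\{.*?\}//g;

print "$stripped\n";    # prints "dasdfsafdsadsad dasdasd fsddfsf"
```

With a greedy `.*` instead, the first match would run from the first '{' to the last '}' on the line and remove the text between the tags as well.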




    bye


    N.
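For the case mentioned above where '\}' may appear inside a tag, one possible approach is sketched below. It assumes (the thread does not say) that a backslash inside a tag escapes the next character, and, like the simpler patterns, that tags do not nest. The tag body is matched as any mix of "backslash plus one character" and characters that are neither a backslash nor '}':

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: inside a tag, "\" escapes the next character,
# so "\}" does not close the tag.
my $text = 'keep{drop \} still drop}keep';

# Naive version: stops at the first '}', even an escaped one.
(my $naive = $text) =~ s/\{.*?\}//g;

# Escape-aware version: the two alternatives cannot match the same
# character, so the match proceeds without ambiguity.
# /s lets "." match an escaped newline too.
(my $escaped = $text) =~ s/\{(?:\\.|[^\\}])*\}//gs;

print "$naive\n";      # keep still drop}keep
print "$escaped\n";    # keepkeep
```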



    ($ wrote:
    > I have information that needs to strip out all tags that start with a
    > '{' and end with a '}' including whatever may be in between them, but
    > not outside of them... I tried making my own reg. exp. but i suck at
    > it. can anyone give me an example?
    >
     
    news reader, Apr 16, 2006
    #2
  4. Dale Henderson wrote:
    >>>>>>"MD" == Marc Dashevsky <> writes:

    > MD> Assuming that tags don't nest and that matching '{' and '}'
    > MD> are not separated by line-ends, the following works:
    >
    > MD> s/\{.*?\}//g
    >
    > A more efficient solution is:
    > s/\{[^}]*\}//g
    >
    > with the \s modifier this will work across line-ends.


    The /s modifier isn't needed for that: /s only changes what '.'
    matches, and the character class [^}] already matches newlines.
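A small sketch of the difference, using a made-up tag that spans two lines. Without /s the non-greedy `.` version cannot cross the newline and leaves the tag in place, while the `[^}]` version removes it either way:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A tag whose body contains a newline.
my $text = "before{tag spanning\ntwo lines}after";

# Without /s, "." does not match "\n", so this tag is left alone.
(my $dot = $text) =~ s/\{.*?\}//g;

# A negated class like [^}] matches "\n" with or without /s.
(my $class = $text) =~ s/\{[^}]*\}//g;

# With /s, "." matches "\n" as well.
(my $dot_s = $text) =~ s/\{.*?\}//gs;

for my $pair ([dot => $dot], [class => $class], [dot_s => $dot_s]) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;    # show the newline as \n
    print "$pair->[0]: $shown\n";
}
```

This prints `dot: before{tag spanning\ntwo lines}after`, then `class: beforeafter` and `dot_s: beforeafter`.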

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Apr 16, 2006
    #4
  5. Marc Dashevsky wrote:
    > Dale Henderson <> writes in article %:
    >> >>>>> "MD" == Marc Dashevsky <> writes:

    >> MD> Assuming that tags don't nest and that matching '{' and '}'
    >> MD> are not separated by line-ends, the following works:
    >>
    >> MD> s/\{.*?\}//g
    >>
    >> A more efficient solution is:
    >> s/\{[^}]*\}//g

    >
    > Thanks. Would you explain the reasons for the increased efficiency?
    > I don't know how to even start the analysis.


    Theoretically, both should run at about the same speed, since each
    needs only a linear scan for a single character, without backtracking.
    A simple benchmark shows that the first (non-greedy) expression is
    actually slightly faster, by about 15%, on my system:


    #!/usr/bin/perl
    use strict;
    use warnings;

    use Benchmark ':all';

    my $s = "aaaaaa{bbbbbbbbbbbb}cccccccccc{ddddddddd}eeeeeee";

    # Run each substitution 100000 times on a fresh copy of $s
    # and print a table of relative speeds.
    cmpthese(100000, {
        nongreedy => sub {
            local $_ = $s;
            s/\{.*?\}//g;      # lazy quantifier
        },
        class => sub {
            local $_ = $s;
            s/\{[^}]*\}//g;    # negated character class
        },
    });
    __END__
                  Rate     class nongreedy
    class      90909/s        --      -13%
    nongreedy 104167/s       15%        --
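A timing comparison is only meaningful if both substitutions do the same work. The sketch below checks that, for the benchmark string, the two patterns produce the same result. It says nothing about speed, and the timings above come from one system and one perl version:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "aaaaaa{bbbbbbbbbbbb}cccccccccc{ddddddddd}eeeeeee";

# Apply each substitution to its own copy of $s.
(my $nongreedy = $s) =~ s/\{.*?\}//g;
(my $class     = $s) =~ s/\{[^}]*\}//g;

print "nongreedy: $nongreedy\n";
print "class:     $class\n";
print $nongreedy eq $class ? "same result\n" : "different result\n";
```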


    --
    Peter J. Holzer | Sysadmin WSR/LUGA | http://www.hjp.at/
    Deletion of at.usenet.schmankerl? Discussion currently in at.usenet.gruppen
     
    Peter J. Holzer, Apr 16, 2006
    #5
  6. Dr.Ruud Guest

    Dale Henderson wrote:
    > Marc Dashevsky:
    >> <unattributed>


    >> s/\{.*?\}//g
    >>> A more efficient solution is: s/\{[^}]*\}//g

    >
    >> Thanks. Would you explain the reasons for the increased
    >> efficiency? I don't know how to even start the analysis.

    >
    > It's spelled out in the Owl book :)


    That could be old news.


    > As I understand it, the non-greedy operator gives up too easily.


    That could have been optimized already. The patterns /[^x]*x/ and /.*?x/
    have a lot in common.
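A quick way to see what the two patterns have in common: on input without newlines, both match from the leftmost possible position up to the first "x", so they capture the same text. This sketch only compares results on a few hand-picked strings; it does not show how the regex engine executes either pattern internally:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-picked sample strings; none contains a newline.
my @samples = ('abcxdefx', 'xxx', 'no terminator', 'a b c x');

my $differences = 0;
for my $str (@samples) {
    # Capture what each pattern matches (undef if it does not match).
    my ($class) = $str =~ /([^x]*x)/;
    my ($lazy)  = $str =~ /(.*?x)/;

    my $same = (!defined $class && !defined $lazy)
            || (defined $class && defined $lazy && $class eq $lazy);
    $differences++ unless $same;
    printf "%-16s %s\n", "'$str'", $same ? 'same match' : 'different match';
}
print "differences: $differences\n";
```

The patterns differ once a newline appears before the "x": `[^x]*` crosses it, while `.*?` does not unless /s is used.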

    --
    Anyway, Ruud

    "Gewoon is een tijger." ("Ordinary is a tiger.")
     
    Dr.Ruud, Apr 17, 2006
    #6

Similar Threads
  1. Brian Blais | Replies: 1 | Views: 400 | Bruno Desthuilliers, Jun 27, 2006
  2. Greg Ewing | Replies: 2 | Views: 367 | Dieter Maurer, Jun 29, 2006
  3. Alun | Replies: 3 | Views: 4,615 | Masudur, Feb 18, 2008
  4. Replies: 3 | Views: 809 | Reedick, Andrew, Jul 1, 2008
  5. Prasad S | Replies: 2 | Views: 251 | Dr John Stockton, Aug 27, 2004