Regular expression problem with tags

Discussion in 'Perl Misc' started by steeve_dun@SoftHome.net, Sep 26, 2006.

  1. Guest

    Hi,
    I want to do some pattern replacement, i.e. to delete everything
    that's between two tags.
    For example, for

    1<tag> 2</tag>3
    x<tag> a<tag> b </tag> c</tag>z

    I want to get

    1 3
    x z

    But I have a problem with embedded tags.
    I've tried:
    $text =~ s/\<tag\>(.*?)\<\/tag\>//sg;
    but it doesn't work for embedded tags. It gives:
    13
    x c</tag>z

    Is there a way to deal with this?

    Thank you

    -steeve
     
    , Sep 26, 2006
    #1

  2. David Squire Guest

    wrote:
    > Is there a way to deal with this?


    Yep. Don't try to use regular expressions to parse XML. Use a module
    that understands XML. Go to CPAN and you will find many.


    DS
     
    David Squire, Sep 26, 2006
    #2

  3. -berlin.de Guest

    <> wrote in comp.lang.perl.misc:
    > Is there a way to deal with this?


    Not using regular expressions directly. Use one of the HTML-parsing
    modules from CPAN.

    Anno
     
    -berlin.de, Sep 26, 2006
    #3
  4. Xicheng Jia Guest

    wrote:
    > Is there a way to deal with this?


    Since you are using Perl, and XML is quite well formatted, you may try
    something like:

    my $ptn;
    $ptn = qr(<tag>(?:(??{$ptn})|.)*?</tag>)s;
    $line =~ s/$ptn//g;

    I am not encouraging you to use regexes for this at work. But for
    small programs, regexes can be much faster and easier if you know
    what you are doing.
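
    For comparison, the same recursive idea can be sketched with the
    numbered-group recursion (?1) available in Perl 5.10 and later, instead
    of the deferred (??{ }) construct. The helper name strip_tagged is made
    up for illustration, and the sketch assumes the tag is literally <tag>
    with no attributes and that the input is properly balanced.

```perl
use strict;
use warnings;

# Group 1 matches one <tag>...</tag> region; (?1) re-enters group 1
# whenever a nested <tag> starts, so inner pairs are consumed whole.
my $ptn = qr{(<tag>(?:(?1)|.)*?</tag>)}s;

# Hypothetical helper (not from the thread): delete every balanced region.
sub strip_tagged {
    my ($text) = @_;
    $text =~ s/$ptn//g;
    return $text;
}

print strip_tagged("1<tag> 2</tag>3"), "\n";                  # 13
print strip_tagged("x<tag> a<tag> b </tag> c</tag>z"), "\n";  # xz
```

    Note that the first input yields "13" rather than "1 3", because the
    only space sits inside the tags.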

    Regards,
    Xicheng
     
    Xicheng Jia, Sep 26, 2006
    #4
  5. Ted Zlatanov Guest

    On 26 Sep 2006, wrote:

    > Is there a way to deal with this?


    For the first example, you're getting exactly what you asked for
    ("13"): the only space in your input is inside the tags, so it is
    deleted along with them. Look at your input data.

    For the second example, your requirements are unclear. You don't say
    whether you want to remove everything between the outermost tags (in
    which case a regex would work) or you want to balance tags. For
    outermost tag replacement, use

    $text =~ s/\<tag\>(.*)\<\/tag\>//sg;

    but note that this will also replace "<tag>a</tag> extra <tag>b</tag>"
    with "" and not " extra " as you may expect.
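
    The difference between the non-greedy and greedy patterns can be seen
    with a small script. This is only a sketch using the sample strings
    from this thread; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $nested   = "x<tag> a<tag> b </tag> c</tag>z";
my $siblings = "<tag>a</tag> extra <tag>b</tag>";

# Non-greedy .*? stops at the FIRST </tag>, leaving a stray close tag
# behind when tags are nested.
(my $lazy_nested = $nested) =~ s/<tag>.*?<\/tag>//sg;

# Greedy .* runs to the LAST </tag>: fine for one nested region,
# but it also swallows the text between two sibling regions.
(my $greedy_nested   = $nested)   =~ s/<tag>.*<\/tag>//sg;
(my $greedy_siblings = $siblings) =~ s/<tag>.*<\/tag>//sg;

print "non-greedy, nested: '$lazy_nested'\n";      # 'x c</tag>z'
print "greedy, nested:     '$greedy_nested'\n";    # 'xz'
print "greedy, siblings:   '$greedy_siblings'\n";  # ''
```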

    My guess is that you do want to balance tags, and you can use
    Text::Balanced for that (especially if your text is not valid XML or
    even SGML). If you are doing SGML/HTML/XML/etc. tagged formats then
    you should search CPAN for the appropriate parser, as others have
    suggested. Look at "perldoc -q html" as well.
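
    To illustrate what "balancing tags" means, here is a hand-written
    depth counter. It is not Text::Balanced and not a real parser, just a
    minimal sketch that assumes the tag is literally <tag> with no
    attributes; the name remove_balanced is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every <tag>...</tag> region, nested or not,
# by tracking how many <tag> openers are currently unclosed.
sub remove_balanced {
    my ($text) = @_;
    my ($out, $depth) = ('', 0);

    # The capturing group makes split return the tags as separate pieces.
    for my $piece (split /(<\/?tag>)/, $text) {
        if ($piece eq '<tag>') {
            $depth++;
        }
        elsif ($piece eq '</tag>') {
            # A close tag with no matching opener is kept as plain text.
            if ($depth > 0) { $depth-- } else { $out .= $piece }
        }
        elsif ($depth == 0) {
            $out .= $piece;    # text outside any tagged region
        }
    }
    # Design choice: text after an unclosed <tag> is silently dropped.
    return $out;
}

print remove_balanced("x<tag> a<tag> b </tag> c</tag>z"), "\n";   # xz
print remove_balanced("<tag>a</tag> extra <tag>b</tag>"), "\n";   # " extra "
```

    Unlike the greedy regex, this keeps " extra " between two sibling
    regions while still removing nested regions whole.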

    Ted
     
    Ted Zlatanov, Sep 26, 2006
    #5
  6. Guest

    Thank you all
    -steve
     
    , Sep 27, 2006
    #6