pattern replacement in xml

Discussion in 'Perl Misc' started by tom, Jun 21, 2005.

  1. tom

    tom Guest

    Just picked up perl to do some emergency task. Hope some expert can
    help here.

    I'm using perl to cleanse an xml file so it can be parsed. One problem
    is to replace strings like this
    <font color=669966>:
    with:
    &lt;font color=669966&rt;

    The code is:
    $templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;

    The problem is anytime the color value changes, I need to do another
    replacement. Can there be a pattern to find this kind of strings. eg
    <font ....> and replace them with &lt;font ....&gt;

    Thanks for the help.
    tom, Jun 21, 2005
    #1

  2. "tom" <> wrote in news:1119392886.167881.82760
    @g47g2000cwa.googlegroups.com:

    > Just picked up perl to do some emergency task. Hope some expert can
    > help here.
    >
    > I'm using perl to cleanse an xml file so it can be parsed. One problem
    > is to replace strings like this
    > <font color=669966>:
    > with:
    > &lt;font color=669966&rt;
    >
    > The code is:
    > $templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;
    >
    > The problem is anytime the color value changes, I need to do another
    > replacement. Can there be a pattern to find this kind of strings. eg
    > <font ....> and replace them with &lt;font ....&gt;


    You probably should be using

    <URL:http://search.cpan.org/~gaas/HTML-Parser-3.45/lib/HTML/Entities.pm>

    along with an appropriate XML parser from CPAN.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use HTML::Entities;

    print encode_entities(q{<font color=669966>})."\n";

    __END__



    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Jun 21, 2005
    #2

  3. tom

    Bob Walton Guest

    tom wrote:

    > Just picked up perl to do some emergency task. Hope some expert can
    > help here.
    >
    > I'm using perl to cleanse an xml file so it can be parsed. One problem
    > is to replace strings like this
    > <font color=669966>:
    > with:
    > &lt;font color=669966&rt;
    >
    > The code is:
    > $templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;
    >
    > The problem is anytime the color value changes, I need to do another
    > replacement. Can there be a pattern to find this kind of strings. eg
    > <font ....> and replace them with &lt;font ....&gt;


    Sure. Try:

    $templine=~s/<(font.*?)>/&lt;$1&gt;/gi;
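
    A minimal sketch of the same idea as a reusable sub, assuming each font tag sits on a single line. The name escape_font_tags is hypothetical, and the pattern is a slightly stricter variant of the one above: [^>]* stops at the first '>' and \b keeps a tag such as <fontsize> from matching.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): escape the angle brackets of
# every opening <font ...> tag so an XML parser treats it as plain text.
sub escape_font_tags {
    my ($line) = @_;
    # Capture everything from "font" up to the first '>' and re-emit it
    # between &lt; and &gt;. /g handles several tags per line, /i ignores case.
    $line =~ s/<(font\b[^>]*)>/&lt;$1&gt;/gi;
    return $line;
}

print escape_font_tags('<b><font color=669966>text</b>'), "\n";
print escape_font_tags('<font color=FF0000 size=2>x'), "\n";
```

    The closing </font> tag is left untouched by this pattern; changing "font" to "\/?font" inside the capture group would presumably cover it as well.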

    ....
    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Jun 21, 2005
    #3
  4. tom

    tom Guest

    Thanks a lot. This works exactly as I wanted:
    tom, Jun 21, 2005
    #4
  5. tom

    John Bokma Guest

    "tom" <> wrote:

    > Just picked up perl to do some emergency task. Hope some expert can
    > help here.
    >
    > I'm using perl to cleanse an xml file so it can be parsed. One problem
    > is to replace strings like this
    > <font color=669966>:
    > with:
    > &lt;font color=669966&rt;

                           ^^^
    should be &gt; Also, the > doesn't have to be escaped in XML afaik.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Jun 22, 2005
    #5
  6. John Bokma <> wrote in
    news:Xns967CB7085D416castleamber@130.133.1.4:

    > "tom" <> wrote:
    >
    >> Just picked up perl to do some emergency task. Hope some expert can
    >> help here.
    >>
    >> I'm using perl to cleanse an xml file so it can be parsed. One
    >> problem is to replace strings like this
    >> <font color=669966>:
    >> with:
    >> &lt;font color=669966&rt;

    > ^^^
    > should be &gt; Also, the > doesn't have to be escaped in XML afaik.


    This is somewhat off-topic but I think what the OP had in mind was
    something like:

    <custom-tag>
    <font color="white">Bad HTML</font>
    </custom-tag>

    where he does not want the text between <custom-tag>...</custom-tag> to
    be interpreted as XML.

    AFAIK, and that's not saying much, in that case, one needs to use:

    <custom-tag>
    <![CDATA[<font color="white">Bad HTML</font>]]>
    </custom-tag>

    rather than encoding the < and > inside <custom-tag>...</custom-tag>.
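
    For what it is worth, the CDATA approach could be sketched in Perl roughly like this. The sub name cdata_wrap is hypothetical. The one thing a CDATA section cannot contain is the sequence "]]>", so the sketch splits that sequence across two adjacent sections.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap arbitrary text in a
# CDATA section so the XML parser does not interpret any markup inside it.
sub cdata_wrap {
    my ($text) = @_;
    # A literal "]]>" would end the section early, so close the section
    # after "]]" and reopen a new one before ">".
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print "<custom-tag>\n";
print cdata_wrap('<font color="white">Bad HTML</font>'), "\n";
print "</custom-tag>\n";
```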

    I am drifting off-topic, so I will shut up now.

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Jun 22, 2005
    #6
  7. tom

    John Bokma Guest

    "A. Sinan Unur" <> wrote:

    > John Bokma <> wrote in
    > news:Xns967CB7085D416castleamber@130.133.1.4:


    [...]

    >>> &lt;font color=669966&rt;

    >> ^^^
    >> should be &gt; Also, the > doesn't have to be escaped in XML afaik.

    >
    > This is somewhat off-topic but I think what the OP had in mind was
    > something like:
    >
    > <custom-tag>
    > <font color="white">Bad HTML</font>
    > </custom-tag>
    >
    > where he does not want the text between <custom-tag>...</custom-tag>
    > to be interpreted as XML.
    >
    > AFAIK, and that's not saying much, in that case, one needs to use:
    >
    > <custom-tag>
    > <![CDATA[<font color="white">Bad HTML</font>]]>
    > </custom-tag>
    >
    > rather than encoding the < and > inside <custom-tag>...</custom-tag>.


    Both work, yours is probably more neat, but also a lot of overhead. :)
    I personally would drop the font element entirely. Or if I have to, make
    it valid XML (color="#669966" would be sufficient + DTD update), and
    "ignore" it in the processing stage.
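
    A rough sketch of the "make it valid XML" option, assuming the only offending form is a font tag with a single unquoted six-digit hex color. The sub name quote_font_color is hypothetical, and the result is only well-formed if every opened font element is also closed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn <font color=669966>
# into <font color="#669966"> so the attribute value is quoted.
sub quote_font_color {
    my ($line) = @_;
    # Only a bare six-digit hex value is matched; anything else is left as-is.
    $line =~ s/<font\s+color=([0-9A-Fa-f]{6})\s*>/<font color="#$1">/gi;
    return $line;
}

print quote_font_color('<font color=669966>text</font>'), "\n";
```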

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Jun 22, 2005
    #7
