pattern replacement in xml

tom

Just picked up perl to do some emergency task. Hope some expert can
help here.

I'm using perl to cleanse an xml file so it can be parsed. One problem
is to replace strings like this
<font color=669966>:
with:
&lt;font color=669966&rt;

The code is:
$templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;

The problem is that any time the color value changes, I need to add
another replacement. Is there a pattern that finds this kind of string,
e.g. <font ....>, and replaces it with &lt;font ....&gt;?

Thanks for the help.
 
A. Sinan Unur

You probably should be using

<URL:http://search.cpan.org/~gaas/HTML-Parser-3.45/lib/HTML/Entities.pm>

along with an appropriate XML parser from CPAN.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Entities;

print encode_entities(q{<font color=669966>})."\n";

__END__
 
Bob Walton

Sure. Try:

$templine=~s/<(font.*?)>/&lt;$1&gt;/gi;

....
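
A slightly stricter variant of the substitution above, as a minimal sketch. It assumes the closing </font> tags should be escaped as well, and that names which merely start with "font" (for example <fontsize>) should be left alone. The sub name escape_font_tags is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape opening and closing font tags only; other markup is left untouched.
# \b keeps the pattern from matching longer names such as <fontsize>,
# and [^>]* stops at the first closing angle bracket.
sub escape_font_tags {
    my ($line) = @_;
    $line =~ s{<(/?font\b[^>]*)>}{&lt;$1&gt;}gi;
    return $line;
}

print escape_font_tags('<b><font color=669966>text</font></b>'), "\n";
```

Like the one-liner, this works on one line at a time and would miss a tag that is split across lines.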
 
John Bokma

tom said:
One problem is to replace strings like this
<font color=669966>:
with:
&lt;font color=669966&rt;

The &rt; should be &gt;. Also, the > doesn't have to be escaped in XML afaik.
 
A. Sinan Unur

John Bokma said:
The &rt; should be &gt;. Also, the > doesn't have to be escaped in XML afaik.

This is somewhat off-topic but I think what the OP had in mind was
something like:

<custom-tag>
<font color="white">Bad HTML</font>
</custom-tag>

where he does not want the text between <custom-tag>...</custom-tag> to
be interpreted as XML.

AFAIK, and that's not saying much, in that case, one needs to use:

<custom-tag>
<![CDATA[<font color="white">Bad HTML</font>]]>
</custom-tag>

rather than encoding the < and > inside <custom-tag>...</custom-tag>.

I am drifting off-topic, so I will shut up now.

Sinan
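
For what it is worth, the CDATA approach can be sketched in Perl roughly as follows. The sub name cdata_wrap is hypothetical, and the sketch assumes the text to protect is already in a scalar. The only sequence that cannot appear inside a CDATA section is "]]>", so it is split across two adjacent sections.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap raw text in a CDATA section so an XML parser treats it as character
# data. Any "]]>" inside the text is split over two adjacent CDATA sections,
# because that sequence would otherwise end the section early.
sub cdata_wrap {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print "<custom-tag>\n";
print cdata_wrap('<font color="white">Bad HTML</font>'), "\n";
print "</custom-tag>\n";
```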
 
John Bokma

A. Sinan Unur said:
[...]
AFAIK, and that's not saying much, in that case, one needs to use:

<custom-tag>
<![CDATA[<font color="white">Bad HTML</font>]]>
</custom-tag>

rather than encoding the < and > inside <custom-tag>...</custom-tag>.

Both work; yours is probably neater, but it is also a lot of overhead. :)
Personally, I would drop the font element entirely. Or, if I had to keep
it, make it valid XML (color="#669966" would be sufficient, plus a DTD
update) and "ignore" it in the processing stage.