JJ said:
I am not very confident with regular expressions. Can anyone suggest a
good guide or some expressions that would remove all tags except
<p>, <br>, <ul>, <li>, <b>, <em>, <i>, <strong> and remove all remaining
attributes from the existing tags?
I'll bite...
#!/usr/bin/perl
$_ = <<TEST;
<p class="flibble">
This is a test. The paragraph should remain,
but the class should go, as should this
<u>underline</u>. <little>This should go too,
and not be reduced to an LI element.</little>
</p>
TEST
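# Tidy closing tags of allowed elements, e.g. "</p >" becomes "</p>".
s/<\/(p|br|ul|li|b|em|i|strong)\b[^>]*>/<\/$1>/ig;
# Strip all attributes from opening tags of allowed elements.
s/<(p|br|ul|li|b|em|i|strong)\b[^>]*>/<$1>/ig;
# Any other closing tag becomes </span>. The \b matters here: without it,
# "</little>" looks like it starts with "li" and slips through untouched.
s/<\/(?!(?:p|br|ul|li|b|em|i|strong)\b)[^>]*>/<\/span>/ig;
# Any other opening tag becomes <span>.
s/<(?!\/|(?:p|br|ul|li|b|em|i|strong)\b)[^>]*>/<span>/ig;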
print;
exit;
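
The script above swaps unwanted tags for <span> rather than removing them. If you would rather delete them outright, as the question asks, here is a rough single-pass sketch. The %allowed hash and the strip_tags name are my own inventions for illustration, and it assumes no attribute value contains a literal ">". It is not a substitute for a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical whitelist (same tag names as in the question).
my %allowed = map { $_ => 1 } qw(p br ul li b em i strong);

# Hypothetical helper: keep whitelisted tags without their attributes,
# delete every other tag, and leave the text between tags alone.
sub strip_tags {
    my ($html) = @_;
    # $1 is "/" for a closing tag (empty otherwise), $2 is the tag name.
    # Looking the whole name up in the hash avoids the "li" vs "little" trap.
    $html =~ s{<(/?)([a-z][a-z0-9]*)\b[^>]*>}
              { $allowed{lc $2} ? "<$1" . lc($2) . ">" : "" }gie;
    return $html;
}

print strip_tags(qq{<p class="x">Keep <u>this</u> <little>text</little>.</p>\n});
```

That should print <p>Keep this text.</p>. Comments and doctype declarations do not match the pattern, so they pass through unchanged.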