Remove HTML tags (except anchor tag) from a string using regular expressions


Nico Grubert

Hello,

I want to remove all HTML tags from a string "content" except <a ...>xxx</a>.

My script reads like this:

###
import re
# content holds the HTML text; every <...> tag whose first character is not "!" is removed
content = re.sub(r'<([^!>]([^>]|\n)*)>', '', content)
###

It works fine: it removes all HTML tags from "content".
Unfortunately, it also strips the <a ...> and </a> tags around xxx.
Any idea how to modify it so that it removes all HTML tags except <a ...>xxx</a>?

Thanks in advance,
Nico
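One common way to get this behaviour, sketched here rather than taken from the thread, is to keep the "match <...>" idea but put a negative lookahead right after the `<` so that anchor tags are skipped. This is a minimal sketch assuming that no attribute value contains a literal `>` and that `<!...>` constructs (comments, DOCTYPE) should be left alone, as the original pattern also did; the names `ANCHOR_SAFE_TAG` and `strip_tags_except_anchors` are hypothetical. Note that `[^>]*` already matches newlines, so the `|\n` alternative is not needed.

```python
import re

# Hypothetical helper, not from the thread.
# (?!!|/?a\b) : refuse to match if the tag starts with "!" (comments, DOCTYPE,
#               which the original pattern also skipped) or is <a ...> / </a>.
#               \b keeps tags such as <abbr> or <area> from counting as anchors.
# [^>]*       : the rest of the tag; this negated class also matches "\n".
# Assumption: no ">" appears inside attribute values.
ANCHOR_SAFE_TAG = re.compile(r'<(?!!|/?a\b)[^>]*>', re.IGNORECASE)


def strip_tags_except_anchors(content):
    """Remove every tag except <a ...> and </a>; text between tags is kept."""
    return ANCHOR_SAFE_TAG.sub('', content)


print(strip_tags_except_anchors('<p>See <b>this</b> <a href="x">link</a><br/></p>'))
# See this <a href="x">link</a>
```

For anything beyond simple, well-formed markup, a real HTML parser such as the standard library's `html.parser` is more robust than a regular expression.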
 

Anand

How about...

###
import re
# intended to remove every tag except <a ...> and </a>
content = re.sub(r'<([^!(a>)]([^(/a>)]|\n)*)>', '', content)
###

Seems to work for me.

HTH

-Anand
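Why the first pattern behaves oddly: inside square brackets, `(`, `)`, `/` and `>` are ordinary characters, so `[^!(a>)]` means "one character that is not `!`, `(`, `a`, `>` or `)`", not "anything except the text `a>`". As a result, any tag containing the letter `a` (for example `<span>`) survives along with the anchors. A small check, using sample input made up for illustration:

```python
import re

# The first pattern from the thread, written as a raw string (same regex).
# [^!(a>)] and [^(/a>)] are character classes: each matches ONE character
# outside the listed set; the parentheses inside them are literal.
FIRST_ATTEMPT = r'<([^!(a>)]([^(/a>)]|\n)*)>'

sample = '<b>x</b> <span>y</span> <a href="u">z</a>'
result = re.sub(FIRST_ATTEMPT, '', sample)
print(result)
# x <span>y</span> <a href="u">z</a>
```

`<b>` and `</b>` are removed, but `<span>` and `</span>` are kept only because they contain an "a", not because they are anchors.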
 

Anand

I meant:

###
content = re.sub(r'<[^!(a>)]([^>]|\n)*[^!(/a)]>', '', content)
###

Sorry for the mistake.
However, this also seems to leave tags like <b>, <p>, etc. in the output.

-Anand
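The reason single-letter tags survive the second pattern: between `<` and `>` it requires one character from `[^!(a>)]` and a separate final character from `[^!(/a)]`, i.e. at least two characters. `<b>` and `<p>` have only one, so they never match, whereas `</b>` and `</p>` do. A quick check on made-up input:

```python
import re

# The second pattern from the thread, as a raw string.
# <  [^!(a>)]  ([^>]|\n)*  [^!(/a)]  >
#    1st char  middle      last char    -> needs at least 2 chars inside <...>
SECOND_ATTEMPT = r'<[^!(a>)]([^>]|\n)*[^!(/a)]>'

sample = '<b>bold</b> <br> <p>x</p> <a href="u">link</a>'
result = re.sub(SECOND_ATTEMPT, '', sample)
print(result)
# <b>bold  <p>x <a href="u">link</a>
```

The opening `<b>` and `<p>` are kept while their closing tags and `<br>` are removed; the anchors survive as intended.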
 
