Replacing html tags

jumblesale · Oct 4, 2006

Hello all,
I'm not all that bad at regex, but I'm stumped on how to approach my
problem.

I need to parse a string and remove all HTML tags except hyperlinks.

I can remove all the HTML tags using:
Regex.Replace(inputText, @"<(/?[^\>]+)>", "");
But this also removes any hyperlinks, which I need to keep.

I've also written a regex for finding hyperlinks:
<a[\s]href=["'][^"]+[.\s]*["'][^<]+[.\s]*</a>
but my problem is trying to put all this together.

I've thought of using Regex.Matches and checking each instance but
can't get that to work.

Any ideas and/or code would be great - I'm used to C# but VB's cool as
well.

Cheers in advance,
max
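
A note on the "check each instance" idea: it can be done without Regex.Matches, because Regex.Replace accepts a MatchEvaluator that decides, per match, what the replacement should be. The sketch below is only one possible approach. The class and method names (TagStripper, StripExceptLinks) and the anchor-tag pattern are made up for illustration and are not from the thread. It assumes reasonably well-formed markup and will trip on things like a ">" inside an attribute value or on HTML comments.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Any tag at all: the same idea as the pattern in the post
    // (the backslash before ">" inside the character class is not needed).
    private static readonly Regex AnyTag = new Regex(@"</?[^>]+>");

    // A whole tag that is either <a ...> or </a>.
    // Requiring whitespace or ">" right after the "a" keeps <abbr>, <address>, etc. from matching.
    private static readonly Regex AnchorTag =
        new Regex(@"^</?a(\s[^>]*)?>$", RegexOptions.IgnoreCase);

    // Hypothetical helper: removes every tag except opening and closing anchor tags.
    public static string StripExceptLinks(string input)
    {
        // The evaluator runs once per matched tag:
        // anchor tags are returned unchanged, everything else becomes an empty string.
        return AnyTag.Replace(input, m => AnchorTag.IsMatch(m.Value) ? m.Value : "");
    }
}
```

Calling TagStripper.StripExceptLinks(inputText) leaves the text content and the link tags in place and drops the other tags.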

Chris Fulstow · Oct 4, 2006

You could do this with the HTML Agility Pack:
http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack

I think it comes with an example that strips HTML tags, which you could
probably adapt quite quickly to keep <a> tags.

jumblesale · Oct 4, 2006

Wow, that's a great pack, but surely there's a simpler way of doing it
with regex? It seems like a huge number of files to import just to check
a string.

Cheers for your quick response,
max
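
For the regex-only route, one option is a negative lookahead, so a single Regex.Replace call skips anchor tags and removes everything else. This is a rough sketch rather than a robust HTML parser. The names LinkKeeper and Strip are invented for the example, and it assumes no ">" characters inside attribute values and no script or comment blocks that need special handling.

```csharp
using System.Text.RegularExpressions;

public static class LinkKeeper
{
    // <               start of a tag
    // (?!/?a[\s>])    give up if the tag name is "a" (optionally preceded by "/"),
    //                 i.e. "a" followed by whitespace or ">"
    // [^>]+>          otherwise consume the rest of the tag
    private const string NonAnchorTag = @"<(?!/?a[\s>])[^>]+>";

    // Hypothetical helper: strips all tags except <a ...> and </a>.
    public static string Strip(string inputText)
    {
        return Regex.Replace(inputText, NonAnchorTag, "", RegexOptions.IgnoreCase);
    }
}
```

The [\s>] check after the "a" is what keeps tags such as <abbr> or <address> from being mistaken for links. A parser such as the HTML Agility Pack is still the safer choice for messy real-world markup.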

Mark Fitzpatrick · Oct 4, 2006

Woohoo! This is a great control library. Glad you posted it here as it saved
me from writing a lot of code using the WebBrowser control to do some
similar HTML manipulation.

--
Thanks again,
Mark Fitzpatrick
Former Microsoft FrontPage MVP 199?-2006
