Replacing HTML tags

jumblesale

Hello all,
I'm not all that bad at Regex, but I'm stumped on how to approach my
problem.

I need to parse a string and remove all HTML tags except hyperlinks.

I can remove all the HTML tags using:
    Regex.Replace(inputText, @"<(/?[^\>]+)>", "");
But this also removes any hyperlinks, which I need to keep.
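One possible regex-only answer, sketched here as an assumption rather than anything posted in the thread: add a negative lookahead so the "any tag" pattern skips tags named `a`. The class and method names (`TagStripper`, `StripAllButAnchors`) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Matches any tag except <a ...> and </a>. The negative lookahead (?!/?a\b)
    // skips a tag whose name starts with "a" or "/a" followed by a word boundary,
    // so <abbr> and <area> are still treated as ordinary tags and removed.
    private static readonly Regex NonAnchorTag =
        new Regex(@"<(?!/?a\b)[^>]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: strips every tag except hyperlink tags.
    public static string StripAllButAnchors(string input) =>
        NonAnchorTag.Replace(input, string.Empty);

    public static void Main()
    {
        string html = "<p>See <b>this</b> <a href=\"http://example.com\">link</a>.</p>";
        // Prints: See this <a href="http://example.com">link</a>.
        Console.WriteLine(StripAllButAnchors(html));
    }
}
```

Like the original pattern, this assumes no `>` appears inside attribute values or comments; for messier markup a real parser is the safer choice.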

I've also written a regex for finding hyperlinks:
<a[\s]href=["'][^"]+[.\s]*["'][^<]+[.\s]*</a>
but my problem is trying to put all this together.
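For comparison, a hedged sketch (not the poster's pattern) of a tighter hyperlink regex used with `Regex.Matches`: the backreference `\1` makes the closing quote match the opening one, and the lazy `.*?` stops at the first `</a>`. `LinkFinder`, `FindLinks` and the example URLs are hypothetical placeholders.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Group 1 captures the opening quote; \1 requires the same quote to close href.
    // Named groups "url" and "text" hold the link target and the link text.
    private static readonly Regex Anchor = new Regex(
        @"<a\s[^>]*?href\s*=\s*([""'])(?<url>.*?)\1[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static MatchCollection FindLinks(string html) => Anchor.Matches(html);

    public static void Main()
    {
        string html = "Go <a href='http://example.com/a'>here</a> or <A HREF=\"/b\">there</A>.";
        // Prints:
        // http://example.com/a -> here
        // /b -> there
        foreach (Match m in FindLinks(html))
        {
            Console.WriteLine(m.Groups["url"].Value + " -> " + m.Groups["text"].Value);
        }
    }
}
```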

I've thought of using Regex.Matches and checking each instance, but I
can't get that to work.
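The "check each instance" idea can be done in one pass with the `Regex.Replace` overload that takes a `MatchEvaluator`: reuse the original "any tag" pattern, look at each match's tag name, and return the tag unchanged if it is `a`, or an empty string otherwise. This is a minimal sketch under the same no-`>`-in-attributes assumption; `AnchorKeeper` and `KeepOnlyAnchors` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorKeeper
{
    // The same "any tag" pattern from the question; group 1 is the text between < and >.
    private static readonly Regex AnyTag = new Regex(@"<(/?[^\>]+)>");

    private static readonly char[] NameTerminators = { ' ', '\t', '\r', '\n', '/' };

    public static string KeepOnlyAnchors(string input) =>
        AnyTag.Replace(input, m =>
        {
            // Drop a leading "/" (closing tag), then cut at the first whitespace or "/"
            // to isolate the tag name: "a href=..." -> "a", "/a" -> "a", "br/" -> "br".
            string inner = m.Groups[1].Value.TrimStart('/');
            int end = inner.IndexOfAny(NameTerminators);
            string name = end < 0 ? inner : inner.Substring(0, end);

            // Keep hyperlink tags exactly as written; replace every other tag with nothing.
            return name.Equals("a", StringComparison.OrdinalIgnoreCase) ? m.Value : string.Empty;
        });

    public static void Main()
    {
        string html = "<p>Hi <br/><a href=\"x.htm\">y</a><abbr>z</abbr></p>";
        // Prints: Hi <a href="x.htm">y</a>z
        Console.WriteLine(KeepOnlyAnchors(html));
    }
}
```

The evaluator version is easier to extend than a lookahead if more tags need to survive later (e.g. checking the name against a small allow-list).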

Any ideas and/or code would be great. I'm used to C#, but VB's cool as
well.

Cheers in advance,
max
 
jumblesale

Wow, that's a great pack, but surely there's a simpler way of doing it
with regex? It seems like a huge number of files to import just to check
a string.

Cheers for your quick response,
max

Chris said:
You could do this with the HTML Agility Pack:
http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack

I think it comes with an example that strips HTML tags, which you could
probably adapt quite quickly to keep <a> tags.
 
Mark Fitzpatrick

Woohoo! This is a great control library. Glad you posted it here, as it saved
me from writing a lot of code using the WebBrowser control to do some
similar HTML manipulation.


--
Thanks again,
Mark Fitzpatrick
Former Microsoft FrontPage MVP 199?-2006


Chris Fulstow said:
You could do this with the HTML Agility Pack:
http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack

I think it comes with an example that strips HTML tags, which you could
probably adapt quite quickly to keep <a> tags.
 
