Retrieving hyperlinks from a web page in code

Enigma Boy

Hi folks,

I am retrieving a web page from a site using HttpWebRequest. What I want to
do with the retrieved page is list all the hyperlinks in it. If I
do a simple regex search for <a then I also get links that are commented out
in the markup, and I don't want that. I only want links that are actually
active. This is for a reciprocal link check.

Can someone please point me in the right direction?

Thanks.

 
Guest

Hi, I think you can try to clean the text before you get the links.
For example:

html_code = Regex.Replace(html_code, "<!--((.|\n)*?)-->", "");

This will replace every HTML comment, including any commented-out links,
with an empty string. You can then extract the links from what is left.
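
A minimal sketch of how the two steps might fit together, for illustration only. The class name LinkExtractor and the method name GetActiveLinks are made up for this example, and the anchor pattern is a simplification that assumes quoted href values; it is not a full HTML parser. The comment-stripping call uses RegexOptions.Singleline so that '.' also matches newlines inside multi-line comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for this example only.
static class LinkExtractor
{
    // Strips HTML comments, then collects the href value of each remaining <a> tag.
    public static List<string> GetActiveLinks(string html)
    {
        // Singleline lets '.' match newlines, so multi-line comments are removed too.
        string cleaned = Regex.Replace(html, "<!--.*?-->", "", RegexOptions.Singleline);

        // Simplified pattern: only matches href values wrapped in single or double quotes.
        var anchor = new Regex("<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
                               RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var links = new List<string>();
        foreach (Match m in anchor.Matches(cleaned))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}

static class Program
{
    static void Main()
    {
        // One live link and one commented-out link (example.com is a placeholder).
        string page = "<a href=\"http://example.com/live\">live</a>\n" +
                      "<!-- <a href=\"http://example.com/old\">old</a> -->";

        foreach (string link in LinkExtractor.GetActiveLinks(page))
        {
            Console.WriteLine(link);
        }
    }
}
```

Regex-based extraction like this can still be fooled by unusual markup (unquoted attributes, "<a" text inside scripts), so treat it as a quick check rather than a robust solution.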
 
Jesse Houwing

Hello Enigma,

Have a look at the HTML Agility Pack. It allows you to treat the HTML as
if it were XML.

http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack
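
A rough sketch of what that could look like, assuming the HtmlAgilityPack assembly is referenced in the project. The class name AgilityLinkExtractor and the method name GetActiveLinks are made up for this example. Because the parser reads commented-out markup as comment nodes rather than elements, an XPath query for anchor elements should skip those links without any extra clean-up.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, not part of the .NET base class library

// Hypothetical helper, named for this example only.
static class AgilityLinkExtractor
{
    // Returns the href of every <a> element that has one.
    public static List<string> GetActiveLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode a in anchors)
        {
            links.Add(a.GetAttributeValue("href", ""));
        }
        return links;
    }
}
```

Passing the response body from HttpWebRequest to GetActiveLinks as a string would then give the list to compare against for the reciprocal link check.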
 
