Retrieving Hyperlinks from a web page in code

Discussion in 'ASP .Net' started by Enigma Boy, Aug 14, 2007.

  1. Enigma Boy

    Enigma Boy Guest

    Hi folks,

    I am retrieving a web page from a site using HttpWebRequest. What I want to
    do with the retrieved page is list all the hyperlinks in it. If I do a
    simple regex search for <a, I also get links that are commented out in the
    markup, and I don't want those. I only want links that are actually active.
    This is for a reciprocal link check.

    Can someone please point me in the right direction?

    Thanks.

     
    Enigma Boy, Aug 14, 2007
    #1

  2. On Aug 14, 8:01 am, "Enigma Boy" <> wrote:
    > Hi folks,
    >
    > I am retrieving a website for a site using httpWebRequest. What I want to
    > do with the retrieved webpage is list all the hyperlinks in the page. If I
    > do a simple regex search for <a then I get links that are commented out in
    > code and I don't want that. I want links that are actually active. This is
    > to do with reciprocal link check.


    Hi, I think you can try to clean the text before you get the links.
    For example:

    html_code = Regex.Replace(html_code, "<!--((.|\n)*?)-->", "");

    This replaces every HTML comment with an empty string, so any links inside
    comments are gone before you search for the remaining ones.
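
A minimal sketch of the whole strip-then-search idea, assuming the page is already in a string. The class name `LinkExtractor`, the method `GetActiveLinks`, and the href pattern are illustrative and not from the post above. It uses `RegexOptions.Singleline` in place of `(.|\n)`, which matches the same comments. A regex of this kind is only an approximation of HTML parsing: it will miss unquoted href values and can be fooled by unusual markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Removes HTML comments, then collects the href values of the remaining <a> tags.
    public static List<string> GetActiveLinks(string html)
    {
        // Singleline lets '.' match newlines, so multi-line comments are removed too.
        string cleaned = Regex.Replace(html, "<!--.*?-->", "", RegexOptions.Singleline);

        // Illustrative pattern: an <a ...> tag with a quoted href attribute.
        var anchor = new Regex(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var links = new List<string>();
        foreach (Match m in anchor.Matches(cleaned))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}

static class Program
{
    static void Main()
    {
        string html =
            "<a href=\"http://a.example/\">A</a>\n" +
            "<!-- <a href=\"http://b.example/\">B</a> -->";

        // Only the link outside the comment is printed.
        foreach (string link in LinkExtractor.GetActiveLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```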
     
    Alexey Smirnov, Aug 14, 2007
    #2

  3. Hello Enigma,

    > Hi folks,
    >
    > I am retrieving a website for a site using httpWebRequest. What I
    > want to do with the retrieved webpage is list all the hyperlinks in
    > the page. If I do a simple regex search for <a then I get links that
    > are commented out in code and I don't want that. I want links that
    > are actually active. This is to do with reciprocal link check.
    >
    > Can someone please point me in the right direction.
    >
    > Thanks.


    Have a look at the HTML Agility Pack. It lets you treat the HTML as if
    it were XML.

    http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack
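
A rough sketch of how that could look, assuming the HtmlAgilityPack assembly is referenced in the project. The class name `AgilityLinkExtractor` and the method `GetLinks` are made up for illustration. Because the parser turns comments into comment nodes, an XPath query for `a` elements does not see links inside them, so no separate clean-up step is needed.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class AgilityLinkExtractor
{
    // Parses the HTML and returns the href of every <a> element that has one.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode a in anchors)
            {
                links.Add(a.GetAttributeValue("href", ""));
            }
        }
        return links;
    }
}

static class Program
{
    static void Main()
    {
        string html =
            "<p><a href=\"http://a.example/\">A</a></p>" +
            "<!-- <a href=\"http://b.example/\">B</a> -->";

        foreach (string link in AgilityLinkExtractor.GetLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```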

    --
    Jesse Houwing
    jesse.houwing at sogeti.n
     
    Jesse Houwing, Aug 14, 2007
    #3
