Parsing links within an HTML file

Discussion in 'Python' started by Shriphani, Jan 14, 2008.

  1. Shriphani

    Shriphani Guest

    Hello,
    I have a html file over here by the name guide_ind.html and it
    contains links to other html files like guides.html#outline . How do I
    point BeautifulSoup (I want to use this module) to
    guides.html#outline ?
    Thanks
    Shriphani P.
     
    Shriphani, Jan 14, 2008
    #1

  2. Hai Vu

    Hai Vu Guest

    On Jan 14, 9:59 am, Shriphani wrote:
    > Hello,
    > I have a html file over here by the name guide_ind.html and it
    > contains links to other html files like guides.html#outline . How do I
    > point BeautifulSoup (I want to use this module) to
    > guides.html#outline ?
    > Thanks
    > Shriphani P.


    Try Mark Pilgrim's excellent example at:
    http://www.diveintopython.org/http_web_services/index.html

    From the above link, you can retrieve openanything.py which I use in
    my example:

    # list_url.py
    # created by Hai Vu on 1/16/2008
    # Note: Python 2 code (sgmllib and the print statement are Python 2 only)

    from openanything import fetch
    from sgmllib import SGMLParser

    class RetrieveURLs(SGMLParser):
        def reset(self):
            SGMLParser.reset(self)
            self.urls = []

        def start_a(self, attributes):
            # Called for every <a> tag; keep only the href attribute values
            url = [v for k, v in attributes if k.lower() == 'href']
            self.urls.extend(url)
            print '\t%s' % (url)

    # ----------------------------------------------------------------
    # main
    def main():
        site = 'http://www.google.com'

        result = fetch(site)
        if result['status'] == 200:
            # Extracts a list of URLs off the top page
            parser = RetrieveURLs()
            parser.feed(result['data'])
            parser.close()

            # Display the URLs we just retrieved
            print '\nURL retrieved from %s' % (site)
            print '\t' + '\n\t'.join(parser.urls)
        else:
            print 'Error (%d) retrieving %s' % (result['status'], site)

    if __name__ == '__main__':
        main()
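
For readers on Python 3, where sgmllib is no longer available, the same idea can be sketched with the standard library's html.parser module. This is only a rough sketch, not the original poster's code: the names LinkCollector and extract_links and the sample markup are made up for illustration. It also uses urllib.parse.urldefrag to split a link such as guides.html#outline into the file name and the fragment, which is one way to work out which file to open and hand to a parser such as BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs.
        if tag == 'a':
            self.urls.extend(v for k, v in attrs if k == 'href' and v)


def extract_links(html_text):
    """Return the list of href values found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


# Hypothetical stand-in for the contents of guide_ind.html
sample = ('<a href="guides.html#outline">Outline</a> '
          '<a href="intro.html">Intro</a>')

for link in extract_links(sample):
    # urldefrag separates the file part from the '#fragment' part
    target, fragment = urldefrag(link)
    print('%s -> file=%r fragment=%r' % (link, target, fragment))
```

To use it on a real file, read guide_ind.html into a string and pass that string to extract_links in place of sample. The fragment (outline here) names an anchor inside guides.html, so the file to open is the part before the '#'.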
     
    Hai Vu, Jan 17, 2008
    #2