SGMLlib module

Discussion in 'Python' started by Harlin Seritt, May 8, 2005.

  1. I am trying to use the sgmllib module to extract all links from some
    data I pulled from the web (via urllib). I have looked at the
    documentation online and cannot make sense of it. As a quick example,
    how would I get the hyperlinks from an HTML file?

    thanks,

    Harlin
    Harlin Seritt, May 8, 2005
    #1

  2. Peter Hansen (Guest)

    Harlin Seritt wrote:
    > I am trying to use the sgmllib module to extract all links from some
    > data I pulled from the web (via urllib). I have looked at the
    > documentation online and cannot make sense of it. As a quick example,
    > how would I get the hyperlinks from an HTML file?


    I know you're not someone to ignore Google, but this looked like a
    question that could pretty easily be answered using a quick search of
    the comp.lang.python archives via Google Groups -- and it appears I was
    right.

    I tried
    http://groups.google.ca/groups?q=sgmllib extract links group:comp.lang.python.*
    and found this page, which I believe should answer your question
    (perhaps not directly, but it looks basically like an sgmllib tutorial):
    http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html

    I'm pretty sure you can find a dozen threads with snippets showing just
    what you asked if you look through the results of that search.

    -Peter
    Peter Hansen, May 8, 2005
    #2

  3. Thanks for the help. I just didn't like the way that sgmllib forces one
    to instantiate a class to do this (or httplib, for that matter). I
    looked at those links you graciously sent (thanks!) but didn't like
    them. At any rate, I went ahead and wrote my own. Thank goodness that
    it's easy to parse with Python on your own!

    Thanks for the help,

    Harlin Seritt
    Harlin Seritt, May 8, 2005
    #3
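
    Harlin's own code is not shown in the thread. As a rough idea of what a
    hand-rolled, class-free approach could look like, here is a minimal
    sketch using a regular expression. The names HREF_RE and find_links are
    hypothetical, and a pattern like this only copes with quoted href values
    in reasonably tidy markup; it is not a general HTML parser.

```python
import re

# Hypothetical helper, not the code Harlin actually wrote.
# Matches <a ... href="..."> or <a ... href='...'> (quoted values only).
HREF_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

def find_links(html):
    """Return the href values of anchor tags found in an HTML string."""
    return [match.group(2) for match in HREF_RE.finditer(html)]

sample = (
    '<p><a href="http://python.org/">Python</a> and '
    "<A class=\"x\" HREF='/docs'>docs</A></p>"
)
print(find_links(sample))
```

    Unquoted attribute values, links inside comments, and similar edge cases
    are where a real parser starts to pay off.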
  4. John J. Lee (Guest)

    Peter Hansen writes:

    > Harlin Seritt wrote:
    > > I am trying to use the sgmllib module to extract all links from some
    > > data I pulled from the web (via urllib). I have looked at the
    > > documentation online and cannot make sense of it. As a quick example,
    > > how would I get the hyperlinks from an HTML file?

    >
    > I know you're not someone to ignore Google, but this looked like a
    > question that could pretty easily be answered using a quick search of
    > the comp.lang.python archives via Google Groups -- and it appears I
    > was right.

    [...]

    Also, htmllib extends sgmllib to make this trivial, IIRC, so you
    (Harlin) could look at the htmllib source.


    John
    John J. Lee, May 8, 2005
    #4
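
    For readers looking for the quick example the original question asked
    for, the handler-class approach discussed in this thread can be sketched
    as follows. sgmllib and htmllib were removed in Python 3, so this sketch
    swaps in the standard library's html.parser.HTMLParser, which works in
    the same subclass-and-feed style (sgmllib's hook for anchor tags was a
    start_a method, where HTMLParser uses handle_starttag). The names
    LinkCollector and extract_links are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    """Parse an HTML string and return the list of hyperlink targets."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# In practice the string might come from urllib, for example
# urllib.request.urlopen(url).read().decode(); a literal is used here
# so the example runs without network access.
page = '<a href="a.html">x</a> <A HREF="/b">y</A> <a name="z">anchor</a>'
print(extract_links(page))
```

    The anchor with only a name attribute is skipped because it carries no
    href.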
