HTML Structure Extraction

Discussion in 'Python' started by dayzman@hotmail.com, Dec 8, 2004.

  1. Guest

    Hi,

    I'm going to write a program that extracts the structure of HTML
    documents. The structure would be in the form of a tree, separating the
    tags and grouping the start and end tags. I think I will use
    htmllib.HTMLParser, is it appropriate for my application? If so, I
    believe I will need to keep track of the depth reached.
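
For what it's worth, the depth-tracking idea described above can be sketched with a stack of open elements. This is only a rough illustration under some assumptions: htmllib was removed in Python 3, so the sketch swaps in the standard library's html.parser.HTMLParser instead, and the names Node, StructureParser, outline and extract_structure are made up for the example rather than taken from any library.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the extracted tree (hypothetical helper class)."""

    def __init__(self, tag, depth):
        self.tag = tag
        self.depth = depth
        self.children = []


class StructureParser(HTMLParser):
    """Builds a tree of tags only; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]", 0)
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        # The new node's depth is the number of elements open above it.
        node = Node(tag, len(self.stack))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; this also closes
        # any unclosed children. Stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def extract_structure(html):
    """Parse an HTML string and return the root Node of its tag tree."""
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser.root


def outline(node, lines=None):
    """Return the tree as a list of tag names indented by depth."""
    if lines is None:
        lines = []
    lines.append("  " * node.depth + node.tag)
    for child in node.children:
        outline(child, lines)
    return lines
```

Calling print("\n".join(outline(extract_structure(page_source)))) on a string of HTML (page_source is a placeholder here) would print one tag per line, indented by nesting level. Real-world pages with badly nested markup may need a more forgiving parser than this.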

    Any tips for such an application would be much appreciated.

    Cheers,
    Michael
     
    , Dec 8, 2004
    #1

  2. dayzman@hotmail.com wrote:

    > I'm going to write a program that extracts the structure of HTML
    > documents. The structure would be in the form of a tree, separating the
    > tags and grouping the start and end tags. I think I will use
    > htmllib.HTMLParser, is it appropriate for my application? If so, I
    > believe I will need to keep track of the depth reached.


    you mean like:

    http://www.crummy.com/software/BeautifulSoup/
    http://effbot.org/zone/element-tidylib.htm
    http://utidylib.berlios.de/
    http://www.xmlsoft.org/
    http://effbot.org/zone/pythondoc-elementtree-HTMLTreeBuilder.htm

    and a few dozen others?

    </F>
     
    Fredrik Lundh, Dec 8, 2004
    #2
