HTML Structure Extraction

dayzman · Dec 8, 2004

Hi,

I'm going to write a program that extracts the structure of HTML
documents. The structure would be in the form of a tree, separating the
tags and grouping the start and end tags. I think I will use
htmllib.HTMLParser. Is it appropriate for my application? If so, I
believe I will need to keep track of the depth reached.
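Here is a minimal sketch of the depth-tracking approach described above. It assumes Python 3, where `htmllib` no longer exists, so it uses `html.parser.HTMLParser` instead. The names `Node`, `StructureParser` and `outline` are hypothetical and chosen only for illustration. A stack of open elements pairs each start tag with its end tag, and the stack's length gives the current depth.

```python
from html.parser import HTMLParser

# HTML void elements: they never take an end tag, so they must not be pushed.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the extracted structure tree (hypothetical helper)."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = dict(attrs)  # HTMLParser passes a list of (name, value)
        self.children = []


class StructureParser(HTMLParser):
    """Pairs start and end tags with a stack to build a tree of Nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # current depth is len(self.stack) - 1
        self.max_depth = 0        # deepest element seen (<html> = 1)

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.max_depth = max(self.max_depth, len(self.stack))
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)  # descend: later tags become its children

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record a leaf, never descend.
        self.stack[-1].children.append(Node(tag, attrs))
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Pop back up to the nearest open element with this name; inner
        # elements still open are closed with it. End tags with no matching
        # start tag are ignored. HTML's implied end tags (e.g. an unclosed
        # <li> before the next <li>) are NOT modelled.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return


def outline(node, depth=0):
    """Return the tree as indented lines, two spaces per level."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


parser = StructureParser()
parser.feed("<html><body><h1>Title</h1>"
            "<ul><li>one</li><li>two</li></ul><br></body></html>")
parser.close()
print("\n".join(outline(parser.root)))
```

For the sample input it prints `#document`, `html` and `body` at increasing indentation. Below `body` it prints `h1`, `ul` (with two `li` children) and `br`. `parser.max_depth` ends up as 4. The sketch uses a stack rather than a bare depth counter because a counter alone cannot tell which element an end tag closes. Real-world HTML with omitted end tags still needs the recovery rules that dedicated HTML libraries implement.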

Any tips for such an application would be much appreciated.

Cheers,
Michael

Fredrik Lundh · Dec 8, 2004

you mean like:

http://www.crummy.com/software/BeautifulSoup/
http://effbot.org/zone/element-tidylib.htm
http://utidylib.berlios.de/
http://www.xmlsoft.org/
http://effbot.org/zone/pythondoc-elementtree-HTMLTreeBuilder.htm

and a few dozen others?

</F>
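The parser events can also feed an existing tree API instead of a hand-rolled node class. Below is a minimal stdlib sketch in the spirit of the ElementTree-based links above. It is not the code of `HTMLTreeBuilder` itself, and it assumes Python 3. It forwards `html.parser.HTMLParser` events to `xml.etree.ElementTree.TreeBuilder`, which yields a standard `Element` tree that can be searched with `find()` and walked with `iter()`. `ElementTreeHTMLParser` is a hypothetical name. Like any plain stack-based approach, it does not implement HTML's implied end tags.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: closed immediately, never left open.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


def _attr_dict(attrs):
    # Valueless attributes (e.g. <input disabled>) arrive with value None.
    return {name: (value or "") for name, value in attrs}


class ElementTreeHTMLParser(HTMLParser):
    """Forwards html.parser events to an ElementTree TreeBuilder."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # names of elements started but not yet ended

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, _attr_dict(attrs))
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.builder.start(tag, _attr_dict(attrs))
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # End any inner elements still open so the builder stays balanced.
        while self.open_tags:
            inner = self.open_tags.pop()
            self.builder.end(inner)
            if inner == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)  # becomes .text or .tail in the tree

    def close(self):
        super().close()  # flush any buffered text first
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return self.builder.close()  # the root Element


parser = ElementTreeHTMLParser()
parser.feed('<div id="main"><p>Hello<br>world</p>'
            '<ul><li>a</li><li>b</li></ul></div>')
root = parser.close()
print([el.tag for el in root.iter()])
```

Using `TreeBuilder` means the result works with the rest of the ElementTree API (`find`, `iter`, `tostring`) without extra code. The input should have a single top-level element, because `TreeBuilder` returns only the first one as the root.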