Manipulate HTML documents via data structure


C. Barnes

Python provides HTML parsing through the
HTMLParser and htmllib modules.

For my application, I needed to search through
an HTML document in a nonlinear fashion and
dynamically change parts of the document. The
most logical way to do this is to translate HTML
back and forth to a data structure.

I wrote a module called htmldata, available from:

http://oregonstate.edu/~barnesc/htmldata/

Example:
[('img', {'src':'hi.gif', 'alt':'blah'}), 'foo',
 ('/body', {})]

corresponds to

'<img alt="blah" src="hi.gif">foo</body>'
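
To make the mapping concrete, here is a minimal sketch of such a round trip. It is not the htmldata implementation: it uses the standard library's html.parser (Python 3), and the names _ItemCollector, to_items and to_html are hypothetical, chosen only for illustration. It also ignores comments, doctypes and entity details that a real converter would have to preserve.

```python
from html import escape
from html.parser import HTMLParser


class _ItemCollector(HTMLParser):
    """Hypothetical helper: collects tags and text into one flat list."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.items.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Closing tags are stored with a leading slash and no attributes.
        self.items.append(('/' + tag, {}))

    def handle_data(self, data):
        # Text between tags is stored as a plain string.
        self.items.append(data)


def to_items(html_text):
    """Translate an HTML string into a list of tag tuples and text strings."""
    parser = _ItemCollector()
    parser.feed(html_text)
    parser.close()
    return parser.items


def to_html(items):
    """Translate the list form back into an HTML string."""
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(escape(item, quote=False))
        else:
            name, attrs = item
            # Attributes are written in sorted order so output is stable.
            attr_text = ''.join(
                ' %s="%s"' % (key, escape(value or '', quote=True))
                for key, value in sorted(attrs.items())
            )
            parts.append('<%s%s>' % (name, attr_text))
    return ''.join(parts)


doc = '<img alt="blah" src="hi.gif">foo</body>'
items = to_items(doc)
rebuilt = to_html(items)
```

In this sketch the rebuilt string is equivalent to the input rather than guaranteed byte-identical, because attribute order and quoting are normalized on the way out.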

Pros:
* More powerful for HTML editing, since any part of the
document can be found and changed in place.
* Easy to reproduce the original document (or at least
a document that is HTML-equivalent to the original).
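
As a rough illustration of the editing point, the sketch below works directly on a list in the form shown in the example above, using only ordinary Python list and dict operations. The sample document and the is_tag helper are hypothetical and are not part of htmldata.

```python
# Hypothetical document, already in list form.
items = [
    ('body', {}),
    ('img', {'src': 'hi.gif', 'alt': 'blah'}),
    'foo',
    ('img', {'src': 'bye.gif'}),
    ('/body', {}),
]


def is_tag(item, name):
    """True if item is a tag tuple with the given name (hypothetical helper)."""
    return isinstance(item, tuple) and item[0] == name


# Nonlinear access: jump straight to the last image and add an attribute.
last_img = max(i for i, item in enumerate(items) if is_tag(item, 'img'))
items[last_img][1]['alt'] = 'goodbye'

# Bulk edit: rewrite the URL of every image in place.
for item in items:
    if is_tag(item, 'img'):
        item[1]['src'] = '/static/' + item[1]['src']

# Structural edit: insert new text just before the closing body tag.
close_body = next(i for i, item in enumerate(items) if is_tag(item, '/body'))
items.insert(close_body, 'bar')
```

With an event-driven parser, each of these changes would need state carried across callbacks; here they are plain index and dictionary updates.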

Cons:
* Less user-friendly than the HTMLParser module.

I tested it on several popular sites. Feedback, bug
reports, etc. are appreciated.

- Connelly Barnes





