DOM with HTML

Alessio Pace

Hi, I need to build some sort of DOM from an HTML page that is declared as XHTML
but unfortunately is *not* valid XHTML. If I try to parse it with
xml.dom.minidom I get an error from expat (as I expected), so I was told to
try it this way, with a "forgiving" HTML parser:

from xml.dom.ext.reader import HtmlLib
reader = HtmlLib.Reader()
dom = reader.fromUri(url) # 'url' the web page

FIRST ISSUE:
From reading the source code in
$MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/ ,
these appear to be 4DOM APIs. From what I know of Python distributions, that
makes them an extra package, or am I wrong? I would like to use *only* libraries
that ship with the Python 2.2 suite, no extras.

SECOND ISSUE:
If the above libraries were included in Python (so that I could keep using
them), how do I print a string representation of a (sub)tree of the DOM? I
tried .toxml() as in the XML tutorial, but that method does not exist
for the FtNode objects involved here. Any ideas?

Thanks so much to anyone who can help.
 
F. GEIGER

Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
but unfortunately is *not* xhtml valid.. If I try to parse it with

I use mx.Tidy in such cases, with great success.

Cheers
Franz
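
For readers who want to stay inside the standard library, as the original post asks, here is a minimal sketch of one possible approach. It is not what either poster used. It assumes a modern Python 3, where the tolerant tokenizer lives in html.parser (in the Python 2.2 of this thread the module was named HTMLParser), and it feeds the parser events into xml.dom.minidom nodes, which do provide .toxml() on any subtree. The names MiniDomBuilder, parse_forgiving, VOID_TAGS and the synthetic "root" wrapper element are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document, parseString

# Tags that never take a closing tag in HTML (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class MiniDomBuilder(HTMLParser):
    """Hypothetical helper: turns tolerant HTML parser events into a minidom tree."""

    def __init__(self):
        super().__init__()
        self.doc = Document()
        # Synthetic wrapper so stray top-level text or several roots are accepted.
        root = self.doc.createElement("root")
        self.doc.appendChild(root)
        self.stack = [root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        element = self.doc.createElement(tag)
        for name, value in attrs:
            # Attributes without a value arrive as None.
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with
        # anything left open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))


def parse_forgiving(html_text):
    """Build a minidom Document from HTML that need not be well-formed."""
    builder = MiniDomBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.doc


# Usage: unclosed <p> and <b> tags would make expat fail, but not this builder.
doc = parse_forgiving("<html><body><p>Hello <b>world<p>Second<br></body>")
first_p = doc.getElementsByTagName("p")[0]
print(first_p.toxml())  # string representation of just that subtree
```

The sketch does not apply HTML's implicit-closing rules (a new p does not close an open p, for example), so the tree can nest differently from what a browser would build. It only guarantees that the serialized subtree is balanced; attribute names that are not legal XML names would still need extra handling. A dedicated cleaner such as the mx.Tidy package mentioned above is likely the better choice when faithful repair matters.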
 
