DOM with HTML

Alessio Pace · Jul 1, 2003

Hi, I need to build some kind of DOM from an HTML page that is declared as
XHTML but unfortunately is *not* valid XHTML. If I try to parse it with
xml.dom.minidom I get an error from expat (as I expected), so I was told to
try it this way, with a "forgiving" HTML parser:

from xml.dom.ext.reader import HtmlLib
reader = HtmlLib.Reader()
dom = reader.fromUri(url)  # 'url' is the address of the web page
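
For reference, the expat error mentioned above is easy to reproduce with the standard library alone. This is a minimal sketch: the `broken` string is a made-up fragment (not the actual page), and the `except ... as` syntax assumes a Python newer than 2.2.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Made-up fragment: <br> is never closed, so this is not well-formed XML.
broken = "<html><body><p>first line<br>second line</p></body></html>"

try:
    minidom.parseString(broken)
    parsed = True
    message = ""
except ExpatError as err:
    # expat stops at the first well-formedness error it meets.
    parsed = False
    message = str(err)

print(parsed, message)
```

minidom delegates to expat, which is a strict XML parser, so any page that is only "declared" as XHTML will fail like this at the first unclosed or mismatched tag.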

FIRST ISSUE:
Reading the source code in
$MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/ ,
it seems to me that these are 4DOM APIs. From what I know of Python
distributions, that makes them an extra package, doesn't it? I would like to
use *only* libraries that ship with the standard Python 2.2 distribution,
nothing extra.

SECOND ISSUE:
If the above libraries were included in Python (so that I could keep using
them), how do I print a string representation of a (sub)tree of the DOM? I
tried .toxml() as in the XML tutorial, but that method does not exist
for the FtNode objects involved here. Any ideas?

Thanks so much to anyone who can help.

F. GEIGER · Jul 1, 2003

> Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
> but unfortunately is *not* xhtml valid.. If I try to parse it with

I use mx.Tidy in such cases, with great success.

Cheers
Franz
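
For the two issues raised in the question (standard library only, and printing a subtree), here is a rough sketch of one possible route. It is written for Python 3, where the lenient parser lives in `html.parser` (in Python 2 the module was called `HTMLParser`). The `MiniDomBuilder` class, the `parse_html` function and the `SAMPLE` string are hypothetical names invented for this illustration. It is not what HtmlLib or mx.Tidy do internally, and it does not apply HTML's implied-end-tag rules, so it is only a starting point.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MiniDomBuilder(HTMLParser):
    """Hypothetical helper: turns lenient HTML parser events into a minidom tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = minidom.Document()
        self.current = self.document  # node that receives new children

    def _parent(self):
        # Stray content after the root element is attached to the root,
        # because a minidom Document accepts only one element child.
        root = self.document.documentElement
        if self.current is self.document and root is not None:
            return root
        return self.current

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value if value is not None else "")
        self._parent().appendChild(element)
        if tag not in VOID_TAGS:
            self.current = element

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.document:
            if node.tagName == tag:
                self.current = node.parentNode
                return
            node = node.parentNode

    def handle_data(self, data):
        parent = self._parent()
        if parent is self.document:
            return  # text before the root element cannot live in a Document
        parent.appendChild(self.document.createTextNode(data))


def parse_html(markup):
    builder = MiniDomBuilder()
    builder.feed(markup)
    builder.close()  # flush any buffered text
    return builder.document


# Made-up input: unquoted attribute, unclosed <br> and <img>, missing end tags.
SAMPLE = '<html><body><div id=main><p>one<br>two</p><img src="a.png"></div>'

doc = parse_html(SAMPLE)
subtree = doc.getElementsByTagName("div")[0].toxml()
print(subtree)
```

Because the result is an ordinary minidom tree, .toxml() is available on every node, so any subtree can be printed the way the XML tutorial shows.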
