convert html

Guest

Hi:

I want to convert HTML to XML.

I am doing this:

import sys

from xml.dom.ext.reader import HtmlLib
from xml.dom import ext, Node
from xml.dom.NodeFilter import NodeFilter

def main( argv ):
    # build a DOM tree from the html
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri( argv[1] )

    # getTableInfo is my own function (not shown in this post)
    info = getTableInfo( dom_object, 9 )

    # release the DOM tree once it is no longer needed
    reader.releaseNode( dom_object )

if __name__ == "__main__":
    main( sys.argv )

This takes almost a minute for a 6000-line HTML file on a PIII 700 MHz with 256 MB of RAM. This is too slow.

Can you suggest another way of doing this in Python?
 
Richard Brodie

> I want to convert html to xml.
>
> I am doing this: ....
> Can you suggest another way of doing this in Python?

I haven't benchmarked it, but I would imagine that using HTML Tidy
(or µTidylib) is as good as any, particularly if your HTML source
is a bit rough.
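
As a rough illustration of the HTML Tidy route, here is a minimal sketch that shells out to the tidy command-line program rather than using µTidylib. It assumes a reasonably recent Python 3 and that a tidy executable is installed and on the PATH. The helper names build_tidy_command and html_to_xhtml are made up for this example, and I have not timed it against the HtmlLib approach.

```python
import shutil
import subprocess


def build_tidy_command(html_path):
    """Return the argv list that asks tidy to write XHTML to stdout.

    Hypothetical helper for this sketch; the flags are standard tidy
    command-line options.
    """
    return [
        "tidy",
        "-quiet",    # suppress nonessential output
        "-asxml",    # convert HTML to well-formed XHTML
        "-numeric",  # emit numeric rather than named entities
        "-utf8",     # use UTF-8 for both input and output
        html_path,
    ]


def html_to_xhtml(html_path):
    """Run tidy on html_path and return the resulting XHTML as text."""
    if shutil.which("tidy") is None:
        raise RuntimeError("the tidy executable was not found on PATH")
    result = subprocess.run(build_tidy_command(html_path), capture_output=True)
    # tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8")
```

Calling html_to_xhtml("page.html") should give you a string that an ordinary XML parser can read, so the table extraction can then run on whichever DOM or SAX implementation is fastest for you.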
 
