convert html

Discussion in 'Python' started by Guest, Jul 8, 2004.

  1. Guest

    Guest Guest

    Hi:

    I want to convert html to xml.

    I am doing this:

    from xml.dom.ext.reader import HtmlLib
    from xml.dom import ext, Node
    from xml.dom.NodeFilter import NodeFilter

    def main( argv ):
    # build a DOM tree from the html
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri( sys.argv[1] )

    info = getTableInfo( dom_object, 9 )

    reader.releaseNode( dom_object );

    if __name__ == "__main__":
    main( sys.argv )

    This takes almost a minute on a 6000 line html file on a PIII 700 Mhz 256 RAM. This is too slow.

    Can you suggest another way of doing this in Python?
    Guest, Jul 8, 2004
    #1
    1. Advertising

  2. <> wrote in message news:...
    > I want to convert html to xml.
    >
    > I am doing this:

    ....
    > Can you suggest another way of doing this in Python?


    I haven't benchmarked but I would imagine using HTML Tidy
    (or ┬ÁTidylib) is as good as any, particularly if your HTML source
    is a bit rough.
    Richard Brodie, Jul 9, 2004
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Andreas Klemt
    Replies:
    1
    Views:
    376
    Karl Seguin
    Jul 23, 2003
  2. Steven Cheng[MSFT]

    RE: Convert HTML to XML or Paser HTML

    Steven Cheng[MSFT], Jan 9, 2004, in forum: ASP .Net
    Replies:
    3
    Views:
    3,455
    George Ter-Saakov
    Feb 12, 2004
  3. Joerg Jooss

    Re: Convert HTML to XML or Paser HTML

    Joerg Jooss, Jan 11, 2004, in forum: ASP .Net
    Replies:
    0
    Views:
    548
    Joerg Jooss
    Jan 11, 2004
  4. Q.Z
    Replies:
    0
    Views:
    570
  5. csgraham74
    Replies:
    2
    Views:
    1,213
    csgraham74
    Sep 19, 2006
Loading...

Share This Page