DOM with HTML

Discussion in 'Python' started by Alessio Pace, Jul 1, 2003.

  1. Alessio Pace

    Alessio Pace Guest

    Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
    but unfortunately is *not* valid XHTML. If I try to parse it with
    xml.dom.minidom I get an error from expat (as I expected), so I was told to
    try it this way, with a "forgiving" HTML parser:

    from xml.dom.ext.reader import HtmlLib
    reader = HtmlLib.Reader()
    dom = reader.fromUri(url) # 'url' the web page

    FIRST ISSUE:
    From reading the source code in
    $MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/ ,
    it seems to me that these are 4DOM APIs, so from what I know of Python
    distributions they are extra packages. Is that right? I would like to use
    *only* libraries that ship with the Python 2.2 distribution, nothing extra.

    SECOND ISSUE:
    If the above libraries were included in Python (so that I could keep using
    them), how do I print a string representation of a (sub)tree of the DOM? I
    tried .toxml() as in the XML tutorial, but that method does not exist
    on the FtNode objects involved here. Any ideas?
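
    For the standard-library-only constraint, one possible route is sketched
    below. It assumes a current Python 3 interpreter (not the Python 2.2 of the
    post), where html.parser tolerates malformed markup and xml.dom.minidom
    nodes do provide .toxml(). The names MiniDomBuilder, parse_html, VOID_TAGS
    and the wrapper element "fragment" are hypothetical and exist only for this
    illustration. The sketch does not apply HTML's implicit-closing rules, so an
    unclosed <p> simply nests the next <p> inside it.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MiniDomBuilder(HTMLParser):
    """Build an xml.dom.minidom tree from possibly malformed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = minidom.Document()
        # A minidom Document accepts a single element and no text, so all
        # parsed content hangs under a synthetic wrapper (hypothetical name).
        wrapper = self.document.createElement("fragment")
        self.document.appendChild(wrapper)
        self._stack = [wrapper]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value if value is not None else "")
        self._stack[-1].appendChild(element)
        if tag not in VOID_TAGS:
            self._stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        # Index 0 is the wrapper and is never closed.
        for index in range(len(self._stack) - 1, 0, -1):
            if self._stack[index].tagName == tag:
                del self._stack[index:]
                return

    def handle_data(self, data):
        self._stack[-1].appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Return a minidom Document built from the given HTML string."""
    builder = MiniDomBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


# Unclosed <p> and <b> tags would make expat fail; here they are tolerated.
broken = "<html><body><p>first<p>second <b>bold</body></html>"
dom = parse_html(broken)
body = dom.getElementsByTagName("body")[0]
print(body.toxml())  # string form of the <body> subtree
```

    Because the result is an ordinary minidom tree, any element found with
    getElementsByTagName() can be serialized on its own with .toxml().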

    Thanks so much to anyone who can help.

    --
    bye
    Alessio Pace
     
    Alessio Pace, Jul 1, 2003
    #1

  2. F. GEIGER

    F. GEIGER Guest

    > Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
    > but unfortunately is *not* xhtml valid.. If I try to parse it with


    I use mx.Tidy in such cases, with great success.

    Cheers
    Franz
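
    Once a cleaner such as Tidy has turned the page into well-formed XHTML,
    the standard xml.dom.minidom parser should accept it and .toxml() becomes
    available on every node. The sketch below shows only that second step,
    assuming Python 3; the cleaning call itself is not shown, the string
    `tidied` is a hand-written stand-in for cleaned output, and
    subtree_as_string is a hypothetical helper name.

```python
from xml.dom import minidom

# Stand-in for the output of an HTML cleaner (assumed to be well-formed).
tidied = (
    "<html>"
    "<head><title>Example</title></head>"
    "<body><p>first</p><p>second <b>bold</b></p></body>"
    "</html>"
)


def subtree_as_string(xhtml, tag_name):
    """Parse well-formed XHTML and serialize the first element named tag_name.

    Returns None when no such element exists.
    """
    dom = minidom.parseString(xhtml)
    matches = dom.getElementsByTagName(tag_name)
    if not matches:
        return None
    return matches[0].toxml()


print(subtree_as_string(tidied, "body"))
```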

     
    F. GEIGER, Jul 1, 2003
    #2