Processing XHTML 1.1 documents with xml.sax

Discussion in 'Python' started by webworldL@yahoo.com, Aug 7, 2004.

  1. Guest

    Has anybody had any luck processing XHTML 1.1 documents with xml.sax?
    Whenever I try it, Python fetches the W3C DTD named in the DOCTYPE at
    the top of the document, then fails with an error in the external DTD.
    All I need to do is rip through a bunch of XHTML documents and extract
    some data. Does anybody know a quick way to do this without sax making
    outgoing network connections and fussing with DTDs?

    BTW, here is how to reproduce the error, if anybody cares.
    Below is a document, 'hello.html', produced by the W3C's Amaya:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html;
    charset=iso-8859-1" />
    <title>Hello World</title>
    <meta name="generator" content="amaya 8.5, see
    http://www.w3.org/Amaya/" />
    </head>

    <body>
    <p>hello world!</p>
    </body>
    </html>

    and the script:

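    import xml.sax
    import xml.sax.handler

    # Parse with a do-nothing handler; the failure happens while the
    # external DTD referenced by the DOCTYPE is being processed.
    xml.sax.parse("hello.html", xml.sax.handler.ContentHandler())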

    Running the script raises this error:

    SAXParseException:
    http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-framework-1.mod:89:0:
    error in processing external entity reference
     
    Aug 7, 2004

  2. Uche Ogbuji Guest

    Ouch. I took a brief look at this and expat has a problem here. I
    should note that there are few more hairy stress tests of DTD
    conformance than XHTMLMOD (the basis of XHTML 1.1).

    Using the most recent expat, 1.95.8, something weird happens:

    [uogbuji@borgia xmlwf]$ xmlwf -p ~/foo.xhtml
    /home/uogbuji/http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd: No such file or directory
    /home/uogbuji/foo.xhtml:3:52: error in processing external entity reference

    It's a little confused about the fact that http:// starts a URL: it
    treats the system identifier as a path relative to the document's
    directory. I fiddled with it as much as I had time for, but I think
    there's little recourse but for you to submit a bug report to the
    expat project:

    http://sourceforge.net/tracker/?group_id=10127&atid=110127

    In the meantime, change your DOCTYPE to use XHTML 1.0 (which *does*
    work with expat) rather than 1.1.
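
If changing the DOCTYPE is not practical, another thing that may be worth trying is telling the SAX reader not to load external general entities at all; with the expat-based reader this also skips the external DTD subset, so no network connection is attempted. The following is only a sketch under that assumption. The names TextGrabber and extract are made up for illustration, and entities defined only in the DTD (such as &nbsp;) will not be expanded when the DTD is skipped.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextGrabber(ContentHandler):
    """Collect the text of selected elements (hypothetical helper)."""

    def __init__(self, wanted=("title", "p")):
        super().__init__()
        self.wanted = set(wanted)
        self.found = []        # (element name, text) pairs, in document order
        self._current = None   # wanted element currently being read, if any
        self._chunks = []      # text pieces seen inside that element

    def startElement(self, name, attrs):
        if name in self.wanted:
            self._current = name
            self._chunks = []

    def characters(self, content):
        # characters() may be called several times per text node
        if self._current is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._current:
            self.found.append((name, "".join(self._chunks)))
            self._current = None


def extract(source):
    """Parse a file name or file object without loading the external DTD."""
    parser = xml.sax.make_parser()
    # Do not fetch external general entities, including the external DTD.
    parser.setFeature(feature_external_ges, False)
    handler = TextGrabber()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.found
```

Calling extract("hello.html") on the document from the original post should then return the title and paragraph text without touching www.w3.org.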

    Good luck.


    --
    Uche Ogbuji Fourthought, Inc.
    http://uche.ogbuji.net http://4Suite.org http://fourthought.com
     
    Uche Ogbuji, Aug 9, 2004