sax EntityResolver problem (expat?)

Discussion in 'Python' started by chris, Jun 10, 2004.

  1. chris

    chris Guest

    hi,
    sax beginner question i must admit:

    i try to filter a simple XHTML document with a standard DTD declaration
    (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">) in it.
    sax gives the following error

    >>> xml.sax._exceptions.SAXParseException: <unknown>:53:8: undefined entity


    which is an &nbsp; entity.
    so i thought i just implement the EntityResolver class and use a local
    copy of the DTD

    # ========================
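    import xml.sax
    import xml.sax.handler

    class XHTMLResolver(xml.sax.handler.EntityResolver, object):

        def resolveEntity(self, publicId, systemId):
            # point the parser at a copy of the DTD instead of the W3C URL
            return 'http://localhost/xhtml1-transitional.dtd'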

    reader = xml.sax.make_parser()
    reader.setEntityResolver(XHTMLResolver())
    # ========================

    problem is, it seems expat does not use this resolver as i get the same
    error again. i also tried the following, which is not supported anyhow:

    reader.setFeature('http://xml.org/sax/features/external-parameter-entities', True)
    >>> xml.sax._exceptions.SAXNotSupportedException: expat does not read external parameter entities
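For reference, feature support can be probed directly. This is a minimal sketch that assumes Python 3 and the default expat-backed reader returned by make_parser(); the helper name try_feature is made up for illustration. With that reader, external-parameter-entities cannot be switched on, while external-general-entities can.

```python
import xml.sax
from xml.sax import handler, SAXNotSupportedException


def try_feature(parser, name):
    """Return True if the feature could be switched on, False if unsupported."""
    try:
        parser.setFeature(name, True)
    except SAXNotSupportedException:
        return False
    return True


reader = xml.sax.make_parser()
# The feature from the post: the expat reader refuses to enable it.
pes_ok = try_feature(reader, handler.feature_external_pes)
# The related general-entities feature is the one the expat reader accepts.
ges_ok = try_feature(reader, handler.feature_external_ges)
print("external parameter entities:", pes_ok)
print("external general entities:", ges_ok)
```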

    is the XHTMLResolver class not the way it should be? or do i have to set
    another feature/property?


    ultimately i do not want to use the http://localhost copy but i would
    like to read the local file (just with open(...) or something) and go
    from there. is that possible? do i have to


    thanks a lot
    chris
     
    chris, Jun 10, 2004
    #1

  2. chris

    Ralf Schmitt Guest


    That's the way it works for me. You can also just open() your DTD
    files and return an open file handle. Note that with the above DTD,
    your resolveEntity will be called more than once, with different ids
    (once for the DTD itself and once for each entity file it includes).

    --------------------------------
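    # Python 2 syntax (print statement), as posted
    from xml.sax import handler, make_parser

    class Handler(handler.ContentHandler):
        # the same object serves as content handler and entity resolver
        def resolveEntity(self, publicid, systemid):
            print "RESOLVE:", publicid, systemid
            # open the local file named after the last path component
            return open(systemid[systemid.rfind('/')+1:], "rb")

        def characters(self, s):
            print repr(s)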

    doc = r'''<?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <HTML>
    &nbsp;&auml;
    </HTML>
    '''

    h = Handler()
    parser = make_parser()
    parser.setContentHandler(h)
    parser.setEntityResolver(h)

    parser.feed(doc)
    parser.close()
    -------
    Output:

    RESOLVE: -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
    RESOLVE: -//W3C//ENTITIES Latin 1 for XHTML//EN xhtml-lat1.ent
    RESOLVE: -//W3C//ENTITIES Symbols for XHTML//EN xhtml-symbol.ent
    RESOLVE: -//W3C//ENTITIES Special for XHTML//EN xhtml-special.ent
    u'\n'
    u'\xa0'
    u'\xe4'
    u'\n'
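The same idea can be sketched for Python 3. Two things are assumptions here, not part of the posts above: since Python 3.7.1 the SAX parser skips external general entities unless feature_external_ges is switched on, and MINI_DTD is a tiny made-up stand-in for the real xhtml1-transitional.dtd so the example runs without any files or network access. The names LocalResolver, TextCollector and parse_with_local_dtd are hypothetical as well.

```python
import io
import xml.sax
from xml.sax import handler, xmlreader

# Made-up stand-in for the real DTD: only the two entities used below.
MINI_DTD = b'<!ENTITY nbsp "&#160;">\n<!ENTITY auml "&#228;">\n'

DOC = b'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>&nbsp;&auml;</html>
'''


class LocalResolver(handler.EntityResolver):
    """Serve the DTD from memory instead of fetching the system id."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        source = xmlreader.InputSource(systemId)
        # For a real local copy, an open(path, "rb") handle could go here.
        source.setByteStream(io.BytesIO(MINI_DTD))
        return source


class TextCollector(handler.ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtd(data):
    resolver = LocalResolver()
    collector = TextCollector()
    parser = xml.sax.make_parser()
    # Without this, Python >= 3.7.1 never calls the resolver.
    parser.setFeature(handler.feature_external_ges, True)
    parser.setContentHandler(collector)
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks), resolver.requests


text, requests = parse_with_local_dtd(DOC)
print(repr(text))
for public_id, system_id in requests:
    print("RESOLVE:", public_id, system_id)
```

With the real DTD, the resolver would also be asked for the three .ent files shown in the output above, so a local lookup has to cover those system ids too.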



    --
    brainbot technologies ag
    boppstrasse 64 . 55118 mainz . germany
    fon +49 6131 211639-1 . fax +49 6131 211639-2
    http://brainbot.com/ mailto:
     
    Ralf Schmitt, Jun 11, 2004
    #2
