sax EntityResolver problem (expat?)

chris

hi,
sax beginner question i must admit:

i try to filter a simple XHTML document with a standard DTD declaration
(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">) in it.
sax gives the following error

which is an &nbsp; entity.
so i thought i just implement the EntityResolver class and use a local
copy of the DTD

# ========================
import xml.sax
import xml.sax.handler

class XHTMLResolver(xml.sax.handler.EntityResolver, object):

    def resolveEntity(self, publicId, systemId):
        return 'http://localhost/xhtml1-transitional.dtd'

reader = xml.sax.make_parser()
reader.setEntityResolver(XHTMLResolver())
# ========================

problem is, it seems expat does not use this resolver as i get the same
error again. i also tried the following, which is not supported anyhow:

reader.setFeature('http://xml.org/sax/features/external-parameter-entities', True)
external parameter entities

is the XHTMLResolver class not the way it should be? or do i have to set
another feature/property?
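
For anyone wondering which of the entity-related features a reader will accept, here is a minimal sketch in Python 3 syntax. It assumes the default expat-based reader returned by xml.sax.make_parser(); probe_feature is a hypothetical helper name, not part of xml.sax.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_external_ges, feature_external_pes


def probe_feature(reader, name):
    """Try to switch a SAX feature on and report how the reader reacts."""
    try:
        reader.setFeature(name, True)
    except SAXNotSupportedException as exc:
        # The reader knows the feature but cannot turn it on.
        return "not supported: %s" % exc
    except SAXNotRecognizedException as exc:
        # The reader has never heard of this feature name.
        return "not recognized: %s" % exc
    return "enabled"


reader = xml.sax.make_parser()  # assumed to be the expat-based reader
for feature in (feature_external_ges, feature_external_pes):
    print(feature, "->", probe_feature(reader, feature))
```

Features have to be set before parsing starts, so a probe like this belongs right after make_parser().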


ultimately i do not want to use the http://localhost copy but i would
like to read the local file (just with open(...) or something) and go
from there. is that possible? do i have to


thanks a lot
chris
 
Ralf Schmitt


That's the way it works for me. You can also just open() your DTD
files and return an open file handle. Note that when using the above
DTD, your resolveEntity will be called more than once with different IDs.

--------------------------------
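# Python 2 syntax (print statement)
from xml.sax import saxutils, handler, make_parser, xmlreader

class Handler(handler.ContentHandler):
    # The same object is used as content handler and as entity resolver.
    def resolveEntity(self, publicid, systemid):
        print "RESOLVE:", publicid, systemid

        # Open the local file named after the last path component of the system ID.
        return open(systemid[systemid.rfind('/')+1:], "rb")

    def characters(self, s):
        print repr(s)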

doc = r'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<HTML>
&nbsp;&auml;
</HTML>
'''

h = Handler()
parser = make_parser()
parser.setContentHandler(h)
parser.setEntityResolver(h)

parser.feed(doc)
parser.close()
-------
Output:

RESOLVE: -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
RESOLVE: -//W3C//ENTITIES Latin 1 for XHTML//EN xhtml-lat1.ent
RESOLVE: -//W3C//ENTITIES Symbols for XHTML//EN xhtml-symbol.ent
RESOLVE: -//W3C//ENTITIES Special for XHTML//EN xhtml-special.ent
u'\n'
u'\xa0'
u'\xe4'
u'\n'
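
A rough Python 3 sketch of the same idea, for readers on a current interpreter. Several things here are assumptions rather than part of the example above: the names LOCAL_DTDS, LocalResolver, TextCollector, parse_with_local_dtd and DOC are illustrative, and the "DTD" is a two-entity in-memory stand-in, not the real XHTML DTD. Recent Python 3 releases leave external general entities switched off in xml.sax by default, so the sketch turns that feature on explicitly before parsing.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Stand-in for local DTD copies, keyed by public ID (hypothetical content:
# only the two entities used in DOC, not the real XHTML DTD).
LOCAL_DTDS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN":
        b'<!ENTITY nbsp "&#160;">\n<!ENTITY auml "&#228;">\n',
}

DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b'<html><body>&nbsp;&auml;</body></html>'
)


class LocalResolver(EntityResolver):
    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        # Unknown IDs get an empty stream so nothing is fetched over HTTP.
        source.setByteStream(io.BytesIO(LOCAL_DTDS.get(publicId, b"")))
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        self.chunks.append(content)


def parse_with_local_dtd(data):
    parser = xml.sax.make_parser()
    # Without this the resolver is not consulted on recent Python 3 releases.
    parser.setFeature(feature_external_ges, True)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.setEntityResolver(LocalResolver())
    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks)


print(repr(parse_with_local_dtd(DOC)))
```

To read real DTD files from disk instead, the BytesIO could be swapped for open(path, "rb"), with the path looked up from the public or system ID.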
 
