Can xml.sax NOT process the DTD?

marek jedlinski

I'm using xml.sax to extract certain content from xml files. (Background:
my job is software localization; these are bilingual xml files, from which
I need to extract translated text, e.g. for spellchecking).

It works fine, unless a particular file has a DOCTYPE declaration that
specifies a DTD. The parser then bails out, because the DTD is not
available (IOError, file not found). Since I don't have the DTDs, I need to
tell the SAX parser to ignore the DOCTYPE declaration. Is this possible,
please?

I've noticed that I can eliminate the error if I create 0-byte dtd files
and put them where the parser expects to find them, but this is a little
tedious, since there are plenty of different DTDs expected at different
locations.

Or is there another SAX parser for Python I could use instead?

Kind thanks for any suggestions,
..marek
 
Jim

> I've noticed that I can eliminate the error if I create 0-byte dtd files
> and put them where the parser expects to find them, but this is a little
> tedious, since there are plenty of different DTDs expected at different
> locations.

How about overriding the entity resolver in some way like this:

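    from xml.sax.handler import EntityResolver

    class xResolver(EntityResolver):
        def resolveEntity(self, publicId, systemId):
            # Send every DTD lookup to one local (possibly empty) file.
            return "dummy.dtd"

and then calling parser.setEntityResolver(xResolver()) on your parser?
The parser will then always look in the same DTD file, which you say
works for you.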

Jim
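
Another option that avoids dummy files altogether is to ask the parser not to load external entities in the first place. Below is a minimal sketch, assuming the default expat-based reader that ships with Python; TextCollector, extract_text and SAMPLE are made-up names for illustration, not part of xml.sax. As far as I know, since Python 3.7.1 the SAX parser no longer processes external general entities by default, so on a recent interpreter the setFeature call only makes that choice explicit.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler: gathers all character data in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # expat may deliver text in several pieces, so collect and join later.
        self.chunks.append(content)


def extract_text(xml_bytes):
    """Parse xml_bytes without fetching any external DTD and return its text."""
    parser = xml.sax.make_parser()
    # Do not load external general entities. With the expat-based reader
    # this also means the external DTD named in the DOCTYPE is not opened.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


# The DTD referenced here does not exist anywhere on disk.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE doc SYSTEM "no-such-file.dtd">\n'
    b'<doc><target>Hallo Welt</target></doc>'
)

print(extract_text(SAMPLE))
```

One caveat: entities that are declared only in the skipped DTD will not be expanded, so text that relies on them may come out incomplete.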
 
