scott
I've got an XML feed from a vendor that is not well-formed, and having
them change it is not an option. I'm trying to figure out how to
create an error-handler that will ignore the invalid token and continue
on.
The file is large, so I'd prefer not to put it all in memory or save it
off and strip out the bad characters before I parse it.
I've included one of the problematic characters in a small XML snippet
below.
I'm new to Python, and I don't know how to accomplish this. Any help is
greatly appreciated!
-----------------------------------------------------------------
Here is my code:
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
import StringIO
class ErrorHandler:
    def __init__(self, parser):
        self.parser = parser

    def warning(self, msg):
        print '*** (ErrorHandler.warning) msg:', msg

    def error(self, msg):
        print '*** (ErrorHandler.error) msg:', msg

    def fatalError(self, msg):
        print msg

class ContentHandler(ContentHandler):
    def __init__(self):
        pass

    def startElement(self, name, attrs):
        pass

    def characters(self, ch):
        pass

    def endElement(self, name):
        pass
xmlstr = """
<cities>
<city>
<name>Tampa</name>
<description>A great city and place to live</description>
</city>
<city>
<name>Clearwater</name>
<description>Beautiful beaches</description>
</city>
</cities>
"""
parser = make_parser()
curHandler = ContentHandler()
errorHandler = ErrorHandler(parser)
parser.setContentHandler(curHandler)
parser.setErrorHandler(errorHandler)
parser.parse(StringIO.StringIO(xmlstr))
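
One possible way to avoid both holding the whole file in memory and saving a cleaned copy is to filter the feed chunk by chunk and pass each cleaned chunk to the parser's incremental feed() method. What follows is only a rough sketch under several assumptions: it is written for Python 3 (the code above is Python 2), it assumes the bad characters are control characters that XML 1.0 does not allow, and it assumes the feed is in an ASCII-compatible encoding such as UTF-8. The names parse_filtered, NameCollector, and chunk_size are made up for illustration.

```python
import io
import re
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

# Assumption: the offending bytes are control characters that XML 1.0 forbids
# (everything below 0x20 except tab, line feed, and carriage return).
_INVALID_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_filtered(stream, handler, chunk_size=64 * 1024):
    """Read a binary stream in chunks, drop forbidden bytes, feed the parser."""
    parser = make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Only one chunk is held in memory at a time.
        parser.feed(_INVALID_XML_BYTES.sub(b"", chunk))
    parser.close()


class NameCollector(ContentHandler):
    """Hypothetical handler that collects the text of every <name> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buf = None

    def startElement(self, name, attrs):
        if name == "name":
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "name" and self._buf is not None:
            self.names.append("".join(self._buf))
            self._buf = None


if __name__ == "__main__":
    # A vertical tab (0x0b) stands in for the vendor's bad character.
    feed = io.BytesIO(b"<cities><city><name>Tam\x0bpa</name></city></cities>")
    collector = NameCollector()
    parse_filtered(feed, collector)
    print(collector.names)
```

For a real feed, the io.BytesIO object would be replaced by a file opened in binary mode. This only helps with stray forbidden characters; other problems, such as an unescaped ampersand or mismatched tags, would still be reported as fatal errors.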