How to force SAX parser to ignore encoding problems

Łukasz

Hi,
I have a problem with my XML parser (built with the xml.sax
package). When the parser finds an invalid character (in a CDATA
section), for example �, it throws a SAXParseException.

Is there any way to just ignore this kind of problem? Maybe there is a
way to set up the parser in a less strict mode?

I know that I can catch this exception, determine whether it is this
kind of problem, and then ignore it, but I am asking about a global
setting.
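
For reference, the catch-and-inspect fallback mentioned above might look roughly like the sketch below. The `TextCollector` handler and the `parse_or_report` helper are hypothetical names made up for illustration; only the `xml.sax` calls come from the standard library. With the default error handler, parsing stops at the first such error, so this reports the problem rather than skipping over it.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: collects the character data delivered so far."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_or_report(data: bytes):
    """Return (handler, error); error is None if the input was well-formed."""
    handler = TextCollector()
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        # Parsing stops here; exc.getLineNumber() and exc.getMessage()
        # describe where and why the parser gave up.
        return handler, exc
    return handler, None
```

The caller can then inspect the returned exception and decide whether the document should be skipped or the error re-raised.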
 
Łukasz

Hi,
I have a problem with my XML parser (built with the xml.sax
package). When the parser finds an invalid character (in a CDATA
section), for example ,

After sending this message I noticed that the example invalid
characters do not display on some platforms :)
 
Stefan Behnel

Łukasz said:
I have a problem with my XML parser (built with the xml.sax
package). When the parser finds an invalid character (in a CDATA
section), for example �, it throws a SAXParseException.

Is there any way to just ignore this kind of problem? Maybe there is a
way to set up the parser in a less strict mode?

I know that I can catch this exception, determine whether it is this
kind of problem, and then ignore it, but I am asking about a global
setting.

The parser from libxml2 that lxml provides has a recovery option, i.e. it
can keep parsing regardless of errors and will drop the broken content.

However, it is *always* better to fix the input if you have any access to it.
Broken XML is *not* XML at all. If you can't fix the source, you can never
be sure that the data you received is in any way complete or even usable.
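
A minimal sketch of the recovery option described above, assuming lxml is installed. `parse_with_recovery` is a hypothetical helper name; `etree.XMLParser(recover=True)` and the parser's `error_log` are part of lxml. What survives of the broken content depends on the libxml2 version, so inspect the result rather than trusting it.

```python
from lxml import etree


def parse_with_recovery(data: bytes):
    """Parse possibly broken XML, returning (root, errors).

    With recover=True the parser keeps going after errors and drops
    content it cannot make sense of, instead of raising XMLSyntaxError.
    """
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(data, parser)
    # error_log records the problems the parser worked around.
    return root, list(parser.error_log)


# Example input with an invalid control character inside a CDATA section.
broken = b"<root><item><![CDATA[good \x01 bad]]></item><item>fine</item></root>"
root, errors = parse_with_recovery(broken)
for entry in errors:
    print(entry.line, entry.message)
```

Logging the recorded errors, as in the loop above, at least makes it visible which documents were silently repaired.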

Stefan
 
