Collin VanDyck
Hey,
I have some user input that comes in from copy-pasted MS Word documents.
This poses a problem for our internal XML storage because the markup it
contains does not include namespace declarations for certain elements and/or
attributes. For instance, some input looks similar to this:
<xml>
<span font-style:="font-style:"/>
</xml>
This is of course abbreviated, but one can see that the attribute
font-style: has a namespace prefix of font-style.
I need to remove all such attributes, and to accomplish this, I created a
filter class that extends
org.xml.sax.helpers.XMLFilterImpl
My strategy is to remove the attributes that carry those prefixes each time
I receive a startElement() event.
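To make the strategy concrete, here is a minimal sketch of such a filter. It assumes a namespace-aware parser and well-formed input in which every prefix is declared; the class name PrefixedAttributeFilter and the helper survivingAttributes are made up for illustration and are not from my real code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops every attribute that belongs to a namespace.
public class PrefixedAttributeFilter extends XMLFilterImpl {

    public PrefixedAttributeFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            // With namespace processing on, a prefixed attribute has a non-empty URI.
            if (atts.getURI(i).isEmpty()) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), atts.getQName(i),
                        atts.getType(i), atts.getValue(i));
            }
        }
        // Pass the element on with only the unprefixed attributes.
        super.startElement(uri, localName, qName, kept);
    }

    // Hypothetical helper: local names of the attributes that reach the downstream handler.
    public static List<String> survivingAttributes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        PrefixedAttributeFilter filter = new PrefixedAttributeFilter(reader);
        final List<String> names = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add(atts.getLocalName(i));
                }
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

Note that the filter sits downstream of the parser, so it only ever sees events the parser was able to produce.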
However, when parsing the XML, I correctly receive all of the events leading
up to the element with the undesired attribute. Right before startElement()
would be called for the span element above, I get an exception:
09:51:55,864 ERROR [STDERR] org.xml.sax.SAXParseException: The prefix
"font-style" for attribute "font-style:" is not bound.
In order to combat this, I thought that I would insert prefix mappings after
the document start:
public void startDocument() throws SAXException {
super.startPrefixMapping("font-style", "http://will.be.removed/etc/etc");
super.startDocument();
}
and to close them out:
public void endDocument() throws SAXException {
super.endPrefixMapping("font-style");
super.endDocument();
}
However, I still get the same exception each time. Is this possible using
XMLFilterImpl? I can try to solve this problem using the 1.4 RE
implementation, but if I could do it using XSLT, that would be much
preferred.
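For reference, the regular-expression fallback I have in mind would look roughly like the sketch below. It assumes that "RE" means java.util.regex and that the offending attributes always have a name ending in a colon with a double-quoted value, as in the sample above; the class name WordAttributeStripper is made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step, run on the raw markup before it reaches the parser.
public class WordAttributeStripper {

    // Matches whitespace, a name ending in ':', '=', and a double-quoted value.
    private static final Pattern DANGLING_PREFIX =
            Pattern.compile("\\s+[A-Za-z_][\\w.-]*:\\s*=\\s*\"[^\"]*\"");

    public static String strip(String markup) {
        return DANGLING_PREFIX.matcher(markup).replaceAll("");
    }
}
```

This works on plain text, so it would also strip a matching string that happens to appear in character data rather than inside a tag.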
thanks!!!