SAXParser: ignore <!DOCTYPE> line

SnooPac

Hi.

I'm trying to use SAXParser simply to extract some attribute values from an
XML file, with no validation or anything.
I have it all set up and working, mostly.

The problem is that the XML file (which can't be changed) has the line at
the start that says:
<!DOCTYPE BlaCfg SYSTEM "../../../path/to/dtd/file/BlaCfg.dtd">

And this dtd cannot be found.

There are some workarounds I can do (like regenerating this XML file
temporarily without this line, or making a dummy dtd file, or...?), but I'd
rather just have my SAXParser ignore this line entirely. Is there any
property I can set to do this?

Here is some of my code, by the way. I think it's very standard:

DefaultHandler handler = new MyTempXmlParser();
SAXParserFactory factory = SAXParserFactory.newInstance();

// Parse the input
SAXParser saxParser = factory.newSAXParser();
saxParser.parse( new File("path/to/xml/file/file.xml"), handler);

I was hoping maybe for some sort of factory.setFeature() line that might do
this?
Or maybe within my handler?

Any hints would be appreciated.

Thanks,
Aiman
 
Stanimir Stamenkov

/SnooPac/:
DefaultHandler handler = new MyTempXmlParser();
SAXParserFactory factory = SAXParserFactory.newInstance();

Try setting, explicitly:

factory.setValidating(false);

It should be equivalent to setting the
"http://xml.org/sax/features/validation" feature to false.

/SnooPac/:
// Parse the input
SAXParser saxParser = factory.newSAXParser();
saxParser.parse( new File("path/to/xml/file/file.xml"), handler);

I was hoping maybe for some sort of factory.setFeature() line that might do
this?

http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description
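
For the kind of factory.setFeature() line the question asks about, one candidate is a parser-specific switch rather than a standard SAX feature. The following is a minimal sketch, assuming the parser behind the factory is Apache Xerces or the Xerces-derived parser bundled with later JDKs; other parsers may reject the feature with SAXNotRecognizedException. The class name SkipExternalDtd and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SkipExternalDtd {
    // Xerces-specific feature (assumption: the factory returns a Xerces-based parser).
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without loading the external DTD and returns the root element name. */
    static String rootElementName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        // Throws SAXNotRecognizedException if the parser does not know this feature.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        SAXParser parser = factory.newSAXParser();

        final String[] root = new String[1];
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first (root) element
                }
            }
        });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE BlaCfg SYSTEM \"../../../path/to/dtd/file/BlaCfg.dtd\">"
                   + "<BlaCfg name=\"demo\"/>";
        System.out.println(rootElementName(xml));
    }
}
```

With the feature off and validation off, the DOCTYPE line is still read, but the DTD file it points to should not be opened.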
 
SnooPac

Stanimir Stamenkov said:
Try setting, explicitly:
factory.setValidating(false);

I haven't done exactly this, but I did check factory.isValidating() and
it was already false.

Stanimir Stamenkov said:
http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description

Thanks for the list.
From that page, I tried factory.setFeature("resolve-dtd-uris", false);
but that threw org.xml.sax.SAXNotRecognizedException:
Feature: resolve-dtd-uris

So it doesn't look like my JDK (Sun 1.4.2, I believe) matches what's on the
page you sent :)
Anyway, I don't really think setting "resolve-dtd-uris" to false would have
helped, but it looked like the closest thing to what I wanted.

Thanks though.
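
One possible reason for the SAXNotRecognizedException above: standard SAX features are identified by full URIs, and the identifiers on that page appear to be given relative to the prefix http://xml.org/sax/features/. Here is a small sketch of the difference, assuming the JDK's default parser; the class and method names are hypothetical, and whether the JDK 1.4.2 parser accepts the full resolve-dtd-uris name is not checked here.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

class FeatureNames {
    // Prefix shared by the standard SAX2 feature identifiers.
    static final String SAX_FEATURES = "http://xml.org/sax/features/";

    /** Returns true if a fresh factory accepts setting the named feature to false. */
    static boolean isAccepted(String featureName) throws ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            factory.setFeature(featureName, false);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // unknown name, or known but not settable to this value
        }
    }

    public static void main(String[] args) throws Exception {
        // Short identifier, as typed in the post above.
        System.out.println(isAccepted("resolve-dtd-uris"));
        // Fully qualified identifier for the validation feature.
        System.out.println(isAccepted(SAX_FEATURES + "validation"));
    }
}
```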
 
Paweł Stobiński

SnooPac said:
I was hoping maybe for some sort of factory.setFeature() line that might
do this?
Or maybe within my handler?


Hi,
I think you rather need to set your own EntityResolver:


// SAXBuilder comes from JDOM; EntityResolver and InputSource from org.xml.sax
final SAXBuilder parser = new SAXBuilder();
parser.setEntityResolver(new EntityResolver()
{
    public InputSource resolveEntity(String publicId, String systemId)
    {
        if (/* publicId is not right */)
            return new InputSource(
                new ByteArrayInputStream(
                    "<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        else
            return null; // let the parser resolve the entity itself
    }
});


BR.
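
The snippet above uses JDOM's SAXBuilder, while the original question uses plain JAXP. A rough JAXP equivalent, as a sketch only: the resolver is registered on the XMLReader obtained from the SAXParser, and it hands back an empty stand-in for every external entity. The class name StubDtdResolver and the calls counter are illustrative assumptions, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StubDtdResolver implements EntityResolver {
    int calls = 0; // how many external entities were replaced (for illustration)

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        calls++;
        // An empty reader stands in for the DTD, so nothing is read from disk or network.
        return new InputSource(new StringReader(""));
    }

    /** Parses xml with every external entity stubbed out; returns how many were replaced. */
    static int parseWithStub(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StubDtdResolver resolver = new StubDtdResolver();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler()); // replace with a real handler
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.calls;
    }
}
```

Returning null instead of an InputSource tells the parser to fall back to its default lookup, which is what fails when the DTD file is missing.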
 
SnooPac

OK, I used a variation of this, and it seems to be working,
although I don't really understand it, so it feels rather kludgy.

I didn't use SAXBuilder or anything. I just overrode the resolveEntity()
method from DefaultHandler (in my handler class).
The other problem I had was that my publicId was being reported as null, and
systemId was reported as file://path/to/myfile.dtd.
So I can't really do anything in my if statement concerning publicId,
and for now my resolveEntity() returns "<?xml ..." regardless. I guess I
could do some kind of processing on systemId (like checking whether it starts
with file:// and ends with .dtd, or something else that's simple).
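
A sketch of what such a handler might look like, with the systemId check suggested above. The class name AttributeCollector and the "name" attribute are hypothetical stand-ins for the real handler and the real attributes being extracted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // publicId is null for a SYSTEM doctype, so only systemId is inspected.
        if (systemId != null && systemId.endsWith(".dtd")) {
            return new InputSource(new StringReader("")); // empty stand-in DTD
        }
        return null; // anything else: default resolution
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        String name = atts.getValue("name"); // "name" is a hypothetical attribute
        if (name != null) {
            names.add(name);
        }
    }

    /** Parses xml and returns the collected attribute values in document order. */
    static List<String> collect(String xml) throws Exception {
        AttributeCollector handler = new AttributeCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }
}
```

SAXParser.parse(..., handler) registers the DefaultHandler as the entity resolver too, which is why overriding resolveEntity() in the handler class is enough.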

Another thing I don't understand is what exactly happens when I return that
<?xml... stuff. Will the parser now continue parsing the document using the
InputSource that's returned instead of the <!DOCTYPE> line from the XML file?

Thanks again,
Aiman

 
Paweł Stobiński

SnooPac said:
Another thing I don't understand, is what exactly happens when I return
that <?xml... stuff? Will the parser now continue parsing the document
using the InputSource thats returned instead of the <doctype> line from
the xml file?

No, in fact the <?xml... string I returned only served as an example. In your
case it should rather be <!DOCTYPE..., or any transformation of the input data
that avoids referring to unknown locations.

BR.
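
As far as the SAX API describes it, the InputSource returned from resolveEntity() is read in place of the external entity being resolved (here, the DTD file), while the document itself keeps being parsed from its original source. A small sketch to make that visible, assuming the JDK's default non-validating parser: the replacement text declares an entity, and the document then uses it. The class name, the entity name "greeting", and the sample document are invented for the demonstration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReplacementDtdDemo {
    /** Parses xml, serving replacementDtd for any external entity; returns the text content. */
    static String textOf(String xml, final String replacementDtd) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // This content is read as if it were the DTD file itself.
                return new InputSource(new StringReader(replacementDtd));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE BlaCfg SYSTEM \"BlaCfg.dtd\"><BlaCfg>&greeting;</BlaCfg>";
        System.out.println(textOf(xml, "<!ENTITY greeting \"hello\">"));
    }
}
```

An empty replacement simply means the parser sees a DTD with no declarations in it.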
 
Stanimir Stamenkov

/SnooPac/:
Although i havent done this exactly, I did check factory.isValidating() and
it was already set to false.

http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description

One may notice that on the above page the "validation" feature has
an unspecified default value, so one probably needs to call
'setValidating(false)' anyway, although the Java API describes false as
the default. You may even try calling:

factory.setFeature("http://xml.org/sax/features/validation", false);

You didn't say: did you actually try setting it explicitly? You
stated you only checked 'isValidating()', but that can be misleading
sometimes.

/SnooPac/:
Thanks for the list.
From that page, I tried factory.setFeature("resolve-dtd-uris", false);
but from that i got the exception: org.xml.sax.SAXNotRecognizedException:
Feature: resolve-dtd-uris

So it doesnt look like my jdk's (sun 1.4.2 i believe) matches whats in that
webpage you sent :)
Anyway, I dont really know if (or think that) setting "resolve-dtd-uris" to
false would have helped anyway, but it looked like the closest thing to what
I wanted.

I don't think this feature is any use for you.
 
Stanimir Stamenkov

/SnooPac/:
Although i havent done this exactly, I did check factory.isValidating() and
it was already set to false.

From my experience, the default behavior of parser implementations
is to validate when a DOCTYPE declaration has been encountered and
not to validate if no DOCTYPE has been specified, regardless of what
the initial JAXP factory.isValidating() returns. So you should call
factory.setValidating(true/false) explicitly if you want to trigger
exact behavior.
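
One way to check what a given parser actually does is to set the flag explicitly and record whether the DTD is requested at all. The sketch below assumes the parser bundled with recent JDKs, where a non-validating parse still appears to ask for the external DTD (for example, to pick up entity declarations); other parsers may behave differently. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ValidationVsDtdLoading {
    /** Returns true if the parser asked for an external entity (the DTD) while parsing xml. */
    static boolean requestsDtd(boolean validating, String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating); // set explicitly, not left to the default

        final boolean[] requested = {false};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                requested[0] = true;
                // Serve an empty DTD so the missing file is never opened.
                return new InputSource(new StringReader(""));
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return requested[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE BlaCfg SYSTEM \"BlaCfg.dtd\"><BlaCfg/>";
        System.out.println(requestsDtd(false, xml));
    }
}
```

If this reports that the DTD is requested even with validation off, then setValidating(false) alone would not be expected to cure a "DTD not found" failure.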
 
Hi,

In my XML I have the content <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">.

So while parsing I am getting SAXParseException: doctype not allowed in content.

I tried using factory.setFeature("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">", false);

But it gave org.xml.sax.SAXNotRecognizedException.

So I overrode the resolveEntity method in my custom handler as shown below.
@Override
public InputSource resolveEntity(String publicId, String systemId)
{
    if (systemId.contains("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">")) {
        return new InputSource(new StringReader(""));
    } else {
        return null;
    }
}

But this method is not being called during parsing. Could someone explain when this method is called?

Please answer, anybody.
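
For reference, resolveEntity() receives the public and system identifiers taken from the DOCTYPE, not the text of the declaration, and it is only invoked once the parser reaches an external entity it wants to load. A parse that stops earlier with a well-formedness error (which "doctype not allowed in content" may be, if the DOCTYPE does not come before the root element) would never get that far. Likewise, setFeature() expects a feature URI rather than DOCTYPE text. The sketch below records the two identifiers for a document whose DOCTYPE is in the usual place; the class name DoctypeIds is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DoctypeIds extends DefaultHandler {
    String publicId; // identifiers seen in the last resolveEntity() call
    String systemId;

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
        // Empty stand-in, so the DTD is not downloaded.
        return new InputSource(new StringReader(""));
    }

    /** Parses xml and returns the handler holding the identifiers that were passed in. */
    static DoctypeIds inspect(String xml) throws Exception {
        DoctypeIds handler = new DoctypeIds();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        DoctypeIds ids = inspect(
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
              + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"><html/>");
        System.out.println(ids.publicId);
        System.out.println(ids.systemId);
    }
}
```

A check inside resolveEntity() would therefore compare against the public identifier or the DTD URL, for example with systemId.endsWith("xhtml11.dtd").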
 
