validate XML with DTD and Xerces: Non-whitespace characters


Georg J. Stach

Hi,

as mentioned in the subject, I'd like to validate a simple XML document against a
simple DTD.
For this, I use Java and Xerces.
But when I have elements of this form:

<tag>some characters in here</tag>

Xerces always complains with:
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters
are not allowed in schema elements other than 'xs:appinfo' and
'xs:documentation'. Saw 'some characters in here'.

The XML-doc is this:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE durchwahlnummer SYSTEM "mydtd.dtd">
<mytag>123456</mytag>

------------

The DTD mydtd.dtd is this:

<!ELEMENT mytag (#PCDATA)>

------------

As you can see, mytag is explicitly declared with #PCDATA content, so the
"non-whitespace characters" error should not occur.

------------
The small Java program:

[..]
    try {
        DOMParser parser = new DOMParser();
        parser.setErrorHandler(new ParserError());

        parser.setFeature("http://xml.org/sax/features/validation", true);

        parser.parse(myDocument);
        doc = parser.getDocument();

    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (SAXNotRecognizedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (SAXNotSupportedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (SAXException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

}
[..]

------------

DTD validation is turned on via
parser.setFeature("http://xml.org/sax/features/validation", true);
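For comparison, here is a minimal sketch of DTD-only validation through the standard JAXP API (javax.xml.parsers) that ships with the JDK, rather than instantiating Xerces' DOMParser class directly. The class DtdCheck, its validate method, and the in-memory copy of mydtd.dtd served through an EntityResolver are hypothetical, chosen only to keep the example self-contained; in real use the parser would load mydtd.dtd from disk. With setValidating(true) and no schema-language property set, JAXP should request DTD validation only, so no XML Schema processing ought to be involved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdCheck {
    // Hypothetical in-memory copy of mydtd.dtd from the thread.
    static final String DTD = "<!ELEMENT mytag (#PCDATA)>";

    /** Parses xml with DTD validation and returns every problem reported. */
    public static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation; no schema language is set
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Serve "mydtd.dtd" from the string above; other IDs use default resolution.
            builder.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith("mydtd.dtd")
                            ? new InputSource(new StringReader(DTD))
                            : null);

            // Collect warnings and validity errors; parsing continues after each one.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
                public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });

            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness errors (rethrown by fatalError) and I/O or setup failures.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE mytag SYSTEM \"mydtd.dtd\">\n"
                + "<mytag>123456</mytag>\n";
        System.out.println(validate(xml));
    }
}
```

With this setup, the document whose DOCTYPE names mytag should come back with an empty list, while the durchwahlnummer version should report a mismatch between the DOCTYPE name and the root element.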

Does anyone know what's wrong and can help?
 

Richard Tobin

Georg J. Stach said:
Xerces always complains with:
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters
are not allowed in schema elements other than 'xs:appinfo' and
'xs:documentation'. Saw 'some characters in here'.

It seems to be treating your document as an XML schema rather than an
instance to be validated, but I have no idea what you are doing wrong.

-- Richard
 

Georg J. Stach

Richard said:
It seems to be treating your document as an XML schema rather than an
instance to be validated, but I have no idea what you are doing wrong.

That's my assumption, too.
However, according to the Xerces FAQ [1] (see "What validation
behavior do I expect from the default parser configuration?"), the code
should be right.

Maybe this turns out to be a more Xerces-specific question.
If somebody has further hints, don't hesitate to reply.


[1] http://xml.apache.org/xerces2-j/faq-pcfp.html


Cheers
Georg
 

Peter Flynn

Georg said:
Hi,

as mentioned above I'd like to validate a simple XML-document with a
simple DTD.
For this, I use Java and Xerces.

Don't. If you want standalone validation with a DTD, use a standalone
validating parser like onsgmls or rxp.
But when I have elements of this form:

<tag>some characters in here</tag>

Xerces always complains with:
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace
characters are not allowed in schema elements other than 'xs:appinfo' and
'xs:documentation'. Saw 'some characters in here'.

The XML-doc is this:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE durchwahlnummer SYSTEM "mydtd.dtd">
<mytag>123456</mytag>

------------

The DTD mydtd.dtd is this:

<!ELEMENT mytag (#PCDATA)>

The name you declare in the Document Type Declaration must be the
same as the name of the root element type. Change your XML file to

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE mytag SYSTEM "mydtd.dtd">
<mytag>123456</mytag>

(or change the DTD to declare durchwahlnummer instead).
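A validating parser reports such a mismatch as a validity error, but the rule itself is easy to check directly. Below is a minimal sketch using the JDK's non-validating DOM parser; the class DoctypeNameCheck is hypothetical, and the external DTD is deliberately replaced by an empty one because only the two names are compared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeNameCheck {
    /** Returns true if the name in the DOCTYPE equals the root element's name. */
    public static boolean doctypeMatchesRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Substitute an empty external DTD: only the two names matter here.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            DocumentType doctype = doc.getDoctype();
            return doctype != null
                    && doctype.getName().equals(doc.getDocumentElement().getTagName());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String original = "<!DOCTYPE durchwahlnummer SYSTEM \"mydtd.dtd\">\n<mytag>123456</mytag>";
        String corrected = "<!DOCTYPE mytag SYSTEM \"mydtd.dtd\">\n<mytag>123456</mytag>";
        System.out.println("original:  " + doctypeMatchesRoot(original));
        System.out.println("corrected: " + doctypeMatchesRoot(corrected));
    }
}
```

Run on the two variants from this thread, it should print false for the original and true for the corrected document.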
As you can see, mytag is explicitly declared with #PCDATA content, so the
"non-whitespace characters" error should not occur.

Your validator isn't giving you the whole story. If I test your original
with onsgmls, I get a much more explicit report:
$ onsgmls -wxml -s -E 5000 /usr/share/sgml/xml.dcl test.xml
onsgmls:/usr/share/sgml/xml.dcl:1:W: SGML declaration was not implied
onsgmls:test.xml:2:44:E: DTD did not contain element declaration for document type name
onsgmls:test.xml:3:6:E: document type does not allow element "mytag" here
onsgmls:test.xml:3:22:E: no document element
SGML validation exited abnormally with code 1 at Sun Sep 25 15:14:21
$

///Peter
 

Georg J. Stach

Hi Peter!
Don't. If you want standalone validation with a DTD, use a standalone
validating parser like onsgmls or rxp.

What are the reasons against using Xerces?
The name you declare in the Document Type Declaration must be the
same as the name of the root element type. Change your XML file to

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE mytag SYSTEM "mydtd.dtd">
<mytag>123456</mytag>

(or change the DTD to declare durchwahlnummer instead).

Oops, well, I adapted the DTD for this newsgroup message ;-) In the real
document, the root element and the document type name are the same. Unfortunately,
that doesn't change anything: Xerces still complains...

Your validator isn't giving you the whole story. If I test your original
with onsgmls, I get a much more explicit report: [...]

I see, _that_ could be one reason not to use Xerces, hmm? ;-)
I'll have a look at onsgmls.
But I really can't imagine that Xerces is unable to validate against a
DTD. There must be some quite simple error somewhere that I haven't found... in
most cases the problem isn't the application but the programmer ;-)


Georg
 

Peter Flynn

Georg said:
Hi Peter!


What reasons speak against the use of Xerces?

None: the other two I mentioned were merely suggestions. If Xerces
runs standalone, unassisted, from the command line, then by all
means use it. But AFAIK it's an API, with a wrapper in Java2, C++,
or Perl. That's a fine thing, but it's not a standalone parser-validator.
To make an adequate test when there is an unexplained error, you need
to strip away all the extraneous bits and get down to the
bare bones: an XML file, a DTD, and a parser.
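The JDK's own parser can serve as such a bare-bones command-line validator with a few lines of Java. This is a minimal sketch, assuming the DTD is found relative to the XML file as in the thread; the class name ValidateFile and the onsgmls-like file:line:column:severity: output (W for warnings, E for validity errors, F for fatal errors) are illustrative choices, not a Xerces feature.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidateFile {
    /** Validates one file against the DTD named in its DOCTYPE; returns the number of problems. */
    public static int validate(File file) {
        int[] problems = {0}; // mutable counter shared with the handler below
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                System.err.println(describe(file, e, "W"));
            }

            @Override
            public void error(SAXParseException e) { // validity error: report and keep going
                System.err.println(describe(file, e, "E"));
                problems[0]++;
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // same effect as the SAX validation feature
            factory.newSAXParser().parse(file, handler);
        } catch (SAXParseException e) { // fatal (well-formedness) error
            System.err.println(describe(file, e, "F"));
            problems[0]++;
        } catch (Exception e) { // missing file or DTD, parser setup problems
            System.err.println(file + ": " + e);
            problems[0]++;
        }
        return problems[0];
    }

    static String describe(File file, SAXParseException e, String severity) {
        return file.getName() + ":" + e.getLineNumber() + ":" + e.getColumnNumber()
                + ":" + severity + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        int total = 0;
        for (String name : args) {
            total += validate(new File(name));
        }
        System.exit(total == 0 ? 0 : 1); // non-zero exit status if anything was reported
    }
}
```

Usage would be along the lines of "javac ValidateFile.java" followed by "java ValidateFile test.xml", with mydtd.dtd sitting next to test.xml.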
uuups, well I adapted the DTD for this newsgroup messages ;-) In real the
root element and Document Type are the same.

That changes the problem entirely. Please post the accurate example.

///Peter
 

JAPISoft

Hi Georg,

I notice your document type declaration is wrong:

<!DOCTYPE durchwahlnummer SYSTEM "mydtd.dtd">

it should be

<!DOCTYPE mytag SYSTEM "mydtd.dtd">

Hope it helps,

Best regards,

A.Brillant
http://www.editix.com -- XML Editor and XSLT Debugger