validate XML with DTD and Xerces: Non-whitespace characters

Discussion in 'XML' started by Georg J. Stach, Sep 25, 2005.

  1. Hi,

    as mentioned above I'd like to validate a simple XML-document with a simple
    DTD.
    For this, I use Java and Xerces.
    But, when I have tags of this form:

    <tag>some characters in here</tag>

    Xerces always complains with:
    org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters
    are not allowed in schema elements other than 'xs:appinfo' and
    'xs:documentation'. Saw 'some characters in here'.

    The XML-doc is this:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE durchwahlnummer SYSTEM "mydtd.dtd">
    <mytag>123456</mytag>

    ------------

    The DTD mydtd.dtd is this:

    <!ELEMENT mytag (#PCDATA)>

    ------------

    As you can see, the mytag-tag is explicitly declared as PCDATA type, so the
    error with "non-Whitespace characters" should actually not occur.

    ------------
    The small Java-Program:

    [..]
    try {
        // org.apache.xerces.parsers.DOMParser
        DOMParser parser = new DOMParser();
        parser.setErrorHandler(new ParserError());

        // Turn on validation against the DTD named in the DOCTYPE.
        parser.setFeature("http://xml.org/sax/features/validation", true);

        parser.parse(myDocument);
        doc = parser.getDocument();

    } catch (IOException e) {
        // The document or its DTD could not be read.
        e.printStackTrace();
    } catch (SAXNotRecognizedException e) {
        // The parser does not know the feature name.
        e.printStackTrace();
    } catch (SAXNotSupportedException e) {
        // The parser knows the feature but cannot set it to this value.
        e.printStackTrace();
    } catch (SAXException e) {
        // Any other parse or validation failure.
        e.printStackTrace();
    }

    }
    [..]

    ------------

    The DTD-validation is turned on.
    (parser.setFeature("http://xml.org/sax/features/validation", true);)

    Does anyone know what's wrong and can help?

    --
    Cheers
    Georg
     
    Georg J. Stach, Sep 25, 2005
    #1

  2. In article <dh60t4$ea4$>,
    Georg J. Stach <> wrote:

    >Xerces always complains with:
    >org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters
    >are not allowed in schema elements other than 'xs:appinfo' and
    >'xs:documentation'. Saw 'some characters in here'.


    It seems to be treating your document as an XML schema rather than an
    instance to be validated, but I have no idea what you are doing wrong.
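For illustration only, here is one way to get this class of error: handing an instance document to an XML Schema loader. This is a sketch that assumes the JAXP validation API bundled with the JDK; the class and method names are made up, and nothing in the thread confirms that this is what the poster's program does. The exact message text may differ between parser versions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class InstanceAsSchema {
    /**
     * Tries to load the given text as an XML Schema document.
     * Returns the loader's complaint, or null if it loaded cleanly.
     * (Hypothetical helper, not part of the original program.)
     */
    static String loadAsSchema(String xml) {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An instance document, not a schema: the schema loader rejects it.
        System.out.println(loadAsSchema("<mytag>123456</mytag>"));
    }
}
```

If a program prints a schema-related message like this for a plain instance document, it is worth checking whether the document is being passed where a schema is expected.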

    -- Richard
     
    Richard Tobin, Sep 25, 2005
    #2

  3. Richard Tobin wrote:

    > It seems to be treating your document as an XML schema rather than an
    > instance to be validated, but I have no idea what you are doing wrong.


    That's my assumption, too.
    However, according to the Xerces page on [1] (read there "What validation
    behavior do I expect from the default parser configuration?") the code
    should be right.

    Maybe this turns out to be a more Xerces-specific question.
    If somebody has further hints, don't hesitate to reply.


    [1] http://xml.apache.org/xerces2-j/faq-pcfp.html


    Cheers
    Georg
     
    Georg J. Stach, Sep 25, 2005
    #3
  4. Georg J. Stach

    Peter Flynn Guest

    Georg J. Stach wrote:

    > Hi,
    >
    > as mentioned above I'd like to validate a simple XML-document with a
    > simple DTD.
    > For this, I use Java and Xerces.


    Don't. If you want standalone validation with a DTD, use a standalone
    validating parser like onsgmls or rxp.

    > But, when I have tags of this form:
    >
    > <tag>some characters in here</tag>
    >
    > Xerces always complains with:
    > org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace
    > characters are not allowed in schema elements other than 'xs:appinfo' and
    > 'xs:documentation'. Saw 'some characters in here'.
    >
    > The XML-doc is this:
    >
    > <?xml version="1.0" encoding="ISO-8859-1" ?>
    > <!DOCTYPE durchwahlnummer SYSTEM "mydtd.dtd">
    > <mytag>123456</mytag>
    >
    > ------------
    >
    > The DTD mydtd.dtd that:
    >
    > <!ELEMENT mytag (#PCDATA)>


    The name you declare in the Document Type Declaration must be the
    same as the name of the root element type. Change your XML file to

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE mytag SYSTEM "mydtd.dtd">
    <mytag>123456</mytag>

    (or change the DTD to declare durchwahlnummer instead).
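The rule above can be checked directly in Java. The following is a minimal sketch, assuming the JAXP DOM parser that ships with the JDK; the class and method names are invented for this example, and the DTD is placed in an internal subset so that the snippet needs no external file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DoctypeRootCheck {
    /**
     * Returns true if the name in the DOCTYPE declaration equals the
     * name of the root element. (Hypothetical helper for illustration.)
     */
    static boolean doctypeMatchesRoot(String xml) throws Exception {
        // A non-validating parse is enough: we only compare two names.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        if (doc.getDoctype() == null) {
            return false; // no DOCTYPE declaration at all
        }
        String declared = doc.getDoctype().getName();
        String actual = doc.getDocumentElement().getNodeName();
        return declared.equals(actual);
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!ELEMENT mytag (#PCDATA)>";
        String posted = "<!DOCTYPE durchwahlnummer [" + dtd + "]><mytag>123456</mytag>";
        String fixed = "<!DOCTYPE mytag [" + dtd + "]><mytag>123456</mytag>";
        System.out.println(doctypeMatchesRoot(posted)); // false
        System.out.println(doctypeMatchesRoot(fixed));  // true
    }
}
```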

    > As you can see, the mytag-tag is explicitly declared as PCDATA type, so
    > the error with "non-Whitespace characters" should actually not occur.


    Your validator isn't giving you the whole story. If I test your original
    with onsgmls, I get a much more explicit report:

    > $ onsgmls -wxml -s -E 5000 /usr/share/sgml/xml.dcl test.xml
    > onsgmls:/usr/share/sgml/xml.dcl:1:W: SGML declaration was not implied
    > onsgmls:test.xml:2:44:E: DTD did not contain element declaration for document type name
    > onsgmls:test.xml:3:6:E: document type does not allow element "mytag" here
    > onsgmls:test.xml:3:22:E: no document element
    > SGML validation exited abnormally with code 1 at Sun Sep 25 15:14:21
    > $


    ///Peter
     
    Peter Flynn, Sep 25, 2005
    #4
  5. Hi Peter!

    > Don't. If you want standalone validation with a DTD, use a standalone
    > validating parser like onsgmls or rxp.


    What reasons speak against the use of Xerces?

    > The name you declare in the Document Type Declaration must be the
    > same as the name of the root element type. Change your XML file to
    >
    > <?xml version="1.0" encoding="ISO-8859-1" ?>
    > <!DOCTYPE mytag SYSTEM "mydtd.dtd">
    > <mytag>123456</mytag>
    >
    > (or change the DTD to declare durchwahlnummer instead).


    Oops, I adapted the DTD for this newsgroup message ;-) In reality the
    root element and the document type name are the same. Unfortunately,
    that doesn't change Xerces' complaint...


    > Your validator isn't giving you the whole story. If I test your original
    > with onsgmls, I get a much more explicit report: [...]


    I see, _that_ could be one reason not to use Xerces, hm? ;-)
    I'll have a look at onsgmls.
    But I really can't imagine that Xerces isn't able to validate against a
    DTD. There must be a quite simple error somewhere that I haven't found...
    in most cases the problem isn't the application but the programmer ;-)


    Georg
     
    Georg J. Stach, Sep 25, 2005
    #5
  6. Georg J. Stach wrote:

    > What reasons speak against the use of Xerces?


    Xerces is a library, not a ready-to-use command line tool.
     
    Jürgen Kahrs, Sep 25, 2005
    #6
  7. Georg J. Stach

    Peter Flynn Guest

    Georg J. Stach wrote:

    > Hi Peter!
    >
    >> Don't. If you want standalone validation with a DTD, use a standalone
    >> validating parser like onsgmls or rxp.

    >
    > What reasons speak against the use of Xerces?


    None: the other two I mention were merely suggestions. If Xerces
    runs standalone, unassisted, from the command line, then by all
    means use it. But AFAIK it's an API, with a wrapper in Java2, C++,
    or Perl. Which is a fine thing, but it's not a standalone
    parser-validator. To make an adequate test where there is an
    unexplained error, you need to remove all extraneous bits and get
    down to the bare bones: an XML file, a DTD, and a parser.
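If the test has to stay in Java, a bare-bones equivalent might look like the sketch below. It assumes the JAXP SAX parser bundled with the JDK rather than a direct Xerces download, and the class name DtdCheck is made up for this example. It takes one XML file, validates it against the DTD named in its DOCTYPE, and reports every validity error with its line and column.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdCheck {
    /** Validates the file against its DTD and returns the problems found. */
    static List<String> validate(File xmlFile) {
        List<String> problems = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation only
            SAXParser parser = factory.newSAXParser();
            parser.parse(xmlFile, new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable: collect and continue.
                    problems.add(e.getLineNumber() + ":" + e.getColumnNumber()
                            + ": " + e.getMessage());
                }
                // fatalError is inherited and rethrows, which ends the parse.
            });
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DtdCheck file.xml");
            return;
        }
        for (String problem : validate(new File(args[0]))) {
            System.out.println(problem);
        }
    }
}
```

Running it on the XML file and DTD alone, with no other application code involved, should show whether the parser or the surrounding program is at fault.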

    > uuups, well I adapted the DTD for this newsgroup messages ;-) In real the
    > root element and Document Type are the same.


    That changes the problem entirely. Please post the accurate example.

    ///Peter
     
    Peter Flynn, Sep 25, 2005
    #7
  8. Georg J. Stach

    JAPISoft Guest

    Hi Georg,

    I notice your DOCTYPE declaration is wrong:

    <!DOCTYPE durchwahlnummer SYSTEM "mydtd.dtd">

    it should be

    <!DOCTYPE mytag SYSTEM "mydtd.dtd">

    Hope it helps,

    Best regards,

    A.Brillant
    http://www.editix.com -- XML Editor and XSLT Debugger


     
    JAPISoft, Sep 26, 2005
    #8
