Well-formedness and undeclared general entity references

Stanimir Stamenkov

Is an XML document using undeclared general entity references not
well-formed? For example:

<test>
foo
&bar;
</test>

If yes, what is the difference when a non-validating processor parses
the following example:

<!DOCTYPE test SYSTEM "empty.dtd">
<test>
foo
&bar;
</test>

Where "empty.dtd" is really an empty file.

I'm experimenting with Xerces2-J v2.9.1 and the SAX interfaces. The
first example gives me a fatal error, while the second one parses OK,
reporting the &bar; reference as a skipped entity.
 
Stanimir Stamenkov

Fri, 10 Apr 2009 14:57:27 +0300, /Stanimir Stamenkov/:
Is an XML document using undeclared general entity references not
well-formed? For example:

<test>
foo
&bar;
</test>

If yes, what's the difference given using a non-validating processor and
the given example:

<!DOCTYPE test SYSTEM "empty.dtd">
<test>
foo
&bar;
</test>

Where "empty.dtd" is really an empty file.

I'm experimenting with Xerces2-J v2.9.1 and the SAX interfaces, and the
first example gives me a fatal error, while the second one parses o.k.
reporting the &bar; reference as skipped entity.

Reading further in the specification, I found:
*Well-formedness constraint: Entity Declared*

[...]

Note that non-validating processors are not obligated to read and
process entity declarations occurring in parameter entities or in
the external subset; for such documents, the rule that an entity
must be declared is a well-formedness constraint only if
standalone='yes'.

Adding a standalone="yes" declaration to the second example makes it
fail just like the first one:

<?xml version="1.0" encoding="US-ASCII" standalone="yes"?>
<!DOCTYPE test SYSTEM "empty.dtd">
<test>
foo
&bar;
</test>

But adding standalone="no" to the first example doesn't change
anything for me using Xerces2-J v2.9.1:

<?xml version="1.0" encoding="US-ASCII" standalone="no"?>
<test>
foo
&bar;
</test>

So, is it generally possible to parse an XML document with
undeclared entities?
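
For anyone who wants to reproduce the four cases above, here is a minimal sketch. It assumes the JAXP parser bundled with the JDK (a Xerces derivative, so it may not behave exactly like Xerces2-J v2.9.1). The class name, the parse helper, and the resolver that returns empty text in place of "empty.dtd" are my own illustrative choices, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class UndeclaredEntityDemo {

    /** Parses without validation; returns "fatal" or "ok skipped=[names]". */
    static String parse(String xml) {
        final List<String> skipped = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void skippedEntity(String name) {
                skipped.add(name); // references the parser chose not to expand
            }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // DefaultHandler.fatalError rethrows, so nothing is printed to stderr.
            reader.setErrorHandler(handler);
            // Assumption: stands in for the empty "empty.dtd" file on disk.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok skipped=" + skipped;
        } catch (SAXParseException e) {
            return "fatal";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<test>\nfoo\n&bar;\n</test>";
        String doctype = "<!DOCTYPE test SYSTEM \"empty.dtd\">\n";
        String yes = "<?xml version=\"1.0\" encoding=\"US-ASCII\" standalone=\"yes\"?>\n";
        String no = "<?xml version=\"1.0\" encoding=\"US-ASCII\" standalone=\"no\"?>\n";

        System.out.println("no DTD:                  " + parse(body));
        System.out.println("external DTD:            " + parse(doctype + body));
        System.out.println("standalone=yes with DTD: " + parse(yes + doctype + body));
        System.out.println("standalone=no, no DTD:   " + parse(no + body));
    }
}
```

Only the variant that has an external subset and is not declared standalone="yes" should get through, with "bar" reported via skippedEntity. The other three are expected to end in a fatal error.";
String doctype = "<!DOCTYPE test SYSTEM \"empty.dtd\">\n";
String yes = "<?xml version=\"1.0\" encoding=\"US-ASCII\" standalone=\"yes\"?>\n";
String no = "<?xml version=\"1.0\" encoding=\"US-ASCII\" standalone=\"no\"?>\n";
assert UndeclaredEntityDemo.parse(body).equals("fatal");
assert UndeclaredEntityDemo.parse(doctype + body).startsWith("ok");
assert UndeclaredEntityDemo.parse(doctype + body).contains("bar");
assert UndeclaredEntityDemo.parse(yes + doctype + body).equals("fatal");
assert UndeclaredEntityDemo.parse(no + body).equals("fatal");
</test>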
 
Martin Honnen

Stanimir said:
Is an XML document using undeclared general entity references not
well-formed? For example:

<test>
foo
&bar;
</test>

See http://www.w3.org/TR/xml/#sec-references:
"Well-formedness constraint: Entity Declared

In a document without any DTD, [...] for an entity reference that does
not occur within the external subset or a parameter entity, the Name
given in the entity reference MUST match that in an entity declaration
that does not occur within the external subset or a parameter entity,
except that well-formed documents need not declare any of the following
entities: amp, lt, gt, apos, quot."

If you don't have a DTD, not even an internal subset, then you can only
reference the predefined entities (amp, lt, gt, apos, quot). An entity
reference to anything else violates well-formedness.

If yes, what's the difference given using a non-validating processor and
the given example:

<!DOCTYPE test SYSTEM "empty.dtd">
<test>
foo
&bar;
</test>

Where "empty.dtd" is really an empty file.

Not sure about that one.
 
Stanimir Stamenkov

Fri, 10 Apr 2009 14:43:32 +0200, /Martin Honnen/:
Stanimir said:
Is an XML document using undeclared general entity references not
well-formed? For example:

<test>
foo
&bar;
</test>

See http://www.w3.org/TR/xml/#sec-references:
"Well-formedness constraint: Entity Declared

In a document without any DTD, [...] for an entity reference that does
not occur within the external subset or a parameter entity, the Name
given in the entity reference MUST match that in an entity declaration
that does not occur within the external subset...

With a "document without any DTD", the wording "the Name given in
the entity reference MUST match that in an entity declaration that
does not occur within the external subset" is somewhat confusing to
me, as there is obviously no external subset. The other cases
(stripped from the quotation), "a document with only an internal DTD
subset which contains no parameter entity references, or a document
with standalone='yes'", seem fine, but see my other message in this
thread regarding the paragraph following the quoted one in the
specification:
Note that non-validating processors are not obligated to read and
process entity declarations occurring in parameter entities or in
the external subset; for such documents, the rule that an entity
must be declared is a well-formedness constraint only if
standalone='yes'.

Even if I specify standalone='no' for the given example, the parsing
fails for me (again, I'm using Xerces2-J, for what it's worth).
 
Richard Tobin

Stanimir Stamenkov said:
Is an XML document using undeclared general entity references not
well-formed? For example:

<test>
foo
&bar;
</test>

That document is not well-formed.

If yes, what's the difference given using a non-validating processor
and the given example:

<!DOCTYPE test SYSTEM "empty.dtd">
<test>
foo
&bar;
</test>

Where "empty.dtd" is really an empty file.

A non-validating processor isn't required to read empty.dtd, so it may
not be able to tell that it doesn't contain any declarations. The
idea is that an error that can only be detected by reading the
external DTD is a validity error, rather than a well-formedness error.
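
As a rough illustration, the rule quoted from the specification in this thread can be written down as a small predicate. This is only a sketch: the class, method, and parameter names are made up for the example, and it assumes the entity reference itself occurs in the document body, not inside the external subset or a parameter entity.

```java
class EntityDeclaredRule {

    /**
     * Returns true when a reference to an undeclared general entity is a
     * well-formedness error, and false when it is only a validity error.
     *
     * @param hasExternalSubset      the DOCTYPE names an external subset
     * @param hasParameterEntityRefs the internal subset contains parameter entity references
     * @param standaloneYes          the XML declaration says standalone='yes'
     */
    static boolean isWellFormednessError(boolean hasExternalSubset,
                                         boolean hasParameterEntityRefs,
                                         boolean standaloneYes) {
        if (standaloneYes) {
            // With standalone='yes' the declaration must be visible without
            // reading any external markup declarations.
            return true;
        }
        // No DTD at all, or only an internal subset free of parameter entity
        // references: every declaration is in plain sight of any processor.
        return !hasExternalSubset && !hasParameterEntityRefs;
    }

    public static void main(String[] args) {
        System.out.println("no DTD:                  " + isWellFormednessError(false, false, false));
        System.out.println("external DTD:            " + isWellFormednessError(true, false, false));
        System.out.println("external DTD, standalone: " + isWellFormednessError(true, false, true));
    }
}
```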

-- Richard
 
Stanimir Stamenkov

10 Apr 2009 14:26:52 GMT, /Richard Tobin/:
A non-validating processor isn't required to read empty.dtd, so it may
not be able to tell that it doesn't contain any declarations. The
idea is that an error that can only be detected by reading the
external DTD is a validity error, rather than a well-formedness error.

So I may take this issue to the Xerces mailing list, because I can
see that "empty.dtd" does get read. I checked this first by
registering a SAX LexicalHandler and observing the startEntity and
endEntity events, then by declaring the following in "empty.dtd":

<!ENTITY fu "foo">

and replacing "foo" with "&fu;" (without the quotes) in the original
example.
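
A check along those lines could look roughly like the sketch below. It assumes the JAXP parser bundled with the JDK, not Xerces2-J v2.9.1 itself; the class name, the trace helper, and the resolver that serves the DTD text from a string in place of the "empty.dtd" file are hypothetical stand-ins.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DtdReadCheck {

    /** Returns the lexical entity events seen plus the trimmed character data. */
    static String trace(String xml, final String dtdText) {
        final List<String> events = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                events.add("start " + name); // "[dtd]" denotes the external subset
            }
            @Override
            public void endEntity(String name) {
                events.add("end " + name);
            }
            @Override
            public void skippedEntity(String name) {
                events.add("skipped " + name);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            // Assumption: serves dtdText in place of the "empty.dtd" file.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(dtdText)));
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return events + " text=" + text.toString().trim();
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE test SYSTEM \"empty.dtd\">\n"
                   + "<test>\n&fu;\n&bar;\n</test>";
        System.out.println(trace(xml, "<!ENTITY fu \"foo\">"));
    }
}
```

If the external subset is read, the trace should include start and end events for "[dtd]", and "&fu;" should be expanded to its replacement text.";
String t = DtdReadCheck.trace(xml, "<!ENTITY fu \"foo\">");
assert t.contains("start [dtd]") : t;
assert t.contains("end [dtd]") : t;
assert t.contains("foo") : t;
</test>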
 
