Parameter-entity inclusion in mark-up declarations

Steven Simpson

Hello,

If an XML processor encounters a 'PEReference' in a 'markupdecl' (but
not in 'EntityValue'), are there circumstances where it does not have to
include it (i.e. use the replacement text and re-parse), e.g. when not
validating? Or perhaps it would never have to have such an encounter
unless it was already required to include?

If it does not include, what can it assume about the subsequent text?
For example, after "<!ELEMENT %erk;", can it assume that "%erk;" only
expands to a 'Name', and covers no part of the 'contentspec'? Or does
it have to abandon any detailed parsing until it synchronizes with the
closing ">"?

Where exactly is a 'PEReference' allowed in a 'markupdecl' (other than
inside 'EntityValue')? Where "SYSTEM"/"PUBLIC"/"NDATA" are expected?
As the 'Name' of an entity or notation being declared? Inside, or as
the whole of, a 'SystemLiteral' or 'PubidLiteral'?

Thanks,

Steven

Richard Tobin

Steven Simpson said:
If an XML processor encounters a 'PEReference' in a 'markupdecl' (but
not in 'EntityValue'), are there circumstances where it does not have to
include it (i.e. use the replacement text and re-parse), e.g. when not
validating? Or perhaps it would never have to have such an encounter
unless it was already required to include?

A non-validating processor can choose whether to expand parameter
entities, but once it has skipped one it must not process any further
entity or attribute-list declarations (because it may have missed an
overriding declaration inside the skipped entity).
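
To make the rule above concrete, here is a minimal sketch in Python. It is only a toy model, not a parser: the function name `process_declarations` and the `(kind, name)` tuples are hypothetical, and PE references are assumed to appear only between declarations.

```python
def process_declarations(items, expand_pes):
    """Toy model of a non-validating processor walking a DTD.

    items is a list of (kind, name) pairs, where kind is one of
    'entity', 'attlist', 'element' or 'peref' (a PE reference that
    sits between declarations). All names here are illustrative.
    """
    processed = []
    skipped_pe = False
    for kind, name in items:
        if kind == "peref":
            if not expand_pes:
                # From now on an earlier, binding declaration may be hidden
                # inside the entity we did not read.
                skipped_pe = True
            continue
        if skipped_pe and kind in ("entity", "attlist"):
            # Must not be processed: it could be overridden by a
            # declaration in the skipped entity.
            continue
        processed.append((kind, name))
    return processed


dtd = [("entity", "a"), ("peref", "ext"), ("entity", "b"),
       ("element", "c"), ("attlist", "c")]
print(process_declarations(dtd, expand_pes=False))
print(process_declarations(dtd, expand_pes=True))
```

With `expand_pes=False`, the declarations of entity `b` and the attribute list for `c` are dropped because they follow the skipped reference; with `expand_pes=True`, nothing is dropped.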

In practice, I don't know of any parsers that selectively expand PEs.
Since the internal subset cannot contain PE references inside
declarations, a parser will only encounter such things if it is
processing the external subset. Since the external subset is exactly
the same as an external PE, I would expect any processor that read the
external subset to handle all PEs.

Steven Simpson said:
If it does not include, what can it assume about the subsequent text?
For example, after "<!ELEMENT %erk;", can it assume that "%erk;" only
expands to a 'Name', and covers no part of the 'contentspec'?

No. For example, this is valid:

<!ENTITY % x "foo EMPTY">
<!ELEMENT %x;>

Furthermore, the proper nesting requirement is only a validity
constraint, so a non-validating parser should even accept

<!ENTITY % x "foo EMPTY> <!ATTLIST foo bar CDATA 'zzz'">
<!ELEMENT %x;>

Steven Simpson said:
Where exactly is a 'PEReference' allowed in a 'markupdecl' (other than
inside 'EntityValue')? Where "SYSTEM"/"PUBLIC"/"NDATA" are expected?
As the 'Name' of an entity or notation being declared? Inside, or as
the whole of, a 'SystemLiteral' or 'PubidLiteral'?

Almost anywhere, *but* the replacement text is enlarged with a space
at each end (see section 4.4.8 of the spec), so it won't work to put
one in the middle of a name for example. This effectively ensures
that PEs must expand to a sequence of complete tokens. (The
space-enlargement doesn't happen in entity values, so you can
construct arbitrary text out of PEs that way.)
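
The padding rule can be illustrated with a small Python sketch. This models only the space-enlargement, not a real DTD parser: the helper names `include_as_pe` and `include_in_literal` are hypothetical, the name pattern is a simplification of the XML 'Name' production, and expansion is not recursive.

```python
import re

# Simplified PE reference pattern; the real 'Name' production is broader.
PE_REF = re.compile(r"%([A-Za-z_:][\w.:-]*);")


def include_as_pe(text, entities):
    """Expand PE references as in a declaration: one space added at each end."""
    return PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", text)


def include_in_literal(text, entities):
    """Expand PE references as in an EntityValue: no padding."""
    return PE_REF.sub(lambda m: entities[m.group(1)], text)


entities = {"x": "foo EMPTY", "n": "foo"}

# The padded text still parses as a normal element declaration.
print(repr(include_as_pe("<!ELEMENT %x;>", entities)))
# In the middle of a name, the padding splits it into two tokens.
print(repr(include_as_pe("<!ELEMENT bar%n; EMPTY>", entities)))
# Inside an entity value, the pieces join into a single name.
print(repr(include_in_literal("bar%n;", entities)))
```

The second call yields `bar foo` rather than `barfoo`, which is why a reference cannot be used to build part of a name in a declaration, while the third call shows that the same construction does work inside an entity value.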

% is not recognised as starting a PE reference in public and system
literals, so the question doesn't arise for them. Look at the
productions for literals in section 2.3 of the spec.
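
One way to check the point about literals is to ask a real parser. The sketch below uses Python's `xml.parsers.expat` module and its `EntityDeclHandler` callback; the helper name `declared_system_ids` and the sample document are made up for illustration, and the behaviour shown is what I would expect from Expat rather than something stated in this thread.

```python
from xml.parsers import expat


def declared_system_ids(document):
    """Return the system identifiers of the entity declarations Expat reports."""
    found = []
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities are reported with value=None and a system_id.
        if system_id is not None:
            found.append(system_id)

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


# "%b;" looks like a PE reference, but inside a SystemLiteral it is just text,
# so no entity named "b" needs to exist.
doc = '<!DOCTYPE d [<!ENTITY e SYSTEM "a%b;c">]><d/>'
print(declared_system_ids(doc))
```

If the parser treated `%b;` as a reference, this document would be rejected, since no parameter entity `b` is declared and PE references are not allowed inside declarations in the internal subset anyway.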

-- Richard