External Entity in Att Value: Why Forbidden?

Douglas Reith

Hi There,
Can someone please tell me why the XML spec states that an attribute
value with an external entity is forbidden? Or point me to the
appropriate document? Or better still, perhaps you know of a work
around?

It is a little frustrating that the normally powerful external
entities are limited in this fashion.

Example (myextent.txt contains just one word without a CR):
-------------------------------------------------
<!DOCTYPE mydoc [
<!ENTITY myextent SYSTEM "./myextent.txt">
]>
<mydoc>
<atag name="fred" value="&myextent;"/>
</mydoc>
 
Micah Cowan

I can't really understand why you would want that. If you want
the value of value to be literally "./myextent.txt", then you
don't need SYSTEM in the <!ENTITY> declaration (and don't want
it). If you really want to be able to specify the value in the
content of an external document, then you could try:

-----(contents of your (modified) example:)-------------

<!DOCTYPE mydoc [
<!ENTITY myextent SYSTEM "./myextent.txt">
&myextent;
]>
<mydoc>
<atag name="fred" value="&myvalue;"/>
</mydoc>
--------------------------------------------------------

-----(contents of myextent.txt)-------------------------
<!ENTITY myvalue "onewordwithoutCR">
--------------------------------------------------------

Or, you could redesign <atag> (if you have such control) so that
&myextent; goes in the content, rather than an attribute.

If you're thinking, "that's not fair", recall that XML was never
designed to be able to import arbitrary text files directly as
content (after all, what if your one word contained an
ampersand?).
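
To make the ampersand concern concrete, here is a minimal sketch of how a script could generate the entity declaration for the modified myextent.txt from an arbitrary word. The helper names (entity_declaration, attribute_value_seen_by_parser) and the sample word are assumptions for illustration only; the check uses Python's standard expat binding, not whatever parser you finally deploy.

```python
import xml.parsers.expat as expat

# Characters that are special inside a quoted entity value, or in the
# attribute value that later includes the entity's replacement text.
_ESCAPES = {"&": "&amp;", "<": "&lt;", "%": "&#37;", '"': "&quot;"}


def entity_declaration(name, text):
    """Build an internal entity declaration whose value expands to text."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in text)
    return '<!ENTITY ' + name + ' "' + escaped + '">'


def attribute_value_seen_by_parser(declaration):
    """Parse a tiny document using the declaration; return atag's value."""
    seen = {}

    def start(tag, attrs):
        if tag == "atag":
            seen.update(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    document = (
        "<!DOCTYPE mydoc [\n" + declaration + "\n]>\n"
        '<mydoc><atag name="fred" value="&myvalue;"/></mydoc>'
    )
    parser.Parse(document, True)
    return seen["value"]


# Hypothetical sample word containing an ampersand.
decl = entity_declaration("myvalue", "fish&chips")
print(decl)
print(attribute_value_seen_by_parser(decl))
```

Without the escaping step, a bare ampersand in the word would make the generated declaration ill-formed, which is the risk hinted at above.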

-Micah
 
Douglas Reith

Thanks Micah,

> I can't really understand why you would want that.

Users don't want to modify an XML document when configuring the
system; they prefer one-word documents. It's a long story and I don't
agree with it, but now we have the unenviable requirement.

> the value of value to be literally "./myextent.txt"

No, we really need the contents of the file.

> then you could try:

I didn't think of this workaround (thanks). I think it will be the
closest we can get to a solution.

> Or, you could redesign <atag> (if you have such control)

The DTD is directed by a third party, so no, we don't have control.

> content (after all, what if your one word contained an
> ampersand?).

I'd argue that we'd just have to manage the error. That is, understand
the problem -> create the solution. You have extra flexibility and
extra risk, which I think is fair. On another note, sometimes I use
ampersands in internal entities so that I can refer to entities within
entities!

Again, I'm unclear as to why there is a restriction.

Regards,
Douglas


Richard Tobin

Douglas Reith said:
Can someone please tell me why the XML spec states that an attribute
value with an external entity is forbidden?

I think - though it takes ages to try and find any reference for this
sort of thing in the W3C archives - that the rationale is that:

- parsers don't have to expand external entity references, unlike
internal ones;
- attributes should be returnable as simple strings, which they
wouldn't be if they contained unexpanded (necessarily external)
entity references.

Unfortunately this explanation doesn't really work, since there may
be references to externally-declared internal entities, and parsers
may not know about them either.

Douglas Reith said:
Or better still, perhaps you know of a work around?

Yes, but it's a bit messy.

You could use an externally declared internal entity (i.e. instead of
putting just the text in an external entity, put an entity definition
in the external entity and make it a parameter entity).

Or - if you want to just have the replacement text in the file as you
do now - you could use the external entity as a parameter entity and
refer to it in the definition of an internal entity. Unfortunately
this doesn't quite work:


<!DOCTYPE mydoc [
<!ENTITY % myextent SYSTEM "./myextent.txt">
<!ENTITY myintent "%myextent;"> <!-- WRONG!!! -->
]>
<mydoc>
<atag name="fred" value="&myintent;"/>
</mydoc>

because you can't use a parameter entity reference that way in the
internal subset, but if you put your declarations in an external
parameter entity it will work:

<!DOCTYPE mydoc [
<!ENTITY % mydecls SYSTEM "mydecls">
%mydecls;
]>
<mydoc>
<atag name="fred" value="&myintent;"/>
</mydoc>

with this in mydecls:

<!ENTITY % myextent SYSTEM "./myextent.txt">
<!ENTITY myintent "%myextent;">

-- Richard
 
Richard Tobin

Micah Cowan said:
<!ENTITY myextent SYSTEM "./myextent.txt">
&myextent;

That needs to be a parameter entity:

<!ENTITY % myextent SYSTEM "./myextent.txt">
%myextent;

-- Richard
 
Douglas Reith

Actually Micah,
I might be doing something wrong but with your example I get this:

Invalid character found in DTD. Error processing resource
'file:///C:/Documents and Settings/user/Desktop/text.xml'. Line 3,
Position 4

&myextent;
---^

Perhaps it's not possible to use entities in the document prolog?

Douglas


Douglas Reith

But..
I have managed to get this to work:

----------------------------------------------------
<!DOCTYPE mydoc [
<!ENTITY myextent SYSTEM "./myextent.txt">
<!ENTITY myvalue "&myextent;">
]>
<mydoc>
<atag name="fred" value="&myvalue;"/>
</mydoc>
-----------------------------------------------------

myextent.txt just contains the single word.

The parser has been tricked!
Douglas


Richard Tobin

Douglas Reith said:
<!DOCTYPE mydoc [
<!ENTITY myextent SYSTEM "./myextent.txt">
<!ENTITY myvalue "&myextent;">
]>
<mydoc>
<atag name="fred" value="&myvalue;"/>
</mydoc>
The parser has been tricked!

That shouldn't work. If you switch to a different parser, or newer
version of the one you're using, it will quite likely stop working.

(The reference to myextent is *not* expanded when myvalue is defined,
it is "bypassed", and should cause an error when encountered during
attribute value processing.)
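
As a rough way to check this against one conforming parser, the sketch below feeds both the direct and the "double" reference to Python's standard expat binding. The check helper and the DIRECT/INDIRECT names are illustrative assumptions; myextent.txt does not need to exist, because no external entity handler is installed and expat never tries to open it. Other parsers may word the error differently.

```python
import xml.parsers.expat as expat

# Direct reference to an external general entity in an attribute value.
DIRECT = """<!DOCTYPE mydoc [
<!ENTITY myextent SYSTEM "./myextent.txt">
]>
<mydoc>
<atag name="fred" value="&myextent;"/>
</mydoc>"""

# The same reference hidden behind an internal entity.
INDIRECT = """<!DOCTYPE mydoc [
<!ENTITY myextent SYSTEM "./myextent.txt">
<!ENTITY myvalue "&myextent;">
]>
<mydoc>
<atag name="fred" value="&myvalue;"/>
</mydoc>"""


def check(document):
    """Return None if expat accepts the document, else its error message."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as err:
        return expat.ErrorString(err.code)
    return None


for label, doc in (("direct", DIRECT), ("indirect", INDIRECT)):
    print(label, "->", check(doc))
```

If the parser follows the spec, both documents should be reported as errors rather than only the direct one.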

What parser accepted this?

-- Richard
 
Bob Foster

Here is a version of the above that does work. The entity must be replaced
by a parameter entity and the whole thing must be moved to an external DTD.
That's all.

--in mydoc.doc
<!DOCTYPE mydoc SYSTEM "mydtd.dtd">
<mydoc>
<atag name="fred" value="&myvalue;"/>
</mydoc>

--in mydtd.dtd
<!ELEMENT mydoc (atag)>
<!ELEMENT atag EMPTY>
<!ATTLIST atag
name CDATA #IMPLIED
value CDATA #IMPLIED>
<!ENTITY % myextent SYSTEM "myextent.txt">
<!ENTITY myvalue "%myextent;">

--in myextent.txt
foo

The element and attlist declarations were needed to get the document to
validate. Otherwise, it's pretty much the same trick.

In light of this, the original question seems like a good one. What's the
point of disallowing a direct reference to an external entity in an
attribute when the reference can be easily made indirectly?

Bob Foster




Douglas Reith

Bob, Richard,
Thanks for the assistance.

Just FYI:
The parser that allowed me to 'double' reference was Internet
Explorer, but I was only using it to test the XML validity and it
won't be the final parser. Hmmm, it now seems obvious that using
Internet Explorer as a benchmark was naive, to say the least!
However, I have a feeling I achieved a similar thing with the Perl
XML::Parser module. This would need to be retested.

I have not yet implemented the alternate solutions but will take a
look tomorrow.

Thanks again,
Doug


Bob Foster said:
In case it's not obvious, a workaround is in my reply to Richard Tobin.

Bob Foster

Douglas Reith said:
Error:
-------------------------------------------------
Cannot reference an external general parsed entity 'myextent' in an
attribute value. Error processing resource...
Line 6, Position 44
<atag name="fred" value="&myextent;"/>
 
Micah Cowan

Richard Tobin said:
That needs to be a parameter entity:

<!ENTITY % myextent SYSTEM "./myextent.txt">
%myextent;

No, it doesn't, but I agree that it might be slightly better.

-Micah
 
