[XSLT] Absolute URI of an unparsed entity and catalog

Vincent Lefevre

I would like to know whether the base URI used to resolve an unparsed
entity declared with a relative URI should be the URI before or after
it is rewritten by a catalog.

Let's take an example. Here's my XML file:

<?xml version="1.0"?>
<!DOCTYPE para
  PUBLIC "-//Norman Walsh//DTD Website Full V2.4.0//EN"
  "http://docbook.sourceforge.net/release/website/2.4.0/website-full.dtd"
[
  <!ENTITY % entities SYSTEM "http://www.vinc17.org/www.ent">
  %entities;
]>
<para><olink targetdocent="local.index.en">test</olink></para>

and my XSLT file:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="olink">
    <a href="{unparsed-entity-uri(@targetdocent)}">
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>

http://www.vinc17.org/www.ent is a file in which I define unparsed
entities that are relative to http://www.vinc17.org/. For instance:

<!ENTITY local.index.en SYSTEM "index.en.html" NDATA XML>

As I don't want to connect to http://www.vinc17.org/ to generate the
URI, I use a catalog with the following entry:

<rewriteSystem systemIdStartString="http://www.vinc17.org/www.ent"
rewritePrefix="file:///home/lefevre/wd/www-new/www.ent"/>

(in fact, http://www.vinc17.org/www.ent doesn't actually exist, but the
XSLT processor doesn't need to know that). But then, xsltproc generates
the following file:

<?xml version="1.0"?>
<a href="file:///home/lefevre/wd/www-new/index.en.html">test</a>

instead of:

<?xml version="1.0"?>
<a href="http://www.vinc17.org/index.en.html">test</a>

Is that correct? I would have said that since the XSLT specification
doesn't define the notion of a catalog, a catalog should be regarded only
as a caching system (i.e. transparent to the XML generated by XSLT); in
this case, I should have got the version with http://www.vinc17.org/.

Otherwise, I would have been interested in a different version of the
unparsed-entity-uri function that would have yielded a relative URI. If
I define all the URIs and filenames with relative names, then xsltproc
does generate a relative URI, but this URI is relative to the current
directory and not the document defining the entity; therefore this is
not acceptable.
 
Bob Foster

I believe the answer to all such questions should be before. The catalog
should not "show through" to the infoset in any way.

Without any knowledge of how the catalog is implemented, I would guess the
problem is in the entity resolver. It is probably returning the system id of
the actual location it fetched the resource from rather than, as it should,
the "virtual" system id it was handed.
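
To make the distinction concrete, here is a minimal Python sketch of a resolver that applies a rewriteSystem-style prefix rewrite but keeps the original ("virtual") system id as the one reported to the parser. The names `Resolved`, `rewrite_system` and `resolve` are hypothetical and do not correspond to libxml2 or any real catalog implementation; the sketch only illustrates the idea, and it simplifies matching to "first matching prefix wins".

```python
from dataclasses import dataclass
from typing import Optional

# Catalog entry from the original post, as (systemIdStartString, rewritePrefix).
REWRITES = [
    ("http://www.vinc17.org/www.ent",
     "file:///home/lefevre/wd/www-new/www.ent"),
]


@dataclass(frozen=True)
class Resolved:
    system_id: str   # id reported to the parser; would serve as the base URI
    fetch_from: str  # location the bytes are actually read from


def rewrite_system(system_id: str) -> Optional[str]:
    """Apply the first matching prefix rewrite, or return None (simplified)."""
    for start, prefix in REWRITES:
        if system_id.startswith(start):
            return prefix + system_id[len(start):]
    return None


def resolve(system_id: str) -> Resolved:
    """Read from the rewritten location but keep the virtual system id."""
    local = rewrite_system(system_id)
    return Resolved(system_id=system_id, fetch_from=local or system_id)


r = resolve("http://www.vinc17.org/www.ent")
print(r.system_id)   # http://www.vinc17.org/www.ent
print(r.fetch_from)  # file:///home/lefevre/wd/www-new/www.ent
```

A resolver that instead reported `fetch_from` as the system id would make the local file: location the base for any relative URIs declared inside the entity file, which is the behaviour described in the original post.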

Bob Foster

 
Philippe Poulard

3.3 Unparsed Entities

The root node has a mapping that gives the URI for each unparsed entity
declared in the document's DTD. The URI is generated from the system
identifier and public identifier specified in the entity declaration.
The XSLT processor may use the public identifier to generate a URI for
the entity instead of the URI specified in the system identifier. If the
XSLT processor does not use the public identifier to generate the URI,
it must use the system identifier; if the system identifier is a
relative URI, it must be resolved into an absolute URI using the URI of
the resource containing the entity declaration as the base URI [RFC2396].
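
As a worked illustration of the resolution rule quoted above, the following Python sketch resolves the relative system identifier index.en.html against each of the two candidate base URIs from the original post. It uses `urllib.parse.urljoin` from the standard library, which follows the newer RFC 3986 semantics rather than RFC 2396; for a simple case like this the two should agree. Which base a processor ought to pick is exactly the open question in this thread.

```python
from urllib.parse import urljoin

relative_system_id = "index.en.html"

# Base URI before the catalog rewrite (what the document declares).
declared_base = "http://www.vinc17.org/www.ent"
# Base URI after the catalog rewrite (where the file was actually read).
rewritten_base = "file:///home/lefevre/wd/www-new/www.ent"

before = urljoin(declared_base, relative_system_id)
after = urljoin(rewritten_base, relative_system_id)

print(before)  # http://www.vinc17.org/index.en.html
print(after)   # file:///home/lefevre/wd/www-new/index.en.html
```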

--
Cordialement,

///
(. .)
-----ooO--(_)--Ooo-----
| Philippe Poulard |
-----------------------
 
Vincent Lefevre

Philippe Poulard said:
3.3 Unparsed Entities
The root node has a mapping that gives the URI for each unparsed entity
declared in the document's DTD. The URI is generated from the system
identifier and public identifier specified in the entity declaration.
The XSLT processor may use the public identifier to generate a URI for
the entity instead of the URI specified in the system identifier. If the
XSLT processor does not use the public identifier to generate the URI,
it must use the system identifier; if the system identifier is a
relative URI, it must be resolved into an absolute URI using the URI of
the resource containing the entity declaration as the base URI [RFC2396].

I know how to read the specs. :) But what if catalogs are used?
My point was that this paragraph doesn't mention catalogs; thus, the
XSLT processor should behave as if there were no catalogs (catalogs
are just a transparent way of caching resources). In this case, this
would mean that there is a bug in xsltproc. Before reporting a bug,
I'd like to know whether my interpretation is correct or not.
 
Bob Foster

Vincent Lefevre said:
Before reporting a bug,
I'd like to know whether my interpretation is correct or not.

Nobody is going to be able to give you chapter and verse on this (as the
recent attempt illustrates). You can use "if the system identifier is a
relative URI, it must be resolved into an absolute URI using the URI of the
resource containing the entity declaration as the base URI" to support
either side of this question. However, your point of view makes sense and
the behavior you report is very counter-intuitive, so if I were you I'd file
a bug report.

If the response comes back, as I would expect, "We can't do anything because
the catalog is reporting the wrong URI," then file a bug against the
catalog. Or fix it yourself and contribute the fix.

Bob Foster
 
Vincent Lefevre

Bob Foster said:
Nobody is going to be able to give you chapter and verse on this (as
the recent attempt illustrates). You can use "if the system
identifier is a relative URI, it must be resolved into an absolute
URI using the URI of the resource containing the entity declaration
as the base URI" to support either side of this question. However,
your point of view makes sense and the behavior you report is very
counter-intuitive, so if I were you I'd file a bug report.

OK, done: http://bugzilla.gnome.org/show_bug.cgi?id=122001

Thanks,
 
Philippe Poulard

Vincent said:
I know how to read the specs. :) But what if catalogs are used?
My point was that this paragraph doesn't mention catalogs; thus, the
XSLT processor should behave as if there were no catalogs (catalogs
are just a transparent way of caching resources). In this case, this
would mean that there is a bug in xsltproc. Before reporting a bug,
I'd like to know whether my interpretation is correct or not.

Well, in my opinion, catalogs are just a convenient way to resolve
entities, so the rules should be the same with or without catalogs. That
is to say, since the caching relies on a local file system (in your
case), the behaviour described in the specs will be applied.

However, in Java it is possible to give an external resource a
specific base URI (in the case of XSLT) or system ID or public ID (in the
case of XML):
javax.xml.transform.Source#setSystemId()
org.xml.sax.InputSource#setPublicId()
org.xml.sax.InputSource#setSystemId()

I don't really know if catalogs can do that.
See the specs, as you know how to read them :)
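
For comparison, Python's standard `xml.sax` module mirrors the SAX calls listed above: `xml.sax.xmlreader.InputSource` provides `setSystemId()`, `setPublicId()` and `setByteStream()`. The sketch below builds an input source whose bytes come from a local copy while its system id stays the original URI. The helper name `virtual_source` and the in-memory stand-in for the local file are assumptions for illustration only; whether a particular parser or catalog then uses that system id as the base URI is not something this snippet demonstrates.

```python
import io
from xml.sax.xmlreader import InputSource


def virtual_source(system_id: str, local_bytes: bytes) -> InputSource:
    """Return an InputSource that reads local bytes under a virtual system id."""
    src = InputSource()
    src.setSystemId(system_id)                  # id the consumer will see
    src.setByteStream(io.BytesIO(local_bytes))  # where the content comes from
    return src


# Stand-in for the content of the local www.ent file (hypothetical).
ENT = b'<!ENTITY local.index.en SYSTEM "index.en.html" NDATA XML>\n'

src = virtual_source("http://www.vinc17.org/www.ent", ENT)
print(src.getSystemId())  # http://www.vinc17.org/www.ent
```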
--
Cordialement,

///
(. .)
-----ooO--(_)--Ooo-----
| Philippe Poulard |
-----------------------
 
