SYSTEM vs. PUBLIC

Razvan

Hi !




Some time ago I posted a question regarding the use of the PUBLIC vs.
SYSTEM declaration in an XML file. After reading the initial answers
and gaining a little more experience with XML, I came up with the
following conclusions:




A SYSTEM declaration can be used to specify a file on the local file
system, like:

<!DOCTYPE RootElement SYSTEM "C:\validate.dtd">

The problem with this approach is that if the file is made public the
path specified on the local file system will not have any meaning any
more. Even if the path specified in the SYSTEM declaration *is* a URL:

<!DOCTYPE RootElement SYSTEM "http://www.mihaiu.name/validate.dtd">

the parser might be unable to retrieve the DTD file if the system is
not connected to the Internet.

The PUBLIC declaration constitutes a partial solution to this problem.
The string contained in a PUBLIC declaration is not a URL but a URN
(Uniform Resource Name). A URN does not pinpoint the precise location
of the resource; it only specifies its name. The *parser* of the
document must be smart enough to generate a URL from a URN using some
internal logic.

Example of a PUBLIC declaration:

<!DOCTYPE RootElement PUBLIC "mihaiu/validate.dtd"
"http://www.mihaiu.name/validate.dtd">

In this case, a custom parser that already has a catalogue of DTDs
published by mihaiu can generate a URL from the PUBLIC declaration. The
generated URL can look like

c:\DTDs\validate.dtd

There is no standard way to convert a URN to a URL, so if this
conversion fails (because the parser lacks the internal logic to
perform it, or for whatever other reason), the parser will attempt to
use the system identifier, which in this case resolves to

http://www.mihaiu.name/validate.dtd

Important observation:
Since there is no standard way to generate a URL from a URN, PUBLIC
declarations can only be useful for customized parsers! (I.e. they are
not useful for general-purpose parsers like Xerces.)

DID I GET IT RIGHT?




Regards,
Razvan

www.mihaiu.name
 
Johannes Koch

Razvan said:
Hi !




Some time ago I posted a question regarding the use of the PUBLIC vs.
SYSTEM declaration in an XML file. After reading the initial answers
and gaining a little more experience with XML, I came up with the
following conclusions:






SYSTEM declaration can be used to specify a file on the local file
system like:

<!DOCTYPE RootElement SYSTEM "C:\validate.dtd">

As far as I know, the system identifier must be a URL (or URI), so it
must be something like "file://C|/validate.dtd".
The problem with this approach is that if the file is made public the
path specified on the local file system will not have any meaning any
more.
Right.

Even if the path specified in the SYSTEM declaration *is* a

.... HTTP ...
URL:

<!DOCTYPE RootElement SYSTEM "http://www.mihaiu.name/validate.dtd">

the parser might be unable to retrieve the DTD file if the system is
not connected to the Internet.
Yes.

The PUBLIC declaration constitutes a partial solution to this problem.
The string contained in a PUBLIC declaration is not a URL but a URN
(Uniform Resource Name).

No, it's not a URN, it's an FPI (formal public identifier), although it
acts somewhat like a URN. To map FPIs to system identifiers, the parser
may use a catalog (look for XML Catalog or OASIS Open Catalog).
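
As a rough illustration of the catalog idea (a sketch, not how any particular parser implements it), the lookup could be written in Python as below. The catalog contents and the local file URI are hypothetical; `resolveEntity` is the standard `xml.sax` hook that receives both the public and the system identifier.

```python
from xml.sax.handler import EntityResolver

# Hypothetical catalog: public identifier -> local copy of the DTD.
CATALOG = {
    "mihaiu/validate.dtd": "file:///C:/DTDs/validate.dtd",
}


class CatalogResolver(EntityResolver):
    """Prefer a catalog entry for the public identifier,
    otherwise fall back to the system identifier."""

    def __init__(self, catalog):
        self.catalog = catalog

    def resolveEntity(self, publicId, systemId):
        # publicId is None when the declaration only has a SYSTEM part.
        if publicId is not None and publicId in self.catalog:
            return self.catalog[publicId]
        return systemId


resolver = CatalogResolver(CATALOG)
local = resolver.resolveEntity(
    "mihaiu/validate.dtd", "http://www.mihaiu.name/validate.dtd")
fallback = resolver.resolveEntity(
    "unknown/id", "http://www.mihaiu.name/validate.dtd")
```

Here `local` comes from the catalog, while `fallback` is simply the system identifier, because the second public identifier is not in the catalog.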
DID I GET IT RIGHT?

Not really.
 
David Carlisle

As far as I know, the system identifier must be a URL (or URI), so it
must be something like "file://C|/validate.dtd".

It's correct that it has to be a URI, although the mapping (usually)
suggested between URIs and Windows paths is
file:///c:/validate.dtd
rather than the form with | (which was used by some early Netscape
versions).
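
For comparison, Python's standard `pathlib` maps an absolute Windows path to a file URI that keeps the drive letter and colon. A minimal sketch (the helper name is only illustrative):

```python
from pathlib import PureWindowsPath


def windows_path_to_file_uri(path):
    """Convert an absolute Windows path to a file: URI."""
    # PureWindowsPath works on any OS because it never touches the disk.
    return PureWindowsPath(path).as_uri()


uri = windows_path_to_file_uri(r"C:\validate.dtd")
```

Note that `as_uri()` only accepts absolute paths; a relative path raises `ValueError`.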


No, it's not a URN, it's an FPI (formal public identifier), although it
acts somewhat like a URN. To map FPIs to system identifiers, the parser
may use a catalog (look for XML Catalog or OASIS Open Catalog).

Although most systems (HTML, DocBook, ...) that use a PUBLIC ID do use
FPI syntax, that is a reflection of the SGML ancestry of these systems.

XML just defines this to be a more or less arbitrary string (of ASCII
printing characters), and XML (unlike SGML) has no way of specifying
that the PUBLIC identifier has any specific syntax; it just enforces a
set of characters:

[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
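
The PubidChar production above translates almost directly into a regular expression. The following Python sketch (function name is illustrative) checks only the character set, which is all that XML itself requires of a public identifier:

```python
import re

# Character class transcribed from production [13] PubidChar:
# space, CR, LF, ASCII letters and digits, and the listed punctuation.
PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def is_valid_public_id(text):
    """Return True if every character of text is a PubidChar."""
    return PUBID_CHARS.fullmatch(text) is not None


fpi_ok = is_valid_public_id("-//W3C//DTD XHTML 1.0 Strict//EN")
plain_ok = is_valid_public_id("mihaiu/validate.dtd")
backslash_ok = is_valid_public_id("C:\\validate.dtd")
```

An FPI-style string and an arbitrary name both pass, since no particular syntax is enforced; a Windows path does not, because the backslash is not a PubidChar.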

David
 
Razvan

Please tell me what the difference between URN and FIPS is. Can you
point me to a web site where I can find out more about FIPS and
especially about the subtle differences between URN and FIPS?

By the way: FIPS are URIs ?


How about the rest ? Ok: it's not URN, it's FIPS. Just replace URN
with FIPS. The rest of my essay is correct ?



Regards,
Razvan
 
Johannes Koch

Razvan said:
Please tell me what the difference between URN and FIPS is.

FPIs (= Formal Public Identifiers), not FIPS.
Can you
point me to a web site where I can find out more about FIPS and
especially about the subtle differences between URN and FIPS?

Use the search engine of your choice, e.g. a Google search for "formal
public identifier".
By the way: FIPS are URIs ?

No. Read RFC 2396 for the definition of URI.
How about the rest ? Ok: it's not URN, it's FIPS. Just replace URN
with FIPS. The rest of my essay is correct ?

No. Read about FPIs, URNs, the difference between a Windows file path
and a URL, catalogs, entity resolvers, ...
 
Arjun Ray

SYSTEM declaration can be used to specify a file on the local file
system

[Note: _identifier_, not _declaration_]

Yes, in the sense of "original intent" going back to SGML, and no, because
XML has repurposed SYSTEM into a slot for URIs.
The PUBLIC declaration constitutes a partial solution to this problem.

Arguably, this may be what the XML spec intends.
The string contained in a PUBLIC declaration is not a URL but a URN
(Uniform Resource Name).

Not really. In XML, a PUBLIC identifier is what SGML calls a _minimum
literal_, an arbitrary string with characters limited to a certain small
set.

Now, it so happens that PUBLIC identifiers in *SGML* (not XML!) are very
much like URNs, inasmuch as they share the properties of persistence and
uniqueness. XML usage has inherited this understanding in some quarters,
but there is no formal basis for this in the XML spec.
DID I GET IT RIGHT?

Only in part. You are trying to make sense of something that has been
screwed up beyond any reasonable hope of recovery. The critical blunder
was the repurposing of SYSTEM identifiers for URIs. Unfortunately, URIs
don't fit into the PUBLIC/SYSTEM dichotomy of SGML at all, but the
catchphrase, "Cool URIs don't change", is a good indication that URIs are
really more *useful* in their PUBLIC than in their SYSTEM aspect.

Note that the XML spec doesn't *define* the PUBLIC and SYSTEM keywords,
i.e. explain what they mean and thus why they're different (and both there
to begin with). One could fill this gap from SGML, but again, there is no
formal basis for this in the XML spec.

In SGML, PUBLIC means "well-known, widely understood/accepted" while
SYSTEM means "private, local, custom, homegrown, proprietary". For those
who have Goldfarb's SGML Handbook, there is an extensive, and IMHO clear,
exposition of this on p.378-9. He writes, inter alia,

: A _public identifier_ is a name that is intended to be meaningful across
: systems and different user environments.

: A _system identifier_ is system-specific information that enables the
: entity manager component of an SGML system to locate the file or the
: memory location or the pointer within a file where the entity can be
: found [...] a system identifier could be an invocation of a program that
: controls access to an entity that is being identified.

: The system identifier itself need not be the full storage identifier; it
: is just a method of expressing information that the entity manager can
: use to determine the storage identifier [...] In that regard, it would
: be very sensible for an implementation to devise a defaulting scheme in
: which the storage identifier could be determined from the entity name
: alone. SGML encourages this by providing syntactically that the keyword
: SYSTEM can be specified for an external identifier without actually
: specifying a system identifier at all.

In other words, you have complete freedom to decide what your SYSTEM
identifiers "mean". And, in general, system identifiers should *not* be
used in documents meant for exchange, because there is no expectation that
anyone except you could make sense of them. At a pinch, you could leave
just the SYSTEM keyword in, hoping that the other guy has as sensible a
system of defaults in his catalogs as you.

Thus, XML goofed in two ways to take this useful functionality (how you
organize your own local system) away from you - by mandating the presence
of the system identifier, and further constraining its form to URIs.
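
The mandatory system identifier is easy to observe with a stock XML parser. The sketch below assumes Python's `xml.dom.minidom` (built on expat); the helper names are illustrative, and the identifiers are the ones used earlier in the thread. In XML the system literal follows the public identifier directly, without a second keyword.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

BOTH = ('<!DOCTYPE RootElement PUBLIC "mihaiu/validate.dtd" '
        '"http://www.mihaiu.name/validate.dtd"><RootElement/>')
PUBLIC_ONLY = ('<!DOCTYPE RootElement PUBLIC "mihaiu/validate.dtd">'
               '<RootElement/>')


def external_id(xml_text):
    """Return (public identifier, system identifier) of the DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.publicId, doctype.systemId


def is_well_formed(xml_text):
    """Return True if the parser accepts the document."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


ids = external_id(BOTH)
public_only_accepted = is_well_formed(PUBLIC_ONLY)
```

With both identifiers present the parser reports them separately; the PUBLIC-only form, which SGML would allow, is rejected as not well-formed.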

Take it from there.
 
