RFC: A Distributed Universal SGML/XML Catalogue Management System

Discussion in 'XML' started by Nick Kew, Sep 10, 2003.


    Many applications today benefit from an SGML and/or XML Entity Catalogue
    to dereference entities referenced by a Public Identifier. For a
    validating SGML parser this is an absolute requirement. For any
    SGML or XML parser it serves to enable entities such as DTDs and
    modules to be resolved locally.
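    As a rough illustration of what such a lookup involves, the sketch
    below parses catalogue entries of the plain-text form
    PUBLIC "public-id" "system-id" and resolves a public identifier to a
    local file. The function names and sample file paths are made up for
    the example, and real catalogue syntax allows more than this
    simplified pattern handles (other entry types, single-quoted strings).

```python
import re

# Matches entries of the form: PUBLIC "public identifier" "system identifier"
# Simplification: only double-quoted literals are recognised.
PUBLIC_ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]+)"\s+"([^"]+)"', re.MULTILINE)


def parse_catalogue(text):
    """Return a dict mapping public identifiers to local system identifiers."""
    return {pubid: sysid for pubid, sysid in PUBLIC_ENTRY.findall(text)}


def resolve(catalogue, public_id):
    """Look up a public identifier; return None if the catalogue lacks it."""
    return catalogue.get(public_id)


# Sample catalogue text; the file paths are illustrative only.
SAMPLE = '''
PUBLIC "-//W3C//DTD HTML 4.01//EN" "html401/strict.dtd"
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xhtml1/xhtml1-strict.dtd"
'''

catalogue = parse_catalogue(SAMPLE)
print(resolve(catalogue, "-//W3C//DTD HTML 4.01//EN"))
```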

    Hitherto, different packages and applications have each distributed
    their own entity catalogues. Examples are DocBook, HTML validators,
    the OpenSP parser, and operating system distros. However, there is
    little coordination between the distributors of these, and no common
    package that distributors can rely on. Even in tightly-controlled
    environments such as the Debian packages, the W3C Validator includes
    its own Entity Catalogue rather than relying on one being available
    as a dependency.

    This situation should be rationalised to allow for an SGML and XML
    catalogue to be a single package on which other packages can depend.
    In this note, we propose a framework for managing such a package.

    * To maintain a Universal Catalogue.
    * To provide an automated process for generating local installations of
      all or part of the Universal Catalogue.
    * To minimise the effort and coordination required to ensure that the
      universal catalogues and local installations remain up-to-date.
      In particular, end-users should be offered a self-maintaining default
      installation that eliminates effort on their part altogether.
    * To enable control of different parts of the catalogue to be delegated
      to the people/organisations responsible for them.

    A loose analogy could be drawn to DNS. But since immediate lookup of
    [SG|X]ML entities is dealt with by SYSTEM ids, we only have to deal with
    efficient caching of local copies of PUBLIC ids. Entities are in
    general long-lived, but by no means immutable (for example, the MathML 2
    DTD modules have undergone several minor revisions).

    Managing a Universal Catalogue

    In principle, all organisations creating public identifiers should be
    registered with ISO. But this is not widely practised, and the present
    chaotic situation indicates that it is not effectively meeting today's
    needs. We propose that a distributed architecture for automating
    catalogue management is both feasible and preferable.

    #### ISO registry: availability???

    Our proposal envisages a central registry, cooperating with a set of
    recognised repositories each managing its own entity catalogue locally.
    For example, the W3C, WapForum and Oasis each manage their own catalogues
    independently. Likewise, different groups acting independently within
    W3C are responsible for different areas such as HTML, MathML, SVG and
    We propose that a universal catalogue will work best if responsibility
    for each sub-catalogue is explicitly devolved to the working group
    responsible for defining it. The central registry will serve merely
    to reference the responsible groups, in a manner somewhat analogous to DNS.
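
    One way to picture this delegation, purely as a sketch: the central
    registry maps the owner field of a Formal Public Identifier to the
    location of that owner's sub-catalogue. The registry contents and the
    example.org URLs below are placeholders, and the function names are
    hypothetical rather than part of any existing tool.

```python
# Hypothetical registry: owner identifier -> URL of that owner's sub-catalogue.
# The example.org URLs are placeholders, not real catalogue locations.
REGISTRY = {
    "W3C": "http://example.org/catalogues/w3c.xml",
    "OASIS": "http://example.org/catalogues/oasis.xml",
}


def owner_of(public_id):
    """Extract the owner identifier from a Formal Public Identifier."""
    fields = public_id.split("//")
    if len(fields) < 3:
        raise ValueError("not a formal public identifier: %r" % public_id)
    # Unregistered ("-") and registered ("+") owners put the name second;
    # ISO-owned identifiers start directly with the owner name.
    return fields[1] if fields[0] in ("-", "+") else fields[0]


def delegate(public_id, registry=REGISTRY):
    """Return the sub-catalogue URL responsible for public_id, or None."""
    return registry.get(owner_of(public_id))


print(delegate("-//W3C//DTD HTML 4.01//EN"))
```

    As with DNS, the registry in this sketch holds no entity data itself;
    it only says whom to ask.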

    This is broadly in line with the registry already run by the ISO but
    not widely used. What our proposal adds is the availability of the
    registry online in machine-readable format, and its integration with
    catalogues maintained by each participating organisation. It is
    possible that tying the registry in to distribution of Markup libraries
    and catalogues may in itself be an incentive for organisations to

    #### Implications for naming conventions?


    Since the Universal Catalogue serves SGML and XML applications, it is
    appropriate that it should itself be capable of implementation as an
    SGML or XML application. This is straightforward: all we need is a
    DTD for declaring catalogues and catalogue entries, and a list of
    entities defining catalogues maintained by the groups entrusted with
    doing so. This is then implemented by a program to fetch the data
    required and write the catalogues. Local installations may be
    customised by selecting which entities to include, while package
    maintainers can ship a standard configuration.
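
    The following sketch shows the general shape of such a program: it
    reads a master catalogue expressed in XML, keeps only the groups a
    local installation has selected, and writes the corresponding
    catalogue entries. The element and attribute names (catalogue, group,
    entity, public, href, local) are invented for this example and are
    not the file format used by the demonstrator; the URLs are
    placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical master catalogue; element and attribute names are invented
# for this sketch and the example.org URLs are placeholders.
MASTER = """
<catalogue>
  <group name="html">
    <entity public="-//W3C//DTD HTML 4.01//EN"
            href="http://example.org/html401/strict.dtd"
            local="html401/strict.dtd"/>
  </group>
  <group name="mathml">
    <entity public="-//W3C//DTD MathML 2.0//EN"
            href="http://example.org/mathml2/mathml2.dtd"
            local="mathml2/mathml2.dtd"/>
  </group>
</catalogue>
"""


def select_entries(master_xml, wanted_groups):
    """Return (public id, source URL, local name) for the selected groups."""
    root = ET.fromstring(master_xml)
    entries = []
    for group in root.findall("group"):
        if group.get("name") in wanted_groups:
            for entity in group.findall("entity"):
                entries.append(
                    (entity.get("public"), entity.get("href"), entity.get("local"))
                )
    return entries


def write_catalogue(entries):
    """Render the selected entries as PUBLIC lines for a local catalogue."""
    return "".join(
        'PUBLIC "%s" "%s"\n' % (public, local) for public, _href, local in entries
    )


# A local installation that only wants the HTML entities.
print(write_catalogue(select_entries(MASTER, {"html"})), end="")
```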

    An implementation demonstrating the above is available at
    <URL:http://valet.webthing.com/catalogue/>. It fetches the master
    catalogue, DTD and Entities by HTTP. It updates all entries defined,
    but uses the HTTP If-Modified-Since header to avoid the overhead of
    re-fetching anything that is already up-to-date in the local installation.
    It can therefore be run regularly (e.g. monthly) with minimal overhead.
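
    The conditional fetch can be sketched with Python's standard library
    as below. This is an illustration of the If-Modified-Since technique
    only, not the code of the implementation referenced above; the
    function names are hypothetical, and using the local file's
    modification time as the comparison date is an assumption of the
    sketch.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, local_path):
    """Build a GET request that asks only for content newer than local_path."""
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # HTTP dates are always expressed in GMT.
        request.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    return request


def refresh(url, local_path):
    """Fetch url into local_path unless the server answers 304 Not Modified.

    Returns True if the local copy was rewritten, False if it was current.
    """
    try:
        with urllib.request.urlopen(conditional_request(url, local_path)) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # local copy is already up-to-date
            return False
        raise
    with open(local_path, "wb") as out:
        out.write(body)
    return True
```

    When nothing has changed, the server sends only a short 304 response
    with no body, which is what keeps a regular run cheap.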

    CatalogueManager may be used as-is, but is intended as a proof-of-concept.
    Non-technical issues such as how to delegate responsibility for different
    sub-catalogues need to be addressed, and the file format used for
    the demonstrator is likely to be subject to improvement.


    A package such as CatalogueManager that updates system files based on
    third-party definitions has potential to introduce malicious files.
    It is strongly recommended that standard system security be used to
    avoid serious consequences in the event of any of the sub-catalogues
    being compromised. CatalogueManager should run as a user with no
    privilege to write to the local filesystem except within a designated
    SGML/XML library area, such as /usr/local/share/sgmlib.
    Distributors creating a package such as an RPM of CatalogueManager
    should set it up to run in this restricted way, to protect their
    users' security.
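
    In addition to running under a restricted account, a program of this
    kind could itself refuse any filename that escapes the library area.
    The check below is a minimal sketch of that idea, assuming filenames
    arrive as relative paths taken from a downloaded catalogue; the
    function name is hypothetical, and this complements operating system
    permissions rather than replacing them.

```python
import os


def confined_path(library_root, relative_name):
    """Resolve relative_name under library_root, rejecting any escape.

    Raises ValueError for absolute paths, "../" traversal, or symlinks
    that would place the target outside the library area.
    """
    root = os.path.realpath(library_root)
    target = os.path.realpath(os.path.join(root, relative_name))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(
            "refusing to write outside %s: %r" % (root, relative_name)
        )
    return target
```

    For example, with /usr/local/share/sgmlib as the root, a name such as
    html401/strict.dtd is accepted, while ../../etc/passwd or an absolute
    path raises ValueError.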

    A more inherently secure architecture would generate all local filenames
    internally, and is probably preferable. The current implementation serves
    for backward compatibility until the proposal can be considered stable.
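
    One possible way to generate local filenames internally, offered only
    as a sketch: derive the name from a hash of the public identifier, so
    that nothing supplied by a remote catalogue ever reaches the
    filesystem as a path. The hashing scheme, the function name and the
    .ent suffix are assumptions of this example, not part of the proposal.

```python
import hashlib


def local_name(public_id, suffix=".ent"):
    """Derive a flat, filesystem-safe filename from a public identifier.

    The result contains only hexadecimal digits plus the suffix, so it
    cannot hold path separators or "..", whatever the identifier says.
    The same identifier always yields the same name, which lets the
    catalogue writer and the downloader agree without sharing state.
    """
    digest = hashlib.sha256(public_id.encode("utf-8")).hexdigest()
    return digest[:32] + suffix


print(local_name("-//W3C//DTD HTML 4.01//EN"))
```

    The cost of this design is that filenames are no longer
    human-readable; the catalogue becomes the only index to the library.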

    Nick Kew

    In urgent need of paying work - see http://www.webthing.com/~nick/cv.html