Updating DTD to agree with its use in docs

Discussion in 'XML' started by christopher.c.brewster@lmco.com, Jan 25, 2005.

  1. Guest

    A few years ago my department defined a DTD for a projected class of
    documents. Like the US Constitution, this DTD has details that are
    never actually used, so I want to clean it up. Is there any tool that
    looks at existing documents and compares with the DTD they use?

    [I can think of other possible uses for such a tool, so I thought
    someone might have invented it. I have XML Spy but do not see a feature
    that would do this.]

    Christopher Brewster
     
    , Jan 25, 2005
    #1

  2. wrote:

    > A few years ago my department defined a DTD for a projected class of
    > documents. Like the US Constitution, this DTD has details that are
    > never actually used, so I want to clean it up. Is there any tool that
    > looks at existing documents and compares with the DTD they use?


    I have written a tool that reads an XML file
    and produces a DTD. The DTD covers only those
    parts that are actually used in the original
    XML file.

    http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Generating-a-DTD-from-a-sample-file

    It should not be too hard to change the script
    so that it reads an arbitrary number of example
    files and cumulates knowledge about their structure,
    finally producing a DTD that covers all files.

    If you don't find another tool and you really
    need such a tool, I could write the script for
    you. But you should be aware that the language
    which is used (XMLgawk) is currently only in an
    experimental state.
     
    Jürgen Kahrs, Jan 25, 2005
    #2

  3. Guest

    Juergen --

    A script to do this would be amazing, if you're interested in doing it.
    Here is a further question: I followed the link from the gawk page to
    Saxon's site, which led me to a front-end for the program at HiT
    Software:

    http://www.hitsw.com/xml_utilites/

    This utility does not work, however, for a reason that seems to
    contradict what it's for: it wants to open the file's DTD! One would
    think that this utility, of all utilities, would not need the DTD. It
    also wants to pull in all the external entities, but again this seems
    pointless for the utility's purpose. Any idea how to get around this?
    Thanks for your information.

    Chris Brewster
     
    , Jan 25, 2005
    #3
  4. Guest

    OK, I got this working by omitting the reference to the DTD, deleting
    entity references, and deleting strings such as &text. But maybe this
    utility should ignore these things. Thanks very much for the
    information.
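
A minimal sketch of that manual cleanup in Python, for anyone who has to repeat it over many files. The function name `strip_dtd_dependencies` is hypothetical, and the regular expressions are an assumption about typical documents, not a real parser: they would misfire on a DOCTYPE whose literals contain `>` or `]`.

```python
import re

# Matches a DOCTYPE declaration, with or without an internal subset in [ ... ].
DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(?:\[.*?\]\s*)?>", re.S)

# Matches general entity references such as &text; but leaves the five
# predefined entities and numeric character references alone.
ENTITY_REF = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);|#)[^\s;&<]+;")


def strip_dtd_dependencies(xml_text):
    """Return xml_text without its DOCTYPE and without custom entity references."""
    text = DOCTYPE.sub("", xml_text, count=1)
    return ENTITY_REF.sub("", text)
```

The result no longer needs the DTD or any external entity to be parsed, at the cost of silently dropping whatever text the entities would have expanded to.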

    Other utilities that would help (which I might make my own versions
    of): printing DTDs in structured formats for analysis (such as in table
    form), and ways to compare and/or combine related DTDs.
    Thanks again...
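
One possible shape for the table-printing utility, sketched in Python. The name `dtd_table` is hypothetical, and the sketch assumes a DTD with plain `<!ELEMENT name model>` declarations; it does not expand parameter entities or handle conditional sections.

```python
import re

# Captures the element name and its content model from each declaration.
DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.S)


def dtd_table(dtd_text):
    """Return the element declarations of dtd_text as an aligned two-column table."""
    # Remove comments so commented-out declarations are not listed.
    text = re.sub(r"<!--.*?-->", "", dtd_text, flags=re.S)
    # Collapse line breaks and runs of spaces inside each content model.
    rows = [(name, " ".join(model.split())) for name, model in DECL.findall(text)]
    width = max([len("Element")] + [len(name) for name, _ in rows])
    lines = [f"{'Element':<{width}}  Content model"]
    lines += [f"{name:<{width}}  {model}" for name, model in rows]
    return "\n".join(lines)
```

Printing `dtd_table(open("my.dtd").read())` (with `my.dtd` standing in for a real file) gives one row per element, which is easier to scan or paste into a spreadsheet than the raw declarations.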

    Chris Brewster
     
    , Jan 25, 2005
    #4
  5. wrote:

    > A script to do this would be amazing, if you're interested in doing it.


    I just had a look at the DTD generator script again.
    It looks like the script already does what you want.
    On my RedHat Linux for example, I did this to generate
    a DTD which covers all the files whose names are passed
    on the command line:

    gawk -f dtd_generator.awk /usr/share/doc/libxml2-devel-2.6.10/examples/test*.xml

    <!ELEMENT doc ( dest | src | parent )* >
    <!ELEMENT dest ( #PCDATA ) >
    <!ATTLIST dest id CDATA #REQUIRED>
    <!ELEMENT src ( #PCDATA ) >
    <!ATTLIST src ref CDATA #REQUIRED>
    <!ELEMENT parent ( discarded | preserved )* >
    <!ELEMENT discarded ( discarded )* >
    <!ELEMENT preserved ( child2 | preserved | child1 )* >
    <!ELEMENT child2 ( #PCDATA ) >
    <!ELEMENT child1 ( #PCDATA ) >

    I guess that's what you wanted.
    Such a DTD is far from perfect of course.
    You should take it as a starting point, rearrange
    the sequence of lines and insert comments from your
    original (much larger) DTD.
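
For readers without XMLgawk, the same idea can be sketched in Python. This is not the dtd_generator.awk script, only an illustration of the approach under stated assumptions: the name `infer_dtd` is hypothetical, every content model is emitted as a flat repeatable choice, every attribute is typed CDATA, and an attribute is marked #REQUIRED only when every occurrence of its element carries it.

```python
import xml.etree.ElementTree as ET


def infer_dtd(xml_texts):
    """Return DTD declaration lines covering every element seen in xml_texts."""
    order = []        # element names in first-seen order
    count = {}        # name -> number of occurrences
    children = {}     # name -> child names in first-seen order
    has_text = {}     # name -> True if any occurrence holds character data
    attr_count = {}   # name -> {attribute name: occurrences carrying it}

    for text in xml_texts:
        for elem in ET.fromstring(text).iter():
            name = elem.tag
            if name not in count:
                order.append(name)
                count[name] = 0
                children[name] = []
                has_text[name] = False
                attr_count[name] = {}
            count[name] += 1
            if elem.text and elem.text.strip():
                has_text[name] = True
            for child in elem:
                if child.tag not in children[name]:
                    children[name].append(child.tag)
                # Text after a child still belongs to the parent element.
                if child.tail and child.tail.strip():
                    has_text[name] = True
            for attr in elem.attrib:
                attr_count[name][attr] = attr_count[name].get(attr, 0) + 1

    lines = []
    for name in order:
        parts = (["#PCDATA"] if has_text[name] else []) + children[name]
        if not parts:
            lines.append(f"<!ELEMENT {name} EMPTY >")
        elif parts == ["#PCDATA"]:
            lines.append(f"<!ELEMENT {name} ( #PCDATA ) >")
        else:
            lines.append(f"<!ELEMENT {name} ( {' | '.join(parts)} )* >")
        for attr, n in attr_count[name].items():
            use = "#REQUIRED" if n == count[name] else "#IMPLIED"
            lines.append(f"<!ATTLIST {name} {attr} CDATA {use}>")
    return lines
```

Passing the contents of all sample files in one list accumulates knowledge across them, so the result covers every file. Documents that reference undeclared entities will not parse with ElementTree, and namespaced tags come out in `{uri}name` form, so both need handling before the output is usable as a real DTD.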
     
    Jürgen Kahrs, Jan 26, 2005
    #5
  6. Peter Flynn Guest

    wrote:

    > Juergen --
    >
    > A script to do this would be amazing, if you're interested in doing it.


    I did this as part of a migration from TEI SGML to XML. Basically:

    a) run nsgmls over the documents and produce ESIS
    b) use awk to extract the element type names
    c) sort and uniq them
    d) use Perl::SGML to read the DTD and list the element type names
    e) sort them
    f) caseless join the two lists with -a to spit out the non-matches

    If you're not using a Unix-based system, I think Cygwin can run these tools.
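
The steps above can be approximated in a single Python script where nsgmls and the Perl module are not available. This is a swapped-in technique, not the pipeline described: it uses ElementTree in place of nsgmls and a crude regular expression in place of a DTD parser, so it assumes XML input and a DTD whose element names are written literally (no parameter entities in names, no SGML name groups). All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Captures the name in each <!ELEMENT name ...> declaration.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s()>]+)")


def declared_elements(dtd_text):
    """Element type names declared in the DTD, lower-cased."""
    # Drop comments so commented-out declarations are not counted.
    text = re.sub(r"<!--.*?-->", "", dtd_text, flags=re.S)
    return {name.lower() for name in ELEMENT_DECL.findall(text)}


def used_elements(xml_texts):
    """Element type names that occur in the documents, lower-cased."""
    used = set()
    for text in xml_texts:
        used.update(elem.tag.lower() for elem in ET.fromstring(text).iter())
    return used


def compare(dtd_text, xml_texts):
    """Return (declared but never used, used but never declared), both sorted."""
    declared = declared_elements(dtd_text)
    used = used_elements(xml_texts)
    return sorted(declared - used), sorted(used - declared)
```

The first list is the set of candidates for removal from the DTD. Names are compared caselessly to mirror step (f); since XML names are case-sensitive, remove the `.lower()` calls for a strict comparison.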

    ///Peter
    --
    "The cat in the box is both a wave and a particle"
    -- Terry Pratchett, introducing quantum physics in _The Authentic Cat_
     
    Peter Flynn, Jan 26, 2005
    #6