Implementing a DTD-based XML validator

Discussion in 'XML' started by Tom Anderson, May 29, 2009.

  1. Tom Anderson

    Tom Anderson Guest

    Afternoon all,

    Call me mad, but i'm interested in writing an XML validator. Not as part
    of a parser, but operating on DOM-like objects in a program. Basically, i
    want to write a function createElement that looks a bit like:

    Node a, b, c; // create these somehow
    Element list = createElement("xhtml:p", new Node[] {a, b, c});

    Where createElement is able to determine whether {a, b, c} is a valid
    sequence of child elements for an xhtml:p element, and so throw an
    exception or something if it isn't.

    The idea would be to parse a DTD in order to create objects representing
    the content model, then use those to validate the nodes.
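A minimal sketch of what "objects representing the content model" could look like. It assumes the DTD has already been parsed into these objects (the parsing step is not shown) and that children are identified by element name, with "#PCDATA" standing for text. The class names, `ends` and `create_element` are hypothetical. Instead of backtracking, the matcher tracks the set of child indices at which each sub-model could finish.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


# Hypothetical node types, one per DTD content-model construct.
@dataclass(frozen=True)
class Name:
    name: str          # an element type name, or "#PCDATA" for text


@dataclass(frozen=True)
class Seq:
    items: Tuple       # (a, b, c)


@dataclass(frozen=True)
class Choice:
    items: Tuple       # (a | b | c)


@dataclass(frozen=True)
class Repeat:
    item: object
    op: str            # "?", "*" or "+"


def ends(model, names: List[str], i: int) -> FrozenSet[int]:
    """Return every index at which `model` could finish if it starts at names[i]."""
    if isinstance(model, Name):
        hit = i < len(names) and names[i] == model.name
        return frozenset({i + 1}) if hit else frozenset()
    if isinstance(model, Seq):
        positions = frozenset({i})
        for item in model.items:
            positions = frozenset(e for p in positions for e in ends(item, names, p))
        return positions
    if isinstance(model, Choice):
        return frozenset(e for item in model.items for e in ends(item, names, i))
    # Repeat: take one pass, then keep re-applying the item for "*" and "+"
    once = set(ends(model.item, names, i))
    result = once | ({i} if model.op in ("?", "*") else set())
    frontier = once if model.op in ("*", "+") else set()
    while frontier:
        step = {e for p in frontier for e in ends(model.item, names, p)} - result
        result |= step
        frontier = step
    return frozenset(result)


def create_element(tag: str, children: List[str], models: Dict[str, object]):
    """Hypothetical createElement: raise if the child names don't fit the model."""
    if len(children) not in ends(models[tag], children, 0):
        raise ValueError(f"invalid children for <{tag}>: {children}")
    return (tag, children)


# Hand-built model for a hypothetical declaration <!ELEMENT list (head?, item+)>
MODELS = {"list": Seq((Repeat(Name("head"), "?"), Repeat(Name("item"), "+")))}
```

With these models, `create_element("list", ["head", "item", "item"], MODELS)` succeeds, while `["head"]` or an empty list raises `ValueError`. Because the sets of indices are bounded by the number of children, the `*` and `+` loops always terminate.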

    The XML spec says:

    More formally: a finite state automaton may be constructed from the
    content model using the standard algorithms, e.g. algorithm 3.5 in
    section 3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such
    algorithms, a follow set is constructed for each position in the regular
    expression (i.e., each leaf node in the syntax tree for the regular
    expression); if any position has a follow set in which more than one
    following position is labeled with the same element type name, then the
    content model is in error and may be reported as an error.
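A sketch of the construction the spec describes, assuming the content model has already been parsed into a small syntax tree (the tuple encoding and the function names are hypothetical). Each leaf is numbered as a position. `analyse` computes nullable/first/last sets bottom-up and fills in each position's follow set. `build` applies the spec's determinism check, also checking the set of possible first positions, which acts as the follow set of an implicit start position. `matches` then walks the resulting automaton one child at a time. This is the usual position-automaton (followpos) technique in general form, not a transcription of the book's algorithm.

```python
from typing import Dict, List, Set


def analyse(tree, labels: List[str], follow: Dict[int, Set[int]]):
    """Return (nullable, first, last) for a subtree. Leaves are numbered as
    positions in `labels`; follow[p] collects positions that may come right after p."""
    kind = tree[0]
    if kind == "name":
        p = len(labels)
        labels.append(tree[1])
        follow[p] = set()
        return False, {p}, {p}
    if kind == "alt":
        parts = [analyse(t, labels, follow) for t in tree[1]]
        return (any(n for n, _, _ in parts),
                set().union(*(f for _, f, _ in parts)),
                set().union(*(l for _, _, l in parts)))
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for t in tree[1]:
            n, f, l = analyse(t, labels, follow)
            for p in last:              # positions that can end the prefix...
                follow[p] |= f          # ...may be followed by this item's first
            if nullable:
                first |= f
            last = (last | l) if n else l
            nullable = nullable and n
        return nullable, first, last
    # Unary operators "?", "*" and "+" wrap a single subtree
    n, f, l = analyse(tree[1], labels, follow)
    if kind in ("*", "+"):
        for p in l:                     # a repetition may loop back to its start
            follow[p] |= f
    return (n or kind in ("?", "*")), f, l


def build(tree):
    labels: List[str] = []
    follow: Dict[int, Set[int]] = {}
    nullable, first, last = analyse(tree, labels, follow)
    # Determinism: no candidate set may hold two positions with the same name.
    # `first` acts as the follow set of an implicit start position.
    for candidates in [first, *follow.values()]:
        names = [labels[p] for p in candidates]
        if len(names) != len(set(names)):
            raise ValueError(f"non-deterministic content model: {sorted(names)}")
    return labels, follow, first, last, nullable


def matches(automaton, children: List[str]) -> bool:
    labels, follow, first, last, nullable = automaton
    candidates, current = first, None
    for child in children:
        # Because the model is deterministic, at most one candidate has this name
        step = [p for p in candidates if labels[p] == child]
        if not step:
            return False
        current = step[0]
        candidates = follow[current]
    return nullable if current is None else current in last


# (head?, item+) in a hypothetical tuple encoding of the syntax tree
LIST_TREE = ("seq", [("?", ("name", "head")), ("+", ("name", "item"))])
```

For `(head?, item+)` the automaton accepts `["item"]` and `["head", "item", "item"]` but rejects `["head"]`. For an ambiguous model such as `((a, b) | (a, c))`, `build` raises, because both `a` positions land in the start set.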

    Firstly, roughly how hard is this? Expressed in, say,
    milli-Dijkstra's-algorithms - 5000? 20 000? 100 000?

    Secondly, i'm not keen to rush out and buy Aho et al.'s no doubt wonderful
    book on compilers just so i can do this. Can anyone direct me to anything
    i can read online where i can learn about this? That could be in English
    or source code - presumably, there are numerous open-source projects which
    have implemented XML validators, right?

    It occurs to me that i could avoid having to write the validator myself by
    using a grotesque hack - if i can map node types to strings, i can express
    a node sequence as a string, and a content model as a regular expression,
    and then just let a standard regexp library do the heavy lifting. In
    Python, operating on standard DOM objects:

        import re

        pPattern = re.compile("(?:<(?:#PCDATA|br|span|bdo|map|tt|i|b|big|small|em|strong|dfn|code|q|samp|kbd|var|cite|abbr|acronym|sub|sup|input|select|textarea|label|button|ins|del|script)>)*")

        def validateAsParagraph(nodelist):
            # DOM text nodes are named "#text", so map them to the DTD's #PCDATA
            names = ["#PCDATA" if n.nodeType == n.TEXT_NODE else n.nodeName for n in nodelist]
            nodeString = "".join("<" + name + ">" for name in names)
            m = pPattern.match(nodeString)
            return m is not None and m.end() == len(nodeString)

    I can't decide if this is brilliant or revolting, or both.
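The hack can be pushed one step further: a DTD content model is already a regular expression over element names, so the pattern need not be written by hand. A sketch, assuming the content model arrives as the string from an `<!ELEMENT>` declaration. `content_model_to_regex` and `is_valid` are hypothetical names. `ANY` is not handled, and Python's backtracking engine does not enforce the spec's determinism rule, so ambiguous models are silently accepted.

```python
import re
from typing import List

# One DTD content-model token: a name (including #PCDATA) or a punctuation symbol
TOKEN = re.compile(r"\s*([^\s(),|?*+]+|[(),|?*+])")


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a DTD content model such as "(head?, item+)" into a regex over
    strings where each child node is encoded as "<name>"."""
    model = model.strip()
    if model == "EMPTY":
        return re.compile("")            # only the empty child list matches
    parts: List[str] = []
    pos = 0
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"cannot parse content model at {model[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok == ",":
            continue                     # a sequence is plain concatenation
        elif tok in (")", "|", "?", "*", "+"):
            parts.append(tok)            # same meaning in both notations
        else:
            # Group the name so a following "?", "*" or "+" applies to all of it
            parts.append("(?:<" + re.escape(tok) + ">)")
    return re.compile("".join(parts))


def is_valid(pattern: "re.Pattern[str]", child_names: List[str]) -> bool:
    """Encode the children like the pattern does and require a whole-string match."""
    return pattern.fullmatch("".join("<" + n + ">" for n in child_names)) is not None
```

For example, `content_model_to_regex("(head?, item+)")` yields the pattern `(?:(?:<head>)?(?:<item>)+)`. Wrapping each name in its own group matters: without it, the `?` after `<head>` would make only the closing `>` optional.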

    tom

    --
    Many of us adopted the File's slang as our own, feeling that we'd found a
    tangible sign of the community of minds we'd half-guessed to be out there.
    Tom Anderson, May 29, 2009
    #1

  2. Fri, 29 May 2009 13:38:08 +0100, /Tom Anderson/:

    > Call me mad, but i'm interested in writing an XML validator. Not as part
    > of a parser, but operating on DOM-like objects in a program.


    JAXP 1.3 provides a validation API, implemented [1] by Xerces2, which
    can operate on an already parsed and built DOM.

    > Can anyone direct me
    > to anything i can read online where i can learn about this? That could
    > be in English or source code - presumably, there are numerous
    > open-source projects which have implemented XML validators, right?


    You could read the Xerces2 Implementation API documentation [2] -
    packages like org.apache.xerces.impl.dtd.models and
    org.apache.xerces.impl.xs.models. You could browse the sources [3]
    as well.

    [1] http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/package-summary.html
    [2] http://xerces.apache.org/xerces2-j/javadocs/xerces2/index.html
    [3] http://xerces.apache.org/xerces2-j/source-repository.html

    --
    Stanimir
    Stanimir Stamenkov, Jun 3, 2009
    #2
