Help with xml.parsers.expat please?

Discussion in 'Python' started by Will Stuyvesant, Jul 4, 2003.

  1. There seems to be no XML parser that can do validation in
    the Python Standard Libraries. And I am stuck with Python
    2.1.1. until my web master upgrades (I use Python for
    CGI). I know pyXML has validating parsers, but I can not
    compile things on the (unix) webserver. And even if I
    could, the compiler I have access to would be different
    than what was used to compile python for CGI.

    I need to write a CGI script that does XML validation (and
    then later also does other things). It does not have to
    be completely standards-compliant validation, but it should
    at least check that elements are declared and allowed in
    particular places in the XML tree.

    I tried to understand SAX and DOM but I gave up, and
    effbot advises to avoid them anyway. So I am studying
    xml.parsers.expat now, but I am stuck.

    The program below *does* print information about DOCTYPE
    declarations but nothing about the element definitions in
    the DTD. I feed it an XML file with a DOCTYPE declaration
    like <!DOCTYPE ROOTTAG SYSTEM "MYDTD.DTD"> and the DTD is
    in the same directory. I also tried inputting the DTD
    itself to this program but that doesn't work either
    (ExpatError: syntaxerror at the first element definition).

    Please help if you can.




    # file: minimal_validate.py
    #
    import xml.parsers.expat

    def element_decl_handler(name, model):
        print 'ELEMENT definition: ', name, ' model: ', model

    def doctype_decl_handler(doctypeName, systemId, publicId, has_internal_subset):
        print 'DOCTYPE declaration: '
        print ' doctypeName: ', doctypeName
        print ' systemId: ', systemId
        print ' publicId:', publicId
        print ' internal subset:', has_internal_subset

    p = xml.parsers.expat.ParserCreate()

    p.ElementDeclHandler = element_decl_handler
    p.StartDoctypeDeclHandler = doctype_decl_handler

    import sys
    input = file(sys.argv[1]).read()
    p.Parse(input)
     
    Will Stuyvesant, Jul 4, 2003

  2. Alan Kennedy, Guest

    Will Stuyvesant wrote:

    > There seems to be no XML parser that can do validation in
    > the Python Standard Libraries. And I am stuck with Python
    > 2.1.1. until my web master upgrades (I use Python for
    > CGI). I know pyXML has validating parsers, but I can not
    > compile things on the (unix) webserver. And even if I
    > could, the compiler I have access to would be different
    > than what was used to compile python for CGI.


    So it didn't work out with xmlproc? Isn't xmlproc a pure python
    parser that you should be able to drop in and run without
    compiling anything?

    > I need to write a CGI script that does XML validation (and
    > then later also does other things). It does not have to
    > be completely standards-compliant validation, but it should
    > at least check that elements are declared and allowed in
    > particular places in the XML tree.


    I think you would be much more likely to get constructive help
    if you posted some examples of the tree structures and data
    that you're processing.

    > I tried to understand SAX and DOM but I gave up, and
    > effbot advises to avoid them anyway. So I am studying
    > xml.parsers.expat now, but I am stuck.


    SAX and DOM aren't solutions, they're tools. They are simply
    different ways of accessing the contents of an XML document.
    They may or may not be suitable for your problem, depending
    on a wide variety of considerations.

    I think the problem needs to be clearly defined before an
    appropriate solution can be reached.

    > The program below *does* print information about DOCTYPE
    > declarations but nothing about the element definitions in
    > the DTD. I feed it an XML file with a DOCTYPE declaration
    > like <!DOCTYPE ROOTTAG SYSTEM "MYDTD.DTD"> and the DTD is
    > in the same directory. I also tried inputting the DTD
    > itself to this program but that doesn't work either
    > (ExpatError: syntaxerror at the first element definition).
    >
    > Please help if you can.
    >
    > # file: minimal_validate.py
    > #
    > import xml.parsers.expat
    >
    > def element_decl_handler(name, model):
    >     print 'ELEMENT definition: ', name, ' model: ', model
    >
    > def doctype_decl_handler(doctypeName, systemId, publicId, has_internal_subset):
    >     print 'DOCTYPE declaration: '
    >     print ' doctypeName: ', doctypeName
    >     print ' systemId: ', systemId
    >     print ' publicId:', publicId
    >     print ' internal subset:', has_internal_subset
    >
    > p = xml.parsers.expat.ParserCreate()
    >
    > p.ElementDeclHandler = element_decl_handler
    > p.StartDoctypeDeclHandler = doctype_decl_handler
    >
    > import sys
    > input = file(sys.argv[1]).read()
    > p.Parse(input)


    I think you need to do some reading on what SAX does. In summary, it
    gives you the pieces of an XML document, in a series of function
    callbacks. You've got to do something with the pieces that you're
    given.
    SAX won't solve your problem any more than anything else unless you
    know what pieces you are receiving, and are doing something with them.
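
As a rough illustration of "doing something with the pieces", here is a minimal sketch of a SAX handler that tracks the current element stack and checks each new element against a hand-written table of permitted children. The table `ALLOWED_CHILDREN`, the class `StructureChecker`, and the function `check_structure` are hypothetical names made up for this example, and the code is written for a current Python 3 rather than the Python 2.1.1 mentioned in the question. It is not DTD validation, only the "is this element declared and allowed here" check described above.

```python
import xml.sax

# Hypothetical rule table: parent element -> set of permitted child elements.
# None stands for the document level (the parent of the root element).
ALLOWED_CHILDREN = {
    None: {"ROOTTAG"},
    "ROOTTAG": {"CHILD"},
    "CHILD": set(),
}


class StructureChecker(xml.sax.ContentHandler):
    """Collects an error message for every element that is out of place."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.stack = []    # names of the currently open elements
        self.errors = []

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        if name not in self.allowed:
            self.errors.append("undeclared element: %s" % name)
        elif name not in self.allowed.get(parent, set()):
            self.errors.append("%s not allowed inside %s" % (name, parent))
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def check_structure(xml_bytes, allowed=ALLOWED_CHILDREN):
    """Parse xml_bytes and return the list of structure errors found."""
    handler = StructureChecker(allowed)
    xml.sax.parseString(xml_bytes, handler)
    return handler.errors
```

Calling `check_structure(b"<ROOTTAG><CHILD/></ROOTTAG>")` should return an empty list, while a document containing an element missing from the table should produce one message per offending element.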

    One memory-efficient way of building up a document in memory is to
    create a Python object to represent every element, with each
    "element object" being a (Python) attribute of its parent. It's a lot
    easier than it sounds, and you can read about it here:

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/149368
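
For readers who cannot follow the link, here is a minimal sketch of the general idea, not a copy of the recipe: each element becomes a small Python object, and child elements can be reached by tag name as attributes of their parent. The names `Element` and `parse_to_objects` are invented for this example, the code targets a current Python 3, and it uses `xml.parsers.expat` directly since that is the module under discussion.

```python
import xml.parsers.expat


class Element:
    """One XML element: its name, attributes, text, and child elements."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes)
        self.text = ""
        self.children = []

    def __getattr__(self, tag):
        # Only reached when normal lookup fails, so real fields such as
        # .name and .text always win over child elements of the same name.
        matches = [c for c in self.__dict__.get("children", []) if c.name == tag]
        if not matches:
            raise AttributeError(tag)
        return matches


def parse_to_objects(xml_text):
    """Parse xml_text and return the root Element."""
    stack = []     # currently open elements, innermost last
    roots = []     # will hold the single root element

    def start(name, attrs):
        node = Element(name, attrs)
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        # Character data is only reported inside an open element.
        stack[-1].text += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return roots[0]
```

With this sketch, `parse_to_objects('<order><item>pen</item></order>').item` gives a list of the `item` child objects, so repeated children are handled the same way as single ones.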

    And you can read about SAX in general here

    http://www.devarticles.com/art/1/383/2
    http://www-106.ibm.com/developerworks/xml/library/x-tipsaxflex.html

    The latter is a good example from Uche Ogbuji about extracting pieces
    of a document from a SAX stream, which might be easily adaptable to
    your problem.

    But I still think you'd be better off describing the problem as simply
    as you can here, rather than fumbling around.
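
On the narrower question of why the quoted program never reports element declarations: by default expat does not read the external DTD subset named in the DOCTYPE, and a bare DTD file is not a well-formed XML document, which would explain the syntax error when it is fed in directly. The sketch below, written for a current Python 3, turns on parameter entity parsing and supplies an `ExternalEntityRefHandler` that hands the DTD text to a child parser. The function `collect_element_decls` and its `load_external` callback are hypothetical names for this example, and whether the expat build shipped with Python 2.1.1 exposes the same calls is an assumption that would need checking.

```python
import xml.parsers.expat as expat


def collect_element_decls(xml_text, load_external):
    """Return [(name, model), ...] for each <!ELEMENT ...> expat reports.

    load_external(system_id) is a caller-supplied (hypothetical) function
    that returns the text of the external DTD for the given system id.
    """
    decls = []

    def element_decl(name, model):
        decls.append((name, model))

    def external_entity_ref(context, base, system_id, public_id):
        # Parse the external DTD subset with a child parser.
        child = parser.ExternalEntityParserCreate(context)
        # Set explicitly so the sketch does not rely on handler inheritance.
        child.ElementDeclHandler = element_decl
        child.Parse(load_external(system_id), True)
        return 1  # a non-zero return tells expat the entity was handled

    parser = expat.ParserCreate()
    # Without this call expat skips the external DTD subset entirely.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.ElementDeclHandler = element_decl
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(xml_text, True)
    return decls
```

For a CGI script, `load_external` could simply open the file named by the system id in the script's directory. Note that this only surfaces the declarations; expat remains a non-validating parser, so checking the document against them is still up to the caller.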

    --
    alan kennedy
    -----------------------------------------------------
    check http headers here: http://xhaus.com/headers
    email alan: http://xhaus.com/mailto/alan
     
    Alan Kennedy, Jul 4, 2003
