Looking for source preservation features in XML libs

Discussion in 'Python' started by Grzegorz Adam Hankiewicz, Dec 28, 2004.

  1. Hi.

    I'm looking for two specific features in XML libraries. One is to be
    able to tell on which line of the source file a tag starts and ends.
    Say, tag <para> is located on line 34 column 7, and the matching
    </para> three lines later at column 56.

    Another feature is to be able to save the processed XML code in a way
    that unmodified tags preserve the original indentation. Or, in the
    worst case, all indentation is lost, but I can control to some degree
    the look of the final XML output.

    I have looked at xml.minidom, elementtree and gnosis and haven't found
    any such features. Are there libs providing these?

    --
    Please don't send me private copies of your public answers. Thanks.
     
    Grzegorz Adam Hankiewicz, Dec 28, 2004
    #1

  2. Grzegorz Adam Hankiewicz

    Guest

    Grzegorz Adam Hankiewicz wrote:

    > I have looked at xml.minidom, elementtree and gnosis and haven't
    > found any such features. Are there libs providing these?


    pxdom (http://www.doxdesk.com/software/py/pxdom.html) has some of this,
    but I think it's still way off what you're envisaging.

    > One is to be able to tell which source file line a tag starts
    > and ends.


    You can get the file and line/column where a node begins in pxdom using
    the non-standard property Node.pxdomLocation, which returns a DOM Level
    3 DOMLocator object, e.g.:

    uri = node.pxdomLocation.uri
    line = node.pxdomLocation.lineNumber
    col = node.pxdomLocation.columnNumber

    There is no way to get the location of an Element's end-tag, however,
    except by guessing from the positions of adjacent nodes, which is kind
    of cheating and probably not reliable.
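
    If building a DOM is not a requirement, one possible workaround is to
    drop down to the standard library's expat bindings, whose parser object
    reports the current line and column during each callback. The following
    is only a minimal sketch and not part of pxdom; the helper name
    tag_positions is hypothetical:

```python
import xml.parsers.expat


def tag_positions(xml_text):
    """Return a list of (name, start, end) tuples, one per element.

    start and end are (line, column) pairs for the first character of the
    start-tag and of the matching end-tag, as reported by expat.
    """
    parser = xml.parsers.expat.ParserCreate()
    open_tags = []   # stack of (name, start position) for unclosed elements
    results = []     # filled in as each element is closed

    def on_start(name, attrs):
        position = (parser.CurrentLineNumber, parser.CurrentColumnNumber)
        open_tags.append((name, position))

    def on_end(name):
        open_name, start_position = open_tags.pop()
        end_position = (parser.CurrentLineNumber, parser.CurrentColumnNumber)
        results.append((open_name, start_position, end_position))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return results


if __name__ == "__main__":
    sample = "<doc>\n  <para>text\n  more</para>\n</doc>"
    for name, start, end in tag_positions(sample):
        print(name, "starts at", start, "ends at", end)
```

    Elements are reported in the order they are closed, so inner elements
    come first. expat counts lines from 1 and columns from 0, and for an
    empty-element tag such as <br/> the start and end positions coincide.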

    SAX processors can in theory use Locator information too, but AFAIK (?)
    this isn't currently implemented.

    > Another feature is to be able to save the processed XML code in a way
    > that unmodified tags preserve the original indentation.


    Do you mean whitespace *inside* the start-tag? I don't know of any XML
    processor that will do anything but ignore whitespace here; in XML
    terms it is utterly insignificant and there is no place to store the
    information in the infoset or DOM properties.

    pxdom will preserve the *order* of the attributes, but even that is not
    required by any XML standard.

    > Or in the worst case, all indentation is lost, but I can control to
    > some degree the outlook of the final XML output.


    The DOM Level 3 LS feature format-pretty-print and PyXML's PrettyPrint
    both influence whitespace in content. However, if you want control of
    whitespace inside the tags themselves, I don't know of any XML tools
    that will do it. You might have to write your own serializer, or hack
    it into a DOM implementation of your choice.
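
    As a rough illustration of controlling whitespace in content only, here
    is a sketch that uses the standard library's xml.dom.minidom and its
    toprettyxml() method instead of DOM Level 3 LS or PyXML. The helper
    names strip_whitespace_nodes and reindent are hypothetical, and the
    original indentation is discarded rather than preserved:

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)


def reindent(xml_text, indent="    "):
    """Re-serialize xml_text with a uniform indent per nesting level."""
    doc = minidom.parseString(xml_text)
    # Drop the old indentation first; otherwise toprettyxml() adds its
    # own whitespace on top of the whitespace already in the document.
    strip_whitespace_nodes(doc.documentElement)
    return doc.documentElement.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(reindent("<a>\n<b>x</b>\n      <c/></a>", indent="  "))
```

    Removing whitespace-only text nodes is only safe for data-style
    documents; in mixed content it can change the meaning of the text. It
    also does nothing about whitespace inside the tags themselves.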
    --
    Andrew Clover
    http://www.doxdesk.com/
     
    Andrew Clover, Dec 28, 2004
    #2
