formatted xml output from ElementTree inconsistency

Discussion in 'Python' started by Matthew Thorley, Jun 23, 2005.

  1. Greetings, perhaps someone can explain this. I get two different styles
    of formatting for xmla and xmlb when I do the following:

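    from elementtree import ElementTree as et

    # Parse an existing, already-indented file (et.parse returns an ElementTree)
    xmla = et.parse('some_file.xml')
    # Build a second subtree in code
    xmlb = et.Element('parent')
    et.SubElement(xmlb, 'child1')
    et.SubElement(xmlb, 'child2')

    # Put both subtrees under a new root and serialize the result
    root = et.Element('root')
    root.append(xmla.getroot())
    root.append(xmlb)

    print et.tostring(root)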

    The output I get shows xmla as nicely formatted text, with elements on
    different lines and everything all tabbed and pretty. Conversely, xmlb is
    one long string on one line.
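    For readers on a current Python, the behaviour can be reproduced with the
    standard-library xml.etree.ElementTree (Python 3), which grew out of the
    2005 elementtree package used in this thread. This is a minimal sketch: an
    inline string stands in for some_file.xml (its content is hypothetical),
    and encoding='unicode' makes tostring() return str instead of bytes.

```python
import xml.etree.ElementTree as et

# Hypothetical stand-in for some_file.xml: it already contains
# newlines and indentation between its tags.
FORMATTED = """<parent>
    <child1/>
    <child2/>
</parent>"""

xmla = et.ElementTree(et.fromstring(FORMATTED))  # parsed: whitespace kept
xmlb = et.Element('parent')                       # built in code: no whitespace
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

# The parsed subtree keeps its newlines; the built one comes out on one line.
output = et.tostring(root, encoding='unicode')
print(output)
```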

    Is that because the some_file.xml is already nicely formatted? I thought
    that the formatting was ignored when creating new elements.

    Is there a function to 'pretty print' an element? I looked in the API
    reference and didn't see anything that would do it. It would be nice if
    there were a way to create 'standard' formatted output for all elements
    regardless of how they were created.

    Comments and suggestions are greatly appreciated.

    regards
    -Matthew
     
    Matthew Thorley, Jun 23, 2005
    #1

  2. Matthew Thorley wrote:

    > The output I get shows xmla as nicely formatted text, with elements on
    > different lines and everything all tabbed and pretty. Conversely, xmlb is
    > one long string on one line.
    >
    > Is that because the some_file.xml is already nicely formatted? I thought
    > that the formatting was ignored when creating new elements.


    Why would you want to read an XML document "by hand"? It's a
    "machine-related" data chunk.

    Document formatting should be done by means of CSS and/or XSL stylesheets.

    --
    Jarek Zgoda
    http://jpa.berlios.de/
     
    Jarek Zgoda, Jun 23, 2005
    #2

  3. Jarek Zgoda wrote:
    > Matthew Thorley wrote:
    >
    >> The output I get shows xmla as nicely formatted text, with elements on
    >> different lines and everything all tabbed and pretty. Conversely, xmlb is
    >> one long string on one line.
    >>
    >> Is that because the some_file.xml is already nicely formatted? I
    >> thought that the formatting was ignored when creating new elements.

    >
    >
    > Why would you want to read an XML document "by hand"? It's a
    > "machine-related" data chunk.
    >
    > Document formatting should be done by means of CSS and/or XSL stylesheets.
    >

    It is just data to the machine, but people may have to read and
    interpret this data. I don't think there is anything unusual about
    formatting XML with tabs. Most web pages do that in their HTML/XHTML.
    Just imagine if you wanted to change a broken link on your web page, and
    the entire page was one long string. That may not matter to Dreamweaver,
    but it sure would be annoying if you were using vi :)

    -Matthew
     
    Matthew Thorley, Jun 23, 2005
    #3
  4. Matthew Thorley wrote:
    > Greetings, perhaps someone can explain this. I get two different styles
    > of formatting for xmla and xmlb when I do the following:
    > <snip>
    > Is that because the some_file.xml is already nicely formatted? I thought
    > that the formatting was ignored when creating new elements.


    ElementTree is preserving the whitespace of the original.
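    To see where that preserved whitespace lives: when ElementTree parses a
    file, the newlines and indentation between tags are stored as ordinary
    string values in each element's .text (before the first child) and .tail
    (after the element's end tag). A minimal sketch in Python 3 follows; the
    strip_whitespace helper is hypothetical, not part of ElementTree. It clears
    whitespace-only .text/.tail values so parsed and hand-built elements
    serialize the same (compact) way.

```python
import xml.etree.ElementTree as et

def strip_whitespace(elem):
    """Hypothetical helper: clear whitespace-only .text/.tail in a subtree.

    Caveat: this also removes whitespace-only text that matters in mixed
    content (e.g. the space in '<b>x</b> <i>y</i>').
    """
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return elem

parsed = et.fromstring("<parent>\n    <child1/>\n    <child2/>\n</parent>")
print(repr(parsed.text))     # '\n    '  (indentation before the first child)
print(repr(parsed[0].tail))  # '\n    '  (indentation after </child1>)

strip_whitespace(parsed)
print(et.tostring(parsed, encoding='unicode'))
# <parent><child1 /><child2 /></parent>
```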
    >
    > Is there a function to 'pretty print' an element?


    AFAIK this is not supported in ElementTree. I hacked my own by modifying ElementTree._write(); it wasn't too hard to make a version that suited my purposes.
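    Kent's approach patched the serializer. A different, commonly used
    approach (not Kent's code) leaves the serializer alone and rewrites the
    whitespace in .text/.tail before calling tostring(). The sketch below
    assumes Python 3 and a document without meaningful mixed content, since
    whitespace-only text between elements is overwritten. Python 3.9 and later
    also ship xml.etree.ElementTree.indent(), which does this in place.

```python
import xml.etree.ElementTree as et

def indent(elem, level=0, space="  "):
    """Sketch: pretty-print a subtree in place by rewriting .text/.tail."""
    pad = "\n" + level * space
    if len(elem):
        # Start the first child on a new line, one level deeper.
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in elem:
            indent(child, level + 1, space)
        # The last child's tail puts the closing tag back at this level.
        last = elem[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # Every non-root element's tail positions whatever follows it.
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad
    return elem

root = et.Element('root')
parent = et.SubElement(root, 'parent')
et.SubElement(parent, 'child')

pretty = et.tostring(indent(root), encoding='unicode')
print(pretty)
# <root>
#   <parent>
#     <child />
#   </parent>
# </root>
```

    Because the result is ordinary text/tail data, it works the same for
    parsed and hand-built elements, which gives the "standard" formatting
    Matthew asked for.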

    Kent
     
    Kent Johnson, Jun 24, 2005
    #4
  5. Jarek Zgoda wrote:

    > Why would you want to read an XML document "by hand"? It's a
    > "machine-related" data chunk.
    >


    I see this attitude all the time, and frankly I don't understand it.
    Please explain why XML is in ASCII/Unicode instead of binary. Is it
    because it is easier for a machine to parse? No, I thought not. It's
    obviously so humans can read it. The next question is: why is
    arbitrary whitespace allowed? Is that to make it easier for machines
    to parse? Is it any easier for machines to generate arbitrary
    whitespace than it would have been for them to always insert, e.g., a
    single space? No, I thought not there as well.

    > Document formatting should be done by means of CSS and/or XSL stylesheets.


    He's not formatting the (rendered) document -- he's just formatting the
    raw data to make it more readable in an editor. You could use CSS/XSL,
    and then selectively add whitespace without actually affecting the
    rendering. Alternatively, as you point out, it is a "machine related"
    data chunk -- some XML documents are never even destined for human
    eyes, _except_ for debugging. For some of those documents, CSS and XSL
    are just a waste of CPU cycles.

    Regards,
    Pat
     
    Patrick Maupin, Jun 24, 2005
    #5
  6. Matthew Thorley wrote:
    > from elementtree import ElementTree as et
    >
    > xmla = et.parse('some_file.xml')
    > xmlb = et.Element('parent')
    > et.SubElement(xmlb, 'child1')
    > et.SubElement(xmlb, 'child2')
    >
    > root = et.Element('root')
    > root.append(xmla.getroot())
    > root.append(xmlb)
    >
    > print et.tostring(root)

    [snip]
    > Is there a function to 'pretty print' an element?


    Depends on how pretty you want it. I've found that putting each element
    on its own line has been sufficient for many of my manual-inspection use
    cases. This isn't too hard with a cheap hack:

    py> import elementtree.ElementTree as et
    py> root = et.Element('root')
    py> parent = et.SubElement(root, 'parent')
    py> child = et.SubElement(parent, 'child')
    py> print et.tostring(root)
    <root><parent><child /></parent></root>
    py> print et.tostring(root).replace('><', '>\n<')
    <root>
    <parent>
    <child />
    </parent>
    </root>

    Not ideal, but it may work well enough for you.
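    The session above is Python 2 with the old elementtree package. In
    Python 3, tostring() returns bytes by default, and bytes.replace() would
    need bytes arguments, so a minimal equivalent of the same cheap hack asks
    for a str with encoding='unicode'. The output is for viewing only: if it
    is parsed again, the inserted newlines become whitespace text.

```python
import xml.etree.ElementTree as et

root = et.Element('root')
parent = et.SubElement(root, 'parent')
child = et.SubElement(parent, 'child')

# encoding='unicode' returns str, so str.replace() works directly.
flat = et.tostring(root, encoding='unicode')
one_per_line = flat.replace('><', '>\n<')  # break between adjacent tags
print(one_per_line)
# <root>
# <parent>
# <child />
# </parent>
# </root>
```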

    STeVe
     
    Steven Bethard, Jun 25, 2005
    #6
  7. On 24 Jun 2005 13:53:43 -0700, "Patrick Maupin" <>
    declaimed the following in comp.lang.python:


    > I see this attitude all the time, and frankly I don't understand it.
    > Please explain why XML is in ASCII/unicode instead of binary. Is it
    > because it is easier for a machine to parse? No, I thought not. It's
    > obviously so humans can read it. The next question is: why is


    Off hand, I'd consider the non-binary nature to be because the
    internet protocols are mostly designed for text, not binary.

    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
     
    Dennis Lee Bieber, Jun 25, 2005
    #7
  8. Dennis Bieber wrote:

    > Off hand, I'd consider the non-binary nature to be because the
    > internet protocols are mostly designed for text, not binary.


    A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
    XML".

    One of the listed goals is "XML documents should be human-legible and
    reasonably clear".

    To your point, the very _first_ listed goal (if order means anything in
    this list) is "XML shall be straightforwardly usable over the
    Internet", so it's reasonable to assume "the non-binary nature to be
    because the internet protocols are mostly designed for text, not
    binary."

    But this assumption turns cause and effect on its head. It is
    perfectly feasible to pass binary data through every known internet
    protocol (with a little simplistic encoding), and is done all the time.
    The real next question is: why ARE the internet protocols "mostly
    designed for text, not binary"?

    SMTP, for example, was designed at a time when memory, bandwidth, and
    CPU cycles were all at a premium, and MTAs were coded using fairly
    low-level constructs in C where parsing was a pain in the rear. Even
    so, the developers decided to use relatively free-formatted ASCII in
    the protocol. To follow your theory to its logical conclusion, they
    must have wasted all that bandwidth, all those CPU cycles, all that
    memory, all that disk space, and all that effort writing parsing code
    because of yet another underlying mechanism which was "designed for
    text."

    On that account, your theory is correct, but only when you realize the
    underlying mechanism which is "designed for text" is the human brain,
    which has to try to make sense of all this mess when things aren't
    quite interoperating properly.

    Regards,
    Pat
     
    Patrick Maupin, Jun 25, 2005
    #8
  9. Patrick Maupin wrote:
    """
    Dennis Bieber wrote:
    > Off hand, I'd consider the non-binary nature to be because the
    > internet protocols are mostly designed for text, not binary.


    A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
    XML".

    One of the listed goals is "XML documents should be human-legible and
    reasonably clear".
    """

    Yes. Thanks for mentioning this, because people too often forget it.

    minidom, 4Suite's Domlette and Amara all provide good pretty-print
    output functions. The latter two use rules from the XSLT spec, which
    is designed by people who have the above design goal well in their
    blood.
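    Of the libraries mentioned, minidom ships with Python, so an ElementTree
    tree can be handed to its pretty printer by serializing and re-parsing.
    A minimal sketch, assuming Python 3; note that if the input already
    contains indentation, minidom keeps those whitespace text nodes and the
    result may show extra blank lines, so stripping them first can help.

```python
import xml.etree.ElementTree as et
from xml.dom import minidom

root = et.Element('root')
parent = et.SubElement(root, 'parent')
et.SubElement(parent, 'child')

# Round-trip through minidom purely to use toprettyxml().
dom = minidom.parseString(et.tostring(root))
pretty = dom.toprettyxml(indent="  ")
print(pretty)
```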

    --
    Uche
    http://copia.ogbuji.net
     
    , Jun 26, 2005
    #9
