formatted xml output from ElementTree inconsistency


Matthew Thorley

Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:

from elementtree import ElementTree as et

xmla = et.parse('some_file.xml')
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)

The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

Is there a function to 'pretty print' an element? I looked in the API
reference and didn't see anything that would do it. It would be nice if
there were a way to create 'standard' formatted output for all elements
regardless of how they were created.

Comments and suggestions are greatly appreciated.

regards
-Matthew
 

Jarek Zgoda

Matthew Thorley wrote:
> The output I get shows xmla as nicely formatted text, with elements on
> different lines and everything all tabbed and pretty. Conversely, xmlb is
> one long string on one line.
>
> Is that because some_file.xml is already nicely formatted? I thought
> that the formatting was ignored when creating new elements.

Why would you want to read an XML document "by hand"? It's a "machine
related" data chunk.

Document formatting should be done by means of CSS and/or XSL stylesheet.
 

Matthew Thorley

Jarek said:
> Matthew Thorley wrote:
> [...]
>
> Why would you want to read an XML document "by hand"? It's a "machine
> related" data chunk.
>
> Document formatting should be done by means of CSS and/or XSL stylesheet.

It is just data to the machine, but people may have to read and
interpret this data. I don't think there is anything unusual about
formatting XML with tabs. Most web pages do that in their HTML/XHTML.
Just imagine if you wanted to change a broken link on your web page and
the entire page was one long string. That may not matter to Dreamweaver,
but it sure would be annoying if you were using vi :)

-Matthew
 

Kent Johnson

Matthew said:
> Greetings, perhaps someone can explain this. I get two different styles
> of formatting for xmla and xmlb when I do the following:
<snip>
> Is that because some_file.xml is already nicely formatted? I thought
> that the formatting was ignored when creating new elements.

ElementTree is preserving the whitespace of the original.
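A minimal sketch of what this means, written for Python 3's standard-library xml.etree.ElementTree (the descendant of the elementtree package used in this thread). An inline string stands in for some_file.xml, whose real contents aren't shown, so its four-space indentation is an assumption. The parsed subtree keeps its indentation in the elements' .text and .tail attributes, while the elements built in code have None there:

```python
import xml.etree.ElementTree as ET

# Stand-in for some_file.xml (assumed contents, indented by four spaces).
FORMATTED = "<parent>\n    <child1/>\n    <child2/>\n</parent>"

# Parsing keeps the indentation: it is stored in .text and .tail.
xmla = ET.fromstring(FORMATTED)

# Elements created in code carry no whitespace (.text and .tail are None).
xmlb = ET.Element("parent")
ET.SubElement(xmlb, "child1")
ET.SubElement(xmlb, "child2")

root = ET.Element("root")
root.append(xmla)
root.append(xmlb)

# xmla's children come out on separate lines, xmlb's all on one line.
output = ET.tostring(root, encoding="unicode")
print(output)
print(repr(xmla.text), repr(xmlb.text))
```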

> Is there a function to 'pretty print' an element?

AFAIK this is not supported in ElementTree. I hacked my own by modifying ElementTree._write(); it wasn't too hard to make a version that suited my purposes.
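Kent doesn't post his modified _write(), so the following is not his code. It is a sketch of the other common approach: leave the serializer alone and rewrite the whitespace-only .text and .tail values of the tree before writing it. The helper name indent() and its space parameter are hypothetical choices for this illustration, targeting Python 3's xml.etree.ElementTree and applied to Matthew's mixed case:

```python
import xml.etree.ElementTree as ET


def indent(elem, level=0, space="  "):
    """Hypothetical helper: rewrite whitespace-only .text/.tail in place
    so that every element starts on its own, indented line."""
    pad = "\n" + level * space
    if len(elem):
        # Push the first child onto a new line, one level deeper.
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in elem:
            indent(child, level + 1, space)
        # The last child's tail pulls the parent's closing tag back out.
        last = elem[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # Whatever follows a nested element starts on a new line at its level.
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


# Matthew's mix: one parsed, already indented subtree and one built in code.
xmla = ET.fromstring("<parent>\n    <child1/>\n    <child2/>\n</parent>")
xmlb = ET.Element("parent")
ET.SubElement(xmlb, "child1")
ET.SubElement(xmlb, "child2")

root = ET.Element("root")
root.append(xmla)
root.append(xmlb)

indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because the helper overwrites existing whitespace-only text, the parsed and the hand-built subtrees come out with identical layout. On Python 3.9 and later, the standard library also ships xml.etree.ElementTree.indent(), which serves the same purpose without a hand-written helper.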

Kent
 

Patrick Maupin

Jarek said:
> Why would you want to read an XML document "by hand"? It's a "machine
> related" data chunk.

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/Unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is
arbitrary whitespace allowed? Is that to make it easier for machines
to parse? Is it any easier for machines to generate arbitrary
whitespace than it would have been for them to always insert, e.g., a
single space? No, I thought not there as well.

> Document formatting should be done by means of CSS and/or XSL stylesheet.

He's not formatting the (rendered) document -- he's just formatting the
raw data to make it more readable in an editor. You could use CSS/XSL,
and then selectively add whitespace without actually affecting the
rendering. Alternatively, as you point out, it is a "machine related"
data chunk -- some XML documents are never even destined for human
eyes, _except_ for debugging. For some of those documents, CSS and XSL
are just a waste of CPU cycles.

Regards,
Pat
 

Steven Bethard

Matthew said:
> from elementtree import ElementTree as et
>
> xmla = et.parse('some_file.xml')
> xmlb = et.Element('parent')
> et.SubElement(xmlb, 'child1')
> et.SubElement(xmlb, 'child2')
>
> root = et.Element('root')
> root.append(xmla.getroot())
> root.append(xmlb)
>
> print et.tostring(root)
[snip]
> Is there a function to 'pretty print' an element?

Depends on how pretty you want it. I've found that putting each element
on its own line has been sufficient for many of my manual-inspection use
cases. This isn't too hard with a cheap hack:

py> import elementtree.ElementTree as et
py> root = et.Element('root')
py> parent = et.SubElement(root, 'parent')
py> child = et.SubElement(parent, 'child')
py> print et.tostring(root)
<root><parent><child /></parent></root>
py> print et.tostring(root).replace('><', '>\n<')
<root>
<parent>
<child />
</parent>
</root>

Not ideal, but it may work well enough for you.
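For Python 3 readers, a sketch of the same hack with the standard-library module; tostring() needs encoding="unicode" to return str rather than bytes. The wrapper name one_tag_per_line() is made up for this example. The second half illustrates one caveat of the hack (an observation added here, not something stated above): in mixed content, where text is interleaved with elements, the inserted newlines become part of the character data if the output is parsed again.

```python
import xml.etree.ElementTree as ET


def one_tag_per_line(elem):
    """Hypothetical wrapper: break serialized XML between adjacent tags."""
    return ET.tostring(elem, encoding="unicode").replace("><", ">\n<")


root = ET.Element("root")
parent = ET.SubElement(root, "parent")
ET.SubElement(parent, "child")
print(one_tag_per_line(root))

# Caveat: in mixed content the inserted newline becomes character data.
para = ET.fromstring("<p>see <b>this</b><i>that</i></p>")
round_trip = ET.fromstring(one_tag_per_line(para))
print(repr(round_trip.find("b").tail))  # '\n' (was None before)
```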

STeVe
 

Dennis Lee Bieber

> I see this attitude all the time, and frankly I don't understand it.
> Please explain why XML is in ASCII/Unicode instead of binary. Is it
> because it is easier for a machine to parse? No, I thought not. It's
> obviously so humans can read it. The next question is: why is

Offhand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

 

Patrick Maupin

Dennis said:
> Offhand, I'd consider the non-binary nature to be because the
> internet protocols are mostly designed for text, not binary.

A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".

To your point, the very _first_ listed goal (if order means anything in
this list) is "XML shall be straightforwardly usable over the
Internet", so it's reasonable to assume "the non-binary nature to be
because the internet protocols are mostly designed for text, not
binary."

But this assumption turns cause and effect on its head. It is
perfectly feasible to pass binary data through every known internet
protocol (with a little simplistic encoding), and this is done all the time.
The real next question is: why ARE the internet protocols "mostly
designed for text, not binary"?
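To make the "little simplistic encoding" concrete, here is a sketch with Python 3's base64 module. Base64 is just one common choice (MIME email attachments use it, for example); it maps arbitrary bytes onto a small ASCII alphabet and back without loss, at the cost of roughly a third more bytes:

```python
import base64

payload = bytes(range(256))  # every possible byte value, including NUL and 0xFF

encoded = base64.b64encode(payload)  # ASCII-only bytes, safe for a text protocol
text = encoded.decode("ascii")

decoded = base64.b64decode(text)  # back to the original binary data
print(len(payload), len(text))  # 256 344
```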

SMTP, for example, was designed at a time when memory, bandwidth, and
CPU cycles were all at a premium, and MTAs were coded using fairly
low-level constructs in C where parsing was a pain in the rear. Even
so, the developers decided to use relatively free-formatted ASCII in
the protocol. To follow your theory to its logical conclusion, they
must have wasted all that bandwidth, all those CPU cycles, all that
memory, all that disk space, and all that effort writing parsing code
because of yet another underlying mechanism which was "designed for
text."

On that account, your theory is correct, but only when you realize the
underlying mechanism which is "designed for text" is the human brain,
which has to try to make sense of all this mess when things aren't
quite interoperating properly.

Regards,
Pat
 

uche.ogbuji

Patrick Maupin wrote:
"""
Dennis said:
Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".
"""

Yes. Thanks for mentioning this, because people too often forget it.

minidom, 4Suite's Domlette and Amara all provide good pretty-print
output functions. The latter two use rules from the XSLT spec, which
was designed by people who have the above design goal well in their
blood.
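A sketch of the minidom route in Python 3, assuming the tree was built with xml.etree.ElementTree: serialize it, re-parse it with xml.dom.minidom, and call toprettyxml(). 4Suite's Domlette and Amara are separate third-party packages and are not shown here.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

root = ET.Element("root")
parent = ET.SubElement(root, "parent")
ET.SubElement(parent, "child")

# Round-trip the ElementTree output through minidom to use its pretty-printer.
dom = minidom.parseString(ET.tostring(root))
pretty = dom.toprettyxml(indent="  ")
print(pretty)
```

If the input already contains indentation (as a parsed, hand-formatted file would), minidom keeps those whitespace text nodes, so the output can contain extra blank lines; normalizing the whitespace first avoids that.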
 
