Creating referenceable objects from XML

  • Thread starter Michael Williams

Michael Williams

Hi All,

I'm looking for a quality Python XML implementation. All of the DOM
and SAX implementations I've come across so far are rather
convoluted. Are there any quality implementations that will (after
parsing the XML) return an object that is accessible by name? Such as
the following:




xml = """
<book>
<title>MyBook</title>
<author>the author</author>
</book>
"""





And after parsing the XML allow me to access it as so:

book.title

I need it to somehow convert my XML to an intuitively referenceable
object. Any ideas? I could even do it myself if I knew the
mechanism by which Python classes do this (create variables on the fly).

Thanks in advance!
 
rurpy

Michael said:
Hi All,

I'm looking for a quality Python XML implementation. All of the DOM
and SAX implementations I've come across so far are rather
convoluted. Are there any quality implementations that will (after
parsing the XML) return an object that is accessible by name? Such as
the following:


xml = """
<book>
<title>MyBook</title>
<author>the author</author>
</book>
"""

And after parsing the XML allow me to access it as so:

book.title

I need it to somehow convert my XML to intuitively referenceable
object. Any ideas? I could even do it myself if I knew the
mechanism by which python classes do this (create variables on the fly).

Thanks in advance!

You might want to take a look at Fredrik Lundh's ElementTree
(and cElementTree) modules:
http://effbot.org/zone/element-index.htm
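
For the book example from the question, a minimal sketch with ElementTree might look like the following. It assumes the `xml.etree.ElementTree` module that ships with current Python versions rather than the standalone package linked above; the variable names are only illustrative.

```python
from xml.etree import ElementTree

xml = """
<book>
<title>MyBook</title>
<author>the author</author>
</book>
"""

# fromstring() parses the text and returns the root Element (<book>)
book = ElementTree.fromstring(xml)

# Children are looked up by tag name rather than accessed as attributes
title = book.findtext("title")
author = book.find("author").text

print(title)   # MyBook
print(author)  # the author
```

This is not yet the `book.title` syntax asked for, but the lookup by name is a single call.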
 
Diez B. Roggisch

Michael said:
I'm looking for a quality Python XML implementation. All of the DOM
and SAX implementations I've come across so far are rather
convoluted.

Welcome to the wonderful world of XML.

Michael said:
I need it to somehow convert my XML to intuitively referenceable
object. Any ideas? I could even do it myself if I knew the
mechanism by which python classes do this (create variables on the fly).

You've been given the advice to use ElementTree - I can only second
that.

But if for whatever reason you do want to do it yourself (or for future
use), the

getattr/setattr

functions are what you are looking for. Look them up in TFM.

Regards,

Diez
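
The getattr/setattr suggestion above can be sketched roughly as follows. This is only an illustration, not code from the thread: `Record` and `bind` are hypothetical names, parsing is delegated to the standard-library `xml.etree.ElementTree`, and repeated tags simply overwrite each other.

```python
from xml.etree import ElementTree


class Record(object):
    """Empty container; attributes are attached at runtime."""


def bind(element):
    """Copy each child of `element` onto a Record, keyed by tag name."""
    record = Record()
    for child in element:
        if len(child):
            # The child has children of its own: build a nested Record
            value = bind(child)
        else:
            value = child.text
        # setattr(obj, "title", v) has the same effect as obj.title = v
        setattr(record, child.tag, value)
    return record


book = bind(ElementTree.fromstring(
    "<book><title>MyBook</title><author>the author</author></book>"))

print(book.title)               # MyBook
print(getattr(book, "author"))  # the author
```

The attribute names are not known until the XML has been read, which is why `setattr` is used instead of a plain assignment.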
 
Gerard Flanagan

Michael said:
Hi All,

I'm looking for a quality Python XML implementation. All of the DOM
and SAX implementations I've come across so far are rather
convoluted. Are there any quality implementations that will (after
parsing the XML) return an object that is accessible by name? Such as
the following:




xml = """
<book>
<title>MyBook</title>
<author>the author</author>
</book>
"""





And after parsing the XML allow me to access it as so:

book.title

I need it to somehow convert my XML to intuitively referenceable
object. Any ideas? I could even do it myself if I knew the
mechanism by which python classes do this (create variables on the fly).

Thanks in advance!

Michael

Here's an approach to ElementTree that worked for me. It's not generic
or anything and a bit brittle (e.g. it won't handle missing nodes), but
maybe for a simple, flat schema or for a prototype?

All the best

Gerard

(TOY CODE - NOT TESTED MUCH)

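# ElementTree has been part of the standard library as xml.etree since Python 2.5
from xml.etree import ElementTree


class ElementWrapper(object):
    """Holds a root Element and exposes it as an XML string via .xml"""

    def __tostring(self):
        # encoding='unicode' makes tostring() return str rather than bytes
        return ElementTree.tostring(self.element, encoding='unicode')

    def __fromstring(self, xml):
        self.element = ElementTree.fromstring(xml)

    # Reading .xml serialises the element; assigning to it re-parses
    xml = property(__tostring, __fromstring)

    def __init__(self, element=None):
        self.element = element

    def __str__(self):
        return self.xml

    def parse(self, infile):
        tree = ElementTree.parse(infile)
        self.element = tree.getroot()

    def write(self, outfile):
        ElementTree.ElementTree(self.element).write(outfile)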
###########


from xml.etree.ElementTree import Element
from elementwrapper import ElementWrapper

xmlns = 'http://schemas/email/0.1'


class MailDocument(ElementWrapper):

    def __build_element(self):
        # The accessors below address children by position, so the order
        # in which they are appended here must not change.
        root = Element('{%s}Mail' % xmlns)
        root.append(Element('{%s}Date' % xmlns))         # element[0]
        root.append(Element('{%s}From' % xmlns))         # element[1]
        root.append(Element('{%s}Subject' % xmlns))      # element[2]
        root.append(Element('{%s}To' % xmlns))           # element[3]
        root.append(Element('{%s}Cc' % xmlns))           # element[4]
        root.append(Element('{%s}Body' % xmlns))         # element[5]
        root.append(Element('{%s}Attachments' % xmlns))  # element[6]
        self.element = root

    #####################################################
    # Properties
    #
    def __get_uid(self):
        return self.element.get('id')

    def __set_uid(self, value=''):
        self.element.set('id', value)

    def __get_date(self):
        return self.element[0].text

    def __set_date(self, value=''):
        self.element[0].text = value

    def __get_from(self):
        addr = self.element[1].get('address')
        nm = self.element[1].get('name')
        return addr, nm

    def __get_subject(self):
        return self.element[2].text

    def __set_subject(self, value=''):
        self.element[2].text = value

    def __get_body(self):
        return self.element[5].text

    def __set_body(self, value=''):
        self.element[5].text = value

    uid = property(__get_uid, __set_uid)
    # Capitalised because "from" is a reserved word in Python
    From = property(__get_from)
    subject = property(__get_subject, __set_subject)
    date = property(__get_date, __set_date)
    body = property(__get_body, __set_body)

    def set_from_header(self, address='', name=''):
        self.element[1].set('address', address)
        self.element[1].set('name', name)
    #
    # End Properties
    #####################################################

    #####################################################
    # Lists
    #
    def add_to_header(self, address='', name=''):
        self.__add_mailto(self.element[3], address, name)

    def remove_to_header(self, index):
        elem = self.element[3][index]
        self.element[3].remove(elem)

    def add_cc_header(self, address='', name=''):
        self.__add_mailto(self.element[4], address, name)

    def remove_cc_header(self, index):
        elem = self.element[4][index]
        self.element[4].remove(elem)

    def add_attachment(self, filename='', fileuri='', filetype=''):
        elem = Element("{%s}Uri" % xmlns, value=fileuri, type=filetype)
        elem.text = filename
        self.element[6].append(elem)

    def remove_attachment(self, index):
        elem = self.element[6][index]
        self.element[6].remove(elem)

    def __add_mailto(self, element, address='', name=''):
        element.append(Element("{%s}mailto" % xmlns, address=address,
                               name=name))

    def get_to_headers(self):
        hdrs = []
        for item in self.element[3]:
            hdrs.append((item.get('address'), item.get('name')))
        return hdrs

    def get_cc_headers(self):
        hdrs = []
        for item in self.element[4]:
            hdrs.append((item.get('address'), item.get('name')))
        return hdrs

    def get_attachments(self):
        ret = []
        for item in self.element[6]:
            ret.append((item.text, item.get('value'), item.get('type')))
        return ret
    #
    # End Lists
    ########################################################

    ########################################################
    # Initialise
    #
    def __init__(self):
        # Build an empty skeleton, then give every field a default value
        self.__build_element()
        self.__set_uid()
        self.__set_date()
        self.__set_subject()
        self.set_from_header()
        self.__set_body()
    #
    # End Initialise
    ########################################################

xml_test ='''
<mail:Mail xmlns:mail="http://schemas/email/0.1">
<mail:Date>10/10/05</mail:Date>
<mail:From address='(e-mail address removed)' name='Mr. Jones'/>
<mail:Subject>just a note</mail:Subject>
<mail:To>
<mail:mailto address='(e-mail address removed)' name='Mrs Jones' />
<mail:mailto address='(e-mail address removed)' name='Alan Nother' />
</mail:To>
<mail:Cc></mail:Cc>
<mail:Body>hi there,
just a note to say hi there!</mail:Body>
<mail:Attachments></mail:Attachments>
</mail:Mail>
'''
if __name__ == '__main__':
    mail = MailDocument()
    # Assigning to .xml replaces the empty skeleton with the parsed test data
    mail.xml = xml_test
    #mail.parse('test/data/test.xml')
    print('From: ' + mail.From[0])
    print('Subject: ' + mail.subject)
    mail.set_from_header('(e-mail address removed)')
    print('From: ' + mail.From[0])
    mail.add_to_header('aaa.bbb@ccc', 'aaaaaa')
    mail.add_to_header('fff.ggg@hhh', 'ffffff')
    print('To:')
    for hdr in mail.get_to_headers():
        print(hdr)
    mail.remove_to_header(1)
    print('To:')
    for hdr in mail.get_to_headers():
        print(hdr)
    #mail.write('test_copy.xml')
 
Laurent Pointal

Michael said:
Hi All,

I'm looking for a quality Python XML implementation. All of the DOM
and SAX implementations I've come across so far are rather convoluted.
Are there any quality implementations that will (after parsing the XML)
return an object that is accessible by name? Such as the following:




xml = """
<book>
<title>MyBook</title>
<author>the author</author>
</book>
"""





And after parsing the XML allow me to access it as so:

book.title

I need it to somehow convert my XML to intuitively referenceable
object. Any ideas? I could even do it myself if I knew the mechanism
by which python classes do this (create variables on the fly).

Thanks in advance!

Another tool (ElementTree has already been mentioned): Amara
( http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/ )

[never tested, but bookmarked as it seems interesting]

A+

Laurent.
 
uche.ogbuji

Michael said:
I'm looking for a quality Python XML implementation. All of the DOM
and SAX implementations I've come across so far are rather
convoluted. Are there any quality implementations that will (after
parsing the XML) return an object that is accessible by name? Such as
the following:
xml = """
<book>
<title>MyBook</title>
<author>the author</author>
</book>
"""
And after parsing the XML allow me to access it as so:

book.title

I need it to somehow convert my XML to intuitively referenceable
object. Any ideas? I could even do it myself if I knew the
mechanism by which python classes do this (create variables on the fly).

Looks as if Michael is working with Amara now, but I did want to note
for the record that APIs that allow one to access a node in the
"book.title" fashion are what I call Python data bindings.

Python data bindings I usually point out are:

Amara Bindery: http://www.xml.com/pub/a/2005/01/19/amara.html
Gnosis: http://www.xml.com/pub/a/2003/07/02/py-xml.html
generateDS: http://www.xml.com/pub/a/2003/06/11/py-xml.html

Based on updates to EaseXML in response to my article another entry
might be:

EaseXML: http://www.xml.com/pub/a/2005/07/27/py-xml.html

ElementTree ( http://www.xml.com/pub/a/2003/02/12/py-xml.html ) is a
Python InfoSet rather than a Python data binding. You access nodes
using generic names related to the node type rather than the node name.
Whether data bindings or Infosets are your preference is a matter of
taste, but it's a useful distinction to make between the approaches.
It looks as if Gerard Flanagan has constructed a little specialized
binding tool on top of ElementTree, and that's one possible hybrid
approach.

xmltramp ( http://www.aaronsw.com/2002/xmltramp/ ) is another
interesting hybrid.
 
Alan Kennedy

[Michael Williams]
I need it to somehow convert my XML to intuitively referenceable
object. Any ideas? I could even do it myself if I knew the mechanism
by which python classes do this (create variables on the fly).

You seem to already have a fair idea what kind of model you need, and to
know that there is a simple way for you to create one. I encourage you
to progress on this path: it will increase the depth of your understanding.

One mistake I think some people make about XML is relying on other
people's interpretations of the subject, rather than forming their own
opinions.

The multitude of document models provided by everyone and his mother all
make assumptions about how the components of the model will be accessed,
in what order those components will be accessed, how often and when, how
memory efficient the model is, etc, etc.

To really understand the trade-offs and strengths of all the different
models, it is a good exercise to build your own object model. It's a
simple exercise, due to Python's highly dynamic nature. Understanding
your own model will help you understand what the other models do and do
not provide. You can then evaluate other off-the-shelf models for your
specific applications: I always find different XML tools suit different
situations.

See this post of mine from a couple years back about different ways of
building your own document/data models.

http://groups.google.com/group/comp.lang.python/msg/e2a4a1c35395ffec

I think the reference to the ActiveState recipe will be of particular
interest, since you could have a running example very quickly indeed.

See also my tutorial post on extracting document content from a SAX
stream. I gave the example of a simple stack-based xpath-style
expression matcher.

http://groups.google.com/group/comp.lang.python/msg/6853bddbb9326948
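
The code from the linked post is not reproduced here. As a rough idea of what a stack-based path matcher over a SAX stream can look like, here is a minimal sketch using the standard-library `xml.sax`; the `PathMatcher` class, its `target` parameter and the sample document are all assumptions made for illustration.

```python
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Collect the text of every element whose path equals `target`."""

    def __init__(self, target):
        xml.sax.ContentHandler.__init__(self)
        self.target = target   # e.g. "/library/book/title"
        self.stack = []        # names of the currently open elements
        self.buffer = None     # text pieces while inside a match, else None
        self.matches = []

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._path() == self.target:
            self.buffer = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate.
        # Text of elements nested inside a match is collected as well.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self._path() == self.target:
            self.matches.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


doc = b"""<library>
<book><title>MyBook</title><author>the author</author></book>
<book><title>Another</title><author>someone else</author></book>
</library>"""

handler = PathMatcher("/library/book/title")
xml.sax.parseString(doc, handler)
print(handler.matches)  # ['MyBook', 'Another']
```

Nothing is kept in memory except the stack of open element names and the matched text, which is the main attraction of the SAX approach.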

Also contained in that thread is an illuminating and productive
discussion between the effbot and myself about how wonderfully simple
ElementTree makes this, not to mention unbeatably efficient.

this-week-i-ave-been-mostly-using-kid-for-templating-ly'yrs,
 
Fredrik Lundh

uche.ogbuji said:
ElementTree ( http://www.xml.com/pub/a/2003/02/12/py-xml.html ) is a
Python InfoSet rather than a Python data binding. You access nodes
using generic names related to the node type rather than the node name.
Whether data bindings or Infosets are your preference is a matter of
taste, but it's a useful distinction to make between the approaches.
It looks as if Gerard Flanagan has constructed a little specialized
binding tool on top of ElementTree, and that's one possible hybrid
approach.

in my experience, it's hard to make a python/xml mapping that's well suited for all
possible use cases (many bindings suffer from issues with namespaces, collisions
between tags/attribute names and python names, etc), but it's usually trivial to write
a custom wrapper for a specific case.
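
To illustrate the kind of collision meant here, the sketch below maps tag names to names that can be written after a dot. The `python_name` helper and the sample document are hypothetical, and the rules it applies (drop the namespace, replace hyphens and dots, append an underscore to reserved words) are just one possible convention.

```python
import keyword
from xml.etree import ElementTree


def python_name(tag):
    """Map an ElementTree tag to a name usable as a Python attribute."""
    # ElementTree writes namespaced tags as "{uri}local"; keep the local part
    local = tag.rsplit("}", 1)[-1]
    # Hyphens and dots are legal in XML names but not in identifiers
    name = local.replace("-", "_").replace(".", "_")
    # <from>, <class>, <import> and so on collide with reserved words
    if keyword.iskeyword(name):
        name += "_"
    return name


doc = ElementTree.fromstring(
    '<m:mail xmlns:m="http://schemas/email/0.1">'
    '<m:from>a</m:from><m:reply-to>b</m:reply-to><m:subject>c</m:subject>'
    '</m:mail>')

names = [python_name(child.tag) for child in doc]
print(names)  # ['from_', 'reply_to', 'subject']
```

Note that dropping the namespace means two elements with the same local name in different namespaces end up with the same attribute name, which is one of the issues a general-purpose binding has to solve somehow.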

for most normal use, manual infoset navigation is often the easiest way to pull out
data from the infoset (find, get, findtext, int, float, etc).
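
As a small illustration of that style of navigation, the following sketch pulls typed values out of a tree with `find`, `get`, `findtext`, `int` and `float`. The order document is made up for the example.

```python
from xml.etree import ElementTree

doc = ElementTree.fromstring("""
<order id="42">
  <customer>Mr. Jones</customer>
  <item sku="A-1"><qty>3</qty><price>9.50</price></item>
  <item sku="B-7"><qty>1</qty><price>20.00</price></item>
</order>
""")

order_id = int(doc.get("id"))            # attribute lookup
customer = doc.findtext("customer")      # text of the first matching child
first_sku = doc.find("item").get("sku")  # first matching child element

# Text and attribute values are always strings, so convert explicitly
total = sum(int(item.findtext("qty")) * float(item.findtext("price"))
            for item in doc.findall("item"))

print(order_id, customer, first_sku, total)
```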

for certain cases, creating wrappers on demand can be quite efficient; e.g.

http://online.effbot.org/2003_07_01_archive.htm#element-tricks
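
The following is not the code behind the link above, only a minimal sketch of the general idea: a wrapper whose child wrappers are created at the moment an attribute is requested. `Node` is a hypothetical name, and only the first child with a given tag is returned.

```python
from xml.etree import ElementTree


class Node(object):
    """Wrap an Element; child wrappers are created only when asked for."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Private names
        # are rejected so a missing _element cannot recurse endlessly.
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Node(child)

    def __str__(self):
        return self._element.text or ""


book = Node(ElementTree.fromstring(
    "<book><title>MyBook</title><author>the author</author></book>"))

print(book.title)   # MyBook
print(book.author)  # the author
```

No tree of wrapper objects is built up front; a `Node` exists only for the elements that are actually touched.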

and for highly regular cases, incremental parsing/conversion up front is often the
fastest and most efficient way to deal with data; e.g.

http://effbot.org/zone/element-iterparse.htm#plist
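
Again, this is not the plist example from the link, just a short sketch of the same pattern with the standard-library `iterparse`: each finished record is converted to a plain Python structure and then discarded from the tree. The library document is invented for the example.

```python
import io
from xml.etree import ElementTree

data = io.BytesIO(b"""<library>
<book><title>MyBook</title><author>the author</author></book>
<book><title>Another</title><author>someone else</author></book>
</library>""")

books = []
# By default iterparse() reports "end" events, i.e. completed elements
for event, elem in ElementTree.iterparse(data):
    if elem.tag == "book":
        # Convert the finished record to a plain dict ...
        books.append({child.tag: child.text for child in elem})
        # ... then drop its children so memory use stays bounded
        elem.clear()

print(books)
```

After the loop, `books` holds ordinary dictionaries, so the rest of the program never needs to touch the XML API.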

</F>
 
