Searching an XML doc

Gowri

Hello,

I've been reading about ElementTree and ElementPath so I could use
them to find the right elements in the DOM. Unfortunately, neither of
these seems to offer XPath-like capabilities where I can find elements
based on tag, attribute values, etc. Are there any libraries which can
give me XPath-like functionality?

Thanks in advance
 
Diez B. Roggisch

lxml does that.

Diez
 
Gowri

Hi Diez

I was trying lxml out and was unable to find any examples that would
help me parse an XML file with namespaces. For example, my XML file
looks like this:

<phedexData xmlns="http://a.b.com/phedex"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
<!-- Low priority replication request -->
<request id="1234" last_update="1060199000.0">
<status>
<approved>T1_RAL_MSS</approved>
<approved>T2_London_ICHEP</approved>
<disapproved>T2_Southgrid_Bristol</disapproved>
<pending/>
<move_pending/>
</status>
<subscription open="1" priority="0" type="replicate">
<items>
<dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
<block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
</items>
</subscription>
</request>
</phedexData>

If my XPath query is //request, it obviously would not work. Is there
some sort of namespace registration etc. that has to be done before
issuing a query? Example code would help a lot.
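
On the namespace question, here is a minimal sketch of one common way to do it in lxml, assuming lxml is installed. The xpath() method accepts a namespaces mapping from prefix to URI. The prefix "p" is an arbitrary choice for illustration, and the XML is a trimmed copy of the document above.

```python
from lxml import etree

# Trimmed copy of the document from the post (xsi attributes omitted).
XML = b'''<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
    </status>
    <subscription open="1" priority="0" type="replicate"/>
  </request>
</phedexData>'''

root = etree.fromstring(XML)

# "p" is an arbitrary prefix; only the URI has to match the
# default namespace declared in the document.
NS = {'p': 'http://a.b.com/phedex'}

# Every step that names an element needs the prefix.
requests = root.xpath('//p:request', namespaces=NS)

# Attribute predicates and text() work as in ordinary XPath 1.0.
approved = root.xpath(
    '//p:request[@id="1234"]/p:status/p:approved/text()', namespaces=NS)

# Unprefixed attributes are in no namespace, so @type needs no prefix.
sub_types = root.xpath('//p:subscription/@type', namespaces=NS)

print(len(requests), approved, sub_types)
```

A plain //request finds nothing here because an unprefixed name in XPath 1.0 only matches elements that are in no namespace.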
 
grflanagan

Create your query like:

ns0 = '{http://a.b.com/phedex}'

query = '%srequest/%sstatus' % (ns0, ns0)
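
As a rough illustration of the query above, here is a sketch using only the standard library's ElementTree. It shows the same {uri}tag form next to the prefix-mapping form that findall() also accepts in current Python versions. The prefix "p" and the trimmed XML are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document under discussion.
XML = '''<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234">
    <status>
      <approved>T1_RAL_MSS</approved>
    </status>
  </request>
</phedexData>'''

root = ET.fromstring(XML)

# Without any namespace qualification nothing matches, because the
# parsed tags are really '{http://a.b.com/phedex}request' and so on.
bare_hits = root.findall('request/status')

# Style 1: spell out the namespace in {uri}tag form, as in the query above.
ns0 = '{http://a.b.com/phedex}'
clark_hits = root.findall('%srequest/%sstatus' % (ns0, ns0))

# Style 2: pass a prefix-to-URI mapping; "p" is an arbitrary prefix.
prefix_hits = root.findall('p:request/p:status',
                           namespaces={'p': 'http://a.b.com/phedex'})

print(len(bare_hits), len(clark_hits), len(prefix_hits))
```

Both qualified styles select the same element; the mapping form is mostly a matter of readability when a query has many steps.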

Also, although imperfect, some people have found this useful:

http://gflanagan.net/site/python/utils/elementfilter/elementfilter.py.txt

Code:
test = '''<phedexData xmlns="http://a.b.com/phedex"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
        <!--  Low priority replication request -->
        <request id="1234" last_update="1060199000.0">
                <status>
                        <approved>T1_RAL_MSS</approved>
                        <approved>T2_London_ICHEP</approved>
                        <disapproved>T2_Southgrid_Bristol</disapproved>
                        <pending/>
                        <move_pending/>
                </status>
                <subscription open="1" priority="0" type="replicate">
                        <items>
                                <dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
                                <block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
                        </items>
                </subscription>
        </request>
</phedexData>
'''

from xml.etree import ElementTree as ET

root = ET.fromstring(test)

ns0 = '{http://a.b.com/phedex}'

from rattlebag.elementfilter import findall, data

# http://gflanagan.net/site/python/utils/elementfilter/elementfilter.py.txt

query0 = '%(ns)srequest/%(ns)sstatus' % {'ns': ns0}
query1 = '%(ns)srequest/%(ns)ssubscription[@type=="replicate"]/%(ns)sitems' % {'ns': ns0}
query2 = '%(ns)srequest[@id==1234]/%(ns)sstatus/%(ns)sapproved' % {'ns': ns0}

print('With ElementPath: ')
print(root.findall(query0))
print()
print('With ElementFilter:')
for query in [query0, query1, query2]:
    print()
    print('+'*50)
    print('query: ', query)
    print()
    for item in findall(root, query):
        print('item: ', item)
        print('xml:')
        ET.dump(item)

print('-'*50)
print()
print('approved: ', data(root, query2))

[OUTPUT]
With ElementPath:
[<Element {http://a.b.com/phedex}status at b95ad0>]

With ElementFilter:

++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b.com/phedex}request/{http://a.b.com/phedex}status

item: <Element {http://a.b.com/phedex}status at b95ad0>
xml:
<ns0:status xmlns:ns0="http://a.b.com/phedex">
<ns0:approved>T1_RAL_MSS</ns0:approved>
<ns0:approved>T2_London_ICHEP</ns0:approved>
<ns0:disapproved>T2_Southgrid_Bristol</ns0:disapproved>
<ns0:pending />
<ns0:move_pending />
</ns0:status>


++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b.com/phedex}request/{http://a.b.com/phedex}subscription[@type=="replicate"]/{http://a.b.com/phedex}items

item: <Element {http://a.b.com/phedex}items at b95eb8>
xml:
<ns0:items xmlns:ns0="http://a.b.com/phedex">
<ns0:dataset>/PrimaryDS1/ProcessedDS1/Tier</ns0:dataset>
<ns0:block>/PrimaryDS2/ProcessedDS2/Tier/block</ns0:block>
</ns0:items>


++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b.com/phedex}request[@id==1234]/{http://a.b.com/phedex}status/{http://a.b.com/phedex}approved

item: <Element {http://a.b.com/phedex}approved at b95cd8>
xml:
<ns0:approved xmlns:ns0="http://a.b.com/phedex">T1_RAL_MSS</ns0:approved>

item: <Element {http://a.b.com/phedex}approved at b95cb0>
xml:
<ns0:approved xmlns:ns0="http://a.b.com/phedex">T2_London_ICHEP</ns0:approved>
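
For comparison, the second and third queries can also be approximated with the standard library alone, since ElementTree in current Python versions understands a limited XPath subset that includes [@attrib='value'] predicates. This is a sketch under that assumption, not a replacement for the elementfilter module. The prefix "p" is arbitrary, and attribute values are compared as strings, so the id is quoted.

```python
import xml.etree.ElementTree as ET

XML = '''<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
      <pending/>
      <move_pending/>
    </status>
    <subscription open="1" priority="0" type="replicate">
      <items>
        <dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
        <block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
      </items>
    </subscription>
  </request>
</phedexData>'''

root = ET.fromstring(XML)
NS = {'p': 'http://a.b.com/phedex'}  # "p" is an arbitrary prefix

# Roughly query1: <items> under subscriptions whose type is "replicate".
items = root.findall(
    "p:request/p:subscription[@type='replicate']/p:items", NS)

# Roughly query2: text of <approved> under the request with id "1234".
approved = [el.text for el in root.findall(
    "p:request[@id='1234']/p:status/p:approved", NS)]

# Text of each child of the first matching <items> element.
item_texts = [child.text for child in items[0]]

print(approved)
print(item_texts)
```

ElementTree's subset has no text() or attribute-selection steps, so the text is read from the returned elements rather than inside the query.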
 
Gowri

Hi Gerard,

I don't know what to say :) Thank you so much for taking the time to
post all of this. I truly appreciate it :)
 
