XML parsing with python

inder · Aug 17, 2009

Hi All,

I am new to XML and need to parse an XML file. After reading and
browsing on the web, I couldn't get much help.

I guess SAX would be better suited for my requirement.

Could someone just provide me with some sample Python code so that I can
execute it and see how the parsing actually happens?

Let's say my XML file is:
<?xml version="1.0"?>
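<library>
<category code="SciFi" room="1"> <!-- if you want to test invalid
document against schema you can just cut the mandatory id attribute -->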
<book id="ISBN001">
<title>I,Robot</title>
<pages>100</pages>
<author>Isaac Asimov</author>
</book>
<book id="ISBN001" damaged="true">
<title>Blade Runner</title>
<pages>400</pages>
<author>Philip K. Dick</author>
</book>
</category>
<category code="Boring" room="2">
<book id="ISBN003">
<title>Lord Of The Rings</title>
<pages>20000</pages>
<author>Tolkien</author>
</book>
<book id="ISBN004" damaged="true">
<title>XML-Schema Specification</title>
<pages>5000</pages>
<author>W3C</author>
</book>
</category>
<category code="Fantasy">
<book id="ISBN005" damaged="true">
<title>Aladin</title>
<pages>150</pages>
<author>Don't know</author>
</book>
</category>
</library>

--------------------------

I need the output to be the text of the 'title' elements:

I,Robot
Blade Runner
Lord Of The Rings
XML-Schema Specification
Aladin

Your responses are greatly appreciated.

Thanks in advance

Stefan Behnel · Aug 17, 2009

inder said:
I am new to XML and need to parse an XML file. After reading and
browsing on the web, I couldn't get much help.

I guess SAX would be better suited for my requirement.

That's a common misconception.
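
For comparison, here is a rough sketch of what a SAX-based solution to the title problem might look like, assuming Python 3 and the standard-library xml.sax module. The TitleHandler class and the SAMPLE document are hypothetical names made up for this illustration. Note how the handler has to track by hand whether it is currently inside a title element and collect the text piece by piece, because characters() may be called several times for a single element.

```python
import xml.sax

# Hypothetical, trimmed-down sample in the shape of the library document.
SAMPLE = b"""<library>
  <category code="SciFi" room="1">
    <book id="ISBN001"><title>I,Robot</title></book>
    <book id="ISBN001" damaged="true"><title>Blade Runner</title></book>
  </category>
</library>"""


class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of every <title> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            # Start collecting text for a new title.
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer it.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(SAMPLE, handler)
sax_titles = handler.titles
for title in sax_titles:
    print(title)
```

This works, but all of the bookkeeping is the caller's job, which is why SAX is usually not the easiest choice for a task like this.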

Could someone just provide me with some sample Python code so that I can
execute it and see how the parsing actually happens?

Let's say my XML file is:
<?xml version="1.0"?>
<library>
<category code="SciFi" room="1"> <!-- if you want to test invalid
document against schema you can just cut the mandatory id attribute -->
<book id="ISBN001">
<title>I,Robot</title>
<pages>100</pages>
<author>Isaac Asimov</author>
</book>
<book id="ISBN001" damaged="true">
<title>Blade Runner</title>
<pages>400</pages>
<author>Philip K. Dick</author>
</book>
</category>
<category code="Boring" room="2">
<book id="ISBN003">
<title>Lord Of The Rings</title>
<pages>20000</pages>
<author>Tolkien</author>
</book>
<book id="ISBN004" damaged="true">
<title>XML-Schema Specification</title>
<pages>5000</pages>
<author>W3C</author>
</book>
</category>
<category code="Fantasy">
<book id="ISBN005" damaged="true">
<title>Aladin</title>
<pages>150</pages>
<author>Don't know</author>
</book>
</category>
</library>

--------------------------

I need the output to be the text of the 'title' elements:

I,Robot
Blade Runner
Lord Of The Rings
XML-Schema Specification
Aladin

Use the iterparse() function of the xml.etree.ElementTree package.

http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
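
A minimal sketch of how iterparse() could be applied to the sample document, assuming Python 3 and the standard-library xml.etree.ElementTree. The XML_DATA constant and the iter_titles() helper are names made up for this illustration, and the document is trimmed to the title and author elements. In practice you would pass a file name or an open file to iterparse() instead of a BytesIO object.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the library document from the question (illustrative).
XML_DATA = b"""<?xml version="1.0"?>
<library>
  <category code="SciFi" room="1">
    <book id="ISBN001">
      <title>I,Robot</title>
      <author>Isaac Asimov</author>
    </book>
    <book id="ISBN001" damaged="true">
      <title>Blade Runner</title>
      <author>Philip K. Dick</author>
    </book>
  </category>
  <category code="Boring" room="2">
    <book id="ISBN003">
      <title>Lord Of The Rings</title>
      <author>Tolkien</author>
    </book>
    <book id="ISBN004" damaged="true">
      <title>XML-Schema Specification</title>
      <author>W3C</author>
    </book>
  </category>
  <category code="Fantasy">
    <book id="ISBN005" damaged="true">
      <title>Aladin</title>
      <author>Don't know</author>
    </book>
  </category>
</library>
"""


def iter_titles(source):
    """Yield the text of each <title> element once it is fully parsed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":
            yield elem.text
        elif elem.tag == "book":
            # The whole <book> has been read; drop its children to save memory.
            elem.clear()


titles = list(iter_titles(io.BytesIO(XML_DATA)))
for title in titles:
    print(title)
```

Because only "end" events are requested, each element is complete, text included, by the time the loop sees it, so no manual state tracking is needed.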

Stefan
