Splitting SAX results

IamIan · Jun 7, 2007

Hi list,

I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class reportHandler(ContentHandler):
    def __init__(self):
        self.isReport = 0

    def startElement(self, name, attrs):
        if name == 'title':
            self.isReport = 1
            self.reportText = ''

    def characters(self, ch):
        if self.isReport:
            self.reportText += ch

    def endElement(self, name):
        if name == 'title':
            self.isReport = 0
            print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

Stefan Behnel · Jun 7, 2007

IamIan said:
I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

Sounds like a problem with the data to me rather than SAX.

However, SAX tends to make things much more complex than necessary, so you
lose sight of the real problems. Try a library like ElementTree or lxml
to make your life easier. You might especially like lxml.objectify.

http://effbot.org/zone/element.htm
http://effbot.org/zone/element-iterparse.htm

http://codespeak.net/lxml/dev/
http://codespeak.net/lxml/dev/objectify.html

Stefan

IamIan · Jun 8, 2007

Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?

Jerry Hill · Jun 8, 2007

IamIan said:
Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?

I don't think you've shown us any examples of the code you're having
trouble with. I don't see anything in your original post that tries
to split strings. If you just want to know how split works, here's an
example:
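
(The example itself is missing from this copy of the thread. What follows is only a minimal sketch of str.split on a colon, written for Python 3 with made-up strings; it is not Jerry's original code, and the variable names are hypothetical.)

```python
# Hypothetical sample string; the real feed titles are not shown in the thread.
title = 'Title1:Description'

# split() returns a list of the pieces around each colon.
parts = title.split(':')
print(parts)

# maxsplit=1 stops after the first colon, so later colons stay in the value.
key, value = title.split(':', 1)
print(key, value)

# A string with no colon splits into a one-element list. Indexing [1] on it
# raises IndexError, and unpacking it into two names raises ValueError.
no_colon = 'Title1'
pieces = no_colon.split(':')
print(pieces)
```
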

If that doesn't help, show us a sample of some of the data you're
working with, what you've tried so far, and what the end result is
supposed to look like.

IamIan · Jun 12, 2007

I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
  <title>Title1:Description</title>
  <link>Link</link>
  <description>Desc</description>
  <author>Author</author>
  <pubDate>Date</pubDate>
</item>
<item>
  <title>Title2:Description</title>
  <link>Link</link>
  <description>Desc</description>
  <author>Author</author>
  <pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):

    def __init__(self):
        self.isReport = 0

    def startElement(self, name, attrs):
        if name == 'title':
            self.isReport = 1
            self.reportText = ''

    def characters(self, ch):
        if self.isReport:
            self.reportText += ch
            tracker.append(ch) # Option 1
            key, value = ch.split (':') # Option 2
            tracker[key] = value

    def endElement(self, name):
        if name == 'title':
            self.isReport = 0
            print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

print tracker

Option 1 returns a list with the markup included, looking like:
[u'Title1:', u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]

Option 2 fails with the traceback:
  File "C:\test.py", line 21, in characters
    key, value = ch.split(':')
ValueError: need more than 1 value to unpack

Thank you for the help!
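
The ValueError in Option 2 is consistent with how SAX delivers text: a parser may call characters() several times for a single element, so an individual chunk such as u'\n' contains no colon and cannot be unpacked into two names. One possible way around it, sketched here for Python 3, is to collect the chunks and split only once, in endElement. The class name TitleHandler, the SAMPLE document and its <channel> wrapper are made up for illustration; the real feed is not shown in the thread.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleHandler(ContentHandler):
    """Collects <title> text and stores 'key:value' titles in a dict."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []    # text pieces of the current <title>
        self.titles = {}    # final key -> value mapping

    def startElement(self, name, attrs):
        if name == 'title':
            self.in_title = True
            self.chunks = []

    def characters(self, content):
        # May be called several times per element, so only accumulate here.
        if self.in_title:
            self.chunks.append(content)

    def endElement(self, name):
        if name == 'title':
            self.in_title = False
            text = ''.join(self.chunks).strip()
            # partition() never raises; sep is '' when there is no colon.
            key, sep, value = text.partition(':')
            if sep:
                self.titles[key.strip()] = value.strip()


# Hypothetical stand-in for the feed, with whitespace inside the titles.
SAMPLE = b"""<channel>
<item><title>Title1:
Description </title><link>Link</link></item>
<item><title>Title2:
Description </title><link>Link</link></item>
</channel>"""

handler = TitleHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)  # {'Title1': 'Description', 'Title2': 'Description'}
```

Titles without a colon are silently skipped in this sketch; whether to skip, log or raise for them is a choice the thread does not settle.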

Gabriel Genellina · Jun 13, 2007

IamIan said:
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
  <title>Title1:Description</title>
  <link>Link</link>
  <description>Desc</description>
  <author>Author</author>
  <pubDate>Date</pubDate>
</item>
<item>
  <title>Title2:Description</title>
  <link>Link</link>
  <description>Desc</description>
  <author>Author</author>
  <pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

Forget about SAX. Use ElementTree instead:

py> import xml.etree.cElementTree as ET
py> f = open("x.xml","r")
py> tree = ET.parse(f)
py> for item in tree.getiterator('item'):
...     print item.findtext('title')
...
Title1:Description
Title2:Description

ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>
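
To get from there to the dictionary the original poster asked for, a minimal sketch along the same lines, assuming Python 3 (where xml.etree.ElementTree and iter() take the place of cElementTree and getiterator()). The SAMPLE string and its <channel> wrapper are hypothetical stand-ins for x.xml, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of x.xml.
SAMPLE = """<channel>
<item><title>Title1:Description</title><link>Link</link></item>
<item><title>Title2:Description</title><link>Link</link></item>
</channel>"""

root = ET.fromstring(SAMPLE)

titles = {}
for item in root.iter('item'):
    # findtext() returns None when <title> is missing, hence the "or ''".
    text = (item.findtext('title') or '').strip()
    # Split on the first colon only; sep is '' if the title has no colon.
    key, sep, value = text.partition(':')
    if sep:
        titles[key.strip()] = value.strip()

print(titles)  # {'Title1': 'Description', 'Title2': 'Description'}
```
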

Stefan Behnel · Jun 20, 2007

Gabriel said:
Forget about SAX. Use ElementTree instead
ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>

That's what I told him/her already.

To rephrase a famous quote:

Being faced with an XML problem, you might think "Ok, I'll just use SAX". And
now you have two problems.

SAX is a great way to hide your real problems behind a wall of unreadable
code. If you want my opinion, lxml is currently the most straightforward way
to get XML work done in Python.

Stefan
