Splitting SAX results

I

IamIan

Hi list,

I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class reportHandler(ContentHandler):
def __init__(self):
self.isReport = 0

def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''

def characters(self, ch):
if self.isReport:
self.reportText += ch

def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')
 
S

Stefan Behnel

IamIan said:
I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

Sounds like a problem with the data to me rather than SAX.

However, SAX tends to make things much more complex than necessary, so you
loose the sight on the real problems. Try a library like ElementTree or lxml
to make your life easier. You might especially like lxml.objectify.

http://effbot.org/zone/element.htm
http://effbot.org/zone/element-iterparse.htm

http://codespeak.net/lxml/dev/
http://codespeak.net/lxml/dev/objectify.html

Stefan
 
I

IamIan

Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?
 
J

Jerry Hill

Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?

I don't think you've showed us any examples of the code you're having
trouble with. I don't see anything in your original post that tries
to split strings. If you just want to know how split works, here's an
example:

If that doesn't help, show us a sample of some of the data you're
working with, what you've
tried so far, and what the end result is supposed to look like.
 
I

IamIan

I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):

def __init__(self):
self.isReport = 0

def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''

def characters(self, ch):
if self.isReport:
self.reportText += ch
tracker.append(ch) # Option 1
key, value = ch.split (':') # Option 2
tracker[key] = value

def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

print tracker


Option 1 returns a list with the markup included, looking like:
[u'Title1:", u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]

Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
key, value = ch.split(':')
ValueError: need more than 1 value to unpack

Thank you for the help!
 
G

Gabriel Genellina

I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

Forget about SAX. Use ElementTree instead

py> import xml.etree.cElementTree as ET
py> f = open("x.xml","r")
py> tree = ET.parse(f)
py> for item in tree.getiterator('item'):
.... print item.findtext('title')
....
Title1:Description
Title2:Description

ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>
 
S

Stefan Behnel

Gabriel said:
Forget about SAX. Use ElementTree instead
ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>

That's what I told him/her already :)

Rephrasing a famous word:

Being faced with an XML problem, you might think "Ok, I'll just use SAX". And
now you have two problems.

SAX is a great way to hide your real problems behind a wall of unreadable
code. If you want my opinion, lxml is currently the straightest way to get XML
work done in Python.

Stefan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top