XML Parsing

Alok Kothari

Hello,
I am new to XML parsing. Could you kindly tell me what the
problem is with the following code?

import xml.dom.minidom
import xml.parsers.expat
document = """<token pos="nn">Letterman</token><token pos="bez">is</
token><token pos="jjr">better</token><token pos="cs">than</
token><token pos="np">Jay</token><token pos="np">Leno</token>"""



# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
p.Parse(document, 1)

OUTPUT:

Start element: token {u'pos': u'nn'}
Character data: u'Letterman'
End element: token

Traceback (most recent call last):
  File "C:/Python25/Programs/eg.py", line 20, in <module>
    p.Parse(document, 1)
ExpatError: junk after document element: line 1, column 33
 
Jason Scheirer

Your XML is wrong. Don't put line breaks between </ and token>.
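
A note on the error itself: column 33 is where the first </token> ends and the second <token> begins, so expat appears to be objecting to a document with more than one top-level element, in addition to any wrapped closing tags. Below is a minimal sketch in Python 3 syntax (the code in this thread is Python 2). The <tokens> wrapper element and the parse_events helper are assumptions made for illustration and are not part of the original post.

```python
import xml.parsers.expat

# The fragment from the question, with the wrapped closing tags rejoined.
fragment = (
    '<token pos="nn">Letterman</token><token pos="bez">is</token>'
    '<token pos="jjr">better</token><token pos="cs">than</token>'
    '<token pos="np">Jay</token><token pos="np">Leno</token>'
)


def parse_events(text):
    """Parse text with expat and return the events it reported (hypothetical helper)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(('start', name, attrs))
    parser.EndElementHandler = lambda name: events.append(('end', name))
    parser.CharacterDataHandler = lambda data: events.append(('text', data))
    parser.Parse(text, True)  # True: this is the final chunk of input
    return events


# Without a single root element, expat raises after the first <token>.
try:
    parse_events(fragment)
    bare_error = None
except xml.parsers.expat.ExpatError as exc:
    bare_error = str(exc)

# Wrapped in an assumed <tokens> root, the same fragment parses to the end.
events = parse_events('<tokens>' + fragment + '</tokens>')
words = [event[1] for event in events if event[0] == 'text']

print(bare_error)
print(words)
```

If the data really does arrive as a bare sequence of elements, wrapping it in a throwaway root before parsing is probably the least intrusive fix.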
 
7stud

I don't know if you are aware of the BeautifulSoup module:


import BeautifulSoup as bs

xml = """<token pos="nn">Letterman</token><token pos="bez">is</
token><token pos="jjr">better</token><token pos="cs">than</
token><token pos="np">Jay</token><token pos="np">Leno</token>"""

doc = bs.BeautifulStoneSoup(xml)

tokens = doc.findAll("token")
for token in tokens:
    # each attr is a (name, value) tuple
    for attr in token.attrs:
        print "%s : %s" % attr
    print token.string

--output:--
pos : nn
Letterman
pos : bez
is
pos : jjr
better
pos : cs
than
pos : np
Jay
pos : np
Leno
 
Gabriel Genellina

I don't know if you are aware of the BeautifulSoup module:
Or ElementTree:

import xml.etree.ElementTree as ET

doctext = """<tokens><token pos="nn">Letterman</token><token
pos="bez">is</token><token pos="jjr">better</token><token
pos="cs">than</token><token pos="np">Jay</token><token
pos="np">Leno</token></tokens>"""

doc = ET.fromstring(doctext)
for token in doc.findall("token"):
    print 'pos:', token.get('pos')
    print 'text:', token.text
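
For anyone who wants the results as data instead of printed lines, here is a small sketch of the same ElementTree approach in Python 3 syntax. It relies on the <tokens> root element used above. The name tagged is only illustrative.

```python
import xml.etree.ElementTree as ET

doctext = (
    '<tokens><token pos="nn">Letterman</token><token pos="bez">is</token>'
    '<token pos="jjr">better</token><token pos="cs">than</token>'
    '<token pos="np">Jay</token><token pos="np">Leno</token></tokens>'
)

root = ET.fromstring(doctext)

# Collect (word, part-of-speech tag) pairs in document order.
tagged = [(token.text, token.get('pos')) for token in root.findall('token')]

print(tagged)
```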
 
