lxml question

Uwe Schmitt

Hi,

I have to parse some text which pretends to be XML. lxml refuses to
parse it because it lacks a root element.
I think this situation is not unusual, so: is there a way to
force lxml to parse it?

My workaround is wrapping the text in "<root>...</root>" before
feeding it to lxml's parser.

Greetings, Uwe
 
Mark Thomas

> I have to parse some text which pretends to be XML. lxml refuses to
> parse it because it lacks a root element.
> I think this situation is not unusual, so: is there a way to
> force lxml to parse it?

By "pretends to be XML", do you mean XML-like but not really XML?

> My workaround is wrapping the text in "<root>...</root>" before
> feeding it to lxml's parser.

That's actually not a bad solution if you know that the document is
otherwise well-formed. Another thing you can do is use libxml2's
"recover" mode, which tolerates XML that is not well-formed.

from lxml import etree

parser = etree.XMLParser(recover=True)  # keep parsing past well-formedness errors
tree = etree.XML(your_xml_string, parser)

You'll still need to use your wrapper root element, because recover
mode ignores everything after the first root element closes (and it
won't raise an error).

-- Mark.
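
A minimal sketch of the wrapper combined with recover mode, as described above. The sample `fragment` string is a made-up stand-in for the real input, and the variable names are assumptions chosen for illustration only.

```python
from lxml import etree

# Hypothetical sample input: sibling elements with no single root element.
fragment = "<item>1</item><item>2</item>"

# A strict parser rejects the bare fragment because of the second top-level element.
try:
    etree.fromstring(fragment)
    strict_failed = False
except etree.XMLSyntaxError:
    strict_failed = True

# Wrap the fragment in an artificial root so the parser sees one document element.
wrapped = "<root>" + fragment + "</root>"

# recover=True asks libxml2 to continue past well-formedness errors inside the wrapper.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(wrapped, parser)

# The original top-level elements are now the children of the wrapper.
texts = [child.text for child in root]
```

Because recover mode does not raise on bad input, it may be worth checking the parsed children afterwards to confirm nothing was silently dropped.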
 
Stefan Behnel

Uwe said:
> I have to parse some text which pretends to be XML. lxml refuses to
> parse it because it lacks a root element.
> I think this situation is not unusual, so: is there a way to
> force lxml to parse it?
>
> My workaround is wrapping the text in "<root>...</root>" before
> feeding it to lxml's parser.

Yes, you can do that. To avoid creating an intermediate string, you can use
the feed parser and do something like this:

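from lxml import etree

parser = etree.XMLParser()
parser.feed("<root>")                    # open the artificial root
parser.feed(your_xml_tag_sequence_data)  # the rootless element sequence
parser.feed("</root>")                   # close the artificial root
root = parser.close()                    # returns the <root> element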

Stefan
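
The feed-parser approach above can be sketched as a small helper. The function name `parse_fragments` and the sample chunks are hypothetical; the sketch assumes the input contains no XML declaration, since a declaration would not be valid inside the wrapper element.

```python
from lxml import etree


def parse_fragments(chunks):
    """Parse an iterable of text chunks that together form a rootless
    sequence of elements, returning an artificial <root> element."""
    parser = etree.XMLParser()
    parser.feed("<root>")      # open the wrapper without copying the data
    for chunk in chunks:
        parser.feed(chunk)     # chunks may split the input at any point
    parser.feed("</root>")     # close the wrapper
    return parser.close()      # finish parsing and return the root element


# Hypothetical input, deliberately split in the middle of the second element.
root = parse_fragments(["<a>x</a><b>", "y</b>"])
tags = [el.tag for el in root]
```

Feeding the data in pieces means the wrapped document is never built as one large string, which is the point of this variant when the input is big or arrives incrementally.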
 
