HTMLDocument and Xpath

swilson · Feb 3, 2006

Hi, I want to use XPath to scrape info from a website using PyXML, but I
keep getting no results.

For example, in the following, I want to return the text "element1", but
I can't get XPath to return anything at all. What's wrong with this
code?

--------------------
from xml.dom.ext.reader import HtmlLib
from xml.xpath import Evaluate

reader = HtmlLib.Reader()
doc_node = reader.fromString("""
<html>
<head>
<title>Python Programming Language</title>
</head>
<body>
<table><tr><td>element1</td></tr></table>
</body>
</html>
""")

test = Evaluate('td', doc_node.documentElement)
print "test =", test
------------

All I get is an empty list for output.

Thx in advance

Shawn

Alan Kennedy · Feb 3, 2006

Your xpath expression is wrong.

test = Evaluate('td', doc_node.documentElement)

Try one of the following alternatives, all of which should work.

test = Evaluate('//td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)

HTH,

Alan.

swilson · Feb 3, 2006

I tried all of those and in every case, test returns "[]". Does
Evaluate only work with XML documents?

Shawn

swilson · Feb 7, 2006

Got the answer - there's a bug in xpath. I think the HTML parser
converts all the tags (but not the attributes) to uppercase. XPath
definitely does not like my first string, but these work fine:

test = Evaluate('//TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD[1]', doc_node.documentElement)

Shawn
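
The behaviour described in this thread comes down to two XPath rules: a bare name such as 'td' only matches children of the context node, and element names are compared case-sensitively. The sketch below illustrates both using Python 3's standard-library xml.etree.ElementTree rather than PyXML. That substitution is an assumption made so the example runs without PyXML installed; ElementTree supports only a subset of XPath, so './/TD' stands in for '//TD'. The uppercase markup is hand-written to mimic the tag names reported above; it is not actual output from HtmlLib.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the DOM the thread describes: tag names uppercased.
doc = ET.fromstring(
    "<HTML><HEAD><TITLE>Python Programming Language</TITLE></HEAD>"
    "<BODY><TABLE><TR><TD>element1</TD></TR></TABLE></BODY></HTML>"
)

# Bare name: only direct children of the context node (<HTML>) are tested.
child_only = doc.findall("TD")

# Descendant search with the wrong case: names are compared case-sensitively.
wrong_case = doc.findall(".//td")

# Descendant search with matching case finds the cell.
right_case = doc.findall(".//TD")

# Step-by-step path relative to <HTML>, also in matching case.
by_path = doc.findall("BODY/TABLE/TR/TD")

print("child_only =", child_only)
print("wrong_case =", wrong_case)
print("right_case =", [td.text for td in right_case])
print("by_path    =", [td.text for td in by_path])
```

If the case of the tag names is not known in advance, one option is to inspect the root element's name first (doc.tag here, or documentElement.tagName on a DOM document) before writing the expression.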
