Functionality similar to PHP's SimpleXML?

P

Phillip B Oldham

I'm sure I'll soon figure out how to find these things out for myself,
but I'd like to get the community's advice on something.

I'm going to throw together a quick project over the weekend: a
spider. I want to scan a website for certain elements.

I come from a PHP background, so normally I'd:
- throw together a quick REST script to handle http request/responses
- load the pages into a simplexml object and
- run an xpath over the dom to find the nodes I need to test

One of the benefits of PHP's dom implementation is that you can easily
load both XML and HTML4 documents - the HTML gets normalised to XML
during the import.

So, my questions are:

Is there a python module to easily handle http request/responses?

Is there a python dom module that works similar to php's when working
with older html?

What python module would I use to apply an XPath expression over a dom
and return the results?
 
S

Stefan Behnel

Phillip said:
I'm going to throw together a quick project over the weekend: a
spider. I want to scan a website for certain elements.

I come from a PHP background, so normally I'd:
- throw together a quick REST script to handle http request/responses

Use the urllib/urllib2 module in the stdlib for GET/POST with parameters or
lxml for simple page requests.

- load the pages into a simplexml object and
- run an xpath over the dom to find the nodes I need to test

Use lxml.

http://codespeak.net/lxml/

Stefan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,008
Latest member
obedient dusk

Latest Threads

Top