Phillip B Oldham
I'm sure I'll soon figure out how to find these things out for myself,
but I'd like to get the community's advice on something.
I'm going to throw together a quick project over the weekend: a
spider. I want to scan a website for certain elements.
I come from a PHP background, so normally I'd:
- throw together a quick REST script to handle HTTP requests/responses
- load the pages into a SimpleXML object, and
- run an XPath expression over the DOM to find the nodes I need to test
One of the benefits of PHP's DOM implementation is that you can easily
load both XML and HTML4 documents - the HTML gets normalised to XML
during the import.
So, my questions are:
Is there a Python module to easily handle HTTP requests/responses?
Is there a Python DOM module that works similarly to PHP's when working
with older HTML?
What Python module would I use to apply an XPath expression over a DOM
and return the results?