Getting HTTP responses - a Python link-checking script.

  • Thread starter blair.bethwaite

blair.bethwaite

Hi Folks,

I'm thinking about writing a script that can be run over a whole site
and produce a report about broken links, etc.
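Running over a whole site means pulling the links out of each page before checking them. The sketch below shows one way to do that with the standard library's `html.parser` and `urllib.parse`. The class `LinkCollector` and the function `extract_links` are hypothetical names, and it assumes that only `<a href="...">` links with an http or https scheme are worth checking.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any
                # #fragment, since fragments are never sent to the server.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript:, etc.; only HTTP(S) links are checked.
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute http(s) links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

To keep a crawl on one site, a caller could compare `urlparse(link).netloc` with the start page's host before queueing a link for fetching. Off-site links would then be status-checked but not crawled.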

I've been playing with the urllib2 and httplib modules as a starting
point and have found that with urllib2 it doesn't seem possible to get
HTTP status codes.

I've had more success with httplib.
First I create a new HTTPConnection object with a given hostname and
port. Then I try connecting to the host and catch any socket errors,
which I can assume mean the server is either down or no longer exists
at this address.
If the connection succeeds, I request the resource in question, get
the response and check the status code.
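Here is a minimal sketch of those httplib steps, written against Python 3, where httplib was renamed `http.client` and `socket.error` is an alias of `OSError`. The helper name `check_link` and the 10-second timeout are hypothetical. Sending `HEAD` instead of `GET` is an assumption made here so the body isn't downloaded. Some servers may answer `HEAD` differently from `GET`, so a real checker might retry with `GET` when `HEAD` fails.

```python
import http.client


def check_link(host, port, path, timeout=10):
    """Return the HTTP status code for path on host:port, or None if the
    server could not be reached (hypothetical helper, not a library API)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Step 1: connect explicitly, so an unreachable server shows up as a
        # socket-level error before any request is sent.
        conn.connect()
    except OSError:
        # Covers refused connections, DNS failures (socket.gaierror) and
        # connect timeouts: treat the server as down or gone.
        conn.close()
        return None
    try:
        # Step 2: request the resource (HEAD avoids transferring the body).
        conn.request("HEAD", path)
        # Step 3: read the response and report its status code.
        return conn.getresponse().status
    finally:
        conn.close()
```

Usage would look like `check_link("www.example.com", 80, "/index.html")`, which returns `200`, `404` and so on, or `None` when nothing is listening.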

So I've got the tools I need to do the job adequately. I'm just
wondering whether anybody can recommend any alternatives.

Cheers,
-Blair
 
blair.bethwaite

Rene said:
(e-mail address removed):

except urllib2.HTTPError, e:
    if e.code == 403:
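The snippet above is Python 2. In Python 3, urllib2 was split into `urllib.request` and `urllib.error`, and `HTTPError` still carries the status code in `.code`. The sketch below is a hedged Python 3 version of the same idea. The helper name `fetch_status` and its `(code, reason)` return shape are hypothetical. `HTTPError` is a subclass of `URLError`, so it has to be caught first. A plain `URLError` means no HTTP response arrived at all.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (status_code, reason) for url; status_code is None when no
    HTTP response was received (hypothetical helper, not a library API)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # 2xx response (urlopen follows redirects by default, so this
            # is the status of the final URL).
            return response.status, response.reason
    except urllib.error.HTTPError as e:
        # 4xx/5xx: the exception itself holds the status code, e.g. 403.
        return e.code, e.reason
    except urllib.error.URLError as e:
        # DNS failure, refused connection, etc.: there is no status code.
        return None, str(e.reason)
```

Because redirects are followed automatically, a link that answers with 301 is reported with the status of wherever it ends up. A checker that wants to flag redirects separately would need a custom redirect handler.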

Thanks. Is there documentation for this available somewhere online? I
can't see it very obviously in the library reference.

Cheers,
-Blair
 
Rene Pijlman

(e-mail address removed):
Thanks. Is there documentation for this available somewhere online? I
can't see it very obviously in the library reference.

No, this seems to be missing from the documentation.
 
Scott David Daniels

Thanks. Is there documentation for this available somewhere online? I
can't see it very obviously in the library reference.

You can help by filing a Python documentation bug (or enhancement)
report that says where you'd most expect to find it. Then you too can
be a Python contributor.

--Scott David Daniels
(e-mail address removed)
 
Guest

Have a look at

urllib2 - The Missing Manual

http://www.voidspace.org.uk/python/articles/urllib2.shtml
 
