Determine Whether File Exists On HTTP Server

OvErboRed

Hi, I'm trying to determine whether a given URL exists. I'm new to Python
but I think that urllib is the tool for the job. However, if I give it a
non-existent file, it simply returns the 404 page. Aside from grepping this
for '404', is there a better way to do this? (Preferably, the solution
would work for both HTTP and FTP.) Thanks in advance.
 
Troy Melhase

Try urllib2.urlopen, and put a try/except block around it. Here's what an
unhandled exception from a 404 response looks like:

Python 2.3.3 (#1, May 14 2004, 09:49:22)
[GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/lib/python2.3/urllib2.py", line 346, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 472, in http_error_302
    return self.parent.open(new)
  File "/usr/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/lib/python2.3/urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
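
The try/except suggestion above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation, and it assumes Python 3, where urllib2's urlopen and HTTPError live in urllib.request and urllib.error. The function name url_exists and its opener parameter are hypothetical, introduced only so the check can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def url_exists(url, opener=urllib.request.urlopen):
    """Return True if `url` opens, False on HTTP 404; re-raise anything else.

    `opener` is a hypothetical hook (defaulting to urlopen) so the function
    can be tried out with a stand-in instead of a real server.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Other codes (403, 500, ...) do not tell us whether the file
        # exists, so leave the decision to the caller.
        raise
    response.close()
    return True
```

A call might look like url_exists("http://www.python.org/path/to/some/file.html"). Note that this fetches with GET, so the server still starts sending the page body when the file exists.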
 
FeU Hagen

This works with HTTP:

import sys      # exc_info
import httplib  # HTTPConnection

HOST = "www.python.org"
PAGE = "/path/to/some/file.html"

try:
    c = httplib.HTTPConnection(HOST)
    # c._http_vsn = 10; c._http_vsn_str = "HTTP/1.0"
    c.connect()
    c.putrequest("GET", PAGE)
    c.endheaders()
    r = c.getresponse()
    print "%s\n%s\n%s\n" % (r.status, r.reason, r.msg)
    if r.status == 200:  # OK
        print "%s exists" % PAGE
        PageContent = r.read()  # the requested HTML file as a string
    elif r.status == 404:  # not found
        print "%s does not exist" % PAGE
        Page404 = r.read()  # the 404 page as a string
    else:
        print "%s : status %s %s %s" % (PAGE, r.status, r.reason, r.msg)
except Exception:
    # connection problems (unknown host, refused connection, ...) end up here
    print sys.exc_info()[1]



Greetings
Harald Walter
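
If only the status matters, a HEAD request avoids downloading the page at all. The following is a rough sketch of that variation, assuming Python 3 (where httplib is named http.client) and assuming the server answers HEAD the same way it answers GET, which not every server does. The names http_status, page_exists and conn_factory are hypothetical and not part of the posts above.

```python
import http.client


def http_status(host, path, conn_factory=http.client.HTTPConnection):
    """Send a HEAD request for `path` and return the numeric status code.

    `conn_factory` is a hypothetical hook so a stand-in connection class
    can be supplied when no network is available.
    """
    conn = conn_factory(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def page_exists(host, path, conn_factory=http.client.HTTPConnection):
    """True for 200, False for 404; any other status raises RuntimeError."""
    status = http_status(host, path, conn_factory)
    if status == 200:
        return True
    if status == 404:
        return False
    # Redirects, 403, 5xx and so on are left for the caller to interpret.
    raise RuntimeError("unexpected status %s for %s" % (status, path))
```

Usage would be along the lines of page_exists("www.python.org", "/path/to/some/file.html"). Redirects are deliberately not followed here, so a 301 or 302 surfaces as an error instead of being silently treated as "exists".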
 
