Determine Whether File Exists On HTTP Server

Discussion in 'Python' started by OvErboRed, May 22, 2004.

  1. OvErboRed

    OvErboRed Guest

    Hi, I'm trying to determine whether a given URL exists. I'm new to Python
    but I think that urllib is the tool for the job. However, if I give it a
    non-existent file, it simply returns the 404 page. Aside from grepping this
    for '404', is there a better way to do this? (Preferably, there is a
    solution that can be applied to both HTTP and FTP.) Thanks in advance.
     
    OvErboRed, May 22, 2004
    #1

  2. Troy Melhase

    Troy Melhase Guest

    On Saturday 22 May 2004 12:28 am, OvErboRed wrote:
    > Hi, I'm trying to determine whether a given URL exists. I'm new to Python
    > but I think that urllib is the tool for the job. However, if I give it a
    > non-existent file, it simply returns the 404 page. Aside from grepping this
    > for '404', is there a better way to do this? (Preferably, there is a
    > solution that can be applied to both HTTP and FTP.) Thanks in advance.


    Try urllib2.urlopen, and put a try/except block around it. Here's what an
    unhandled exception from a 404 response looks like:

    Python 2.3.3 (#1, May 14 2004, 09:49:22)
    [GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib2
    >>> handle = urllib2.urlopen('http://google.com/this_page_doesnt_exist')

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/lib/python2.3/urllib2.py", line 129, in urlopen
        return _opener.open(url, data)
      File "/usr/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
        return self.do_open(httplib.HTTP, req)
      File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
        return self.parent.error('http', req, fp, code, msg, hdrs)
      File "/usr/lib/python2.3/urllib2.py", line 346, in error
        result = self._call_chain(*args)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 472, in http_error_302
        return self.parent.open(new)
      File "/usr/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
        return self.do_open(httplib.HTTP, req)
      File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
        return self.parent.error('http', req, fp, code, msg, hdrs)
      File "/usr/lib/python2.3/urllib2.py", line 352, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 412, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 404: Not Found
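
A minimal sketch of the try/except approach suggested above, written for Python 3, where urllib2 was split into urllib.request and urllib.error. The function name url_exists and its opener parameter are illustrative assumptions, not part of the original post; opener exists only so the function can be tried without network access.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def url_exists(url, opener=urlopen, timeout=10):
    """Return True if the URL opens, False on HTTP 404.

    `opener` is a hypothetical hook (defaults to urllib.request.urlopen)
    so the function can be exercised offline with a stand-in.
    """
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as err:
        if err.code == 404:
            return False
        # 403, 500, ...: the server did not say the file is missing,
        # so let the caller decide what to do.
        raise
    # Other failures (URLError for DNS problems, refused connections,
    # FTP errors) are not caught here and propagate to the caller.
    response.close()
    return True
```

Only a 404 is treated as "does not exist"; other HTTP errors such as 403 or 500 are re-raised because they do not say whether the file is there. As far as I know, urlopen reports failures on ftp:// URLs as URLError rather than HTTPError, so in this sketch they reach the caller unchanged.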

    --
    Troy Melhase,
    --
    When Christ calls a man, he bids him come and die. - Dietrich Bonhoeffer
     
    Troy Melhase, May 22, 2004
    #2

  3. FeU Hagen

    FeU Hagen Guest

    This works with HTTP:

    import sys # exc_info
    import httplib # HTTPConnection

    HOST = "www.python.org"
    PAGE = "/path/to/some/file.html"

    try:
        c = httplib.HTTPConnection(HOST)
        # c._http_vsn = 10; c._http_vsn_str = "HTTP/1.0"
        c.connect()
        c.putrequest("GET", PAGE)
        c.endheaders()
        r = c.getresponse()
        print "%s\n%s\n%s\n" % (r.status, r.reason, r.msg)
        if r.status == 200:  # OK
            print "%s exists" % PAGE
            PageContent = r.read()  # this is the requested html file in a string
        elif r.status == 404:  # not found
            print "%s does not exist" % PAGE
            Page404 = r.read()  # this is the 404 page in a string
        else:
            print "%s : status %s %s %s" % (PAGE, r.status, r.reason, r.msg)
    except:
        print sys.exc_info()[1]
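
The GET above downloads the whole page just to look at the status line. A possible alternative is a HEAD request, which asks only for the headers. The sketch below assumes Python 3, where httplib is named http.client; head_status and describe are hypothetical helper names, and it assumes the server answers HEAD the same way it answers GET, which not every server does.

```python
import http.client


def head_status(host, path, timeout=10):
    """Send a HEAD request and return the numeric HTTP status code."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def describe(status):
    """Map a status code to the same three outcomes as the GET version."""
    if status == 200:
        return "exists"
    if status == 404:
        return "does not exist"
    # Redirects (301/302), 403, 500, ... do not settle the question.
    return "unknown (status %d)" % status
```

Usage would look like describe(head_status("www.python.org", "/path/to/some/file.html")). Note that http.client does not follow redirects, so a 301 or 302 is reported as unknown rather than resolved.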



    Greetings
    Harald Walter



    "OvErboRed" <> wrote in message
    news:Xns94F1EA84483Byangstaoverbored@127.0.0.1...
    > Hi, I'm trying to determine whether a given URL exists. I'm new to Python
    > but I think that urllib is the tool for the job. However, if I give it a
    > non-existent file, it simply returns the 404 page. Aside from grepping this
    > for '404', is there a better way to do this? (Preferably, there is a
    > solution that can be applied to both HTTP and FTP.) Thanks in advance.
     
    FeU Hagen, May 22, 2004
    #3