mysteries of urllib/urllib2

Discussion in 'Python' started by Adrian Smith, Jul 3, 2007.

  1. Adrian Smith

    Adrian Smith Guest

    I'm trying to use urllib2 to download a page (I'd rather use urllib,
    but I need to change the User-Agent header to look like a browser or
    G**gle won't send it to me, the big meanies). The following (pinched
    from Dive Into Python) seems to work perfectly in Idle, but falls at
    the final hurdle when run as a cgi script - can anyone suggest
    anything I may have overlooked?

    import urllib2

    request = urllib2.Request(some_URL)  # some_URL: placeholder for the target address
    request.add_header('User-Agent', 'some_plausible_string')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
     
    Adrian Smith, Jul 3, 2007
    #1

  2. On Jul 3, 9:43 am, Adrian Smith <> wrote:
    > The following (pinched
    > from Dive Into Python) seems to work perfectly in Idle, but falls at
    > the final hurdle when run as a cgi script - can anyone suggest
    > anything I may have overlooked?
    >
    > request = urllib2.Request(some_URL)
    > request.add_header('User-Agent', 'some_plausible_string')
    > opener = urllib2.build_opener()
    > data = opener.open(request).read()


    Most likely the account that the CGI script runs as does not have
    permission to access the network. Check the traceback to be sure. Put
    this at the top of your CGI script:

    import cgitb; cgitb.enable()

    --Ben
     
    Ben Cartwright, Jul 3, 2007
    #2
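For readers on a current interpreter: cgitb, as recommended above, was deprecated in Python 3.11 and removed in 3.13. The following is a minimal sketch of the same idea, assuming Python 3 and using only the standard traceback module. The names run_cgi and handler are hypothetical and not part of any library.

```python
import io
import sys
import traceback


def run_cgi(handler, out=sys.stdout):
    """Call handler(); on failure, send the traceback as a plain-text CGI response."""
    try:
        body = handler()
    except Exception:
        # Report the full traceback instead of a blank page or a bare 500.
        out.write("Status: 500 Internal Server Error\r\n")
        out.write("Content-Type: text/plain\r\n\r\n")
        out.write(traceback.format_exc())
    else:
        out.write("Content-Type: text/html\r\n\r\n")
        out.write(body)


def failing_handler():
    # Stand-in for a fetch that fails under the web server's account.
    raise ValueError("no network")


# Capture the response in memory to show what the browser would receive.
buf = io.StringIO()
run_cgi(failing_handler, buf)
response = buf.getvalue()
```

With this in place the error text reaches the browser, which is the information the thread asks for before guessing at a cause.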

  3. Adrian Smith

    Adrian Smith Guest

    On Jul 3, 11:25 pm, Ben Cartwright <> wrote:
    > On Jul 3, 9:43 am, Adrian Smith <> wrote:
    >
    > > The following (pinched
    > > from Dive Into Python) seems to work perfectly in Idle, but
    > > falls at the final hurdle when run as a cgi script - can
    > > anyone suggest anything I may have overlooked?

    >
    > > request = urllib2.Request(some_URL)
    > > request.add_header('User-Agent', 'some_plausible_string')
    > > opener = urllib2.build_opener()
    > > data = opener.open(request).read()

    >
    > Most likely the account that cgi script is running as does not
    > have permissions to access the net. Check the traceback to be
    > sure. Put this at the top of your cgi script:
    >
    > import cgitb; cgitb.enable()


    Well, it worked with urllib (resulting in a G**gle 403 your-client-
    does-not-have-permission-to-get-urlX page), so I think it must have
    some access. Apparently there's a way to change the user-agent string
    by subclassing urllib's URLopener class, but that's beyond my comfort
    zone at present.
     
    Adrian Smith, Jul 3, 2007
    #3
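The reasoning in the post above (a 403 page means the server was reached, so the script does have network access) can be made explicit in code. This is a sketch assuming Python 3, where the urllib2 exceptions live in urllib.error; classify_failure is a hypothetical helper name, and the exceptions are constructed by hand so that nothing here touches the network.

```python
import urllib.error


def classify_failure(exc):
    """Return a short description of why a urllib fetch failed."""
    # HTTPError subclasses URLError, so it must be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        return "server answered with HTTP %d (%s)" % (exc.code, exc.reason)
    if isinstance(exc, urllib.error.URLError):
        return "could not reach server: %s" % (exc.reason,)
    return "unexpected error: %r" % (exc,)


# Hand-built examples; example.invalid is a placeholder host.
forbidden = urllib.error.HTTPError(
    "http://example.invalid/", 403, "Forbidden", {}, None)
unreachable = urllib.error.URLError("connection refused")
```

An HTTPError points at the request itself (for example the User-Agent header), while a plain URLError points at connectivity or permissions on the machine running the script.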
  4. On Jul 3, 11:14 am, Adrian Smith <> wrote:
    > > > The following (pinched
    > > > from Dive Into Python) seems to work perfectly in Idle, but
    > > > falls at the final hurdle when run as a cgi script

    > > Put this at the top of your cgi script:

    >
    > > import cgitb; cgitb.enable()


    Did you even try this? Asking for Python help without posting the
    traceback is like phoning your mechanic and saying, "My car is making
    a generic rattling noise, can you tell me what the problem is without
    looking under the hood?"

    > Apparently there's a way to change the user-agent string
    > by subclassing urllib's URLopener class, but that's beyond my comfort
    > zone at present.


    Untested:

    import urllib
    url = 'http://groups.google.com/group/Google-AJAX-Search-API/browse_thread/thread/a0eb87ad13b11762'
    opener = urllib.FancyURLopener()
    opener.addheaders = [('User-Agent', 'Fauxzilla 4.0')]
    data = opener.open(url).read()

    Hope that helps,
    --Ben
     
    Ben Cartwright, Jul 3, 2007
    #4
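The same addheaders approach carries over to Python 3, where an opener comes from urllib.request.build_opener() rather than FancyURLopener. A minimal sketch under that assumption follows; make_opener is a hypothetical helper name, and the actual fetch is left commented out so the example needs no network.

```python
import urllib.request


def make_opener(user_agent):
    """Build an opener whose requests carry the given User-Agent header."""
    opener = urllib.request.build_opener()
    # Replaces the default ("User-agent", "Python-urllib/x.y") entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener("Fauxzilla 4.0")
# data = opener.open(url).read()  # network call; url as in the post above
```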
  5. John Nagle

    John Nagle Guest

    Adrian Smith wrote:
    > I'm trying to use urllib2 to download a page (I'd rather use urllib,
    > but I need to change the User-Agent header to look like a browser or
    > G**gle won't send it to me, the big meanies). The following (pinched
    > from Dive Into Python) seems to work perfectly in Idle, but falls at
    > the final hurdle when run as a cgi script - can anyone suggest
    > anything I may have overlooked?
    >
    > request = urllib2.Request(some_URL)
    > request.add_header('User-Agent', 'some_plausible_string')
    > opener = urllib2.build_opener()
    > data = opener.open(request).read()


    I doubt that's the problem here, but don't use a USER-AGENT string
    that ends in "m" without a preceding "m" when the USER-AGENT
    string is the last element of the header. Coyote Point load balancers
    will drop the packet.

    (Coyote Point uses regular expressions to parse HTTP headers, and
    I think somebody wrote "\m" where they meant "\n".)

    John Nagle
     
    John Nagle, Jul 3, 2007
    #5
  6. * Adrian Smith <> [2007-07-03 08:14:32]:

    > some access. Apparently there's a way to change the user-agent string
    > by subclassing urllib's URLopener class, but that's beyond my comfort
    > zone at present.


    Read the urllib2 how-to in the ActiveState documentation pages. It
    gives concise snippets showing how to set the User-Agent string.
    <snip>
    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}
    headers = { 'User-Agent' : user_agent }

    data = urllib.urlencode(values)
    req = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(req)
    the_page = response.read()
    </snip>

    --
    O.R.Senthil Kumaran
    http://uthcode.sarovar.org
     
    O.R.Senthil Kumaran, Jul 3, 2007
    #6
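In Python 3 the two modules used in the snippet above were reorganised: urllib.urlencode moved to urllib.parse, and urllib2.Request moved to urllib.request. The sketch below assumes Python 3 and mirrors the how-to example; build_request is a hypothetical helper name, and no request is actually sent.

```python
import urllib.parse
import urllib.request


def build_request(url, values, user_agent):
    """Return a Request that posts form values with a custom User-Agent."""
    # The request body must be bytes in Python 3, hence the encode().
    data = urllib.parse.urlencode(values).encode("ascii")
    return urllib.request.Request(
        url, data=data, headers={"User-Agent": user_agent})


req = build_request(
    "http://www.someserver.com/cgi-bin/register.cgi",
    {"name": "Michael Foord", "language": "Python"},
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
)
# response = urllib.request.urlopen(req)  # network call, left commented out
```

Supplying data turns the request into a POST. To fetch a page with GET, as in the original question, leave data out and pass only the headers.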
  7. Adrian Smith

    Adrian Smith Guest

    On Jul 4, 12:42 am, Ben Cartwright <> wrote:
    > On Jul 3, 11:14 am, Adrian Smith <> wrote:
    >
    > > > > The following (pinched
    > > > > from Dive Into Python) seems to work perfectly in Idle, but
    > > > > falls at the final hurdle when run as a cgi script
    > > > Put this at the top of your cgi script:

    >
    > > > import cgitb; cgitb.enable()

    >
    > Did you even try this? Asking for Python help without posting the
    > traceback is like phoning your mechanic and saying, "My car is
    > making a generic rattling noise, can you tell me what the problem
    > is without looking under the hood?"


    Sorry, I thought that since the CGI script did appear to have web
    access, the traceback wasn't applicable, and it's amazing what some
    mechanics can infer from engine noise. cgitb certainly does send back
    an impressive amount of information; I'll be sure to use it in future.

    > > Apparently there's a way to change the user-agent string
    > > by subclassing urllib's URLopener class, but that's beyond my
    > > comfort zone at present.

    >
    > Untested:
    >
    > import urllib
    > url = 'http://groups.google.com/group/Google-AJAX-Search-API/browse_thread/thread/a0eb87ad13b11762'
    > opener = urllib.FancyURLopener()
    > opener.addheaders = [('User-Agent', 'Fauxzilla 4.0')]
    > data = opener.open(url).read()


    That works a treat, thanks!
     
    Adrian Smith, Jul 3, 2007
    #7
