urllib2 - 403 that _should_ not occur.

Discussion in 'Python' started by James Mills, Jan 12, 2009.

  1. James Mills

    James Mills Guest

    Hey all,

    The following fails for me:

    >>> from urllib2 import urlopen
    >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.6/urllib2.py", line 389, in open
        response = meth(req, response)
      File "/usr/lib/python2.6/urllib2.py", line 502, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python2.6/urllib2.py", line 427, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.6/urllib2.py", line 361, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.6/urllib2.py", line 510, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 403: Forbidden
    >>>


    However, that _same_ URL works perfectly fine on the
    same machine (and same network) using any of:
    * curl
    * wget
    * elinks
    * firefox

    Any helpful ideas ?

    cheers
    James

    --
    -- "Problems are solved by method"
     
    James Mills, Jan 12, 2009
    #1

  2. ajaksu

    ajaksu Guest

    On Jan 11, 11:59 pm, "James Mills" wrote:
    > Hey all,
    >
    > The following fails for me:
    >
    > >>> from urllib2 import urlopen
    > >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
    >
    > Traceback (most recent call last):
    [...]
    > Any helpful ideas ?


    Maybe raise a real bug @ bugs.python.org instead of just mentioning it
    like I did: http://bugs.python.org/msg77889

    I think at least some sites would be willing to add the new UA to
    their whitelists.

    HTH,
    Daniel
     
    ajaksu, Jan 12, 2009
    #2

  3. Philip Semanchuk

    On Jan 12, 2009, at 6:48 PM, ajaksu wrote:

    > On Jan 11, 11:59 pm, "James Mills" wrote:
    >> Hey all,
    >>
    >> The following fails for me:
    >>
    >> >>> from urllib2 import urlopen
    >> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
    >>
    >> Traceback (most recent call last):
    > [...]
    >> Any helpful ideas ?

    >
    > Maybe raise a real bug @ bugs.python.org instead of just mentioning it
    > like I did: http://bugs.python.org/msg77889
    >
    > I think at least some sites would be willing to add the new UA to
    > their whitelists.


    I don't think I understand you clearly. Whether or not Google et al
    whitelist the Python UA isn't a Python issue, is it?
     
    Philip Semanchuk, Jan 13, 2009
    #3
  4. Steve Holden

    Steve Holden Guest

    Philip Semanchuk wrote:
    >
    > On Jan 12, 2009, at 6:48 PM, ajaksu wrote:
    >
    >> On Jan 11, 11:59 pm, "James Mills" wrote:
    >>> Hey all,
    >>>
    >>> The following fails for me:
    >>>
    >>> >>> from urllib2 import urlopen
    >>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
    >>>
    >>> Traceback (most recent call last):
    >> [...]
    >>> Any helpful ideas ?

    >>
    >> Maybe raise a real bug @ bugs.python.org instead of just mentioning it
    >> like I did: http://bugs.python.org/msg77889
    >>
    >> I think at least some sites would be willing to add the new UA to
    >> their whitelists.

    >
    > I don't think I understand you clearly. Whether or not Google et al
    > whitelist the Python UA isn't a Python issue, is it?
    >

    I'd say it's an issue relevant to Python users, which would seem to put
    it pretty much in the mainstream for c.l.py - especially as the code
    causing concern was written in Python.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 13, 2009
    #4
  5. Philip Semanchuk

    On Jan 13, 2009, at 1:22 AM, Steve Holden wrote:

    > Philip Semanchuk wrote:
    >>
    >> On Jan 12, 2009, at 6:48 PM, ajaksu wrote:
    >>
    >>> On Jan 11, 11:59 pm, "James Mills" wrote:
    >>>> Hey all,
    >>>>
    >>>> The following fails for me:
    >>>>
    >>>> >>> from urllib2 import urlopen
    >>>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
    >>>>
    >>>> Traceback (most recent call last):
    >>> [...]
    >>>> Any helpful ideas ?
    >>>
    >>> Maybe raise a real bug @ bugs.python.org instead of just mentioning it
    >>> like I did: http://bugs.python.org/msg77889
    >>>
    >>> I think at least some sites would be willing to add the new UA to
    >>> their whitelists.
    >>
    >> I don't think I understand you clearly. Whether or not Google et al
    >> whitelist the Python UA isn't a Python issue, is it?
    >>
    > I'd say it's an issue relevant to Python users, which would seem to put
    > it pretty much in the mainstream for c.l.py - especially as the code
    > causing concern was written in Python.


    I didn't mean to imply that the conversation didn't belong here. I
    think that is perfectly appropriate. What I don't understand is the
    suggestion that Google's server config should be raised as a bug
    against Python. (i.e. "raise a real bug @ bugs.python.org...")
     
    Philip Semanchuk, Jan 13, 2009
    #5
  6. Steve Holden

    Steve Holden Guest

    Philip Semanchuk wrote:
    >
    > On Jan 13, 2009, at 1:22 AM, Steve Holden wrote:
    >
    >> Philip Semanchuk wrote:
    >>>
    >>> On Jan 12, 2009, at 6:48 PM, ajaksu wrote:
    >>>
    >>>> On Jan 11, 11:59 pm, "James Mills" wrote:
    >>>>> Hey all,
    >>>>>
    >>>>> The following fails for me:
    >>>>>
    >>>>> >>> from urllib2 import urlopen
    >>>>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
    >>>>>
    >>>>> Traceback (most recent call last):
    >>>> [...]
    >>>>> Any helpful ideas ?
    >>>>
    >>>> Maybe raise a real bug @ bugs.python.org instead of just mentioning it
    >>>> like I did: http://bugs.python.org/msg77889
    >>>>
    >>>> I think at least some sites would be willing to add the new UA to
    >>>> their whitelists.
    >>>
    >>> I don't think I understand you clearly. Whether or not Google et al
    >>> whitelist the Python UA isn't a Python issue, is it?
    >>>
    >> I'd say it's an issue relevant to Python users, which would seem to put
    >> it pretty much in the mainstream for c.l.py - especially as the code
    >> causing concern was written in Python.

    >
    > I didn't mean to imply that the conversation didn't belong here. I think
    > that is perfectly appropriate. What I don't understand is the suggestion
    > that Google's server config should be raised as a bug against Python.
    > (i.e. "raise a real bug @ bugs.python.org...")
    >

    Oh, I see! Yes, it's hard to know what actions anyone could take on such
    a bug report. I suppose the documentation could be modified to describe
    how some services require specific agents, but that wouldn't help a huge
    amount.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 13, 2009
    #6
  7. Falcolas

    Falcolas Guest

    On Jan 11, 6:59 pm, "James Mills" wrote:
    > Hey all,
    >
    > The following fails for me:
    >
    > >>> from urllib2 import urlopen
    > >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")


    For what it's worth, I've had a similar problem with urlopen as
    well. Using the library's default urlopen results in an error, but if I
    build an opener with only the basic handlers, it works just fine.

    >>> import urllib2
    >>> f = urllib2.urlopen("http://localhost:8000")
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        f = urllib2.urlopen("http://localhost:8000")
      File "C:\Python25\lib\urllib2.py", line 121, in urlopen
        return _opener.open(url, data)
      File "C:\Python25\lib\urllib2.py", line 380, in open
        response = meth(req, response)
      File "C:\Python25\lib\urllib2.py", line 491, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python25\lib\urllib2.py", line 418, in error
        return self._call_chain(*args)
      File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
        result = func(*args)
      File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 403: Forbidden
    >>> opener = urllib2.OpenerDirector()
    >>> opener.add_handler(urllib2.HTTPHandler())
    >>> opener.add_handler(urllib2.HTTPDefaultErrorHandler())
    >>> f = opener.open("http://localhost:8000")
    >>> f.read()
    'something relevant'
     
    Falcolas, Jan 13, 2009
    #7
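
One way to see how the two openers in post #7 differ is to list the handlers each one has registered. This is only a minimal sketch, not something posted in the thread: the helper name `handler_names` is hypothetical, the import fallback is there so the snippet also runs on Python 3 (where `urllib2` became `urllib.request`), and no network request is made.

```python
try:
    import urllib2 as urlreq          # Python 2, as used in this thread
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent of urllib2


def handler_names(opener):
    """Return the sorted class names of the handlers registered on an opener."""
    # OpenerDirector keeps its registered handlers in the `handlers` list.
    return sorted(h.__class__.__name__ for h in opener.handlers)


# An opener built with the library defaults.
default_opener = urlreq.build_opener()

# The hand-built opener from post #7: only two handlers.
bare_opener = urlreq.OpenerDirector()
bare_opener.add_handler(urlreq.HTTPHandler())
bare_opener.add_handler(urlreq.HTTPDefaultErrorHandler())

print(handler_names(default_opener))
print(handler_names(bare_opener))

# Handlers present by default but absent from the hand-built opener.
missing = sorted(set(handler_names(default_opener)) - set(handler_names(bare_opener)))
print(missing)
```

Among the missing handlers is `HTTPErrorProcessor`, which is the one that turns non-2xx responses into `HTTPError` (it is the `http_response` frame in the tracebacks above). A default opener may also register a `ProxyHandler` when proxy settings are found. Whether either difference explains the result in post #7 is a guess; the thread does not say.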
  8. ajaksu

    ajaksu Guest

    On Jan 13, 1:33 am, Philip Semanchuk wrote:
    > I don't think I understand you clearly. Whether or not Google et al  
    > whitelist the Python UA isn't a Python issue, is it?


    Hi, sorry for taking so long to reply :)

    I imagine it's something akin to Firefox's 'Report broken website':
    evangelism.

    IMHO, if the PSF *cough* Steve *cough* or individual Python hackers
    can contact key sites (such as Wikipedia, groups.google, etc.) the issue
    can be solved sooner.

    Instead of waiting for each whitelist maintainer to find out we have
    a new UA, go out and tell them. A template for such requests could
    help those inside e.g. Google to bring the issue to the attention of
    the whitelist admins. The community has lots of connections that could
    be useful to pass the message along, if only 'led by the nose' to
    achieve that :)

    Hence, the suggestion to raise a bug.

    Regards,
    Daniel
     
    ajaksu, Jan 14, 2009
    #8
  9. Philip Semanchuk

    On Jan 13, 2009, at 9:42 PM, ajaksu wrote:

    > On Jan 13, 1:33 am, Philip Semanchuk wrote:
    >> I don't think I understand you clearly. Whether or not Google et al
    >> whitelist the Python UA isn't a Python issue, is it?

    >
    > Hi, sorry for taking so long to reply :)
    >
    > I imagine it's something akin to Firefox's 'Report broken website':
    > evangelism.
    >
    > IMHO, if the PSF *cough* Steve *cough* or individual Python hackers
    > can contact key sites (such as Wikipedia, groups.google, etc.) the issue
    > can be solved sooner.
    >
    > Instead of waiting for each whitelist maintainer to find out we have
    > a new UA, go out and tell them. A template for such requests could
    > help those inside e.g. Google to bring the issue to the attention of
    > the whitelist admins. The community has lots of connections that could
    > be useful to pass the message along, if only 'led by the nose' to
    > achieve that :)
    >
    > Hence, the suggestion to raise a bug.


    Gotcha.

    In this case I think there is no whitelist. I think Google has a
    default accept policy supplemented with a blacklist rather than a
    default ban policy mitigated by a whitelist. As evidence I submit the
    fact that my user agent of "funny fish" was accepted. In other words,
    Google has taken explicit steps to ban agents sending the default
    Python UA. Now, if the default UA changed in Python 3.0, maybe the
    best thing to do is keep quiet and maybe it will fly under the Google
    radar for a while. =)

    Cheers
    Philip
     
    Philip Semanchuk, Jan 14, 2009
    #9
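
The "funny fish" experiment in post #9 amounts to overriding the default `User-Agent` header. A minimal sketch of two ways to do that follows, assuming the same feed URL as the original post; the constant name `URL` is introduced here for illustration, the import fallback lets it run on Python 3 as well as Python 2, and the actual request is left commented out because it needs network access and the server may still refuse it.

```python
try:
    from urllib2 import Request, build_opener          # Python 2
except ImportError:
    from urllib.request import Request, build_opener   # Python 3

URL = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"

# Option 1: set the header on a single request.
# "funny fish" is the value reported in post #9; any string can go here.
req = Request(URL, headers={"User-Agent": "funny fish"})
# Request stores header names capitalized, so the key becomes "User-agent".
print(req.get_header("User-agent"))

# Option 2: replace the default headers on an opener so that every
# request sent through it carries the custom value.
opener = build_opener()
opener.addheaders = [("User-agent", "funny fish")]
print(opener.addheaders)

# Sending the request (requires network access, may still return 403):
# response = opener.open(req)
# data = response.read()
```

Setting the header per request keeps the change local; replacing `addheaders` is more convenient when many requests go through the same opener.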
  10. Steve Holden

    Steve Holden Guest

    ajaksu wrote:
    > On Jan 13, 1:33 am, Philip Semanchuk wrote:
    >> I don't think I understand you clearly. Whether or not Google et al
    >> whitelist the Python UA isn't a Python issue, is it?

    >
    > Hi, sorry for taking so long to reply :)
    >
    > I imagine it's something akin to Firefox's 'Report broken website':
    > evangelism.
    >
    > IMHO, if the PSF *cough* Steve *cough* or individual Python hackers
    > can contact key sites (such as Wikipedia, groups.google, etc.) the issue
    > can be solved sooner.
    >
    > Instead of waiting for each whitelist maintainer to find out we have
    > a new UA, go out and tell them. A template for such requests could
    > help those inside e.g. Google to bring the issue to the attention of
    > the whitelist admins. The community has lots of connections that could
    > be useful to pass the message along, if only 'led by the nose' to
    > achieve that :)
    >
    > Hence, the suggestion to raise a bug.
    >

    OK, but be aware that the PSF doesn't monitor the bugs looking for
    actions to take on behalf of the Python user community. In fact we
    aren't overtly "political" in this way at all. This doesn't mean it
    wouldn't be useful for the PSF to get involved in this role; just that
    right now it isn't, and a bug report probably isn't the best way to get
    action.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 14, 2009
    #10
  11. ajaksu

    ajaksu Guest

    On Jan 14, 5:14 am, Steve Holden wrote:
    > ajaksu wrote:
    >> [snip evangelism stuff]

    > OK, but be aware that the PSF doesn't monitor the bugs looking for
    > actions to take on behalf of the Python user community. In fact we
    > aren't overtly "political" in this way at all. This doesn't mean it
    > wouldn't be useful for the PSF to get involved in this role; just that
    > right now it isn't, and a bug report probably isn't the best way to get
    > action.


    Acknowledged. I have posted a (pretty poor) support request @
    http://groups.google.com/group/Google-Groups-Basics/ and suggest
    others do the same for Wikipedia and other big sites that block 3.0 (I
    might build a list of those later today). Maybe a wiki page, some blog
    posts, etc.

    Best regards,
    Daniel

    Request: http://groups.google.com/group/Google-Groups-Basics/browse_thread/thread/498a39a89d81b650#
    """
    Hi,
    As mentioned in a comp.lang.python thread[1], the new version of
    Python (3.0) cannot open pages @ groups.google.com.

    It seems the UA of Python 3.0 ("User-Agent: Python-urllib/3.1") is
    actively blocked, while that of Python 2.5
    ("User-Agent: Python-urllib/1.17") isn't.

    This message is a call for help so that we can get Python 3.0 working
    with groups.google.com. Is this the right place to bring the issue to
    the attention of those that can fix it? Does anyone have a contact
    that could speed up getting Python 3.0 working?

    Thanks in advance,
    Daniel

    [1] http://groups.google.com/group/comp.lang.python/browse_thread/thread/088491d5a0d86f1b
    """
     
    ajaksu, Jan 14, 2009
    #11
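
To check which `User-Agent` a given interpreter sends by default (the strings quoted in post #11), the opener's default headers can be inspected without touching the network. This is a small sketch under the assumption that the standard library defaults have not been overridden; the variable names are illustrative, and the import fallback covers both Python 2 and Python 3.

```python
try:
    from urllib2 import build_opener          # Python 2
except ImportError:
    from urllib.request import build_opener   # Python 3

# A freshly built opener carries the library's default headers in `addheaders`.
opener = build_opener()
default_headers = dict(opener.addheaders)

# The value depends on the interpreter version, so none is hard-coded here.
default_ua = default_headers.get("User-agent")
print(default_ua)
```

Including this string in a support request makes it easier for a site's administrators to see exactly which agent is being refused.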
  12. Steve Holden

    Steve Holden Guest

    ajaksu wrote:
    > On Jan 14, 5:14 am, Steve Holden wrote:
    >> ajaksu wrote:
    >>> [snip evangelism stuff]

    >> OK, but be aware that the PSF doesn't monitor the bugs looking for
    >> actions to take on behalf of the Python user community. In fact we
    >> aren't overtly "political" in this way at all. This doesn't mean it
    >> wouldn't be useful for the PSF to get involved in this role; just that
    >> right now it isn't, and a bug report probably isn't the best way to get
    >> action.

    >
    > Acknowledged. I have posted a (pretty poor) support request @
    > http://groups.google.com/group/Google-Groups-Basics/ and suggest
    > others do the same for Wikipedia and other big sites that block 3.0 (I
    > might build a list of those later today). Maybe a wiki page, some blog
    > posts, etc.
    >
    > Best regards,
    > Daniel
    >
    > Request: http://groups.google.com/group/Google-Groups-Basics/browse_thread/thread/498a39a89d81b650#
    > """
    > Hi,
    > As mentioned in a comp.lang.python thread[1], the new version of
    > Python (3.0) cannot open pages @ groups.google.com.
    >
    > It seems the UA of Python 3.0 ("User-Agent: Python-urllib/3.1") is
    > actively blocked, while that of Python 2.5
    > ("User-Agent: Python-urllib/1.17") isn't.
    >
    > This message is a call for help so that we can get Python 3.0 working
    > with groups.google.com. Is this the right place to bring the issue to
    > the attention of those that can fix it? Does anyone have a contact
    > that could speed up getting Python 3.0 working?
    >
    > Thanks in advance,
    > Daniel
    >
    > [1] http://groups.google.com/group/comp.lang.python/browse_thread/thread/088491d5a0d86f1b
    > """

    Thanks very much. It's good to see Python users taking action that will
    lead to benefits for all. Congratulations on taking the initiative.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 14, 2009
    #12