n00b with urllib2: How to make it handle cookie automatically?

Discussion in 'Python' started by est, Feb 22, 2008.

  1. est

    est Guest

    Hi all,

    I need urllib2 to perform a series of HTTP requests with the cookies from
    the PREVIOUS request (like our browsers usually do). Many people suggest I
    use some library (e.g. pycURL) instead, but I guess it's good practice
    for a Python beginner to DIY something rather than use existing tools.

    So my problem is how to extend the urllib2 class

    from cookielib import CookieJar
    class SmartRequest():
        cj=CookieJar()
        def __init__(self, strUrl, strContent=None):
            self.Request    =   urllib2.Request(strUrl, strContent)
            self.cj.add_cookie_header(self.Request)
            self.Response   =   urllib2.urlopen(Request)
            self.cj.extract_cookies(self.Response, self.Request)
        def url
        def read(self, intCount):
            return self.Response.read(intCount)
        def headers(self, strHeaderName):
            return self.Response.headers[strHeaderName]

    The code does not work because each time SmartRequest is instantiated,
    object 'cj' is cleared. How can I avoid that?
    The only stupid solution I figured out is to use a global CookieJar
    object. Is there any way to handle all this INSIDE the class?

    I am totally new to OOP & Python programming, so could anyone give me
    some suggestions? Thanks in advance
    est, Feb 22, 2008
    #1

  2. Rob Wolfe

    Rob Wolfe Guest

    est <> writes:

    > Hi all,
    >
    > I need urllib2 do perform series of HTTP requests with cookie from
    > PREVIOUS request(like our browsers usually do ). Many people suggest I
    > use some library(e.g. pycURL) instead but I guess it's good practise
    > for a python beginner to DIY something rather than use existing tools.
    >
    > So my problem is how to expand the urllib2 class
    >
    > from cookielib import CookieJar
    > class SmartRequest():
    >     cj=CookieJar()
    >     def __init__(self, strUrl, strContent=None):
    >         self.Request    =   urllib2.Request(strUrl, strContent)
    >         self.cj.add_cookie_header(self.Request)
    >         self.Response   =   urllib2.urlopen(Request)
    >         self.cj.extract_cookies(self.Response, self.Request)
    >     def url
    >     def read(self, intCount):
    >         return self.Response.read(intCount)
    >     def headers(self, strHeaderName):
    >         return self.Response.headers[strHeaderName]
    >
    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared. How to avoid that?
    > The only stupid solution I figured out is use a global CookieJar
    > object. Is there anyway that could handle all this INSIDE the class?
    >
    > I am totally new to OOP & python programming, so could anyone give me
    > some suggestions? Thanks in advance


    Google for urllib2.HTTPCookieProcessor.
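
For readers following that pointer, here is a minimal sketch of how the handler is typically wired up. The helper name make_cookie_opener is made up for illustration, and the import fallback is an assumption about the reader's setup: it lets the same code run on Python 3, where urllib2 and cookielib were renamed to urllib.request and http.cookiejar.

```python
try:
    import urllib.request as urllib2        # Python 3 name for urllib2
    from http.cookiejar import CookieJar    # Python 3 name for cookielib
except ImportError:
    import urllib2                          # Python 2, as used in this thread
    from cookielib import CookieJar


def make_cookie_opener():
    """Return (opener, jar); make_cookie_opener is a hypothetical helper."""
    jar = CookieJar()
    # HTTPCookieProcessor stores cookies from each response in the jar and
    # adds matching ones to later requests made through the same opener.
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
```

Every opener.open(url) call made through the same opener then shares the jar, so a cookie set by one response should be sent along with the next matching request.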

    HTH,
    Rob
    Rob Wolfe, Feb 22, 2008
    #2

  3. On Thu, 21 Feb 2008 22:50:49 -0800 (PST), est <>
    declaimed the following in comp.lang.python:

    <snip>
    >
    > from cookielib import CookieJar
    > class SmartRequest():
    >     cj=CookieJar()
    >     def __init__(self, strUrl, strContent=None):
    >         self.Request    =   urllib2.Request(strUrl, strContent)
    >         self.cj.add_cookie_header(self.Request)
    >         self.Response   =   urllib2.urlopen(Request)
    >         self.cj.extract_cookies(self.Response, self.Request)
    >     def url
    >     def read(self, intCount):
    >         return self.Response.read(intCount)
    >     def headers(self, strHeaderName):
    >         return self.Response.headers[strHeaderName]
    >
    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared. How to avoid that?


    Well... maybe by not creating new SmartRequest instances, but reusing
    one instance for the whole transaction.

    UNTESTED -- this is a mental exercise only:

    import urllib2
    from cookielib import CookieJar

    class SmartTransaction(object):    # new-style class
        def __init__(self):
            # one jar per transaction, shared by every request made through it
            self.cj = CookieJar()

        def doRequest(self, URL, Content=None):
            # Python names are untyped; objects have types,
            # so it is rare to see <type>Name forms
            self.request = urllib2.Request(URL, Content)
            self.cj.add_cookie_header(self.request)
            self.response = urllib2.urlopen(self.request)
            self.cj.extract_cookies(self.response, self.request)

    myTransaction = SmartTransaction()
    myTransaction.doRequest(aURL)
    myTransaction.doRequest(aFollowUpURL, someContent)
    ....
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
    Dennis Lee Bieber, Feb 22, 2008
    #3
  4. 7stud

    7stud Guest

    On Feb 21, 11:50 pm, est <> wrote:
    > Hi all,
    >
    > I need urllib2 do perform series of HTTP requests with cookie from
    > PREVIOUS request(like our browsers usually do ). Many people suggest I
    > use some library(e.g. pycURL) instead but I guess it's good practise
    > for a python beginner to DIY something rather than use existing tools.
    >
    > So my problem is how to expand the urllib2 class
    >
    > from cookielib import CookieJar
    > class SmartRequest():
    >     cj=CookieJar()
    >     def __init__(self, strUrl, strContent=None):
    >         self.Request    =   urllib2.Request(strUrl, strContent)
    >         self.cj.add_cookie_header(self.Request)
    >         self.Response   =   urllib2.urlopen(Request)
    >         self.cj.extract_cookies(self.Response, self.Request)
    >     def url
    >     def read(self, intCount):
    >         return self.Response.read(intCount)
    >     def headers(self, strHeaderName):
    >         return self.Response.headers[strHeaderName]
    >
    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared.


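    Not quite. This line sits in the class body, so it executes only
    once, when the class statement runs, and every SmartRequest shares
    the resulting jar:

    cj=CookieJar()

    What it does create is a new, *empty* cookie jar each time the program
    starts, i.e. it has no knowledge of any cookies set in a previous run.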
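
When a class-body assignment actually runs can be checked directly. A small sketch follows; the class names SharedJar and OwnJar are made up for illustration, and the import fallback covers the Python 3 rename of cookielib to http.cookiejar.

```python
try:
    from http.cookiejar import CookieJar   # Python 3 name for cookielib
except ImportError:
    from cookielib import CookieJar        # Python 2, as used in this thread


class SharedJar(object):
    # Class-body assignment: evaluated once, when the class statement runs.
    cj = CookieJar()


class OwnJar(object):
    def __init__(self):
        # Instance assignment: a fresh jar for every instance.
        self.cj = CookieJar()


a, b = SharedJar(), SharedJar()
c, d = OwnJar(), OwnJar()
print(a.cj is b.cj)   # True: both instances see the single class-level jar
print(c.cj is d.cj)   # False: each instance built its own jar
```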

    > How to avoid that?


    If you read the docs on the cookielib module, and in particular
    CookieJar objects, you will notice that CookieJar objects are
    described in a section that is titled: CookieJar and FileCookieJar
    Objects.

    Hmm...I wonder what the difference is between a CookieJar object and a
    FileCookieJar Object?

    ----------
    FileCookieJar implements the following additional methods:

    save(filename=None, ignore_discard=False, ignore_expires=False)
    Save cookies to a file.

    load(filename=None, ignore_discard=False, ignore_expires=False)
    Load cookies from a file.
    --------

    That seems promising.
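
As far as I know, the base FileCookieJar leaves save() unimplemented, so in practice one of its subclasses such as LWPCookieJar or MozillaCookieJar is used. Below is a minimal sketch of a save/load round trip that needs no network access. The helpers make_cookie and save_and_reload, the cookie name and value, and the temporary file are all assumptions for illustration.

```python
import os
import tempfile

try:
    from http.cookiejar import Cookie, LWPCookieJar   # Python 3
except ImportError:
    from cookielib import Cookie, LWPCookieJar        # Python 2


def make_cookie(name, value, domain):
    """Build a session cookie by hand (hypothetical helper for the demo)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_and_reload(cookie, filename):
    """Save one cookie to filename, then read it back into a fresh jar."""
    jar = LWPCookieJar(filename)
    jar.set_cookie(cookie)
    # Session cookies are marked "discard"; ignore_discard=True keeps them.
    jar.save(ignore_discard=True)

    reloaded = LWPCookieJar(filename)
    reloaded.load(ignore_discard=True)
    return reloaded


handle, cookie_file = tempfile.mkstemp(suffix=".txt")
os.close(handle)
try:
    jar = save_and_reload(make_cookie("session", "abc123", "example.com"),
                          cookie_file)
    names = [c.name for c in jar]
finally:
    os.remove(cookie_file)
```

In a real program the jar would be loaded at startup, passed to urllib2.HTTPCookieProcessor, and saved again before exit, so cookies survive between runs.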
    7stud, Feb 22, 2008
    #4
  5. 7stud

    7stud Guest

    On Feb 21, 11:50 pm, est <> wrote:
    > Hi all,
    >
    > I need urllib2 do perform series of HTTP requests with cookie from
    > PREVIOUS request(like our browsers usually do ).
    >


    Cookies from a previous request made in the currently running
    program? Or cookies from requests that were made when you previously
    ran the program?

    >
    > from cookielib import CookieJar
    > class SmartRequest():
    >     cj=CookieJar()
    >     def __init__(self, strUrl, strContent=None):
    >         self.Request    =   urllib2.Request(strUrl, strContent)
    >         self.cj.add_cookie_header(self.Request)
    >         self.Response   =   urllib2.urlopen(Request)
    >         self.cj.extract_cookies(self.Response, self.Request)
    >     def url
    >     def read(self, intCount):
    >         return self.Response.read(intCount)
    >     def headers(self, strHeaderName):
    >         return self.Response.headers[strHeaderName]
    >
    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared. How to avoid that?
    > The only stupid solution I figured out is use a global CookieJar
    > object. Is there anyway that could handle all this INSIDE the class?
    >


    Examine this code and its output:

    class SmartRequest(object):
        def __init__(self, id):
            # create the shared class attribute only once, on first use
            if not getattr(SmartRequest, 'cj', None):
                SmartRequest.cj = "I'm a cookie jar. Created by request: %s" % id


    r1 = SmartRequest(1)
    r2 = SmartRequest(2)

    print r1.cj
    print r2.cj

    --output:--
    I'm a cookie jar. Created by request: 1
    I'm a cookie jar. Created by request: 1
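
One caveat when carrying this idiom over to a real jar: a CookieJar defines a length, so an empty jar tests as false and a plain "if not ..." check would replace it on every call until it holds a cookie. A sketch of the same pattern with an explicit None comparison follows; it reuses the SmartRequest name from the thread, and the import fallback is only there so it also runs on Python 3.

```python
try:
    from http.cookiejar import CookieJar   # Python 3 name for cookielib
except ImportError:
    from cookielib import CookieJar        # Python 2, as used in this thread


class SmartRequest(object):
    def __init__(self):
        # Compare against None rather than using "not ...": an empty
        # CookieJar has length 0 and therefore tests as false.
        if getattr(SmartRequest, "cj", None) is None:
            SmartRequest.cj = CookieJar()


r1 = SmartRequest()
r2 = SmartRequest()
print(r1.cj is r2.cj)       # True: both instances see the same jar
print(bool(CookieJar()))    # False: an empty jar is falsy
```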
    7stud, Feb 22, 2008
    #5
  6. 7stud

    7stud Guest

    On Feb 21, 11:50 pm, est <> wrote:
    >
    > class SmartRequest():
    >


    You should always define a class like this:

    class SmartRequest(object):


    unless you know of a specific reason not to.
    7stud, Feb 23, 2008
    #6
  7. Steve Holden

    Steve Holden Guest

    7stud wrote:
    > On Feb 21, 11:50 pm, est <> wrote:
    >> class SmartRequest():
    >>

    >
    > You should always define a class like this:
    >
    > class SmartRequest(object):
    >
    >
    > unless you know of a specific reason not to.
    >
    >

    It's much easier, though, just to put

    __metaclass__ = type

    at the start of any module where you want exclusively new-style objects.
    And I do agree that you should use new-style objects exclusively unless
    you have a good reason not to, though thanks to Guido's hard work it
    mostly doesn't matter.

    $ cat test94.py
    __metaclass__ = type

    class Rhubarb:
        pass

    rhubarb = Rhubarb()

    print type(Rhubarb)
    print type(rhubarb)


    $ python test94.py
    <type 'type'>
    <class '__main__.Rhubarb'>

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
    Steve Holden, Feb 23, 2008
    #7
  8. est

    est Guest

    On Feb 23, 5:57 am, 7stud <> wrote:
    > On Feb 21, 11:50 pm, est <> wrote:
    >
    > > Hi all,

    >
    > > I need urllib2 do perform series of HTTP requests with cookie from
    > > PREVIOUS request(like our browsers usually do ).

    >
    > Cookies from a previous request made in the currently running
    > program? Or cookies from requests that were made when you previously
    > ran the program?
    >
    >
    >
    >
    >
    >
    >
    > > from cookielib import CookieJar
    > > class SmartRequest():
    > >     cj=CookieJar()
    > >     def __init__(self, strUrl, strContent=None):
    > >         self.Request    =   urllib2.Request(strUrl, strContent)
    > >         self.cj.add_cookie_header(self.Request)
    > >         self.Response   =   urllib2.urlopen(Request)
    > >         self.cj.extract_cookies(self.Response, self.Request)
    > >     def url
    > >     def read(self, intCount):
    > >         return self.Response.read(intCount)
    > >     def headers(self, strHeaderName):
    > >         return self.Response.headers[strHeaderName]

    >
    > > The code does not work because each time SmartRequest is initiated,
    > > object 'cj' is cleared. How to avoid that?
    > > The only stupid solution I figured out is use a global CookieJar
    > > object. Is there anyway that could handle all this INSIDE the class?

    >
    > Examine this code and its output:
    >
    > class SmartRequest(object):
    > def __init__(self, id):
    > if not getattr(SmartRequest, 'cj', None):
    > SmartRequest.cj = "I'm a cookie jar. Created by request:


    the getattr method is exactly what I am looking for, thanks!


    On Feb 23, 2:05 pm, 7stud <> wrote:
    > On Feb 21, 11:50 pm, est <> wrote:
    >
    >
    >
    > > class SmartRequest():

    >
    > You should always define a class like this:
    >
    > class SmartRequest(object):
    >
    > unless you know of a specific reason not to.


    Thanks for the advice!
    est, Feb 24, 2008
    #8
  9. est

    est Guest

    On Feb 23, 2:42 am, Rob Wolfe <> wrote:
    > est <> writes:
    > > Hi all,

    >
    > > I need urllib2 do perform series of HTTP requests with cookie from
    > > PREVIOUS request(like our browsers usually do ). Many people suggest I
    > > use some library(e.g. pycURL) instead but I guess it's good practise
    > > for a python beginner to DIY something rather than use existing tools.

    >
    > > So my problem is how to expand the urllib2 class

    >
    > > from cookielib import CookieJar
    > > class SmartRequest():
    > >     cj=CookieJar()
    > >     def __init__(self, strUrl, strContent=None):
    > >         self.Request    =   urllib2.Request(strUrl, strContent)
    > >         self.cj.add_cookie_header(self.Request)
    > >         self.Response   =   urllib2.urlopen(Request)
    > >         self.cj.extract_cookies(self.Response, self.Request)
    > >     def url
    > >     def read(self, intCount):
    > >         return self.Response.read(intCount)
    > >     def headers(self, strHeaderName):
    > >         return self.Response.headers[strHeaderName]

    >
    > > The code does not work because each time SmartRequest is initiated,
    > > object 'cj' is cleared. How to avoid that?
    > > The only stupid solution I figured out is use a global CookieJar
    > > object. Is there anyway that could handle all this INSIDE the class?

    >
    > > I am totally new to OOP & python programming, so could anyone give me
    > > some suggestions? Thanks in advance

    >
    > Google for urllib2.HTTPCookieProcessor.
    >
    > HTH,
    > Rob


    Wow, thank you Rob Wolfe! Your reply is the shortest yet the most
    helpful! I solved this problem with the following code.

    import urllib2
    from cookielib import CookieJar

    class HTTPRefererProcessor(urllib2.BaseHandler):
        """Add Referer header to requests.

        This only makes sense if you use each RefererProcessor for a single
        chain of requests only (so, for example, if you use a single
        HTTPRefererProcessor to fetch a series of URLs extracted from a single
        page, this will break).

        There's a proper implementation of this in module mechanize.

        """
        def __init__(self):
            self.referer = None

        def http_request(self, request):
            if ((self.referer is not None) and
                not request.has_header("Referer")):
                request.add_unredirected_header("Referer", self.referer)
            return request

        def http_response(self, request, response):
            self.referer = response.geturl()
            return response

        https_request = http_request
        https_response = http_response

    def main():
        # one CookieJar shared by every request made through the opener
        cj = CookieJar()
        opener = urllib2.build_opener(
            urllib2.HTTPCookieProcessor(cj),
            HTTPRefererProcessor(),
        )
        urllib2.install_opener(opener)

        # url1 and url2 stand for the pages being fetched
        urllib2.urlopen(url1)
        urllib2.urlopen(url2)

    if "__main__" == __name__:
        main()

    And it's working great!

    Once again, thanks everyone!
    est, Feb 24, 2008
    #9
  10. 7stud

    7stud Guest

    On Feb 24, 4:41 am, est <> wrote:
    > On Feb 23, 2:42 am, Rob Wolfe <> wrote:
    >
    >
    >
    > > est <> writes:
    > > > Hi all,

    >
    > > > I need urllib2 do perform series of HTTP requests with cookie from
    > > > PREVIOUS request(like our browsers usually do ). Many people suggest I
    > > > use some library(e.g. pycURL) instead but I guess it's good practise
    > > > for a python beginner to DIY something rather than use existing tools.

    >
    > > > So my problem is how to expand the urllib2 class

    >
    > > > from cookielib import CookieJar
    > > > class SmartRequest():
    > > >     cj=CookieJar()
    > > >     def __init__(self, strUrl, strContent=None):
    > > >         self.Request    =   urllib2.Request(strUrl, strContent)
    > > >         self.cj.add_cookie_header(self.Request)
    > > >         self.Response   =   urllib2.urlopen(Request)
    > > >         self.cj.extract_cookies(self.Response, self.Request)
    > > >     def url
    > > >     def read(self, intCount):
    > > >         return self.Response.read(intCount)
    > > >     def headers(self, strHeaderName):
    > > >         return self.Response.headers[strHeaderName]

    >
    > > > The code does not work because each time SmartRequest is initiated,
    > > > object 'cj' is cleared. How to avoid that?
    > > > The only stupid solution I figured out is use a global CookieJar
    > > > object. Is there anyway that could handle all this INSIDE the class?

    >
    > > > I am totally new to OOP & python programming, so could anyone give me
    > > > some suggestions? Thanks in advance

    >
    > > Google for urllib2.HTTPCookieProcessor.

    >
    > > HTH,
    > > Rob

    >
    > Wow, thank you Rob Wolfe! Your reply is shortest yet most helpful! I
    > solved this problem by the following code.
    >
    > class HTTPRefererProcessor(urllib2.BaseHandler):
    >     """Add Referer header to requests.
    >
    >     This only makes sense if you use each RefererProcessor for a
    > single
    >     chain of requests only (so, for example, if you use a single
    >     HTTPRefererProcessor to fetch a series of URLs extracted from a
    > single
    >     page, this will break).
    >
    >     There's a proper implementation of this in module mechanize.
    >
    >     """
    >     def __init__(self):
    >         self.referer = None
    >
    >     def http_request(self, request):
    >         if ((self.referer is not None) and
    >             not request.has_header("Referer")):
    >             request.add_unredirected_header("Referer", self.referer)
    >         return request
    >
    >     def http_response(self, request, response):
    >         self.referer = response.geturl()
    >         return response
    >
    >     https_request = http_request
    >     https_response = http_response
    >
    > def main():
    >     cj = CookieJar()
    >     opener = urllib2.build_opener(
    >         urllib2.HTTPCookieProcessor(cj),
    >         HTTPRefererProcessor(),
    >     )
    >     urllib2.install_opener(opener)
    >
    >     urllib2.urlopen(url1)
    >     urllib2.urlopen(url2)
    >
    > if "__main__" == __name__:
    >     main()
    >
    > And it's working great!
    >
    > Once again, thanks everyone!


    How does the class HTTPRefererProcessor do anything useful for you?
    7stud, Feb 24, 2008
    #10
  11. est

    est Guest

    On Feb 25, 5:46 am, 7stud <> wrote:
    > On Feb 24, 4:41 am, est <> wrote:
    >
    >
    >
    >
    >
    > > On Feb 23, 2:42 am, Rob Wolfe <> wrote:

    >
    > > > est <> writes:
    > > > > Hi all,

    >
    > > > > I need urllib2 do perform series of HTTP requests with cookie from
    > > > > PREVIOUS request(like our browsers usually do ). Many people suggest I
    > > > > use some library(e.g. pycURL) instead but I guess it's good practise
    > > > > for a python beginner to DIY something rather than use existing tools.

    >
    > > > > So my problem is how to expand the urllib2 class

    >
    > > > > from cookielib import CookieJar
    > > > > class SmartRequest():
    > > > >     cj=CookieJar()
    > > > >     def __init__(self, strUrl, strContent=None):
    > > > >         self.Request    =   urllib2.Request(strUrl, strContent)
    > > > >         self.cj.add_cookie_header(self.Request)
    > > > >         self.Response   =   urllib2.urlopen(Request)
    > > > >         self.cj.extract_cookies(self.Response, self.Request)
    > > > >     def url
    > > > >     def read(self, intCount):
    > > > >         return self.Response.read(intCount)
    > > > >     def headers(self, strHeaderName):
    > > > >         return self.Response.headers[strHeaderName]

    >
    > > > > The code does not work because each time SmartRequest is initiated,
    > > > > object 'cj' is cleared. How to avoid that?
    > > > > The only stupid solution I figured out is use a global CookieJar
    > > > > object. Is there anyway that could handle all this INSIDE the class?

    >
    > > > > I am totally new to OOP & python programming, so could anyone give me
    > > > > some suggestions? Thanks in advance

    >
    > > > Google for urllib2.HTTPCookieProcessor.

    >
    > > > HTH,
    > > > Rob

    >
    > > Wow, thank you Rob Wolfe! Your reply is shortest yet most helpful! I
    > > solved this problem by the following code.

    >
    > > class HTTPRefererProcessor(urllib2.BaseHandler):
    > >     """Add Referer header to requests.

    >
    > >     This only makes sense if you use each RefererProcessor for a
    > > single
    > >     chain of requests only (so, for example, if you use a single
    > >     HTTPRefererProcessor to fetch a series of URLs extracted from a
    > > single
    > >     page, this will break).

    >
    > >     There's a proper implementation of this in module mechanize.

    >
    > >     """
    > >     def __init__(self):
    > >         self.referer = None

    >
    > >     def http_request(self, request):
    > >         if ((self.referer is not None) and
    > >             not request.has_header("Referer")):
    > >             request.add_unredirected_header("Referer", self.referer)
    > >         return request

    >
    > >     def http_response(self, request, response):
    > >         self.referer = response.geturl()
    > >         return response

    >
    > >     https_request = http_request
    > >     https_response = http_response

    >
    > > def main():
    > >     cj = CookieJar()
    > >     opener = urllib2.build_opener(
    > >         urllib2.HTTPCookieProcessor(cj),
    > >         HTTPRefererProcessor(),
    > >     )
    > >     urllib2.install_opener(opener)

    >
    > >     urllib2.urlopen(url1)
    > >     urllib2.urlopen(url2)

    >
    > > if "__main__" == __name__:
    > >     main()

    >
    > > And it's working great!

    >
    > > Once again, thanks everyone!

    >
    > How does the class HTTPReferrerProcessor do anything useful for you?


    Well, it's more browser-like. Maybe I should have snipped the
    HTTPRefererProcessor code for this discussion.
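
To see concretely what the handler adds, its two hooks can be exercised by hand without any network traffic. This is only a sketch: FakeResponse is a hypothetical stand-in for a real response object, the example.com URLs are placeholders, and the handler is a trimmed copy of the one posted earlier in the thread. The import fallback lets it run on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib.request as urllib2   # Python 3 name for urllib2
except ImportError:
    import urllib2                     # Python 2, as used in this thread


class HTTPRefererProcessor(urllib2.BaseHandler):
    """Trimmed copy of the handler posted earlier in the thread."""
    def __init__(self):
        self.referer = None

    def http_request(self, request):
        if self.referer is not None and not request.has_header("Referer"):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        # Remember the URL just fetched; it becomes the next Referer.
        self.referer = response.geturl()
        return response


class FakeResponse(object):
    """Hypothetical stand-in for a real response; only geturl() is needed."""
    def __init__(self, url):
        self._url = url

    def geturl(self):
        return self._url


proc = HTTPRefererProcessor()

first = proc.http_request(urllib2.Request("http://example.com/login"))
proc.http_response(first, FakeResponse("http://example.com/login"))

second = proc.http_request(urllib2.Request("http://example.com/inbox"))
print(first.has_header("Referer"))     # False: nothing was fetched before it
print(second.get_header("Referer"))    # http://example.com/login
```

The cookie handling itself comes entirely from HTTPCookieProcessor; this handler only makes each request name the previously fetched page, which some sites check.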
    est, Feb 25, 2008
    #11