n00b with urllib2: How to make it handle cookie automatically?

Discussion in 'Python' started by est, Feb 22, 2008.

  1. est

    est Guest

    Hi all,

    I need urllib2 to perform a series of HTTP requests that send the cookies
    from the PREVIOUS request (like our browsers usually do). Many people
    suggest I use a library (e.g. pycURL) instead, but I think it's good
    practice for a Python beginner to DIY something rather than use existing
    tools.

    So my problem is how to extend urllib2 in a class of my own:

    import urllib2
    from cookielib import CookieJar

    class SmartRequest():
        cj = CookieJar()

        def __init__(self, strUrl, strContent=None):
            self.Request = urllib2.Request(strUrl, strContent)
            self.cj.add_cookie_header(self.Request)  # send cookies already in the jar
            self.Response = urllib2.urlopen(self.Request)
            self.cj.extract_cookies(self.Response, self.Request)  # store new cookies

        def url(self):  # unfinished in the original post; assumed to return the final URL
            return self.Response.geturl()

        def read(self, intCount):
            return self.Response.read(intCount)

        def headers(self, strHeaderName):
            return self.Response.headers[strHeaderName]

    The code does not work because each time a SmartRequest is instantiated,
    the object 'cj' is cleared. How can I avoid that?
    The only stupid solution I figured out is to use a global CookieJar
    object. Is there any way to handle all this INSIDE the class?

    I am totally new to OOP and Python programming, so could anyone give me
    some suggestions? Thanks in advance.
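The two CookieJar calls in the code above are the whole mechanism: extract_cookies() copies the Set-Cookie headers of a response into the jar, and add_cookie_header() writes the matching cookies into the next request's Cookie header. The sketch below replays that round trip offline in Python 3, where cookielib is named http.cookiejar and urllib2.Request is urllib.request.Request. FakeResponse and the www.example.com URLs are hypothetical stand-ins for a real server reply; the point is only that the same jar object must be used for every request.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a urlopen() result: only info() is needed."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


# One jar that outlives the individual requests.
jar = http.cookiejar.CookieJar()

first = urllib.request.Request("http://www.example.com/login")
jar.add_cookie_header(first)  # jar is still empty, so no Cookie header is added
jar.extract_cookies(FakeResponse("session=abc123; Path=/"), first)

second = urllib.request.Request("http://www.example.com/account")
jar.add_cookie_header(second)  # replays the cookie stored from the first reply

print(first.get_header("Cookie"))   # None
print(second.get_header("Cookie"))  # session=abc123
```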
     
    est, Feb 22, 2008
    #1

  2. est

    Rob Wolfe Guest


    Google for urllib2.HTTPCookieProcessor.
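Roughly where that pointer leads, as a Python 3 sketch (there, cookielib is http.cookiejar and urllib2's build_opener and HTTPCookieProcessor live in urllib.request). To keep it runnable offline, FakeServerHandler is a hypothetical stand-in for a web server and www.example.com is a placeholder; with a real site you would leave it out and the opener's built-in HTTPHandler would do the fetching.

```python
import email.message
import http.cookiejar
import io
import urllib.request
import urllib.response


class FakeServerHandler(urllib.request.HTTPHandler):
    """Hypothetical stand-in for a web server so the sketch runs offline.

    Subclassing HTTPHandler makes build_opener() drop the real one, so no
    network connection is attempted.
    """

    def __init__(self):
        super().__init__()
        self.seen = []  # Cookie header that arrived with each request

    def http_open(self, req):
        self.seen.append(req.get_header("Cookie"))
        headers = email.message.Message()
        if req.selector == "/login":
            headers["Set-Cookie"] = "session=abc123; Path=/"
        resp = urllib.response.addinfourl(
            io.BytesIO(b""), headers, req.full_url, 200)
        resp.msg = "OK"  # HTTPErrorProcessor reads .msg from HTTP responses
        return resp


jar = http.cookiejar.CookieJar()
server = FakeServerHandler()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # ignore proxy environment variables
    urllib.request.HTTPCookieProcessor(jar),  # adds and stores cookies automatically
    server,
)
opener.open("http://www.example.com/login").read()
opener.open("http://www.example.com/account").read()
print(server.seen)  # [None, 'session=abc123']
```

No add_cookie_header() or extract_cookies() calls appear in user code: the processor makes them around every request the opener sends.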

    HTH,
    Rob
     
    Rob Wolfe, Feb 22, 2008
    #2

  3. On Thu, 21 Feb 2008 22:50:49 -0800 (PST), est
    declaimed the following in comp.lang.python:

    <snip>
    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared. How to avoid that?


    Well... maybe by not creating new SmartRequest instances, but by reusing
    the one instance for the whole transaction.

    UNTESTED -- this is a mental exercise only:

    import urllib2
    from cookielib import CookieJar

    class SmartTransaction(object):  # new style class
        def __init__(self):
            self.cj = CookieJar()  # one jar, kept for the life of the transaction

        # python names are untyped; objects have types,
        # so it is rare to see <type>Name forms
        def doRequest(self, URL, Content=None):
            self.request = urllib2.Request(URL, Content)
            self.cj.add_cookie_header(self.request)
            self.response = urllib2.urlopen(self.request)
            self.cj.extract_cookies(self.response, self.request)

    # aURL, aFollowUpURL and someContent stand for real values
    myTransaction = SmartTransaction()
    myTransaction.doRequest(aURL)
    myTransaction.doRequest(aFollowUpURL, someContent)
    ....
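In Python 3 the same transaction idea can lean on HTTPCookieProcessor: give each transaction object its own opener built around its own jar, and let the processor make the add_cookie_header()/extract_cookies() calls. A minimal sketch, assuming Python 3 (cookielib is http.cookiejar there, and urllib2's opener machinery lives in urllib.request); do_request is simply a renamed doRequest.

```python
import http.cookiejar
import urllib.request


class SmartTransaction:
    """One cookie jar plus one opener per transaction (Python 3 sketch)."""

    def __init__(self):
        self.cj = http.cookiejar.CookieJar()
        # HTTPCookieProcessor calls add_cookie_header() before each request
        # and extract_cookies() after each response, using self.cj.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cj))

    def do_request(self, url, content=None):
        # content, if given, must be bytes; it is sent as the POST body.
        return self.opener.open(url, content)


t1 = SmartTransaction()
t2 = SmartTransaction()
print(t1.cj is t2.cj)  # False: separate transactions never share cookies
```

Nothing is installed globally (no install_opener()), so two transactions can run side by side with separate cookies.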
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff)
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Feb 22, 2008
    #3
  4. est

    7stud Guest

    On Feb 21, 11:50 pm, est wrote:
    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared.


    That's because every time you create a SmartRequest, this line
    executes:

    cj=CookieJar()

    That creates a new, *empty* cookie jar, i.e. it has no knowledge of
    any previously set cookies.

    > How to avoid that?


    If you read the docs on the cookielib module, and in particular
    CookieJar objects, you will notice that CookieJar objects are
    described in a section that is titled: CookieJar and FileCookieJar
    Objects.

    Hmm...I wonder what the difference is between a CookieJar object and a
    FileCookieJar Object?

    ----------
    FileCookieJar implements the following additional methods:

    save(filename=None, ignore_discard=False, ignore_expires=False)
    Save cookies to a file.

    load(filename=None, ignore_discard=False, ignore_expires=False)
    Load cookies from a file.
    --------

    That seems promising.
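One caveat when following that hint in Python 3 (where cookielib is http.cookiejar): FileCookieJar is a base class whose save() raises NotImplementedError, so in practice you use a subclass such as MozillaCookieJar or LWPCookieJar. Also, save() skips session cookies (those marked to be discarded) unless you pass ignore_discard=True. A minimal sketch; make_cookie is a hypothetical helper, since normally HTTPCookieProcessor would fill the jar from real responses.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, domain):
    """Hypothetical helper: build a persistent (non-session) cookie by hand."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False,
        expires=int(time.time()) + 3600,  # one hour from now
        discard=False,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# First "run": collect a cookie and save the jar to disk.
jar = http.cookiejar.MozillaCookieJar(path)
jar.set_cookie(make_cookie("session", "abc123", "www.example.com"))
jar.save()

# Later "run": a fresh jar reads the cookie back from the file.
restored = http.cookiejar.MozillaCookieJar(path)
restored.load()
print([(c.name, c.value, c.domain) for c in restored])
# [('session', 'abc123', 'www.example.com')]
```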
     
    7stud, Feb 22, 2008
    #4
  5. est

    7stud Guest

    On Feb 21, 11:50 pm, est wrote:
    > Hi all,
    >
    > I need urllib2 do perform series of HTTP requests with cookie from
    > PREVIOUS request(like our browsers usually do ).
    >


    Cookies from a previous request made in the currently running
    program? Or cookies from requests that were made when you previously
    ran the program?

    > The code does not work because each time SmartRequest is initiated,
    > object 'cj' is cleared. How to avoid that?
    > The only stupid solution I figured out is use a global CookieJar
    > object. Is there anyway that could handle all this INSIDE the class?
    >


    Examine this code and its output:

    class SmartRequest(object):
        def __init__(self, id):
            # set the class attribute only if it is missing (or falsy)
            if not getattr(SmartRequest, 'cj', None):
                SmartRequest.cj = "I'm a cookie jar. Created by request: %s" % id


    r1 = SmartRequest(1)
    r2 = SmartRequest(2)

    print r1.cj
    print r2.cj

    --output:--
    I'm a cookie jar. Created by request: 1
    I'm a cookie jar. Created by request: 1
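Two details are worth checking here, sketched in Python 3 (cookielib is http.cookiejar there; the class names below are hypothetical). First, an assignment in the class body, like cj=CookieJar() in the original SmartRequest, is evaluated once, when the class statement runs, so all instances already share that one jar; only an assignment inside __init__ creates a fresh jar per instance. Second, the getattr() test above relies on truthiness: it works for the non-empty string in this demo, but an empty CookieJar has length 0 and is falsy, so `if not getattr(...)` would keep replacing a real jar until it held a cookie. Comparing with None avoids that.

```python
import http.cookiejar


class SharedJar:
    # Runs once, when the class statement executes; every instance
    # then finds this same CookieJar through the class.
    cj = http.cookiejar.CookieJar()


class PerInstanceJar:
    def __init__(self):
        # Runs on every instantiation, so each instance gets a new, empty jar.
        self.cj = http.cookiejar.CookieJar()


class LazySharedJar:
    def __init__(self):
        # An empty CookieJar is falsy, so test for None instead of truthiness.
        if getattr(LazySharedJar, "cj", None) is None:
            LazySharedJar.cj = http.cookiejar.CookieJar()


s1, s2 = SharedJar(), SharedJar()
p1, p2 = PerInstanceJar(), PerInstanceJar()
z1, z2 = LazySharedJar(), LazySharedJar()

print(s1.cj is s2.cj)                    # True
print(p1.cj is p2.cj)                    # False
print(bool(http.cookiejar.CookieJar()))  # False
print(z1.cj is z2.cj)                    # True
```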
     
    7stud, Feb 22, 2008
    #5
  6. est

    7stud Guest

    On Feb 21, 11:50 pm, est wrote:
    >
    > class SmartRequest():
    >


    You should always define a class like this:

    class SmartRequest(object):


    unless you know of a specific reason not to.
     
    7stud, Feb 23, 2008
    #6
  7. est

    Steve Holden Guest

    7stud wrote:
    > On Feb 21, 11:50 pm, est wrote:
    >> class SmartRequest():
    >>

    >
    > You should always define a class like this:
    >
    > class SmartRequest(object):
    >
    >
    > unless you know of a specific reason not to.
    >
    >

    It's much easier, though, just to put

    __metaclass__ = type

    at the start of any module where you want exclusively new-style objects.
    And I do agree that you should use exclusively new-style objects unless
    you have a good reason not to, though thanks to Guido's hard work it
    mostly doesn't matter.

    $ cat test94.py
    __metaclass__ = type

    class Rhubarb:
        pass

    rhubarb = Rhubarb()

    print type(Rhubarb)
    print type(rhubarb)


    $ python test94.py
    <type 'type'>
    <class '__main__.Rhubarb'>
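In Python 3 the distinction disappears: every class is new-style whether or not it names object as a base, and a module-level __metaclass__ assignment no longer has any effect. A quick check of Steve's test under Python 3, without the __metaclass__ line:

```python
# Python 3: no base class and no __metaclass__, yet Rhubarb is new-style.
class Rhubarb:
    pass


rhubarb = Rhubarb()

print(type(Rhubarb))             # <class 'type'>
print(Rhubarb.__mro__[-1])       # <class 'object'>
print(type(rhubarb) is Rhubarb)  # True
```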

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Feb 23, 2008
    #7
  8. est

    est Guest

    On Feb 23, 5:57 am, 7stud wrote:
    > On Feb 21, 11:50 pm, est wrote:
    >
    > > Hi all,

    >
    > > I need urllib2 do perform series of HTTP requests with cookie from
    > > PREVIOUS request(like our browsers usually do ).

    >
    > Cookies from a previous request made in the currently running
    > program? Or cookies from requests that were made when you previously
    > ran the program?
    >
    > Examine this code and its output:
    >
    > class SmartRequest(object):
    > def __init__(self, id):
    > if not getattr(SmartRequest, 'cj', None):
    > SmartRequest.cj = "I'm a cookie jar. Created by request:


    The getattr function is exactly what I was looking for, thanks!


    On Feb 23, 2:05 pm, 7stud wrote:
    > On Feb 21, 11:50 pm, est wrote:
    >
    >
    >
    > > class SmartRequest():

    >
    > You should always define a class like this:
    >
    > class SmartRequest(object):
    >
    > unless you know of a specific reason not to.


    Thanks for the advice!
     
    est, Feb 24, 2008
    #8
  9. est

    est Guest

    On Feb 23, 2:42 am, Rob Wolfe wrote:
    > Google for urllib2.HTTPCookieProcessor.
    >
    > HTH,
    > Rob


    Wow, thank you Rob Wolfe! Your reply is the shortest yet most helpful! I
    solved this problem with the following code.

    import urllib2
    from cookielib import CookieJar

    class HTTPRefererProcessor(urllib2.BaseHandler):
        """Add Referer header to requests.

        This only makes sense if you use each RefererProcessor for a single
        chain of requests only (so, for example, if you use a single
        HTTPRefererProcessor to fetch a series of URLs extracted from a single
        page, this will break).

        There's a proper implementation of this in module mechanize.

        """
        def __init__(self):
            self.referer = None

        def http_request(self, request):
            if ((self.referer is not None) and
                not request.has_header("Referer")):
                request.add_unredirected_header("Referer", self.referer)
            return request

        def http_response(self, request, response):
            self.referer = response.geturl()
            return response

        https_request = http_request
        https_response = http_response

    def main():
        cj = CookieJar()
        opener = urllib2.build_opener(
            urllib2.HTTPCookieProcessor(cj),
            HTTPRefererProcessor(),
        )
        urllib2.install_opener(opener)

        # url1 and url2 stand for the pages being fetched
        urllib2.urlopen(url1)
        urllib2.urlopen(url2)

    if "__main__" == __name__:
        main()

    And it's working great!

    Once again, thanks everyone!
     
    est, Feb 24, 2008
    #9
  10. est

    7stud Guest

    On Feb 24, 4:41 am, est wrote:
    > <snip>
    > And it's working great!
    >
    > Once again, thanks everyone!


    How does the class HTTPReferrerProcessor do anything useful for you?
     
    7stud, Feb 24, 2008
    #10
  11. est

    est Guest

    On Feb 25, 5:46 am, 7stud wrote:
    > <snip>
    > How does the class HTTPReferrerProcessor do anything useful for you?


    Well, it's more browser-like. Maybe I should have snipped the
    HTTPReferrerProcessor code for this discussion.
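What the processor does can be seen without a network by calling its two hooks by hand, the way an opener does: before sending a request it calls each handler's <protocol>_request method, and after receiving the reply each handler's <protocol>_response method. That is how the class sees every page's URL and passes it on as the next Referer. Below is a Python 3 port of the class (urllib2.BaseHandler is urllib.request.BaseHandler there); FakeResponse and the example.com URLs are hypothetical stand-ins.

```python
import urllib.request


class HTTPRefererProcessor(urllib.request.BaseHandler):
    """Python 3 port: remember the last URL fetched and send it as Referer."""

    def __init__(self):
        self.referer = None

    def http_request(self, request):
        if self.referer is not None and not request.has_header("Referer"):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response


class FakeResponse:
    """Hypothetical stand-in for a response; only geturl() is used."""

    def __init__(self, url):
        self.url = url

    def geturl(self):
        return self.url


proc = HTTPRefererProcessor()

# What an opener would do around the first fetch...
first = proc.http_request(urllib.request.Request("http://www.example.com/page1"))
proc.http_response(first, FakeResponse("http://www.example.com/page1"))

# ...and around the second one.
second = proc.http_request(urllib.request.Request("http://www.example.com/page2"))

print(first.get_header("Referer"))   # None
print(second.get_header("Referer"))  # http://www.example.com/page1
```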
     
    est, Feb 25, 2008
    #11

Similar Threads
  1. Rafael T. Ugolini

    More than one cookie with urllib2

    Rafael T. Ugolini, Dec 11, 2003, in forum: Python
    Replies:
    7
    Views:
    694
    John J. Lee
    Dec 23, 2003
  2. Eino Mäkitalo

    urllib2 and Set-Cookie with "302 Moved temporarily"

    Eino Mäkitalo, Dec 13, 2004, in forum: Python
    Replies:
    2
    Views:
    931
    Eino Mäkitalo
    Dec 13, 2004
  3. Josef Cihal
    Replies:
    0
    Views:
    1,288
    Josef Cihal
    Sep 5, 2005
  4. Replies:
    0
    Views:
    547
  5. itay_k
    Replies:
    7
    Views:
    1,027
    itay_k
    Apr 22, 2006
  6. Antoni Villalonga

    urllib2: handle an error (302)

    Antoni Villalonga, Sep 11, 2007, in forum: Python
    Replies:
    2
    Views:
    1,153
    Antoni Villalonga
    Sep 11, 2007
  7. Gilles Ganault

    [urllib2 + Tor] How to handle 404?

    Gilles Ganault, Nov 7, 2008, in forum: Python
    Replies:
    2
    Views:
    671
    Steven McKay
    Nov 7, 2008
  8. Karra
    Replies:
    2
    Views:
    924
    Octavian Rasnita
    Dec 29, 2010