n00b with urllib2: How to make it handle cookie automatically?

est · Feb 22, 2008

Hi all,

I need urllib2 to perform a series of HTTP requests, each one sending the
cookies from the PREVIOUS request (like our browsers usually do). Many
people suggest I use a library (e.g. pycURL) instead, but I guess it's good
practice for a Python beginner to DIY something rather than use existing
tools.

So my problem is how to extend urllib2:

import urllib2
from cookielib import CookieJar

class SmartRequest():
    cj = CookieJar()

    def __init__(self, strUrl, strContent=None):
        self.Request = urllib2.Request(strUrl, strContent)
        self.cj.add_cookie_header(self.Request)
        self.Response = urllib2.urlopen(self.Request)
        self.cj.extract_cookies(self.Response, self.Request)

    # def url  (unfinished stub)

    def read(self, intCount):
        return self.Response.read(intCount)

    def headers(self, strHeaderName):
        return self.Response.headers[strHeaderName]

The code does not work because each time SmartRequest is instantiated,
the object 'cj' is cleared. How can I avoid that?
The only (stupid) solution I have figured out is to use a global CookieJar
object. Is there any way to handle all this INSIDE the class?

I am totally new to OOP and Python programming, so could anyone give me
some suggestions? Thanks in advance.

Rob Wolfe · Feb 22, 2008

Google for urllib2.HTTPCookieProcessor.
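
For readers who want to see what that pointer leads to, here is a minimal sketch of an opener that carries cookies from one request to the next. It assumes Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. FakeHTTPHandler, the example.com URLs and the cookie value are hypothetical: the handler answers requests locally so the sketch runs without a network, and in real use you would leave it out.

```python
import io
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import BaseHandler, HTTPCookieProcessor, build_opener
from urllib.response import addinfourl


class FakeHTTPHandler(BaseHandler):
    """Hypothetical stand-in for the network (not part of urllib).

    Answers every http:// request locally, always sets one cookie, and
    records the Cookie header that the client sent with each request.
    """

    handler_order = 100  # run before the real HTTPHandler (order 500)

    def __init__(self):
        self.seen_cookie_headers = []

    def http_open(self, req):
        self.seen_cookie_headers.append(req.get_header("Cookie"))
        headers = Message()
        headers["Set-Cookie"] = "sid=abc123; Path=/"
        response = addinfourl(io.BytesIO(b"ok"), headers, req.get_full_url(), 200)
        response.msg = "OK"  # HTTPErrorProcessor reads this attribute
        return response


cj = CookieJar()
fake = FakeHTTPHandler()
# HTTPCookieProcessor adds stored cookies to each request and
# stores the cookies found in each response.
opener = build_opener(HTTPCookieProcessor(cj), fake)

opener.open("http://example.com/first")
opener.open("http://example.com/second")

print(fake.seen_cookie_headers)
```

The first request goes out without a Cookie header and the second one carries the cookie set by the first response, with no cookie code in the caller.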

HTH,
Rob

Dennis Lee Bieber · Feb 22, 2008

est said:
The code does not work because each time SmartRequest is initiated,
object 'cj' is cleared. How to avoid that?

Well... maybe by not creating new SmartRequest instances, but reusing
one instance for the whole transaction.

UNTESTED -- this is a mental exercise only:

import urllib2
from cookielib import CookieJar

class SmartTransaction(object):  # new-style class
    def __init__(self):
        self.cj = CookieJar()

    # Python names are untyped; objects have types,
    # so it is rare to see <type>Name forms.
    def doRequest(self, URL, Content=None):
        self.request = urllib2.Request(URL, Content)
        self.cj.add_cookie_header(self.request)
        self.response = urllib2.urlopen(self.request)
        self.cj.extract_cookies(self.response, self.request)

myTransaction = SmartTransaction()
myTransaction.doRequest(aURL)
myTransaction.doRequest(aFollowUpURL, someContent)
....
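
The same idea can be checked without touching the network. The following is only a sketch, assuming Python 3 names (http.cookiejar and urllib.request in place of cookielib and urllib2); FakeResponse, the prepare/record method names, the URLs and the cookie value are all made up for illustration. It relies on the fact that CookieJar.extract_cookies only asks the response for its headers via info().

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Hypothetical stand-in for a server reply; the jar only calls info()."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


class SmartTransaction:
    def __init__(self):
        self.cj = CookieJar()  # one jar per transaction, kept on the instance

    def prepare(self, url):
        # Build a request and attach whatever cookies the jar holds for it.
        request = Request(url)
        self.cj.add_cookie_header(request)
        return request

    def record(self, response, request):
        # Store the cookies that the response sets.
        self.cj.extract_cookies(response, request)


tx = SmartTransaction()
first = tx.prepare("http://example.com/login")
tx.record(FakeResponse("sid=abc123; Path=/"), first)
second = tx.prepare("http://example.com/account")

first_cookie = first.get_header("Cookie")
second_cookie = second.get_header("Cookie")
print(first_cookie, second_cookie)
```

Because the jar lives on the instance, every request made through the same transaction object sees the cookies collected so far.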

7stud · Feb 22, 2008

est said:
The code does not work because each time SmartRequest is initiated,
object 'cj' is cleared.

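That would be the case if this line executed every time you create a
SmartRequest:

    cj=CookieJar()

As written, though, it sits in the class body, so it runs only once, when
the class is defined, and every instance shares that one jar for as long
as the program runs. A new, *empty* cookie jar, i.e. one with no knowledge
of any previously set cookies, is created only when the class body runs
again, for example the next time you start the program.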

How to avoid that?

If you read the docs on the cookielib module, and in particular
CookieJar objects, you will notice that CookieJar objects are
described in a section that is titled: CookieJar and FileCookieJar
Objects.

Hmm...I wonder what the difference is between a CookieJar object and a
FileCookieJar Object?

----------
FileCookieJar implements the following additional methods:

save(filename=None, ignore_discard=False, ignore_expires=False)
Save cookies to a file.

load(filename=None, ignore_discard=False, ignore_expires=False)
Load cookies from a file.
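
As a rough illustration of those two methods, here is a sketch that saves a jar to disk and loads it back into a fresh jar, the way a second run of a program might. It assumes Python 3 (http.cookiejar instead of cookielib) and uses LWPCookieJar, one of the FileCookieJar subclasses in the standard library. The helper make_session_cookie, the temporary file path and the cookie itself are invented for the example.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Hypothetical helper: build a session cookie by hand, for the demo only."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# "First run": collect a cookie and write the jar to disk.
first_run = LWPCookieJar(path)
first_run.set_cookie(make_session_cookie("sid", "abc123", "example.com"))
# Session cookies are marked discard, so they are skipped unless asked for.
first_run.save(ignore_discard=True)

# "Second run": a brand-new jar picks the cookie up from the file.
second_run = LWPCookieJar(path)
second_run.load(ignore_discard=True)

restored = {cookie.name: cookie.value for cookie in second_run}
print(restored)
```

Note the ignore_discard=True on both calls: without it, cookies that have no expiry date are neither written nor read back.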

7stud · Feb 22, 2008

est said:
I need urllib2 do perform series of HTTP requests with cookie from
PREVIOUS request(like our browsers usually do ).

Cookies from a previous request made in the currently running
program? Or cookies from requests that were made when you previously
ran the program?

The code does not work because each time SmartRequest is initiated,
object 'cj' is cleared. How to avoid that?
The only stupid solution I figured out is use a global CookieJar
object. Is there anyway that could handle all this INSIDE the class?

Examine this code and its output:

class SmartRequest(object):
    def __init__(self, id):
        if not getattr(SmartRequest, 'cj', None):
            SmartRequest.cj = "I'm a cookie jar. Created by request: %s" % id

r1 = SmartRequest(1)
r2 = SmartRequest(2)

print r1.cj
print r2.cj

--output:--
I'm a cookie jar. Created by request: 1
I'm a cookie jar. Created by request: 1
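
The same pattern with a real cookie jar might look like the sketch below, written for Python 3 (http.cookiejar instead of cookielib); the request_id attribute is only there for illustration. One detail to watch: CookieJar defines __len__, so an empty jar is falsy, and a test like "if not getattr(SmartRequest, 'cj', None)" would replace the jar for as long as it holds no cookies. The sketch therefore compares against None.

```python
from http.cookiejar import CookieJar


class SmartRequest(object):
    cj = None  # class attribute: one slot shared by every instance

    def __init__(self, request_id):
        # Create the shared jar only once. "is None" is used on purpose:
        # an empty CookieJar has length 0 and would fail a truthiness test.
        if SmartRequest.cj is None:
            SmartRequest.cj = CookieJar()
        self.request_id = request_id


r1 = SmartRequest(1)
jar_after_first = SmartRequest.cj
r2 = SmartRequest(2)

same_jar = r1.cj is r2.cj and r2.cj is jar_after_first
empty_jar_is_falsy = not CookieJar()
print(same_jar, empty_jar_is_falsy)
```

Both instances reach the jar through the class, so nothing global is needed outside SmartRequest.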

7stud · Feb 23, 2008

class SmartRequest():

You should always define a class like this:

class SmartRequest(object):

unless you know of a specific reason not to.

Steve Holden · Feb 23, 2008

7stud said:
You should always define a class like this:

class SmartRequest(object):

unless you know of a specific reason not to.

It's much easier, though, just to put

__metaclass__ = type

at the start of any module where you want exclusively new-style objects.
And I do agree that you should use new-style objects exclusively unless
you have a good reason not to, though thanks to Guido's hard work it
mostly doesn't matter.

$ cat test94.py
__metaclass__ = type

class Rhubarb:
    pass

rhubarb = Rhubarb()

print type(Rhubarb)
print type(rhubarb)

$ python test94.py
<type 'type'>
<class '__main__.Rhubarb'>
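
For comparison, here is a small sketch of the same check on Python 3, where every class is new-style and neither the explicit object base nor the __metaclass__ line is needed. The variable names are only for illustration.

```python
class Rhubarb:  # no explicit base class and no __metaclass__ line
    pass


rhubarb = Rhubarb()

class_type = type(Rhubarb)      # the class of the class
instance_type = type(rhubarb)   # the class of the instance

is_new_style = (
    class_type is type
    and instance_type is Rhubarb
    and issubclass(Rhubarb, object)
)
print(is_new_style)
```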

regards
Steve

est · Feb 24, 2008

7stud said:
Examine this code and its output:

class SmartRequest(object):
    def __init__(self, id):
        if not getattr(SmartRequest, 'cj', None):
            SmartRequest.cj = "I'm a cookie jar. Created by request: %s" % id

The getattr approach is exactly what I was looking for, thanks!

7stud said:
You should always define a class like this:

class SmartRequest(object):

unless you know of a specific reason not to.

Thanks for the advice!

est · Feb 24, 2008

Rob Wolfe said:
Google for urllib2.HTTPCookieProcessor.

Wow, thank you Rob Wolfe! Your reply was the shortest yet the most helpful!
I solved the problem with the following code.

import urllib2
from cookielib import CookieJar

class HTTPRefererProcessor(urllib2.BaseHandler):
    """Add Referer header to requests.

    This only makes sense if you use each RefererProcessor for a single
    chain of requests only (so, for example, if you use a single
    HTTPRefererProcessor to fetch a series of URLs extracted from a single
    page, this will break).

    There's a proper implementation of this in module mechanize.

    """
    def __init__(self):
        self.referer = None

    def http_request(self, request):
        if ((self.referer is not None) and
            not request.has_header("Referer")):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response

def main():
    cj = CookieJar()
    opener = urllib2.build_opener(
        urllib2.HTTPCookieProcessor(cj),
        HTTPRefererProcessor(),
        )
    urllib2.install_opener(opener)

    # url1 and url2 are placeholders for the pages to fetch
    urllib2.urlopen(url1)
    urllib2.urlopen(url2)

if "__main__" == __name__:
    main()

And it's working great!

Once again, thanks everyone!

7stud · Feb 24, 2008

est said:
I solved this problem by the following code.

How does the class HTTPRefererProcessor do anything useful for you?

est · Feb 25, 2008

7stud said:
How does the class HTTPRefererProcessor do anything useful for you?

Well, it's more browser-like. Maybe I should have snipped the
HTTPRefererProcessor code for this discussion.
