n00b with urllib2: How to make it handle cookies automatically?


est

Hi all,

I need urllib2 to perform a series of HTTP requests with cookies from
the PREVIOUS request (like our browsers usually do). Many people suggest I
use some library (e.g. pycURL) instead, but I guess it's good practice
for a Python beginner to DIY something rather than use existing tools.

So my problem is how to extend urllib2:

import urllib2
from cookielib import CookieJar

class SmartRequest():
    cj = CookieJar()
    def __init__(self, strUrl, strContent=None):
        self.Request = urllib2.Request(strUrl, strContent)
        self.cj.add_cookie_header(self.Request)    # send cookies we already hold
        self.Response = urllib2.urlopen(self.Request)
        self.cj.extract_cookies(self.Response, self.Request)  # store new ones
    # def url  (left unfinished in the original post)
    def read(self, intCount):
        return self.Response.read(intCount)
    def headers(self, strHeaderName):
        return self.Response.headers[strHeaderName]

The code does not work because each time SmartRequest is instantiated,
the object 'cj' is cleared. How can I avoid that?
The only stupid solution I figured out is to use a global CookieJar
object. Is there any way to handle all this INSIDE the class?

I am totally new to OOP & Python programming, so could anyone give me
some suggestions? Thanks in advance
 

Rob Wolfe


Google for urllib2.HTTPCookieProcessor.

HTH,
Rob
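For readers on Python 3 (where urllib2's functionality moved into urllib.request and urllib.error, and cookielib was renamed http.cookiejar), here is a minimal sketch of what Rob's pointer amounts to: a single opener built with HTTPCookieProcessor owns one CookieJar, so each request made through that opener sends back whatever cookies earlier responses set. To stay self-contained it talks to a throwaway server on 127.0.0.1; the DemoHandler class, the /login and /whoami paths, and the cookie value are all hypothetical.

```python
import http.server
import threading
import urllib.request
from http.cookiejar import CookieJar


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: /login sets a cookie, other paths echo the Cookie header."""

    def do_GET(self):
        if self.path == "/login":
            body = b"logged in"
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        else:
            body = (self.headers.get("Cookie") or "").encode()
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the per-request log lines


def make_cookie_opener(jar):
    # ProxyHandler({}) ignores any *_proxy environment variables so the
    # local demo server is contacted directly.
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(jar),
    )


server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

jar = CookieJar()
opener = make_cookie_opener(jar)
opener.open(base + "/login").read()                     # response sets the cookie
echoed = opener.open(base + "/whoami").read().decode()  # cookie is sent back

server.shutdown()
server.server_close()
print(echoed)
```

The last line prints session=abc123, the cookie set by /login. In a real program you would build the opener once (or install it with urllib.request.install_opener, the Python 3 counterpart of urllib2.install_opener) and reuse it for the whole session.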
 

Dennis Lee Bieber

est said:

The code does not work because each time SmartRequest is instantiated,
the object 'cj' is cleared. How can I avoid that?

Well... maybe by not creating new SmartRequest instances, but by reusing
the one instance for the whole transaction.

UNTESTED -- this is a mental exercise only:

import urllib2
from cookielib import CookieJar

class SmartTransaction(object): #new style class
    def __init__(self):
        self.cj = CookieJar()
    def doRequest(self, URL, Content=None): #python names are untyped
                                            #objects have types
                                            #so it is rare to see
                                            # <type>Name forms
        self.request = urllib2.Request(URL, Content)
        self.cj.add_cookie_header(self.request)
        self.response = urllib2.urlopen(self.request)
        self.cj.extract_cookies(self.response, self.request)

myTransaction = SmartTransaction()
myTransaction.doRequest(aURL)
myTransaction.doRequest(aFollowUpURL, someContent)
....
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
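Dennis's idea carries over directly to Python 3 (urllib.request and http.cookiejar). The sketch below keeps the jar on the instance, as his does. The `urlopen` constructor argument is a hypothetical hook added only so the demo can run offline against a stub instead of a real server; the StubResponse class, the example.com URLs and the cookie value are illustrative.

```python
import urllib.request
from email.message import Message
from http.cookiejar import CookieJar


class SmartTransaction:
    """Every request made through one instance shares the same CookieJar."""

    def __init__(self, urlopen=urllib.request.urlopen):
        self.cj = CookieJar()
        self._urlopen = urlopen  # hypothetical hook so the demo avoids the network

    def do_request(self, url, data=None):
        self.request = urllib.request.Request(url, data)
        self.cj.add_cookie_header(self.request)               # send cookies we already hold
        self.response = self._urlopen(self.request)
        self.cj.extract_cookies(self.response, self.request)  # store any Set-Cookie headers
        return self.response


# --- offline demo: a stub stands in for urlopen ---
class StubResponse:
    def __init__(self, set_cookie=None):
        self._headers = Message()
        if set_cookie:
            self._headers["Set-Cookie"] = set_cookie

    def info(self):  # CookieJar.extract_cookies reads headers via response.info()
        return self._headers


sent_cookie_headers = []


def stub_urlopen(request):
    sent_cookie_headers.append(request.get_header("Cookie"))
    if len(sent_cookie_headers) == 1:
        return StubResponse("session=abc123; Path=/")  # first reply sets a cookie
    return StubResponse()


t = SmartTransaction(urlopen=stub_urlopen)
t.do_request("http://example.com/login")
t.do_request("http://example.com/next")
print(sent_cookie_headers)  # [None, 'session=abc123']
```

The first request goes out with no Cookie header; the second carries the cookie the first response set, because both went through the same instance and therefore the same jar.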
 

7stud

The code does not work because each time SmartRequest is instantiated,
the object 'cj' is cleared.

That's because every time you create a SmartRequest, this line
executes:

cj=CookieJar()

That creates a new, *empty* cookie jar, i.e. it has no knowledge of
any previously set cookies.
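A caveat on that diagnosis: statements in a class body, including `cj=CookieJar()`, are evaluated once, when the `class` statement itself runs, not each time an instance is created, so all instances share one class-level jar. A minimal Python 3 check (http.cookiejar is cookielib's Python 3 name; `TrackedJar` and `creations` are hypothetical helpers that only count how many jars get built):

```python
from http.cookiejar import CookieJar

creations = []  # hypothetical log: one entry per jar constructed


class TrackedJar(CookieJar):
    """Hypothetical CookieJar subclass that records every construction."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        creations.append(self)


class SmartRequest:
    cj = TrackedJar()  # class body: evaluated once, when the class is defined


r1 = SmartRequest()
r2 = SmartRequest()
print(len(creations), r1.cj is r2.cj)  # 1 True
```

Only one jar is ever constructed, and both instances see it. A fresh jar per instance would come from assigning `self.cj = CookieJar()` inside `__init__`, which is the pattern to avoid when cookies should carry over.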
How can I avoid that?

If you read the docs on the cookielib module, and in particular
CookieJar objects, you will notice that CookieJar objects are
described in a section that is titled: CookieJar and FileCookieJar
Objects.

Hmm...I wonder what the difference is between a CookieJar object and a
FileCookieJar Object?

----------
FileCookieJar implements the following additional methods:

save(filename=None, ignore_discard=False, ignore_expires=False)
Save cookies to a file.

load(filename=None, ignore_discard=False, ignore_expires=False)
Load cookies from a file.
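To make cookies survive between runs of the program, a FileCookieJar subclass can be saved and reloaded. A minimal Python 3 sketch using MozillaCookieJar (one of the FileCookieJar subclasses in http.cookiejar, alongside LWPCookieJar). To stay offline it adds a hand-built cookie, whereas in real use HTTPCookieProcessor fills the jar; the file name, domain and cookie value are made up. Session cookies (no expiry) are marked discard, so both save() and load() need ignore_discard=True to keep them.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_session_cookie(name, value, domain):
    # A Netscape-style (version 0) session cookie; all values are illustrative.
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


with tempfile.TemporaryDirectory() as tmp:
    cookie_file = os.path.join(tmp, "cookies.txt")  # hypothetical file name

    # First run: collect cookies, then save them (keeping session cookies).
    jar = MozillaCookieJar(cookie_file)
    jar.set_cookie(make_session_cookie("session", "abc123", "example.com"))
    jar.save(ignore_discard=True)

    # Later run: a fresh jar loads what the previous run saved.
    restored = MozillaCookieJar(cookie_file)
    restored.load(ignore_discard=True)

print([(c.domain, c.name, c.value) for c in restored])  # [('example.com', 'session', 'abc123')]
```

Without ignore_discard=True on both calls, the session cookie would be silently skipped, which is an easy way to end up with an apparently empty file.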
 

7stud

Hi all,

I need urllib2 to perform a series of HTTP requests with cookies from
the PREVIOUS request (like our browsers usually do).

Cookies from a previous request made in the currently running
program? Or cookies from requests that were made when you previously
ran the program?
The code does not work because each time SmartRequest is instantiated,
the object 'cj' is cleared. How can I avoid that?
The only stupid solution I figured out is to use a global CookieJar
object. Is there any way to handle all this INSIDE the class?

Examine this code and its output:

class SmartRequest(object):
    def __init__(self, id):
        if not getattr(SmartRequest, 'cj', None):
            SmartRequest.cj = "I'm a cookie jar. Created by request: %s" % id

r1 = SmartRequest(1)
r2 = SmartRequest(2)

print r1.cj
print r2.cj

--output:--
I'm a cookie jar. Created by request: 1
I'm a cookie jar. Created by request: 1
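When this getattr pattern is applied to a real CookieJar rather than a string, one detail matters: CookieJar defines `__len__`, so an empty jar is falsy, and `if not getattr(...)` would build a fresh jar on every instantiation until the first cookie arrives (anything holding the earlier jar, such as an HTTPCookieProcessor, would then no longer see the same object). Comparing with None avoids that. A Python 3 sketch (http.cookiejar is cookielib's Python 3 name; `created_by` and `request_id` are illustrative names):

```python
from http.cookiejar import CookieJar

print(bool(CookieJar()))  # False: an empty jar has len() == 0


class SmartRequest:
    def __init__(self, request_id):
        # Compare with None instead of relying on truthiness, so an empty
        # jar created by an earlier instance is reused, not replaced.
        if getattr(SmartRequest, "cj", None) is None:
            SmartRequest.cj = CookieJar()
            SmartRequest.created_by = request_id


r1 = SmartRequest(1)
r2 = SmartRequest(2)
print(r1.cj is r2.cj, SmartRequest.created_by)  # True 1
```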
 

Steve Holden

7stud said:
You should always define a class like this:

class SmartRequest(object):


unless you know of a specific reason not to.
It's much easier, though, just to put

__metaclass__ = type

at the start of any module where you want exclusively new-style objects.
And I do agree that you should use new-style objects exclusively unless you
have a good reason not to, though thanks to Guido's hard work it
mostly doesn't matter.

$ cat test94.py
__metaclass__ = type

class Rhubarb:
    pass

rhubarb = Rhubarb()

print type(Rhubarb)
print type(rhubarb)


$ python test94.py
<type 'type'>
<class '__main__.Rhubarb'>

regards
Steve
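On Python 3 none of this is needed: every class is new-style whether or not it names `object` as a base, and a module-level `__metaclass__` assignment has no effect there. The same experiment, reduced to a quick Python 3 check:

```python
# Python 3: no "__metaclass__ = type" line is needed
class Rhubarb:
    pass


rhubarb = Rhubarb()

print(type(Rhubarb))    # the metaclass is already `type`
print(type(rhubarb))
print(Rhubarb.__mro__)  # object is an implicit base class
```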
 

est

The getattr function is exactly what I was looking for, thanks!


You should always define a class like this:

class SmartRequest(object):

unless you know of a specific reason not to.

Thanks for the advice!
 

est

Wow, thank you Rob Wolfe! Your reply is the shortest yet the most helpful! I
solved this problem with the following code.

import urllib2
from cookielib import CookieJar

class HTTPRefererProcessor(urllib2.BaseHandler):
    """Add Referer header to requests.

    This only makes sense if you use each RefererProcessor for a single
    chain of requests only (so, for example, if you use a single
    HTTPRefererProcessor to fetch a series of URLs extracted from a single
    page, this will break).

    There's a proper implementation of this in module mechanize.

    """
    def __init__(self):
        self.referer = None

    def http_request(self, request):
        # Send the URL of the previous response as Referer, unless the
        # caller already set one.
        if ((self.referer is not None) and
            not request.has_header("Referer")):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        # Remember where this response came from, for the next request.
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response

def main():
    cj = CookieJar()
    opener = urllib2.build_opener(
        urllib2.HTTPCookieProcessor(cj),
        HTTPRefererProcessor(),
        )
    urllib2.install_opener(opener)

    # url1 and url2 stand for the URLs to fetch in sequence.
    urllib2.urlopen(url1)
    urllib2.urlopen(url2)

if "__main__" == __name__:
    main()

And it's working great!

Once again, thanks everyone!
 

7stud

How does the class HTTPRefererProcessor do anything useful for you?
 

est

Well, it's more browser-like. Maybe I should have snipped the
HTTPRefererProcessor code from this discussion.
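To see concretely what the Referer processor adds, here is a Python 3 port of it (urllib2.BaseHandler corresponds to urllib.request.BaseHandler) driven by hand, with no network: once a response has been seen, the next request carries that response's URL as its Referer header. The StubResponse class and the example.com URLs are illustrative only.

```python
import urllib.request


class HTTPRefererProcessor(urllib.request.BaseHandler):
    """Send the URL of the previous response as the Referer of the next request."""

    def __init__(self):
        self.referer = None

    def http_request(self, request):
        if self.referer is not None and not request.has_header("Referer"):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response


class StubResponse:
    """Stands in for a real response; only geturl() is needed here."""

    def __init__(self, url):
        self._url = url

    def geturl(self):
        return self._url


proc = HTTPRefererProcessor()
first = proc.http_request(urllib.request.Request("http://example.com/page1"))
proc.http_response(first, StubResponse("http://example.com/page1"))
second = proc.http_request(urllib.request.Request("http://example.com/page2"))
print(first.get_header("Referer"), second.get_header("Referer"))  # None http://example.com/page1
```

In a real program the processor would be passed to urllib.request.build_opener next to HTTPCookieProcessor, as in est's main(); the opener then calls http_request and http_response automatically.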
 
