How to keep cookies when making http requests (Python 2.7)

Luca Cerone

Hi everybody,
I am trying to write a simple Python script to solve the "riddle" at:
http://quiz.gambitresearch.com/

The quiz is quite easy to solve: one needs to evaluate the expression between the curly brackets (say that the expression has value <val>)
and go to the web page:

http://quiz.gambitresearch.com/job/<val>

You have to be fast, because the page sets a cookie that expires 1 second after the first request, so the /job/<val> page must be requested within that time.

[I know that this is the correct solution because a friend and I wrote a small script in JavaScript and could access the page with the email address]

As an exercise I decided to try doing the same with Python.

First I tried the following code:

#START SCRIPT

import re
import urllib2

regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
base_address = "http://quiz.gambitresearch.com/"
base_h = urllib2.urlopen(base_address)
base_page = base_h.read()

val = str(eval(regex.findall(base_page)[0]))

job_address = base_address + "job/" + val
job_h = urllib2.urlopen(job_address)
job_page = job_h.read()

print job_page
#END SCRIPT

job_page has the following content now: "WRONG! (Have you enabled cookies?)"

Trying to solve the issue with the cookies, I found the "requests" module, which in theory should work.
I therefore rewrote the above script to use requests:

#START SCRIPT:
import re
import requests

regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')

base_address = "http://quiz.gambitresearch.com/"

s = requests.Session()

base_h = s.get('http://quiz.gambitresearch.com/')
base_page = base_h.text

val = eval( regex.findall( base_page )[0] )

job_address = base_address + "job/" + str(val)
job_h = s.get( job_address )
job_page = job_h.text

print job_page
#END SCRIPT
# print job_page produces "Wrong!".

According to the manual, when using Session() cookies should be enabled and persist for the whole session. In fact the cookies in base_h.cookies and in job_h.cookies seem to be the same:

base_h.cookies == job_h.cookies
#returns True

So, why does this script fail to access the job page?
How can I change it so that it works as intended and job_page prints
the content of the page that displays the email address to use for the job applications?

Thanks a lot in advance for the help!

Best Wishes,
Luca
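
Both scripts above pass text taken from the page straight to eval(), which runs whatever Python the server happens to send. A minimal sketch of a narrower alternative follows. It assumes the expression between the curly brackets uses only numbers, + - * / and parentheses (the thread does not show the actual page, so this is an assumption), and it is written for Python 3.8 or later rather than the 2.7 used in the thread. `safe_eval` is a hypothetical helper name, not part of the quiz or of any library.

```python
import ast
import operator

# Operators this sketch accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Accept only int and float literals (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression: %r" % expression)

    return walk(ast.parse(expression.strip(), mode="eval"))
```

In the scripts above this would stand in for the eval(...) call, for example `val = safe_eval(regex.findall(base_page)[0])`. Note that `/` is true division here, whereas eval() under Python 2.7 floors the result when both operands are integers; which of the two the quiz expects is not stated in the thread.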
 
dieter

Python has a module for cookie handling: "cookielib" ("http.cookiejar"
in Python 3).

"urllib2" has a standard way to integrate with this module.
However, I do not know the details (check the documentation
for the modules).

I have used "cookielib" externally to "urllib2". It looks
like this:

from urllib2 import urlopen, Request
from cookielib import CookieJar

cookies = CookieJar()
.....
r = Request(...)
cookies.add_cookie_header(r) # set the cookies
R = urlopen(r, ...) # make the request
cookies.extract_cookies(R, r) # remember the new cookies
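
For reference, the "standard way" of integrating the two modules mentioned above is the HTTPCookieProcessor handler, which in Python 2.7 is available as `urllib2.HTTPCookieProcessor` together with `urllib2.build_opener`. The sketch below uses the Python 3 module names (`urllib.request`, `http.cookiejar`) so that it runs on a current interpreter; `make_cookie_opener` is a hypothetical helper name. It only builds the opener and makes no request.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# Intended use (not executed here, since it needs network access):
#   first = opener.open("http://quiz.gambitresearch.com/")
#   second = opener.open(job_address)  # cookies set by `first` are resent
```

Every request made through `opener.open(...)` then goes through the same jar, so the add_cookie_header() and extract_cookies() calls shown above happen automatically.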
 
Luca Cerone

> I have used "cookielib" externally to "urllib2". It looks
> like this:
>
> from urllib2 import urlopen, Request
> from cookielib import CookieJar
>
> cookies = CookieJar()
> ....
> r = Request(...)
> cookies.add_cookie_header(r) # set the cookies
> R = urlopen(r, ...) # make the request
> cookies.extract_cookies(R, r) # remember the new cookies

Hi Dieter,
thanks a lot for the help.
I am sorry but your code is not very clear to me.
It seems that you are setting some cookies,
but I can't understand how you use the ones that the site
sends to you when you perform the initial request.

Have you tried this code to check that it works?
If it works as intended, can you explain a bit better
what it does exactly?

Thanks again!
Luca
 
Fábio Santos

> Hi Dieter,
> thanks a lot for the help.
> I am sorry but your code is not very clear to me.
> It seems that you are setting some cookies,
> but I can't understand how you use the ones that the site
> sends to you when you perform the initial request.

This example does both. The cookie jar adds the cookies to the http request
to be sent to the server, and updates the cookies from the response, if any
were sent. It seems pretty clear, seeing that it has a lot of comments.

The cookies from the site are thus in the cookie jar object after the call
to extract_cookies() extracts them from the response.

> Have you tried this code to check if this work?
> If it works as intended can you explain a bit better
> what it does exactly?

You should really test this yourself ;)
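
One way to test the cookie jar behaviour described above without touching the network is to hand the jar a canned response. The following is a minimal sketch under stated assumptions: it uses the Python 3 module names (`http.cookiejar`, `urllib.request`) instead of the 2.7 ones from the thread, `FakeResponse` is a hypothetical stand-in for what urlopen() returns, and the host `example.com` and the cookie `session=abc123` are made up for illustration.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Stand-in for a urlopen() result; the jar only calls info() on it."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


jar = CookieJar()

# First request: the jar is empty, so no Cookie header is added.
first = Request("http://example.com/")
jar.add_cookie_header(first)
first_cookie_header = first.get_header("Cookie")

# Pretend the server answered with a Set-Cookie header and store it.
jar.extract_cookies(FakeResponse("session=abc123; Path=/"), first)

# Second request: the stored cookie is now attached to the request.
second = Request("http://example.com/job/42")
jar.add_cookie_header(second)
second_cookie_header = second.get_header("Cookie")

print(first_cookie_header)   # None
print(second_cookie_header)  # session=abc123
```

So add_cookie_header() does nothing useful on the very first request; it matters on later requests, after extract_cookies() has filled the jar.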
 
dieter

Luca Cerone said:
...
> Have you tried this code to check if this work?

Not this code, but code like this (as I have written).

> If it works as intended can you explain a bit better
> what it does exactly?

Fabio already did the explanation.


Let me make an additional remark however: you should
not expect to get complete details in a list like this - but only
hints towards a solution for your problem (i.e.
there remains some work for you).
Thus, I expect you to read the "cookielib/cookiejar" documentation
(part of Python's standard documentation) in order to understand
my example code - before I would be ready to provide further details.
 
Luca Cerone

Dear all,
first of all thanks for the help.
As for your remark, you are right, and I usually tend to post questions in a way that is detached from the particular problem I have to solve.
In this case, since I only have a limited knowledge of the cookie mechanism (in general, not only in Python), I preferred to ask about the specific case.
I am sorry if I gave you the impression I didn't appreciate your answer,
it was absolutely not my intention.

Cheers,
Luca
 
Luca Cerone

Ok so after reading the documentation for urllib2 and cookielib I came up with the following code:

#START
from urllib2 import urlopen , Request
from cookielib import CookieJar
import re
regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')

base_url = "http://quiz.gambitresearch.com"
job_url = base_url + "/job/"

cookies = CookieJar()
r = Request(base_url) #prepare the request object
cookies.add_cookie_header(r) #allow to have cookies
R = urlopen(r) #read the url
cookies.extract_cookies(R,r) #take the cookies from the response R and adds #them to the request object

#build the new url
t = R.read()
v = str(eval(regex.findall(t)[0]))
job_url = job_url + v


# Here I create a new request to the url containing the email address
r2 = Request(job_url)
cookies.add_cookie_header(r2) #I prepare the request for cookies adding the cookies that I extracted before.

#perform the request and print the page
R2 = urlopen(r2)
t2 = R2.read()
print job_url
print t2
#END

This still doesn't work, but I really can't understand why.
As far as I understand, first I have to instantiate a Request object
and allow it to receive and set cookies (I do this with r = Request() and cookies.add_cookie_header(r)).
Next I perform the request (urlopen) and save the cookies in the CookieJar (cookies.extract_cookies(R,r)).

I evaluate the new address and create a new Request for it (r2 = Request).
I add the cookies stored in the cookie jar to my new request (cookies.add_cookie_header(r2)).
Then I perform the request (R2 = urlopen(r2)) and read the page (t2 = R2.read()).

What am I doing wrong? Do I misunderstand something in the process?

Thanks again in advance for the help,
Cheers,
Luca
 
dieter

Luca Cerone said:
> cookies.extract_cookies(R,r) #take the cookies from the response R and adds #them to the request object

"adds them to the request object" should be "adds them to the cookie jar".

> This still doesn't work, but I really can't understand why.
> As far as I have understood first I have to instantiate a Request object
> and allow it to receive and set cookies (I do this with r = Request() and cookies.add_cookie_header(r))
> Next I perform the request (urlopen), save the cookies in the CookieJar (cookies.extract_cookies(R,r)).
>
> I evaluate the new address and I create a new Request for it (r2 = Request)
> I add the cookies stored in the cookiejar in my new request (cookies.add_cookie_header(r2))
> Then I perform the request (R2 = urlopen(r2)) and read the page (t2 = R2.read())
>
> What am I doing wrong?

With respect to cookie handling, you do everything right.

There may be other problems with the (wider) process.
Analysing the responses of your requests (reading the status codes,
the response headers and the response bodies) may provide hints
towards the problem.

> Do I misunderstand something in the process?

Not with respect to cookie handling.
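
The suggestion to analyse the responses can be sketched as a small helper that prints the status code, the headers and the start of the body. This is only an illustration: `describe_response` and `CannedResponse` are hypothetical names, the canned status, header and body are made up so that the sketch runs without network access, and the code is written for Python 3. It relies on getcode() and info(), which the objects returned by urlopen() provide in both urllib2 and urllib.request.

```python
import io
from email.message import Message


def describe_response(response, body_limit=200):
    """Summarise status code, headers and the start of the body as text."""
    lines = ["status: %s" % response.getcode()]
    for name, value in response.info().items():
        lines.append("%s: %s" % (name, value))
    lines.append("body: %r" % response.read()[:body_limit])
    return "\n".join(lines)


class CannedResponse(io.BytesIO):
    """Hypothetical stand-in for a urlopen() result, used only for the demo."""

    def __init__(self, code, headers, body):
        super().__init__(body)
        self._code = code
        self._headers = Message()
        for name, value in headers:
            self._headers[name] = value

    def getcode(self):
        return self._code

    def info(self):
        return self._headers


summary = describe_response(
    CannedResponse(200, [("Content-Type", "text/html")], b"WRONG!")
)
print(summary)
```

With a real response one would call `describe_response(R2)` after `R2 = urlopen(r2)`. The helper reads the body, so it should be used on a response that does not need to be read again afterwards.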
 
Luca Cerone

Thanks Dieter,
> With respect to cookie handling, you do everything right.
>
> There may be other problems with the (wider) process.
> Analysing the responses of your requests (reading the status codes,
> the response headers and the response bodies) may provide hints
> towards the problem.

I will try to do that and try to see if I can figure out why.
 
