Use existing IE cookie

KB

Hi there,

Relevant versions: Python 2.5, Vista Home, IE7

I am trying to scrape a website that I have browsed manually in the past,
selecting my options by hand, and now I want Python to use my
existing cookie from the manual browse when downloading data.

Using http://code.activestate.com/recipes/80443/ I have found the
"name" of the relevant cookie, but after reading the urllib2 docs I
can't see how to "send" it, or how to have my Python instance use "MY"
existing cookie.

Using the following:

***
import re
import urllib2, cookielib

# set things up for cookies

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

reply = urllib2.urlopen('foo.html').read()

print reply

***

This does return data, but only the default data, not the data for the
options I set up when browsing manually.

My sense is that I need "something" in the () part of
HTTPCookieProcessor() but I have no idea as to what... the docs say
"cookiejar" but the only code examples I have found are to create a
cookiejar for the existing Python session, not to use the cookies from
my prior manual meanderings.

Any help greatly appreciated.
 
Diez B. Roggisch

KB said:
My sense is that I need "something" in the () part of
HTTPCookieProcessor() but I have no idea as to what... the docs say
"cookiejar" but the only code examples I have found are to create a
cookiejar for the existing Python session, not to use the cookies from
my prior manual meanderings.

That's because this is a completely different beast. You need to find out
whether and how you can access IE cookies from Python; I guess you will have
to go down some win32 road for that.

Once you have got hold of them, you can build whatever cookiejar urllib2
needs.

Diez
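
To illustrate that last step: the following is only a minimal sketch of how a cookiejar could be built by hand once the cookie's name and value are known. The cookie name, value, and domain shown are hypothetical placeholders, not values from this thread, and the try/except import is there only so that the same lines also run on Python 3, where cookielib and urllib2 were renamed.

```python
try:  # Python 2 module names, as used in this thread
    import cookielib
    import urllib2
except ImportError:  # Python 3 renamed both modules
    import http.cookiejar as cookielib
    import urllib.request as urllib2


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie for `domain` from a bare name/value pair."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical values: substitute whatever you recovered from IE.
jar = cookielib.CookieJar()
jar.set_cookie(make_cookie("screener_prefs", "151", "www.example.com"))

# HTTPCookieProcessor takes the jar object itself, not a string.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
# opener.open("http://www.example.com/") would now send the cookie.
```

Requests made through this opener to the matching domain should carry the cookie; requests to other hosts will not.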
 
KB

Thanks for the prompt reply, Diez! Using the recipe above I have found the
name of the cookie (I did google how to use IE cookies in Python and
that was the best match), but it only tells me the name of the cookie,
not how to use it.

Any clues?

TIA!
 
Diez B. Roggisch

KB said:
Thanks for the prompt reply, Diez! Using the above I have found the
name of the cookie (I did google how to use IE cookies in python and
that was the best match) but it only tells me the name of the cookie,
not how to use it.

Ah, sorry, I should have read the recipe as well.

To me it looks as if findIECookie from that recipe is meant to be called
with the cookie name. It should then return the value, or None.

What does your full example look like, including the
cookie-acquisition stuff?

Diez
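
If a name and a value are both in hand, one simple option is to skip the cookiejar and set the Cookie header on the request directly. The sketch below assumes exactly that; request_with_cookie and the name, value, and URL passed to it are made up for illustration, and the None branch mirrors the "value, or None" behaviour described above rather than anything verified against the recipe.

```python
try:  # Python 2 name, as used in this thread
    import urllib2
except ImportError:  # Python 3 equivalent
    import urllib.request as urllib2


def request_with_cookie(url, name, value):
    """Return a Request carrying name=value, or a plain one if value is None."""
    if value is None:  # the lookup found nothing, so send no cookie
        return urllib2.Request(url)
    return urllib2.Request(url, headers={"Cookie": "%s=%s" % (name, value)})


# Hypothetical cookie name and value, for illustration only.
req = request_with_cookie("http://www.example.com/", "screener_prefs", "151")
# reply = urllib2.urlopen(req).read()
```

This only sends the one cookie and does not store anything the server sets in return, which may or may not be enough for a given site.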
 
KB

Diez B. Roggisch said:
What does your full example look like, including the
cookie-acquisition stuff?

I ran them separately, hoping for a clue as to what my "cookiejar"
was.

The cookie-acquisition stuff returns "screener.ashx?v=151" when I
search with the domain I am interested in. I have tried
urllib2.HTTPCookieProcessor('screener.ashx?v=151') but that failed
with an "attr has no cookie header" error.

From the HTTPCookieProcessor docs, it appears that non-IE browsers
have a cookie file (and example code), but from what I can tell IE uses
a hidden folder (you can set your location in IE, but it appends a
folder "\Temporary Internet Files").

From: http://docs.python.org/dev/library/cookielib.html

***
This example illustrates how to open a URL using your Netscape,
Mozilla, or Lynx cookies (assumes Unix/Netscape convention for
location of the cookies file):

import os, cookielib, urllib2
cj = cookielib.MozillaCookieJar()
cj.load(os.path.join(os.environ["HOME"], ".netscape/cookies.txt"))
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
***

Not sure how to adapt this for IE.
 
Diez B. Roggisch

KB said:
Not sure how to adapt this for IE.

You could create a file that resembles cookies.txt. I have no idea what
that looks like, but I guess it's pretty simple.

Diez
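
For what it's worth, the layout that MozillaCookieJar reads is a magic comment line followed by one tab-separated line per cookie: domain, a TRUE/FALSE flag that must agree with whether the domain starts with a dot, path, secure flag, expiry as a Unix timestamp, name, value. A rough sketch of writing such a file and loading it follows; write_cookies_txt, the temporary file location, and the cookie details are all hypothetical.

```python
import os
import tempfile

try:  # Python 2 name, as used in this thread
    import cookielib
except ImportError:  # Python 3 equivalent
    import http.cookiejar as cookielib


def write_cookies_txt(path, domain, name, value, expires=4102444800):
    """Write a one-cookie file in the Netscape cookies.txt layout.

    The default expiry is 1 Jan 2100 UTC, so the cookie is not skipped
    as expired or as a session cookie when the file is loaded.
    """
    # Second field is FALSE because the domain has no leading dot.
    fields = [domain, "FALSE", "/", "FALSE", str(expires), name, value]
    f = open(path, "w")
    try:
        f.write("# Netscape HTTP Cookie File\n")  # magic first line
        f.write("\t".join(fields) + "\n")
    finally:
        f.close()


# Hypothetical cookie details, for illustration only.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
write_cookies_txt(cookie_file, "www.example.com", "screener_prefs", "151")

cj = cookielib.MozillaCookieJar()
cj.load(cookie_file)
```

The loaded cj can then be handed to urllib2.HTTPCookieProcessor(cj) just as in the documentation example quoted earlier in the thread.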
 
KB

You could create a file that resembles the cookies.txt - no idea how that
looks, but I guess it's pretty simple.

Diez

Yeah, unfortunately I just tried Firefox and it uses cookies.sqlite
now... more dead ends :)
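
A cookies.sqlite file is an ordinary SQLite database, so the sqlite3 module that ships with Python 2.5 may be able to read it. The sketch below assumes a moz_cookies table with host, path, isSecure, expiry, name and value columns, and that expiry is in seconds; neither has been checked against any particular Firefox version, and jar_from_firefox is a made-up helper name.

```python
import sqlite3

try:  # Python 2 name, as used in this thread
    import cookielib
except ImportError:  # Python 3 equivalent
    import http.cookiejar as cookielib


def jar_from_firefox(db_path):
    """Copy every row of the (assumed) moz_cookies table into a CookieJar."""
    jar = cookielib.CookieJar()
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT host, path, isSecure, expiry, name, value "
            "FROM moz_cookies")
        for host, path, secure, expiry, name, value in rows:
            dotted = host.startswith(".")  # leading dot marks a domain cookie
            jar.set_cookie(cookielib.Cookie(
                0, name, value, None, False,
                host, dotted, dotted,
                path, True, bool(secure), expiry, False,
                None, None, {}))
    finally:
        conn.close()
    return jar

# Usage, with a path you would have to locate in your own profile folder:
# jar = jar_from_firefox(path_to_cookies_sqlite)
```

Firefox may hold a lock on the file while it is running, so working on a copy of cookies.sqlite is probably safer.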
 
