waiting for html to load: a followup

Josh

Hi - A couple days ago I posted asking for help on how to download a
pushed file. I am trying to write a script to download a bunch of links
from a page that takes a while to load.

I managed to get just about everything done using Python to drive IE, but
aside from not really liking that style, I couldn't figure out how to
have Python download the pushed file, or how to read the IE headers into
Python (the headers point to the download location).

Anyway, I decided to forget IE and I am now trying to use urllib2 to
open up the page, read it, etc. My problem is the page has a built-in
refresh and I don't know how to have python re-read the page until it's
ready to hand over the links.

An example of the page is:
http://edcw2ks23.cr.usgs.gov/Websit...reaList=49.0,47.0,-122.0,-124.08&prodList=NED,

I believe I need to read the header, grab the cookie session id, and add
it back to the header. I can do all this, but I'm stuck on probably
very simple syntax to re-read the page rather than open a new
connection, if that makes sense (I'm new to http as well as python).


My code snippets:

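import urllib2
from time import sleep

# url is set earlier to the address of the page
myreq = urllib2.Request(url)
myreq.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
opener = urllib2.build_opener()

# First request: pick up the session cookie from the response headers
feeddata = opener.open(myreq)
headers = feeddata.info()
cookie = headers['set-cookie']
cookie = cookie[:-8]  # drop the last 8 characters of the header value
myreq.add_header('Cookie', cookie)

x = 0
while x < 10:
    # Re-request the page, sending the cookie back each time
    feeddata = opener.open(myreq)
    data = feeddata.read()
    print data[1600:1650]
    print '\n\n\n\n*****************Using Cookie: %s' % cookie
    print '****************Header info: \n', headers
    sleep(3)
    x = x + 1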
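
The cookie[:-8] slice in the snippet above only works while the header ends in a fixed 8-character tail. As a rough sketch of a less brittle approach, the Python 3 standard library's http.cookies module can pull out just the name=value pairs. The helper name and the example header value below are made up for illustration; they are not taken from the real server.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value):
    """Turn a Set-Cookie header value into a value for the Cookie request header.

    Only the name=value pairs are kept; attributes such as Path or
    Max-Age are dropped, whatever their length.
    """
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    return "; ".join(
        "%s=%s" % (name, morsel.value) for name, morsel in parsed.items()
    )


# Hypothetical header value, for illustration only.
example = "JSESSIONID=abc123; path=/"
print(cookie_header_from_set_cookie(example))
```

This only covers a single Set-Cookie value; a full cookie jar also tracks domains, paths and expiry.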

Any help greatly appreciated. Thanks in advance, and when I know what
I'm doing I'll repay the favors.

-Josh
 
John J. Lee

Josh said:
Anyway, I decided to forget IE and I am now trying to use urllib2 to
open up the page, read it, etc. My problem is the page has a built-in
refresh and I don't know how to have python re-read the page until
it's ready to hand over the links.

ClientCookie does that (HTTPRefreshProcessor and HTTPEquivProcessor in
particular).

http://wwwsearch.sf.net/ClientCookie


I recommend using the alpha release. The interface will change a
little soon, but you almost certainly won't notice.
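
For readers unfamiliar with the mechanism: a page with a "built-in refresh" normally announces it either in a Refresh response header or in a meta http-equiv="refresh" tag, both carrying a value such as "3; URL=status.html". The following is a minimal sketch of reading that value by hand, written against the Python 3 standard library rather than ClientCookie. The function and class names are illustrative and are not part of any library API.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


def parse_refresh(value, base_url):
    """Split a refresh value like '3; URL=next.html' into (delay, absolute_url).

    If no URL is given, the refresh targets the page itself (base_url).
    Raises ValueError if the delay is not a number.
    """
    delay_part, _, rest = value.partition(";")
    delay = float(delay_part.strip())
    rest = rest.strip()
    match = re.match(r"url\s*=\s*(.*)", rest, re.IGNORECASE | re.DOTALL)
    if match:
        rest = match.group(1).strip()
    rest = rest.strip("'\"")
    target = urljoin(base_url, rest) if rest else base_url
    return delay, target


class MetaRefreshFinder(HTMLParser):
    """Remember the content of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            if self.content is None:
                self.content = attrs.get("content")


def find_meta_refresh(html, base_url):
    """Return (delay, url) for the page's meta refresh, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if not finder.content:
        return None
    return parse_refresh(finder.content, base_url)
```

A script would sleep for the returned delay, request the returned URL, and repeat until no refresh is present.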

An example of the page is:
http://edcw2ks23.cr.usgs.gov/Websit...reaList=49.0,47.0,-122.0,-124.08&prodList=NED,

I believe I need to read the header, grab the cookie session id, and
add it back to the header. I can do all this, but I'm stuck on

It'll do the cookies too :)

[...]
probably very simple syntax to re-read the page rather than open a new
connection, if that makes sense (I'm new to http as well as python).

You don't need to ensure it's the same connection. In fact, you can't
easily do that with urllib2 (or ClientCookie) as it is currently.
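
To illustrate the point about connections: the session is carried by the cookie, so each poll can be an ordinary new request as long as the cookie is sent back. Below is a minimal sketch using the Python 3 standard library (urllib.request with http.cookiejar) instead of ClientCookie. The function names, the retry count and the delay are assumptions chosen for the example, and the caller has to supply its own test for when the page is "ready".

```python
import time
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Build an opener that stores cookies from responses and resends them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_until_ready(opener, url, is_ready, max_tries=10, delay=3.0):
    """Re-request url until is_ready(body) is true; return the body, or None.

    Every attempt is a separate request, possibly over a new connection.
    The server recognizes the session from the cookies the opener resends.
    """
    for attempt in range(max_tries):
        response = opener.open(url)
        try:
            body = response.read()
        finally:
            response.close()
        if is_ready(body):
            return body
        if attempt < max_tries - 1:
            time.sleep(delay)  # give the server time before asking again
    return None
```

Typical use would be opener, jar = make_opener() followed by fetch_until_ready(opener, url, some_check), where some_check is a hypothetical function that looks for the download links in the returned bytes.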

HTH


John
 
John J. Lee

Josh said:
Anyway, I decided to forget IE and I am now trying to use urllib2 to
open up the page, read it, etc. My problem is the page has a built-in
refresh and I don't know how to have python re-read the page until
it's ready to hand over the links.

An example of the page is:
http://edcw2ks23.cr.usgs.gov/Websit...reaList=49.0,47.0,-122.0,-124.08&prodList=NED,

Example, with some debugging turned on so you can see some of what's
going on:

import ClientCookie

# Follow page refreshes, and log responses and redirects along the way
opener = ClientCookie.build_opener(
    ClientCookie.HTTPRefreshProcessor(max_time=None),
    ClientCookie.HTTPResponseDebugProcessor(),
    ClientCookie.HTTPRedirectDebugProcessor(),
    )
ClientCookie.getLogger("ClientCookie").setLevel(ClientCookie.DEBUG)

r = opener.open('http://edcw2ks23.cr.usgs.gov/Websit...reaList=49.0,47.0,-122.0,-124.08&prodList=NED,')
# Save the page that comes back once the refreshes have been followed
f = open('out.html', 'w')
f.write(r.read())
f.close()

Don't mix ClientCookie and urllib2, BTW.


John
 
Josh

John,

I really appreciate your reply. I actually grabbed the ClientCookie
module last night and spent a long time trying to figure out how to
write this; your snippet of code was incredibly helpful to me. Nothing
quite like being totally new to a subject, I must say.

Thanks again,

-Josh
 
