Submitting forms over HTTPS with mechanize


R

Rex

Hello,

I am working on an academic research project where I need to log in to
a website (www.lexis.com) over HTTPS and execute a bunch of queries to
gather a data set. I just discovered the mechanize module, which seems
great because it's a high-level tool. However, I can't find any decent
documentation for mechanize apart from the docstrings, which are
pretty thin. So I just followed some other examples I found online, to
produce the following:

baseurl = 'http://www.lexis.com/'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-Agent', 'Firefox')]
br.open(baseurl)
br.select_form(name="formauth")
br["USER_ID"]="my_user_id"
br["PASSWORD"]="my_password"
result = br.submit()

This code hangs at br.submit(), and I can't tell what I'm doing wrong.
Typically I would inspect the HTTP data with an HTTP debugging proxy
(Fiddler), but I guess since this is HTTPS I can't do that. Any
glaring errors in my code?

By the way, does anyone have suggestions for Python modules that I
should use instead of mechanize (and that are sufficiently easy)? If
mechanize fails, I might try modifying some similar Perl code a friend
sent me that logs into lexis.com.

Thanks so much,

Rex
 
Ad

Advertisements

L

Larry Bates

Rex said:
Hello,

I am working on an academic research project where I need to log in to
a website (www.lexis.com) over HTTPS and execute a bunch of queries to
gather a data set. I just discovered the mechanize module, which seems
great because it's a high-level tool. However, I can't find any decent
documentation for mechanize apart from the docstrings, which are
pretty thin. So I just followed some other examples I found online, to
produce the following:

baseurl = 'http://www.lexis.com/'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-Agent', 'Firefox')]
br.open(baseurl)
br.select_form(name="formauth")
br["USER_ID"]="my_user_id"
br["PASSWORD"]="my_password"
result = br.submit()

This code hangs at br.submit(), and I can't tell what I'm doing wrong.
Typically I would inspect the HTTP data with an HTTP debugging proxy
(Fiddler), but I guess since this is HTTPS I can't do that. Any
glaring errors in my code?

By the way, does anyone have suggestions for Python modules that I
should use instead of mechanize (and that are sufficiently easy)? If
mechanize fails, I might try modifying some similar Perl code a friend
sent me that logs into lexis.com.

Thanks so much,

Rex

I've used mechanize quite successfully but others have suggested Twill
http://twill.idyll.org/. It seems to be at least documented.

-Larry
 
Ad

Advertisements

R

Rex

Rex said:
I am working on an academic research project where I need to log in to
a website (www.lexis.com) over HTTPS and execute a bunch of queries to
gather a data set. I just discovered the mechanize module, which seems
great because it's a high-level tool. However, I can't find any decent
documentation for mechanize apart from the docstrings, which are
pretty thin. So I just followed some other examples I found online, to
produce the following:
baseurl = 'http://www.lexis.com/'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-Agent', 'Firefox')]
br.open(baseurl)
br.select_form(name="formauth")
br["USER_ID"]="my_user_id"
br["PASSWORD"]="my_password"
result = br.submit()
This code hangs at br.submit(), and I can't tell what I'm doing wrong.
Typically I would inspect the HTTP data with an HTTP debugging proxy
(Fiddler), but I guess since this is HTTPS I can't do that. Any
glaring errors in my code?
By the way, does anyone have suggestions for Python modules that I
should use instead of mechanize (and that are sufficiently easy)? If
mechanize fails, I might try modifying some similar Perl code a friend
sent me that logs into lexis.com.
Thanks so much,

I've used mechanize quite successfully but others have suggested Twillhttp://twill.idyll.org/.  It seems to be at least documented.

-Larry

Thanks for the reply, Larry. I ran my code again and it worked; there
was probably some temporary issue with either my computer or the
server that caused it to hang.
 

Top