retrieving https pages

Eric · Jul 19, 2005

I'm using Linux (Mandriva LE2005) with Python 2.3 (or I can also use
Python 2.4 on my other system just as well).
Anyway...
I want to get a web page containing my stock grants.
The initial page is an https page with a form on it where you
fill in your username and password and then click "login".
I played with Python's urlopen and basically it complains "your browser
doesn't support frames", meaning the urlopen call makes it unhappy somehow.
Is it reasonable to think I can build a script to log in to this secure
website, move to a different page (on that site) and download it to disk?
Or am I just looking at a long, complicated task?
I'd really like to get the page because then I can analyze it from a cron
job and email myself my current options value each week or each month.
Thanks
Eric

ncf · Jul 19, 2005

It might be checking the browser's User-Agent. Your best bet
would be to use something to record the headers your browser sends
out, and mimic those in Python.

If you look at the source code for urllib (I think you can press
Alt+M and type in "urllib"), under the URLopener definition (the base
class of FancyURLopener), you should see something like self.addheaders
(not on a box to check it right now, but it's in the constructor, I
remember that much).

Just set all the headers to send out (like your browser would) by
setting that value from your script, e.g.:

import urllib  # Python 2.x module
opener = urllib.FancyURLopener()
# addheaders is a list of (name, value) tuples sent with every request
opener.addheaders = [('User-agent', 'blah'), ('Header2', 'val'), ('monkey', 'bone')]
# do the other stuff here
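
For anyone reading this on Python 3, where urllib.request replaces the old urllib/urllib2 modules, the same idea might look roughly like the sketch below. The header values and the build_request and fetch names are made up for illustration; substitute whatever your own browser actually sends.

```python
import urllib.request

# Hypothetical header values for illustration only; record the ones your
# real browser sends and use those instead.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch(url):
    """Send the request and return the response body as bytes."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Building the request separately from sending it makes it easy to inspect the headers before anything goes over the wire.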

HTH

-Wes

Mike Meyer · Jul 20, 2005

Eric said:
I'm using Linux (Mandriva LE2005) with Python 2.3 (or I can also use
Python 2.4 on my other system just as well).
Anyway...
I want to get a web page containing my stock grants.
The initial page is an https page with a form on it where you
fill in your username and password and then click "login".
I played with Python's urlopen and basically it complains "your browser
doesn't support frames", meaning the urlopen call makes it unhappy somehow.
Is it reasonable to think I can build a script to log in to this secure
website, move to a different page (on that site) and download it to disk?
Or am I just looking at a long, complicated task?

It's not that bad. It took me about half a day to do this for a site I
wanted scraped regularly, and what I had to do was much more
complicated than what you describe. I had to deal with an optional
second login page (a "security feature" of the site), http-equiv
redirects (which urlopen doesn't handle), and then digging the URL of
the page I actually wanted out of the resulting page.
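
A rough sketch of that kind of flow in Python 3 follows: a cookie-aware opener so the login session persists, a POST to the login form, and a helper that spots an http-equiv refresh so it can be followed by hand. The form field names ("username", "password"), the URLs, and every function name here are assumptions for illustration; the real site's form will differ, so check its HTML source first.

```python
import http.cookiejar
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser


def make_opener():
    """Build an opener that stores cookies so the login session persists."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the login POST. The field names are assumptions."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("ascii"))


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or self.target is not None:
            return
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            content = attrs.get("content") or ""
            match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", content, re.I)
            if match:
                self.target = match.group(1).strip()


def find_meta_refresh(html, base_url):
    """Return the absolute redirect URL, or None if the page has none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urllib.parse.urljoin(base_url, finder.target)


def login_and_save(login_url, page_url, username, password, out_path):
    """Log in, fetch page_url, follow one meta refresh, save to disk."""
    opener, _ = make_opener()
    opener.open(build_login_request(login_url, username, password)).close()
    body = opener.open(page_url).read()
    refresh = find_meta_refresh(body.decode("utf-8", "replace"), page_url)
    if refresh:
        body = opener.open(refresh).read()
    with open(out_path, "wb") as out:
        out.write(body)
```

A cron job would then only need to call login_and_save with the real URLs and credentials and analyze the saved file.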

The complaint about your browser may be their inadequate attempt to
deal with browser portability by putting that on the resulting framed
page in the NOFRAMES element. In which case, you just need to find the
URL for the frame that's got the information you want, and get that
page. On the other hand, as Wes said, they may be browser-sniffing. In
which case you'll have to set the User-Agent to something they won't
complain about. Personally, I always try "Your Web Site Developer
Sucks" to see if they have a list of disallowed browsers. If that
fails, try the User-Agent string of a well-known browser.
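
If it is the NOFRAMES case, finding the frame URLs can be sketched with the standard library's HTML parser (Python 3 here). The FrameCollector and frame_urls names and the example URLs are hypothetical; the point is only to list each frame's src resolved against the page URL, so you can pick the one holding the data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of frame/iframe tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page that held the frameset.
                self.frames.append(urljoin(self.base_url, src))


def frame_urls(html, base_url):
    """Return the frame URLs found in html, in document order."""
    collector = FrameCollector(base_url)
    collector.feed(html)
    return collector.frames
```

Feeding it the page that produced the "doesn't support frames" message should give a short list of candidate URLs to fetch next.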

For page scraping, install BeautifulSoup.

<mike
