MSIE6 Python Question

Ralph A. Gable

I'm a newbie at this but I need to control MSIE6 using Python. I have
read the O'Reilly win32 python books and got some hints. But I need to
Navigate to a site (which I know how to do) and then I need to get at
the source code for that site inside Python (as when one used the
View|Source drop down window). Can anyone point me to some URLs that
would help out? Or just tell me how to do it? I would be very
grateful.
 
Kevin T. Ryan

Ralph said:
I'm a newbie at this but I need to control MSIE6 using Python. I have
read the O'Reilly win32 python books and got some hints. But I need to
Navigate to a site (which I know how to do) and then I need to get at
the source code for that site inside Python (as when one used the
View|Source drop down window). Can anyone point me to some URLs that
would help out? Or just tell me how to do it? I would be very
grateful.

I'm not sure why you need to go through IE, but maybe this will point you in
the right direction:

You could do:
# f is assumed to be the file-like object returned by urllib.urlopen(url)
for line in f:
    process(line)

just like you can with a file. Check the urllib, urllib2, and other related
modules (maybe httplib). Hope that helps.
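
A minimal sketch of that loop, assuming Python 3, where urllib.request.urlopen is the successor to the urllib and urllib2 calls named in this thread. The helper name collect_lines is hypothetical, and the data: URL in the usage line only stands in for a real page so the example runs offline.

```python
import urllib.request


def collect_lines(url, process=lambda line: line):
    """Open url and apply process() to each decoded line, like reading a file."""
    results = []
    with urllib.request.urlopen(url) as f:
        for raw in f:  # the response object iterates line by line
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            results.append(process(line))
    return results


# Usage: a data: URL keeps the example self-contained; swap in a real address.
print(collect_lines("data:text/plain,first%0Asecond", str.upper))
```

As the rest of the thread shows, a site may still serve different content to a script than to a browser, so this only covers the plain fetch.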
 
Guido Wesdorp

Ralph said:
Can anyone point me to some URLs that
would help out? Or just tell me how to do it? I would be very
grateful.

I can't check right now, since the site only works in IE (it uses ActiveX for
the navigation, very nasty), but there should be a full reference for IE's
COM API somewhere on msdn.microsoft.com; look for the VB API. It can be
used through COM: once you've instantiated the COM object, you can call
the methods and use the attributes from this API. I assume it has methods
that either get the info you want directly or let you use JavaScript to
get to it.

Good luck,

Guido Wesdorp
 
Ralph A. Gable

Kevin T. Ryan said:
I'm not sure why you need to go through IE, but maybe this will point you in
the right direction:

You could do:
# f is assumed to be the file-like object returned by urllib.urlopen(url)
for line in f:
    process(line)

just like you can with a file. Check the urllib, urllib2, and other related
modules (maybe httplib). Hope that helps.


Sorry. I forgot to mention that I have tried that. The data I want is
being stripped out when I access the URL via urllib. I CAN see the
data when I go into IE and do view source but when I use urllib the
site intentionally blanks out the information I want. For that reason,
I would like to get it using IE6 if I can. If there are other ways to
fake out the site, I would be interested in that also. I thought that
perhaps the site was detecting the fact that I was not querying it
using a browser. I tried putting that into the HTTP messages but
may not have done it right. At any rate couldn't get that to work. It
may be that the site is using cookies to be sure someone is not
getting the data. I haven't pursued that. Again that is another reason
I wanted to use IE6 (since I know it works). The data is on a site
whose service I subscribe to, but this particular information is
available to anyone who types in the URL (as long as they are
using a browser).
 
Michael Geary

Ralph said:
The data I want is being stripped out when I access the URL
via urllib. I CAN see the data when I go into IE and do view
source but when I use urllib the site intentionally blanks out
the information I want. For that reason, I would like to get it
using IE6 if I can. If there are other ways to fake out the site,
I would be interested in that also.

You may be able to get urllib or urllib2 to work using some of the other
tips in this thread, such as the user agent string. Or it may have to do
with cookies, in which case the ClientCookie module may be useful:

http://wwwsearch.sourceforge.net/ClientCookie/

If you do want to use IE, it's really easy. Let's assume you have an ie
object that you've gotten with:

import win32com.client
ie = win32com.client.Dispatch( 'InternetExplorer.Application' )

and you've navigated to your URL using ie.Navigate( url ), and you've waited
for Navigate to finish. Then, you can get the document with:

doc = ie.Document

From there, you can get to anything. If you want the entire HTML source,
it's:

doc.documentElement.outerHTML
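
Put together, the steps so far might look like the sketch below. It assumes Python 3 with pywin32 on Windows. The function name get_page_source and its dispatch, poll, and timeout parameters are hypothetical, added so the logic can be tried with a stand-in object where IE is not installed. The wait is a bare ie.Busy poll.

```python
import time


def get_page_source(url, dispatch=None, poll=0.1, timeout=60.0):
    """Navigate IE to url and return the HTML of the loaded document."""
    if dispatch is None:
        # Imported lazily: pywin32 is only available on Windows.
        from win32com.client import Dispatch as dispatch
    ie = dispatch('InternetExplorer.Application')
    try:
        ie.Navigate(url)
        deadline = time.monotonic() + timeout
        while ie.Busy:  # simple wait until IE reports it is no longer busy
            if time.monotonic() > deadline:
                raise TimeoutError("IE was still busy after %s seconds" % timeout)
            time.sleep(poll)
        return ie.Document.documentElement.outerHTML
    finally:
        ie.Quit()  # close the IE instance this function started
```

Note that outerHTML is IE's serialization of the parsed page, so it may differ in small ways (tag case, attribute quoting) from the raw text the server sent.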

Or better yet, you can use the IE object model to let IE do the work of
parsing the HTML for you. For example, suppose the document contains a form
named 'loginForm' with 'username' and 'password' fields, and you want to
fill in those two fields and submit the form. You could do it with:

form = doc.forms.loginForm
form.username.value = 'myname'      # set the field's value, as in JavaScript
form.password.value = 'mypassword'
form.submit()

Basically, you can use about the same code you'd use in JavaScript or Visual
Basic inside the web page.

Here's the MSDN reference for the InternetExplorer object:

http://msdn.microsoft.com/workshop/browser/webbrowser/reference/objects/internetexplorer.asp

And here's the reference for the document object:

http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/obj_document.asp

(Sorry about the long URLs; you know what to do.)

One other note: You probably already know about this, but after you do
the Navigate, you need to wait until IE has loaded the page. You can either
use the NavigateComplete2 event, or it may be easier to cheat a bit and use
a loop with time.sleep() and test the ie.Busy property. I like to wait until
ie.Busy is false and remains false for a couple of seconds, to avoid being
tripped up by redirects where Busy may go false momentarily and then become
true again during the redirect.
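
A minimal sketch of that "stays false for a couple of seconds" wait, assuming Python 3. The function name wait_until_idle and its settle, poll, and timeout parameters are hypothetical, and the clock and sleep arguments exist only so the loop can be exercised without a real IE object.

```python
import time


def wait_until_idle(ie, settle=2.0, poll=0.1, timeout=60.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Return True once ie.Busy has stayed false for `settle` seconds.

    Returns False if that has not happened within `timeout` seconds.
    """
    deadline = clock() + timeout
    idle_since = None  # time at which Busy was first seen false
    while clock() < deadline:
        if ie.Busy:
            idle_since = None  # a redirect kicked in; start over
        else:
            if idle_since is None:
                idle_since = clock()
            if clock() - idle_since >= settle:
                return True
        sleep(poll)
    return False
```

Call it right after ie.Navigate( url ), for example wait_until_idle(ie), and only touch ie.Document once it returns True.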

-Mike
 
Ralph A. Gable

Mike,
Thanks ever so much. That worked and helps tremendously.
Ralph A. Gable
 
Michael Geary

Ralph said:
Thanks ever so much. That worked and helps tremendously.

Great, glad to hear it, Ralph.

Internet Explorer's object model is really easy to work with once you know
how to get to it, and Python interfaces to it very nicely. I've been pleased
with how well it works out in projects I've done with it.

-Mike
 
calfdog

Ralph said:
I'm a newbie at this but I need to control MSIE6 using Python. I have
read the O'Reilly win32 python books and got some hints. But I need to
Navigate to a site (which I know how to do) and then I need to get at
the source code for that site inside Python (as when one used the
View|Source drop down window). Can anyone point me to some URLs that
would help out? Or just tell me how to do it? I would be very
grateful.


Ralph,

Check out P.A.M.I.E. (Python Automation Module for Internet Explorer).
It's a class file that lets you automate IE.

You can automate things like:
* Clicking an image, button, or link
* Entering text into fields on a form
* Selecting a list item
* Navigating to a site
etc.


The source is up on sourceforge.net; follow the links from
http://pamie.sourceforge.net.

If you check out the class file CPamie it should help you get going!
It comes with a test file and a small tutorial. Best of all, it works
and it's free!

Later
RLM
 
tutu

Ralph said:
Sorry. I forgot to mention that I have tried that. The data I want is
being stripped out when I access the URL via urllib.


Try something like this. It may be that the site does not like urllib's default
user agent, so try to pretend you are using IE:

import urllib2

class URLHandler(urllib2.HTTPRedirectHandler, urllib2.HTTPDefaultErrorHandler):
    pass

agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
request = urllib2.Request(url)  # url: the address of the page you want
request.add_header("User-Agent", agent)
opener = urllib2.build_opener(URLHandler())
opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
htm = opener.open(request)
html = htm.read()  # read the page before closing the opener
opener.close()
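
If the site also depends on cookies, which nothing in this thread confirms, a cookie-aware opener is one thing to try. This is a sketch assuming Python 3, where http.cookiejar and urllib.request.HTTPCookieProcessor are the standard-library route to cookie handling. The helper name build_browser_like_opener is hypothetical.

```python
import http.cookiejar
import urllib.request

IE6_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"


def build_browser_like_opener(agent=IE6_AGENT):
    """Return (opener, jar): an opener that sends `agent` and keeps cookies in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" header with the browser string.
    opener.addheaders = [("User-Agent", agent)]
    return opener, jar


opener, jar = build_browser_like_opener()
# html = opener.open(url).read()   # url: the page you want (not defined here)
```

Cookies set by one response are stored in jar and sent back on later requests made through the same opener, so a first request to the site's entry page followed by the data page mimics what a browser session would do.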

Good luck
 
Ralph A. Gable

I tried this but it did not work. From that I assume they are
using cookies. Since I am not handling them (using this method),
they figure that out and fill in the page with dashes where I get
data when processing the HTML through MSIE.
Thanks for your suggestion.

 
calfdog

Ralph said:
I'm a newbie at this but I need to control MSIE6 using Python. I have
read the O'Reilly win32 python books and got some hints. But I need to
Navigate to a site (which I know how to do) and then I need to get at
the source code for that site inside Python (as when one used the
View|Source drop down window). Can anyone point me to some URLs that
would help out? Or just tell me how to do it? I would be very
grateful.

TRY THIS!!!!

from win32com.client import DispatchEx
import time

def wait(ie):  # very important!!! you have to wait for each page to load
    "Given an IE object, wait until the object is ready for input."
    while ie.Busy:
        time.sleep(0.1)
    doc = ie.Document
    while doc.ReadyState != 'complete':
        time.sleep(0.1)
    return doc

def ClickLink(ie, mylink):
    for link in ie.Document.links:
        if link is None:
            break  # needed for browser bug
        if link.innerText == mylink:
            link.Click()
            return  # stop after the first matching link


# Here is what you need
ie = DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('www.python.org')

# Some extra
wait(ie)  # Very important: you must wait for the document to finish loading
ClickLink(ie, 'Search')


#later
#RLM
 
