html + javascript automations = [mechanize + ?? ] or something else?

John

I have to write a spider for a webpage that uses html + javascript. I
had it written using mechanize, but the authors of the webpage now use
a lot of javascript, and mechanize can no longer do the job.
Does anyone know how I could automate my spider to understand
javascript? Is there a way to control a browser like firefox from
python itself? How about IE? That way, we would not have to go through
something like mechanize.

Thanks in advance for your help/comments,
--j
 
John

I am curious about the webbrowser module. I can open up firefox
using webbrowser.open(), but can one control it? Say enter a
login / passwd on a webpage? Send keystrokes to firefox?
mouse clicks?

Thanks,
--j
 
Benjamin Niemann

Hello,

John said:
I am curious about the webbrowser module. I can open up firefox
using webbrowser.open(), but can one control it? Say enter a
login / passwd on a webpage? Send keystrokes to firefox?
mouse clicks?

Not with the webbrowser module - it can only launch a browser.

On the website of mechanize you will also find DOMForm
<http://wwwsearch.sourceforge.net/DOMForm/>, which is a webscraper with
basic JS support (using the SpiderMonkey engine from the Mozilla project).
But note that DOMForm is in an early state and is no longer developed
(according to the site; I have never used it myself).

You could try to script IE (perhaps also FF, dunno..) using COM. This can be
done using the pywin32 module <https://sourceforge.net/projects/pywin32/>.
How this is done in detail is a windows issue. You may find help and
documentation in win specific group/mailing list, msdn, ... You can usually
translate the COM calls from VB, C#, ... quite directly to Python.
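
As a rough illustration of how directly such COM calls carry over to Python, here is a minimal sketch that starts IE, loads a page and waits for it to finish. It assumes pywin32 is installed on Windows; the helper name open_in_ie and its dispatch/timeout/poll parameters are made up for this example, while Visible, Navigate, Busy and ReadyState are members of the InternetExplorer object documented on MSDN.

```python
import time


def open_in_ie(url, dispatch=None, timeout=30.0, poll=0.1):
    """Start Internet Explorer through COM and load `url` (hypothetical helper)."""
    if dispatch is None:
        # pywin32 only exists on Windows, so import it lazily.
        from win32com.client import Dispatch as dispatch
    ie = dispatch("InternetExplorer.Application")
    ie.Visible = True  # show the window; set to False to keep it hidden
    ie.Navigate(url)
    deadline = time.time() + timeout
    # Busy and ReadyState are properties of the InternetExplorer object;
    # READYSTATE_COMPLETE has the value 4.
    while ie.Busy or ie.ReadyState != 4:
        if time.time() > deadline:
            raise RuntimeError("page did not finish loading: %s" % url)
        time.sleep(poll)
    return ie


# Usage on a Windows machine with pywin32 installed:
# ie = open_in_ie("http://plone.org")
```

The dispatch argument is only there so the function can be exercised without a real browser; in normal use it is left at its default.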


HTH
 
Andrey Khavryuchenko

John,

"J" == John wrote:

J> I have to write a spider for a webpage that uses html + javascript. I
J> had it written using mechanize but the authors of the webpage now use a
J> lot of javascript. Mechanize can no longer do the job. Does anyone
J> know how I could automate my spider to understand javascript? Is there
J> a way to control a browser like firefox from python itself? How about
J> IE? That way, we do not have to go through something like mechanize?

To my knowledge, there is no way to test javascript other than to fire
up a browser.

So, you might check Selenium (http://www.openqa.org/selenium/) and its
python module.
 
Diez B. Roggisch

Andrey said:
To my knowledge, there is no way to test javascript other than to fire
up a browser.

So, you might check Selenium (http://www.openqa.org/selenium/) and its
python module.

No use in that. To be remote-controlled by python, selenium must be run
on the server side itself, due to JS security model restrictions.

Diez
 
Duncan Booth

John said:
Is there a way
to control a browser like firefox from python itself? How about IE?

IE is easy enough to control and you have full access to the DOM:
0000C05BAE0B}', 0, 1, 1)
<module 'win32com.gen_py.EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1' from
'C:\Python25\lib\site-packages\win32com\gen_py\EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1.py'>

['CLSID', 'ClientToWindow', 'ExecWB', 'GetProperty', 'GoBack', 'GoForward',
'GoHome', 'GoSearch', 'Navigate', 'Navigate2', 'PutProperty',
'QueryStatusWB', 'Quit', 'Refresh', 'Refresh2', 'ShowBrowserBar', 'Stop',
'_ApplyTypes_', '__call__', '__cmp__', '__doc__', '__getattr__',
'__init__', '__int__', '__module__', '__repr__', '__setattr__', '__str__',
'__unicode__', '_get_good_object_', '_get_good_single_object_', '_oleobj_',
'_prop_map_get_', '_prop_map_put_', 'coclass_clsid']

<DT class=portletHeader><A class="feedButton link-plain"
href="feed://plone.org/news/newslisting/RSS"><IMG title="RSS subscription
feed for news items" alt=RSS src="http://plone.org/rss.gif"> </A><A
href="http://plone.org/news">News</A> </DT>

.... and so on ...


See
http://msdn.microsoft.com/workshop/browser/webbrowser/reference/objects/internetexplorer.asp
for the documentation.
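
For readers who want to try the DOM access described above, here is a small sketch of reading one element's HTML from a page that has already loaded. It is not the session from this post: the helper name element_html is made up, the element id is whatever the caller supplies, and ie is assumed to be an InternetExplorer.Application COM object (for example one created through pywin32).

```python
def element_html(ie, element_id):
    """Return the innerHTML of one element, or None if the page has no such id."""
    doc = ie.Document  # DOM document of the currently loaded page
    node = doc.getElementById(element_id)
    if node is None:
        return None
    return node.innerHTML


# Usage, assuming `ie` has finished loading a page:
# print(element_html(ie, "some-element-id"))
```

Anything reachable from the document object in javascript (links, forms, tables) can be walked the same way.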
 
Andrey Khavryuchenko

Diez,


DBR> No use in that. To be remote-controlled by python, selenium must be run
DBR> on the server side itself, due to JS security model restrictions.

Sorry, I missed the word 'spider' in the original post.
 
ina

John said:
I have to write a spider for a webpage that uses html + javascript. I
had it written using mechanize, but the authors of the webpage now use
a lot of javascript, and mechanize can no longer do the job.
Does anyone know how I could automate my spider to understand
javascript? Is there a way to control a browser like firefox from
python itself? How about IE? That way, we would not have to go through
something like mechanize.

Thanks in advance for your help/comments,
--j

You want pamie, iec or ishybrowser. Pamie is probably the best choice
since it gets patches and updates on a regular basis.

http://pamie.sourceforge.net/
 
John

I tried to install pamie (but I have mostly used python on cygwin on
windows). The section "What will you need to run PAMIE" says I will
need "Mark Hammonds Win32 All", which I cannot find. Can anyone tell me
how to install PAMIE? Do I need a python for windows that is different
from cygwin's python?

Thanks,
--j
 
John

My python2.5 installation on windows did not come with "win32com".
How do I install/get this module for windows?

Thanks,
--j

 
Gabriel Genellina

John said:
My python2.5 installation on windows did not come with "win32com".
How do I install/get this module for windows?

Look for the pywin32 package at sourceforge.net
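
Since several interpreters can live on one machine (cygwin's python, the python.org build, ActiveState), it can help to check which one is running and whether it can see win32com. This is a minimal sketch: has_module is a made-up helper, and importlib.util.find_spec needs a current Python 3 rather than the 2.5 discussed in this thread.

```python
import importlib.util
import sys


def has_module(name):
    """Report whether `name` can be located, without actually importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which interpreter is running and whether pywin32 is visible to it.
    print("interpreter:", sys.executable)
    print("win32com available:", has_module("win32com"))
```

If the second line prints False, the pywin32 package is not installed for that particular interpreter.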
 
John

I tried it; it didn't work with the python25 distribution msi file that
is on python.org, but activestate python worked. Now I can open IE
using COM. What I am trying to figure out is how to click an x,y
coordinate on a page in IE automatically using COM. How about typing
something automatically? Any ideas?

Thanks,
--j
 
Duncan Booth

John said:
I tried it; it didn't work with the python25 distribution msi file that
is on python.org, but activestate python worked. Now I can open IE
using COM. What I am trying to figure out is how to click an x,y
coordinate on a page in IE automatically using COM. How about typing
something automatically? Any ideas?

Don't think about clicking a coordinate or typing something; think about
the actions on the page. e.g. to fill in a field on a form you'll want
something like:

ie.document.forms[formname][fieldname].value = 'whatever'

to click a button call its click method e.g.

submit = ie.document.forms[0]['submit']
submit.focus()
submit.click()

Check out the documentation at msdn.microsoft.com for the application,
document, form etc. objects. Generally speaking, anything you could have
done through javascript you should be able to do through automation, plus
a few other things that javascript might have blocked for security
reasons.
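
The two snippets above could be wrapped into one helper along these lines. This is only a sketch: fill_and_click and its parameters are made up, and it relies on the same ie.document.forms[...][...] access pattern shown above, with ie assumed to be an IE automation object whose page has finished loading.

```python
def fill_and_click(ie, form, values, button="submit"):
    """Set form fields by name, then focus and click a button (hypothetical helper).

    form   -- form name or index, as used in ie.document.forms[form]
    values -- mapping of field name -> text to put in that field
    button -- name of the control to click once the fields are filled
    """
    target = ie.document.forms[form]
    for field, value in values.items():
        target[field].value = value
    control = target[button]
    control.focus()
    control.click()


# Usage; the form and field names here are placeholders:
# fill_and_click(ie, "login", {"user": "john", "passwd": "secret"})
```

Working with named fields rather than screen coordinates keeps the script independent of window size and page layout.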
 
