Trying to make a spider using mechanize

tedpottel · Sep 8, 2008

Hi,

I can read the home page using the mechanize library. Is there a way to
load web pages using just filename.html instead of servername/
filename.html? A lot of the time the links contain only the file name.
I'm trying to read the link names and then visit those pages.

Here is the sample code I am using.

import ClientForm
import mechanize

#get home page
request = mechanize.Request("http://www.activetechconsulting.com")
response = mechanize.urlopen(request)
print response.read()

#sub page (this does not work)
request = mechanize.Request("service.html")
response = mechanize.urlopen(request)
print response.read()

-Ted

James Mills · Sep 8, 2008

Hi,

Perhaps you might want to try out a sample spider I wrote
and base your code on it?

See: http://hg.shortcircuit.net.au/index.wsgi/pymills/file/b9936ae2525c/examples/spider.py

cheers
James
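
For the relative-link part of the question, one possible approach is to resolve each link against the URL of the page it came from before requesting it. The sketch below is only an illustration, not code from either poster: it uses the standard library's urljoin and HTMLParser (Python 3 module names; in Python 2, urljoin lives in the urlparse module), and the LinkCollector class, the absolute_links helper, and the sample HTML are hypothetical names and data made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Base URL taken from the original post; every link is resolved against it.
BASE_URL = "http://www.activetechconsulting.com"


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def absolute_links(base_url, html):
    """Return the links found in html, resolved against base_url."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


# Made-up page content standing in for response.read().
sample_html = (
    '<a href="service.html">Services</a> '
    '<a href="/about/team.html">Team</a>'
)

links = absolute_links(BASE_URL, sample_html)
for link in links:
    print(link)
```

Each resulting absolute URL could then be passed to mechanize.Request in the same way as the home page URL in the original code.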
