George said:
Apparently, *you* don't understand what they're trying to tell you. It
roughly boils down to the following:
If we just step back from the brink for a moment and give the
questioner the benefit of the doubt - that the exercise merely involves
automating the kind of interactions that would otherwise require lots
of manual messing around piloting a browser, rather than performing
some kind of bulk "suck down" of an entire site's information - then it
is obviously possible to use the following techniques:
* Use a well-known mirroring or archiving tool such as wget.
* Use various testing tools, some of which are written in Python.
* Use urllib, urllib2 or httplib plus an HTML or XML parser in your
  own program (see the first sketch after this list).
* Automate a Web browser using some off-the-shelf program.
* Use various automation mechanisms provided by your environment
  (e.g. COM, DCOP), possibly with Python libraries (e.g. PAMIE [1],
  KPart Plugins [2]) - see the second sketch below.
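
For the urllib2-plus-parser route, here is a minimal sketch using only
the standard library: it fetches a page and collects the links found in
it. The URL is purely a placeholder for whatever page you actually need.

import urllib2
from HTMLParser import HTMLParser

class LinkExtractor(HTMLParser):

    "Collect the href attribute of each anchor element encountered."

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# The URL below is purely illustrative.
f = urllib2.urlopen("http://www.example.com/")
try:
    parser = LinkExtractor()
    parser.feed(f.read())
    parser.close()
finally:
    f.close()

for link in parser.links:
    print link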
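
And here is a rough sketch of the COM route on Windows, driving
Internet Explorer through the pywin32 extensions - essentially the
mechanism that PAMIE builds on, just without the friendlier API. It
assumes pywin32 is installed, and again the URL is only a placeholder.

import time
import win32com.client  # requires the pywin32 extensions

# Start Internet Explorer as a COM automation server.
ie = win32com.client.Dispatch("InternetExplorer.Application")
ie.Visible = True

# The URL below is purely illustrative.
ie.Navigate("http://www.example.com/")

# Wait for the browser to finish loading the page.
while ie.Busy:
    time.sleep(0.5)

# The rendered document's markup is now available for inspection.
print ie.Document.body.innerHTML

ie.Quit()
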
Various sites understandably forbid wget and friends as a rule, but
there are sometimes good reasons to automate a procedure involving lots
of data which would otherwise waste a huge amount of time if done
manually. Perhaps you might have mail residing
in a Webmail system which can't be extracted via any process other than
reading all the messages in a browser, for example, or perhaps your
favourite Internet applications don't provide decent shortcuts to the
information you need, instead believing that it's all about the
"experience": surfing around watching all the animated adverts.
Automation and related technologies can legitimately help users regain
control of their Internet-resident data and make better use of the
services around it.
Paul
[1]
http://pamie.sourceforge.net/
[2]
http://www.boddie.org.uk/python/kpartplugins.html