Chris Uppal
nowwho said:
While the legal information is handy and can (more than likely will) be
included in the report, are there any suggestions on how to tackle the
coding of the problem, or suggestions as to where I can look for further
information?
Unfortunately, it appears that Google suspended their Search API last month
(http://code.google.com/apis/soapsearch/), so you will probably have to use
some sort of screen scraping.
If you want to do it in Java (rather than, say, by using command-line tools
such as wget or curl) then you'll need an HTTP client package. Java comes with
one (start with java.net.URL), but it has been said here that Google blocks
access via that, so you may be better off using a different, and more general,
package such as the Jakarta Commons HttpClient:
http://jakarta.apache.org/commons/httpclient/
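For the download step, here is a minimal sketch that uses only the classes that ship with Java. The class name SearchFetcher, the helper buildSearchUrl, the search URL format and the User-Agent string are all my own assumptions for illustration. It is also only a guess that the reported blocking is tied to the default "Java/<version>" User-Agent header; I have not checked that, and the same structure would apply with a separate HTTP client package.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

final class SearchFetcher {

    // Hypothetical helper: builds a search URL for the given query text.
    // The URL format is an assumption, not something taken from an official API.
    static String buildSearchUrl(String query) {
        try {
            return "http://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Downloads the page at the given address and returns its body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        // Assumption: replacing the default User-Agent may avoid the reported block.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; example-fetcher)");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) { // assumes a UTF-8 page
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = fetch(buildSearchUrl("java html parser"));
        System.out.println(page.length() + " characters downloaded");
    }
}
```

Keeping the URL construction separate from the download makes it easy to check the query encoding without touching the network.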
Then, once you have worked out how to download data, you will need to parse it
to find the links you want. Parsing HTML with anything like reliability is not
easy (but you may not need much reliability in this case); you may find this
page of HTML parsers useful:
http://www.java-source.net/open-source/html-parsers
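If low reliability really is acceptable, a regular expression may be enough to pull out the links. The following is a rough sketch rather than a real HTML parser: the class name LinkExtractor and its method are hypothetical, and it only recognises quoted href attributes inside <a> tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkExtractor {

    // Matches an <a ...> tag with a quoted href attribute, ignoring case.
    // Group 1 is the quote character, group 2 is the attribute value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values of all anchor tags found, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.example/\">A</a> <a name=\"top\">T</a>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

This does not handle unquoted attributes, does not decode entities such as &amp;, and does not resolve relative links, which is where one of the parsers listed on that page would earn its keep.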
-- chris