Suggestion For Useful Script -- Google Groups Search and Archive

EdwardATeller

The search function on Google Groups was recently broken. More info
here:

http://groups.google.com/group/Is-S...27893/4ffc98ca7b9eaca6?hl=en#4ffc98ca7b9eaca6

Made me realize how important this is. I thought someone with way
more talent than me might write a Perl script that takes as input a
Google Groups search for oranges (for example):

http://groups.google.com/groups/search?qt_s=1&q=oranges

and returns most of the posts found as a series of linked HTML
documents.

Seems like a non-trivial problem, but maybe it's simple to one of you
Perl gods. Based on the level of concern when the search function
went away, I'd say people would be interested in archiving some of
their favorite searches. I know I am. Thanks for taking the time to
read this post.
 
Randal L. Schwartz

EdwardATeller> Made me realize how important this is. I thought someone with
EdwardATeller> way more talent than me might write a Perl script that takes as
EdwardATeller> input a Google Groups search for oranges (for example):

EdwardATeller> http://groups.google.com/groups/search?qt_s=1&q=oranges

EdwardATeller> and returns most of the posts found as a series of linked HTML
EdwardATeller> documents.

You're not allowed to scrape the HTML[1]. And it looks like
<http://code.google.com/apis/ajaxsearch/documentation/> doesn't (currently)
have a Google Groups searching component.

So, it's not a matter of being talented. It's a matter of respecting Google's
requirement that you not act as a robot when you hit their site, just as they
respect the robots.txt you put up on yours. It's about ethics.

print "Just another Perl hacker,"; # the original

[1] Section 5.3 of <http://www.google.com/accounts/TOS> says:

5.3 You agree not to access (or attempt to access) any of the Services by
any means other than through the interface that is provided by Google,
unless you have been specifically allowed to do so in a separate agreement
with Google. You specifically agree not to access (or attempt to access)
any of the Services through any automated means (including use of scripts
or web crawlers) and shall ensure that you comply with the instructions
set out in any robots.txt file present on the Services.
 
Jürgen Exner

EdwardATeller said:
The search function on Google Groups was recently broken. More info
here:

http://groups.google.com/group/Is-S...27893/4ffc98ca7b9eaca6?hl=en#4ffc98ca7b9eaca6

Made me realize how important this is. I thought someone with way
more talent than me might write a Perl script that takes as input a
Google Groups search for oranges (for example):

http://groups.google.com/groups/search?qt_s=1&q=oranges

and returns most of the posts found as a series of linked HTML
documents.

Should be fairly simple by using LWP to get the page with the search
results, then one of the HTML parser modules to extract the links
(mostly copy-and-paste; AFAIR that task is even used as an example in
the documentation), and LWP again to get the target pages.

jue
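
For anyone who wants to see the shape of that approach, here is a minimal sketch, assuming the LWP::UserAgent and HTML::LinkExtor modules from CPAN are installed. The helper names (fetch_page, extract_links), the example.com URLs, and the sample page are made up for illustration; nothing here is specific to Google Groups, and it should only be pointed at sites whose terms allow automated access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Fetch one page and return its decoded HTML; die on an HTTP error.
# (Hypothetical helper name, not part of LWP itself.)
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Return the absolute href of every <a> tag in $html, resolved against $base.
# (Hypothetical helper name.)
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # Stringify: with a base URL the values arrive as URI objects.
            push @links, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Made-up sample page standing in for a page of search results.
my $sample_base = 'http://example.com/search';
my $sample_html = <<'HTML';
<html><body>
<a href="/post/1">First post</a>
<a href="post/2">Second post</a>
<img src="/logo.png">
</body></html>
HTML

# With a URL argument, fetch that page; otherwise use the sample above.
my ($base, $html) = @ARGV
    ? ($ARGV[0], fetch_page($ARGV[0]))
    : ($sample_base, $sample_html);

# Each link printed here could in turn be passed to fetch_page() and saved.
print "$_\n" for extract_links($html, $base);
```

Run without arguments, it only parses the built-in sample and touches no network. The last step described above (LWP again for the target pages) would be a loop calling fetch_page on each extracted link and writing the result to a file.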
 
