Is there a better way to search CPAN than search.cpan.org?

usenet

CPAN is great, but it has become quite large (and somewhat haphazardly
organized). The "search.cpan.org" interface has not changed
significantly (except for the temporary orange color) since I first
played around with Perl five years ago. This interface is rather
crude, apparently supporting only simple (but somewhat ordered) OR
searches, without support for notations often found in other search
engines, such as phrase quoting or "+".

The CPAN FAQs mention three other search methods. One is a broken link,
and the other two offer no advantages that I can see. I've played
around with Google's "Advanced Search" against this domain, but the
signal-to-noise ratio is often quite low.

There is often a great Perl module out there that does whatever I want,
but finding it can be really dicey unless I get lucky guessing the
keywords.

Is there a better way to search CPAN??? It seems like a vast treasure
without a map!
 
usenet

Gunnar said:

Yeah, I've played around with that (as I mentioned). But the results
have not been very good for me - I can't see how to restrict the search
to package descriptions (which is usually what I would want to do).
Google searches EVERYTHING, including source code, bug reports, POD,
etc - even archived versions of every package. When the search is run
against such a wide information base, keywords tend to occur and repeat
all over the place, and I can get dozens of (often irrelevant) hits on
search terms (because the search terms are usually very Perl-ish and
occur widely). If there is a way to restrict Google to search only
current module descriptions, I don't know what it is.

Gunnar also said:

And don't forget that you optionally can _browse_ per category.

True, and that is helpful sometimes, but I find that the categories
seem to be generally old and poorly maintained (and not always
intuitively arranged). For example, the category "File Handle
Input/Output" has only sixteen modules (plus two sub-categories, IO and
Log). For some reason, this category includes "Expect" (which I would
have thought would be under "Control Flow Utilities"), but does not
include MANY other packages, such as IO::All (the Swiss Army Knife of
filehandle I/O).
 
A. Sinan Unur

(e-mail address removed) wrote:
Yeah, I've played around with that (as I mentioned). But the results
have not been very good for me - I can't see how to restrict the
search to package descriptions (which is usually what I would want to
do). Google searches EVERYTHING, including source code, bug reports,
POD, etc - even archived versions of every package. When the search
is run against such a wide information base, keywords tend to occur
and repeat all over the place, and I can get dozens of (often
irrelevant) hits on search terms (because the search terms are usually
very Perl-ish and occur widely). If there is a way to restrict Google
to search only current module descriptions, I don't know what it is.

You can use "Advanced Search":

Under "Occurrences" ("Return results where my terms occur"), restrict
matches to the URL of the page.

http://www.google.com/search?hl=en&lr=&as_qdr=all&q=allinurl:+io+site%3Asearch.cpan.org&btnG=Search

or http://tinyurl.com/8qgqz

http://www.google.com/search?hl=en&lr=&as_qdr=all&q=allinurl%3A+www+site%3Asearch.cpan.org&btnG=Search

or http://tinyurl.com/c7zvq

Sinan
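
A minimal sketch of how a query URL like the ones above could be assembled programmatically. The function name and its parameters are hypothetical, and only the "hl" and "q" parameters are kept; the other parameters in the URLs above are left out on the assumption that they are optional.

```python
from urllib.parse import urlencode


def google_cpan_search_url(terms, operator="allinurl:", site="search.cpan.org"):
    """Build a Google search URL applying `operator` to `terms` on one site.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    query = " ".join([operator, *terms, f"site:{site}"])
    # urlencode escapes ':' as %3A and turns spaces into '+'.
    return "http://www.google.com/search?" + urlencode({"hl": "en", "q": query})


if __name__ == "__main__":
    print(google_cpan_search_url(["io"]))
```

Passing operator="allintext:" instead would restrict matches to page text rather than the URL.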
 
Dr.Ruud

(e-mail address removed) wrote:
If there is a way to
restrict Google to search only current module descriptions, I don't
know what it is.

Include in your search texts that occur only on such pages, and
exclude (with a leading minus) texts that never occur on them.
Example:

allintext: Module.version Annotate.this.POD -Latest.release uri fetch site:search.cpan.org

where 'uri fetch' is of course the dynamic part.
Leave the -Latest.release out to also find older versions.


Or look for a URL with /README in it:

allinurl: /README io all site:search.cpan.org

where 'io all' is the dynamic part. But that will also find older
versions.
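
The two query patterns above can be sketched as small string builders. The function names and the current_only flag are hypothetical; the fixed marker texts are copied from the examples above and depend on how search.cpan.org pages happen to be worded.

```python
def module_page_query(terms, current_only=True, site="search.cpan.org"):
    """Query for module pages: fixed marker texts plus the dynamic terms."""
    parts = ["allintext:", "Module.version", "Annotate.this.POD"]
    if current_only:
        # Exclude pages that link to a newer release, i.e. older versions.
        parts.append("-Latest.release")
    parts.extend(terms)
    parts.append(f"site:{site}")
    return " ".join(parts)


def readme_query(terms, site="search.cpan.org"):
    """Query for URLs containing /README; this also finds older versions."""
    return " ".join(["allinurl:", "/README", *terms, f"site:{site}"])


if __name__ == "__main__":
    print(module_page_query(["uri", "fetch"]))
    print(readme_query(["io", "all"]))
```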
 
Randy Kobes

Gunnar Hjalmarsson wrote: [ ... ]
And don't forget that you optionally can _browse_ per category.

True, and that is helpful sometimes, but I find that the categories
seem to be generally old and poorly maintained (and not always
intuitively arranged). For example, the category "File Handle
Input/Output" has only sixteen modules (plus two sub-categories, IO and
Log). For some reason, this category includes "Expect" (which I would
have thought would be under "Control Flow Utilities"), but does not
include MANY other packages, such as IO::All (the Swiss Army Knife of
filehandle I/O).

Our CPAN search engine at
http://cpan.uwinnipeg.ca/htdocs/faqs/cpan-search.html is not as
extensive or as full-featured as search.cpan.org, but one thing I've
implemented there is automatic categorization, using AI::Categorizer
(to within some confidence level), of modules that don't have
categories supplied by PAUSE. For example, the "File Handle
Input/Output" category:
http://cpan.uwinnipeg.ca/chapter/File_Handle_Input_Output
contains the IO subcategory
http://cpan.uwinnipeg.ca/chapter/File_Handle_Input_Output/IO
which does include IO-All.
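
As a rough illustration of the "to within some confidence level" idea, here is a sketch of a fallback rule: keep the PAUSE-supplied category when there is one, otherwise accept the classifier's best guess only if its score clears a threshold. This is not the site's actual code; the function name, the score dictionary, and the 0.8 threshold are all assumptions, and the scores stand in for whatever a categorizer such as AI::Categorizer would report.

```python
def choose_category(pause_category, predictions, min_confidence=0.8):
    """Pick a category for a module.

    pause_category: category supplied via PAUSE, or None if absent.
    predictions: dict mapping category name to a classifier score in [0, 1]
                 (hypothetical stand-in for a categorizer's output).
    """
    if pause_category is not None:
        return pause_category  # author-supplied category always wins
    if not predictions:
        return None
    best, score = max(predictions.items(), key=lambda item: item[1])
    # Leave the module uncategorized rather than guess with low confidence.
    return best if score >= min_confidence else None


if __name__ == "__main__":
    scores = {"File_Handle_Input_Output": 0.91, "Networking": 0.04}
    print(choose_category(None, scores))
```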

The way these categories are set up is perhaps worth explaining.
When an author uploads a package to PAUSE, there's an option
available to associate registered modules with one of the
major categories. In the above example, IO::All has been associated
with the "File_Handle_Input_Output" category, and so the IO-All
package appears in the File_Handle_Input_Output/IO subcategory.
Similarly, HTTP::Request of libwww-perl is associated with the
"World_Wide_Web_HTML_HTTP_CGI" category, so that libwww-perl
appears in the "World_Wide_Web_HTML_HTTP_CGI/HTTP" subcategory.
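
The two examples above suggest a simple rule, sketched below under the assumption that the subcategory is the module's top-level namespace (IO for IO::All, HTTP for HTTP::Request). The function names and the (module, distribution, category) record layout are hypothetical.

```python
from collections import defaultdict


def chapter_path(module, category):
    """Return 'category/subcategory', taking the subcategory from the
    module's top-level namespace (an assumption based on the examples)."""
    top_level = module.split("::")[0]
    return f"{category}/{top_level}"


def build_chapter_index(registrations):
    """Map each chapter path to the sorted distributions that appear in it.

    registrations: iterable of (module, distribution, category) tuples.
    """
    index = defaultdict(set)
    for module, distribution, category in registrations:
        index[chapter_path(module, category)].add(distribution)
    return {chapter: sorted(dists) for chapter, dists in index.items()}


if __name__ == "__main__":
    registrations = [
        ("IO::All", "IO-All", "File_Handle_Input_Output"),
        ("HTTP::Request", "libwww-perl", "World_Wide_Web_HTML_HTTP_CGI"),
    ]
    for chapter, dists in build_chapter_index(registrations).items():
        print(chapter, dists)
```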

There have been discussions about adding a keywords field to the
META.yml file that recent distributions carry, in order to
provide better search results, but the details of this haven't
been finalized.
 
