Is there a better way to search CPAN than search.cpan.org?

Discussion in 'Perl Misc' started by usenet@DavidFilmer.com, Oct 11, 2005.

  1. Guest

    CPAN is great, but it has become quite large (and somewhat haphazardly
    organized). The "search.cpan.org" interface has not changed
    significantly (except for the temporary orange color) since I first
    played around with Perl five years ago. This interface is rather
    crude, apparently supporting only simple (but somewhat ordered) OR
    searches, without support for notations often found in other search
    engines, such as phrase quoting or "+".

    The CPAN FAQs mention three other search methods. One is a broken link,
    and the other two offer no advantages that I can see. I've played
    around with Google's "Advanced Search" against this domain, but the
    signal-to-noise ratio is often quite low.

    There is often a great Perl module out there that does whatever I want,
    but finding it can be really dicey unless I get lucky guessing the
    keywords.

    Is there a better way to search CPAN??? It seems like a vast treasure
    without a map!
     
    , Oct 11, 2005
    #1

  2. Gunnar Hjalmarsson, Oct 11, 2005
    #2

  3. Guest

    Gunnar Hjalmarsson wrote:
    > One way:
    > http://www.google.com/search?q=site:search.cpan.org whatever


    Yeah, I've played around with that (as I mentioned). But the results
    have not been very good for me - I can't see how to restrict the search
    to package descriptions (which is usually what I would want to do).
    Google searches EVERYTHING, including source code, bug reports, POD,
    etc - even archived versions of every package. When the search is run
    against such a wide information base, keywords tend to occur and repeat
    all over the place, and I can get dozens of (often irrelevant) hits on
    search terms (because the search terms are usually very Perl-ish and
    occur widely). If there is a way to restrict Google to search only
    current module descriptions, I don't know what it is.

    >
    > And don't forget that you optionally can _browse_ per category.
    >


    True, and that is helpful sometimes, but I find that the categories
    seem to be generally old and poorly maintained (and not always
    intuitively arranged). For example, the category "File Handle
    Input/Output" has only sixteen modules (plus two sub-categories, IO and
    Log). For some reason, this category includes "Expect" (which I would
    have thought would be under "Control Flow Utilities"), but does not
    include MANY other packages, such as IO::All (the Swiss Army Knife of
    filehandle I/O).
     
    , Oct 11, 2005
    #3
  4. A. Sinan Unur

    wrote:

    > Gunnar Hjalmarsson wrote:
    >> One way:
    >> http://www.google.com/search?q=site:search.cpan.org whatever

    >
    > Yeah, I've played around with that (as I mentioned). But the results
    > have not been very good for me - I can't see how to restrict the
    > search to package descriptions (which is usually what I would want to
    > do). Google searches EVERYTHING, including source code, bug reports,
    > POD, etc - even archived versions of every package. When the search
    > is run against such a wide information base, keywords tend to occur
    > and repeat all over the place, and I can get dozens of (often
    > irrelevant) hits on search terms (because the search terms are usually
    > very Perl-ish and occur widely). If there is a way to restrict Google
    > to search only current module descriptions, I don't know what it is.


    You can use "Advanced Search". Under "Occurrences" ("Return results
    where my terms occur"), restrict matches to the URL:

    http://www.google.com/search?hl=en&lr=&as_qdr=all&q=allinurl:+io+site%3Asearch.cpan.org&btnG=Search

    or http://tinyurl.com/8qgqz

    http://www.google.com/search?hl=en&lr=&as_qdr=all&q=allinurl%3A+www+site%3Asearch.cpan.org&btnG=Search

    or http://tinyurl.com/c7zvq
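
    For anyone who wants to generate such URLs rather than click through
    the form, here is a rough Python sketch. It only sets the q parameter
    (the hl, lr, as_qdr and btnG parameters in the URLs above are left
    out), and the function name google_cpan_url is made up for
    illustration.

```python
from urllib.parse import urlencode

def google_cpan_url(terms, site="search.cpan.org"):
    """Build a Google URL whose query requires every term to occur in the page URL."""
    # "allinurl:" restricts matches to the URL; "site:" restricts the domain.
    query = "allinurl: " + " ".join(terms) + " site:" + site
    # urlencode escapes ":" as %3A and spaces as "+".
    return "http://www.google.com/search?" + urlencode({"q": query})

print(google_cpan_url(["io"]))
```

    Passing several terms, e.g. google_cpan_url(["io", "all"]), should
    require all of them to appear in the URL.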

    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Oct 11, 2005
    #4
  5. Dr.Ruud Guest

    wrote:

    > If there is a way to
    > restrict Google to search only current module descriptions, I don't
    > know what it is.


    Include search text that occurs only on such pages, and exclude
    text that occurs only on other pages.
    Example:

    allintext: Module.version Annotate.this.POD -Latest.release uri fetch
    site:search.cpan.org

    where 'uri fetch' is of course the dynamic part.
    Leave the -Latest.release out to also find older versions.


    Or look for a URL with /README in it:

    allinurl: /README io all site:search.cpan.org

    where 'io all' is the dynamic part. But that will also find older
    versions.
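
    The two query patterns above can be put together mechanically. A
    small Python sketch, assuming the marker phrases stay as they are on
    the search.cpan.org pages today; the function names are hypothetical
    and only the query strings are built, not the Google URL.

```python
def description_query(terms, current_only=True, site="search.cpan.org"):
    """Query for module pages, identified by text that only those pages carry."""
    parts = ["allintext:", "Module.version", "Annotate.this.POD"]
    if current_only:
        # Pages of older versions link to the latest release; exclude them.
        parts.append("-Latest.release")
    parts.extend(terms)              # the dynamic part
    parts.append("site:" + site)
    return " ".join(parts)

def readme_query(terms, site="search.cpan.org"):
    """Query for README pages by URL; this also finds older versions."""
    return " ".join(["allinurl:", "/README", *terms, "site:" + site])

print(description_query(["uri", "fetch"]))
print(readme_query(["io", "all"]))
```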


    --
    Affijn, Ruud <http://www.pandora.com/?sc=sh770781&cmd=tunermini>

    "Gewoon is een tijger."
     
    Dr.Ruud, Oct 12, 2005
    #5
  6. Randy Kobes Guest

    wrote:
    > Gunnar Hjalmarsson wrote:

    [ ... ]
    >
    >>And don't forget that you optionally can _browse_ per category.
    >>

    > True, and that is helpful sometimes, but I find that the categories
    > seem to be generally old and poorly maintained (and not always
    > intuitively arranged). For example, the category "File Handle
    > Input/Output" has only sixteen modules (plus two sub-categories, IO and
    > Log). For some reason, this category includes "Expect" (which I would
    > have thought would be under "Control Flow Utilities"), but does not
    > include MANY other packages, such as IO::All (the Swiss Army Knife of
    > filehandle I/O).


    Although it is not as extensive or as full-featured as search.cpan.org,
    our CPAN search engine at
    http://cpan.uwinnipeg.ca/htdocs/faqs/cpan-search.html implements
    automatic categorization, using AI::Categorizer (to within
    some confidence level), of modules that don't have categories
    supplied by PAUSE. For example, the "File Handle Input/Output"
    category:
    http://cpan.uwinnipeg.ca/chapter/File_Handle_Input_Output
    contains the IO subcategory
    http://cpan.uwinnipeg.ca/chapter/File_Handle_Input_Output/IO
    which does include IO-All.

    The way these categories are set up is perhaps worth explaining.
    When an author uploads a package to PAUSE, there's an option
    available to associate registered modules with one of the
    major categories. In the above example, IO::All has been associated
    with the "File_Handle_Input_Output" category, and so the IO-All
    package appears in the File_Handle_Input_Output/IO subcategory.
    Similarly, HTTP::Request of libwww-perl is associated with the
    "World_Wide_Web_HTML_HTTP_CGI" category, so that libwww-perl
    appears in the "World_Wide_Web_HTML_HTTP_CGI/HTTP" subcategory.
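
    As a rough illustration only, the two examples above are consistent
    with a rule of "category, then the top-level namespace of the
    registered module". The Python sketch below encodes that reading; it
    is inferred from these two cases, is not the site's actual code, and
    the function name chapter_path is hypothetical.

```python
def chapter_path(module, category):
    """Guess the chapter path for a module registered under a PAUSE category."""
    # Assumption: the subcategory is the first component of the module name.
    top_level = module.split("::")[0]
    return category + "/" + top_level

print(chapter_path("IO::All", "File_Handle_Input_Output"))
print(chapter_path("HTTP::Request", "World_Wide_Web_HTML_HTTP_CGI"))
```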

    There have been discussions of adding a keywords field to the
    META.yml file that recent distributions carry, in order to
    provide better search results, but the details haven't been
    finalized.

    --
    best regards,
    randy kobes
     
    Randy Kobes, Oct 12, 2005
    #6
