"Standard" Full Text Search Engine

Discussion in 'Python' started by Martin Marcher, Oct 26, 2007.

  1. Hello,

    Is there something like a standard full text search engine?

    I'm thinking of the Python equivalent of what Lucene is for Java or
    Ferret is for Rails. Preferably something that isn't an exact clone of
    one of those, but rather something Python-friendly in terms of the API
    it provides.

    Things I'd like to have:

    * different languages are supported (it seems most FTSs only handle English)
    * I'd like to be able to provide an identifier (if I index files in
    the filesystem that would be the filename, or an ID if it lives in a
    database, or whatever applies)
    * I'd like to pass it just some (user-defined) keywords with content,
    plus the actual content (as a string, a list of strings or whatever),
    and to retrieve the results by searching by keyword
    * it should be possible to assign something like a priority to different
    fields (like field: title(priority=10, content="My Draft"),
    keywords(priority=50, list_of_keywords))
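A minimal pure-Python sketch of the API wished for above: caller-supplied document IDs, named fields that accept either a string or a list of strings, and a per-field priority that scales the score. The class name `FieldIndex` and the scoring rule (a matching term adds its field's priority to the document's score) are hypothetical illustrations, not the API of any existing engine. The tokenizer is Unicode-aware (NFKC normalization plus case folding) but does no stemming or stop-word removal, so "supports other languages" only in the weakest sense.

```python
import re
import unicodedata
from collections import defaultdict

# Unicode-aware tokenizer: normalize, case-fold, then split on word characters.
_WORD = re.compile(r"\w+")


def tokenize(text):
    return _WORD.findall(unicodedata.normalize("NFKC", text).casefold())


class FieldIndex:
    """Hypothetical in-memory index with caller-chosen IDs and per-field priorities."""

    def __init__(self, priorities):
        self.priorities = priorities  # e.g. {"title": 10, "keywords": 50}
        # term -> doc_id -> accumulated score
        self.postings = defaultdict(lambda: defaultdict(float))

    def add(self, doc_id, **fields):
        for name, content in fields.items():
            # Accept a single string or a list of strings per field.
            texts = [content] if isinstance(content, str) else content
            weight = self.priorities.get(name, 1)  # unlisted fields default to 1
            for text in texts:
                for term in tokenize(text):
                    self.postings[term][doc_id] += weight

    def search(self, query):
        scores = defaultdict(float)
        for term in tokenize(query):
            # .get() avoids creating empty entries for unknown terms.
            for doc_id, score in self.postings.get(term, {}).items():
                scores[doc_id] += score
        # Highest score first; ties broken by ID for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, after `idx.add("draft.txt", title="My Draft", keywords=["python", "search"])` and `idx.add("notes.txt", title="Python notes")` on an index built with `{"title": 10, "keywords": 50}`, searching for `"python"` ranks `draft.txt` first because the keyword field carries the higher priority.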

    Unnecessary:

    * built-in parsing of different files

    The "standard" I'm referring to would be something with a large and
    active user base. Just as WSGI is _the_ thing to refer to when doing
    webapps, it should be something where $FTS-Engine is _the_ engine to
    refer to.

    Any hints?

    --
    http://noneisyours.marcher.name
    http://feeds.feedburner.com/NoneIsYours
    Martin Marcher, Oct 26, 2007
    #1

  2. Martin Marcher wrote:

    > Hello,
    >
    > is there something like a standard full text search engine?
    > any hints?


    There are several Python Lucene implementations available, and recently
    a project called NUCULAR was announced here. There is also ZCatalog, the
    full-text-indexing technology used in Zope, which should be usable
    outside of Zope.

    But "the" search technology doesn't exist. I personally would most
    probably go for the Lucene-based stuff, because then you may also get
    auxiliary tools written in Java.

    Diez
    Diez B. Roggisch, Oct 26, 2007
    #2

  3. Martin Marcher wrote:

    > Hello,
    >
    > is there something like a standard full text search engine?
    > any hints?
    >


    I'm using swish-e (swish-e.org) for all my indexing needs. I'm not sure
    whether there's a Python binding available; I run swish-e as an external
    executable and live quite happily with that.
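Driving swish-e as an external executable takes only a few lines of Python. This sketch assumes swish-e's `-f` (index file) and `-w` (search words) options and its default text output: lines starting with `#` are headers, each hit reads `rank path "title" size`, a lone `.` ends the results, and problems (including "no results") are reported on `err:` lines. Check these against the documentation of your installed version; the function names are hypothetical.

```python
import re
import subprocess

# Assumed default swish-e result line: rank, path, "title", size in bytes.
_RESULT = re.compile(r'^(\d+) (.+?) "(.*)" (\d+)$')


def swish_command(index_file, query):
    # -f selects the index file to search, -w passes the search words.
    return ["swish-e", "-f", index_file, "-w", query]


def parse_swish_output(text):
    """Turn swish-e's default text output into (rank, path, title, size) tuples."""
    results = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank or header/comment line
        if line == ".":
            break  # end-of-results marker
        if line.startswith("err:"):
            # Assumed: "err: no results" means an empty result set.
            if "no results" in line:
                return []
            raise RuntimeError(line)
        match = _RESULT.match(line)
        if match:
            rank, path, title, size = match.groups()
            results.append((int(rank), path, title, int(size)))
    return results


def swish_search(index_file, query):
    # Requires the swish-e executable on PATH; errors are assumed to arrive as "err:" lines.
    proc = subprocess.run(swish_command(index_file, query),
                          capture_output=True, text=True)
    return parse_swish_output(proc.stdout)
```

Keeping the parser separate from the subprocess call means the output handling can be tested without swish-e installed.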
    Stephan Diehl, Oct 26, 2007
    #3
  4. On Oct 26, 8:53 am, "Diez B. Roggisch" wrote:
    > Martin Marcher wrote:
    > > Hello,

    >
    > > is there something like a standard full text search engine?....
    > > any hints?

    >
    > There are several python lucene implementations available, and recently here
    > a project called NUCULAR turned up. And there is ZCatalog, the
    > full-text-indexing technology used in Zope, but which should be usable
    > outside of zope.....


    Thanks for the NUCULAR mention (http://nucular.sourceforge.net). It
    certainly doesn't meet all the requirements requested (very few users
    yet, some features missing). Please give it a look, however. It's easy
    to use and fast. How fast it is compared to others I can't say,
    especially since some of the numbers I see quoted out there are really
    incredible (how can an indexer be faster than "cp"?) -- I suspect some
    sort of trickery, frankly.

    Anyway, if you want a feature like proximity searching or
    some sort of internationalization support (it works with unicode, but
    that's probably not enough), please let me know. I focused on
    the core indexing and retrieval functionality, and I think a lot of
    additional features can be added easily.
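Proximity searching is usually built on a positional index, which records where each term occurs in each document. The sketch below shows that generic technique, not Nucular's implementation: a two-pointer walk over two sorted position lists decides whether the terms ever come within `window` tokens of each other, in time linear in the number of occurrences. `PositionalIndex` and `near` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class PositionalIndex:
    def __init__(self):
        # term -> doc_id -> list of token positions (ascending, since we append in order)
        self.positions = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        for pos, term in enumerate(tokenize(text)):
            self.positions[term][doc_id].append(pos)

    def near(self, a, b, window):
        """IDs of documents where terms a and b occur within `window` tokens."""
        pa = self.positions.get(a.lower(), {})
        pb = self.positions.get(b.lower(), {})
        hits = []
        for doc_id in sorted(set(pa) & set(pb)):
            xs, ys = pa[doc_id], pb[doc_id]
            i = j = 0
            # Advance whichever pointer is behind; this visits the closest pairs.
            while i < len(xs) and j < len(ys):
                if abs(xs[i] - ys[j]) <= window:
                    hits.append(doc_id)
                    break
                if xs[i] < ys[j]:
                    i += 1
                else:
                    j += 1
        return hits
```

Storing positions costs more space than plain document lists, which is one reason engines often make it optional per field.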

    fwiw, -- Aaron Watters

    ===
    % make love
    don't know how to make love. stopping.
    Aaron Watters, Oct 26, 2007
    #4
  5. 2007/10/26, <>:
    > On Oct 26, 8:53 am, "Diez B. Roggisch" <> wrote:
    > > Martin Marcher wrote:

    > Thanks for the NUCULAR mention (http://nucular.sourceforge.net). It
    > certainly doesn't meet all the requirements requested (very few users
    > yet, some features missing). Please give it a look, however. It's easy
    > to use and fast. How fast it is compared to others I can't say,
    > especially since some of the numbers I see quoted out there are really
    > incredible (how can an indexer be faster than "cp"?) -- I suspect some
    > sort of trickery, frankly.


    For starters, I think I will go with Nucular. It seems good enough,
    lightweight and easy to use.

    > Anyway, if you want a feature like proximity searching or
    > some sort of internationalization support (it works with unicode, but
    > that's probably not enough), please let me know. I focused on
    > the core indexing and retrieval functionality, and I think a lot of
    > additional features can be added easily.


    I don't know much about the internals of search engines but I'll
    probably report back with a few suggestions after some time of usage
    :)


    --
    http://noneisyours.marcher.name
    http://feeds.feedburner.com/NoneIsYours
    Martin Marcher, Oct 26, 2007
    #5
  6. "Martin Marcher" writes:
    > is there something like a standard full text search engine?
    >
    > I'm thinking of the equivalent for python like lucene is for java or
    > ferret for rails. Preferably something that isn't exactly a clone of
    > one of those but more that is python friendly in terms of the API it
    > provides.


    Ferret is basically a Lucene clone, originally written in Ruby but
    with the intensive parts later rewritten in C for speed since the Ruby
    version was too slow. There was something similar done in Python
    (PyLucene, I think) that was also pretty slow.

    Solr (a wrapper around Lucene) has a reasonable set of Python
    bindings. Solr has become very popular among web developers because
    it's pretty easy to set up and use. However, its flexibility is not
    all that great.
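Because Solr is driven over HTTP, a minimal Python client can be little more than a URL builder. This sketch assumes a Solr core reachable at a base URL such as the hypothetical `http://localhost:8983/solr/docs`, and uses the dismax query parser's `qf` parameter, where `field^boost` weights fields differently (close to the per-field priority asked for at the start of the thread). Some older Solr setups selected dismax through a request handler (`qt=dismax`) instead of `defType=dismax`, so adjust for your version. Both function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(base_url, query, field_boosts, rows=10):
    """Build a Solr /select URL that searches several fields with different boosts."""
    params = {
        "q": query,
        "defType": "dismax",
        # qf lists the fields to search, each boosted with ^weight.
        "qf": " ".join(f"{field}^{boost}" for field, boost in field_boosts.items()),
        "rows": rows,
        "wt": "json",  # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def solr_search(base_url, query, field_boosts, rows=10):
    # Needs a running Solr server; matching documents are under response.docs.
    with urlopen(solr_select_url(base_url, query, field_boosts, rows)) as resp:
        return json.load(resp)["response"]["docs"]
```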

    Nucular looks promising though still in a fairly early stage.
    Suggestion for Aaron: it would be great if Nucular used the same
    directives as Solr (e.g. <field/> instead of <fld/>, fixing other such
    gratuitous differences too) and implemented more Solr/Lucene features.
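For comparison with Nucular's `<fld/>`, here is what Solr's XML update format looks like when generated with the standard library's ElementTree: an `<add>` element containing `<doc>` elements, each made of `<field name="...">` children, which is then posted to the core's `/update` handler. The `id` field assumes a schema whose unique key is called `id` (as in Solr's example schema), and `solr_add_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc_id, fields):
    """Serialize one document as <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = str(doc_id)
    for name, values in fields.items():
        # Multi-valued fields become repeated <field> elements with the same name.
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(doc, "field", name=name).text = value
    # ElementTree escapes &, < and > in text for us.
    return ET.tostring(add, encoding="unicode")
```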
    Paul Rubin, Oct 26, 2007
    #6
  7. On 26 Oct, 19:33, Paul Rubin wrote:
    >
    > Ferret is basically a Lucene clone, originally written in Ruby but
    > with the intensive parts later rewritten in C for speed since the Ruby
    > version was too slow. There was something similar done in Python
    > (PyLucene, I think) that was also pretty slow.


    You're thinking of Lupy, whose authors/supporters then seemed to
    switch to Xapian:

    http://www.divmod.org/projects/lupy

    Meanwhile, PyLucene doesn't seem particularly slow to me. Provided you
    can build the software (it requires gcj), it seems to work rapidly and
    reliably. The only problem I've ever had was related to a threading
    bug in Python 2.3, which was subsequently fixed by the Python core
    developers.

    Paul
    Paul Boddie, Oct 26, 2007
    #7
