[ANN] Ferret 0.10.6 released (and some benchmarks)

Discussion in 'Ruby' started by David Balmain, Sep 21, 2006.

  1. Hey folks,


    ** Description **

    Firstly, for those who don't know, Ferret is a full-text search library
    that makes adding search to your application a breeze. It's much
    faster than MySQL full-text search, as well as most other search
    libraries out there. It allows you to do Boolean (+ruby +rails -jewelry)
    and phrase queries ("the quick brown fox") as well as some more unusual
    queries like fuzzy queries (misspelling~ matches mispeling or
    misspellng), wildcard queries (Aus?ral*), range queries
    (date:<=20050601) and a lot more. Ferret also now offers query result
    highlighting and excerpting.
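    To make the query types above concrete, here is a toy Ruby sketch of what the Boolean, fuzzy and wildcard syntax mean, plus a naive highlighter. It is NOT Ferret's API or implementation: every method name (edit_distance, wildcard_regexp, boolean_match, highlight) is hypothetical, and it assumes fuzzy matching is based on Levenshtein edit distance, as in Lucene.

```ruby
# Toy illustration of Ferret-style query syntax. NOT Ferret's API or
# implementation; all method names are hypothetical.

# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions and substitutions that turn a into b.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev[b.length]
end

# Wildcard pattern to anchored regexp: "?" = exactly one character,
# "*" = zero or more characters, everything else matched literally.
def wildcard_regexp(pattern)
  body = pattern.chars.map { |c|
    case c
    when "?" then "."
    when "*" then ".*"
    else Regexp.escape(c)
    end
  }.join
  Regexp.new("\\A#{body}\\z")
end

# Boolean query over a toy inverted index (term => array of doc ids).
# "+term" must match, "-term" must not; plain terms are optional and
# only select documents when there are no required terms.
def boolean_match(index, query)
  required, prohibited, optional = [], [], []
  query.split.each do |tok|
    case tok[0]
    when "+" then required << tok[1..-1]
    when "-" then prohibited << tok[1..-1]
    else optional << tok
    end
  end
  postings = ->(term) { index.fetch(term, []) }
  hits = if required.empty?
           optional.flat_map(&postings).uniq
         else
           required.map(&postings).reduce(:&)
         end
  prohibited.each { |term| hits -= postings.call(term) }
  hits.sort
end

# Naive highlighter: wrap whole-word, case-insensitive matches of the
# query terms. It needs the original text, i.e. stored field data.
def highlight(text, terms, pre = "<b>", post = "</b>")
  return text if terms.empty?
  pattern = /\b(#{terms.map { |t| Regexp.escape(t) }.join("|")})\b/i
  text.gsub(pattern) { "#{pre}#{$1}#{post}" }
end

index = { "ruby" => [1, 2, 3], "rails" => [1, 3], "jewelry" => [3] }
p boolean_match(index, "+ruby +rails -jewelry")
p edit_distance("misspelling", "mispeling")
p wildcard_regexp("Aus?ral*").match?("Australia")
puts highlight("Ruby on Rails uses Ruby", ["ruby"])
```

    With this reading, "mispeling" is two edits away from "misspelling" and "misspellng" is one, while Aus?ral* matches "Australia" but not "Austria". A real engine scores and ranks the hits rather than just returning sets of ids.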

    ** Announcement **

    This is the first Ferret announcement I've put up in a while, because
    the most recent releases of Ferret have been alpha releases. I
    completely rewrote Ferret from the ground up so that it no longer uses
    Lucene's file format, and I was able to make some great performance
    improvements in the process.

    On the topic of performance, it has recently been brought to my
    attention that some people are aware of Ferret but avoid it because
    they think it is slow. Just to put that myth to rest, here is the
    output of a simple benchmark that indexes the Reuters-21578 corpus,
    available at:

    http://www.daviddlewis.com/resources/testcollections/reuters21578/

    First, Apache Lucene. (Yes, Java users, as you can see I did warm up
    the JVM, with 6 repetitions of the test, and I used the options
    -server -Xmx500M -XX:CompileThreshold=100, so this is a fair test.)

    ---------------------------------------------------
    1 Secs: 47.09 Docs: 19043
    2 Secs: 46.46 Docs: 19043
    3 Secs: 44.07 Docs: 19043
    4 Secs: 45.92 Docs: 19043
    5 Secs: 45.97 Docs: 19043
    6 Secs: 47.06 Docs: 19043
    ---------------------------------------------------
    Lucene 1.9-rc1-dev
    JVM 1.5.0_06 (Sun Microsystems Inc.)
    Linux 2.6.15-27-386 i386
    Mean: 46.10 secs
    Truncated mean (4 kept, 2 discarded): 46.35 secs
    ---------------------------------------------------

    And now Ferret:

    ------------------------------------------------------------
    0 Secs: 8.03 Docs: 19043
    1 Secs: 10.15 Docs: 19043
    2 Secs: 9.78 Docs: 19043
    3 Secs: 10.31 Docs: 19043
    4 Secs: 9.78 Docs: 19043
    5 Secs: 10.13 Docs: 19043
    ------------------------------------------------------------
    Mean 9.70 secs
    Truncated Mean (4 kept, 2 discarded): 9.96 secs
    ------------------------------------------------------------

    So, as you can see, performance is no longer a problem. (Incidentally,
    the pure C version can index the Reuters corpus in under 3 seconds,
    over an order of magnitude faster than Lucene's 46.10-second mean.)
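    The summary lines in the two benchmark outputs can be recomputed from the per-run timings. This Ruby sketch assumes "truncated mean (4 kept, 2 discarded)" means dropping the single fastest and the single slowest run; that reading reproduces the reported figures (Lucene's exact mean is 46.095 s, reported as 46.10, and its truncated mean 46.3525 s; Ferret's are 9.697 s and 9.96 s). It also works out the speed ratios, taking "under 3 seconds" for the pure C version at face value. The helper names are hypothetical, not part of the original benchmark script.

```ruby
# Recompute the benchmark summaries from the per-run timings (seconds).
# Assumption: "truncated mean (4 kept, 2 discarded)" drops the single
# fastest and the single slowest run before averaging.

def mean(xs)
  xs.sum.to_f / xs.size
end

def truncated_mean(xs, drop_each_end = 1)
  kept = xs.sort[drop_each_end...(xs.size - drop_each_end)]
  mean(kept)
end

lucene = [47.09, 46.46, 44.07, 45.92, 45.97, 47.06]
ferret = [8.03, 10.15, 9.78, 10.31, 9.78, 10.13]

{ "Lucene" => lucene, "Ferret" => ferret }.each do |name, times|
  puts format("%-6s mean %.3f s, truncated mean %.4f s",
              name, mean(times), truncated_mean(times))
end

# Speed-ups relative to Lucene's mean indexing time.
puts format("Lucene / Ferret (Ruby): %.1fx", mean(lucene) / mean(ferret))
# The C version takes under 3 s, so this is a lower bound on its speed-up.
puts format("Lucene / 3 s (pure C, lower bound): %.1fx", mean(lucene) / 3.0)
```

    On these numbers Lucene's mean is roughly 4.75 times Ferret's, and dividing it by 3 s gives a lower bound of about 15x for the pure C version.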

    One new addition in the 0.10.* series of Ferret is a win32 gem, so all
    you Windows users out there can now get super-fast searches too.

    There have also been a lot of other changes to the Ferret API. You may
    want to check out the documentation for a refresher:

    http://ferret.davebalmain.com/api
    http://ferret.davebalmain.com/api/files/TUTORIAL.html

    ** Now Accepting Donations **

    Ferret has been a labour of love, but it has taken up a lot more of my
    life than I ever expected. At over 50,000 lines of code, I believe it
    is one of the largest Ruby projects, especially for a single
    developer. (The previous version, before the rewrite, had over 70,000
    LOC, so added together that is a lot of work.) I would love to keep
    pushing Ferret forward at the rate it has been going, but other things
    are going to have to start taking priority (like putting food on the
    table). If you find Ferret useful in your application and you aren't
    able to contribute to the development, please consider making a
    donation at the Ferret website:

    http://ferret.davebalmain.com/trac

    So where do I see Ferret going in the future? I'd really like to build
    an object database based on Ferret, with ActiveRecord and Og bindings.
    Why?

    * Fixes the current DRY problems with Ferret, i.e., should you store
    data in the Ferret index to take advantage of highlighting, or build
    your own highlighter so that the data isn't stored in two places?
    * Simplifies things. You'll be able to forget about IndexReaders,
    IndexWriters, file-locking, etc. Just create the database as you
    usually would and you have Ferret full-text search built in.
    * Range queries just work. No need to pad numbers or format dates correctly.
    * Sort just works, and it won't take forever to build the
    sort index (currently a problem on very large indexes).
    * Performance, performance, performance. As people often point out,
    the bottleneck in many applications is the data access layer.
    Mapping relational database schemas to Ruby objects (or to any OO
    language, for that matter) can be very expensive at run-time. A good
    object database should easily outperform even SQLite (and I'm being
    very cautious here).
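    The range-query point above comes from string-ordered term indexes: values are compared character by character, so numbers have to be zero-padded and dates written as YYYYMMDD (as in date:<=20050601) for ranges to behave. This Ruby sketch assumes plain lexicographic string comparison, which is how Lucene-style indexes order terms; pad, date_key and in_range? are hypothetical helpers, not Ferret methods.

```ruby
# Why string-ordered range queries need padded numbers and YYYYMMDD dates.
# Assumption: the index compares field values as plain strings.
require "date"

# Hypothetical helper: zero-pad a non-negative integer to a fixed width.
def pad(n, width = 10)
  n.to_s.rjust(width, "0")
end

# Hypothetical helper: a date as a YYYYMMDD string, e.g. "20050601".
def date_key(date)
  date.strftime("%Y%m%d")
end

# Inclusive string range check, as a term-range query would perform.
def in_range?(value, lower, upper)
  lower <= value && value <= upper
end

# Unpadded, "9" sorts after "10", so 9 falls outside 1..10.
p in_range?("9", "1", "10")
# Padded to a common width, string order matches numeric order.
p in_range?(pad(9), pad(1), pad(10))

# D/M/YYYY strings misorder dates; YYYYMMDD strings do not.
d1 = Date.new(2005, 6, 9)
d2 = Date.new(2005, 6, 10)
p "#{d1.day}/#{d1.month}/#{d1.year}" < "#{d2.day}/#{d2.month}/#{d2.year}"
p date_key(d1) < date_key(d2)
# A query like date:<=20050601 then reduces to a string comparison.
p date_key(Date.new(2005, 5, 31)) <= "20050601"
```

    Note that fixed-width padding like this only works for non-negative integers; negative numbers and floats need a more careful encoding.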

    Right now, I'd need to raise at least five figures before I'd consider
    this undertaking, so please send some encouragement my way if you would
    be interested in something like this. Otherwise, I'd appreciate any
    kind of contribution, financial or assisting with development. In the
    meantime, I will continue to improve test coverage and Ferret
    documentation, fix bugs and help people on the Ferret mailing list.

    Happy Ferreting.
    Dave
     
    David Balmain, Sep 21, 2006

  2. On Sep 20, 2006, at 7:48 PM, David Balmain wrote:

    > Hey folks,
    > <snip good stuff>
    >
    > Happy Ferreting.
    > Dave
    >


    Hey Dave-

    Thank you for your continuing hard work on Ferret. I am using it
    heavily in quite a few production Rails applications and a few pure
    Ruby projects too. This looks like a nice improvement over the last
    version, and the benchmarks look great.

    Thanks
    -Ezra
     
    Ezra Zygmuntowicz, Sep 21, 2006

  3. On 9/21/06, David Balmain <> wrote:
    > Hey folks,
    > <snip>
    >


    Dave, congratulations on the great work you've done. I've been using
    Ferret a lot and it has never disappointed me. The benchmarks are
    fantastic, looking forward to what's coming next.

    Thanks,
    Max
     
    Max Muermann, Sep 21, 2006
  4. On Thursday 21 September 2006 03:48, David Balmain wrote:
    > ** Now Accepting Donations **
    > <snip>
    > So where do I see Ferret going in the future? I'd really like to build
    > an object-database based on Ferret, with ActiveRecord and Og bindings.

    <snip awesome ideas for future direction>

    Wow, I'm completely astounded by the work you've done with Ferret. You're a
    one-man coding machine, especially considering the number of projects or
    attempts to port Lucene to C or other languages that have floundered. Might I
    suggest you post this announcement/call for donations to the Rails mailing
    list? I think people might be *very* interested in your idea for an object
    database built on Ferret with AR bindings. That would be an incredibly
    exciting development, and hopefully some of the big Rails users will realise
    that.

    Regards,

    Alex
     
    A. S. Bradbury, Sep 21, 2006
  5. On 9/21/06, A. S. Bradbury <> wrote:
    > <snip>
    > Might I suggest you post this announcement/call for donations to the Rails
    > mailing list? I think people might be *very* interested in your idea for an
    > object database built on Ferret with AR bindings.
    > <snip>


    Thanks Alex. I did, in fact, announce this on the Rails list as you
    suggested. I agree that it would be very useful for a lot of Rails
    developers, especially given the way many are currently using relational
    databases with AR (i.e., no foreign key constraints, all access to the
    database through the model, one database per application). This is
    definitely something I'm very keen to do, and I will get around to it
    eventually, with or without support. It's more a matter of whether
    I'll be able to do it in the next 6 months or the next 5 years. :)

    Thanks again for your support.

    Dave
     
    David Balmain, Sep 21, 2006
  6. On Thursday 21 September 2006 15:48, David Balmain wrote:
    > Thanks Alex. I did, in fact, announce this on the Rails list as you
    > suggested.
    > <snip>


    I hadn't actually noticed your post was also sent to
    (should have checked). However, that address is
    now defunct, as the RoR mailing list has moved to Google Groups:
    http://groups.google.com/group/rubyonrails-talk

    Your message seems to have made it to
    http://groups.google.com/group/railinglist, which is somehow subscribed to
    the old list and seems to get a post every few days or so. I was trying to
    work out why my client hadn't picked up your message to the Rails list...

    Regards,

    Alex
     
    A. S. Bradbury, Sep 21, 2006

Similar Threads
  1. David Balmain, Oct 28, 2005 (22 replies, 250 views)
  2. David Balmain, Nov 15, 2005 (5 replies, 130 views)
  3. David Balmain, Dec 3, 2005 (2 replies, 101 views)
  4. [ANN] Ferret 0.10.7 released. David Balmain, Sep 24, 2006, in forum: Ruby (0 replies, 101 views)
  5. [ANN] Ferret 0.11.0-rc1. David Balmain, Feb 25, 2007, in forum: Ruby (0 replies, 103 views)