[ANN] Ferret 0.2.1 (port of Apache Lucene to pure ruby)

David Balmain

Hi Folks,

I've just released version 0.2.1. Since my last announcement there
have been quite a few changes, mostly to the Index::Index interface. We
also have a great new logo thanks to Jan Prill. You can check it all
out here:

http://ferret.davebalmain.com/trac/

Dave Balmain

== Description

Ferret is a full port of the Java Lucene searching and indexing
library. It's available as a gem so try it out! To get started quickly
read the quick start at the project homepage;

http://ferret.davebalmain.com/api
http://ferret.davebalmain.com/api/files/TUTORIAL.html

== Changes

=== Multifield searches

You can now do multi field searches using the query parser.

# search the title and content fields for ruby
index.search_each("title|content:ruby") {|doc, score| puts "#{doc}:#{score}"}

# search all fields for ruby
index.search_each("*:ruby") {|doc, score| puts "#{doc}:#{score}"}
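
To make the field syntax concrete, here is a toy sketch in plain Ruby of how a query such as "title|content:ruby" can be read: the part before the colon names the fields (separated by "|", with "*" meaning every field) and the part after it is the term. The method name toy_search and the hash-based documents are made up for illustration. This is not Ferret's query parser, and it does no analysis beyond downcasing and splitting on whitespace.

```ruby
# Toy illustration only: NOT Ferret's query parser.
# docs is an array of hashes mapping field names (strings) to text.
def toy_search(docs, query)
  field_spec, term = query.split(":", 2)
  term = term.downcase
  ids = []
  docs.each_with_index do |doc, id|
    # "*" means search every field in the document
    fields = field_spec == "*" ? doc.keys : field_spec.split("|")
    hit = fields.any? do |f|
      doc.fetch(f, "").downcase.split.include?(term)
    end
    ids << id if hit
  end
  ids
end

docs = [
  {"title" => "Ruby in practice", "content" => "all about gems"},
  {"title" => "Searching",        "content" => "a ruby port of Lucene"},
  {"title" => "Java notes",       "content" => "nothing here", "tags" => "ruby"}
]

p toy_search(docs, "title|content:ruby")  # => [0, 1]
p toy_search(docs, "*:ruby")              # => [0, 1, 2]
p toy_search(docs, "title:ruby")          # => [0]
```

The third document only matches the "*" query because its only mention of ruby is in the made-up "tags" field.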

=== Compound file support and Apache Lucene index reading

You can now store your index in compound files, which reduces the
number of files used by the index. This is useful as your index gets
bigger, to prevent a "too many open files" error. It is also handy for
reading Apache Lucene indexes, as Apache Lucene uses the compound file
format by default.

=== Merging indexes

You can now merge two or more existing indexes into one. This is useful
if you want to have indexers working in parallel to create your index
and then merge all the indexes together to create one final index.

# add indexes 1 to 10 to the final index
index.add_indexes([index1, index2, ... , index10])

=== Persisting an in-memory index

You can gain a little in performance by using an in memory index for
your indexing and then persisting it to your file system when you are
finished.

index = Index::Index.new()

# do all your indexing

index.persist("/path/to/your/index/directory")

=== Thread safety

Ferret is now thread-safe, so feel free to use it in a multithreaded
environment. Check out the thread tests in the test/functional
directory in the latest distribution.

=== Easy update and delete

You can now use a query to do a delete;

index.query_delete("content:java or content:perl")

And you can now easily update documents;

index.update(34, doc)
index.query_update('author:"David Balmain"', {:author => "Dave Balmain"})

=== Primary Key

The latest addition is a primary key to the index. Note that this only
works through the Index::Index class and should only be used if you
know what you are doing.

index = Index::Index.new(:key => ["id", "table"])
index << {:id => 1123, :table => "product", :product => "Jacket"}
# ...
# The following will replace the Jacket product with a t-shirt
index << {:id => 1123, :table => "product", :product => "T-Shirt"}
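
As a rough picture of what "replace" means here, the following plain-Ruby toy keeps documents in a Hash keyed by the values of the key fields, so adding a second document with the same id and table overwrites the first. The class ToyKeyedIndex and its methods are invented for this sketch; it is not how Ferret stores documents and only illustrates the semantics described above.

```ruby
# Toy illustration only: NOT how Ferret stores documents.
class ToyKeyedIndex
  def initialize(key_fields)
    @key_fields = key_fields
    @docs = {}
  end

  # Adding a document whose key fields match an existing one replaces it.
  def <<(doc)
    @docs[@key_fields.map { |f| doc[f] }] = doc
    self
  end

  def size
    @docs.size
  end

  # Look up a document by its key values, in key-field order.
  def fetch(*key)
    @docs[key]
  end
end

index = ToyKeyedIndex.new([:id, :table])
index << {:id => 1123, :table => "product", :product => "Jacket"}
index << {:id => 1123, :table => "product", :product => "T-Shirt"}
index << {:id => 1123, :table => "order",   :product => "Jacket"}

puts index.size                              # prints 2
puts index.fetch(1123, "product")[:product]  # prints T-Shirt
```

The third document shares the id but not the table, so it is stored separately. Both key fields have to match for a replacement to happen.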



Have fun and let me know what you think.
 
Sascha Ebach

David said:
Have fun and let me know what you think.

Thank you for this awesome library. I just wanted to tell you that your work
is much appreciated. I don't actually use it right now, but I most
certainly will in the future. Having such a nice and powerful search engine
is really beneficial for Ruby, too, I think.

Sascha Ebach
 
Kris

Does it support indexing PDFs, Docs and PPT files? If I remember
correctly this feature is provided in Java Lucene via a project called
Jakarta POI. It is not a big deal since you already started the ball
rolling and someone might add these features in time. Kudos to your
efforts.
 
David Balmain

Hi Kris,

If you want to index these you'll need to write (or acquire) specific
analyzers for the document type. That's how it works in Lucene too.
One solution may be to index the documents with Lucene and use Ferret
to search the indexes.

Cheers,
Dave
 
jallen

I'm really excited about this library. However, after testing it out
I'm a little puzzled by the behavior. To test it out I added about 20
documents, each containing the same 5 fields (with different field
values in each doc). When I then try to query, the results I get seem
random - that is, they don't always return documents that I'd expect
should be matching. Example:

doc = Document.new
....
doc << Field.new("name", "foobar", Field::Store::NO,
Field::Index::UNTOKENIZED)
....
index << doc

Now when I call search_each("foobar"), I don't see a result (with some I
do, others I don't). However, if I call search_each("foobar~"), then it
seems to reliably return the expected matches. Any tips?


-jay
(running ruby 1.8.2 on OS X 10.3.8)
 
David Balmain

Hi Jay,

You've got me puzzled too. Would it be possible for you to send me a
full example of this strange behaviour? It's possible that it only
happens on OS X. :( I really need to get my hands on a Mac for a day
because there seem to be a few problems with that environment.
Hopefully we'll have this all sorted out soon.

Thanks,
Dave
 
