Writing to a Ferret index from multiple processes

Andreas S.

Hi,

What do I have to do to be able to write to a Ferret index from multiple
processes at the same time?

I was indexing a lot of documents with a script when another process
made a change to the index; suddenly all of the imported data was gone
from the index, and the import script quit with the exception
"Errno::ENOENT: No such file or directory - ./ferret_index/_1ah.fnm".

Setting auto_flush => true didn't help. Is there something else I need to do?

Andreas
 
David Balmain

Hi Andreas,

Can you show me some more code? How are you creating the index?
Perhaps you are setting :create => true in which case it will
overwrite the old index.
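
For example (simplified, using the path from your error message): the
first call below wipes any existing index every time it runs, while
:create_if_missing only builds a fresh one when none exists yet.

# destroys and recreates the index at this path on every run
index = Index::Index.new(:path => './ferret_index', :create => true)

# only creates a new index if none exists at this path yet
index = Index::Index.new(:path => './ferret_index', :create_if_missing => true)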

Dave
 
Andreas S.

David said:
Hi Andreas,

Can you show me some more code? How are you creating the index?
Perhaps you are setting :create => true in which case it will
overwrite the old index.

Dave

Oops. I am indeed using :create => true. I forgot that I set it because
create_if_missing did not work.

Sorry for the noise.

Andreas
 
David Balmain

I'm not too sure about this one. Are you by any chance explicitly
deleting the lock files when your app starts up? I've seen a few
people do that. The only way I can see doc numbers getting out of
order is if you delete the lock files. Any chance I could look at more
of your code? Is this for RForum? Perhaps I could check it out of svn.
Anyway, I hope I can help you out with this.

Dave

PS: If you are interested you should join the Ferret mailing list. You
seem to be doing some more advanced stuff judging from the bugs you're
finding. ;-)
 
Andreas S.

David said:
I'm not too sure about this one. Are you by any chance explicitly
deleting the lock files when your app starts up?
No.

I've seen a few
people do that. The only way I can see doc numbers getting out of
order is if you delete the lock files. Any chance I could look at more
of your code? Is this for RForum? Perhaps I could check it out of svn.

It is for RForum. You can see the code here:
http://rforum.andreas-s.net/trac/file/trunk/app/models/search_ferret.rb

My indexing script simply fetches all the posts from the database and
calls Post.search_handler.update(post) for each one. If another process
calls the update method while this script is running, I am getting the
exception. If you need more information to reproduce the problem, please
let me know.
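
In essence, the script does something like this (a simplified sketch;
the actual code is in search_ferret.rb above):

# simplified sketch of the reindexing loop
Post.find(:all).each do |post|
  Post.search_handler.update(post)
end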

David said:
PS: If you are interested you should join the Ferret mailing list. You
seem to be doing some more advanced stuff judging from the bugs you're
finding. ;-)

I didn't know there was a list. I will definitely join it.

Thanks for fixing the other bugs so quickly.

Andreas
 
David Balmain

Hey Andreas,

The latest version of RForum still has :create => true so I'm guessing
you haven't checked in your latest changes. Could you let me know when
you have?

Cheers,
Dave
 
Andreas S.

David said:
Hey Andreas,

The latest version of RForum still has :create => true so I'm guessing
you haven't checked in your latest changes. Could you let me know when
you have?

I have checked it in.
 
Andreas S.

Andreas said:
I have checked it in.

Btw, I tried it again on another machine, and couldn't reproduce the
"docs out of order" exception, but instead I got
RuntimeError: could not obtain lock:
/ferret_index/ferret-f62496686e637eca67e933a9cdc5eb21write.lock
 
David Balmain

Hi Andreas,

This is what I would expect to happen. What machine were you running
it on the first time? Whatever it was, Ferret's locking mechanism must
not be working on it.

Anyway, to avoid this problem you need to make sure the batch process
doesn't keep the lock for too long (about 5 seconds). I would change
the rebuild_index method to use an IndexWriter or switch auto_flush to
false. This should speed the reindexing up. I'd also add a pause in
there so other processes can get a hold of the lock if they need to.
Since you are flushing explicitly you may as well set auto_flush to
false anyway.

def index
  @index ||= Index::Index.new(:path => @path,
                              # :auto_flush => true  <= don't use this anymore
                              :default_search_field => ['subject'],
                              :key => ['id', 'class'])
end

# update will continue to work, handling the flushing explicitly
def update(post)
  index << create_doc(post)
  index.flush
end

# batch_update will keep the IndexWriter open between updates
# so it will run much faster
def batch_update(post)
  index << create_doc(post)
end

# define a flush method for use with the batch_update method
def flush
  index.flush
end

Then in your process that is doing the reindex I'd use the
batch_update method and I might even add some pauses in there.
Something like this:
MAX_ADDS_BEFORE_FLUSH = 10

def rebuild_index
  i = 0
  Post.find_all_by_deleted(0).each do |post|
    self.batch_update(post)
    i += 1
    # flush and pause periodically so other processes can grab the lock
    if (i % MAX_ADDS_BEFORE_FLUSH) == 0
      self.flush
      sleep(0.5)
    end
  end
end

These are just ideas. You'll probably come up with something better. I
think the best solution is just to keep the Ferret index in sync with
the database so that you don't need to reindex everything.
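
For example, in Rails you could push the updates into the model
callbacks. This is just an untested sketch, and the delete call assumes
your search handler has a matching delete method (I'm making that part
up):

class Post < ActiveRecord::Base
  # called by Rails after every save, so the index never goes stale
  def after_save
    Post.search_handler.update(self)
  end

  # hypothetical: assumes the search handler also has a delete method
  def after_destroy
    Post.search_handler.delete(self)
  end
end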

Let me know what kind of system you were running it on the first time
you got the "docs out of order" error. I'll see if I can find out
why the locking wasn't working.

Cheers,
Dave
 
