Writing to ferret index from multiple processes

Discussion in 'Ruby' started by Andreas S., Dec 18, 2005.

  1. Andreas S.

    Andreas S. Guest

    Hi,

    what do I have to do to be able to write a ferret index from multiple
    processes at the same time?

    I was indexing a lot of documents with a script when another process
    made a change to the index; suddenly all of the imported data was gone
    from the index, and the import script quit with the exception
    "Errno::ENOENT: No such file or directory - ./ferret_index/_1ah.fnm".

    auto_flush => true didn't help. Is there something else?

    Andreas

    --
    Posted via http://www.ruby-forum.com/.
     
    Andreas S., Dec 18, 2005
    #1

  2. Hi Andreas,

    Can you show me some more code? How are you creating the index?
    Perhaps you are setting :create => true in which case it will
    overwrite the old index.
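    Here is a minimal plain-Ruby sketch of the difference Dave is pointing at. It assumes two open modes: a create mode that always starts from an empty index directory, and a create-if-missing mode that only initializes a directory that does not exist yet. The `open_index_dir` helper and its keyword names are hypothetical and are not Ferret's implementation. The point is that any process opening the index in create mode throws away segments another process has already written. That is consistent with the vanished data and the missing `_1ah.fnm` file in the original report.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper illustrating index "open modes"; this is NOT Ferret code.
#   create: true            -> always start from an empty directory (destroys existing data)
#   create_if_missing: true -> initialize only when the directory does not exist yet
def open_index_dir(path, create: false, create_if_missing: true)
  if create
    FileUtils.rm_rf(path)          # segments written by other processes are lost here
    FileUtils.mkdir_p(path)
  elsif !Dir.exist?(path)
    raise Errno::ENOENT, path unless create_if_missing
    FileUtils.mkdir_p(path)
  end
  Dir.children(path).sort          # the "segment files" visible after opening
end

Dir.mktmpdir do |root|
  path = File.join(root, 'ferret_index')
  open_index_dir(path)                             # first open creates the directory
  File.write(File.join(path, '_1ah.fnm'), 'data')  # another process writes a segment file
  p open_index_dir(path)                           # => ["_1ah.fnm"]
  p open_index_dir(path, create: true)             # => [] (the segment is gone)
end
```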

    Dave

    On 12/19/05, Andreas S. <> wrote:
    > Hi,
    >
    > what do I have to do to be able to write a ferret index from multiple
    > processes at the same time?
    >
    > I was indexing a lot of documents with a script when another process
    > made a change to the index; suddenly all of the imported data was gone
    > from the index, and the import script quit with the exception
    > "Errno::ENOENT: No such file or directory - ./ferret_index/_1ah.fnm".
    >
    > auto_flush => true didn't help. Is there something else?
    >
    > Andreas
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >
     
    David Balmain, Dec 19, 2005
    #2

  3. Andreas S.

    Andreas S. Guest

    David Balmain wrote:
    > Hi Andreas,
    >
    > Can you show me some more code? How are you creating the index?
    > Perhaps you are setting :create => true in which case it will
    > overwrite the old index.
    >
    > Dave


    Oops. I am indeed using :create => true. I forgot that I set it because
    create_if_missing did not work.

    Sorry for the noise.

    Andreas

     
    Andreas S., Dec 19, 2005
    #3
  4. I'm not too sure about this one. Are you by any chance explicitly
    deleting the lock files when your app starts up? I've seen a few
    people do that. The only way I can see doc numbers getting out of
    order is if you delete the lock files. Any chance I could look at more
    of your code? Is this for RForum? Perhaps I could check it out of svn.
    Anyway, I hope I can help you out with this.
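    To show why deleting lock files is dangerous, here is a sketch of the create-exclusive lock-file technique common in Lucene-style indexes. It assumes a writer takes the lock by creating a file with `File::EXCL`. The `LockFile` class and the `write.lock` name are hypothetical, and this is not Ferret's locking code. While the file exists, a second writer is refused. Once a startup script deletes the file, the second writer gets in while the first still believes it holds the lock, so both can write to the index at once.

```ruby
require 'tmpdir'

# Hypothetical create-exclusive lock file (a sketch, not Ferret's implementation).
class LockFile
  def initialize(path)
    @path = path
  end

  # Returns true if this caller created the lock file, false if it already exists.
  def obtain
    File.open(@path, File::WRONLY | File::CREAT | File::EXCL) { |f| f.write(Process.pid.to_s) }
    true
  rescue Errno::EEXIST
    false
  end

  def release
    File.delete(@path) if File.exist?(@path)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'write.lock')
  writer_a = LockFile.new(path)
  writer_b = LockFile.new(path)

  p writer_a.obtain   # => true
  p writer_b.obtain   # => false (the lock is held, so B has to wait)
  File.delete(path)   # e.g. a startup script "cleaning up" lock files
  p writer_b.obtain   # => true (A and B now both think they own the index)
end
```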

    Dave

    PS: If you are interested you should join the Ferret mailing list. You
    seem to be doing some more advanced stuff judging from the bugs you're
    finding. ;-)

    On 12/19/05, Andreas S. <> wrote:
    > David Balmain wrote:
    > > Hi Andreas,
    > >
    > > Can you show me some more code? How are you creating the index?
    > > Perhaps you are setting :create => true in which case it will
    > > overwrite the old index.
    > >
    > > Dave

    >
    > Oops. I am indeed using :create => true. I forgot that I set it because
    > create_if_missing did not work.
    >
    > I removed it, but now there is a different problem. When I change the
    > index while the indexing script is running, it quits, but with another
    > error message:
    >
    > 316
    > 317
    > 318
    > RuntimeError: docs out of order curent doc = 9 and previous doc = 17
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:276:in `append_postings'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:262:in `append_postings'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:240:in `merge_term_info'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:215:in `merge_term_infos'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:176:in `merge_terms'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:48:in `merge'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:403:in `merge_segments'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:371:in `maybe_merge_segments'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:161:in `add_document'
    > from /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:159:in `add_document'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:270:in `<<'
    > from /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    > from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:238:in `<<'
    > from ./app/models/search_ferret.rb:38:in `update'
    > from (irb):1
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >
     
    David Balmain, Dec 19, 2005
    #4
  5. Andreas S.

    Andreas S. Guest

    David Balmain wrote:
    > I'm not too sure about this one. Are you by any chance explicitly
    > deleting the lock files when your app starts up?


    No.

    > I've seen a few
    > people do that. The only way I can see doc numbers getting out of
    > order is if you delete the lock files. Any chance I could look at more
    > of your code? Is this for RForum? Perhaps I could check it out of svn.


    It is for RForum. You can see the code here:
    http://rforum.andreas-s.net/trac/file/trunk/app/models/search_ferret.rb

    My indexing script simply fetches all the posts from the database and
    calls Post.search_handler.update(post) for each one. If another process
    calls the update method while this script is running, I am getting the
    exception. If you need more information to reproduce the problem, please
    let me know.

    > PS: If you are interested you should join the Ferret mailing list. You
    > seem to be doing some more advanced stuff judging from the bugs you're
    > finding. ;-)


    I didn't know there was a list. I will definitely join it.

    Thanks for fixing the other bugs so quickly.

    Andreas

     
    Andreas S., Dec 19, 2005
    #5
  6. Hey Andreas,

    The latest version of RForum still has :create => true so I'm guessing
    you haven't checked in your latest changes. Could you let me know when
    you have?

    Cheers,
    Dave

    On 12/19/05, Andreas S. <> wrote:
    > David Balmain wrote:
    > > I'm not too sure about this one. Are you by any chance explicitly
    > > deleting the lock files when your app starts up?

    >
    > No.
    >
    > > I've seen a few
    > > people do that. The only way I can see doc numbers getting out of
    > > order is if you delete the lock files. Any chance I could look at more
    > > of your code? Is this for RForum? Perhaps I could check it out of svn.

    >
    > It is for RForum. You can see the code here:
    > http://rforum.andreas-s.net/trac/file/trunk/app/models/search_ferret.rb
    >
    > My indexing script simply fetches all the posts from the database and
    > calls Post.search_handler.update(post) for each one. If another process
    > calls the update method while this script is running, I am getting the
    > exception. If you need more information to reproduce the problem, please
    > let me know.
    >
    > > PS: If you are interested you should join the Ferret mailing list. You
    > > seem to be doing some more advanced stuff judging from the bugs you're
    > > finding. ;-)

    >
    > I didn't know there was a list. I will definitely join it.
    >
    > Thanks for fixing the other bugs so quickly.
    >
    > Andreas
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >
     
    David Balmain, Dec 19, 2005
    #6
  7. Andreas S.

    Andreas S. Guest

    David Balmain wrote:
    > Hey Andreas,
    >
    > The latest version of RForum still has :create => true so I'm guessing
    > you haven't checked in your latest changes. Could you let me know when
    > you have?


    I have checked it in.

     
    Andreas S., Dec 19, 2005
    #7
  8. Andreas S.

    Andreas S. Guest

    Andreas S. wrote:
    > David Balmain wrote:
    >> Hey Andreas,
    >>
    >> The latest version of RForum still has :create => true so I'm guessing
    >> you haven't checked in your latest changes. Could you let me know when
    >> you have?

    >
    > I have checked it in.


    Btw, I tried it again on another machine, and couldn't reproduce the
    "docs out of order" exception, but instead I got
    RuntimeError: could not obtain lock:
    ./ferret_index/ferret-f62496686e637eca67e933a9cdc5eb21write.lock

     
    Andreas S., Dec 19, 2005
    #8
  9. Hi Andreas,

    This is what I would expect to happen. What machine were you running
    it on the first time? Whatever it was, Ferret's locking mechanism must
    not have been working there.

    Anyway, to avoid this problem you need to make sure the batch process
    doesn't keep the lock for too long (about 5 seconds). I would change
    the rebuild index method to use an IndexWriter or switch auto_flush to
    false. This should speed the reindexing up. I'd also add a pause in
    there so other processes can get a hold of the lock if they need to.
    Since you are flushing explicitly you may as well set auto_flush to
    false anyway.
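    This sketch shows why a long-running batch job surfaces as "could not obtain lock" in another process. It assumes the waiting writer polls a create-exclusive lock file and gives up after a fixed timeout. The `obtain_lock`/`release_lock` helpers, the timeout, and the poll interval are hypothetical values for illustration, not Ferret's actual settings. If the batch job releases the lock within the timeout, the waiter succeeds. If the job holds the lock longer, the waiter raises.

```ruby
require 'tmpdir'

# Hypothetical: try to create the lock file, polling until `timeout` seconds pass.
def obtain_lock(path, timeout: 1.0, poll: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  begin
    File.open(path, File::WRONLY | File::CREAT | File::EXCL) { |f| f.write(Process.pid.to_s) }
    true
  rescue Errno::EEXIST
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "could not obtain lock: #{path}"
    end
    sleep(poll)
    retry                                   # re-run the begin block and try again
  end
end

def release_lock(path)
  File.delete(path)
end

Dir.mktmpdir do |dir|
  lock = File.join(dir, 'write.lock')
  obtain_lock(lock)                         # the batch job takes the lock
  releaser = Thread.new { sleep(0.2); release_lock(lock) }
  obtain_lock(lock, timeout: 1.0)           # waits about 0.2 s, then succeeds
  releaser.join
  begin
    obtain_lock(lock, timeout: 0.2)         # lock is still held: gives up after 0.2 s
  rescue RuntimeError => e
    puts e.message                          # prints the "could not obtain lock" error
  end
end
```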

    def index
      @index ||= Index::Index.new(:path => @path,
                                  #:auto_flush => true  <= don't use this anymore
                                  :default_search_field => ['subject'],
                                  :key => ['id', 'class'])
    end

    # update will continue to work, handling the flushing explicitly
    def update(post)
      index << create_doc(post)
      index.flush
    end

    # batch_update will keep the IndexWriter open between updates
    # so it will run much faster
    def batch_update(post)
      index << create_doc(post)
    end

    # define a flush method for use with the batch_update method
    def flush
      index.flush
    end

    Then in your process that is doing the reindex I'd use the
    batch_update method and I might even add some pauses in there.
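    Something like this:

    MAX_ADDS_BEFORE_FLUSH = 10

    def rebuild_index
      i = 0
      Post.find_all_by_deleted(0).each do |post|
        self.batch_update(post)     # no flush after every single document
        i += 1
        if (i % MAX_ADDS_BEFORE_FLUSH) == 0
          self.flush                # flush, then pause so other processes
          sleep(0.5)                # can get a hold of the lock
        end
      end
      self.flush                    # flush the last partial batch
    end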
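    The loop above bounds the number of documents between flushes, but what the other processes actually wait on is time. This variant assumes the index object only needs to respond to `<<` and `flush`, as the `index` object above does. It flushes when either a document count or an elapsed-time limit is reached, pauses after each flush, and always flushes once more at the end. `TimedBatcher`, its limits, and the injectable clock and sleeper are hypothetical, added so the behavior can be checked without Ferret.

```ruby
# Hypothetical helper: flush every `max_docs` documents or every `max_seconds`,
# whichever comes first, pausing after each flush so other writers can get the lock.
class TimedBatcher
  def initialize(index, max_docs: 10, max_seconds: 2.0, pause: 0.5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @index, @max_docs, @max_seconds, @pause = index, max_docs, max_seconds, pause
    @clock, @sleeper = clock, sleeper
    @pending = 0
    @batch_started = nil
  end

  def <<(doc)
    @batch_started ||= @clock.call
    @index << doc
    @pending += 1
    flush_and_pause if @pending >= @max_docs || @clock.call - @batch_started >= @max_seconds
    self
  end

  # Flush whatever is still pending (call once after the last document).
  def finish
    @index.flush if @pending > 0
    @pending = 0
    @batch_started = nil
  end

  private

  def flush_and_pause
    finish
    @sleeper.call(@pause)
  end
end

# Stand-in index that only records calls (no Ferret needed).
class FakeIndex
  attr_reader :docs, :flushes

  def initialize
    @docs = []
    @flushes = 0
  end

  def <<(doc)
    @docs << doc
  end

  def flush
    @flushes += 1
  end
end

index = FakeIndex.new
# A frozen clock means only the document count triggers flushes here.
batcher = TimedBatcher.new(index, max_docs: 10, clock: -> { 0.0 }, sleeper: ->(_) {})
25.times { |i| batcher << "post #{i}" }
batcher.finish
p index.flushes   # => 3 (after docs 10 and 20, plus the final partial batch)
```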

    These are just ideas. You'll probably come up with something better. I
    think the best solution is just to keep the Ferret index in sync with
    the database so that you don't need to reindex everything.
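    Keeping the index in sync incrementally depends on each post mapping to exactly one document, which is what `:key => ['id', 'class']` in the `index` method above is for. This sketch assumes Ferret's `:key` option replaces an existing document whose key fields match the one being added. The toy `KeyedIndex` below models just that replace-by-key behavior plus deletion, so the save and destroy paths can be reasoned about without Ferret. It is a hypothetical stand-in, not Ferret's API.

```ruby
# Hypothetical in-memory stand-in for an index opened with :key => ['id', 'class'].
# Assumption: adding a document whose key fields match an existing one replaces it.
class KeyedIndex
  def initialize(key_fields)
    @key_fields = key_fields
    @docs = {}
  end

  def <<(doc)
    @docs[key_for(doc)] = doc   # upsert: same key overwrites, new key adds
    self
  end

  def delete(doc)
    @docs.delete(key_for(doc))
  end

  def size
    @docs.size
  end

  def [](key_values)
    @docs[key_values]
  end

  private

  def key_for(doc)
    @key_fields.map { |field| doc.fetch(field) }
  end
end

index = KeyedIndex.new(['id', 'class'])
# On save: (re)index just the changed post instead of rebuilding everything.
index << { 'id' => 1, 'class' => 'Post', 'subject' => 'Hello' }
index << { 'id' => 2, 'class' => 'Post', 'subject' => 'Ferret locking' }
index << { 'id' => 1, 'class' => 'Post', 'subject' => 'Hello (edited)' }
p index.size                      # => 2
p index[[1, 'Post']]['subject']   # => "Hello (edited)"
# On destroy: remove the post's document.
index.delete({ 'id' => 2, 'class' => 'Post' })
p index.size                      # => 1
```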

    Let me know what kind of system you were running it on the first time
    to get the documents out of order error. I'll see if I can find out
    why the locking wasn't working.

    Cheers,
    Dave

    On 12/20/05, Andreas S. <> wrote:
    > Andreas S. wrote:
    > > David Balmain wrote:
    > >> Hey Andreas,
    > >>
    > >> The latest version of RForum still has :create => true so I'm guessing
    > >> you haven't checked in your latest changes. Could you let me know when
    > >> you have?

    > >
    > > I have checked it in.

    >
    > Btw, I tried it again on another machine, and couldn't reproduce the
    > "docs out of order" exception, but instead I got
    > RuntimeError: could not obtain lock:
    > ./ferret_index/ferret-f62496686e637eca67e933a9cdc5eb21write.lock
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >
     
    David Balmain, Dec 20, 2005
    #9
Similar Threads
  1. David Balmain
    Replies:
    22
    Views:
    237
    David Balmain
    Oct 28, 2005
  2. David Balmain
    Replies:
    5
    Views:
    119
    David Balmain
    Nov 15, 2005
  3. jennyw
    Replies:
    1
    Views:
    221
    David Balmain
    Nov 27, 2005
  4. John Pritchard-williams

    Trying to open a Lucene-built index with Ferret...

    John Pritchard-williams, Nov 2, 2008, in forum: Ruby
    Replies:
    4
    Views:
    114
    Hugh Sasse
    Nov 3, 2008
  5. Tomasz Chmielewski

    sorting index-15, index-9, index-110 "the human way"?

    Tomasz Chmielewski, Mar 4, 2008, in forum: Perl Misc
    Replies:
    4
    Views:
    298
    Tomasz Chmielewski
    Mar 4, 2008