Mechanize MySQL and threads - deadlock?

Discussion in 'Ruby' started by Marc Weber, Feb 26, 2010.

  1. Marc Weber

    Marc Weber Guest

    First of all: I'm still new to Ruby.

    So pointing me to documentation or books is fine.

    Use case:

    Use Mechanize to gather information. Because there are many pages, I'd
    like to run multiple threads, each fetching pages. The fetched data
    should be written to a MySQL database.

    Can you point me to information telling me how to do this?

    The failure looks like this now:

    /pr/tasks/get_data_ruby/tasks.rb:364:in `join': deadlock detected (fatal)
    from /pr/tasks/get_data_ruby/tasks.rb:364:in `block in run_tasks_wait'
    from /pr/tasks/get_data_ruby/tasks.rb:364:in `each'
    from /pr/tasks/get_data_ruby/tasks.rb:364:in `run_tasks_wait'
    from get-data.rb:37:in `<mai

    What is causing such deadlocks at all?

    Details about my implementation:
    =================================
    Ruby version: ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
    sequel-3.8.0
    mysqlplus-0.1.1

    Because things always go wrong, I'd like to store state in the database
    so the script can resume work where it failed.

    To keep things simple I tried giving each thread its own agent and DB
    connection:


    def newDBConnection
      Sequel.connect(
        :adapter  => 'mysql',
        :user     => 'root',
        :host     => 'localhost',
        :database => 'get_data',
        :password => 'XXX')
    end

    # one agent and one db connection per thread
    class MyThread < Thread
      def agent
        if !@agent
          @agent = Mechanize.new
          @agent.max_history = 1
        end
        @agent
      end

      def db
        @dbCache ||= newDBConnection
      end
    end
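
For comparison, a minimal sketch of the same per-thread caching without subclassing Thread. It relies on `Thread.current[key]`, which is standard Ruby. The helper `per_thread` and the stand-in class `FakeConnection` are hypothetical names used only so the example runs without the Mechanize and Sequel gems; in the real script the block would call `Mechanize.new` or `newDBConnection`.

```ruby
# Hypothetical helper, not part of the original post: build a resource once
# per thread and cache it in that thread's local storage.
# Note: Thread#[] storage is local to the current fiber of the thread.
def per_thread(key)
  Thread.current[key] ||= yield
end

# Stand-in for Mechanize.new or Sequel.connect(...), so the sketch needs no gems.
FakeConnection = Struct.new(:owner)

# Returns the calling thread's own connection, creating it on first use.
def db
  per_thread(:db) { FakeConnection.new(Thread.current) }
end
```

Within one thread, repeated calls to `db` return the same object; a different thread gets its own.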

    Next I defined a task that reuses the DB connection and Mechanize agent
    of the thread running the task:

    class Task
      def run
        # override
        @thread = Thread.current
        task
      end

      def agent
        @agent ||= @thread.agent
      end

      def db
        @dbCache ||= @thread.db
      end
    end



    Next I wrote a simple function that takes a list of tasks and a thread
    class (MyThread). It spawns parallel threads, each taking tasks from the
    task list (a Queue). Any task may add more tasks to the queue.
    The script should run until all tasks are done.

    # t: class extending Thread
    # tasks: type Queue.new
    # parallel: num of threads used to run those tasks
    def run_tasks_wait(t, tasks, parallel)
      working = 0
      threads = []
      # run `parallel` threads
      (1..parallel).each {|i|
        threads << t.new {
          firstTime = true
          while working > 0 || firstTime
            firstTime = false
            while task = tasks.pop
              working += 1
              $log.debug("starting task #{task.to_s}")
              $log.catchAndLog "caught exception in main worker thread" do
                task.run if !task.nil?
              end
              $log.debug("finished task #{task.to_s} threads-working: #{working}")
              working -= 1
            end
            # even if there is nothing left in the queue, keep this thread running
            # while another thread is still working:
            # that thread may push additional tasks to the queue
            sleep 1
          end
        } }
      # wait for threads
      threads.each {|t| t.join() }
    end


    Thanks for any pointers
    Marc Weber
     
    Marc Weber, Feb 26, 2010
    #1

  2. Marc Weber

    Marc Weber Guest

    > # t: class extending Thread
    > # tasks: type Queue.new
    > # parallel: num of threads used to run those tasks
    > def run_tasks_wait(t, tasks, parallel)

    Replacing the Queue with an Array seems to fix the issue.
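
A plausible explanation, offered as an assumption rather than a confirmed diagnosis: `Queue#pop` blocks while the queue is empty, whereas `Array#pop` returns `nil` immediately. With a Queue, every worker can end up waiting in `pop` while the main thread waits in `join`, so no thread is able to run and Ruby reports the deadlock. An Array avoids the blocking but is not a synchronized structure. The sketch below keeps the Queue and uses the non-blocking form `pop(true)`. `try_pop` and `drain_tasks` are hypothetical names, and the sketch assumes that new tasks are only pushed by tasks that are currently running.

```ruby
require 'thread' # provides Queue on Ruby 1.9; harmless on newer Rubies

# Hypothetical helper, not from the original post: pop without blocking.
# Queue#pop(true) raises ThreadError on an empty queue instead of waiting.
def try_pop(queue)
  queue.pop(true)
rescue ThreadError
  nil
end

# Hypothetical variant of run_tasks_wait. Assumes that only running tasks
# push new tasks onto the queue.
def drain_tasks(tasks, parallel)
  lock    = Mutex.new
  working = 0 # number of tasks currently running, guarded by lock

  threads = (1..parallel).map do
    Thread.new do
      loop do
        # Pop and count under one lock so that "idle and empty" can be
        # checked consistently below.
        task = lock.synchronize do
          popped = try_pop(tasks)
          working += 1 if popped
          popped
        end

        if task
          begin
            task.run
          rescue StandardError => e
            warn "task failed: #{e.message}"
          ensure
            lock.synchronize { working -= 1 }
          end
        else
          # Nothing queued: stop once no task is running, because only a
          # running task could still add work.
          break if lock.synchronize { working.zero? && tasks.empty? }
          sleep 0.01
        end
      end
    end
  end

  threads.each(&:join)
end
```

Calling `drain_tasks(tasks, 3)` returns once the queue is empty and no task is running. The thread-class argument `t` of `run_tasks_wait` is left out here to keep the sketch self-contained.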

    Marc
     
    Marc Weber, Feb 26, 2010
    #2