Help with Excel (win32ole) and threads!?

Discussion in 'Ruby' started by Alex Ciarlillo, Mar 23, 2007.

  1. I have an application which is supposed to create a set of excel
    documents to display sales information for dining areas. The core of it
    is a class called SaleReport which is initialized with a sqlite database
    connection. The main function takes an excel worksheet object, a row
    number and some info about the location and queries the database to
    populate the excel sheet. Since none of the locations rely on each
    others data, I thought it would be pretty useful to thread that part, so
    that each location report is run in its own thread. The problem is, even
    though this seems to work, it has not improved performance at all and I
    am not sure where the bottleneck is. Here are my theories and the
    example code is at the bottom:

    1) My first theory was that using a single excel application instance
    was blocking the threads so that only one could have access at a time,
    but now I rewrote it to use multiple excel instances and still no dice.

    2) The connection to the database is limiting access to a single thread
    at a time. This shouldnt be the case since each instance of the
    SaleReport class gets it own connection, and SQLite is threadsafe.

    3) I'm flat out using the threads incorrectly.

    The example code:
    #################################################################################
    require 'win32ole'
    require 'SaleReport.rb'

    startTime = Time.now

    excel_1 = WIN32OLE::new('excel.Application')
    WIN32OLE.const_load(excel_1)
    excel_2 = WIN32OLE::new('excel.Application')

    book1 = excel_1.Workbooks.add
    book2 = excel_2.Workbooks.add

    books = {'book1' => book1, 'book2' => book2}

    locations = [ {'name' => 'location1', 'num' => 841},
    {'name' => 'location2', 'num' => 842},
    {'name' => 'location3', 'num' => 843} ]

    books.each do |name, book|
    puts "Working on #{name}."

    threads = []
    locations.each do |loc|
    threads << Thread.new(loc, name, book) { |l, n, b|
    row = 1
    worksheet = b.Worksheets(1)
    sales = SaleReport.new
    puts "Starting thread - #{l} || #{n} || #{b}"
    sales.outputLocation(ws, row, l['num'], l['name'], 2007)
    }
    end
    puts "Joining..."
    threads.each { |t| t.join }

    book.SaveAs "#{name}.xls"
    book.Close(1)
    end
    excel_1.quit
    excel_2.quit
    excel_1 = excel_2 = nil
    GC.start #garbage collect

    endTime = Time.now

    puts "Time to run: #{endTime-startTime} seconds."
    ###############################################################################

    Running this with 2 locations without any threading gave a running time
    of ~240 seconds. When I added the threading it was almost exactly the
    same. The running time increases linearly with each location I add even
    with multiple threads.

    Thanks,
    -Alex

    --
    Posted via http://www.ruby-forum.com/.
     
    Alex Ciarlillo, Mar 23, 2007
    #1
    1. Advertising

  2. Alex Ciarlillo

    Guest

    On Fri, 23 Mar 2007, Alex Ciarlillo wrote:

    > I have an application which is supposed to create a set of excel
    > documents to display sales information for dining areas. The core of it
    > is a class called SaleReport which is initialized with a sqlite database
    > connection. The main function takes an excel worksheet object, a row
    > number and some info about the location and queries the database to
    > populate the excel sheet. Since none of the locations rely on each
    > others data, I thought it would be pretty useful to thread that part, so
    > that each location report is run in its own thread. The problem is, even
    > though this seems to work, it has not improved performance at all and I
    > am not sure where the bottleneck is. Here are my theories and the
    > example code is at the bottom:
    >
    > 1) My first theory was that using a single excel application instance
    > was blocking the threads so that only one could have access at a time,
    > but now I rewrote it to use multiple excel instances and still no dice.
    >
    > 2) The connection to the database is limiting access to a single thread
    > at a time. This shouldnt be the case since each instance of the
    > SaleReport class gets it own connection, and SQLite is threadsafe.


    sqlite is threadsafe, but supports access by only one thread at a time. eg
    it's not concurrent at the c level. the only level of concurancy sqlite
    provides is at the process level.

    > 3) I'm flat out using the threads incorrectly.


    it's easy to do on windows - anything which blocks one thread as the os level
    will block all threads. this is suprisingly easy to do. your code looks
    fine. i'm not on windows, but if i were you i'd write some code that proves
    to myself that concurent access to an excel doc by threads does not end up
    blocking the whole process as i suspect it does. same goes for your
    SaleReport object.

    cheers.

    -a
    --
    be kind whenever possible... it is always possible.
    - the dalai lama
     
    , Mar 23, 2007
    #2
    1. Advertising

  3. Alex Ciarlillo

    Jano Svitok Guest

    On 3/23/07, <> wrote:
    > On Fri, 23 Mar 2007, Alex Ciarlillo wrote:
    >
    > > I have an application which is supposed to create a set of excel
    > > documents to display sales information for dining areas. The core of it
    > > is a class called SaleReport which is initialized with a sqlite database
    > > connection. The main function takes an excel worksheet object, a row
    > > number and some info about the location and queries the database to
    > > populate the excel sheet. Since none of the locations rely on each
    > > others data, I thought it would be pretty useful to thread that part, so
    > > that each location report is run in its own thread. The problem is, even
    > > though this seems to work, it has not improved performance at all and I
    > > am not sure where the bottleneck is. Here are my theories and the
    > > example code is at the bottom:
    > >
    > > 1) My first theory was that using a single excel application instance
    > > was blocking the threads so that only one could have access at a time,
    > > but now I rewrote it to use multiple excel instances and still no dice.
    > >
    > > 2) The connection to the database is limiting access to a single thread
    > > at a time. This shouldnt be the case since each instance of the
    > > SaleReport class gets it own connection, and SQLite is threadsafe.

    >
    > sqlite is threadsafe, but supports access by only one thread at a time. eg
    > it's not concurrent at the c level. the only level of concurancy sqlite
    > provides is at the process level.
    >
    > > 3) I'm flat out using the threads incorrectly.

    >
    > it's easy to do on windows - anything which blocks one thread as the os level
    > will block all threads. this is suprisingly easy to do. your code looks
    > fine. i'm not on windows, but if i were you i'd write some code that proves
    > to myself that concurent access to an excel doc by threads does not end up
    > blocking the whole process as i suspect it does. same goes for your
    > SaleReport object.


    I suppose that as well. As ruby threads are only interpreter threads,
    I assume that each call to OLE blocks the entire interpreter.
    Therefore it should not make any difference if you call it in threads
    or not. Threads may be even slower, due to more overhead. You should
    be able to check this by printing something to screen repeatedly in
    one thread (remember setting $stdout.sync=true), and doing a long OLE
    operation in another. The hyphothesis is that the printing will stop
    while OLE is running.

    To parallelize this you'd probably use more processes, either using
    Win32::process from win32utils or manually spawning some worker
    processes and communicating with the main process using drb or
    similar.
     
    Jano Svitok, Mar 24, 2007
    #3
  4. Jano Svitok wrote:
    > To parallelize this you'd probably use more processes, either using
    > Win32::process from win32utils or manually spawning some worker
    > processes and communicating with the main process using drb or
    > similar.


    I did a test in the way you suggested and the WIN32OLE is blocking I'm
    pretty sure. I don't think its blocking continuously until the thread is
    finished, but when I output 1000 rows of data to excel in one thread and
    print the numbers 1-50 to the screen in another, there is a noticeable
    pause between each number being printed (as opposed to them printing
    almost instantly w/o win32ole). I figure this means that the OLE is
    blocking on each output call as you suggested.

    I did figured out how to spawn the new process with win32-process and
    found out about Marshaling. I figured I could just create a process so
    that it takes a Worksheet object (marshaled from a file) and does all
    the output to that sheet. The only problem is WIN32OLE objects have no
    marshal_dump method defined, so I have no way to pass the sheets around
    to different processes. I did not try DRb but I figured it would have
    the same problem in this case. I could write my own marshal_dump but I
    doubt that writing such a method for a win32ole object is within my
    scope of knowledge.

    I have pretty much eliminated SQLite as the reason for the threads not
    running in parallel since I read up more about read/write locks and how
    they are used in sqlite3. All my database connections are reading and
    should be able to do so concurrently with no problems.

    So anymore ideas on how to parallelize excel output over multiple
    worksheets?

    Thanks for all the help!
    -Alex

    --
    Posted via http://www.ruby-forum.com/.
     
    Alex Ciarlillo, Mar 28, 2007
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Masaki Suketa
    Replies:
    4
    Views:
    393
    Dave Burt
    Mar 27, 2006
  2. Li Chen

    win32ole and excel and copy

    Li Chen, Nov 8, 2006, in forum: Ruby
    Replies:
    2
    Views:
    149
    Li Chen
    Nov 8, 2006
  3. Li Chen
    Replies:
    3
    Views:
    106
    Li Chen
    Nov 12, 2006
  4. zxem
    Replies:
    1
    Views:
    246
  5. Mandeep Baruah
    Replies:
    0
    Views:
    136
    Mandeep Baruah
    Feb 6, 2009
Loading...

Share This Page