Process or Thread?

Discussion in 'Ruby' started by Pito Salas, Aug 22, 2009.

  1. Pito Salas

    Pito Salas Guest

    I have a parent application (which I think of as a test harness) that
    wants to invoke a fairly intensive image processing application against
    a directory full of image files. Each image is processed independently.

    So, to get performance, I wanted to get the work happening on each of
    those images in parallel. So I could divide the files in the directory
    into two sets, and submit one set for processing in one process/thread
    and the other set in another process/thread. Note that the
    sub-process/threads are almost totally separate from the parent app, so
    relatively little information needs to go back and forth.

    Here is what I've learned so far from reading two books and lots of
    googling:

    One point is that there's no process support on Windows, which isn't a
    deal killer for me.

    Another point is the operation on multi-core CPUs: processes will, and
    threads will not use the multiple cores. This too is fairly "don't care"
    for me.

    I am interested in ease of implementation and debugging. And I am also
    very interested in getting the cpu and disk active at the same time as
    there is a fairly large amount of data to be read from the disk.

    What are your recommendations?
    --
    Posted via http://www.ruby-forum.com/.
    Pito Salas, Aug 22, 2009
    #1

  2. Pito Salas

    7stud -- Guest

    Pito Salas wrote:
    > I have a parent application (which I think of as a test harness) that
    > wants to invoke a fairly intensive image processing application against
    > a directory full of image files. Each image is processed independently.
    >


    It doesn't sound like your situation will result in improved performance
    with threads. Things don't actually get done at the same time with
    threads--that's an illusion. What happens is that there is very fast
    switching between different tasks. However, if your tasks do not have
    dead time during the processing, then using threads won't improve
    performance. For instance, suppose you have two tasks that each take 3
    minutes to complete. The processing might happen in this order with
    threads:

    task1: 1 minute
    task2: 1 minute
    task1: 1 minute
    task2: 1 minute
    task1: 1 minute
    task2: 1 minute
    --------------
    total = 6 minutes

    But if you just ran each task sequentially without using threads, the
    total time would also be 6 minutes. Using threads will only speed up
    processing time if your tasks have idle time when they are doing
    nothing. During that down time, if you switch to another task in
    another thread, then total processing time will be lower.
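
A minimal sketch of the point above, assuming `sleep` as a stand-in for idle time such as waiting on the disk. The names `waiting_task`, `run_sequential`, and `run_threaded` are hypothetical and not from the original post. When the tasks mostly wait, the threaded run should take roughly as long as the longest task rather than the sum of all of them:

```ruby
require 'benchmark'

# Hypothetical stand-in for a task that spends its time waiting (e.g. on disk).
def waiting_task(seconds)
  sleep(seconds)
  seconds
end

# Run the tasks one after another in the calling thread.
def run_sequential(durations)
  durations.map { |d| waiting_task(d) }
end

# Start one thread per task, then collect each thread's return value.
def run_threaded(durations)
  durations.map { |d| Thread.new { waiting_task(d) } }.map(&:value)
end

durations = [0.2, 0.2]
seq = Benchmark.realtime { run_sequential(durations) }
thr = Benchmark.realtime { run_threaded(durations) }
puts format('sequential: %.2fs, threaded: %.2fs', seq, thr)
```

If `waiting_task` were pure computation with no idle time, the two timings would be expected to come out about the same on an interpreter that runs only one Ruby thread at a time.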




    --
    Posted via http://www.ruby-forum.com/.
    7stud --, Aug 22, 2009
    #2

  3. Pito Salas

    Mario Camou Guest

    On Sat, Aug 22, 2009 at 16:47, 7stud -- wrote:

    > Pito Salas wrote:
    >
    > It doesn't sound like your situation will result in improved performance
    > with threads. Things don't actually get done at the same time with
    > threads--that's an illusion. What happens is that there is very fast
    > switching between different tasks. However, if your tasks do not have
    > dead time during the processing, then using threads won't improve
    > performance. For instance, suppose you have two tasks that each take 3
    > minutes to complete. The processing might happen in this order with
    > threads:
    >
    > task1: 1 minute
    > task2: 1 minute
    > task1: 1 minute
    > task2: 1 minute
    > task1: 1 minute
    > task2: 1 minute
    > --------------
    > total = 6 minutes
    >


    That's true if you're running MRI, since it uses "green" threads (i.e., you
    really have a single OS-level thread that gets task-switched by Ruby
    itself). However, if you run on JRuby, the Ruby Thread support gets mapped
    onto the Java Thread support, which *does* map to OS-level threads and
    therefore will take advantage of multiple cores if you have them. In that
    case you *would* get faster processing.

    Hope this helps,
    -Mario.
    Mario Camou, Aug 22, 2009
    #3
  4. Pito Salas

    Gary Wright Guest

    On Aug 22, 2009, at 1:43 PM, Mario Camou wrote:
    >
    > That's true if you're running MRI, since it uses "green" threads (i.e., you
    > really have a single OS-level thread that gets task-switched by Ruby
    > itself). However, if you run on JRuby, the Ruby Thread support gets mapped
    > onto the Java Thread support, which *does* map to OS-level threads and
    > therefore will take advantage of multiple cores if you have them. In that
    > case you *would* get faster processing.


    For a CPU intensive task (image processing), I doubt that two OS
    threads running on two cores is going to be any more efficient
    than two processes running on two cores. Multi-threading introduces
    complications that are neatly avoided by using multiple processes.
    I'd much rather deal with a multi-process architecture than a
    multi-threaded architecture.
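
A minimal sketch of the multi-process layout described above, assuming a Unix-like system where `fork` is available. `process_image` is a hypothetical placeholder for the real image-processing step, and `process_in_two_processes` is an illustrative name. The file list is split into two sets and each set is handled by its own child process:

```ruby
# Hypothetical placeholder for the real image-processing step.
def process_image(path)
  File.size(path)
end

# Split the files into two sets, process each set in a forked child,
# and report whether each child exited successfully.
def process_in_two_processes(files)
  half = (files.size / 2.0).ceil
  sets = half.zero? ? [] : files.each_slice(half).to_a

  pids = sets.map do |set|
    fork do
      set.each { |path| process_image(path) }
    end
  end

  # Process.wait2 returns [pid, status]; keep only the success flag.
  pids.map { |pid| Process.wait2(pid).last.success? }
end
```

Because the children share no Ruby objects with the parent, nothing needs locking. The only information coming back in this sketch is each child's exit status.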

    Gary Wright
    Gary Wright, Aug 22, 2009
    #4
  5. Pito Salas

    Pito Salas Guest

    Gary Wright wrote:
    > For a CPU intensive task (image processing), I doubt that two OS
    > threads running on two cores is going to be any more efficient
    > than two processes running on two cores. Multi-threading introduces
    > complications that are neatly avoided by using multiple processes.
    > I'd much rather deal with a multi-process architecture than a
    > multi-threaded architecture.
    >
    > Gary Wright


    Thanks all for your responses.

    A note: The files being processed are quite large and numerous. So
    there's also plenty of file IO that has to happen. In the vanilla 'green
    thread' case, would you expect performance improvements, because while
    one thread was blocked for IO the other one could run?

    Thanks again,

    Pito



    --
    Posted via http://www.ruby-forum.com/.
    Pito Salas, Aug 22, 2009
    #5
  6. Pito Salas

    Kent Friis Guest

    On Sat, 22 Aug 2009 09:30:36 -0500, Pito Salas wrote:
    > I have a parent application (which I think of as a test harness) that
    > wants to invoke a fairly intensive image processing application against
    > a directory full of image files. Each image is processed independently.
    >
    > So, to get performance, I wanted to get the work happening on each of
    > those images in parallel. So I could divide the files in the directory
    > into two sets, and submit one set for processing in one process/thread
    > and the other set in another process/thread. Note that the
    > sub-process/threads are almost totally separate from the parent app, so
    > relatively little information needs to go back and forth.
    >
    > Here is what I've learned so far from reading two books and lots of
    > googling:
    >
    > One point is that there's no process support on Windows, which isn't a
    > deal killer for me.


    Not quite. Look in Task Manager, there is a list of processes running.
    What Windows possibly lacks is fork(), the unix way of creating
    processes. It does however have CreateProcess (I think that's what
    it's called), which behaves like fork+exec.

    If you split the "controller" process and the "worker" process into
    two different programs, it won't be a problem. If you insist on
    having them as one program, you'll need to do a bit more work
    (add a command line argument telling the new process that it's a
    worker process).
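
A minimal sketch of the single-program variant, assuming a hypothetical `--worker` flag. The names `worker_command`, `worker_mode?`, and `main` are illustrative only. The controller starts copies of the same script with `Process.spawn`, which does not depend on `fork`, and each copy checks its first argument to decide which role to play:

```ruby
require 'rbconfig'

# Build the command line that re-runs this script as a worker.
# The '--worker' flag is a hypothetical convention for this sketch.
def worker_command(script, files)
  [RbConfig.ruby, script, '--worker', *files]
end

def worker_mode?(argv)
  argv.first == '--worker'
end

def main(argv)
  if worker_mode?(argv)
    # Worker: the real image processing would go here.
    argv.drop(1).each { |f| puts "worker #{Process.pid} processing #{f}" }
  else
    # Controller: split the files into two sets, one worker process each.
    half = (argv.size / 2.0).ceil
    sets = half.zero? ? [] : argv.each_slice(half).to_a
    pids = sets.map { |set| Process.spawn(*worker_command(__FILE__, set)) }
    pids.each { |pid| Process.wait(pid) }
  end
end

main(ARGV) if $PROGRAM_NAME == __FILE__
```

Run as `ruby harness.rb a.png b.png c.png` (the file name is an assumption), this would be expected to start two workers, one handling two files and the other handling one.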

    > Another point is the operation on multi-core CPUs: processes will, and
    > threads will not use the multiple cores. This too is fairly "don't care"
    > for me.


    Native threads will, Ruby green threads won't.

    > I am interested in ease of implementation and debugging.


    Debugging is lots easier with processes, as one process cannot
    accidentally overwrite data of another (shared memory is possible,
    but needs to be allocated explicitly).

    That may not be as big a problem with Ruby green threads, as the
    runtime knows what each thread is up to.

    > And I am also
    > very interested in getting the cpu and disk active at the same time as
    > there is a fairly large amount of data to be read from the disk.
    >
    > What are your recommendations?


    I would go for processes. But that's coming from C, where there is no
    runtime keeping track of what each thread is doing. With processes,
    the OS will prevent one process from overwriting the data of another.

    /Kent
    --
    "The Brothers are History"
    Kent Friis, Aug 23, 2009
    #6
  7. Pito Salas

    Gary Wright Guest

    On Aug 22, 2009, at 5:02 PM, Pito Salas wrote:
    >
    > A note: The files being processed are quite large and numerous. So
    > there's also plenty of file IO that has to happen. In the vanilla 'green
    > thread' case, would you expect performance improvements, because while
    > one thread was blocked for IO the other one could run?


    Whether you use threads or processes your CPU-bound tasks will run while
    your IO-bound tasks are waiting for the disk.
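
A minimal sketch of overlapping disk reads with CPU work using threads, assuming the interpreter lets other threads run while one is blocked on a read. `pipeline` and `checksum` are hypothetical names, and the checksum is only a stand-in for real image processing. One thread reads files and hands their contents over through a `Queue` while the calling thread does the computation:

```ruby
require 'thread'

# Hypothetical stand-in for the CPU-heavy image processing.
def checksum(data)
  data.each_byte.reduce(0) { |acc, b| (acc + b) % 65_536 }
end

# Read files in one thread while the calling thread processes
# whatever has already been read.
def pipeline(paths)
  queue = Queue.new

  reader = Thread.new do
    paths.each { |path| queue << File.binread(path) }
    queue << :done # sentinel: no more data
  end

  results = []
  while (data = queue.pop) != :done
    results << checksum(data)
  end

  reader.join
  results
end
```

With very large files, a `SizedQueue` in place of `Queue` would keep the reader from running far ahead of the processing and holding many files in memory at once.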

    Gary Wright
    Gary Wright, Aug 23, 2009
    #7
  8. On 23.08.2009 01:28, Kent Friis wrote:

    >> I am interested in ease of implementation and debugging.

    >
    > Debugging is lots easier with processes, as one process cannot
    > accidentally overwrite data of another (shared memory is possible,
    > but needs to be allocated explicitly).


    IMHO a multitude of processes does not necessarily ease debugging. If
    you need to find out which process is running berserk or exhibiting a
    bug that may be more difficult than debugging of a single interpreter
    process. Also, if there are communication issues between two processes
    that may be difficult to debug as well.

    Having said that, both approaches are pretty easy to implement, given
    that DRb is a full fledged remote object call feature (similar to RMI
    and CORBA).
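
A minimal sketch of a DRb remote call, assuming the `drb` library is available (on recent Ruby versions it may need to be installed as a gem). `ImageWorker`, `start_worker_server`, and `call_worker` are hypothetical names for illustration. In a real setup the server would run in the worker process and the client in the controller:

```ruby
require 'drb/drb'

# Hypothetical worker object exposed over DRb.
class ImageWorker
  def process(path)
    "processed #{path}"
  end
end

# Start a DRb server for the worker; port 0 asks the OS for a free port.
# Returns the URI that clients should connect to.
def start_worker_server(uri = 'druby://127.0.0.1:0')
  DRb.start_service(uri, ImageWorker.new)
  DRb.uri
end

# Call the worker through a DRb proxy object.
def call_worker(uri, path)
  DRbObject.new_with_uri(uri).process(path)
end
```

The controller only needs the URI string. The method call on the proxy looks like a normal Ruby call, which is what keeps the communication code short.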

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 23, 2009
    #8
  9. On Sat, Aug 22, 2009 at 2:07 PM, Gary Wright wrote:
    > For a CPU intensive task (image processing), I doubt that two OS
    > threads running on two cores is going to be any more efficient
    > than two processes running on two cores. Multi-threading introduces
    > complications that are neatly avoided by using multiple processes.
    > I'd much rather deal with a multi-process architecture than a
    > multi-threaded architecture.


    You're correct, if the processes don't talk to each other. But if you
    have to pass information across processes, things suddenly get a lot
    more tangled and IPC-bound than with threads. It's a tradeoff, as
    always.

    - Charlie
    Charles Oliver Nutter, Sep 3, 2009
    #9