Multiple threads

Discussion in 'Python' started by Eduardo Oliva, Nov 16, 2011.

  1. Hello, I have a Python script that finds all "m2ts" video files and converts them to "mpeg" by calling ffmpeg on the command line.

    What I want to do is:

    I need my script to run 2 separate threads and, when one of them has finished, start the next one, but never more than 2 threads at a time.
    I know that semaphores would help with that.
    But the problem here is knowing when a thread has finished its job, so that I can release the semaphore and start another thread.

    Any help would be great.

    Thank you in advance
    Eduardo Oliva, Nov 16, 2011
    #1

  2. On Thu, Nov 17, 2011 at 12:48 AM, Eduardo Oliva wrote:
    > Hello, I have a py script that reads for all "m2ts" video files and convert them to "mpeg" using ffmpeg with command line.
    >
    > What I want to do is:
    >
    >  I need my script to run 2 separated threads, and then when the first has finished, starts the next one....but no more than 2 threads.
    >  I know that Semaphores would help with that.
    >  But the problem here is to know when the thread has finished its job, to release the semaphore and start another thread.


    First off, it's better in CPython (the most popular Python) to use
    multiple processes than multiple threads. That aside, what you're
    looking at is a pretty common model - a large number of tasks being
    served by a pool of workers.

    Have a look at the multiprocessing module, specifically Pool:
    Version 2: http://docs.python.org/library/multiprocessing.html
    Version 3: http://docs.python.org/py3k/library/multiprocessing.html

    Should be fairly straightforward.
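
A minimal sketch of the Pool approach, assuming the conversion is a plain "ffmpeg -i input output" call (the real options are not given in this thread). The names output_name, convert and convert_all are hypothetical helpers made up for illustration:

```python
import glob
import multiprocessing
import os
import subprocess


def output_name(src):
    # "clip.m2ts" -> "clip.mpeg"
    return os.path.splitext(src)[0] + ".mpeg"


def convert(src):
    # Assumption: a bare "ffmpeg -i input output" call; adjust options as needed.
    return src, subprocess.call(["ffmpeg", "-i", src, output_name(src)])


def convert_all(files, workers=2):
    # The pool never runs more than `workers` conversions at the same time.
    if not files:
        return []
    pool = multiprocessing.Pool(processes=workers)
    try:
        return pool.map(convert, files)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for src, rc in convert_all(sorted(glob.glob("*.m2ts"))):
        print(src, "exit code", rc)
```

The pool hands the next file to whichever worker becomes free, so there is no need to track when an individual worker finishes.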

    ChrisA
    Chris Angelico, Nov 16, 2011
    #2

  3. On 16.11.2011 14:48, Eduardo Oliva wrote:

    > I need my script to run 2 separated threads, and then when the first has finished, starts the next one....but no more than 2 threads.
    > I know that Semaphores would help with that.
    > But the problem here is to know when the thread has finished its job, to release the semaphore and start another thread.


    This is an absolutely standard request and has nothing to do with Python. The way to go
    (in C-ish pseudocode) is:

    thread() {
        /* do work */
        [...]

        /* finished! */
        semaphore++;
    }

    semaphore = 2
    while (jobs) {
        semaphore--; // will block if pool exhausted
        thread();    // started as a new thread, returns immediately
    }

    // in the end, collect remaining two workers
    semaphore -= 2 // will block until all are finished
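
The same pattern could look roughly like this in Python with threading.Semaphore. This is only a sketch: run_limited and fake_convert are hypothetical names, and the sleep stands in for the real conversion.

```python
import threading
import time

MAX_WORKERS = 2  # at most two jobs run at the same time


def run_limited(jobs, work, limit=MAX_WORKERS):
    """Run work(job) in one thread per job, at most `limit` at once."""
    slots = threading.Semaphore(limit)
    threads = []

    def worker(job):
        try:
            work(job)
        finally:
            slots.release()  # finished: hand the slot back ("semaphore++")

    for job in jobs:
        slots.acquire()  # blocks while `limit` workers are busy ("semaphore--")
        t = threading.Thread(target=worker, args=(job,))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()  # collect the remaining workers


# Demo with a stand-in job that records how many workers overlap.
lock = threading.Lock()
state = {"running": 0, "peak": 0}
done = []


def fake_convert(name):
    with lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.01)  # stand-in for the real conversion
    with lock:
        state["running"] -= 1
        done.append(name)


run_limited(["a.m2ts", "b.m2ts", "c.m2ts", "d.m2ts", "e.m2ts"], fake_convert)
print("peak concurrency:", state["peak"], "finished:", len(done))
```

The worker releases the semaphore itself in a finally block, so the main loop never has to ask whether a thread has finished; it simply blocks in acquire() until a slot is free.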


    Best regards,
    Henrik
    Henrik Faber, Nov 16, 2011
    #3
  4. Am 16.11.2011 14:48 schrieb Eduardo Oliva:
    > Hello, I have a py script that reads for all "m2ts" video files and convert them to "mpeg" using ffmpeg with command line.
    >
    > What I want to do is:
    >
    > I need my script to run 2 separated threads, and then when the first has finished, starts the next one....but no more than 2 threads.
    > I know that Semaphores would help with that.
    > But the problem here is to know when the thread has finished its job, to release the semaphore and start another thread.
    >
    > Any help would be great.


    I'm not sure you need threads at all: if you launch a process with
    subprocess, it runs on its own and you would only have to wait() for it. The same
    can be done with two processes.

    Pseudocode:

    import subprocess
    import time

    LIMIT = 2

    processes = []


    def do_waiting(limit):
        # Block until fewer than `limit` processes are still registered.
        while len(processes) >= limit:
            # take the first one...
            sp = processes.pop(0)
            # ...and check whether it has exited (poll() never blocks).
            st = sp.poll()
            if st is None:
                # not finished yet, push back and retry shortly.
                processes.append(sp)
                time.sleep(0.1)
            else:
                # finished - don't push back, let outer for loop continue.
                print(sp, "has finished with", st)

    # `filenames` is the list of files to convert.
    for fname in filenames:
        # launch process ...
        sp = subprocess.Popen(...)
        # ... and register it.
        processes.append(sp)
        # If we are on the limit, wait for process to finish.
        do_waiting(LIMIT)

    # Finally, wait for the remaining processes.
    do_waiting(1)
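
A runnable variant of the same idea, built on Popen.poll(). It is a sketch under assumptions: run_all is a hypothetical helper, and the commands are short stand-ins that start the Python interpreter instead of the real converter.

```python
import subprocess
import sys
import time

LIMIT = 2


def run_all(commands, limit=LIMIT, poll_interval=0.05):
    """Run each command as a child process, at most `limit` at a time."""
    pending = list(commands)
    running = []
    exit_codes = []
    peak = 0
    while pending or running:
        # Top up to the limit.
        while pending and len(running) < limit:
            running.append(subprocess.Popen(pending.pop(0)))
        peak = max(peak, len(running))
        # Keep only the processes that have not exited yet.
        still_running = []
        for sp in running:
            rc = sp.poll()  # None while the child is still running
            if rc is None:
                still_running.append(sp)
            else:
                exit_codes.append(rc)
        running = still_running
        if running:
            time.sleep(poll_interval)
    return exit_codes, peak


# Stand-in commands: each child just sleeps briefly and exits with 0.
commands = [[sys.executable, "-c", "import time; time.sleep(0.1)"]
            for _ in range(4)]
exit_codes, peak = run_all(commands)
print("exit codes:", exit_codes, "peak:", peak)
```

No threads are involved: the children run in parallel on their own, and the parent only polls them.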


    Thomas
    Thomas Rachel, Nov 16, 2011
    #4
  5. On 11/16/2011 12:00 PM, Jack Keegan wrote:
    > Hi Chris,
    >
    > On Wed, Nov 16, 2011 at 1:55 PM, Chris Angelico<> wrote:
    >
    >> First off, it's better in CPython (the most popular Python) to use
    >> multiple processes than multiple threads.

    >
    > I had been looking into threads and process/subprocess myself a while ago
    > and couldn't decide which would suit what I needed to do best. I'm still
    > very confused about the whole thing. Can you elaborate on the above a bit
    > please?
    >
    > Cheers,
    >
    > Jack

    Threads and processes are concepts that exist in your operating
    system, and Python can use either of them to advantage, depending on the
    problem. Note that different operating systems also handle them differently, so code
    that's optimal on one system might not be as optimal on another. Still,
    some generalizations can be made.

    Each process is a separate program, with its own address space and its
    own file handles, etc. You can examine them separately with task
    manager, for example. If you launch multiple processes, they might not
    even all have to be Python, so if one problem can be handled by an
    existing program, just run it as a separate process. Processes are
    generally very protected from each other, and the OS is generally better
    at scheduling them than it is at scheduling threads within a single
    process. If you have multiple cores, the processes can really run
    simultaneously, frequently with very small overhead. The downside is
    that you cannot share variables between processes without extra work, so
    if the two tasks are very interdependent, it's more of a pain to use
    separate processes.

    Within one process, you can have multiple threads. On some operating
    systems, and in some languages, this can be extremely efficient. Some
    programs launch hundreds of threads, and use them to advantage. By
    default, it's easy to share data between threads, since they're in the
    same address space. But the downsides are: 1) it's very easy to trash
    another thread by walking on its variables; 2) Python does a lousy job
    of letting threads work independently (in CPython, the global
    interpreter lock lets only one thread execute Python bytecode at a
    time). For CPU-bound tasks, using separate threads is
    likely to be slower than just doing it all in one thread.
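
One rough way to check the CPU-bound claim on a given machine is to time the same pure-Python loop run sequentially and in two threads. This is a sketch with hypothetical names (count_down, timed); the numbers depend on the interpreter build, the OS and the hardware, so no particular result is implied.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop.
    while n > 0:
        n -= 1


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n, parts):
    for _ in range(parts):
        count_down(n)


def threaded(n, parts):
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


N = 200000
seq = timed(lambda: sequential(N, 2))
thr = timed(lambda: threaded(N, 2))
print("sequential: %.3fs  two threads: %.3fs" % (seq, thr))
```

For work that mostly waits on I/O or on an external program such as ffmpeg, this comparison does not apply, because the waiting thread is not executing Python bytecode.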

    --

    DaveA
    Dave Angel, Nov 16, 2011
    #5
  6. On Wed, Nov 16, 2011 at 9:27 AM, Dave Angel wrote:
    > On 11/16/2011 12:00 PM, Jack Keegan wrote:
    >[...] Processes [...] and the OS is generally better at scheduling them than it is at
    > scheduling threads within a single process.  If you have multiple cores, the
    > processes can really run simultaneously, frequently with very small
    > overhead.  [...]


    Maybe you are trying to simplify things but in a lot of cases this is
    just false. In at least some operating systems these days a thread is
    the basic unit that is scheduled. Processes are thread containers
    that provide other things (fds, separate address space, etc.). The
    comment about multiple cores can be extended to multiple threads on a
    core (CMT) but applies to threads as well as processes. Switching
    between processes tends to be heavier weight than switching between
    threads in a process because of the need to change the address space.

    Just because Python sucks at threads doesn't make them heavier for the OS.

    That doesn't mean you shouldn't use multiprocessing. The problem
    asked about seems to me a good fit for a single Python process starting
    and managing a set of external converter processes.

    Michael
    Michael Hunter, Nov 16, 2011
    #6
  7. On 11/16/2011 12:55 PM, Michael Hunter wrote:
    > On Wed, Nov 16, 2011 at 9:27 AM, Dave Angel<> wrote:
    >> On 11/16/2011 12:00 PM, Jack Keegan wrote:
    >> [...] Processes [...] and the OS is generally better at scheduling them than it is at
    >> scheduling threads within a single process. If you have multiple cores, the
    >> processes can really run simultaneously, frequently with very small
    >> overhead. [...]

    >
    > Maybe you are trying to simplify things but in a lot of cases this is
    > just false. In at least some operating systems these days a thread is
    > the basic unit that is scheduled. Processes are thread containers
    > that provide other things (fds, separate address space, etc.). The
    > comment about multiple cores can be extended to multiple threads on a
    > core (CMT) but applies to threads as well as processes. Switching
    > between processes tends to be heavier weight than switching between
    > threads in a process because of the need to change the address space.
    >
    > Just because Python sucks at threads doesn't make them heavier for the OS.
    >
    > That doesn't mean you shouldn't use multiprocessing. The problem
    > asked about seems a good fit to me to a single python process starting
    > and managing a set of external converter processes.
    >
    > Michael
    >


    No response is deserved.

    --

    DaveA
    Dave Angel, Nov 16, 2011
    #7
  8. On 11/16/2011 01:22 PM, Dave Angel wrote:
    > (You're top-posting. Put your remarks AFTER what you're quoting)
    >
    > On 11/16/2011 12:52 PM, Jack Keegan wrote:
    >> Ok, I thought that processes would do the same job as threads. So
    >> would the
    >> general rule be some thing like so:
    >>
    >> If I want another piece of work to run (theoretically) along side my
    >> main
    >> script, and I want to share data between them, I should use a thread and
    >> share data with the thread-safe queue.
    >> If the work I want done can function and complete on its own, go for a
    >> process.
    >>
    >> Would that be about right?
    >>

    >
    > Yes, with all the caveats I mentioned before. With some language
    > implementations, and with some operating systems, and on some
    > CPU-systems, the guidelines could be different. They all trade off in
    > ways too complex to describe here.
    >
    > For example, if a thread is mostly doing I/O, it may be just as
    > efficient as a separate process, even if sharing data isn't an issue.
    >
    > And in some languages, sharing data between processes isn't all that
    > tough, either.
    >
    >

    Well, you sent me a mail without including the list (just use
    Reply-All), and I tried to add the list in. Unfortunately, I picked the
    wrong one, so I sent this to Tutor by mistake. I'll try to fix that
    now, sorry.




    --

    DaveA
    Dave Angel, Nov 16, 2011
    #8
  9. Eduardo Oliva

    Miki Tebeka Guest

    Miki Tebeka, Nov 16, 2011
    #9
  10. On Wed, 16 Nov 2011 17:45:29 +0100, Thomas Rachel
    declaimed the following in gmane.comp.python.general:

    > Am 16.11.2011 14:48 schrieb Eduardo Oliva:
    > > Hello, I have a py script that reads for all "m2ts" video files and convert them to "mpeg" using ffmpeg with command line.
    > >
    > > What I want to do is:
    > >
    > > I need my script to run 2 separated threads, and then when the first has finished, starts the next one....but no more than 2 threads.
    > > I know that Semaphores would help with that.
    > > But the problem here is to know when the thread has finished its job, to release the semaphore and start another thread.
    > >
    > > Any help would be great.

    >
    > I'm not sure if you need threads at all: if you launch a process with
    > subprocess, it runs and you only would have to wait() for it. The same
    > can be done with two processes.
    >

    In the larger problem description, though, it is mentioned that the
    actual conversion is done by spawning an "ffmpeg" command... So why run
    a Python process whose only activity is to spawn a process that does the
    real work?


    Using a pair (or however many) worker threads which feed off a
    shared Queue for the files to convert, and then spawn the ffmpeg
    process(es) and wait, is likely simpler than trying to feed data to
    external processes.
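
A sketch of that worker-thread layout, assuming two workers and a stand-in command so that it runs without ffmpeg. The names build_command, worker and convert_all are hypothetical; a real script would build its actual ffmpeg command line in build_command.

```python
import queue
import subprocess
import sys
import threading

NUM_WORKERS = 2


def build_command(src):
    # Stand-in for the real converter so the sketch runs anywhere.
    # A real script would return the ffmpeg command line for `src` here.
    return [sys.executable, "-c", "import sys; sys.exit(0)", src]


def worker(tasks, results):
    while True:
        src = tasks.get()
        if src is None:  # sentinel: no more work for this worker
            break
        # The thread blocks here while the external process does the work.
        rc = subprocess.call(build_command(src))
        results.append((src, rc))


def convert_all(files, num_workers=NUM_WORKERS):
    tasks = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for f in files:
        tasks.put(f)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


results = convert_all(["a.m2ts", "b.m2ts", "c.m2ts"])
print(sorted(results))
```

Each worker starts the next conversion as soon as its previous one has exited, so at most two external processes run at any time.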
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
    Dennis Lee Bieber, Nov 17, 2011
    #10