Multiple threads

Eduardo Oliva

Hello, I have a Python script that finds all "m2ts" video files and converts them to "mpeg" using ffmpeg on the command line.

What I want to do is:

I need my script to run 2 separate threads, and when one has finished, start the next one... but no more than 2 threads at a time.
I know that semaphores would help with that.
But the problem is knowing when a thread has finished its job, so I can release the semaphore and start another thread.

Any help would be great.

Thank you in advance
 
Chris Angelico

Hello, I have a Python script that finds all "m2ts" video files and converts them to "mpeg" using ffmpeg on the command line.

What I want to do is:

 I need my script to run 2 separate threads, and when one has finished, start the next one... but no more than 2 threads at a time.
 I know that semaphores would help with that.
 But the problem is knowing when a thread has finished its job, so I can release the semaphore and start another thread.

First off, in CPython (the most popular Python implementation) it's
better to use multiple processes than multiple threads, because the
Global Interpreter Lock keeps CPU-bound threads from running in
parallel. That aside, what you're looking at is a pretty common model:
a large number of tasks being served by a pool of workers.

Have a look at the multiprocessing module, specifically Pool:
Python 2: http://docs.python.org/library/multiprocessing.html
Python 3: http://docs.python.org/py3k/library/multiprocessing.html

Should be fairly straightforward.

ChrisA
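
A minimal sketch of the Pool approach Chris describes; the ffmpeg
flags, the output naming, and the glob pattern are illustrative
assumptions, not details from the thread:

import glob
import subprocess
from multiprocessing import Pool

def convert(fname):
    # Run ffmpeg on one file and wait for it to finish; the flags and
    # the output naming scheme here are assumptions.
    out = fname.rsplit(".", 1)[0] + ".mpeg"
    return fname, subprocess.call(["ffmpeg", "-i", fname, out])

if __name__ == "__main__":
    files = glob.glob("*.m2ts")
    # A pool of 2 workers: at most 2 conversions run at once, and a
    # new one starts as soon as a worker becomes free.
    with Pool(processes=2) as pool:
        for fname, status in pool.imap_unordered(convert, files):
            print(fname, "finished with status", status)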
 
Henrik Faber

I need my script to run 2 separate threads, and when one has finished, start the next one... but no more than 2 threads at a time.
I know that semaphores would help with that.
But the problem is knowing when a thread has finished its job, so I can release the semaphore and start another thread.

This is an absolutely standard request, and has nothing to do with
Python specifically. The way to go (in C-ish pseudocode) is:

thread() {
    /* do work */
    [...]

    /* finished! */
    semaphore++;
}

semaphore = 2;
while (jobs) {
    semaphore--;   // will block if pool exhausted
    thread();
}

// in the end, collect the remaining two workers
semaphore -= 2;   // will block until all have finished


Best regards,
Henrik
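
In Python, Henrik's scheme maps onto threading.Semaphore, with the
release in a finally block so a crashed worker still frees its slot.
A sketch, where the ffmpeg invocation and the `files` list are
assumptions:

import subprocess
import threading

semaphore = threading.Semaphore(2)          # at most 2 workers at once

def worker(fname):
    try:
        # do work: run ffmpeg and wait for it (command is an assumption)
        out = fname.rsplit(".", 1)[0] + ".mpeg"
        subprocess.call(["ffmpeg", "-i", fname, out])
    finally:
        # finished! release the slot
        semaphore.release()

threads = []
for fname in files:                         # `files` assumed to exist
    semaphore.acquire()                     # will block if pool exhausted
    t = threading.Thread(target=worker, args=(fname,))
    t.start()
    threads.append(t)

# in the end, collect the remaining workers
for t in threads:
    t.join()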
 
Thomas Rachel

On 16.11.2011 14:48, Eduardo Oliva wrote:
Hello, I have a Python script that finds all "m2ts" video files and converts them to "mpeg" using ffmpeg on the command line.

What I want to do is:

I need my script to run 2 separate threads, and when one has finished, start the next one... but no more than 2 threads at a time.
I know that semaphores would help with that.
But the problem is knowing when a thread has finished its job, so I can release the semaphore and start another thread.

Any help would be great.

I'm not sure you need threads at all: if you launch a process with
subprocess, it runs on its own and you only have to wait() for it. The
same can be done with two processes.

Pseudocode:

import subprocess
import time

LIMIT = 2

processes = []


def do_waiting(limit):
    # Block while `limit` or more processes are still running.
    while len(processes) >= limit:
        # take the first one...
        sp = processes.pop(0)
        # ... and check whether it has finished (poll() returns None
        # while the process is still running).
        st = sp.poll()
        if st is None:
            # not finished yet, push back and check again shortly.
            processes.append(sp)
            time.sleep(0.1)
        else:
            # finished - don't push back, let the outer for loop continue.
            print(sp.pid, "has finished with", st)

for fname in filenames:        # `filenames` is the list of m2ts files
    # launch process ...
    sp = subprocess.Popen(...)
    # ... and register it.
    processes.append(sp)
    # If we are at the limit, wait for a process to finish.
    do_waiting(LIMIT)

# finally, wait for all remaining processes.
do_waiting(1)


Thomas
 
Dave Angel

Hi Chris,

I had been looking into threads and process/subprocess myself a while ago
and couldn't decide which would suit what I needed to do best. I'm still
very confused about the whole thing. Can you elaborate on the above a bit
please?

Cheers,

Jack

Threads and processes are concepts that exist in your operating
system, and Python can use either of them to advantage, depending on the
problem. Note that different OSes also handle them differently, so code
that's optimal on one system might not be as optimal on another. Still,
some generalities can be made.

Each process is a separate program, with its own address space and its
own file handles, etc. You can examine them separately with task
manager, for example. If you launch multiple processes, they might not
even all have to be python, so if one problem can be handled by an
existing program, just run it as a separate process. Processes are
generally very protected from each other, and the OS is generally better
at scheduling them than it is at scheduling threads within a single
process. If you have multiple cores, the processes can really run
simultaneously, frequently with very small overhead. The downside is
that you cannot share variables between processes without extra work, so
if the two tasks are very interdependent, it's more of a pain to use
separate processes.

Within one process, you can have multiple threads. On some OS, and in
some languages, this can be extremely efficient. Some programs launch
hundreds of threads, and use them to advantage. By default, it's easy
to share data between threads, since they're in the same address space.
But the downsides are: 1) it's very easy to trash another thread by
walking on its variables, and 2) Python does a lousy job of letting
threads work independently (CPython's Global Interpreter Lock means only
one thread executes Python bytecode at a time). For CPU-bound tasks,
using separate threads is likely to be slower than just doing it all in
one thread.
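
A small illustration of the sharing difference Dave describes (a
made-up example, not code from the thread): a thread sees the same
module-level variable, while a child process increments its own copy
and leaves the parent's untouched.

import multiprocessing
import threading

counter = 0

def bump():
    global counter
    counter += 1

if __name__ == "__main__":
    # A thread shares the parent's address space...
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    print("after thread:", counter)    # prints 1

    # ...but a process gets its own copy of `counter`.
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print("after process:", counter)   # still prints 1 in the parent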
 
Michael Hunter

On 11/16/2011 12:00 PM, Jack Keegan wrote:
[...] Processes [...] and the OS is generally better at scheduling them than it is at
scheduling threads within a single process.  If you have multiple cores, the
processes can really run simultaneously, frequently with very small
overhead.  [...]

Maybe you are trying to simplify things, but in a lot of cases this is
just false. In at least some operating systems these days, a thread is
the basic unit that is scheduled. Processes are thread containers
that provide other things (fds, separate address space, etc.). The
comment about multiple cores can be extended to multiple threads on a
core (CMT), but it applies to threads as well as processes. Switching
between processes tends to be heavier weight than switching between
threads in a process, because of the need to change the address space.

Just because Python sucks at threads doesn't make them heavier for the OS.

That doesn't mean you shouldn't use multiprocessing. The problem
asked about seems a good fit to me to a single python process starting
and managing a set of external converter processes.

Michael
 
Dave Angel

Maybe you are trying to simplify things but in a lot of cases this is
just false. [...] Just because Python sucks at threads doesn't make them
heavier for the OS. [...]

Michael

No response is deserved.
 
Dave Angel

(You're top-posting. Put your remarks AFTER what you're quoting.)



Yes, with all the caveats I mentioned before. With some language
implementations, and with some operating systems, and on some
CPU-systems, the guidelines could be different. They all trade off in
ways too complex to describe here.

For example, if a thread is mostly doing I/O, it may be just as
efficient as a separate process, even if sharing data isn't an issue.

And in some languages, sharing data between processes isn't all that
tough, either.
Well, you sent me a mail without including the list (just use
Reply-All), and I tried to add the list back in. Unfortunately, I picked
the wrong one, so I sent this to Tutor by mistake. I'll try to fix that
now; sorry.
 
Dennis Lee Bieber

On 16.11.2011 14:48, Eduardo Oliva wrote:

I'm not sure you need threads at all: if you launch a process with
subprocess, it runs on its own and you only have to wait() for it. The
same can be done with two processes.

In the larger problem description, though, it is mentioned that the
actual conversion is done by spawning an "ffmpeg" command... so why run
a Python process whose only activity is to spawn a process that does the
real work?


Using a pair (or however many) of worker threads which feed off a shared
Queue of files to convert, and then spawn the ffmpeg process(es) and
wait, is likely simpler than trying to feed data to external processes.
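
A sketch of the Queue-fed worker model Dennis describes; the file list
and the ffmpeg command are assumptions:

import queue
import subprocess
import threading

files = queue.Queue()
for fname in ["a.m2ts", "b.m2ts"]:     # assumed file list
    files.put(fname)

def worker():
    # Each worker pulls file names off the shared Queue, spawns
    # ffmpeg for each, and waits for it to finish.
    while True:
        try:
            fname = files.get_nowait()
        except queue.Empty:
            return                     # no work left for this worker
        out = fname.rsplit(".", 1)[0] + ".mpeg"
        subprocess.call(["ffmpeg", "-i", fname, out])

# a pair of worker threads feeding off the shared Queue
workers = [threading.Thread(target=worker) for _ in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()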
 
