Multi-threading in Python vs Java

Peter Cacioppi

Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?

I have a fairly sophisticated algorithm that I developed as both a single-threaded and a multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread-safe data structures (AtomicInteger, BlockingQueue, PriorityBlockingQueue, etc).

There is quite a significant performance improvement when multithreading here.

I'd like to port the project to Python, partly because Python is a better language (IMHO) and partly because Python plays well with Amazon Web Services.

But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of off-the-shelf concurrent data structures will make things much harder.

Any advice much appreciated. Thanks.
 
Cameron Simpson

Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?

I have a fairly sophisticated algorithm that I developed as both a single-threaded and a multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread-safe data structures (AtomicInteger, BlockingQueue, PriorityBlockingQueue, etc).

There is quite a significant performance improvement when multithreading here.

I'd like to port the project to Python, [...]
But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of off-the-shelf concurrent data structures will make things much harder.

A couple of random items:

A Java process will happily use multiple cores and hyperthreading.
It makes no thread safety guarantees in the language itself,
though as you say it has a host of thread safe tools to make all
this easy to do safely.

As you expect, CPython has the GIL and will only use one CPU-level
thread of execution _for the purely Python code_. No two Python
instructions run in parallel. Functions that block or call thread
safe libraries can (and usually do) release the GIL, allowing
other Python code to execute while native non-Python code does
stuff; that will use multiple cores etc.

Other Python implementations may be more aggressive. I'd suppose
Jython could multithread like Java, but really I have no experience
with them.

The standard answer with CPython is that if you want to use multiple
cores to run Python code (versus using Python code to orchestrate
native code) you should use the multiprocessing stuff to fork the
interpreter, and then farm out jobs using queues.
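
As a minimal sketch of that approach, the following uses multiprocessing.Pool, which handles the job and result queues internally. The count_primes job and the run_jobs helper are hypothetical stand-ins invented for illustration, not anything from the original project.

```python
import multiprocessing as mp


def count_primes(limit):
    """CPU-bound stand-in job: count the primes below `limit` in pure Python."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it evenly
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_jobs(limits, processes=2):
    """Farm the jobs out to worker processes; each has its own interpreter and GIL."""
    with mp.Pool(processes=processes) as pool:
        # map() pickles each argument, sends it to a worker, and collects results in order
        return pool.map(count_primes, limits)
```

In a script, call run_jobs([10_000, 20_000, 30_000]) from under an `if __name__ == "__main__":` guard, since platforms that spawn rather than fork re-import the main module in every worker. Arguments and results must be picklable, which is the main practical difference from sharing objects between threads.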

Regarding "concurrent off the shelf data structures", I have a bunch
of Python multithreaded stuff and find the stdlib Queues and Locks
(and Semaphores and so on) sufficient. The Queues (including things
like deque) are thread safe, so a lot of the coordination is pretty
easy.
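
For a rough idea of how the Java classes mentioned above map onto the stdlib, here is a small sketch. queue.PriorityQueue plays the role of PriorityBlockingQueue; the Counter class is a hypothetical lock-protected substitute for AtomicInteger, since the stdlib has no direct equivalent.

```python
import queue
import threading

jobs = queue.PriorityQueue()          # thread-safe, blocking get() and put()

# Entries are (priority, payload) tuples; the lowest priority value comes out first.
for priority, name in [(3, "low"), (1, "high"), (2, "medium")]:
    jobs.put((priority, name))

order = [jobs.get()[1] for _ in range(3)]


class Counter:
    """A lock-protected integer, a rough substitute for AtomicInteger."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:              # released on exit, even if an exception is raised
            self._value += 1
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value


counter = Counter()
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the threads finish, order holds the payloads in priority order and counter.value reflects every increment from all four threads.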

And of course context managers make Locks and Semaphores very easy
and reliable to use:

from threading import Lock

L = Lock()
# ... other code ...
with L:
    # do locked stuff; the lock is released on exit, even on an exception
    ...

I'm sure you'll get longer and more nuanced replies too.

Cheers,
 
Peter Cacioppi

I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).

while len(jobs):
    job = jobs.pop()
    model = Model(job)       # Model is py interface for a lib written in C
    newJobs = model.solve()  # This will take a long time
    for newJob in newJobs:
        jobs.add(newJob)

Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.

Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.

So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

It's a nice algorithm for high level languages. Java worked well here, I'm hoping py can be nearly as fast with much more elegant and readable code.
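
One way the worker loop above could be written with the stdlib is sketched below. It assumes the glossed-over race is that a worker can see an empty queue while another worker is still solving and about to add jobs; that is only one plausible reading. The solve function is a hypothetical stand-in for Model(job).solve(), and the (depth, label) job tuples are invented for the example.

```python
import queue
import threading
import time


def solve(job):
    """Hypothetical stand-in for Model(job).solve(): blocks, then returns new jobs."""
    depth, label = job
    time.sleep(0.01)                  # sleeping releases the GIL, as a C solver would need to
    if depth >= 2:
        return []
    return [(depth + 1, label + "L"), (depth + 1, label + "R")]


def worker(jobs, solved, solved_lock):
    while True:
        job = jobs.get()              # blocks until a job is available
        try:
            for new_job in solve(job):
                jobs.put(new_job)     # enqueue children before marking the parent done
            with solved_lock:
                solved.append(job)
        finally:
            jobs.task_done()


def run(root_job, num_workers=4):
    jobs = queue.PriorityQueue()
    solved, solved_lock = [], threading.Lock()
    jobs.put(root_job)
    for _ in range(num_workers):
        threading.Thread(
            target=worker, args=(jobs, solved, solved_lock), daemon=True
        ).start()
    jobs.join()                       # returns once every queued job has been marked done
    return solved


solved = run((0, ""))
```

Because each worker puts its new jobs on the queue before calling task_done(), the unfinished-task count cannot reach zero while work is still pending, so join() is a safe termination test where len(jobs) is not. The workers are daemon threads that stay blocked in get() and are discarded when the program exits.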
 
Chris Angelico

So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!

ChrisA
 
Steven D'Aprano

Other Python implementations may be more aggressive. I'd suppose Jython
could multithread like Java, but really I have no experience with them.

Neither Jython nor IronPython has a GIL.

The standard answer with CPython is that if you want to use multiple
cores to run Python code (versus using Python code to orchestrate native
code) you should use the multiprocessing stuff to fork the interpreter,
and then farm out jobs using queues.

Note that this really only applies to CPU-bound tasks. For tasks that
depend on file IO (reading and writing files), CPython threads will
operate in parallel as independently and (almost) as efficiently as those
in other languages. That is to say, they will be constrained by the
underlying operating system's ability to do file IO, not by the number of
cores in your CPU.
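
A quick way to see this is to time several threads that each block for a while. In this sketch time.sleep stands in for a blocking read or write (it releases the GIL while waiting, as blocking file and socket calls do); blocking_io and timed are hypothetical helper names.

```python
import threading
import time


def blocking_io(seconds):
    """Stand-in for a blocking read or write; time.sleep releases the GIL while waiting."""
    time.sleep(seconds)


def timed(num_threads, seconds):
    """Run `num_threads` blocking calls concurrently and return the wall-clock time."""
    threads = [
        threading.Thread(target=blocking_io, args=(seconds,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = timed(num_threads=4, seconds=0.2)
```

If the waits overlap, elapsed comes out close to a single wait rather than the sum of all four.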
 
Piet van Oostrum

Chris Angelico said:
Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!

But it only works if the external C library has been written to release
the GIL around the long computations. If not, then the OP could try to
write a wrapper around them that does this.
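
Whether a given call releases the GIL can be checked empirically before committing to threads. The sketch below runs a pure-Python counting loop in a second thread while the call executes; ticks_during is a hypothetical helper, and time.sleep is used only as a call known to release the GIL. On standard CPython, a call that holds the GIL for its whole duration should leave the count near zero.

```python
import threading
import time


def ticks_during(call, *args):
    """Count how many times a pure-Python loop gets to run while `call` executes."""
    ticks = 0
    done = threading.Event()

    def probe():
        nonlocal ticks
        while not done.is_set():
            ticks += 1                # can only advance while this thread holds the GIL

    t = threading.Thread(target=probe)
    t.start()
    try:
        call(*args)
    finally:
        done.set()
        t.join()
    return ticks


ticks = ticks_during(time.sleep, 0.2)
```

To apply it, pass the library call in place of time.sleep and compare the count against the time.sleep baseline for a similar duration. The probe can pick up a few ticks just before and after the call, so treat a small nonzero count as "held", not "released".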
 
Terry Reedy

I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).

while len(jobs):
    job = jobs.pop()
    model = Model(job)       # Model is py interface for a lib written in C
    newJobs = model.solve()  # This will take a long time
    for newJob in newJobs:
        jobs.add(newJob)

Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.

Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.

So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

It's a nice algorithm for high level languages. Java worked well here, I'm hoping py can be nearly as fast with much more elegant and readable code.

Given that model.solve takes a 'long time' (seconds, at least), the
extra time to start a process over the time to start a thread will be
inconsequential. I would therefore look at the multiprocessing module.
 
Peter Cacioppi

"Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!"

Thanks, that was my assessment as well, just wanted a double check. At the time of posting I was mentally blocked on how to set up a quick proof of concept, but of course writing the post cleared that up ;)

Along with "batteries included" and "we're all adults", I think Python needs a pithy phrase summarizing how well thought out it is. That is to say, the major design decisions were all carefully considered, and as a result things that might appear to be problematic are actually not barriers in practice. My suggestion for this phrase is "Guido was here".

So in this case, I thought the GIL would be a fly in the ointment, but on reflection it turned out not to be the case. Guido was here.
 
