Please help with Threading

Discussion in 'Python' started by Jurgens de Bruin, May 18, 2013.

  1. This is my first script where I want to use the python threading module. I have a large dataset which is a list of dicts; there can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page ( 4x4 ) - this already works.
    What I currently do is create a nested list [ [ {} ], [ {} ] ] where each inner list contains 16 dictionaries, thus each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long. So I would like multiple inner lists to be processed simultaneously, creating the graphs in "parallel".
    I am trying to use python threading for this. I create 4 threads, loop over the outer list and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to get hung up. I monitor the resources on my mac and python starts off good, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%.

    My thread creating is based on the following : http://www.tutorialspoint.com/python/python_multithreading.htm

    Any help would be great!!!
     
    Jurgens de Bruin, May 18, 2013
    #1

  2. Peter Otten Guest

    Jurgens de Bruin wrote:

    > This is my first script where I want to use the python threading module. I
    > have a large dataset which is a list of dict this can be as much as 200
    > dictionaries in the list. The final goal is a histogram for each dict 16
    > histograms on a page ( 4x4 ) - this already works.
    > What I currently do is a create a nested list [ [ {} ], [ {} ] ] each
    > inner list contains 16 dictionaries, thus each inner list is a single page
    > of 16 histograms. Iterating over the outer-list and creating the graphs
    > takes to long. So I would like multiple inner-list to be processes
    > simultaneously and creating the graphs in "parallel".
    > I am trying to use the python threading for this. I create 4 threads loop
    > over the outer-list and send a inner-list to the thread. This seems to
    > work if my nested lists only contains 2 elements - thus less elements than
    > threads. Currently the scripts runs and then seems to get hung up. I
    > monitor the resource on my mac and python starts off good using 80% and
    > when the 4-thread is created the CPU usages drops to 0%.
    >
    > My thread creating is based on the following :
    > http://www.tutorialspoint.com/python/python_multithreading.htm
    >
    > Any help would be create!!!


    Can you show us the code?
     
    Peter Otten, May 18, 2013
    #2

  3. I will post code - the entire script is 1000 lines of code - can I post the threading functions only?
     
    Jurgens de Bruin, May 18, 2013
    #3
  4. Peter Otten Guest

    Jurgens de Bruin wrote:

    > I will post code - the entire scripts is 1000 lines of code - can I post
    > the threading functions only?


    Try to condense it to the relevant parts, but make sure that it can be run
    by us.

    As a general note, when you add new stuff to an existing longish script it
    is always a good idea to write it in such a way that you can test it
    standalone so that you can have some confidence that it will work as
    designed once you integrate it with your old code.
     
    Peter Otten, May 18, 2013
    #4
  5. Dave Angel Guest

    On 05/18/2013 04:58 AM, Jurgens de Bruin wrote:
    > This is my first script where I want to use the python threading module. <snip>
    >


    CPython, and apparently (all of?) the other current Python
    implementations, uses a GIL to prevent multi-threaded applications from
    shooting themselves in the foot.

    However the practical effect of the GIL is that CPU-bound applications
    do not multi-thread efficiently; the single-threaded version usually
    runs faster.

    The place where CPython programs gain from multithreading is where each
    thread spends much of its time waiting for some external trigger.

    (More specifically, if such a wait is inside well-written C code, it
    releases the GIL so other threads can get useful work done. Example is
    a thread waiting for internet activity, and blocks inside a system call)


    --
    DaveA
     
    Dave Angel, May 18, 2013
    #5
  6. On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
    <> declaimed the following in
    gmane.comp.python.general:

    > This is my first script where I want to use the python threading module. <snip>
    >


    The odds are good that this is just going to run slower...

    One: The common Python implementation uses a global interpreter lock
    to prevent interpreted code from interfering with itself in multiple
    threads. So "number cruncher" applications don't gain any speed from
    being partitioned into threads -- even on a multicore processor, only one
    thread can have the GIL at a time. On top of that, you have the overhead
    of the interpreter switching between threads (GIL release on one thread,
    GIL acquire for the next thread).

    Python threads work fine if the threads either rely on intelligent
    DLLs for number crunching (instead of doing nested Python loops to
    process a numeric array, you pass it to something like NumPy, which
    releases the GIL while crunching a copy of the array) or they do lots of
    I/O and have to wait for I/O devices (while one thread is waiting for
    the write/read operation to complete, another thread can do some number
    crunching).

    If you really need to do this type of number crunching in Python
    level code, you'll want to look into the multiprocessing library
    instead. That will create actual OS processes (each with a copy of the
    interpreter, and not sharing memory) and each of those can run on a core
    without conflicting on the GIL.
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 18, 2013
    #6
  7. ----------------------------------------
    > Subject: Re: Please help with Threading
    > Date: Sat, 18 May 2013 15:28:56 -0400
    >
    > On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
    > <> declaimed the following in
    > gmane.comp.python.general:
    >
    >> This is my first script where I want to use the python threading module. <snip>
    >>

    >
    > The odds are good that this is just going to run slower...


    Just been told that the GIL doesn't make things slower, but as I didn't know that such a thing even existed I went out looking for more info and found this document: http://www.dabeaz.com/python/UnderstandingGIL.pdf

    Is it current? I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.

    What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

    > <snip>
    >
    > If you really need to do this type of number crunching in Python
    > level code, you'll want to look into the multiprocessing library
    > instead. That will create actual OS processes (each with a copy of the
    > interpreter, and not sharing memory) and each of those can run on a core
    > without conflicting on the GIL.


    Which library do you suggest?

     
    Carlos Nepomuceno, May 19, 2013
    #7
  8. On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
    <> wrote:
    > I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.
    >
    > What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?


    Preemption isn't really the issue here. On the C level, preemptive vs
    cooperative usually means the difference between a stalled thread
    locking everyone else out and not doing so. Preemption is done at a
    lower level than user code (eg the operating system or the CPU),
    meaning that user code can't retain control of the CPU.

    With interpreted code eg in CPython, it's easy to implement preemption
    in the interpreter. I don't know how it's actually done, but one easy
    implementation would be "every N bytecode instructions, context
    switch". It's still done at a lower level than user code (N bytecode
    instructions might all actually be a single tight loop that the
    programmer didn't realize was infinite), but it's not at the OS level.

    But none of that has anything to do with multiple core usage. The
    problem there is that shared data structures need to be accessed
    simultaneously, and in CPython, there's a Global Interpreter Lock to
    simplify that; but the consequence of the GIL is that no two threads
    can simultaneously execute user-level code. There have been
    GIL-removal proposals at various times, but the fact remains that a
    global lock makes a huge amount of sense and gives pretty good
    performance across the board. There's always multiprocessing when you
    need multiple CPU-bound threads; it's an explicit way to separate the
    shared data (what gets transferred) from local (what doesn't).

    ChrisA
     
    Chris Angelico, May 19, 2013
    #8
  9. On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico <>
    declaimed the following in gmane.comp.python.general:

    > On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
    > <> wrote:
    > > I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.
    > >
    > > What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

    >

    <snip>

    > With interpreted code eg in CPython, it's easy to implement preemption
    > in the interpreter. I don't know how it's actually done, but one easy
    > implementation would be "every N bytecode instructions, context
    > switch". It's still done at a lower level than user code (N bytecode


    Which IS how the common Python interpreter does it -- barring the
    thread making some system call that triggers a preemption ahead of time
    (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
    or 100 byte-code instructions -- as I recall, it DID change a few
    versions back.

    Part of the context switch is to transfer the GIL from the preempted
    thread to the new thread.

    So, overall, running multiple CPU-bound threads on a SINGLE CORE
    processor takes a bit longer just due to the overhead of thread swapping.

    On a multi-core processor, the effect is the same, since -- even
    though one may have a thread running on each core -- the GIL is only
    assigned to one thread, and other threads get blocked when trying to
    access runtime data structures. And you may have even more overhead from
    processor cache misses if a thread gets assigned to a different
    core.

    (yes -- I'm restating the same thing as I had just trimmed below
    this point... but the target is really the OP, where repetition may be
    helpful in understanding)
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 19, 2013
    #9
  10. On Mon, May 20, 2013 at 7:46 AM, Dennis Lee Bieber
    <> wrote:
    > On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico <>
    > declaimed the following in gmane.comp.python.general:
    >> With interpreted code eg in CPython, it's easy to implement preemption
    >> in the interpreter. I don't know how it's actually done, but one easy
    >> implementation would be "every N bytecode instructions, context
    >> switch". It's still done at a lower level than user code (N bytecode

    >
    > Which IS how the common Python interpreter does it -- barring the
    > thread making some system call that triggers a preemption ahead of time
    > (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
    > or 100 byte-code instructions -- as I recall, it DID change a few
    > versions back.


    Incidentally, is the context-switch check the same as the check for
    interrupt signal raising KeyboardInterrupt? ISTR that was another
    "every N instructions" check.

    ChrisA
     
    Chris Angelico, May 19, 2013
    #10
  11. Dave Angel Guest

    On 05/19/2013 05:46 PM, Dennis Lee Bieber wrote:
    > <snip>
    >
    > Part of the context switch is to transfer the GIL from the preempted
    > thread to the new thread.
    >
    > So, overall, on a SINGLE CORE processor running multiple CPU bound
    > threads takes a bit longer just due to the overhead of thread swapping.
    >
    > On a multi-core processor, the effect is the same, since -- even
    > though one may have a thread running on each core -- the GIL is only
    > assigned to one thread, and other threads get blocked when trying to
    > access runtime data structures. And you may have even more overhead from
    > processor cache misses if a thread gets assigned to a different
    > core.
    >
    > (yes -- I'm restating the same thing as I had just trimmed below
    > this point... but the target is really the OP, where repetition may be
    > helpful in understanding)
    >


    So what's the mapping between real (OS) threads, and the fake ones
    Python uses? The OS keeps track of a separate stack and context for
    each thread it knows about; are they one-to-one with the ones you're
    describing here? If so, then any OS thread that gets scheduled will
    almost always find it can't get the GIL, and spend time thrashing. But
    the change that CPython does intentionally would be equivalent to a
    sleep(0).

    On the other hand, if these threads are distinct from the OS threads, is
    it done with some sort of thread pool, where CPython has its own stack,
    and doesn't really use the one managed by the OS?

    Understand that the only OS threading I really understand is the one in
    Windows (which I no longer use). So assuming Linux has some form of
    lightweight threading, the distinction above may not map very well.



    --
    DaveA
     
    Dave Angel, May 20, 2013
    #11
  12. On Mon, 20 May 2013 07:52:23 +1000, Chris Angelico <>
    declaimed the following in gmane.comp.python.general:

    > Incidentally, is the context-switch check the same as the check for
    > interrupt signal raising KeyboardInterrupt? ISTR that was another
    > "every N instructions" check.
    >

    That I couldn't say -- it would be the obvious spot for the
    interpreter to check some global flag, said flag perhaps being set by an
    interrupt handler, signal bits, or whatever the underlying OS uses.

    OTOH, KeyboardInterrupt may be something passed up through the I/O
    system and only checked when a thread performs I/O on stdin (which would
    explain how number crunchers can be "unstoppable"). And in this case,
    the invocation of the I/O triggers a context switch.

    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 20, 2013
    #12
  13. On Sun, 19 May 2013 21:04:42 -0400, Dave Angel <>
    declaimed the following in gmane.comp.python.general:

    > So what's the mapping between real (OS) threads, and the fake ones
    > Python uses? The OS keeps track of a separate stack and context for
    > each thread it knows about; are they one-to-one with the ones you're
    > describing here? If so, then any OS thread that gets scheduled will
    > almost always find it can't get the GIL, and spend time thrashing. But
    > the change that CPython does intentionally would be equivalent to a
    > sleep(0).
    >

    No. The first time that thread attempts to gain the GIL it will be
    blocked. It will not be made ready again until the current owner of the
    GIL frees it (at which point it competes with all other threads that
    were blocked).

    No thrashing -- but a lot of threads blocked waiting for the GIL,
    and why multicore processors won't see a speed up in number crunching
    applications using threads. Multiprocessing creates copies of the
    interpreter, and each copy has its own GIL which won't conflict with the
    others -- so intense number crunchers can benefit even with the overhead
    of creating a new/independent process. I/O bound tasks don't gain as
    much from multiprocessing as you have the overhead of creating a system
    process, only to spend most of the time waiting for an I/O operation to
    complete. Threads work well for that situation (and Twisted even gets by
    without threading -- though my mind just can't work with the Twisted
    architecture <G>; even one number cruncher in Twisted has to be
    cooperative, working in chunks and returning so the dispatcher can
    handle events).


    > On the other hand, if these threads are distinct from the OS threads, is
    > it done with some sort of thread pool, where CPython has its own stack,
    > and doesn't really use the one managed by the OS?
    >


    Even the common GNAT Ada releases rely upon the OS for tasking.

    One pretty much has to use the OS task scheduler, otherwise one
    thread inside the interpreter/runtime that blocks on an OS system call
    will block the entire interpreter/runtime and hence any other thread
    would be blocked too.

    > Understand the only OS threading I really understand is the one in
    > Windows (which I no longer use). So assuming Linux has some form of
    > lightweight threading, the distinction above may not map very well.


    And I'm most familiar with the Amiga, even though I've not used it
    in 20 years. In it, the OS scheduled "tasks" -- but above tasks were
    "processes". A process contained structures for stdin/stdout/stderr,
    current directory, environment variables. These structures were just
    extensions of the task control block holding signal bits, register
    contents (when not running) etc.

    Windows "lightweight" threads would probably be "fibers" -- which
    require the Windows application itself to schedule them, rather than the
    OS. IOW, they are closer to co-routines (and run within a thread that is
    controlling them).


    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 20, 2013
    #13
  14. On Saturday, 18 May 2013 10:58:13 UTC+2, Jurgens de Bruin wrote:
    > This is my first script where I want to use the python threading module. <snip>


    Thanks to all for the discussion/comments on threading; although I have not been commenting I have been following. I have learnt a lot and I am still reading up on everything mentioned. Thanks again.
    Will see how I am going to solve my scenario.
     
    Jurgens de Bruin, Jun 3, 2013
    #14
