Please help with Threading

Discussion in 'Python' started by Jurgens de Bruin, May 18, 2013.

  1. This is my first script where I want to use the python threading module. I have a large dataset which is a list of dicts; there can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page ( 4x4 ) - this already works.
    What I currently do is create a nested list [ [ {} ], [ {} ] ] where each inner list contains 16 dictionaries, thus each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long. So I would like multiple inner lists to be processed simultaneously, creating the graphs in "parallel".
    I am trying to use python threading for this. I create 4 threads, loop over the outer list and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to get hung up. I monitor the resources on my mac and python starts off good, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%.

    My thread creating is based on the following : http://www.tutorialspoint.com/python/python_multithreading.htm

    Any help would be great!!!
     
    Jurgens de Bruin, May 18, 2013
    #1

  2. Peter Otten Guest

    Jurgens de Bruin wrote:

    > This is my first script where I want to use the python threading module. I
    > have a large dataset which is a list of dict this can be as much as 200
    > dictionaries in the list. The final goal is a histogram for each dict 16
    > histograms on a page ( 4x4 ) - this already works.
    > What I currently do is a create a nested list [ [ {} ], [ {} ] ] each
    > inner list contains 16 dictionaries, thus each inner list is a single page
    > of 16 histograms. Iterating over the outer-list and creating the graphs
    > takes to long. So I would like multiple inner-list to be processes
    > simultaneously and creating the graphs in "parallel".
    > I am trying to use the python threading for this. I create 4 threads loop
    > over the outer-list and send a inner-list to the thread. This seems to
    > work if my nested lists only contains 2 elements - thus less elements than
    > threads. Currently the scripts runs and then seems to get hung up. I
    > monitor the resource on my mac and python starts off good using 80% and
    > when the 4-thread is created the CPU usages drops to 0%.
    >
    > My thread creating is based on the following :
    > http://www.tutorialspoint.com/python/python_multithreading.htm
    >
    > Any help would be create!!!


    Can you show us the code?
     
    Peter Otten, May 18, 2013
    #2

  3. I will post code - the entire script is 1000 lines of code - can I post the threading functions only?
     
    Jurgens de Bruin, May 18, 2013
    #3
  4. Peter Otten Guest

    Jurgens de Bruin wrote:

    > I will post code - the entire scripts is 1000 lines of code - can I post
    > the threading functions only?


    Try to condense it to the relevant parts, but make sure that it can be run
    by us.

    As a general note, when you add new stuff to an existing longish script it
    is always a good idea to write it in such a way that you can test it
    standalone so that you can have some confidence that it will work as
    designed once you integrate it with your old code.
     
    Peter Otten, May 18, 2013
    #4
  5. Dave Angel Guest

    On 05/18/2013 04:58 AM, Jurgens de Bruin wrote:
    > This is my first script where I want to use the python threading module. <snip>
    >


    CPython, and apparently (all of?) the other current Python
    implementations, uses a GIL to prevent multi-threaded applications from
    shooting themselves in the foot.

    However the practical effect of the GIL is that CPU-bound applications
    do not multi-thread efficiently; the single-threaded version usually
    runs faster.

    The place where CPython programs gain from multithreading is where each
    thread spends much of its time waiting for some external trigger.

    (More specifically, if such a wait is inside well-written C code, it
    releases the GIL so other threads can get useful work done. Example is
    a thread waiting for internet activity, and blocks inside a system call)


    --
    DaveA
     
    Dave Angel, May 18, 2013
    #5
  6. On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
    <> declaimed the following in
    gmane.comp.python.general:

    > This is my first script where I want to use the python threading module. <snip>
    >


    The odds are good that this is just going to run slower...

    One: The common Python implementation uses a global interpreter lock
    to prevent interpreted code from interfering with itself in multiple
    threads. So "number cruncher" applications don't gain any speed from
    being partitioned into threads -- even on a multicore processor, only one
    thread can have the GIL at a time. On top of that, you have the overhead
    of the interpreter switching between threads (GIL release on one thread,
    GIL acquire for the next thread).

    Python threads work fine if the threads either rely on intelligent
    DLLs for number crunching (instead of doing nested Python loops to
    process a numeric array, you pass it to something like NumPy, which
    releases the GIL while crunching a copy of the array) or they do lots of
    I/O and have to wait for I/O devices (while one thread is waiting for
    the write/read operation to complete, another thread can do some number
    crunching).

    If you really need to do this type of number crunching in Python
    level code, you'll want to look into the multiprocessing library
    instead. That will create actual OS processes (each with a copy of the
    interpreter, and not sharing memory) and each of those can run on a core
    without conflicting on the GIL.
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 18, 2013
    #6
  7. ----------------------------------------
    > Subject: Re: Please help with Threading
    > Date: Sat, 18 May 2013 15:28:56 -0400
    >
    > On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
    > <> declaimed the following in
    > gmane.comp.python.general:
    >
    >> This is my first script where I want to use the python threading module. <snip>
    >>

    >
    > The odds are good that this is just going to run slower...


    Just been told that the GIL doesn't make things slower, but as I didn't know that such a thing even existed I went out looking for more info and found this document: http://www.dabeaz.com/python/UnderstandingGIL.pdf

    Is it current? I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.

    What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

    > <snip>
    >
    > If you really need to do this type of number crunching in Python
    > level code, you'll want to look into the multiprocessing library
    > instead. That will create actual OS processes (each with a copy of the
    > interpreter, and not sharing memory) and each of those can run on a core
    > without conflicting on the GIL.


    Which library do you suggest?

     
    Carlos Nepomuceno, May 19, 2013
    #7
  8. On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
    <> wrote:
    > I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.
    >
    > What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?


    Preemption isn't really the issue here. On the C level, preemptive vs
    cooperative usually means the difference between a stalled thread
    locking everyone else out and not doing so. Preemption is done at a
    lower level than user code (eg the operating system or the CPU),
    meaning that user code can't retain control of the CPU.

    With interpreted code eg in CPython, it's easy to implement preemption
    in the interpreter. I don't know how it's actually done, but one easy
    implementation would be "every N bytecode instructions, context
    switch". It's still done at a lower level than user code (N bytecode
    instructions might all actually be a single tight loop that the
    programmer didn't realize was infinite), but it's not at the OS level.

    But none of that has anything to do with multiple core usage. The
    problem there is that shared data structures need to be accessed
    simultaneously, and in CPython, there's a Global Interpreter Lock to
    simplify that; but the consequence of the GIL is that no two threads
    can simultaneously execute user-level code. There have been
    GIL-removal proposals at various times, but the fact remains that a
    global lock makes a huge amount of sense and gives pretty good
    performance across the board. There's always multiprocessing when you
    need multiple CPU-bound threads; it's an explicit way to separate the
    shared data (what gets transferred) from local (what doesn't).

    ChrisA
     
    Chris Angelico, May 19, 2013
    #8
  9. On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico <>
    declaimed the following in gmane.comp.python.general:

    > On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
    > <> wrote:
    > > I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.
    > >
    > > What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

    >

    <snip>

    > With interpreted code eg in CPython, it's easy to implement preemption
    > in the interpreter. I don't know how it's actually done, but one easy
    > implementation would be "every N bytecode instructions, context
    > switch". It's still done at a lower level than user code (N bytecode


    Which IS how the common Python interpreter does it -- barring the
    thread making some system call that triggers a preemption ahead of time
    (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
    or 100 byte-code instructions -- as I recall, it DID change a few
    versions back.

    Part of the context switch is to transfer the GIL from the preempted
    thread to the new thread.

    So, overall, running multiple CPU-bound threads on a SINGLE CORE
    processor takes a bit longer just due to the overhead of thread swapping.

    On a multi-core processor, the effect is the same, since -- even
    though one may have a thread running on each core -- the GIL is only
    assigned to one thread, and other threads get blocked when trying to
    access runtime data structures. And you may have even more overhead from
    processor cache misses if a thread gets assigned to a different
    core.

    (yes -- I'm restating the same thing as I had just trimmed below
    this point... but the target is really the OP, where repetition may be
    helpful in understanding)
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 19, 2013
    #9
  10. On Mon, May 20, 2013 at 7:46 AM, Dennis Lee Bieber
    <> wrote:
    > On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico <>
    > declaimed the following in gmane.comp.python.general:
    >> With interpreted code eg in CPython, it's easy to implement preemption
    >> in the interpreter. I don't know how it's actually done, but one easy
    >> implementation would be "every N bytecode instructions, context
    >> switch". It's still done at a lower level than user code (N bytecode

    >
    > Which IS how the common Python interpreter does it -- barring the
    > thread making some system call that triggers a preemption ahead of time
    > (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
    > or 100 byte-code instructions -- as I recall, it DID change a few
    > versions back.


    Incidentally, is the context-switch check the same as the check for
    interrupt signal raising KeyboardInterrupt? ISTR that was another
    "every N instructions" check.

    ChrisA
     
    Chris Angelico, May 19, 2013
    #10
  11. Dave Angel Guest

    On 05/19/2013 05:46 PM, Dennis Lee Bieber wrote:
    > <snip>
    >
    > Part of the context switch is to transfer the GIL from the preempted
    > thread to the new thread.
    >
    > So, overall, on a SINGLE CORE processor running multiple CPU bound
    > threads takes a bit longer just due to the overhead of thread swapping.
    >
    > On a multi-core processor, the effect is the same, since -- even
    > though one may have a thread running on each core -- the GIL is only
    > assigned to one thread, and other threads get blocked when trying to
    > access runtime data structures. And you may have even more overhead from
    > processor cache misses if a thread gets assigned to a different
    > core.
    >
    > (yes -- I'm restating the same thing as I had just trimmed below
    > this point... but the target is really the OP, where repetition may be
    > helpful in understanding)
    >


    So what's the mapping between real (OS) threads, and the fake ones
    Python uses? The OS keeps track of a separate stack and context for
    each thread it knows about; are they one-to-one with the ones you're
    describing here? If so, then any OS thread that gets scheduled will
    almost always find it can't get the GIL, and spend time thrashing. But
    the change that CPython does intentionally would be equivalent to a
    sleep(0).

    On the other hand, if these threads are distinct from the OS threads, is
    it done with some sort of thread pool, where CPython has its own stack,
    and doesn't really use the one managed by the OS?

    Understand that the only OS threading I really understand is the one in
    Windows (which I no longer use). So assuming Linux has some form of
    lightweight threading, the distinction above may not map very well.



    --
    DaveA
     
    Dave Angel, May 20, 2013
    #11
  12. On Mon, 20 May 2013 07:52:23 +1000, Chris Angelico <>
    declaimed the following in gmane.comp.python.general:

    > Incidentally, is the context-switch check the same as the check for
    > interrupt signal raising KeyboardInterrupt? ISTR that was another
    > "every N instructions" check.
    >

    That I couldn't say -- it would be the obvious spot for the
    interpreter to check some global flag, said flag perhaps being set by an
    interrupt handler, signal bits, or whatever the underlying OS uses.

    OTOH, KeyboardInterrupt may be something passed up through the I/O
    system and only checked when a thread performs I/O on stdin (which would
    explain how number crunchers can be "unstoppable"). And in this case,
    the invocation of the I/O triggers a context switch.

    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 20, 2013
    #12
  13. On Sun, 19 May 2013 21:04:42 -0400, Dave Angel <>
    declaimed the following in gmane.comp.python.general:

    > So what's the mapping between real (OS) threads, and the fake ones
    > Python uses? The OS keeps track of a separate stack and context for
    > each thread it knows about; are they one-to-one with the ones you're
    > describing here? If so, then any OS thread that gets scheduled will
    > almost always find it can't get the GIL, and spend time thrashing. But
    > the change that CPython does intentionally would be equivalent to a
    > sleep(0).
    >

    No. The first time that thread attempts to gain the GIL it will be
    blocked. It will not be made ready again until the current owner of the
    GIL frees it (at which point it competes with all other threads that
    were blocked).

    No thrashing -- but a lot of threads blocked waiting for the GIL,
    and why multicore processors won't see a speed up in number crunching
    applications using threads. Multiprocessing creates copies of the
    interpreter, and each copy has its own GIL which won't conflict with the
    others -- so intense number crunchers can benefit even with the overhead
    of creating a new/independent process. I/O bound tasks don't gain as
    much from multiprocessing as you have the overhead of creating a system
    process, only to spend most of the time waiting for an I/O operation to
    complete. Threads work well for that situation (and Twisted even gets by
    without threading -- though my mind just can't work with the Twisted
    architecture <G>; even one number cruncher in Twisted has to be
    cooperative, working in chunks and returning so the dispatcher can
    handle events).


    > On the other hand, if these threads are distinct from the OS threads, is
    > it done with some sort of thread pool, where CPython has its own stack,
    > and doesn't really use the one managed by the OS?
    >


    Even the common GNAT Ada releases rely upon the OS for tasking.

    One pretty much has to use the OS task scheduler, otherwise one
    thread inside the interpreter/runtime that blocks on an OS system call
    will block the entire interpreter/runtime and hence any other thread
    would be blocked too.

    > Understand the only OS threading I really understand is the one in
    > Windows (which I no longer use). So assuming Linux has some form of
    > lightweight threading, the distinction above may not map very well.


    And I'm most familiar with the Amiga, even though I've not used it
    in 20 years. In it, the OS scheduled "tasks" -- but above tasks were
    "processes". A process contained structures for stdin/stdout/stderr,
    current directory, environment variables. These structures were just
    extensions of the task control block holding signal bits, register
    contents (when not running) etc.

    Windows "lightweight" threads would probably be "fibers" -- which
    require the Windows application itself to schedule them, rather than the
    OS. IOW, they are closer to co-routines (and run within a thread that is
    controlling them).


    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, May 20, 2013
    #13
  14. On Saturday, 18 May 2013 10:58:13 UTC+2, Jurgens de Bruin wrote:
    > This is my first script where I want to use the python threading module. <snip>


    Thanks to all for the discussion/comments on threading; although I have not been commenting I have been following. I have learnt a lot and I am still reading up on everything mentioned. Thanks again.
    Will see how I am going to solve my scenario.
     
    Jurgens de Bruin, Jun 3, 2013
    #14
