Parallel processing on shared data structures

Discussion in 'Python' started by psaffrey@googlemail.com, Mar 19, 2009.

  1. psaffrey@googlemail.com Guest

    I'm filing 160 million data points into a set of bins based on their
    position. At the moment, this takes just over an hour using interval
    trees. I would like to parallelise this to take advantage of my quad
    core machine. I have some experience of Parallel Python, but PP seems
    to only really work for problems where you can do one discrete bit of
    processing and recombine these results at the end.

    I guess I could thread my code and use mutexes to protect the shared
    lists that everybody is filing into. However, my understanding is that
    Python is still only using one process, so this won't give me multi-core.

    Does anybody have any suggestions for this?

    Peter
    psaffrey@googlemail.com, Mar 19, 2009
    #1
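    [Editor's note: the post does not say whether the bins overlap, but if
    they are contiguous, non-overlapping intervals, a sorted boundary list
    plus the standard-library `bisect` module gives an O(log n) lookup per
    point without an interval tree. A minimal sketch, with invented
    boundaries for illustration:]

    ```python
    import bisect

    # Assumes bins are contiguous, non-overlapping intervals, so a sorted
    # list of edges suffices; overlapping intervals would still need an
    # interval tree.
    boundaries = [0, 100, 250, 500, 1000]  # example bin edges (assumed)

    def bin_index(position):
        """Index of the bin containing `position` (O(log n) per lookup)."""
        return bisect.bisect_right(boundaries, position) - 1

    bins = [[] for _ in range(len(boundaries) - 1)]
    for p in [42, 250, 999]:
        bins[bin_index(p)].append(p)
    ```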

  2. MRAB Guest

    psaffrey@googlemail.com wrote:
    > I'm filing 160 million data points into a set of bins based on their
    > position. At the moment, this takes just over an hour using interval
    > trees. I would like to parallelise this to take advantage of my quad
    > core machine. I have some experience of Parallel Python, but PP seems
    > to only really work for problems where you can do one discrete bit of
    > processing and recombine these results at the end.
    >
    > I guess I could thread my code and use mutexes to protect the shared
    > lists that everybody is filing into. However, my understanding is that
    > Python is still only using one process, so this won't give me multi-core.
    >
    > Does anybody have any suggestions for this?
    >

    Could you split your data set and run multiple instances of the script
    at the same time and then merge the corresponding lists?
    MRAB, Mar 19, 2009
    #2

  3. Hendrik van Rooyen Guest

    psaffrey@googlemail.com wrote:
    > I'm filing 160 million data points into a set of bins based on their
    > position. At the moment, this takes just over an hour using interval
    > trees.

    So why not make four sets of bins, one for each core of your quad,
    split the points into quarters, run four processes, and merge the
    results later?

    This assumes that it is the actual filing process that is the
    bottleneck, and that the bins are just sets, where position, etc.
    does not matter.

    If it takes an hour just to read the input, then nothing you can do
    will make it better.

    - Hendrik
    Hendrik van Rooyen, Mar 20, 2009
    #3
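    [Editor's note: the split-and-merge approach that MRAB and Hendrik
    describe can be done inside one script with the `multiprocessing`
    module (new in Python 2.6, so available at the time of this thread;
    the sketch below uses Python 3 syntax). The bin boundaries and the
    counting payload are assumptions for illustration:]

    ```python
    import bisect
    from collections import Counter
    from multiprocessing import Pool

    # Assumed bin edges for illustration; substitute the real boundaries.
    BOUNDARIES = [0, 250, 500, 750, 1000]

    def bin_chunk(points):
        """Count points per bin for one chunk; runs in a worker process."""
        counts = Counter()
        for p in points:
            counts[bisect.bisect_right(BOUNDARIES, p) - 1] += 1
        return counts

    def parallel_bin(points, nprocs=4):
        # Split the points into one slice per process...
        chunks = [points[i::nprocs] for i in range(nprocs)]
        with Pool(nprocs) as pool:
            partials = pool.map(bin_chunk, chunks)
        # ...then merge the per-process counts back together.
        total = Counter()
        for part in partials:
            total.update(part)
        return total

    # On Windows, guard the call with `if __name__ == "__main__":` so
    # worker processes do not re-execute the script body on import.
    ```

    Because each worker fills its own private Counter and the results are
    merged only at the end, no locks are needed and the GIL is not an
    issue; as Hendrik notes, this only pays off if the filing itself, not
    reading the input, is the bottleneck.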
