Re: threading support in python

Discussion in 'Python' started by sjdevnull@yahoo.com, Sep 5, 2006.

  1. sjdevnull@yahoo.com Guest

    Sandra-24 wrote:
    > > You seem to be confused about the nature of multiple-process
    > > programming.
    > >
    > > If you're on a modern Unix/Linux platform and you have static read-only
    > > data, you can just read it in before forking and it'll be shared
    > > between the processes.

    >
    > Not familiar with *nix programming, but I'll take your word on it.


    You can do the same on Windows if you use the native NtCreateProcessEx
    to create the new processes and pass a NULL SectionHandle. I don't
    think this helps in your case, but I was correcting your impression
    that "you'd have to physically double the computer's memory for a dual
    core, or quadruple it for a quadcore". That's nowhere near true.
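
    A minimal sketch of the fork() approach on Unix (illustrative only;
    the file read stands in for whatever builds your static data):

    import os

    # Build the static read-only data once, before forking.
    big_table = open("/etc/services").read()

    pid = os.fork()
    if pid == 0:
        # Child: sees big_table without copying it.  The OS shares the
        # parent's pages copy-on-write, so physical memory is only
        # duplicated for pages that actually get written to.
        print(len(big_table))
        os._exit(0)
    else:
        os.waitpid(pid, 0)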

    > > Threads are way overused in modern multiexecution programming. The

    >
    > <snip>
    >
    > > It used to run on Windows with multiple processes. If it really won't
    > > now, use an older version or contribute a fix.

    >
    > First of all I'm not in control of spawning processes or threads.
    > Apache does that, and Apache has no MPM for Windows that uses more than
    > 1 process.


    As I said, Apache used to run on Windows with multiple processes; using
    a version that supports that is one option. There are good reasons not
    to do that, though, so you could be stuck with threads.

    > Secondly "Superior" is definitely a matter of opinion. Let's
    > see how you would define superior.


    Having memory protection is superior to not having it--OS designers
    spent years implementing it, why would you toss out a fair chunk of it?
    Being explicit about what you're sharing is generally better than not.


    But as I said, threads are a better solution if you're sharing the vast
    majority of your memory and have complex data structures to share.
    When you're starting a new project, really think about whether they're
    worth the considerable tradeoffs, though, and consider the merits of a
    multiprocess solution.

    > 3) Rewrite my codebase to use some form of shared memory. This would be
    > a terrible nightmare that would take at least a month of development
    > time and a lot of heavy rewriting. It would be very difficult, but I'll
    > grant that it may work if done properly with only small performance
    > losses.


    It's almost certainly not worth rewriting a large established
    codebase.


    > I would find an easier time, I think, porting mod_python to .net and
    > leaving that GIL behind forever. Thankfully, I'm not considering such
    > drastic measures - yet.


    The threads vs. processes thing isn't strongly related to the
    implementation language (though a few languages like Java basically
    take the decision out of your hands). Moving to .NET leaves you with
    the same questions to consider before making the decision--just working
    in C# doesn't somehow make threads the right choice all the time.

    > Why on earth would I want to do all of that work? Just because you want
    > to keep this evil thing called a GIL?


    No, I agreed that the GIL is a bad thing for some applications.

    > My suggestion is: in Python 3,
    > ditch the ref counting, use a real garbage collector


    I disagree with this, though. The benefits of deterministic GC are
    huge and I'd like to see ref-counting semantics as part of the language
    definition. That's a debate I just had in another thread, though, and
    don't want to repeat.

    > > Now, the GIL is independent of this; if you really need threading in
    > > your situation (you share almost everything and have hugely complex
    > > data structures that are difficult to maintain in shm) then you're
    > > still going to run into GIL serialization. If you're doing a lot of
    > > work in native code extensions this may not actually be a big
    > > performance hit, if not it can be pretty bad.

    >
    > Actually, I'm not sure I understand you correctly. You're saying that
    > in an environment like Apache (with 250 threads or so) and my hugely
    > complex shared data structures, that the GIL is going to cause a huge
    > performance hit?


    I didn't say that. It can be a big hit or it can be unnoticeable. It
    depends on your application. You have to benchmark to know for sure.

    But if you're trying to make a guess: if you're doing a lot of heavy
    lifting in native modules then the GIL may be released during those
    calls, and you might get good multithreading performance. If you're
    doing lots of I/O requests the GIL is generally released during those
    and things will be fine. If you're doing lots of heavy crunching in
    Python, the GIL is probably held and can be a big performance issue.

    Since your app sounds like it's basically written, there's not much
    cause to guess; benchmark it and see if it's fast enough or not. If
    so, don't spend time and effort optimizing.
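
    If you do benchmark, a toy comparison along these lines (illustrative
    only; the numbers depend entirely on your workload) will show the GIL
    effect on pure-Python crunching:

    import time, threading

    def crunch():
        # Pure-Python CPU work: the GIL is held for the duration.
        n = 0
        for i in range(2000000):
            n += i
        return n

    # Two crunches back to back in a single thread.
    t0 = time.time()
    crunch(); crunch()
    print("sequential: %.2fs" % (time.time() - t0))

    # The same two crunches in parallel threads.  Since only one thread
    # can execute Python bytecode at a time, expect little or no speedup.
    t0 = time.time()
    ts = [threading.Thread(target=crunch) for _ in range(2)]
    for t in ts: t.start()
    for t in ts: t.join()
    print("threaded:   %.2fs" % (time.time() - t0))
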
    sjdevnull@yahoo.com, Sep 5, 2006
    #1

  2. Paul Rubin Guest

    "" <> writes:
    > Having memory protection is superior to not having it--OS designers
    > spent years implementing it, why would you toss out a fair chunk of it?
    > Being explicit about what you're sharing is generally better than not.


    Part of the win of programming in Python instead of C is having the
    language do memory management for you--no more null pointer
    dereferences or malloc/free errors. Using shared memory puts all that
    squarely back in your lap.

    > I disagree with this, though. The benefits of deterministic GC are
    > huge and I'd like to see ref-counting semantics as part of the language
    > definition. That's a debate I just had in another thread, though, and
    > don't want to repeat.


    That's ok, it can be summarized quickly: it lets you keep saying

    def func(filename):
        f = open(filename)
        do_something_with(f)
        # exit from function scope causes f to automagically get closed,
        # unless the "do_something_with" didn't know about this expectation
        # and saved a reference for some reason.

    instead of using the Python 2.5 construction

    def func(filename):
        with open(filename) as f:
            do_something_with(f)
        # f definitely gets closed when the "with" block exits

    which more explicitly shows the semantics actually desired. Not that
    "huge" a benefit as far as I can tell. Lisp programmers have gotten
    along fine without it for 40+ years...
    Paul Rubin, Sep 6, 2006
    #2

  3. sjdevnull@yahoo.com Guest

    Paul Rubin wrote:
    > "" <> writes:
    > > Having memory protection is superior to not having it--OS designers
    > > spent years implementing it, why would you toss out a fair chunk of it?
    > > Being explicit about what you're sharing is generally better than not.

    >
    > Part of the win of programming in Python instead of C is having the
    > language do memory management for you--no more null pointer
    > dereferences or malloc/free errors. Using shared memory puts all that
    > squarely back in your lap.


    Huh? Why couldn't you use garbage collection with objects allocated in
    shm? The worst theoretical case is about the same programmatically as
    having garbage collected objects in a multithreaded program.

    Python doesn't actually support that as of yet, but it could. In the
    interim, if the memory you're sharing is array-like then you can
    already take full advantage of multiprocess solutions in Python.
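
    For example, with nothing but the stdlib (a Unix-only sketch; the
    struct format and the value are arbitrary):

    import mmap, os, struct

    # An anonymous shared mapping: after fork(), parent and child see
    # the same underlying bytes.
    buf = mmap.mmap(-1, 8)

    pid = os.fork()
    if pid == 0:
        # Child: store a value into the shared array.
        buf[0:8] = struct.pack("d", 3.14)
        os._exit(0)
    else:
        os.waitpid(pid, 0)
        # Parent: read back what the child wrote.
        print(struct.unpack("d", buf[0:8])[0])
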
    sjdevnull@yahoo.com, Sep 6, 2006
    #3
  4. Paul Rubin Guest

    "" <> writes:
    > > Part of the win of programming in Python instead of C is having the
    > > language do memory management for you--no more null pointers
    > > dereferences or malloc/free errors. Using shared memory puts all that
    > > squarely back in your lap.

    >
    > Huh? Why couldn't you use garbage collection with objects allocated in
    > shm? The worst theoretical case is about the same programmatically as
    > having garbage collected objects in a multithreaded program.


    I'm talking about using a module like mmap or the now-AWOL shm module,
    which gives you a big shared byte array that you have to do your own
    memory management in. POSH is a slight improvement over this, since
    it does its own ref counting, but that is slightly leaky, and POSH has
    to marshal every object into the shared area.

    > Python doesn't actually support that as of yet, but it could.


    Well, yeah, with a radically different memory system that's even
    more pie in the sky than the GIL and refcount removal that we've
    been discussing.

    > In the interim, if the memory you're sharing is array-like then you
    > can already take full advantage of multiprocess solutions in Python.


    But then you're back to doing your own memory management within that
    array. Sure, that's tolerable for some applications (C programmers do
    it for everything), but not exactly joy.
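
    Concretely, "your own memory management" means something like this
    hand-rolled bump allocator over an mmap buffer (a sketch: no free(),
    no bounds checking):

    import mmap, struct

    buf = mmap.mmap(-1, 4096)
    # The first 4 bytes of the buffer hold the next-free offset.
    buf[0:4] = struct.pack("I", 4)

    def alloc(n):
        # Bump allocation: hand out the next n bytes.  Nothing is ever
        # freed or compacted, and nothing stops you overrunning the end.
        off = struct.unpack("I", buf[0:4])[0]
        buf[0:4] = struct.pack("I", off + n)
        return off

    off = alloc(5)
    buf[off:off+5] = b"hello"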

    And as already mentioned, the stdlib currently gives no way to
    implement shared memory locks (file locks aren't the same thing).
    POSH and the old shm library do, but POSH is apparently not that
    reliable, and nobody knows what happened to shm.
    Paul Rubin, Sep 6, 2006
    #4
  5. Steve Holden Guest

    sjdevnull@yahoo.com wrote:
    > Paul Rubin wrote:
    >
    >>"" <> writes:
    >>
    >>>Having memory protection is superior to not having it--OS designers
    >>>spent years implementing it, why would you toss out a fair chunk of it?
    >>> Being explicit about what you're sharing is generally better than not.

    >>
    >>Part of the win of programming in Python instead of C is having the
    >>language do memory management for you--no more null pointer
    >>dereferences or malloc/free errors. Using shared memory puts all that
    >>squarely back in your lap.

    >
    >
    > Huh? Why couldn't you use garbage collection with objects allocated in
    > shm? The worst theoretical case is about the same programmatically as
    > having garbage collected objects in a multithreaded program.
    >
    > Python doesn't actually support that as of yet, but it could. In the
    > interim, if the memory you're sharing is array-like then you can
    > already take full advantage of multiprocess solutions in Python.
    >

    Ah, right. So then we end up with processes that have to suspend because
    they can't collect garbage? "Could" covers a multitude of sins, and
    distributed garbage collection across shared memory is by no means a
    trivial problem.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
    Steve Holden, Sep 6, 2006
    #5