help! multi-threading problem on hyperthreading smp linux server

Discussion in 'Python' started by Garry Hodgson, Feb 10, 2004.

  1. a colleague of mine has seen an odd problem in some code of ours.
    we initially noticed it on webware, but in distilling a test case it seems
    to be strictly a python issue. in the real system, it manifests as
    webware just locking up, for no apparent reason, until we kill it.
    we've also had the python interpreter running webware die on occasion.
    it works fine on our desktop linux and windows machines, but fails
    on the production hardware. the main difference is that the
    production hardware is a dual Xeon machine with
    hyperthreading enabled.

    has anyone run into problems like this, or have any clues what
    the problem might be? we pushed pretty hard on a skeptical
    project manager to get python and webware accepted for this
    project, and it's embarrassing to have it acting flaky.

    i'd really appreciate any insight anyone's got on this. we need
    to resolve it quickly. mike's description of the test case follows.

    thanks


    "Michael W. Balk" wrote:

    > You asked me yesterday to give you some info on the multi-threading problem
    > so that you could post a query on comp.lang.python.
    >
    > Here is what I know so far.
    >
    > The machine has two Xeon CPUs, both with hyper-threading enabled.
    > The linux kernel running is: 2.4.20-8smp
    >
    > The pure python test I ran did the following:
    >
    > Starts up 50 threads (using the threading module).
    > Each thread calculates 500 random numbers and writes them to a file,
    > repeating this 50 times to generate 50 output files. So there are 2500
    > files expected once all 50 threads complete.
    >
    > The observation is that only a few threads actually produce output files,
    > and none of those threads produce all 50 of the files they are to generate.
    >
    > The main thread does a join on each of the 50 threads sequentially, so that
    > after the last thread completes, then the program should exit. However, the
    > observation is that the main thread never exits, presumably since it has
    > joined to a child thread whose run() method never returns.
    >
    > Now if I change the test so that a single thread runs the test 50 times
    > sequentially, all expected files are produced and the program terminates normally.
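
    The test described above can be sketched roughly as follows. This is
    not Mike's actual script, which was never posted; it is a minimal
    reconstruction that assumes a modern Python 3 rather than the 2.2.2
    in use here. The names write_random_files and run_test, the file
    naming scheme, the temporary output directory, the daemon threads
    and the join timeout are all assumptions added for illustration.

```python
import os
import random
import tempfile
import threading


def write_random_files(out_dir, worker_id, files_per_worker=50, numbers_per_file=500):
    """Write files_per_worker files, each holding numbers_per_file random numbers."""
    for file_index in range(files_per_worker):
        name = "worker%02d_file%02d.txt" % (worker_id, file_index)
        with open(os.path.join(out_dir, name), "w") as handle:
            for _ in range(numbers_per_file):
                handle.write("%r\n" % random.random())


def run_test(out_dir=None, num_threads=50, files_per_worker=50, join_timeout=None):
    """Start the workers, join them one after another, and report stragglers.

    Returns (out_dir, names_of_threads_still_alive). With join_timeout=None
    the joins block forever, as in the original description.
    """
    if out_dir is None:
        # Assumption: a fresh temporary directory keeps the file count easy to check.
        out_dir = tempfile.mkdtemp(prefix="thread_test_")
    threads = [
        threading.Thread(
            target=write_random_files,
            args=(out_dir, i, files_per_worker),
            name="worker-%d" % i,
            daemon=True,  # assumption: lets the interpreter exit even if a worker hangs
        )
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(join_timeout)  # sequential join, one thread at a time
    return out_dir, [thread.name for thread in threads if thread.is_alive()]
```

    Calling run_test(join_timeout=60) runs the full 50 x 50 case; on a
    healthy machine the output directory should end up with 2500 files
    and the returned list of live threads should be empty. Passing
    num_threads=1 with more files per worker approximates the
    single-threaded variant.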



    ----
    Garry Hodgson, Technology Consultant, AT&T Labs

    Be happy for this moment.
    This moment is your life.
     
    Garry Hodgson, Feb 10, 2004
    #1

  2. max khesin Guest

    GIL = Global interpreter lock. It should cause trouble on SMP. As someone
    recently mentioned, Jython does not have it, so you have options.


     
    max khesin, Feb 10, 2004
    #2

  3. > GIL = Global interpreter lock. It should cause trouble on SMP. As someone
    > recently mentioned, Jython does not have it, so you have options.


    Are you sure it is the GIL? I run quite a few threaded applications on
    my SMP machine, and have had no issues with the GIL killing it...well,
    except for that one wxPython thing. Maybe the GIL is the cause.

    - Josiah
     
    Josiah Carlson, Feb 10, 2004
    #3
  4. Aahz Guest

    Garry Hodgson wrote:
    >
    >a colleague of mine has seen an odd problem in some code of ours. we
    >initially noticed it on webware, but in distilling a test case it seems
    >to be strictly a python issue. in the real system, it manifests as
    >webware just locking up, for no apparent reason, until we kill it.
    >we've also had the python interpreter running webware die on occasion.
    >it works fine on our desktop linux and windows machines, but fails on
    >the production hardware. the main difference being that the production
    >hardware is a dual Xeon machine with hyperthreading enabled.


    What version of Python? What happens if you enable both CPUs without
    hyperthreading? What happens if you only start ten threads (fifty
    threads isn't huge, but it's definitely large)? The last time I saw or
    heard of significant problems like this was in the Python 1.5.1 days, and
    Python 2.x fixed a few more bugs.

    Have you applied all relevant OS patches and/or tried upgrading to a
    newer kernel? Can you test the production system (or an equivalent) with
    Windows? What happens if you run an equivalent test using C code?
    --
    Aahz <*> http://www.pythoncraft.com/

    "The joy of coding Python should be in seeing short, concise, readable
    classes that express a lot of action in a small amount of clear code --
    not in reams of trivial code that bores the reader to death." --GvR
     
    Aahz, Feb 11, 2004
    #4
  5. Re: Re: help! multi-threading problem on hyperthreading smp linux server

    Aahz wrote:

    > What version of Python?


    2.2.2, on all machines but the windows one, which runs 2.3.

    > What happens if you enable both CPUs without
    > hyperthreading?


    same thing. it fails on the one machine, works on the rest,
    including another (older) dual processor machine running
    a linux smp kernel.

    > What happens if you only start ten threads (fifty
    > threads isn't huge, but it's definitely large)?


    same thing. works fine for 1 thread :).

    > Have you applied all relevant OS patches and/or tried upgrading to a
    > newer kernel?


    it's looking like this may be the problem, since the machine it fails on
    is running the oldest linux kernel of all of them. all are 2.4.20's, but
    with different minor release numbers.
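
    When comparing machines like this, it can help to record the kernel
    release and Python version on each one from inside the interpreter.
    A small sketch, again assuming a modern Python 3 (os.cpu_count() did
    not exist in 2.2.2); describe_environment is a hypothetical helper
    name, not something from our code.

```python
import os
import platform


def describe_environment():
    """Collect the details that differed between the machines in this thread."""
    return {
        "python": platform.python_version(),
        "kernel": platform.release(),      # on Linux, the `uname -r` string
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),    # counts hyperthreads as CPUs; may be None
    }


for key, value in sorted(describe_environment().items()):
    print("%-13s %s" % (key, value))
```

    Running this on each box and diffing the output makes it easy to
    spot which machine is on the older kernel build.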

    > Can you test the production system (or an equivalent) with
    > Windows?


    nope. we do have another machine that's identical hardware,
    though with slightly newer kernel patches. we intend to test
    on that tomorrow, during a maintenance window (that's a
    production machine).

    > What happens if you run an equivalent test using C code?


    C? yuk!

    ----
    Garry Hodgson, Technology Consultant, AT&T Labs

    Be happy for this moment.
    This moment is your life.
     
    Garry Hodgson, Feb 11, 2004
    #5
  6. Aahz Guest

    Re: Re: help! multi-threading problem on hyperthreading smp linux server

    Garry Hodgson wrote:
    > Aahz wrote:
    >>
    >> What happens if you only start ten threads (fifty
    >> threads isn't huge, but it's definitely large)?

    >
    >same thing. works fine for 1 thread :).
    >
    >> Have you applied all relevant OS patches and/or tried upgrading to a
    >> newer kernel?

    >
    >it's looking like this may be the problem, since the machine it fails on
    >is running the oldest linux kernel of all of them. all are 2.4.20's, but
    >with different minor release numbers.


    Those two answers combined make me agree that the kernel is the
    likely culprit.

    >> Can you test the production system (or an equivalent) with
    >> Windows?

    >
    >nope. we do have another machine that's identical hardware,
    >though with slightly newer kernel patches. we intend to test
    >on that tomorrow, during a maintenance window (that's a
    >production machine).


    One thing I learned during my threading problems, five years ago:
    *always* have a QA machine with the same specs as the production machine.
    Among other things, I found a bug with MS SQL Server that only showed up
    on SMP machines. :-(
    --
    Aahz <*> http://www.pythoncraft.com/

    "Argue for your limitations, and sure enough they're yours." --Richard Bach
     
    Aahz, Feb 14, 2004
    #6
