help! multi-threading problem on hyperthreading smp linux server

Garry Hodgson

a colleague of mine has seen an odd problem in some code of ours.
we initially noticed it on webware, but in distilling a test case it seems
to be strictly a python issue. in the real system, it manifests as
webware just locking up, for no apparent reason, until we kill it.
we've also had the python interpreter running webware die on occasion.
it works fine on our desktop linux and windows machines, but fails
on the production hardware. the main difference being that the
production hardware is a dual Xeon machine with
hyperthreading enabled.

has anyone run into problems like this, or have any clues what
the problem might be? we pushed pretty hard on a skeptical
project manager to get python and webware accepted for this
project, and it's embarrassing to have it acting flaky.

i'd really appreciate any insight anyone's got on this. we need
to resolve it quickly. mike's description of the test case follows.

thanks
 
max khesin

GIL = Global interpreter lock. It should cause trouble on SMP. As someone
recently mentioned, Jython does not have it, so you have options.
 
Josiah Carlson

> GIL = Global interpreter lock. It should cause trouble on SMP. As someone
> recently mentioned, Jython does not have it, so you have options.

Are you sure it is the GIL? I run quite a few threaded applications on
my SMP machine, and have had no issues with the GIL killing it...well,
except for that one wxPython thing. Maybe the GIL is the cause.

- Josiah
 
Aahz

> a colleague of mine has seen an odd problem in some code of ours. we
> initially noticed it on webware, but in distilling a test case it seems
> to be strictly a python issue. in the real system, it manifests as
> webware just locking up, for no apparent reason, until we kill it.
> we've also had the python interpreter running webware die on occasion.
> it works fine on our desktop linux and windows machines, but fails on
> the production hardware. the main difference being that the production
> hardware is a dual Xeon machine with hyperthreading enabled.

What version of Python? What happens if you enable both CPUs without
hyperthreading? What happens if you only start ten threads (fifty
threads isn't huge, but it's definitely large)? The last time I saw or
heard of significant problems like this was in the Python 1.5.1 days, and
Python 2.x fixed a few more bugs.
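
The thread-count experiment asked about above (one, ten, fifty threads) could be sketched roughly as follows. This is not the test case mentioned in the original post, which is not included in the thread. It is a minimal illustration written for current Python 3 rather than the 2.2.2 in use here, and the name `run_workers` and its parameters are hypothetical. Each thread increments a shared counter under a lock, and any thread still alive after the timeout is reported as possibly hung.

```python
import threading
import time


def run_workers(num_threads, iterations=10000, timeout=30.0):
    """Run num_threads threads that each bump a lock-protected counter.

    Returns (final_count, names_of_threads_still_running_at_timeout).
    Hypothetical helper for illustration only.
    """
    counter = [0]              # shared state, mutated only while holding the lock
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            with lock:
                counter[0] += 1

    # Daemon threads, so a genuinely hung worker cannot block interpreter exit.
    threads = [
        threading.Thread(target=worker, name="worker-%d" % i, daemon=True)
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()

    # Share one overall deadline across all joins.
    deadline = time.monotonic() + timeout
    hung = []
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
        if t.is_alive():
            hung.append(t.name)
    return counter[0], hung


if __name__ == "__main__":
    for n in (1, 10, 50):
        total, hung = run_workers(n)
        print("threads=%d expected=%d got=%d hung=%s" % (n, n * 10000, total, hung))
```

On a healthy machine the final count should equal `num_threads * iterations` and the hung list should be empty. A lockup like the one described would show up as a non-empty hung list or as the script never returning.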

Have you applied all relevant OS patches and/or tried upgrading to a
newer kernel? Can you test the production system (or an equivalent) with
Windows? What happens if you run an equivalent test using C code?
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 
Garry Hodgson

> What version of Python?

2.2.2, on all machines but the windows one, which runs 2.3.

> What happens if you enable both CPUs without
> hyperthreading?

same thing. it fails on the one machine, works on the rest,
including another (older) dual processor machine running
linux smp kernel.

> What happens if you only start ten threads (fifty
> threads isn't huge, but it's definitely large)?

same thing. works fine for 1 thread :).

> Have you applied all relevant OS patches and/or tried upgrading to a
> newer kernel?

it's looking like this may be the problem, since the machine it fails on
is running the oldest linux kernel of all of them. all are 2.4.20's, but
with different minor release numbers.

> Can you test the production system (or an equivalent) with
> Windows?

nope. we do have another machine that's identical hardware,
though with slightly newer kernel patches. we intend to test
on that tomorrow, during a maintenance window (that's a
production machine).

> What happens if you run an equivalent test using C code?

C? yuk!
 
Aahz

> same thing. works fine for 1 thread :).
>
> it's looking like this may be the problem, since the machine it fails on
> is running the oldest linux kernel of all of them. all are 2.4.20's, but
> with different minor release numbers.

Those two combined make me agree.

> nope. we do have another machine that's identical hardware,
> though with slightly newer kernel patches. we intend to test
> on that tomorrow, during a maintenance window (that's a
> production machine).

One thing I learned during my threading problems, five years ago:
*always* have a QA machine with the same specs as the production machine.
Among other things, I found a bug with MS SQL Server that only showed up
on SMP machines. :-(
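
Since the discussion keeps coming back to differences between machines (Python version, kernel release, number of CPUs), a small report like the following could be run on each box and the results compared. This is only a sketch for current Python 3 using the standard `platform` and `os` modules. The function name `environment_report` is hypothetical, and the report does not capture kernel patch details beyond what the release string happens to contain.

```python
import os
import platform


def environment_report():
    """Collect basic facts worth comparing between QA and production machines."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "kernel_release": platform.release(),
        "machine": platform.machine(),
        # Logical CPUs: with hyperthreading enabled this is usually higher
        # than the number of physical processors. May be None if unknown.
        "logical_cpus": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%-16s %s" % (key, value))
```

Diffing the output from two machines is a quick way to spot a mismatch such as the differing kernel minor releases mentioned above.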
 
