The reliability of python threads

  • Thread starter Carl J. Van Arsdall

Carl J. Van Arsdall

Hey everyone, I have a question about python threads. Before anyone
goes further, this is not a debate about threads vs. processes, just a
question.

With that, are python threads reliable? Or rather, are they safe? I've
had some strange errors in the past. I use threading.Lock for my
critical sections, but I wonder if that is really good enough.

Does anyone have any conclusive evidence that python threads/locks are
safe or unsafe?
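
For readers following along, here is a minimal sketch of the pattern being described: a `threading.Lock` guarding a critical section. The `Counter` class and the `run` helper are hypothetical names for illustration, not code from the application in question.

```python
import threading


class Counter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # The read-modify-write below is the critical section.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run(num_threads=4, increments=10000):
    """Increment one Counter from several threads and return the total."""
    counter = Counter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Taking the lock through a `with` statement releases it even if the critical section raises, which removes one common source of stuck locks.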

Thanks,

Carl

--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 

Nick Maclaren

|> Hey everyone, I have a question about python threads. Before anyone
|> goes further, this is not a debate about threads vs. processes, just a
|> question.
|>
|> With that, are python threads reliable? Or rather, are they safe? I've
|> had some strange errors in the past, I use threading.Lock for my
|> critical sections, but I wonder if that is really good enough.
|>
|> Does anyone have any conclusive evidence that python threads/locks are
|> safe or unsafe?

Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft
etc.) Python will shield you from some problems, but not all.

There is precious little that you can do, because the root cause is
that the standards and specifications are hopelessly flawed.


Regards,
Nick Maclaren.
 

Chris Mellon

|> Hey everyone, I have a question about python threads. Before anyone
|> goes further, this is not a debate about threads vs. processes, just a
|> question.
|>
|> With that, are python threads reliable? Or rather, are they safe? I've
|> had some strange errors in the past, I use threading.Lock for my
|> critical sections, but I wonder if that is really good enough.
|>
|> Does anyone have any conclusive evidence that python threads/locks are
|> safe or unsafe?

> Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft
> etc.) Python will shield you from some problems, but not all.
>
> There is precious little that you can do, because the root cause is
> that the standards and specifications are hopelessly flawed.

This is sufficiently inaccurate that I would call it FUD. Using
threads from Python, as from any other language, requires knowledge of
the tradeoffs and limitations of threading, but claiming that their
usage is *inherently* unsafe isn't true. It is almost certain that
your code and locking are flawed, not that the threads underneath you
are buggy.
 

Nick Maclaren

|> > |>
|> > |> Does anyone have any conclusive evidence that python threads/locks are
|> > |> safe or unsafe?
|> >
|> > Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft
|> > etc.) Python will shield you from some problems, but not all.
|> >
|> > There is precious little that you can do, because the root cause is
|> > that the standards and specifications are hopelessly flawed.
|>
|> This is sufficiently inaccurate that I would call it FUD. Using
|> threads from Python, as from any other language, requires knowledge of
|> the tradeoffs and limitations of threading, but claiming that their
|> usage is *inherently* unsafe isn't true. It is almost certain that
|> your code and locking are flawed, not that the threads underneath you
|> are buggy.

I suggest that you find out rather more about the ill-definition of
POSIX threading memory model, to name one of the better documented
aspects. A Web search should provide you with more information on
the ghastly mess than any sane person wants to know.

And that is only one of many aspects :-(


Regards,
Nick Maclaren.
 

Chris Mellon

|> > |>
|> > |> Does anyone have any conclusive evidence that python threads/locks are
|> > |> safe or unsafe?
|> >
|> > Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft
|> > etc.) Python will shield you from some problems, but not all.
|> >
|> > There is precious little that you can do, because the root cause is
|> > that the standards and specifications are hopelessly flawed.
|>
|> This is sufficiently inaccurate that I would call it FUD. Using
|> threads from Python, as from any other language, requires knowledge of
|> the tradeoffs and limitations of threading, but claiming that their
|> usage is *inherently* unsafe isn't true. It is almost certain that
|> your code and locking are flawed, not that the threads underneath you
|> are buggy.

> I suggest that you find out rather more about the ill-definition of
> POSIX threading memory model, to name one of the better documented
> aspects. A Web search should provide you with more information on
> the ghastly mess than any sane person wants to know.
>
> And that is only one of many aspects :-(

I'm aware of the issues with the POSIX threading model. I still stand
by my statement - bringing up the problems with the provability of
correctness in the POSIX model amounts to FUD in a discussion of
actual problems with actual code.

Logic and programming errors in user code are far more likely to be
the cause of random errors in a threaded program than theoretical
(I've never come across a case in practice) issues with the POSIX
standard.

Emphasizing this means that people will tend to ignore bugs as being
"the fault of POSIX" rather than either auditing their code more
carefully, or avoiding threads entirely (the second being what I
suspect your goal is).

As a last case, I should point out that while the POSIX memory model
can't be proven safe, concrete implementations do not necessarily
suffer from this problem.
 

Carl J. Van Arsdall

Chris said:

> I'm aware of the issues with the POSIX threading model. I still stand
> by my statement - bringing up the problems with the provability of
> correctness in the POSIX model amounts to FUD in a discussion of
> actual problems with actual code.
>
> Logic and programming errors in user code are far more likely to be
> the cause of random errors in a threaded program than theoretical
> (I've never come across a case in practice) issues with the POSIX
> standard.

Yeah, typically I would think that. The problem I am seeing is
incredibly intermittent. Like a simple Pyro server that gives me a
problem maybe every three or four months. Just something funky will
happen to the state of the whole thing, some bad data. I'm having an
issue tracking it down and some more experienced programmers mentioned
that it's most likely a race condition. The thing is, I'm really not
doing anything too crazy, so I'm having difficulty tracking it down. I
had heard in the past that there may be issues with threads, so I
thought to investigate this side of things.

It still proves difficult, but reassurance of the threading model helps
me focus my efforts.

> Emphasizing this means that people will tend to ignore bugs as being
> "the fault of POSIX" rather than either auditing their code more
> carefully, or avoiding threads entirely (the second being what I
> suspect your goal is).
>
> As a last case, I should point out that while the POSIX memory model
> can't be proven safe, concrete implementations do not necessarily
> suffer from this problem.

Would you consider the Linux implementation of threads to be concrete?

-carl

--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 

Nick Maclaren

|> >
|> > Logic and programming errors in user code are far more likely to be
|> > the cause of random errors in a threaded program than theoretical
|> > (I've never come across a case in practice) issues with the POSIX
|> > standard.
|> >
|> Yeah, typically I would think that. The problem I am seeing is
|> incredibly intermittent. Like a simple Pyro server that gives me a
|> problem maybe every three or four months. Just something funky will
|> happen to the state of the whole thing, some bad data. I'm having an
|> issue tracking it down and some more experienced programmers mentioned
|> that it's most likely a race condition. The thing is, I'm really not
|> doing anything too crazy, so I'm having difficulty tracking it down. I
|> had heard in the past that there may be issues with threads, so I
|> thought to investigate this side of things.

I have seen that many dozens of times on half a dozen Unices, but have
only tracked down the cause in a handful of cases. Of those,
implementation defects that are sanctioned by the standards have
accounted for about half.

Note that the term "race condition" is accurate but misleading! One
of the worst problems with POSIX is that it does not define how
non-memory global state is synchronised. For example, it is possible
for a memory update and an associated signal to occur on different
sides of a synchronisation boundary. Similarly, it is possible for
I/O to sidestep POSIX's synchronisation boundaries. I have seen both.

Perhaps the nastiest is that POSIX leaves it unclear whether the
action of synchronisation is transitive. So, if A synchronises with
B, and then B with C, A may not have synchronised with C. Again, I
have seen that. It can happen on Intel systems, according to the
experts I know.

|> Would you consider the Linux implementation of threads to be concrete?

In this sort of area, Linux tends to be saner than most systems, but
remember that it has had MUCH less stress testing on threaded codes
than many other Unices. In fact, it was only a few years ago that
Linux threads became stable enough to be worth using.

Note that failures due to implementation defects and flaws in the
standards are likely to show up in very obscure ways; ones due to
programmer error tend to be much simpler.

If you want to contact me by Email, and can describe technically
what you are doing and (most importantly) what you are assuming, I
may be able to give some hints. But no promises.


Regards,
Nick Maclaren.
 

Aahz

> Hey everyone, I have a question about python threads. Before anyone
> goes further, this is not a debate about threads vs. processes, just a
> question.
>
> With that, are python threads reliable? Or rather, are they safe? I've
> had some strange errors in the past. I use threading.Lock for my
> critical sections, but I wonder if that is really good enough.
>
> Does anyone have any conclusive evidence that python threads/locks are
> safe or unsafe?

My response is that you're asking the wrong questions here. Our database
server locked up hard Sunday morning, and we still have no idea why (the
machine itself, not just the database app). I think it's more important
to focus on whether you have done all that is reasonable to make your
application reliable -- and then put your efforts into making your app
recoverable.
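
One way to read "recoverable" is a small supervisor that logs a failure and starts the work again. The following is only a sketch under that assumption: `supervise`, `max_restarts`, and `delay` are hypothetical names, and a real service would also need to rebuild or discard any shared state before restarting.

```python
import logging
import time


def supervise(task, max_restarts=3, delay=0.0):
    """Run task(); on an unexpected exception, log it and run it again.

    After more than `max_restarts` failures the last exception is re-raised
    so the problem is not hidden forever.
    """
    failures = 0
    while True:
        try:
            return task()
        except Exception:
            failures += 1
            # logging.exception records the full traceback for later analysis.
            logging.exception("task failed (failure %d)", failures)
            if failures > max_restarts:
                raise
            time.sleep(delay)
```

Keeping the traceback at the moment of failure matters more than the restart itself, since a bug that appears every few months leaves little else to work from.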

I'm particularly making this comment in the context of your later point
about the bug showing up only every three or four months.

Side note: without knowing what error messages you're getting, there's
not much anybody can say about your programs or the reliability of
threads for your application.
 

Nick Maclaren

|>
|> My response is that you're asking the wrong questions here. Our database
|> server locked up hard Sunday morning, and we still have no idea why (the
|> machine itself, not just the database app). I think it's more important
|> to focus on whether you have done all that is reasonable to make your
|> application reliable -- and then put your efforts into making your app
|> recoverable.

Absolutely! Shit happens. In a well-designed world, that would not be
the case, but we don't live in one. Until you have identified the cause,
you can't tell if threading has anything to do with the failure - given
what we know, it seems likely, but what Aahz says is how to tackle the
problem WHATEVER the cause.


Regards,
Nick Maclaren.
 

Paddy

Chris said:

> I'm aware of the issues with the POSIX threading model. I still stand
> by my statement - bringing up the problems with the provability of
> correctness in the POSIX model amounts to FUD in a discussion of
> actual problems with actual code.
>
> Logic and programming errors in user code are far more likely to be
> the cause of random errors in a threaded program than theoretical
> (I've never come across a case in practice) issues with the POSIX
> standard.

Carl said:

> Yeah, typically I would think that. The problem I am seeing is
> incredibly intermittent. Like a simple Pyro server that gives me a
> problem maybe every three or four months. Just something funky will
> happen to the state of the whole thing, some bad data. I'm having an
> issue tracking it down and some more experienced programmers mentioned
> that it's most likely a race condition. The thing is, I'm really not
> doing anything too crazy, so I'm having difficulty tracking it down. I
> had heard in the past that there may be issues with threads, so I
> thought to investigate this side of things.
>
> It still proves difficult, but reassurance of the threading model helps
> me focus my efforts.

Three to four months before `strange errors`? I'd spend some time
correlating logs; not just for your program, but for everything running
on the server. Then I'd expect to cut my losses and arrange to safely
re-start the program every TWO months.
(I'd arrange the re-start after collecting logs but before their
analysis. Life is too short).

- Paddy.
 

Klaas

> Yeah, typically I would think that. The problem I am seeing is
> incredibly intermittent. Like a simple Pyro server that gives me a
> problem maybe every three or four months. Just something funky will
> happen to the state of the whole thing, some bad data. I'm having an
> issue tracking it down and some more experienced programmers mentioned
> that it's most likely a race condition. The thing is, I'm really not
> doing anything too crazy, so I'm having difficulty tracking it down. I
> had heard in the past that there may be issues with threads, so I
> thought to investigate this side of things.

POSIX issues aside, Python's threading model should be less susceptible
to memory-barrier problems that are possible in other languages (this
is due to the GIL). Double-checked locking, frinstance, is safe in
python even though it isn't in java.
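
For readers unfamiliar with the idiom, double-checked locking looks roughly like the sketch below. The `Lazy` class is a hypothetical illustration. The unlocked first check is only as safe as the claim above, which rests on a property of the CPython implementation; treat that as an assumption, not a documented language guarantee.

```python
import threading


class Lazy:
    """Build a value on first use, taking the lock only while it is unset."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None
        self._ready = False

    def get(self):
        # First check, without the lock: the fast path once initialized.
        if not self._ready:
            with self._lock:
                # Second check, under the lock: another thread may have
                # finished the initialization while this one was waiting.
                if not self._ready:
                    self._value = self._factory()
                    # Publish the flag only after the value is stored.
                    self._ready = True
        return self._value
```

Dropping the first check and always taking the lock gives the same result with no reliance on implementation details, at the cost of one lock acquisition per call.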

Are you ever relying solely on the GIL to access shared data?

-Mike
 

Paul Rubin

Klaas said:

> POSIX issues aside, Python's threading model should be less susceptible
> to memory-barrier problems that are possible in other languages (this
> is due to the GIL).

But the GIL is not part of Python's threading model; it's just a
particular implementation artifact. Programs that rely on it are
asking for trouble.

> Double-checked locking, frinstance, is safe in python even though it
> isn't in java.

What's that?

> Are you ever relying solely on the GIL to access shared data?

I think a lot of programs do that, which is probably unwise in the
long run.
 

John Nagle

Carl said:

> Chris said:
>
> > I'm aware of the issues with the POSIX threading model. I still stand
> > by my statement - bringing up the problems with the provability of
> > correctness in the POSIX model amounts to FUD in a discussion of
> > actual problems with actual code.
> >
> > Logic and programming errors in user code are far more likely to be
> > the cause of random errors in a threaded program than theoretical
> > (I've never come across a case in practice) issues with the POSIX
> > standard.
>
> Yeah, typically I would think that. The problem I am seeing is
> incredibly intermittent. Like a simple Pyro server that gives me a
> problem maybe every three or four months. Just something funky will
> happen to the state of the whole thing, some bad data. I'm having an
> issue tracking it down and some more experienced programmers mentioned
> that it's most likely a race condition.

Right. You're at MontaVista, which does real-time Linux systems
and support. There will be people there who thoroughly understand
thread issues. (I've used QNX for real time, but MontaVista has
made progress in recent years.)

The Python thread documentation is kind of vague about how
well the Python primitives are protected against concurrency problems.
For example, do you have to protect basic types like lists
and hashes against concurrent access? Is "pop" atomic?
(It is in "deque", but what about regular lists?)
Can you crash Python from within Python via concurrency errors?
Does the garbage collector run concurrently or does it freeze
all threads? What's different depending upon whether you're using
real OS threads or simulated Python threads?
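
Whatever the answer for a single `pop`, a check followed by a `pop` is two separate operations, and another thread can run between them. A sketch that holds one lock across both steps, using hypothetical names (`drain`, `drain_concurrently`), might look like this:

```python
import threading


def drain(shared, lock):
    """Pop items from `shared` until it is empty.

    The emptiness check and the pop happen under the same lock, so no
    other thread can empty the list between the two steps.
    """
    taken = []
    while True:
        with lock:
            if not shared:
                return taken
            taken.append(shared.pop())


def drain_concurrently(items, num_threads=4):
    """Run several drain() workers over one list; return what each took."""
    lock = threading.Lock()
    results = [None] * num_threads

    def worker(index):
        results[index] = drain(items, lock)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the lock in place, every item ends up in exactly one worker's result, and the design does not depend on which individual list operations happen to be atomic.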

John Nagle
 

Klaas

> But the GIL is not part of Python's threading model; it's just a
> particular implementation artifact. Programs that rely on it are
> asking for trouble.

CPython is more than "a particular implementation" of python, and the
GIL is more than an "artifact". It is a central tenet of threaded
python programming.

I don't advocate relying on the GIL to manage shared data when
threading, but 1) it is useful for the reasons I mention 2) the OP's
question was almost certainly about an application written for and run
on CPython.

> What's that?

google.com

-Mike
 

Paul Rubin

Klaas said:

> CPython is more than "a particular implementation" of python,

It's precisely a particular implementation of Python. Other
implementations include Jython, PyPy, and IronPython.

> and the GIL is more than an "artifact". It is a central tenet of
> threaded python programming.

If it's a central tenet of threaded python programming, why is it not
mentioned at all in the language or library manual? The threading
module documentation describes the right way to handle thread
synchronization in Python, and that module implements traditional
locking approaches without reference to the GIL.

> I don't advocate relying on the GIL to manage shared data when
> threading, but 1) it is useful for the reasons I mention 2) the OP's
> question was almost certainly about an application written for and run
> on CPython.

Possibly true.
 

Damjan

> > and the GIL is more than an "artifact". It is a central tenet of
> > threaded python programming.
>
> If it's a central tenet of threaded python programming, why is it not
> mentioned at all in the language or library manual? The threading
> module documentation describes the right way to handle thread
> synchronization in Python, and that module implements traditional
> locking approaches without reference to the GIL.

And we all hope the GIL will one day die its natural death ...
maybe... probably.. hopefully ;)
 

Klaas

> It's precisely a particular implementation of Python. Other
> implementations include Jython, PyPy, and IronPython.

I did not deny that it is an implementation of Python. I deny that it
is but an implementation of Python.

Jython: several versions behind, used primarily for interfacing with
java
PyPy: years away from being a practical platform for replacing CPython
IronPython: best example you've given, but still probably three or four
orders of magnitude less significant than CPython

> If it's a central tenet of threaded python programming, why is it not
> mentioned at all in the language or library manual?

The same reason why IE CSS quirks are not delineated in the HTML 4.01
spec. This doesn't mean that they aren't central to css web
programming (they are).

How could the GIL, which limits the number of threads in which python
code can be run in a single process to one, NOT be a central part of
threaded python programming?

> The threading
> module documentation describes the right way to handle thread
> synchronization in Python, and that module implements traditional
> locking approaches without reference to the GIL.

No-one has argued that the GIL should be used instead of
threading-based locking. How could they? The two concepts are not
interchangeable and while they affect each other, are two different
things entirely. In the post you responded to and quoted I said:

-Mike
 

Nick Maclaren

|>
|> Three to four months before `strange errors`? I'd spend some time
|> correlating logs; not just for your program, but for everything running
|> on the server. Then I'd expect to cut my losses and arrange to safely
|> re-start the program every TWO months.
|> (I'd arrange the re-start after collecting logs but before their
|> analysis. Life is too short).

Forget it. That strategy is fine in general, but is a waste of time
where threading issues are involved (or signal handling, or some types
of communication problem, for that matter). There are three unrelated
killer facts that interact:

Such failures are usually probabilistic ("Poisson process"), and
so have no "history".

The expected number is usually proportional to the square of the
activity, sometimes a higher power.

Virtually nothing involved does any routine logging, or even has
options to log relevant events.

The first means that the strategy of restarting doesn't help. All
three mean that current logs are almost never any use.
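
The first point can be checked with a little arithmetic. Assuming, as a simplification, a Poisson process with a constant failure rate (the second point says the real rate grows with activity), the chance of surviving the next interval does not depend on how long the program has already been running. The rate below is a made-up figure of one failure per 105 days, and the function names are hypothetical.

```python
import math


def survival(rate, t):
    """P(no failure in an interval of length t) for a Poisson process."""
    return math.exp(-rate * t)


def conditional_survival(rate, elapsed, t):
    """P(no failure in the next t, given none during `elapsed`)."""
    return survival(rate, elapsed + t) / survival(rate, elapsed)


# Hypothetical rate: one failure every 105 days on average.
RATE = 1 / 105

fresh = survival(RATE, 30)                  # process restarted just now
aged = conditional_survival(RATE, 60, 30)   # process already up for 60 days

print(fresh, aged, math.isclose(fresh, aged))
```

Both probabilities come out the same, about 0.75, so under this model a scheduled restart buys nothing: the freshly restarted process is exactly as likely to fail in the next month as the one that has been up for two.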


Regards,
Nick Maclaren.
 
