Python deadlock using subprocess.popen and communicate

A

Atherun

This is on windows with python 2.6.
I can't seem to remove a possibility of a deadlock in one of my
scripts at the moment. Its not a constant deadlock but it appears
from time to time. The code is below:

try:

process =
subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)

NotDone = True
data = []
while NotDone:
NotDone = False

result = process.poll()
if result == None:
NotDone = True
if NotDone:
out, err = process.communicate()
Log(out)

I was able to get the program to write out the stack trace for my
threads and it is usually deadlocked on sys.stdout.read or
_internal_poll of subproceess.

I've even tried the above, with using "with
tempfile.NamedTemporaryFiles() as buff:" and writing to the file, but
it still deadlocks. In the past I work around this by running fewer
processes asynchronously but I would really like to get this solved so
I don't have to wait to see if it'll be caused with any changes I
make.

Any tips would be appreciated.
 
R

Roy Smith

Atherun said:
This is on windows with python 2.6.
I can't seem to remove a possibility of a deadlock in one of my
scripts at the moment. Its not a constant deadlock but it appears
from time to time. The code is below:

try:

process =
subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)

NotDone = True
data = []
while NotDone:
NotDone = False

result = process.poll()
if result == None:
NotDone = True
if NotDone:
out, err = process.communicate()
Log(out)

I was able to get the program to write out the stack trace for my
threads and it is usually deadlocked on sys.stdout.read or
_internal_poll of subproceess.

I've even tried the above, with using "with
tempfile.NamedTemporaryFiles() as buff:" and writing to the file, but
it still deadlocks. In the past I work around this by running fewer
processes asynchronously but I would really like to get this solved so
I don't have to wait to see if it'll be caused with any changes I
make.

Any tips would be appreciated.

My reading of the docs
(http://docs.python.org/release/2.6.7/library/subprocess.html#popen-objec
ts) says that Popen.poll() doesn't return a value, it sets the object's
return code attribute, which you can then interrogate.

More than that, your loop logic looks amazingly complicated for what's
basically a simple thing. I suggest something along the lines of:

# assuming process.returncode is initially None
while process.returncode is None:
out, err = process.communicate()
Log(out)

I haven't tested that, but I think (from reading the docs) that's the
right idea.
 
C

Chris Rebert

This is on windows with python 2.6.
I can't seem to remove a possibility of a  deadlock in one of my
scripts at the moment.  Its not a constant deadlock but it appears
from time to time.  The code is below:

try:

       process =
subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)

       NotDone = True
       data = []
       while NotDone:
           NotDone = False

           result = process.poll()
           if result == None:
               NotDone = True
           if NotDone:
               out, err = process.communicate()
               Log(out)

I was able to get the program to write out the stack trace for my
threads and it is usually deadlocked on sys.stdout.read or
_internal_poll of subproceess.

I've even tried the above, with using "with
tempfile.NamedTemporaryFiles() as buff:" and writing to the file, but
it still deadlocks.  In the past I work around this by running fewer
processes asynchronously but I would really like to get this solved so
I don't have to wait to see if it'll be caused with any changes I
make.

Any tips would be appreciated.

Your polling loop is completely pointless. The code can be simplified to:

process = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
out, err = process.communicate()
log(out)

Depending on the particular program you're running in a subprocess,
you may need to take heed of the Note in the communicate() docs:
"Note: The data read is buffered in memory, so *do not use this
method* if the data size is large or unlimited." [Emphasis added]

Cheers,
Chris
 
A

Atherun

This is on windows with python 2.6.
I can't seem to remove a possibility of a  deadlock in one of my
scripts at the moment.  Its not a constant deadlock but it appears
from time to time.  The code is below:

       process =
subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
       NotDone = True
       data = []
       while NotDone:
           NotDone = False
           result = process.poll()
           if result == None:
               NotDone = True
           if NotDone:
               out, err = process.communicate()
               Log(out)
I was able to get the program to write out the stack trace for my
threads and it is usually deadlocked on sys.stdout.read or
_internal_poll of subproceess.
I've even tried the above, with using "with
tempfile.NamedTemporaryFiles() as buff:" and writing to the file, but
it still deadlocks.  In the past I work around this by running fewer
processes asynchronously but I would really like to get this solved so
I don't have to wait to see if it'll be caused with any changes I
make.
Any tips would be appreciated.

Your polling loop is completely pointless. The code can be simplified to:

process = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
out, err = process.communicate()
log(out)

Depending on the particular program you're running in a subprocess,
you may need to take heed of the Note in the communicate() docs:
"Note: The data read is buffered in memory, so *do not use this
method* if the data size is large or unlimited." [Emphasis added]

Cheers,
Chris
--http://rebertia.com

I'm pretty sure thats the problem, this is a generic catch all
function for running subprocesses. It can be anything to a simple
command to a complex command with a ton of output. I'm looking for a
better solution to handle the case of running subprocesses that have
an undetermined amount of output. (i.e. p4 edit's or tools that have
an excessive amount of logging)

I'll definitely simplify the loop, thanks for the tip on that part.
 
C

Chris Rebert

My reading of the docs
(http://docs.python.org/release/2.6.7/library/subprocess.html#popen-objec
ts) says that Popen.poll() doesn't return a value, it sets the object's
return code attribute, which you can then interrogate.

Popen.poll():
Check if child process has terminated. Set **and return**
returncode attribute.
[Direct quote from the docs; emphasis added]
More than that, your loop logic looks amazingly complicated for what's
basically a simple thing.  I suggest something along the lines of:

  # assuming process.returncode is initially None
  while process.returncode is None:
     out, err = process.communicate()
     Log(out)

I haven't tested that, but I think (from reading the docs) that's the
right idea.

There is no point in communicate()-ing multiple times; communicate()
reads until EOF and only returns after subprocess termination!

Cheers,
Chris
 
R

Roy Smith

Chris Rebert said:
Popen.poll():
Check if child process has terminated. Set **and return**
returncode attribute.
[Direct quote from the docs; emphasis added]

Doh. I read right past that and didn't see it. Thanks for the
correction.
 
A

Atherun

 Chris Rebert said:
Popen.poll():
    Check if child process has terminated. Set **and return**
returncode attribute.
[Direct quote from the docs; emphasis added]

Doh.  I read right past that and didn't see it.  Thanks for the
correction.

I removed the loop and changed it to a single communicate now, still
get a deadlock. All threads just stop, even threads that have nothing
to do with the ongoing subprocesses. I specifically created a thread
that dumps the stack trace of all threads so I can try and view the
deadlocks, but that stops responding at the same time as the
deadlock. This wasn't happening before so not sure why it changed
suddenly.
 
T

Thomas Rachel

Am 22.09.2011 05:42 schrieb Atherun:
I'm pretty sure thats the problem, this is a generic catch all
function for running subprocesses. It can be anything to a simple
command to a complex command with a ton of output. I'm looking for a
better solution to handle the case of running subprocesses that have
an undetermined amount of output.

Just handle process.stdout/stderr by yourself - read it out until EOF
and then wait() for the process.


Thomas
 
A

Atherun

Am 22.09.2011 05:42 schrieb Atherun:


Just handle process.stdout/stderr by yourself - read it out until EOF
and then wait() for the process.

Thomas

Thats what confuses me though, the documentation says
process.stdout.read()/stderr.read() can deadlock and apparently so can
communicate, how do you read the stdout/stderr on yourself if its
documented using them can cause a deadlock?
 
N

Nobody

Thats what confuses me though, the documentation says
process.stdout.read()/stderr.read() can deadlock and apparently so can
communicate, how do you read the stdout/stderr on yourself if its
documented using them can cause a deadlock?

If you try to read/write two or more of stdin/stdout/stderr via the
"naive" approach, you run the risk of the child process writing more than
a pipe's worth of data to one stream (and thus blocking) while the
parent is performing a blocking read/write on another stream, resulting in
deadlock.

The .communicate() method avoids the deadlock by either:

1. On Unix, using non-blocking I/O and select(), or
2. On Windows, creating a separate thread for each stream.

Either way, the result is that it can always read/write whichever
streams are ready, so the child will never block indefinitely while
waiting for the parent.

If .communicate() is blocking indefinitely, it suggests that the child
process never terminates. There are many reasons why this might happen,
and most of them depend upon exactly what the child process is doing.

I suggest obtaining a copy of Process Explorer, and using it to
investigate the state of both processes (but especially the child) at the
point that the "deadlock" seems to occur.
 
A

Atherun

If you try to read/write two or more of stdin/stdout/stderr via the
"naive" approach, you run the risk of the child process writing more than
a pipe's worth of data to one stream (and thus blocking) while the
parent is performing a blocking read/write on another stream, resulting in
deadlock.

The .communicate() method avoids the deadlock by either:

1. On Unix, using non-blocking I/O and select(), or
2. On Windows, creating a separate thread for each stream.

Either way, the result is that it can always read/write whichever
streams are ready, so the child will never block indefinitely while
waiting for the parent.

If .communicate() is blocking indefinitely, it suggests that the child
process never terminates. There are many reasons why this might happen,
and most of them depend upon exactly what the child process is doing.

I suggest obtaining a copy of Process Explorer, and using it to
investigate the state of both processes (but especially the child) at the
point that the "deadlock" seems to occur.

In the one case I can easily reproduce, its in a p4.exe call that I'm
making both python and p4.exe have nearly the same stack for their
threads:
python:

ntoskrnl.exe!memset+0x64a
ntoskrnl.exe!KeWaitForMultipleObjects+0xd52
ntoskrnl.exe!KeWaitForMutexObject+0x19f
ntoskrnl.exe!__misaligned_access+0xba4
ntoskrnl.exe!__misaligned_access+0x1821
ntoskrnl.exe!KeWaitForMultipleObjects+0xf5d
ntoskrnl.exe!KeWaitForMutexObject+0x19f
ntoskrnl.exe!NtWaitForSingleObject+0xde
ntoskrnl.exe!KeSynchronizeExecution+0x3a43
wow64cpu.dll!TurboDispatchJumpAddressEnd+0x6c0
wow64cpu.dll!TurboDispatchJumpAddressEnd+0x4a8
wow64.dll!Wow64SystemServiceEx+0x1ce
wow64.dll!Wow64LdrpInitialize+0x429
ntdll.dll!RtlUniform+0x6e6
ntdll.dll!RtlCreateTagHeap+0xa7
ntdll.dll!LdrInitializeThunk+0xe
ntdll.dll!ZwWaitForSingleObject+0x15
kernel32.dll!WaitForSingleObjectEx+0x43
kernel32.dll!WaitForSingleObject+0x12
python26.dll!_Py_svnversion+0xcf8


p4:

ntoskrnl.exe!memset+0x64a
ntoskrnl.exe!KeWaitForMultipleObjects+0xd52
ntoskrnl.exe!KeWaitForSingleObject+0x19f
ntoskrnl.exe!_misaligned_access+0xba4
ntoskrnl.exe!_misaligned_access+0x1821
ntoskrnl.exe!KeWaitForMultipleObjects+0xf5d
ntoskrnl.exe!KeWaitForSingleObject+0x19f
ntoskrnl.exe!NtCreateFile+0x4c9
ntoskrnl.exe!NtWriteFile+0x7e3
ntoskrnl.exe!KeSynchronizeExecution+0x3a43
ntdll.dll!ZwWriteFile+0xa
KERNELBASE.dll!WriteFile+0x7b
kernel32.dll!WriteFile+0x36
p4.exe+0x42d4b
p4.exe+0x42ed8


To me it looks like they're both waiting on each other.
 
N

Nobody

In the one case I can easily reproduce, its in a p4.exe call that I'm
making both python and p4.exe have nearly the same stack for their
threads:
python:
kernel32.dll!WaitForSingleObject+0x12
python26.dll!_Py_svnversion+0xcf8

I haven't a clue how this happens. _Py_svnversion just returns a string:

_Py_svnversion(void)
{
/* the following string can be modified by subwcrev.exe */
static const char svnversion[] = SVNVERSION;
if (svnversion[0] != '$')
return svnversion; /* it was interpolated, or passed on command line */
return "Unversioned directory";
}

It doesn't even
 
A

Atherun

You intrigue me, sir. Does it odd?

What is the remainder of this aborted sentence?

ChrisA

That is odd, I also find it odd that it deadlocks the entire python
system, even threads that have nothing to do with the subprocess stop
working, the entire thing just stops. I may not have pasted the full
stack trace looking at it again, I'll double check in a few.

I was able to repro this on a tool I have the source for, when I
attach to it to debug visual studio tells me the threads are
deadlocked and the only stack trace I have available to me is a low
level os call to a WriteFile function.
 
A

Atherun

That is odd, I also find it odd that it deadlocks the entire python
system, even threads that have nothing to do with the subprocess stop
working, the entire thing just stops. I may not have pasted the full
stack trace looking at it again, I'll double check in a few.

I was able to repro this on a tool I have the source for, when I
attach to it to debug visual studio tells me the threads are
deadlocked and the only stack trace I have available to me is a low
level os call to a WriteFile function.

Ya heres the full python stack:


ntoskrnl.exe!memset+0x64a
ntoskrnl.exe!KeWaitForMultipleObjects+0xd52
ntoskrnl.exe!KeWaitForMutexObject+0x19f
ntoskrnl.exe!__misaligned_access+0xba4
ntoskrnl.exe!__misaligned_access+0x1821
ntoskrnl.exe!KeWaitForMultipleObjects+0xf5d
ntoskrnl.exe!KeWaitForMutexObject+0x19f
ntoskrnl.exe!NtWaitForSingleObject+0xde
ntoskrnl.exe!KeSynchronizeExecution+0x3a43
wow64cpu.dll!TurboDispatchJumpAddressEnd+0x6c0
wow64cpu.dll!TurboDispatchJumpAddressEnd+0x4a8
wow64.dll!Wow64SystemServiceEx+0x1ce
wow64.dll!Wow64LdrpInitialize+0x429
ntdll.dll!RtlUniform+0x6e6
ntdll.dll!RtlCreateTagHeap+0xa7
ntdll.dll!LdrInitializeThunk+0xe
ntdll.dll!ZwWaitForSingleObject+0x15
kernel32.dll!WaitForSingleObjectEx+0x43
kernel32.dll!WaitForSingleObject+0x12
python26.dll!_Py_svnversion+0xcf8
python26.dll!PyObject_AsReadBuffer+0x46d
python26.dll!PyEval_EvalCodeEx+0x738
python26.dll!PyEval_EvalFrameEx+0x467
python26.dll!PyObject_Realloc+0x90
python26.dll!PyEval_EvalCodeEx+0x8ef
python26.dll!PyEval_EvalFrameEx+0x467
 
N

Nobody

I haven't a clue how this happens. _Py_svnversion just returns a string:

In retrospect, I think that's a red herring. 0xcf8 seems like too large an
offset for such a small function. I think that it's more likely to be in a
non-exported function, and _Py_svnversion just happens to be the last
exported symbol prior to that point in the code.
 
A

Atherun

In retrospect, I think that's a red herring. 0xcf8 seems like too large an
offset for such a small function. I think that it's more likely to be in a
non-exported function, and _Py_svnversion just happens to be the last
exported symbol prior to that point in the code.

I have the call stacks for each python thread running up until the
dead lock:

# ThreadID: 992
out, err = proc.communicate("change: new\ndescription: %s
\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in
communicate
return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in
_communicate
stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
waiter.acquire()

# ThreadID: 5516
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in
_readerthread
buffer.append(fh.read())

# ThreadID: 2668
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
self.run()
File: "c:\src\scripts\auto\Autobuilder\StackTracer.py", line 69, in
run
self.stacktraces()
File: "c:\src\scripts\auto\Autobuilder\StackTracer.py", line 86, in
stacktraces
fout.write(stacktraces())
File: "c:\src\scripts\auto\Autobuilder\StackTracer.py", line 26, in
stacktraces
for filename, lineno, name, line in traceback.extract_stack(stack):

# ThreadID: 3248
out, err = proc.communicate("change: new\ndescription: %s
\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in
communicate
return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in
_communicate
stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
waiter.acquire()


# ThreadID: 7700
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in
_readerthread
buffer.append(fh.read())

# ThreadID: 8020
out, err = proc.communicate("change: new\ndescription: %s
\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in
communicate
return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in
_communicate
stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
waiter.acquire()

# ThreadID: 4252
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in
_readerthread
buffer.append(fh.read())

The StackTracer thread freezes trying to update my output file, and
yes I'm trying to 3 tasks in parallel which each one starts by
creating a changelist in perforce. This is just an easy repro case
for me, it happens with commands other then p4. This almost looks
like a threading issue more then the output deadlock.
 
M

Matt Joiner

how do you get the call stacks like this?

I have the call stacks for each python thread running up until the
dead lock:

# ThreadID: 992
 out, err = proc.communicate("change: new\ndescription: %s
\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in
communicate
 return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in
_communicate
 stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
 self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
 waiter.acquire()

# ThreadID: 5516
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
 self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
 self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
 self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in
_readerthread
 buffer.append(fh.read())

# ThreadID: 2668
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
 self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
 self.run()
File: "c:\src\scripts\auto\Autobuilder\StackTracer.py", line 69, in
run
 self.stacktraces()
File: "c:\src\scripts\auto\Autobuilder\StackTracer.py", line 86, in
stacktraces
 fout.write(stacktraces())
File: "c:\src\scripts\auto\Autobuilder\StackTracer.py", line 26, in
stacktraces
 for filename, lineno, name, line in traceback.extract_stack(stack):

# ThreadID: 3248
 out, err = proc.communicate("change: new\ndescription: %s
\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in
communicate
 return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in
_communicate
 stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
 self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
 waiter.acquire()


# ThreadID: 7700
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
 self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
 self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
 self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in
_readerthread
 buffer.append(fh.read())

# ThreadID: 8020
 out, err = proc.communicate("change: new\ndescription: %s
\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in
communicate
 return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in
_communicate
 stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
 self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
 waiter.acquire()

# ThreadID: 4252
File: "c:\src\extern\python\lib\threading.py", line 497, in
__bootstrap
 self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in
__bootstrap_inner
 self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
 self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in
_readerthread
 buffer.append(fh.read())

The StackTracer thread freezes trying to update my output file, and
yes I'm trying to 3 tasks in parallel which each one starts by
creating a changelist in perforce.  This is just an easy repro case
for me, it happens with commands other then p4.  This almost looks
like a threading issue more then the output deadlock.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,011
Latest member
AjaUqq1950

Latest Threads

Top