MacOS 10.9.2: threading error using python.org 2.7.6 distribution

Matthew Pounsett

I've run into a threading error in some code when I run it on MacOS that works flawlessly on a *BSD system running the same version of python. I'm running the python 2.7.6 for MacOS distribution from python.org's downloads page.

I have tried to reproduce the error with a simple example, but so far haven't been able to find the element of my code that triggers the error. I'm hoping someone can suggest some things to try and/or look at. Googling for "python" and the error returns exactly two pages, neither of which is any help.

When I run it through the debugger, I'm getting the following from inside threading.start(). python fails to provide a stack trace when I step into _start_new_thread(), which is a pointer to thread.start_new_thread(). It looks like threading.__bootstrap_inner() may be throwing an exception which thread.start_new_thread() is unable to handle, and for some reason the stack is missing so I get no stack trace explaining the error.

It looks like thread.start_new_thread() is in the binary object, so I can't actually step into it and find where the error is occurring.

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py(745)start()
-> _start_new_thread(self.__bootstrap, ())
(Pdb) s
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py(750)start()
-> self.__started.wait()
(Pdb) Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from

My test code (which works) follows the exact same structure as the failing code, making the same calls to the threading module's objects' methods:

----
import threading

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        print "MyThread runs and exits."

def main():
    try:
        t = MyThread()
        t.start()
    except Exception as e:
        print "Failed with {!r}".format(e)

if __name__ == '__main__':
    main()
----

The actual thread object that's failing looks like this:

class RTF2TXT(threading.Thread):
    """
    Takes a directory path and a Queue as arguments. The directory should be
    a collection of RTF files, which will be read one-by-one, converted to
    text, and each output line will be appended in order to the Queue.
    """
    def __init__(self, path, queue):
        threading.Thread.__init__(self)
        self.path = path
        self.queue = queue

    def run(self):
        logger = logging.getLogger('RTF2TXT')
        if not os.path.isdir(self.path):
            raise TypeError, "supplied path must be a directory"
        for f in sorted(os.listdir(self.path)):
            ff = os.path.join(self.path, f)
            args = [ UNRTF_BIN, '-P', '.', '-t', 'unrtf.text', ff ]
            logger.debug("Processing file {} with args {!r}".format(f, args))
            p1 = subprocess.Popen( args, stdout=subprocess.PIPE,
                    universal_newlines=True)
            output = p1.communicate()[0]
            try:
                output = output.decode('utf-8', 'ignore')
            except Exception as err:
                logger.error("Failed to decode output: {}".format(err))
                logger.error("Output was: {!r}".format(output))

            for line in output.split("\n"):
                line = line.strip()
                self.queue.put(line)
        self.queue.put("<EOF>")

Note: I only run one instance of this thread. The Queue object is used to pass work off to another thread for later processing.

If I insert that object into the test code and run it instead of MyThread(), I get the error. I can't see anything in there that should cause problems for the threading module though... especially since this runs fine on another system with the same version of python.

Any thoughts on what's going on here?
 
Chris Angelico

> If I insert that object into the test code and run it instead of MyThread(), I get the error. I can't see anything in there that should cause problems for the threading module though... especially since this runs fine on another system with the same version of python.
>
> Any thoughts on what's going on here?

First culprit I'd look at is the mixing of subprocess and threading.
It's entirely possible that something goes messy when you fork from a
thread.

Separately: You're attempting a very messy charset decode there. You
attempt to decode as UTF-8, errors ignored, and if that fails, you log
an error... and continue on with the original bytes. You're risking
shooting yourself in the foot there; I would recommend you have an
explicit fall-back (maybe re-decode as Latin-1??), so the next code is
guaranteed to be working with Unicode. Currently, it might get a
unicode or a str.
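
A minimal sketch of the explicit fall-back suggested above, not code from the thread. The helper name `to_text` and the choice of Latin-1 as the last resort are assumptions made for illustration. It always hands back text (unicode in Python 2, str in Python 3), so the code after it never has to guess which type it received.

```python
def to_text(raw, encodings=("utf-8", "latin-1")):
    """Return text for `raw`, trying each encoding in order.

    Hypothetical helper: the name and the encoding order are assumptions.
    """
    if not isinstance(raw, bytes):
        # Already text (unicode on Python 2, str on Python 3).
        return raw
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding does not fit; try the next one.
            continue
    # Latin-1 maps every byte to a code point, so this cannot fail.
    return raw.decode("latin-1")
```

In the run() method above, something like `output = to_text(p1.communicate()[0])` could stand in for the try/except around the decode call.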

ChrisA
 
Matthew Pounsett

> First culprit I'd look at is the mixing of subprocess and threading.
> It's entirely possible that something goes messy when you fork from a
> thread.

I liked the theory, but I've run some tests and can't reproduce the error that way. I'm using all the elements in my test code that the real code runs, and I can't get the same error. Even when I deliberately break things I'm getting a proper exception with stack trace.

# Imports needed to run this snippet on its own (Python 2.7).
# Uncommenting the RTF2TXT line below would also need "import Queue".
import logging
import shlex
import subprocess
import threading

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        logger = logging.getLogger("thread")
        p1 = subprocess.Popen( shlex.split( 'echo "MyThread calls echo."'),
                stdout=subprocess.PIPE, universal_newlines=True)
        logger.debug( p1.communicate()[0].decode('utf-8', 'ignore' ))
        logger.debug( "MyThread runs and exits." )

def main():
    console = logging.StreamHandler()
    console.setFormatter(
            logging.Formatter('%(asctime)s [%(name)-12s] %(message)s', '%T'))
    logger = logging.getLogger()
    logger.addHandler(console)
    logger.setLevel(logging.NOTSET)

    try:
        t = MyThread()
        #t = RTF2TXT("../data/SRD/rtf/", Queue.Queue())
        t.start()
    except Exception as e:
        logger.error( "Failed with {!r}".format(e))

if __name__ == '__main__':
    main()

> Separately: You're attempting a very messy charset decode there. You
> attempt to decode as UTF-8, errors ignored, and if that fails, you log
> an error... and continue on with the original bytes. You're risking
> shooting yourself in the foot there; I would recommend you have an
> explicit fall-back (maybe re-decode as Latin-1??), so the next code is
> guaranteed to be working with Unicode. Currently, it might get a
> unicode or a str.

Yeah, that was a logic error on my part that I hadn't got around to noticing, since I'd been concentrating on the stuff that was actively breaking. That should have been in an else: block on the end of the try.
 
Matthew Pounsett

> FWIW, the Python 2 version of subprocess is known to be thread-unsafe.
> There is a Py2 backport available on PyPI of the improved Python 3
> subprocess module:

Since that's the only thread that calls anything in subprocess, and I'm only running one instance of the thread, I'm not too concerned about how threadsafe subprocess is. In this case it shouldn't matter. Thanks for the info though; that might be handy at some future point.
 
Chris Angelico

> I liked the theory, but I've run some tests and can't reproduce the error that way. I'm using all the elements in my test code that the real code runs, and I can't get the same error. Even when I deliberately break things I'm getting a proper exception with stack trace.

In most contexts, "thread unsafe" simply means that you can't use the
same facilities simultaneously from two threads (eg a lot of database
connection libraries are thread unsafe with regard to a single
connection, as they'll simply write to a pipe or socket and then read
a response from it). But processes and threads are, on many systems,
linked. Just the act of spinning off a new thread and then forking can
potentially cause problems. Those are the exact sorts of issues that
you'll see when you switch OSes, as it's the underlying thread/process
model that's significant. (Particularly of note is that Windows is
*very* different from Unix-based systems, in that subprocess
management is not done by forking. But not applicable here.)

You may want to have a look at subprocess32, which Ned pointed out. I
haven't checked, but I would guess that its API is identical to
subprocess's, so it should be a drop-in replacement ("import
subprocess32 as subprocess"). If that produces the exact same results,
then it's (probably) not thread-safety that's the problem.
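
A rough sketch of that drop-in swap, assuming the subprocess32 backport from PyPI is installed. The identical API is only the guess voiced above and has not been verified here. The helper `run_and_capture` is a hypothetical name used for illustration, and the code falls back to the standard module when the backport is missing.

```python
import sys

try:
    # Py2 backport of the Python 3 subprocess module, if available.
    import subprocess32 as subprocess
except ImportError:
    # Fall back to the standard library module.
    import subprocess


def run_and_capture(args):
    """Run a command and return (exit status, captured stdout)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return proc.returncode, out
```

Running the failing command through both variants and comparing the results is one way to check whether the choice of module makes any difference.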

> Yeah, that was a logic error on my part that I hadn't got around to noticing, since I'd been concentrating on the stuff that was actively breaking. That should have been in an else: block on the end of the try.

Ah good. Keeping bytes versus text separate is something that becomes
particularly important in Python 3, so I always like to encourage
people to get them straight even in Py2. It'll save you some hassle
later on.

ChrisA
 
Matthew Pounsett

> In most contexts, "thread unsafe" simply means that you can't use the
> same facilities simultaneously from two threads (eg a lot of database
> connection libraries are thread unsafe with regard to a single
> connection, as they'll simply write to a pipe or socket and then read
> a response from it). But processes and threads are, on many systems,
> linked. Just the act of spinning off a new thread and then forking can
> potentially cause problems. Those are the exact sorts of issues that
> you'll see when you switch OSes, as it's the underlying thread/process
> model that's significant. (Particularly of note is that Windows is
> *very* different from Unix-based systems, in that subprocess
> management is not done by forking. But not applicable here.)

Thanks, I'll keep all that in mind. I have to wonder how much of a problem it is here though, since I was able to demonstrate a functioning fork inside a new thread further up in the discussion.

I have a new development that I find interesting, and I'm wondering if you still think it's the same problem.

I have taken that threading object and turned it into a normal function definition. It's still forking the external tool, but it's doing so in the main thread, and it is finished execution before any other threads are created. And I'm still getting the same error.

Turns out it's not coming from the threading module, but from the subprocess module instead. Specifically, line 709 of /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py, which is this:

try:
    self._execute_child(args, executable, preexec_fn, close_fds,
                        cwd, env, universal_newlines,
                        startupinfo, creationflags, shell, to_close,
                        p2cread, p2cwrite,
                        c2pread, c2pwrite,
                        errread, errwrite)
except Exception:

I get the "Warning: No stack to get attribute from" twice when that self._execute_child() call is made. I've tried stepping into it to narrow it down further, but I'm getting weird behaviour from the debugger that I've never seen before once I do that. It's making it hard to track down exactly where the error is occurring.

Interestingly, it's not actually raising an exception there. The except block is not being run.
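
When the debugger cannot show a stack, one possible workaround is to have the program print its own Python-level stacks at the point of interest. The sketch below is an assumption-laden illustration, not something tried in this thread. It uses `sys._current_frames()` and `traceback.format_stack()` from the standard library, and the helper name `format_all_stacks` is hypothetical. It only sees Python frames, so it would not show what happens inside C code such as thread.start_new_thread().

```python
import sys
import threading
import traceback


def format_all_stacks():
    """Return a text report of the current Python stack of every thread."""
    # Map thread identifiers to readable names where they are known.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)
```

Calling `sys.stderr.write(format_all_stacks())` just before and just after the suspect Popen call would at least show which Python frames are live at those points.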
 
Chris Angelico

> Thanks, I'll keep all that in mind. I have to wonder how much of a problem it is here though, since I was able to demonstrate a functioning fork inside a new thread further up in the discussion.

Yeah, it's really hard to pin down sometimes. I once discovered a
problem whereby I was unable to spin off subprocesses that did certain
things, but I could do a trivial subprocess (I think I fork/exec'd to
the echo command or something) and that worked fine. Turned out to be
a bug in one of my signal handlers, but the error was being reported
at the point of the forking.

> I have a new development that I find interesting, and I'm wondering if you still think it's the same problem.
>
> I have taken that threading object and turned it into a normal function definition. It's still forking the external tool, but it's doing so in the main thread, and it is finished execution before any other threads are created. And I'm still getting the same error.

Interesting. That ought to eliminate all possibility of
thread-vs-process issues. Can you post the smallest piece of code that
exhibits the same failure?

ChrisA
 
