Multiprocessing bug, is my editor (SciTE) impeding my progress?

John Ladasky

Hi, folks,

Back in 2002, I got back into programming after a nine-year hiatus. I
needed a new programming language, was guided to Python 2.2, and was
off to the races. I chose the SciTE program editor, and I have been
using it ever since. I'm now using Python 2.6 on Ubuntu Linux 10.10.

My programming needs have grown more sophisticated, but I'm still
using SciTE. Pretty much all of my recent posts to comp.lang.python
have concerned multiprocessing. I put together a decent system for my
current project, and had it all working. Then I realized that I
needed to refactor and expand some code, which I did -- and somehow, I
generated a bug that I simply cannot understand. I've been puzzling
over it for three days.

The error is occurring inside one of my subprocesses. As far as I
know, SciTE is limited in what it can do in this situation. The
program does not return when a subprocess generates an exception. I
see the error message, but then the program simply hangs.

I have tried invoking the subprocess directly without scheduling it
through multiprocessing.Pool. It works fine. So the problem is
occurring inside Pool.

I tried opening my code in IDLE, and figured I could step through it,
or at least call functions one line at a time. It appears that
multiprocessing code is not compatible with IDLE. IDLE simply crashes
when I try to invoke any of the important functions.

I know, you want me to post a minimal example. Most of the time,
that's possible, and I do it. Trust me, this time it isn't. I have
about 500 lines of code, split across three files. These implement a
neural network, some test data, and multiprocessing methods for
network evaluation. I made several concerted changes to the code, and
turned a working system into this:

=============================================

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
TypeError: expected string or Unicode object, NoneType found

=============================================

Here's what I think would help me debug this error: I would like to
catch the TypeError, and examine the contents of task. I need to
accomplish this WITHOUT adding a try...except block to the Python
library file multiprocessing/pool.py. I don't know whether this is
possible, because the traceback isn't clear about where my OWN code
calls the code which is generating the error.

After that, if the cause of the error still is not obvious, I might
need to go back to the working program. Somehow I want to examine the
contents of task when the program works, and no TypeError is being
generated. By comparing the two, I hope to see a difference. From
that, I should be able to figure out how I have broken what is being
fed to Pool.__init__ and/or MapResult.__init__.

Any suggestions how I might best accomplish this task? Does this
error message look familiar to anyone?

More generally, should I consider graduating from SciTE? I have had a
look at a few of the more comprehensive IDEs over the years, and I'll
have to say that I found them to be intimidating. I found it to be a
huge chore just to open a single Python script and run it inside an
IDE. It seems like you had to know how to set up a complete, multi-
script project before you could even accomplish simple tasks. That
steep learning curve is the reason that I didn't choose Java as my
programming language.

So, if any of you have pertinent recommendations in the IDE
department, please feel free to guide me that way.

Thanks!
 
Marco Nawijn

Hello John,

One way of trying to debug the issue could be to use ipython and ipdb.
You can then run your code from within the ipython shell. The
debugger will stop at the TypeError, but keep the context. At this
point you should be able to evaluate task.

As a side comment on your IDE remarks: I keep switching between VIM
and Aptana/Pydev. The more I learn about VIM, the more I feel
comfortable and productive. In combination with ipython it is quite a
solid development environment. On the other hand, Pydev is very user
friendly, powerful and easy to learn. Debugging in Pydev is
excellent.

Regards,

Marco
 
Terry Reedy

> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.6/threading.py", line 484, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/usr/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
>     put(task)
> TypeError: expected string or Unicode object, NoneType found
>
> =============================================
>
> Here's what I think would help me debug this error: I would like to
> catch the TypeError, and examine the contents of task.

The traceback says that it is None, which has no contents ;=).

> I need to
> accomplish this WITHOUT adding a try...except block to the Python
> library file multiprocessing/pool.py.

I do not understand this statement. You should feel free to make a
backup copy of pool.py and then modify it for debugging.

> I don't know whether this is
> possible, because the traceback isn't clear about where my OWN code
> calls the code which is generating the error.

It appears to be threading trying to call your code, and failing, that
is the problem. But I do not know how threading, multiprocessing.pool,
and your code are supposed to interact.

> After that, if the cause of the error still is not obvious, I might
> need to go back to the working program. Somehow I want to examine the
> contents of task when the program works, and no TypeError is being
> generated.

The traceback says that it is a string.

I would start with the line that fails 'put(task)', and work backwards
to see where 'task' comes from and how it could become None. It is even
possible that multiprocessing.pool has a bug that you ran into.
 
John Ladasky

Thanks, Marco.

I've noticed that the matplotlib reference manual recommends ipython.
I haven't been clear what its advantages are, but if interacting with
multiprocessing correctly is one of them, I'll try it.

If ipython does everything that IDLE does and more, why is IDLE still
shipped with Python anyway?

I'll follow up on your IDE recommendations too after trying ipython.
 
Matt Joiner

John I'm in a similar position. I've been using Geany for 2+ years and
haven't found anything to replace it.
Either the replacement tool makes it too difficult to work with Python
correctly, or I spend more time trying to understand it, rather than
getting the job done.
I also use vim on occasion when GUI isn't an option.

I seem to do okay, so I'm not sure you're at any disadvantage. A stand
alone graphical debugger would be handy tho...
 
John Ladasky

> The traceback says that it is None, which has no contents ;=).

I'm not sure about that. I don't submit the task variable, it's
something that Pool builds from what I submit. Is task == None when
it should be a string? Or, is task an iterable which contains one
element which should be a string? And what's supposed to be in that
string anyway? My first reading of the source code of Pool didn't
make this clear to me. Also, I've noticed that tracebacks from
subprocesses are less informative than tracebacks from the parent
process. What's missing?
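
On the point about sparse tracebacks from subprocesses: one common workaround is to catch the exception inside the worker function and send the formatted traceback back as ordinary data. The following is a minimal sketch, assuming a hypothetical worker named fragile (it is not part of the code discussed in this thread). It can only help when the worker function actually starts running in the subprocess.

```python
import traceback

def fragile(x):
    # Hypothetical worker, used only for illustration; fails when x == 0.
    return 10 // x

def traced_fragile(x):
    """Run fragile(x); on failure, return the formatted traceback as a string."""
    try:
        return ("ok", fragile(x))
    except Exception:
        # format_exc() captures the full traceback of the current exception.
        return ("error", traceback.format_exc())

# Called directly here; the same top-level wrapper could be handed to Pool.map.
results = [traced_fragile(x) for x in (1, 0)]
for status, value in results:
    print(status)
    print(value)
```

Because the wrapper is an ordinary top-level function, it can be submitted in place of the original worker, and the parent process can then print whatever traceback text comes back.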

Finally, I also recall that multiprocessing invokes pickle to pass
data between processes (I still don't understand the need for pickle
here). This suggests to me that the string MIGHT contain the pickled
code of the method I want to run in the subprocess. But I'm not sure,
until I can actually examine the task variable in my working version.
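
On what a pickled function contains: pickle stores a function by reference (its module and name), not its code, which is why the docs ask for the function to be importable at the top level. A small sketch using math.sqrt as a stand-in for any importable function:

```python
import math
import pickle

# Protocol 0 is text-based, so the payload is easy to inspect by eye.
payload = pickle.dumps(math.sqrt, 0)
print(repr(payload))

# Unpickling looks the name up again and returns the very same function object.
restored = pickle.loads(payload)
print(restored is math.sqrt)
```

The payload holds only the module and function names, so the receiving process must be able to import that module and find the same name in it.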

> I do not understand this statement. You should feel free to make a
> backup copy of pool.py and then modify it for debugging.

Right, so, the last time I tried this with a piece of library code, I
ran into some major headaches with import statements. I suppose I
could have a look at Pool and see whether it can be extracted cleanly
and made to run.

> I don't know whether this is

> It appears to be threading trying to call your code, and failing, that
> is the problem. But I do not know how threading, multiprocessing.pool,
> and your code are supposed to interact.

It might be that pickle has somehow managed to pass a null code string
to the subprocess, so it has nothing to run.

> The traceback says that it is a string.

Yes, again... I want to know what that string is supposed to DO.

> I would start with the line that fails 'put(task)', and work backwards
> to see where 'task' comes from and how it could become None. It is even
> possible that multiprocessing.pool has a bug that you ran into.

Oh, please don't say that. I'm no computer scientist, and Python has
been scrutinized by so many professionals. I couldn't have possibly
found a language bug.
 
Terry Reedy

> Right, so, the last time I tried this with a piece of library code, I
> ran into some major headaches with import statements. I suppose I
> could have a look at Pool and see whether it can be extracted cleanly
> and made to run.

I have patched files both in /Lib and /Lib/idle on Windows with no
problems except that I had to switch to admin account to make the patch.
I probably changed 'copy of x.py' to either 'x.bak' or 'x.py.bak'
but I do not remember. Deleting .pyc might or might not help.

For a file in Lib, I have also copied to the working directory with my
script, which gets prepended to sys.path. This makes restoring the
default easier. One would have to copy all of multiprocessing/ for that
to work with m.../pool.py.
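
When a library file has been patched or copied, it helps to confirm which copy the interpreter actually imports. A short sketch using only standard attributes:

```python
import sys
import multiprocessing.pool

# Path of the pool module that this interpreter actually imported.
pool_path = multiprocessing.pool.__file__
print(pool_path)

# First entry of the module search path; when a script is run directly,
# this is normally the directory containing that script.
print(repr(sys.path[0]))
```

If the printed path points into the working directory rather than the system library, the local copy is the one in use.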
 
Steven D'Aprano

> Oh, please don't say that. I'm no computer scientist, and Python has
> been scrutinized by so many professionals. I couldn't have possibly
> found a language bug.

"Professional" just means they get paid for doing it. Professionals gave
us the 2008 banking collapse, the Challenger shuttle explosion, the
sinking of the Titanic, trench warfare in World War I, the Chernobyl
nuclear meltdown, leaded petrol, "Battlefield Earth" the movie, and the BP
Gulf of Mexico oil spill. Amateurs gave us the discovery of electricity,
the Roomba, the original Apple computer, GNU software, Linux, Ogg/Vorbis,
and the discovery of continental drift.

While your modesty is a welcome change from n00bs who imagine that
anything about Python that they misunderstood is a bug, don't sell
yourself short. You don't need to be a computer scientist to identify
bugs in software.
 
Chris Angelico

> While your modesty is a welcome change from n00bs who imagine that
> anything about Python that they misunderstood is a bug, don't sell
> yourself short. You don't need to be a computer scientist to identify
> bugs in software.

Likelihood of something being a bug generally depends heavily on
"eyeball density". If a piece of code gets a lot of eyeballs, chances
are its bugs have been found (not always but often). Generally, the
code that makes up a heavily-used open source project can be expected
to have quite a few eyeballs near it as a regular thing; but there's
always the obscure bits that don't. Crypto modules have fallen foul of
this on occasion, with bugs lurking there far more than might
otherwise be expected, on account of such a small portion of coders
ever touching cryptography.

Of course, it's always less embarrassing to say "I think I'm using
this wrong" and have someone say "You found a language bug" than to
come in guns blazing with "ur lnguage is teh buggy" (sorry, I don't
speak Lame very fluently) only to learn that you spoiled the
incantation in some way. The number of people who assume that their
first-time code is perfect and the language is hopeless is somewhat
astonishing.

ChrisA
 
John Yeung

> Oh, please don't say that. I'm no computer scientist, and
> Python has been scrutinized by so many professionals. I
> couldn't have possibly found a language bug.

Scrutiny or no, Python has its fair share of bugs. I think almost all
real-world implementations of almost all general-purpose programming
languages do.

<tangent>
Not long ago a friend of mine (a mathematician, but only a novice
programmer) sent me some code dealing with sets that exposed a bug in
Python 2.6.1. He invoked the union() method from the set class rather
than a set instance, and it took us a long time to figure out why he
was getting different results than I was from the same code (I was the
one on 2.6.1, and my results were wrong).

Fortunately, it was (a) easy enough to work around and (b) fixed in
subsequent versions of Python. But it just goes to show that even
unsophisticated programmers can stumble upon language bugs. This one
wasn't even in the library; it was a built-in.
</tangent>

That hasn't shaken my confidence in Python, though. (Also, for what
it's worth, I use SciTE as my Python editor as well. I've also used
Geany from time to time, and I have no trouble recommending it. It's
a step up the IDE ladder from SciTE, but is still tons lighter than
Eclipse and its brethren.)

John
 
John Ladasky

Thanks once again to everyone for their recommendations, here's a
follow-up. In summary, I'm still baffled.

I tried ipython, as Marco Nawijn suggested. If there is some special
setting which returns control to the interpreter when a subprocess
crashes, I haven't found it yet. Yes, I'm RTFM. As with SciTE,
everything just hangs. So I went back to SciTE for now.

And I'm doing what Terry Reedy suggested -- I am editing
multiprocessing/pool.py in place. I made a backup, of course. I am using
sudo to run SciTE so that I can edit the system files, and not have to
worry about chasing path and import statement problems.

What I have found, so far, is no evidence that a string is needed in
any of the code. What's the task variable? It's a deeply-nested
tuple, containing no strings, not even in the WORKING code. This
makes me wonder whether that traceback is truly complete.

I wrote a routine to display the contents of task, immediately before
the offending put(). Here's a breakdown.


In the WORKING version:

task: <type 'tuple'>
    <type 'int'> 0
    <type 'int'> 0
    <type 'function'> <function mapstar at 0xa7ec5a4>
    <type 'tuple'> (see below)
    <type 'dict'> {}

task[3]: <type 'tuple'>
    <type 'tuple'> (see below)

task[3][0]: <type 'tuple'>
    <type 'function'> <function mean_square_error at 0xa7454fc>
    <type 'tuple'> (see below)

task[3][0][1]: <type 'tuple'>
    <class 'neural.SplitData'> (see below)

task[3][0][1][0]: <class 'neural.SplitData'>
    net <class 'neural.CascadeArray'> shape=(2, 3)
    inp <type 'numpy.ndarray'> shape=(307, 2)
    tgt <type 'numpy.ndarray'> shape=(307, 2)


By watching this run, I've learned that task[0] and task[1] are
counters for groups of subprocesses and individual subprocesses,
respectively. Suppose we have four subprocesses. When everything is
working, task[:2] = [0,0] for the first call, then [0,1], [0,2],
[0,3]; then, [1,0], [1,1], [1,2], etc.

task[2] points to multiprocessing.pool.mapstar, a one-line function
that I never modify. task[4] is an empty dictionary. So it looks
like everything that I provide appears in task[3].

task[3] is just a tuple inside a tuple (which is weird). task[3][0]
contains the function to be called (in this case, my function,
mean_square_error), and then a tuple containing all of the arguments
to be passed to that function. The docs say that the function in
question must be defined at the top level of the code so that it's
importable (it is), and that all the arguments to be sent to that
function will be wrapped up in a single tuple -- that is presumably
task[3][0][1].

But that presumption is wrong. I wrote a function which creates a
collections.namedtuple object of the type SplitData, which contains
the function's arguments. It's not task[3][0][1] itself, but the
tuple INSIDE it, namely task[3][0][1][0]. More weirdness. You don't
need to worry about task[3][0][1][0], other than to note that these
are my neural network objects, they are intact, they are the classes I
expect, and they are named as I expect -- and that there are NO STRING
objects.
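
One reading of pool.py that would fit these observations is that each task has the shape (job id, chunk index, callable, positional args, keyword args), and that map() packs (func, chunk) into a one-element args tuple for mapstar. The sketch below is a simplified imitation written under that assumption, not the library's code. The names get_tasks, build_tasks and square are chosen here for illustration.

```python
import itertools

def mapstar(args):
    # Same idea as the one-line helper: apply map() to the (func, chunk) pair.
    return list(map(*args))

def get_tasks(func, iterable, chunksize):
    # Yield (func, chunk) pairs; each chunk is a tuple of up to chunksize items.
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, chunksize))
        if not chunk:
            return
        yield (func, chunk)

def build_tasks(job, func, iterable, chunksize=1):
    # Each task: (job id, chunk index, callable, positional args, keyword args).
    return [(job, i, mapstar, (batch,), {})
            for i, batch in enumerate(get_tasks(func, iterable, chunksize))]

def square(x):
    # Hypothetical stand-in for the real worker function.
    return x * x

tasks = build_tasks(0, square, [3, 4, 5])
first = tasks[0]
# first[3] is the args tuple for mapstar; first[3][0] is (func, chunk).
result = first[2](*first[3], **first[4])
print(first[:2], first[3][0][1], result)
```

Under this reading, the tuple inside a tuple is just the usual (args, kwargs) calling convention: task[3] is the positional-argument tuple, task[4] is the empty keyword dictionary, and task[3][0][1] is the chunk of items taken from the iterable, which holds a single item when the chunk size is 1.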


Now, are there any differences between the working version of my code
and the buggy version? Other than a few trivial name changes that I
made deliberately, the structure of task looks the SAME...


task: <type 'tuple'>
    <type 'int'> 0
    <type 'int'> 0
    <type 'function'> <function mapstar at 0x88e0a04>
    <type 'tuple'> (see below)
    <type 'dict'> {}

task[3]: <type 'tuple'>
    <type 'tuple'> (see below)

task[3][0]: <type 'tuple'>
    <type 'function'> <function error at 0x88a5fb4>
    <type 'tuple'> (see below)

task[3][0][1]: <type 'tuple'>
    <class '__main__.SplitData'> (see below)

task[3][0][1][0]: <class '__main__.SplitData'>
    func <class 'cascade.Cascade'> shape=(2, 3)
    inp <type 'numpy.ndarray'> shape=(307, 2)
    tgt <type 'numpy.ndarray'> shape=(307, 2)

Again, all the action is in task[3]. I was worried about the empty
dictionary in task[4] at first, but I've seen this {} in the working
program, too. I'm not sure what it does.
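
The display routine itself is not shown in the thread. A breakdown of this kind could be produced by something like the sketch below, which walks nested tuples and records the type of every element. The function name describe and its output layout are assumptions, not the routine that was actually used.

```python
def describe(obj, name="task", depth=0, max_depth=6):
    """Return lines giving the type of obj and, for tuples, of each element."""
    lines = ["%s%s: %r" % ("  " * depth, name, type(obj))]
    # namedtuple instances are tuples too, so their fields are visited as well.
    if isinstance(obj, tuple) and depth < max_depth:
        for i, item in enumerate(obj):
            child = "%s[%d]" % (name, i)
            lines.extend(describe(item, child, depth + 1, max_depth))
    return lines

lines = describe((0, (1,), {}))
print("\n".join(lines))
```

Running the same routine in the working and the broken program and comparing the two listings line by line is one way to spot a difference in types.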

For completeness, here's mean_square_error() from the working program:

def mean_square_error(b):
    out = array([b.net(i) for i in b.inp])
    return sum((out-b.tgt)**2)

And, here's error() from the buggy program.

def error(b):
    out = array([b.func(i) for i in b.inp])
    return sum((out-b.tgt)**2)

I renamed mean_square_error(), because I realized that the mean-square
error is the only kind of error I'll ever be computing. I also
renamed "net" to "func", in SplitData, reflecting the more general
nature of the Cascade class I'm developing. So I mirror that name
change here. Other than that, I trust you can see that error() and
mean_square_error() are identical.

I can call mean_square_error directly with a SplitData tuple and it
works. I can call error directly with a SplitData tuple in the broken
program, and it ALSO works. I'm only having problems when I try to
submit the job through Pool. I tried putting a print trap in
error(). When I use Pool then error() never gets called.
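
If error() is never called, the failure may be happening in the parent process while the task is being sent to a worker, and sending involves pickling. One way to test that idea without touching pool.py is to pickle the function and each argument by hand. This is a sketch under that assumption. The helper names check_picklable and check_call are hypothetical, and using the highest pickle protocol is a guess about what the pool does.

```python
import pickle

def check_picklable(label, obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Try to pickle obj. Return None on success, or the exception on failure."""
    try:
        pickle.dumps(obj, protocol)
    except Exception as exc:
        print("%s: pickling failed: %r" % (label, exc))
        return exc
    print("%s: pickles cleanly" % label)
    return None

def check_call(func, args):
    """Check the function and each argument separately to narrow down a failure."""
    problems = {}
    exc = check_picklable("func", func)
    if exc is not None:
        problems["func"] = exc
    for i, arg in enumerate(args):
        label = "args[%d]" % i
        exc = check_picklable(label, arg)
        if exc is not None:
            problems[label] = exc
    return problems

# Illustration only: len pickles by reference, a lambda cannot be pickled.
problems = check_call(len, ([1, 2, 3], lambda x: x))
```

Since SplitData is a namedtuple, the same check could be applied to error and to each field of one SplitData instance in turn, to see whether any single piece raises the same TypeError outside of Pool.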

I suppose that the logical next step is to compare the two Pool
instances... onward... :^P
 
