How to guard against bugs like this one?

kj

I just spent about 1-1/2 hours tracking down a bug.

An innocuous little script, let's call it buggy.py, only 10 lines
long, and whose output should have been, at most two lines, was
quickly dumping tens of megabytes of non-printable characters to
my screen (aka gobbledygook), and in the process was messing up my
terminal *royally*. Here's buggy.py:



import sys
import psycopg2
connection_params = "dbname='%s' user='%s' password='%s'" % tuple(sys.argv[1:])
conn = psycopg2.connect(connection_params)
cur = conn.cursor()
cur.execute('SELECT * FROM version;')
print '\n'.join(x[-1] for x in cur.fetchall())


(Of course, buggy.py is pretty useless; I reduced the original,
more useful, script to this to help me debug it.)

Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py. (This second script
is one I wrote as part of a little Python tutorial I put together
months ago, and is not much more of a script than hello_world.py;
it's baby-steps for the absolute beginner. But apparently, it has
a killer name! I had completely forgotten about it.)

Both scripts live in a directory filled with *hundreds* of little
one-off scripts like the two of them. I'll call this directory
myscripts in what follows.

It turns out that buggy.py imports psycopg2, as you can see, and
apparently psycopg2 (or something imported by psycopg2) tries to
import some standard Python module called numbers; instead it ends
up importing the innocent myscripts/numbers.py, resulting in *absolute
mayhem*.

(This is no mere Python "wart"; this is a suppurating chancre, and
the fact that it remains unfixed is a neverending source of puzzlement
for me.)

How can the average Python programmer guard against this sort of
time-devouring bug in the future (while remaining a Python programmer)?
The only solution I can think of is to avoid like the plague the
basenames of all the 200 or so /usr/lib/pythonX.XX/xyz.py{,c} files,
and *pray* that whatever name one chooses for one's script does
not suddenly pop up in the appropriate /usr/lib/pythonX.XX directory
of a future release.

What else can one do? Let's see, one should put every script in its
own directory, thereby containing the damage.

Anything else?

Any suggestion would be appreciated.

TIA!

~k
 
Chris Rebert

kj said:
I just spent about 1-1/2 hours tracking down a bug.
[snip]
How can the average Python programmer guard against this sort of
time-devouring bug in the future (while remaining a Python programmer)?
[snip]
Anything else?

Any suggestion would be appreciated.

I think absolute imports avoid this problem:

from __future__ import absolute_import

For details, see PEP 328:
http://www.python.org/dev/peps/pep-0328/
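
For instance, something along these lines (an untested sketch; the
package and module names are made up):

    # hypothetical layout:
    #   mypkg/__init__.py
    #   mypkg/numbers.py     <- a local module that shadows the stdlib name
    #   mypkg/client.py      <- the module shown below

    # mypkg/client.py
    from __future__ import absolute_import

    import numbers               # with the future import this is always the
                                 # top-level (stdlib) numbers, never
                                 # mypkg/numbers.py
    from . import numbers as local_numbers   # the sibling, asked for explicitly

    print numbers.__file__       # somewhere under /usr/lib/pythonX.X/
    print local_numbers.__file__ # .../mypkg/numbers.py

(This assumes client.py gets imported as part of the package, e.g. run as
python -m mypkg.client, rather than executed directly as a script.)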

Cheers,
Chris
 
Roy Smith

kj <[email protected]> said:
Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py.
[...]
It turns out that buggy.py imports psycopg2, as you can see, and
apparently psycopg2 (or something imported by psycopg2) tries to
import some standard Python module called numbers; instead it ends
up importing the innocent myscripts/numbers.py, resulting in *absolute
mayhem*.

I feel your pain, but this is not a Python problem, per se. The general
pattern is:

1) You have something which refers to a resource by name.

2) There is a sequence of places which are searched for this name.

3) The search finds the wrong one because another resource by the same name
appears earlier in the search path.

I've gotten bitten like this by shells finding the wrong executable (in
$PATH), by dynamic loaders finding the wrong library (in
$LD_LIBRARY_PATH), by C compilers finding the wrong #include file, and so
on. This is just Python's import machinery finding the wrong module in
your $PYTHONPATH (more precisely, in sys.path).

The solution is the same in all cases. You either have to refer to
resources by some absolute name, or you need to make sure you set up your
search paths correctly and know what's in them. In your case, one possible
solution would be to make sure "." (or "") isn't in sys.path (although that might
cause other issues).
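
Something along these lines (a rough sketch, not a drop-in fix) at least
shows you what's being searched and lets you prune the current-directory
entries:

    import sys

    # See exactly where imports will be looked up, in search order:
    for p in sys.path:
        print repr(p)    # '' and '.' both mean "current directory"

    # Prune the current-directory entries before importing anything sensitive.
    # (Note: when running a script, sys.path[0] refers to the script's own
    # directory rather than '', so this alone won't protect you from a
    # numbers.py sitting next to the script.)
    sys.path = [p for p in sys.path if p not in ('', '.')]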
 
Steven D'Aprano

I just spent about 1-1/2 hours tracking down a bug.

An innocuous little script, let's call it buggy.py, only 10 lines long
and whose output should have been at most two lines, was quickly
dumping tens of megabytes of non-printable characters (a.k.a.
gobbledygook) to my screen, and in the process was messing up my
terminal *royally*.  Here's buggy.py: [...]
It turns out that buggy.py imports psycopg2, as you can see, and
apparently psycopg2 (or something imported by psycopg2) tries to import
some standard Python module called numbers; instead it ends up importing
the innocent myscripts/numbers.py, resulting in *absolute mayhem*.


There is no module numbers in the standard library, at least not in 2.5:

   >>> import numbers
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ImportError: No module named numbers

It must be specific to psycopg2.

I would think this is a problem with psycopg2 -- it sounds like it should
be written as a package, but instead is written as a bunch of loose
modules. I could be wrong of course, but if it is just a collection of
modules, I'd definitely call that a poor design decision, if not a bug.

(This is no mere Python "wart"; this is a suppurating chancre, and the
fact that it remains unfixed is a neverending source of puzzlement for
me.)

No, it's a wart. There's no doubt it bites people occasionally, but I've
been programming in Python for about ten years and I've never been bitten
by this yet. I'm sure it will happen some day, but not yet.

In this case, the severity of the bug (megabytes of binary crud to the
screen) is not related to the cause of the bug (shadowing a module).

As for fixing it, unfortunately it's not quite so simple to fix without
breaking backwards-compatibility. The opportunity to do so for Python 3.0
was missed. Oh well, life goes on.

How can the average Python programmer guard against this sort of
time-devouring bug in the future (while remaining a Python programmer)?
The only solution I can think of is to avoid like the plague the
basenames of all the 200 or so /usr/lib/pythonX.XX/xyz.py{,c} files, and
*pray* that whatever name one chooses for one's script does not suddenly
pop up in the appropriate /usr/lib/pythonX.XX directory of a future
release.

Unfortunately, Python makes no guarantee that there won't be some clash
between modules. You can minimize the risks by using packages, e.g. given
a package spam containing modules a, b, c, and d, if you refer to spam.a
etc. then you can't clash with modules a, b, c, d, but only spam. So
you've cut your risk profile from five potential clashes to only one.
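
For example, a layout along these lines (the names are just placeholders,
and main() is a stand-in for whatever entry point you like):

    # myscripts/
    #     spam/
    #         __init__.py   <- makes "spam" a package
    #         a.py
    #         b.py
    #         c.py
    #         d.py

    import spam.a        # only the top-level name "spam" can clash now
    spam.a.main()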

Also, generally most module clashes are far more obvious. If you do this:

import module
x = module.y

and module is shadowed by something else, you're *much* more likely to
get an AttributeError than megabytes of crud to the screen.

I'm sorry that you got bitten so hard by this, but in practice it's
uncommon, and relatively mild when it happens.

What else can one do? Let's see, one should put every script in its own
directory, thereby containing the damage.

That's probably a bit extreme, but your situation:

"Both scripts live in a directory filled with *hundreds* little
one-off scripts like the two of them."

is far too chaotic for my liking. You don't need to go to the extreme of
a separate directory for each file, but you can certainly tidy things up
a bit. For example, anything that's obsolete should be moved out of the
way where it can't be accidentally executed or imported.
 
Tim Chase

Stephen said:
First, I don't shadow built-in modules. It's really not very hard to avoid.

Given the comprehensive nature of the batteries included in
Python, it's not as hard as you might think to accidentally shadow a
built-in, unknown to you, that is imported by a module you are
using. The classic that's stung me enough times (and many others
on c.l.p and other forums, as a quick google evidences) that
I *finally* remember it:

   bash$ touch email.py
   bash$ python
   ...
   >>> import smtplib
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib/python2.5/smtplib.py", line 46, in <module>
       import email.Utils
   ImportError: No module named Utils

Using "email.py" is an innocuous name for a script/module you
might want to do emailish things, and it's likely you'll use
smtplib in the same code...and kablooie, things blow up even if
your code doesn't reference or directly use the built-in email.py.

Yes, as Chris mentions, PEP 328's absolute vs. relative imports
should help ameliorate the problem, but they're not yet commonly
used (unless you're using Py3, they're only enabled at the request of a
__future__ import in 2.5+).

-tkc
 
Carl Banks

Both scripts live in a directory filled with *hundreds* of little
one-off scripts like the two of them.  I'll call this directory
myscripts in what follows.
[snip]

How can the average Python programmer guard against this sort of
time-devouring bug in the future (while remaining a Python programmer)?


Don't put hundreds of little one-off scripts in a single directory.
Python can't save you from polluting your own namespace.

Don't choose such generic names for modules. Keep in mind module
names are potentially globally visible and any sane advice you ever
heard about globals is to use descriptive names. I instinctively use
adjectives, compound words, and abstract nouns for the names of all my
modules so as to be more descriptive, and to avoid name conflicts with
classes and variables.

Also learn to debug better.
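
(For what it's worth, a check as small as this would have pointed straight
at the culprit; purely illustrative:)

    import numbers
    print numbers.__file__   # a path under myscripts/ instead of
                             # /usr/lib/pythonX.X/ is the smoking gun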


Carl Banks
 
Carl Banks

Given the comprehensive nature of the batteries included in
Python, it's not as hard as you might think to accidentally shadow a
built-in, unknown to you, that is imported by a module you are
using.  The classic that's stung me enough times (and many others
on c.l.p and other forums, as a quick google evidences) that
I *finally* remember it:

   bash$ touch email.py
   bash$ python
   ...
   >>> import smtplib
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib/python2.5/smtplib.py", line 46, in <module>
       import email.Utils
   ImportError: No module named Utils

Using "email.py" is an innocuous name for a script/module you
might want to do emailish things, and it's likely you'll use
smtplib in the same code...and kablooie, things blow up even if
your code doesn't reference or directly use the built-in email.py.


email.py is not an innocuous name, it's a generic name in a global
namespace, which is a Bad Thing. Plus what does a script or module
called "email.py" actually do? Send email? Parse email? "email" is
terrible name for a module and you deserve what you got for using it.

Name your modules "send_email.py" or "sort_email.py" or if it's a
library module of related functions, "email_handling.py". Modules and
scripts do things (usually), they should be given action words as
names.


(**) Questionable though it be, if the Standard Library wants to use
an "innocuous" name, It can.


Carl Banks
 
Jean-Michel Pichavant

Carl said:
email.py is not an innocuous name, it's a generic name in a global
namespace, which is a Bad Thing.
[snip]
Name your modules "send_email.py" or "sort_email.py" or if it's a
library module of related functions, "email_handling.py".  Modules and
scripts do things (usually), they should be given action words as
names.
That does not solve anything: if smtplib follows your advice, then
you'll be shadowing its send_email module.
The only way to avoid a collision would be to name your module
__PDSFLSDF_send_email__13221sdfsdf__.py

That way, the probability that you'd shadow some package's hidden module
is below the probability that Ms. Hilton ever says something relevant.
However, nobody wants to use such names.

Stephen gave good advice in this thread that helps avoid this issue.

JM
 
kj

Let me preface everything by thanking you and all those who
replied for their comments.

I have only one follow-up question (or rather, set of related
questions) that I'm very keen about, plus a bit of a vent at the
end.

Steven D'Aprano said:
As for fixing it, unfortunately it's not quite so simple to fix without
breaking backwards-compatibility. The opportunity to do so for Python 3.0
was missed.

This last point is to me the most befuddling of all. Does anyone
know why this opportunity was missed for 3.0? Anyone out there
with the inside scoop on this? Was the fixing of this problem
discussed in some PEP or some mailing list thread? (I've tried
Googling this but did not hit on the right keywords to bring up
the deliberations I'm looking for.)

~k

[NB: as I said before, what follows begins to slide into a vent,
and is quite unimportant; I've left it, for whatever grain of truth
it may contain, as a grossly overgrown PS; feel free to ignore
it, I'm *by far* most interested in the question stated in the
paragraph right above, because it will give me, I hope, a better
sense of where the biggest obstacles to fixing this problem lie.]

P.S. Yes, I see the backwards-compatibility problem, but that's
what rolling out a whole new version is good for; it's a bit of
a fresh start. I remember hearing GvR's Google Talk on the coming
Python 3, which was still in the works then, and being struck by
the sheer *modesty* of the proposed changes (while the developers
of the mythical Perl6 seemed to be on a quest for transcendence to
a Higher Plane of Programming, as they still are). In particular
the business with print -> print() seemed truly bizarre to me: this
is a change that will break a *huge* volume of code, and yet,
judging by the rationale given for it, the change solves what are,
IMHO, relatively minor annoyances. Python's old print statement
is, I think, at most a tiny little zit invisible to all but those
obsessed with absolute perfection. And I can't imagine that whatever
would be required to fix Python's import system could break more
code than redefining the rules for a workhorse like print.

In contrast, the Python import problem is a ticking bomb potentially
affecting all code that imports other modules. All that needs to
happen is that, in a future release of Python, some new standard
module emerges (like numbers.py emerged in 2.6), and this module
is imported by some module your code imports. Boom! Note that it
was only coincidental that the bug I reported in this thread occurred
in a script I wrote recently. I could have written both scripts
before 2.6 was released, and the new numbers.py along with it;
barring the uncanny clairvoyance of some responders, there would
have been, at the time, absolutely no plausible reason for not
naming one of the two scripts numbers.py.

To the argument that the import system can't be easily fixed because
it breaks existing code, one can reply that the *current* import
system already breaks existing code, as illustrated by the example
I've given in this thread: this could have easily been old pre-2.6
code that got broken just because Python decided to add numbers.py
to the distribution. (Yes, Python can't guarantee that the names
of new standard modules won't clash with the names of existing
local modules, but this is true for Perl as well, and due to Perl's
module import scheme (and naming conventions), a scenario like the
one I presented in this thread would have been astronomically
improbable. The Perl example shows that the design of the module
import scheme and naming conventions for standard modules can go
a long way to minimize the consequences of this unavoidable potential
for future name clashes.)
 
Grant Edwards

kj <[email protected]> said:
Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py.
[...]
It turns out that buggy.py imports psycopg2, as you can see, and
apparently psycopg2 (or something imported by psycopg2) tries to
import some standard Python module called numbers; instead it ends
up importing the innocent myscripts/numbers.py, resulting in *absolute
mayhem*.

I feel your pain, but this is not a Python problem, per se.

I think it is. There should be different syntax to import from
"standard" places and from "current directory". Similar to the
difference between "foo.h" and <foo.h> in cpp.

The general pattern is:

1) You have something which refers to a resource by name.

2) There is a sequence of places which are searched for this
name.

Searching the current directory by default is the problem.
Nobody in their right mind has "." in the shell PATH and IMO it
shouldn't be in Python's import path either. Even those
reckless souls who do put "." in their path put it at the end
so they don't accidentally override system commands.
 
Nobody

I think it is.

I agree.
There should be different syntax to import from
"standard" places and from "current directory". Similar to the
difference between "foo.h" and <foo.h> in cpp.

I don't know if that's necessary. Only supporting the "foo.h" case would
work fine if Python behaved like gcc, i.e. if the "current directory"
referred to the directory containing the file performing the import rather
than the process' CWD.

As it stands, imports are dynamically scoped, when they should be
lexically scoped.
Searching the current directory by default is the problem.
Nobody in their right mind has "." in the shell PATH and IMO it
shouldn't be in Python's import path either. Even those
reckless souls who do put "." in their path put it at the end
so they don't accidentally override system commands.

Except, what should be happening here is that it should be searching the
directory containing the file performing the import *first*. If foo.py
contains "import bar", and there's a bar.py in the same directory as
foo.py, that's the one it should be using.

The existing behaviour is simply wrong, and there's no excuse for it
("but it's easier to implement" isn't a legitimate argument).

The only situation where the process' CWD should be used is for an import
statement in a non-file source (i.e. stdin or the argument to the -c
switch).
 
Alf P. Steinbach

* Nobody:
[snip]

Except, what should be happening here is that it should be searching the
directory containing the file performing the import *first*. If foo.py
contains "import bar", and there's a bar.py in the same directory as
foo.py, that's the one it should be using.

The existing behaviour is simply wrong, and there's no excuse for it
("but it's easier to implement" isn't a legitimate argument).
+1


The only situation where the process' CWD should be used is for an import
statement in a non-file source (i.e. stdin or the argument to the -c
switch).

Hm, not sure about that last.


Cheers,

- Alf
 
Terry Reedy

This last point is to me the most befuddling of all. Does anyone
know why this opportunity was missed for 3.0? Anyone out there
with the inside scoop on this? Was the fixing of this problem
discussed in some PEP or some mailing list thread? (I've tried
Googling this but did not hit on the right keywords to bring up
the deliberations I'm looking for.)

There was a proposal to put the whole stdlib into a gigantic package, so
that

import itertools

would become, for instance

import std.itertools.

Guido rejected that. I believe he both did not like it and was concerned
about making the upgrade to 3.x even harder. The discussion was probably on
the now closed py3k list.

Terry Jan Reedy
 
Carl Banks

I don't know if that's necessary. Only supporting the "foo.h" case would
work fine if Python behaved like gcc, i.e. if the "current directory"
referred to the directory containing the file performing the import rather
than the process' CWD.

As it stands, imports are dynamically scoped, when they should be
lexically scoped.

Mostly incorrect. The CWD is in sys.path only for interactive
sessions, and when Python is started with the -c switch. When running
scripts, the directory where the script is located is used instead, not
the process's working directory.

So, no, it isn't anything like dynamic scoping.
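
Easy enough to check (a quick sketch; the exact value depends on how the
interpreter was started):

    import sys
    print repr(sys.path[0])  # '' for an interactive session or the -c
                             # switch; when running a script it refers to
                             # the script's directory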

The only situation where the process' CWD should be used is for an import
statement in a non-file source (i.e. stdin or the argument to the -c
switch).

It already is that way, chief.

I think you're misunderstanding what's wrong here; the CWD doesn't
have anything to do with it. Even if CWD isn't in the path you still
get the bad behavior kj noted. So now what?

Python's importing can be improved but there's no foolproof way to get
rid of the fundamental problem of name clashes.


Carl Banks
 
Carl Banks

That does not solve anything,

Of course it does, it solves the problem of having poorly-named
modules. It also helps reduce the possibility of name clashes.
if smtplib follows your advice, then
you'll be shadowing its send_email module.
The only way to avoid a collision would be to name your module
__PDSFLSDF_send_email__13221sdfsdf__.py

I know, and as we all know accidental name clashes are the end of the
world and Mother Python should protect us feeble victims from any
remote possibility of ever having a name clash.


Carl Banks
 
Jean-Michel Pichavant

Carl said:
Of course it does, it solves the problem of having poorly-named
modules. It also helps reduce the possibility of name clashes.

Actually, don't you think it will increase the possibility? There are
far fewer ways of properly naming an object than of badly naming it.
So if everybody tends to properly name their objects with the obvious
version, like you proposed, the set of possible names will shrink,
increasing the clash ratio.

I'm just nitpicking by the way, but it may be better to ask for better
namespacing instead of better naming (which is a good thing but unrelated
to the OP's issue).

JM
 
kj

Terry Reedy said:
There was a proposal to put the whole stdlib into a gigantic package, so
that
import itertools
would become, for instance
import std.itertools.
Guido rejected that. I believe he both did not like it and was concerned
about making the upgrade to 3.x even harder. The discussion was probably on
the now closed py3k list.


Thanks. I'll look for this thread.

~K
 
Carl Banks

Actually, don't you think it will increase the possibility? There are
far fewer ways of properly naming an object than of badly naming it.

You've got to be kidding me, you're saying that a bad name like
email.py is less likely to clash than a more descriptive name like
send_email.py?
So if everybody tends to properly name their objects with the obvious
version, like you proposed, the set of possible names will shrink,
increasing the clash ratio.

I did not propose obvious module names. I said obvious names like
email.py are bad; more descriptive names like send_email.py are
better.


Carl Banks
 
Roel Schroeven

On 2010-02-02 18:02, Nobody wrote:
I agree.


I don't know if that's necessary. Only supporting the "foo.h" case would
work fine if Python behaved like gcc, i.e. if the "current directory"
referred to the directory containing the file performing the import rather
than the process' CWD.

That is what I would have expected, it is the way I would have
implemented it, and I don't understand why anyone would think
differently. Yet not everyone seems to agree.

Apparently, contrary to my expectations, Python looks in the directory
containing the currently running script instead. That means that the
behavior of "import foo" depends very much on circumstances not under
the control of the module in which that statement appears. Very fragile.

Suggestions to use better names are just poor workarounds, IMO. Of the
same nature are suggestions to limit the number of scripts/modules in a
directory... my /usr/bin contains no fewer than 2685 binaries, with zero
name-clash problems; there is IMO no reason why Python should have to
restrict itself to any fewer.

Generally I like the design decisions used in Python, or at least I
understand the reasons; in this case though, I don't see the advantages
of the current approach.

--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
-- Isaac Asimov

Roel Schroeven
 
