How to guard against bugs like this one?

Jonathan Gardner

An innocuous little script, let's call it buggy.py, only 10 lines
long, and whose output should have been, at most two lines, was
quickly dumping tens of megabytes of non-printable characters to
my screen (aka gobbledygook), and in the process was messing up my
terminal *royally*.

In Linux terminals, try running the command "reset" to clear up any
gobbledygook. It also works if you happen to hit CTRL-C while
entering a password, in the rare case that it fails to set the text
back to visible mode.
 
Terry Reedy

Thanks. I'll look for this thread.

Stephen Hansen's post explains a bit more than I did. To supplement his
explanation: since print *was* a keyword, every use of 'print' in 2.x
denotes a print statement with standard semantics. Therefore 2to3
*knows* what the statement means and can translate it. On the other
hand, 'import string' usually means 'import the string module of the
stdlib', but it could mean 'import my string module'. This depends on
the execution environment. Moreover, I believe people have intentionally
shadowed stdlib modules. So, like it or not, 2to3 cannot know what
'import string' means.

Terry Jan Reedy
 
Steven D'Aprano

I did not propose obvious module names. I said obvious names like
email.py are bad; more descriptive names like send_email.py are better.

But surely send_email.py doesn't just send email, it parses email and
receives email as well?
 
kj

(For reasons I don't understand Stephen Hansen's posts don't show
in my news server. I became aware of his reply from a passing
reference in one of Terry Reedy's posts. Then I found Hansen's post
online, and then an earlier one, and pasted the relevant portion
below.)


First, I don't shadow built-in modules. It's really not very hard to avoid.

...*if* you happen to be clairvoyant. I still don't see how the rest of us
could have followed this fine principle in the case of numbers.py
prior to Python 2.6.

Secondly, I use packages to structure my libraries, and avoid junk
directories of a hundred some odd 'scripts'.

Third, I don't execute scripts in that directory structure directly, but
instead do python -c 'from package.blah import main; main.main()' or some
such. Usually via some short-cut, or a runner batch file.

Breathtaking... I wonder why the Python documentation, in particular
the official Python tutorial, is not more forthcoming with these
rules.

~K
 
kj

Stephen Hansen's post explains a bit more than I did. To supplement his
explanation: since print *was* a keyword, every use of 'print' in 2.x
denotes a print statement with standard semantics. Therefore 2to3
*knows* what the statement means and can translate it. On the other
hand, 'import string' usually means 'import the string module of the
stdlib', but it could mean 'import my string module'. This depends on
the execution environment. Moreover, I believe people have intentionally
shadowed stdlib modules. So, like it or not, 2to3 cannot know what
'import string' means.

Thanks, this dispels some of the mystery.

~K
 
Steve Holden

kj said:
(For reasons I don't understand Stephen Hansen's posts don't show
in my news server. I became aware of his reply from a passing
reference in one of Terry Reedy's posts. Then I found Hansen's post
online, and then an earlier one, and pasted the relevant portion
below.)




...*if* you happen to be clairvoyant. I still don't see how the rest of us
could have followed this fine principle in the case of numbers.py
prior to Python 2.6.

Clearly the more you know about the standard library the less likely
this is to be a problem. Had you been migrating from an earlier version
the breakage would have alerted you to look for some version-dependent
difference.

(I feel so icky now...)

Be as flippant as you like, but that is good advice.

Breathtaking... I wonder why the Python documentation, in particular
the official Python tutorial, is not more forthcoming with these
rules.

Because despite the fact that this issue has clearly bitten you badly
enough to sour you against the language, such issues are remarkably rare
in practice and normally rather easier to debug.

regards
Steve
 
Steven D'Aprano

No, it doesn't.

Nevertheless, as a general principle, modules will tend to be multi-
purpose and/or generic. How would you rename the math or random modules
to be less "obvious" and more "descriptive"?

And of course, the less obvious the name, the harder it becomes for
people to find and use it. Which extreme would you rather?

import zip
import compress_and_decompress_files_to_zip_archives


I'm sympathetic to the position you're taking. It's not bad advice at
all, but I think you're over-selling it as a complete solution to the
problem of name clashes. I think it can only slightly alleviate the
problem of name clashes, not eliminate it.
 
kj

Steve, I apologize for the snarkiness of my previous reply to you.
After all, I started the thread by asking the forum for advice on
how to avoid a certain kind of bug, and you were among those who gave
me advice. So nothing other than thanking you for it was in order.
I just let myself get carried away by my annoyance with the Python
import scheme. I'm sorry about it. Even though I don't think I
can put into practice all of your advice, I can still learn a good
deal from it.

Cheers,

~kj


Steve Holden said:
Clearly the more you know about the standard library the less likely
this is to be a problem. Had you been migrating from an earlier version
the breakage would have alerted you to look for some version-dependent
difference.

<snip>
 
kj

Steve, I apologize for the snarkiness of my previous reply to you.
After all, I started the thread by asking the forum for advice on
how to avoid a certain kind of bug, and you were among those who gave
me advice. So nothing other than thanking you for it was in order.
I just let myself get carried away by my annoyance with the Python
import scheme. I'm sorry about it. Even though I don't think I
can put into practice all of your advice, I can still learn a good
deal from it.


Boy, that was dumb of me. The above apology was meant for Stephen
Hansen, not Steve Holden. I guess this is now a meta-apology...
(Sheesh.)

~kj
 
Tim Golden

Boy, that was dumb of me. The above apology was meant for Stephen
Hansen, not Steve Holden. I guess this is now a meta-apology...
(Sheesh.)

You see? That's what I like about the Python community:
people even apologise for apologising :)

TJG
 
Nobody

Mostly incorrect. The CWD is in sys.path only for interactive
sessions, and when started with the -c switch. When running scripts, the
directory where the script is located is used instead, not the
process's working directory.

Okay, so s/CWD/directory containing __main__ script/, but the general
argument still holds.

So, no, it isn't anything like dynamic scoping.

That's what it looks like to me. The way that an import name is resolved
depends upon the run-time context in which the import occurs.

It already is that way, chief.

I think you're misunderstanding what's wrong here; the CWD doesn't
have anything to do with it. Even if CWD isn't in the path you still
get the bad behavior kj noted. So now what?

Search for imports first in the directory containing the file performing
the import.

This is essentially the situation with gcc; the directory containing the
current file takes precedence over directories specified by -I switches.
If you want to override this, you have to use the -I- switch, which makes
it very unlikely to happen by accident.
 
Steve Holden

Don't give it another thought. I'd much rather you cared than you didn't ...

regards
Steve
 
Steve Holden

kj said:
Boy, that was dumb of me. The above apology was meant for Stephen
Hansen, not Steve Holden. I guess this is now a meta-apology...
(Sheesh.)

Oh, so you don't like *my* advice? ;-)

regards
Steve
 
Dan Stromberg

kj said:
I just spent about 1-1/2 hours tracking down a bug.

An innocuous little script, let's call it buggy.py, only 10 lines
long, and whose output should have been, at most two lines, was
quickly dumping tens of megabytes of non-printable characters to
my screen (aka gobbledygook), and in the process was messing up my
terminal *royally*. Here's buggy.py:



import sys
import psycopg2
connection_params = "dbname='%s' user='%s' password='%s'" % tuple(sys.argv[1:])
conn = psycopg2.connect(connection_params)
cur = conn.cursor()
cur.execute('SELECT * FROM version;')
print '\n'.join(x[-1] for x in cur.fetchall())


(Of course, buggy.py is pretty useless; I reduced the original,
more useful, script to this to help me debug it.)

Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py. (This second script
is one I wrote as part of a little Python tutorial I put together
months ago, and is not much more of a script than hello_world.py;
it's baby-steps for the absolute beginner. But apparently, it has
a killer name! I had completely forgotten about it.)

Both scripts live in a directory filled with *hundreds* of little
one-off scripts like the two of them. I'll call this directory
myscripts in what follows.

It turns out that buggy.py imports psycopg2, as you can see, and
apparently psycopg2 (or something imported by psycopg2) tries to
import some standard Python module called numbers; instead it ends
up importing the innocent myscripts/numbers.py, resulting in *absolute
mayhem*.

(This is no mere Python "wart"; this is a suppurating chancre, and
the fact that it remains unfixed is a neverending source of puzzlement
for me.)

How can the average Python programmer guard against this sort of
time-devouring bug in the future (while remaining a Python programmer)?
The only solution I can think of is to avoid like the plague the
basenames of all the 200 or so /usr/lib/pythonX.XX/xyz.py{,c} files,
and *pray* that whatever name one chooses for one's script does
not suddenly pop up in the appropriate /usr/lib/pythonX.XX directory
of a future release.

What else can one do? Let's see, one should put every script in its
own directory, thereby containing the damage.

Anything else?

Any suggestion would be appreciated.

TIA!

~k

Here's a pretty simple fix that should work in about any version of
python available:

Put modules in ~/lib. Put scripts in ~/bin. Your modules end with
.py. Your scripts don't. Your scripts add ~/lib to sys.path as
needed. Things that go in ~/lib are named carefully. Things in ~/bin
also need to be named carefully, but for an entirely different reason -
if you name something "ls", you may get into trouble.

Then things in ~/lib plainly could cause issues. Things in ~/bin don't.

Ending everything with .py seems to come from the Perl tradition of
ending everything with .pl. This Perl tradition appears to have come
from Perl advocates wanting everyone to know (by looking at a URL) that
they are using a Perl CGI. IMO, it's language vanity, and best
dispensed with - aside from this issue, it also keeps you from rewriting
your program in another language with an identical interface.

This does, however, appear to be a scary issue from a security
standpoint. I certainly hope that scripts running as root don't search
"." for modules.
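
A minimal sketch of the "scripts add ~/lib to sys.path" step described above. The helper name add_private_lib is made up for illustration, and appending rather than prepending is an assumption on my part: it means a standard-library module wins if a name in ~/lib ever clashes with one, which seems to be the safer default for the bug discussed in this thread.

```python
import os
import sys


def add_private_lib(lib_dir=None):
    """Append lib_dir (default: ~/lib) to sys.path once and return its absolute path.

    add_private_lib is a hypothetical helper, not part of any library.
    """
    if lib_dir is None:
        lib_dir = os.path.join(os.path.expanduser("~"), "lib")
    lib_dir = os.path.abspath(lib_dir)
    # Appending keeps the standard library ahead of private modules,
    # so a private numbers.py cannot hide the stdlib numbers module.
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
    return lib_dir
```

A script in ~/bin would call add_private_lib() before importing anything that lives in ~/lib.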
 
Carl Banks

Okay, so s/CWD/directory containing __main__ script/, but the general
argument still holds.


That's what it looks like to me. The way that an import name is resolved
depends upon the run-time context in which the import occurs.

Well it has one superficial similarity to dynamic binding, but that's
pretty much it.

When the directory containing __main__ script is in the path, it
doesn't matter if you invoke it from a different directory or
os.chdir() during the program, the same modules get imported. I.e., there's
nothing dynamic at all about which modules are used. (Unless you
fiddle with the path but that's another question.)
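
The point above can be checked with a small experiment. This is only a sketch: script_path0 and show_path0.py are hypothetical names, and it assumes a Python 3.7+ interpreter started without the -P option (the code clears PYTHONSAFEPATH itself, since that setting changes what goes into sys.path[0]).

```python
import os
import subprocess
import sys


def script_path0(script_dir, run_from):
    """Run a tiny script stored in script_dir with run_from as the working
    directory, and return the sys.path[0] that the script reports."""
    script = os.path.join(script_dir, "show_path0.py")
    with open(script, "w") as f:
        f.write("import os, sys\nprint(os.path.realpath(sys.path[0]))\n")
    # Drop PYTHONSAFEPATH so the child interpreter uses its default path setup.
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, script],
        cwd=run_from,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Whatever directory is passed as run_from, the reported value should be the (symlink-resolved) directory that holds the script, not the working directory.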

All I'm saying is the analogy is bad. A better analogy would be if
you have lexical binding, but you automatically look in various sister
scopes before your own scope.


Carl Banks
 
Carl Banks

Nevertheless, as a general principle, modules will tend to be multi-
purpose and/or generic.

Uh, no?

If your module is a library with a public API, then you might
defensibly have a "generic and/or multi-purpose module", but if that's
the case you should have already christened it something unique.

Otherwise modules should stick to a single purpose that can be
summarized in a short action word or phrase.


Carl Banks
 
John Nagle

kj wrote:
....
Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py.

The right answer to this is to make module search return an
error if two modules satisfy the search criteria. "First find"
isn't a good solution.

John Nagle
 
MRAB

Stephen said:
On Fri, Feb 5, 2010 at 12:16 PM, John Nagle wrote:

kj wrote:

Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py.


The right answer to this is to make module search return an
error if two modules satisfy the search criteria. "First find"
isn't a good solution.


And thereby slow down every single import and application startup time as
the common case of finding a module in one of the first couple entries
in sys.path now has to search for it in every single item on that path. It's
not uncommon to have a LOT of things on sys.path.

No thanks. "First Find" is good enough, especially with PEP 328 and
absolute_import being on in Python 3 (and presumably 2.7). It doesn't
really help older Python versions unfortunately, but changing how import
works wouldn't help them anyways. Yeah, there might be two paths on
sys.path which both have a 'numbers.py' at the top level and First Find
might return the wrong one, but... people making poor decisions on code
organization and not using packages isn't something the language really
needs to fix.

You might want to write a script that looks through the search paths for
duplicated names, especially ones which hide modules in the standard
library. Has anyone done this already?
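
A rough sketch of such a script, under stated assumptions: the names top_level_modules and find_duplicates are made up for illustration, and the scan only sees plain .py files and directories containing __init__.py. It will not notice extension modules, zipped packages, namespace packages, or modules compiled into the interpreter.

```python
import os
import sys
from collections import defaultdict


def top_level_modules(directory):
    """Yield names of .py modules and packages found directly in directory."""
    try:
        entries = sorted(os.listdir(directory))
    except OSError:
        # Missing directories and zip files on the path are skipped.
        return
    for entry in entries:
        full = os.path.join(directory, entry)
        name, ext = os.path.splitext(entry)
        if ext == ".py" and os.path.isfile(full):
            yield name
        elif os.path.isdir(full) and os.path.isfile(os.path.join(full, "__init__.py")):
            yield entry


def find_duplicates(search_path):
    """Map each module name found in more than one directory of search_path
    to the list of those directories, in search order."""
    locations = defaultdict(list)
    seen = set()
    for directory in search_path:
        # An empty entry in sys.path stands for the current directory.
        directory = os.path.abspath(directory or os.curdir)
        if directory in seen:
            continue
        seen.add(directory)
        for name in top_level_modules(directory):
            if directory not in locations[name]:
                locations[name].append(directory)
    return {name: dirs for name, dirs in locations.items() if len(dirs) > 1}


if __name__ == "__main__":
    for name, dirs in sorted(find_duplicates(sys.path).items()):
        print(name)
        for d in dirs:
            print("    " + d)
```

The first directory listed for each name is the one a plain import would pick up, so in a situation like kj's, "numbers" would show up with the scripts directory listed ahead of the standard library directory.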
 
