import bug

kj

I'm running into an ugly bug, which, IMHO, is really a bug in the
design of Python's module import scheme. Consider the following
directory structure:

ham
|-- __init__.py
|-- re.py
`-- spam.py

...with the following very simple files:

% head ham/*.py
==> ham/__init__.py <==

==> ham/re.py <==

==> ham/spam.py <==
import inspect

I.e. only ham/spam.py is not empty, and it contains the single line
"import inspect".

If I now run the innocent-looking ham/spam.py, I get the following
error:

% python26 ham/spam.py
Traceback (most recent call last):
  File "ham/spam.py", line 1, in <module>
    import inspect
  File "/usr/local/python-2.6.1/lib/python2.6/inspect.py", line 35, in <module>
    import string
  File "/usr/local/python-2.6.1/lib/python2.6/string.py", line 122, in <module>
    class Template:
  File "/usr/local/python-2.6.1/lib/python2.6/string.py", line 116, in __init__
    'delim' : _re.escape(cls.delimiter),
AttributeError: 'module' object has no attribute 'escape'

or, similarly,

% python3 ham/spam.py
Traceback (most recent call last):
  File "ham/spam.py", line 1, in <module>
    import inspect
  File "/usr/local/python-3.0/lib/python3.0/inspect.py", line 36, in <module>
    import string
  File "/usr/local/python-3.0/lib/python3.0/string.py", line 104, in <module>
    class Template(metaclass=_TemplateMetaclass):
  File "/usr/local/python-3.0/lib/python3.0/string.py", line 98, in __init__
    'delim' : _re.escape(cls.delimiter),
AttributeError: 'module' object has no attribute 'escape'

My sin appears to be having the (empty) file ham/re.py. So Python
is confusing it with the re module of the standard library, and
using it when the inspect module tries to import re.
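
The lookup order behind this diagnosis can be sketched as follows. This is only an illustration in modern Python 3, not the code path of the 2.6/3.0 interpreters above: the helper name first_match and the temporary directory standing in for ham/ are assumptions made for the example. It asks the path-based finder which file it would pick for "re" when the script's directory comes before the standard library, and again with the order reversed. Nothing is actually imported.

```python
import os
import sysconfig
import tempfile
from importlib.machinery import PathFinder


def first_match(name, search_path):
    """Return the file a path-based import of `name` would load, or None."""
    spec = PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


# Directory holding the pure-Python part of the standard library.
stdlib_dir = sysconfig.get_path("stdlib")

with tempfile.TemporaryDirectory() as script_dir:
    # An empty file that happens to share its name with a stdlib module.
    shadow = os.path.join(script_dir, "re.py")
    open(shadow, "w").close()

    # Script directory searched first, as when running "python ham/spam.py".
    script_dir_first = first_match("re", [script_dir, stdlib_dir])
    # Same two entries, but with the standard library searched first.
    stdlib_first = first_match("re", [stdlib_dir, script_dir])

    shadow_wins = os.path.samefile(script_dir_first, shadow)
```

With the script directory first, the empty file wins; with the order reversed, the standard library's re is found. The search simply stops at the first match.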

I've tried a lot of things to appease Python on this one, including
a liberal sprinkling of "from __future__ import absolute_import"
all over the place (except, of course, in inspect.py, which I don't
control), but to no avail.

I also pored over pp. 149-151 of Beazley's Python Essential Reference
(4th ed.) on anything that would shed light on this problem, and
again, nothing.

I give up: what's the trick? (Of course, renaming ham/re.py is
hardly "the trick." It's rather Procrustes' Bed.)

BTW, it is hard for me to imagine an argument that could convince
me that this is not a design bug, and a pretty ugly one at that.
But, as they say, "hope springs eternal": is there a PEP on the
subject? (I know that there's a PEP on absolute_import, but since
absolute_import appears to be absolutely ineffectual here, I figure
I must look elsewhere for enlightenment.)

TIA!

kynn
 
Jon Clements

You can shift the location of the current directory further down the
search path.
Assuming sys.path[0] is ''...

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path = sys.path[1:] + ['']
>>> import spam
>>> spam.__file__
'spam.pyc'

hth
Jon.
 
Stefan Behnel

kj, 31.10.2009 16:12:
My sin appears to be having the (empty) file ham/re.py. So Python
is confusing it with the re module of the standard library, and
using it when the inspect module tries to import re.

1) it's a bad idea to name your own modules after modules in the stdlib
2) this has been fixed in Py3

Stefan
 
kj

Stefan Behnel said:
1) it's a bad idea to name your own modules after modules in the stdlib

Obviously, since it leads to the headaches this thread illustrates.
But there is nothing intrinsically wrong with it. The fact that it
is problematic in Python is a design bug, plain and simple. There's
no rational basis for it, and it represents an unreasonable demand on
module writers, since contrary to the tight control on reserved
Python keywords, there does not seem to be a similar control on
the names of stdlib modules. What if, for example, in the future
it was decided that my_favorite_module name would become part of
the standard library? This alone would cause code to break.

2) this has been fixed in Py3

In my post I illustrated that the failure occurs both with Python
2.6 *and* Python 3.0. Did you have a particular version of Python
3 in mind?

kynn
 
Steven D'Aprano

Obviously, since it leads to the headaches this thread illustrates. But
there is nothing intrinsically wrong with it. The fact that it is
problematic in Python is a design bug, plain and simple. There's no
rational basis for it,

Incorrect. Simplicity of implementation and API is a virtue, in and of
itself. The existing module machinery is quite simple to understand, use
and maintain. Dealing with name clashes doesn't come for free. If you
think it does, I encourage you to write a patch implementing the
behaviour you would prefer.

In addition, there are use-cases where the current behaviour is the
correct behaviour. Here's one way to backport (say) functools to older
versions of Python (untested):


# === functools.py ===

import sys

if sys.version_info >= (2, 5):
    # Use the standard library version if it is available.
    old_path = sys.path[:]
    del sys.path[0]  # Delete the current directory.
    from functools import *
    sys.path[:] = old_path  # Restore the path.
else:
    # Backport code you want.
    pass



and represents an unreasonable demand on module
writers, since contrary to the tight control on reserved Python
keywords, there does not seem to be a similar control on the names of
stdlib modules. What if, for example, in the future it was decided that
my_favorite_module name would become part of the standard library? This
alone would cause code to break.

Not necessarily. Obviously your module my_favorite_module.py isn't
calling the standard library version, because it didn't exist when you
wrote it. Nor are any of your callers. Mere name clashes alone aren't
necessarily an issue. One problem comes about when some module you import
is modified to start using the standard library module, which conflicts
with yours. Example:

You have a collections module, which imports the standard library stat
module. The Python standard library can safely grow a collections module,
but what it can't do is grow a collections module *and* modify stat to
use that.

But in general, yes, you are correct -- there is a risk that future
modules added to the standard library can clash with existing third party
modules. This is one of the reasons why Python is conservative about
adding to the std lib.

In other words, yes, module naming conflicts are the Python version of DLL
Hell. Python doesn't distinguish between "my modules" and "standard
modules" and "third party modules" -- they're all just modules, there
aren't three different implementations for importing a module and you
don't have to learn three different commands to import them.

But there is a downside too: if you write "import os" Python has no
possible way of knowing whether you mean the standard os.py module or
your own os.py module.

Of course, Python does expose the import machinery to you. If avoiding
standard library names is too much a trial for you, or if you are
paranoid and want to future-proof your module against changes to the
standard library (a waste of time in my opinion), you can use Python's
import machinery to build your own system.
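
One possible shape for such a do-it-yourself import, as a rough sketch in modern Python 3 rather than anything proposed in this thread: the function name load_from_stdlib is hypothetical, and the example only covers pure-Python modules that live directly under the directory reported by sysconfig. It searches that single directory, so a same-named file next to the script is never considered. The result is a private copy that is not registered in sys.modules, and any imports performed by the loaded module itself still go through the normal search.

```python
import importlib.util
import sysconfig
from importlib.machinery import PathFinder


def load_from_stdlib(name):
    """Load top-level module `name` from the stdlib directory only.

    Hypothetical helper: it ignores sys.path entirely, so a local file
    with the same name cannot be picked up by mistake.
    """
    stdlib_dir = sysconfig.get_path("stdlib")
    spec = PathFinder.find_spec(name, [stdlib_dir])
    if spec is None:
        raise ImportError("no module named %r under %s" % (name, stdlib_dir))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the module's code in the new object
    return module


# colorsys is used here because it is a small module with no imports of its own.
colorsys = load_from_stdlib("colorsys")
```

Built-in modules such as sys, and extension modules stored outside that directory, would not be found by this sketch.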
 
Gabriel Genellina

I'm running into an ugly bug, which, IMHO, is really a bug in the
design of Python's module import scheme.

The basic problem is that the "import scheme" was not designed in advance.
It was a very simple thing at first. Then came packages. And then the
__import__ builtin. And later some import hooks. And later support for zip
files. And more import hooks and meta hooks. And namespace packages. And
relative imports, absolute imports, and mixed imports. And now it's a mess.

Consider the following
directory structure:
[containing a re.py file in the same directory as the main script]

If I now run the innocent-looking ham/spam.py, I get the following
error:

% python26 ham/spam.py
Traceback (most recent call last):
[...]
  File "/usr/local/python-2.6.1/lib/python2.6/string.py", line 116, in __init__
    'delim' : _re.escape(cls.delimiter),
AttributeError: 'module' object has no attribute 'escape'

My sin appears to be having the (empty) file ham/re.py. So Python
is confusing it with the re module of the standard library, and
using it when the inspect module tries to import re.

Exactly; that's the root of your problem, and has been a problem ever
since import existed.

In my post I illustrated that the failure occurs both with Python
2.6 *and* Python 3.0. Did you have a particular version of Python
3 in mind?

If the `re` module had been previously loaded (the true one, from the
standard library) then this bug is not apparent. This may happen if re is
imported from site.py, sitecustomize.py, any .pth file, the PYTHONSTARTUP
script, perhaps other sources...
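
A small sketch of why an earlier import hides the problem, written for modern Python 3 with a temporary directory standing in for the script's directory (both are assumptions of the example): once the genuine re is cached in sys.modules, a later "import re" is answered from that cache and sys.path is not searched at all, even with an empty re.py at the front of the path.

```python
import os
import sys
import tempfile

import re as real_re  # the genuine module is now cached in sys.modules

with tempfile.TemporaryDirectory() as script_dir:
    # Empty file with a clashing name, placed ahead of the stdlib on sys.path.
    open(os.path.join(script_dir, "re.py"), "w").close()
    sys.path.insert(0, script_dir)
    try:
        import re  # served from sys.modules; the empty file is never read
        same_object = re is real_re
    finally:
        sys.path.remove(script_dir)
```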

The same error happens if ham\spam.py contains the single line: import
smtpd, and instead of re.py there is an empty asyncore.py file; that fails
on 3.1 too.

On Sat, 31 Oct 2009 22:27:09 -0300, Steven D'Aprano wrote:
Incorrect. Simplicity of implementation and API is a virtue, in and of
itself. The existing module machinery is quite simple to understand, use
and maintain.

Uhm... module objects might be quite simple to understand, but module
handling is everything but simple! (simplicity of implem...? quite simple
to WHAT? ROTFLOL!!! :) )

Dealing with name clashes doesn't come for free. If you
think it does, I encourage you to write a patch implementing the
behaviour you would prefer.

I'd say it is really a bug, and has existed for a long time.
One way to avoid name clashes would be to put the entire standard library
under a package; a program that wants the standard re module would write
"import std.re" instead of "import re", or something similar.
Every time the std package is suggested, the main argument against it is
backwards compatibility.

In addition, there are use-cases where the current behaviour is the
correct behaviour. Here's one way to backport (say) functools to older
versions of Python (untested):

You still would be able to backport or patch modules, even if the standard
ones live in the "std" package.

I've tried a lot of things to appease Python on this one, including
a liberal sprinkling of "from __future__ import absolute_import"
all over the place (except, of course, in inspect.py, which I don't
control), but to no avail.

I think the only way is to make sure *your* modules always come *after*
the standard ones in sys.path; try using this code right at the top of
your main script:

import sys, os.path
if sys.argv[0]:
    script_path = os.path.dirname(os.path.abspath(sys.argv[0]))
else:
    script_path = ''
if script_path in sys.path:
    sys.path.remove(script_path)
sys.path.append(script_path)

(I'd want to put such code in sitecustomize.py, but sys.argv doesn't
exist yet at the time sitecustomize.py is executed)
 
Steven D'Aprano

Uhm... module objects might be quite simple to understand, but module
handling is everything but simple! (simplicity of implem...? quite
simple to WHAT? ROTFLOL!!! )

I stand corrected :)


Nevertheless, the API is simple: the first time you "import name", Python
searches a single namespace (the path) for a module called name. There
are other variants of import, but the basics remain:

search the path for the module called name, and do something with the
first one you find.

I'd say it is really a bug, and has existed for a long time.

Since import is advertised to return the first module with the given name
it finds, I don't see it as a bug even if it doesn't do what the
programmer intended it to do. If I do this:
[...]
...     print len(s)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in parrot
TypeError: 'int' object is not callable


it isn't a bug in Python that I have misunderstood scopes and
inadvertently shadowed a builtin. Shadowing a standard library module is
no different.
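
For readers who want to reproduce the analogy, here is a minimal sketch in Python 3 syntax. It is not a reconstruction of the session above, whose opening lines are missing; the body of parrot is an assumption chosen to trigger the same TypeError by shadowing the builtin len with a local name.

```python
def parrot(s):
    len = 5          # oops: this local name now hides the builtin len()
    return len(s)    # tries to call the integer 5, not the builtin


try:
    parrot("spam")
except TypeError as exc:
    message = str(exc)
```

The interpreter does exactly what the name lookup rules say: the nearest binding of the name wins, whether or not that was intended.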

One way to
avoid name clashes would be to put the entire standard library under a
package; a program that wants the standard re module would write "import
std.re" instead of "import re", or something similar. Every time the std
package is suggested, the main argument against it is backwards
compatibility.

You could do it in a backwards compatible way, by adding the std package
directory into the path.
 
Gabriel Genellina

On Sun, 01 Nov 2009 02:54:15 -0300, Steven D'Aprano wrote:
I stand corrected :)
Nevertheless, the API is simple: the first time you "import name", Python
searches a single namespace (the path) for a module called name. There
are other variants of import, but the basics remain:

search the path for the module called name, and do something with the
first one you find.

Sure, beautiful, a plain and simple search over a list of directories.
That's how it worked in Python 1.4, I think...
Now you have lots of "hooks" and even "meta-hooks": sys.meta_path,
sys.path_hooks, sys.path_importer_cache. And sys.path, of course, which
may contain other things apart from directory names (zip files, eggs, and
even instances of custom "loader" objects...). PEP 302 explains this but
I'm not sure the description is still current. PEP 369, if approved, would
add even more hooks.
Add packages to the picture, including relative imports and __path__[]
processing, and it becomes increasingly harder to explain.
Brett Cannon has rewritten the import system in pure Python (importlib) for
3.1; this should help to understand it, I hope.
The whole system works, yes, but looks to me more like a collection of
patches over patches than a coherent system. Perhaps this is due to the
way it evolved.

I'd say it is really a bug, and has existed for a long time.

Since import is advertised to return the first module with the given name
it finds, I don't see it as a bug even if it doesn't do what the
programmer intended it to do. [...] Shadowing a standard library module
is no different.

But that's what namespaces are for; if the standard library had its own
namespace, such collisions would not occur. I can think of C++, Java, C#,
all of them have some way of qualifying names. Python too - packages. But
nobody came up with a method to apply packages to the standard library in a
backwards compatible way. Perhaps those name collisions are not considered
serious. Perhaps every user module should live in packages and only the
standard library has the privilege of using the global module namespace.
Both C++ and XML got namespaces late in their life so in principle this
should be possible.

You could do it in a backwards compatible way, by adding the std package
directory into the path.

Unfortunately you can't, at least not without some special treatment of
the std package. One of the undocumented rules of the import system is
that you must not have more than one way to refer to the same module (in
this case, std.re and re). Suppose someone imports std.re; an entry in
sys.modules with that name is created. Later someone imports re; as there
is no entry in sys.modules with such name, the re module is imported
again, resulting in two module instances, darkness, weeping and the
gnashing of teeth :)
(I'm sure you know the problem: it's the same as when someone imports the
main script as a module, and gets a different module instance because the
"original" is called __main__ instead).
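
The "two module instances" effect can be sketched like this in modern Python 3. The file re_demo.py, the helper load_as, and the names "re_demo" and "std.re_demo" are all made up for the illustration; they stand in for re and the hypothetical std.re. The same source file is executed under two different module names, which yields two unrelated module objects, each with its own copy of the module-level state.

```python
import importlib.util
import os
import tempfile


def load_as(name, path):
    """Execute the file at `path` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "re_demo.py")
    with open(path, "w") as f:
        f.write("cache = {}\n")  # module-level state, like a pattern cache

    short_name = load_as("re_demo", path)      # plays the role of "re"
    long_name = load_as("std.re_demo", path)   # plays the role of "std.re"

# State stored through one name is invisible through the other.
short_name.cache["pattern"] = "compiled"
```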
 
MRAB

Gabriel Genellina wrote:
[snip]
Unfortunately you can't, at least not without some special treatment
of the std package. One of the undocumented rules of the import
system is that you must not have more than one way to refer to the
same module (in this case, std.re and re). Suppose someone imports
std.re; an entry in sys.modules with that name is created. Later
someone imports re; as there is no entry in sys.modules with such
name, the re module is imported again, resulting in two module
instances, darkness, weeping and the gnashing of teeth :) (I'm sure
you know the problem: it's the same as when someone imports the main
script as a module, and gets a different module instance because the
"original" is called __main__ instead).

Couldn't the entry in sys.modules be where the module was found, so that
if 're' was found in 'std' then the entry is 'std.re' even if the import
said just 're'?
 
Steven D'Aprano

On Sun, 01 Nov 2009 02:54:15 -0300, Steven D'Aprano wrote:


Sure, beautiful, a plain and simple search over a list of directories.
That's how it worked in Python 1.4, I think... Now you have lots of
"hooks" and even "meta-hooks": sys.meta_path, sys.path_hooks,
sys.path_importer_cache. And sys.path, of course, which may contain
other things apart from directory names (zip files, eggs, and even
instances of custom "loader" objects...).

You'll notice I deliberately didn't refer to directories. I just said
"the path". If the path contains things other than directories, they are
searched too.

PEP 302 explains this but I'm
not sure the description is still current. PEP 369, if approved, would
add even more hooks.
Add packages to the picture, including relative imports and __path__[]
processing, and it becomes increasingly harder to explain. Brett Cannon
has rewritten the import system in pure Python (importlib) for 3.1; this
should help to understand it, I hope. The whole system works, yes, but
looks to me more like a collection of patches over patches than a
coherent system. Perhaps this is due to the way it evolved.

You've convinced me that the implementation of the import infrastructure
isn't as clean as I imagined. I'm sure it's a gnarly hack on top of
gnarly hacks, and that maintaining it requires heroic measures worthy of
a medal *grin*.

Dealing with name clashes doesn't come for free. If you think it
does, I encourage you to write a patch implementing the behaviour you
would prefer.

I'd say it is really a bug, and has existed for a long time.

Since import is advertised to return the first module with the given
name it finds, I don't see it as a bug even if it doesn't do what the
programmer intended it to do. [...] Shadowing a standard library module
is no different.

But that's what namespaces are for; if the standard library had its own
namespace, such collisions would not occur.


Sure. But that's not a bug in the import system. If it's a bug, it's a
bug in the layout of the standard library.

Unfortunately you can't, at least not without some special treatment of
the std package. One of the undocumented rules of the import system is
that you must not have more than one way to refer to the same module (in
this case, std.re and re).

*slaps head*

How obvious in hindsight.
 
Gabriel Genellina

Gabriel Genellina wrote:
One way to avoid name clashes would be to put the entire standard
library under a package; a program that wants the standard re
module would write "import std.re" instead of "import re", or
something similar.

You could do it in a backwards compatible way, by adding the std
package directory into the path.

Unfortunately you can't, at least not without some special treatment
of the std package. One of the undocumented rules of the import
system is that you must not have more than one way to refer to the
same module (in this case, std.re and re). [...]

Couldn't the entry in sys.modules be where the module was found, so that
if 're' was found in 'std' then the entry is 'std.re' even if the import
said just 're'?

What about a later 'import re'? 're' would not be found in sys.modules
then.
In any case, it requires a change in the current behavior, a PEP, and a
lot of discussion...
 
Gabriel Genellina

On Sun, 01 Nov 2009 19:51:04 -0300, Steven D'Aprano wrote:
Sure. But that's not a bug in the import system. If it's a bug, it's a
bug in the layout of the standard library.

Half and half? The standard library cannot have a different structure
because the import system cannot handle it in a backwards compatible way?
 
Carl Banks

Python is documented as behaving this way, so this is not a bug.

It is arguably poor design. However, Guido van Rossum already ruled
against using a single package for the standard library, and it's not
likely that special case code to detect accidental name-clashes with
the standard library is going to be added, since there are legitimate
reasons to override the standard library.

So for better or worse, you'll just have to deal with it.


Carl Banks
 
Ask Solem

Just have to add that you're not just affected by the standard library.
If you have a module named myapp.django, and someone writes a cool
library called django that you want to use, you can't use it unless you
rename your local django module.


file myapp/django.py:

from django.utils.functional import curry

ImportError: No module named utils.functional

At least that's what I get, maybe there is some workaround, some way
to say this is an absolute path?
 
Gabriel Genellina

Yes, that's exactly the way to solve it. Either move on to Python 3, or
use:
from __future__ import absolute_import

When absolute imports are in effect, and assuming your code is inside a
package, then neither "import re" nor "from django.utils.functional import
curry" are affected by your own module names, because those statements
imply an absolute import ("absolute" means that the module is searched
along sys.path).
The only way to import a local file "re.py" is using "from .re import
something"; the leading dot means it's a relative import ("relative" means
that the module is searched in a single directory: the current package
directory and its parents, depending on how many dots are specified).
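
The difference can be tried out with a throwaway package. This is a sketch for Python 3, where absolute imports are the default; under Python 2.5 to 2.7 the generated spam.py would additionally need "from __future__ import absolute_import" as its first statement. The package layout mirrors the ham example from the start of the thread, while the MARKER constant and the alias ham_re are invented for the demonstration. Note that the package is imported as ham.spam here, with its parent directory on sys.path, which is not the same situation as running ham/spam.py directly as a script.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as top:
    pkg = os.path.join(top, "ham")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "re.py"), "w") as f:
        f.write("MARKER = 'local ham.re'\n")
    with open(os.path.join(pkg, "spam.py"), "w") as f:
        f.write("import re                    # absolute: found via sys.path\n"
                "from . import re as ham_re   # relative: the sibling file\n")

    sys.path.insert(0, top)  # the directory *containing* ham, not ham itself
    importlib.invalidate_caches()
    try:
        spam = importlib.import_module("ham.spam")
    finally:
        sys.path.remove(top)

stdlib_re_found = hasattr(spam.re, "escape")  # the standard library module
local_marker = spam.ham_re.MARKER             # the package's own re.py
```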
 
Mark Leander

I give up: what's the trick? (Of course, renaming ham/re.py is
hardly "the trick." It's rather Procrustes' Bed.)

I realize that this is probably not the answer you were looking for,
but:

$ python -m ham.spam

or

==> ./spammain.py <==
import ham.spam

$ python spammain.py
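
Why these two invocations behave differently can be checked with a short experiment. The sketch below assumes a current Python 3 without the -P option or PYTHONSAFEPATH set; the helper path0 and the one-line spam.py it generates are made up for the test. It starts a child interpreter both ways and records the first entry of sys.path in each case.

```python
import os
import subprocess
import sys
import tempfile


def path0(args, cwd):
    """Run a child interpreter and return the sys.path[0] it reports."""
    result = subprocess.run([sys.executable] + args, cwd=cwd,
                            capture_output=True, text=True, check=True)
    return os.path.realpath(result.stdout.strip())


with tempfile.TemporaryDirectory() as top:
    pkg = os.path.join(top, "ham")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    script = os.path.join(pkg, "spam.py")
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")

    as_script = path0([script], cwd=top)            # like: python ham/spam.py
    as_module = path0(["-m", "ham.spam"], cwd=top)  # like: python -m ham.spam
    pkg_dir = os.path.realpath(pkg)
    top_dir = os.path.realpath(top)
```

Run as a script, the ham directory itself heads the search path, so ham/re.py is visible as a top-level "re". Run with -m from the parent directory, the parent heads the path instead, and the file is only reachable as ham.re.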


I've found it easier to not fight the module/package system but work
with it. But yes, I also think the problem you're seeing is a wart or
bug even.

Best regards
Mark Leander
 
