I thought I understood how import worked...

Roy Smith

I've been tracking down some weird import problems we've been having with
django. Our settings.py file is getting imported twice. It has some
non-idempotent code in it, and we blow up on the second import.

I thought modules could not get imported twice. The first time they get
imported, they're cached, and the second import just gets you a reference to the
original. Playing around, however, I see that it's possible to import a module
twice if you refer to it by different names. Here's a small-ish test case which
demonstrates what I'm talking about (python 2.6.5):

In directory /home/roy/play/import/foo, I've got:

__init__.py (empty file)
try.py
broken.py


$ cat broken.py
print __file__


$ cat try.py
import broken
import foo.broken

import sys
for m in sys.modules.items():
    if m[0].endswith('broken'):
        print m


And when I run try.py (with foo as the current directory):

$ PYTHONPATH=/home/roy/play/import python try.py
/home/roy/play/import/foo/broken.pyc
/home/roy/play/import/foo/broken.pyc
('broken', <module 'broken' from '/home/roy/play/import/foo/broken.pyc'>)
('foo.broken', <module 'foo.broken' from '/home/roy/play/import/foo/broken.pyc'>)


So, it appears that you *can* import a module twice, if you refer to it by
different names! This is surprising. It means that having non-idempotent code
which is executed at import time is a Bad Thing.

It also means that you could have multiple copies of a module's global
namespace, depending on how your users imported the module. Which is kind of
mind-blowing.
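
For readers on Python 3, here is a minimal sketch of the same experiment. It builds the foo/broken.py layout in a temporary directory and puts both the package's parent and the package directory itself on sys.path, which is an assumption about what running the script from inside foo/ with PYTHONPATH set amounts to. The variable names are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Throwaway layout: <root>/foo/__init__.py and <root>/foo/broken.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "foo")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "broken.py"), "w") as f:
    f.write("print('executing body as', __name__)\n")

# Both directories are importable roots, so one file is reachable by two names.
sys.path[:0] = [pkg_dir, root]
importlib.invalidate_caches()

import_a = importlib.import_module("broken")      # body runs
import_b = importlib.import_module("foo.broken")  # body runs again

print(import_a is import_b)
print(os.path.samefile(import_a.__file__, import_b.__file__))
```

The body of broken.py prints once per name, and the two module objects are distinct even though they come from the same file.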
 
Steven D'Aprano

I thought modules could not get imported twice. The first time they get
imported, they're cached, and the second import just gets you a
reference to the original. Playing around, however, I see that it's
possible to import a module twice if you refer to it by different names.

Yes. You've found a Python gotcha.

The most common example of this is when a single file acts as both an
importable module, and as a runnable script. When run as a script, it is
known as "__main__". When imported, it is known by the file name. Unless
you take care, it is easy to end up with the module imported twice.
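
A small sketch of that script-versus-module case, assuming Python 3.7 or later. The file name dual.py is hypothetical; the script is run in a child interpreter so that it really starts life as "__main__".

```python
import os
import subprocess
import sys
import tempfile

script_dir = tempfile.mkdtemp()
script_path = os.path.join(script_dir, "dual.py")  # hypothetical file name
with open(script_path, "w") as f:
    f.write(
        "print('body running as', __name__)\n"
        "if __name__ == '__main__':\n"
        "    import dual  # same file, now under its file name\n"
    )

# Run the file as a script; its directory becomes the first sys.path entry.
result = subprocess.run(
    [sys.executable, script_path],
    capture_output=True, text=True, check=True,
)
lines = result.stdout.splitlines()
print(lines)
```

The body runs once as "__main__" and once more as "dual", so any import-time side effect happens twice.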

The usual advice is "never have one module used as both script and
importable module". I think *never* is a bit strong, but if you do so,
you need to take extra care.

Here's a small-ish test case which demonstrates what I'm talking about
(python 2.6.5):

In directory /home/roy/play/import/foo, I've got:

__init__.py (empty file)
try.py
broken.py

Aside: calling a module "try.py" is asking for trouble, because you can't
do this:

import try

$ cat broken.py
print __file__


$ cat try.py
import broken
import foo.broken

Which are two names for the same module.

So, it appears that you *can* import a module twice, if you refer to it
by different names! This is surprising. It means that having
non-idempotent code which is executed at import time is a Bad Thing.

Well yes. In general, you should avoid non-idempotent code. You should
doubly avoid it during imports, and triply avoid it on days ending with Y.

The rest of the time, it is perfectly safe to have non-idempotent code.

:)

I kid, of course, but only half. Side-effects are bad, non-idempotent
code is bad, and you should avoid them as much as possible, unless
you have no other reasonable choice.

It also means that you could have multiple copies of a module's global
namespace, depending on how your users imported the module. Which is
kind of mind-blowing.

Oh that part is trivial. Module namespaces are just dicts, there's
nothing special about them.

py> import math # for example
py> import copy
py> namespace = copy.deepcopy(math.__dict__)
py> math.__dict__ == namespace
True
py> math.__dict__ is namespace
False


It is modules that should be special, and Python tries really hard to
ensure that they are singletons. (Multitons?) But not superhumanly hard.
 
Mark Lawrence

I don't think the modules are actually imported twice. The entry is just
doubled; that's all.

Please don't top post, this is the third time of asking.
 
Laszlo Nagy

Roy Smith said:
So, it appears that you *can* import a module twice, if you refer to
it by different names! This is surprising.

The tutorial is misleading on this. It says plainly:

A module can contain executable statements as well as function
definitions. […] They are executed only the *first* time the module
is imported somewhere.

<URL:http://docs.python.org/tutorial/modules.html>

but it doesn't make clear that a module can exist in the ‘sys.modules’
list multiple times under different names.

sys.modules is a dict. But yes, there can be multiple "instances" of the
same module loaded.

What I do with bigger projects is that I always use absolute module
names. For example, when I develop a project called "project1" that has
several sub packages, then I always do these kinds of imports:

from project1.package1.subpackage2.submodule3 import *
from project1.package1.subpackage2 import submodule3
from project1.package1.subpackage2.submodule3 import some_class

Even from a source file that is inside project1.package1.subpackage2, I
tend to import them the same way. This makes sure that every module is
imported under the same package path.

You just need to make sure that the main project has a unique name
(which is usually the case) and that it is on your sys path (which is
usually the case, especially when the script is started in the project's
directory).

The cost is that you have to type more. The benefit is that you can be
sure that you are importing the thing that you want to import, and there
will be no multiple imports for the same module.
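
A rough sketch of that convention in Python 3. The package name project1 comes from the post; the file names settings.py and views.py are made up for illustration. Only the project root goes on sys.path, so each file has exactly one importable name.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "project1")
os.mkdir(pkg_dir)
sources = {
    "__init__.py": "",
    "settings.py": "print('settings body ran as', __name__)\n",
    # Even from inside the package, refer to siblings by their full name.
    "views.py": "from project1 import settings\n",
}
for filename, text in sources.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(text)

sys.path.insert(0, root)  # the project root only, never project1/ itself
importlib.invalidate_caches()

views = importlib.import_module("project1.views")
settings = importlib.import_module("project1.settings")
print(views.settings is settings)
```

The settings body prints once, and both references point at the same module object.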

Maybe somebody will suggest a method that works even better.

For small projects without sub-packages, it is not a problem.

Best,

Laszlo
 
Roy Smith

In general, you should avoid non-idempotent code. You should
doubly avoid it during imports, and triply avoid it on days ending with Y.

I don't understand your aversion to non-idempotent code as a general rule. Most code is non-idempotent. Surely you're not saying we should never write:

foo += 1

or

my_list.pop()

???

Making top-level module code idempotent, I can understand (given this new-found revelation that modules aren't really singletons), but you seem to be arguing something stronger and more general.
 
Roy Smith

The tutorial is misleading on this. It says plainly:

A module can contain executable statements as well as function
definitions. […] They are executed only the *first* time the module
is imported somewhere.

<URL:http://docs.python.org/tutorial/modules.html>

That's more than misleading. It's plain wrong. The example I gave demonstrates the "print __file__" statement getting executed twice.

The footnote to that is wrong too:
[1] In fact function definitions are also ‘statements’ that are ‘executed’; the execution of a
module-level function enters the function name in the module’s global symbol table.

I think what it's supposed to say is "... the execution of a module-level def statement ..."

Care to file a documentation bug <URL:http://bugs.python.org/>
describing this?

Sure, once I understand how it's really supposed to work :)
 
Paul Rubin

Roy Smith said:
I don't understand your aversion to non-idempotent code as a general
rule. Most code is non-idempotent. Surely you're not saying we
should never write:
???

I don't think "in general avoid" means the same thing as "never write".

One of the tenets of the functional-programming movement is that it is
in fact reasonable to write in a style that avoids "foo += 1" and
"my_list.pop()" most of the time, leading to cleaner, more reliable
code.

In Python it's not possible to get rid of ALL of the data mutation
without horrendous contortions, but it's pretty easy (and IMHO of
worthwhile benefit) to avoid quite a lot of it.
 
Terry Reedy

I don't think the modules are actually imported twice.

This is incorrect as Roy's original unposted example showed.
Modify one of the two copies and it will be more obvious.

PS. I agree with Mark about top posting. I often just glance at such
postings rather than go look to find out the context. However, this one
is wrong on its own ;-).
 
Terry Reedy

The tutorial is misleading on this. It says plainly:

A module can contain executable statements as well as function
definitions. […] They are executed only the *first* time the
module is imported somewhere.

The last sentence should be more like "They are executed only the
*first* time the module is imported anywhere with a particular name.
(One should avoid importing a module under different names.)"

<URL:http://docs.python.org/tutorial/modules.html>

[1] In fact function definitions are also ‘statements’ that are
‘executed’; the execution of a module-level function enters the
function name in the module’s global symbol table.

I think what it's supposed to say is "... the execution of a
module-level def statement ..."

right

Care to file a documentation bug <URL:http://bugs.python.org/>
describing this?

Sure, once I understand how it's really supposed to work :)

You don't need a final solution to file. Anyway, I think the change
above might be enough.
 
Mark Lawrence

I've been tracking down some weird import problems we've been having with
django. Our settings.py file is getting imported twice. It has some
non-idempotent code in it, and we blow up on the second import.

I thought modules could not get imported twice. The first time they get
imported, they're cached, and the second import just gets you a reference to the
original. Playing around, however, I see that it's possible to import a module
twice if you refer to it by different names. Here's a small-ish test case which
demonstrates what I'm talking about (python 2.6.5):

In directory /home/roy/play/import/foo, I've got:

__init__.py (empty file)
try.py
broken.py


$ cat broken.py
print __file__


$ cat try.py
import broken
import foo.broken

import sys
for m in sys.modules.items():
    if m[0].endswith('broken'):
        print m


And when I run try.py (with foo as the current directory):

$ PYTHONPATH=/home/roy/play/import python try.py
/home/roy/play/import/foo/broken.pyc
/home/roy/play/import/foo/broken.pyc
('broken', <module 'broken' from '/home/roy/play/import/foo/broken.pyc'>)
('foo.broken', <module 'foo.broken' from '/home/roy/play/import/foo/broken.pyc'>)


So, it appears that you *can* import a module twice, if you refer to it by
different names! This is surprising. It means that having non-idempotent code
which is executed at import time is a Bad Thing.

It also means that you could have multiple copies of a module's global
namespace, depending on how your users imported the module. Which is kind of
mind-blowing.

Maybe not directly applicable to what you're saying, but Brett Cannon
ought to know something about the import mechanism. I believe he's been
working on it on and off for several years. See
http://docs.python.org/dev/whatsnew/3.3.html for a starter on the gory
details.
 
Steven D'Aprano

You seem to have accidentally deleted my smiley.

I don't understand your aversion to non-idempotent code as a general
rule. Most code is non-idempotent.

That doesn't necessarily make it a good thing. Most code is also buggy.

Surely you're not saying we should never write:


???

Of course not. I'm not going so far as to say that we should always,
without exception, write purely functional code. I like my list.append as
much as anyone :)

But at the level of larger code units, functions and modules, it is a
useful property to have where possible. A function is free to increment
an integer, or pop items from a list, as much as it likes -- so long as
they are *local* to the function, and get reset to their initial state
each time the function is called with the same arguments.

I realise that many problems are most easily satisfied by non-idempotent
tactics. "Customer orders widget" is not naturally idempotent, since if
the customer does it twice, they get two widgets, not one. But such
behaviour should be limited to the parts of your code which must be non-
idempotent.

In short, non-idempotent code is hard to get right, hard to test, and
hard to debug, so we should use as little of it as possible.
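
To make the distinction concrete, here is a tiny illustration with made-up names. It simulates the same setup code running twice, as happens when a module body is executed under two names.

```python
# Non-idempotent: every call changes what the next call leaves behind.
handlers = []

def add_handler(name):
    handlers.append(name)

# Idempotent: repeating the call with the same argument changes nothing.
handler_set = set()

def ensure_handler(name):
    handler_set.add(name)

for _ in range(2):  # simulate the same setup code being executed twice
    add_handler("audit")
    ensure_handler("audit")

print(handlers)
print(handler_set)
```

After two runs the list holds a duplicate entry, while the set is in the same state it was in after the first run.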
 
Cameron Simpson

| On Tue, 07 Aug 2012 09:18:26 -0400, Roy Smith wrote:
| > I thought modules could not get imported twice. The first time they get
| > imported, they're cached, and the second import just gets you a
| > reference to the original. Playing around, however, I see that it's
| > possible to import a module twice if you refer to it by different names.
|
| Yes. You've found a Python gotcha.
[...]
| > $ cat try.py
| > import broken
| > import foo.broken
|
| Which are two names for the same module.
[...]

This, I think, is a core issue in this misunderstanding. (I got bitten
by this too, maybe a year ago. My error, and I'm glad to have improved
my understanding.)

All of you are saying "two names for the same module", and variations
thereof. And that is why the doco confuses.

I would expect less confusion if the above example were described as
_two_ modules, with the same source code.

Make it clear that these are _two_ modules (because they have two
names), which merely happen to have been obtained from the same "physical"
filesystem object due to path search effects. That is, change the doco
wording to describe a module as the in-memory result of reading a "file"
found from an import name.

So I think I'm arguing for a small change in terminology in the doco
with no change in Python semantics. Is a module a set of files on the
disc, or an in-memory Python notion with a name? I would argue for the
latter.

With such a change, the "a module can't be imported twice" would then be
true (barring hacking around in sys.modules between imports).
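
A short Python 3 sketch of that last point, using a hypothetical module name once_demo. Importing the same name twice hits the cache; removing the sys.modules entry in between is the kind of hacking that defeats it.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
with open(os.path.join(root, "once_demo.py"), "w") as f:  # hypothetical module
    f.write("print('body ran')\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

first = importlib.import_module("once_demo")   # body runs
again = importlib.import_module("once_demo")   # cache hit, body does not run
del sys.modules["once_demo"]                   # hack around in sys.modules
fresh = importlib.import_module("once_demo")   # body runs again, new object

print(first is again, first is fresh)
```

Under one name the module object is reused; only after the cache entry is removed does the import create a new one.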

Cheers,
 
Roy Smith

Cameron Simpson said:
This, I think, is a core issue in this misunderstanding. (I got bitten
by this too, maybe a year ago. My error, and I'm glad to have improved
my understanding.)

All of you are saying "two names for the same module", and variations
thereof. And that is why the doco confuses.

I would expect less confusion if the above example were described as
_two_ modules, with the same source code.

+1
 
Laszlo Nagy

That's not true though, is it? It's the same module object with two
different references, I thought.

They are not the same. Proof:

$ mkdir test
$ cd test
$ touch __init__.py
$ touch m.py
$ cd ..
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'a'

So it is still true that top level code gets executed only once, when
the module is first imported. The trick is that a module is not a file.
It is a module object that is created from a file, with a name. If you
change the name, then you create ("import") a new module.
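
The interactive commands of the proof are not visible above, so what follows is only a sketch in Python 3 of the kind of check that ends in such an AttributeError, not the original session. The package is called demo_pkg here (a made-up name, chosen to avoid clashing with the standard library's test package).

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
open(os.path.join(pkg_dir, "m.py"), "w").close()

sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

qualified = importlib.import_module("demo_pkg.m")
bare = importlib.import_module("m")

qualified.a = 1  # set an attribute on one module object only
try:
    bare.a
    shared = True
except AttributeError:
    shared = False  # the other module object never saw the assignment

print(shared)
```

If the two names referred to one module object, the attribute would be visible through both.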

You can also use the reload() function to execute module level code
again, but it won't create a new module object. It will just update the
contents of the very same module object:

What is more interesting is how the reload() function works:

Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
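
As a hedged illustration of the reload() behaviour described above, here is a Python 3 sketch, where the builtin reload() has become importlib.reload(). The module name reload_demo is made up.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
with open(os.path.join(root, "reload_demo.py"), "w") as f:  # hypothetical module
    # Counts how many times this body has run in the same namespace.
    f.write("runs = globals().get('runs', 0) + 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

mod = importlib.import_module("reload_demo")
mod.extra = "set by hand"          # not defined by the file itself

reloaded = importlib.reload(mod)   # re-executes the body in the same object

print(reloaded is mod, mod.runs, mod.extra)
```

The body runs a second time, but inside the existing module object: the identity is unchanged and names that the file does not rebind survive the reload.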
 
Jean-Michel Pichavant

Roy said:
So, it appears that you *can* import a module twice, if you refer to it by
different names! This is surprising. It means that having non-idempotent code
which is executed at import time is a Bad Thing.

Not exactly, it means that one module is different from another if its
path is different.
That means you need to be extra careful about how you reference a
module. Content is not used to discriminate modules.

JM
 
Roy Smith

Ben Finney said:
That's not true though, is it? It's the same module object with two
different references, I thought.

Nope. I modified my test case to print out the id of the module:

('broken', <module 'broken' from
'/home/roy/play/import/foo/broken.pyc'>) 140608299115512
('foo.broken', <module 'foo.broken' from
'/home/roy/play/import/foo/broken.pyc'>) 140608299116232

Also, even if what you say were true, “source code” implies the module
was loaded from source code, when Python allows loading modules with no
source code available.

This is true. In fact, when I first started chasing this down, one
import was finding the .py file, and the other the .pyc file (created
during the first import). I originally assumed that the .py/.pyc
distinction was the critical piece of the puzzle (and went down a
rathole exploring that).

Then I went down a different rathole when I noticed that one code path
was doing "import settings", while the other was doing
"__import__(module_name)", thinking import and __import__ were somehow
doing different things.
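
For what it's worth, a quick Python 3 check suggests the two forms do share the same cache. The json and os.path modules are used here only because they are always available.

```python
import importlib
import sys

import json                                      # plain import statement
via_dunder = __import__("json")                  # the function behind the statement
via_importlib = importlib.import_module("json")

print(via_dunder is json, via_importlib is json)

# One real difference: for a dotted name, __import__ returns the top-level
# package, while import_module returns the submodule itself.
top = __import__("os.path")
leaf = importlib.import_module("os.path")
print(top.__name__, leaf is sys.modules["os.path"])
```

So the statement and the function end up at the same sys.modules entry; what matters is the name that is passed in, not which spelling of import is used.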
 
