Project organization and import

Chris Mellon

As far as I can tell, the moment you use "from foo_module import bar",
you've broken reload(). Reloading higher level packages doesn't help.
The only practical solution I can see is to rewrite __import__ and
reload.

Example:

a.py:

AExport = object()

b.py:

from a import AExport

class Object(object): pass

BExport = Object()
BExport.a = AExport

interpreter session:

>>> import b
>>> b.AExport
<object object at 0x...>
>>> b.BExport.a
<object object at 0x...>
>>> import a
>>> a.AExport
<object object at 0x...>

(now edit a.py so that AExport = list())

>>> reload(b)
>>> b.AExport
<object object at 0x...>        # note no change
>>> reload(a)
>>> b.AExport
<object object at 0x...>        # note still no change
>>> reload(b)
>>> b.AExport
[]                              # now it's changed
>>> b.BExport.a
[]
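For what it's worth, the same trap is easy to reproduce on modern Python 3, where reload() lives in importlib. A self-contained sketch (the module names a_demo and b_demo are invented for the demonstration):

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # always recompile from source on reload

# Build two throwaway modules on disk, mirroring a.py and b.py above.
tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
pathlib.Path(tmp, "a_demo.py").write_text("AExport = 'version 1'\n")
pathlib.Path(tmp, "b_demo.py").write_text("from a_demo import AExport\n")

import a_demo
import b_demo
assert b_demo.AExport == "version 1"

# Edit a_demo.py on disk, then reload it.
pathlib.Path(tmp, "a_demo.py").write_text("AExport = 'version 2'\n")
importlib.invalidate_caches()
importlib.reload(a_demo)
assert a_demo.AExport == "version 2"   # the module itself sees the edit
assert b_demo.AExport == "version 1"   # b's from-import copy is still stale

# Only reloading b_demo re-executes its from-import against the fresh a_demo.
importlib.reload(b_demo)
assert b_demo.AExport == "version 2"
```

The copied name in b_demo is an independent binding, so it stays stale until b_demo itself is re-executed -- exactly the behaviour shown in the session above.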
 
Russell E. Owen

Martin Unsal said:
I'm using Python for what is becoming a sizeable project and I'm
already running into problems organizing code and importing packages.
I feel like the Python package system, in particular the isomorphism
between filesystem and namespace, doesn't seem very well suited for
big projects. However, I might not really understand the Pythonic way.
I'm not sure if I have a specific question here, just a general plea
for advice.

1) Namespace. Python wants my namespace hierarchy to match my
filesystem hierarchy. I find that a well organized filesystem
hierarchy for a nontrivial project will be totally unwieldy as a
namespace. I'm either forced to use long namespace prefixes, or I'm
forced to use "from foo import *" and __all__, which has its own set
of problems.
1a) Module/class collision. I like to use the primary class in a file
as the name of the file. However this can lead to namespace collisions
between the module name and the class name. Also it means that I'm
going to be stuck with the odious and wasteful syntax foo.foo
everywhere, or forced to use "from foo import *".

The issue of module names vs contained class names is one thing I find a
bit frustrating about python. Fortunately it is fairly easy to work
around.

My own solution has been to import up just one level. So for example:
pkg/subpkg/foo.py defines class foo and associated stuff
pkg/subpkg/bar.py defines class bar
pkg/subpkg/__init__.py contains:

from foo import *
from bar import *

To use this I then do:
import pkg.subpkg
myfoo = pkg.subpkg.foo(...)

But that's the only "from x import" that I do. I never raise stuff from
a sub-package to a higher level.

Once you do this (or in some other way eliminate the foo.foo problem), I
think you will find that python namespaces work very well for large
projects.

Overall I personally like having the namespace follow the file structure
(given that one has to use source files in the first place; my smalltalk
roots are showing). Java reportedly does much the same thing and it is
very helpful for finding code.

I'm sure it's partly what you're used to that counts. C++ experts
probably enjoy the freedom of C++ namespaces, but to me it's just a pain
that they are totally independent of file structure.
1b) The Pythonic way seems to be to put more stuff in one file, but I
believe this is categorically the wrong thing to do in large projects.
The moment you have more than one developer along with a revision
control system, you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

I don't personally find that python encourages lots of code per file. I
think this perception only stems from (1a) and once you solve that
you'll find it's fine to divide your code into small files.
2) Importing and reloading. I want to be able to reload changes
without exiting the interpreter. This pretty much excludes "from foo
import *", unless you resort to this sort of hack:

http://www.python.org/search/hypermail/python-1993/0448.html

Has anyone found a systematic way to solve the problem of reloading in
an interactive interpreter when using "from foo import *"?

I totally agree here. This is a real weakness to python and makes it
feel much more static than it ought to be. I know of no solution other
than restarting. That tends to be fast, but it can be a pain to get back
to where you were.

Smalltalk solved this problem long ago in a way that makes for very
dynamic development and debugging. Unfortunately few languages have
followed suit. The Smalltalk development environment is the one feature
I really miss in all other languages I've used (I certainly don't miss
its quirky syntax for control flow :)).

-- Russell
 
Russell E. Owen

Martin Unsal said:
Now we're getting somewhere. :)


This breaks if you ever need to test more than one branch of the same
code base. I use a release branch and a development branch. Only the
release branch goes into site-packages, but obviously I do most of my
work in the development branch.

This is an interesting point that we are just facing. If you have a big
package for all your stuff and you want to separately version components
of it, you do run into problems. The solution we are adopting is to
write a custom import hook, but a simpler solution is to make sure each
separately versioned component is a top-level package (in which case you
can manipulate PYTHONPATH to temporarily "install" a test version).
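The PYTHONPATH trick can be sketched like this; the package mypkg and the branch layout are invented for illustration, and inserting into sys.path from inside a session stands in for setting PYTHONPATH before launching the interpreter:

```python
import pathlib
import sys
import tempfile

# Two on-disk copies of the same top-level package, standing in for a
# "release" and a "development" branch.
branches = {}
for name, version in (("release", "1.0"), ("devel", "2.0dev")):
    root = pathlib.Path(tempfile.mkdtemp())
    pkg = root / "mypkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("__version__ = %r\n" % version)
    branches[name] = root

# Whichever branch comes first on sys.path wins -- the same effect as
# putting it first on PYTHONPATH.
sys.path.insert(0, str(branches["devel"]))
sys.path.insert(0, str(branches["release"]))  # release now shadows devel

import mypkg
assert mypkg.__version__ == "1.0"
```

Swapping the two insertions (or exporting the other directory on PYTHONPATH) would make the development branch win instead, which is the "temporarily install a test version" effect described above.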

-- Russell
 
Ben Finney

Martin Unsal said:
I think you should be asking yourselves, "Did we all abandon reload()
because it is actually an inferior workflow, or just because it's
totally broken in Python?"

I never "abandoned reload()", because it never even occurred to me to
use the interpreter for developing the code that I know is going to
end up in a file anyway. That's what my text editor is for.
 
Jorge Godoy

Martin Unsal said:
More work, like rewriting __import__ and reload??? :)

There's a point where you should blame the language, not the
programmer. Are you saying I'm lazy just because I don't want to mess
with __import__?

I *never* messed with __import__. And one of my systems has more than 15
packages, with an average of 7 more subpackages plus __init__.py...

Why do you need to mess with __import__?
I was clearly talking about files and you assumed I was talking about
namespace. That's Pythonic thinking... and I don't mean that in a good
way!

Hmmm... Why not? How are you going to track down where something is, in
which file? I can make successive imports and I can subclass things, so I
might be importing a subclass of a subclass of the class that provides the
method that I want to change. Having a direct correlation helps me a lot with
big projects. For small ones I don't care since they are very simple and a
grep usually takes me directly to where I want (just to avoid tools that map
classes to files that are specific to one IDE or editor).
Because I have written a project with 50,000 lines of Python and I'm trying
to organize it in such a way that it'll scale up cleanly by another order of
magnitude. Because I've worked on projects with millions of lines of code
and I know about how such things are organized. It's funny, I'm a newbie to
Python but it seems like I'm one of the only people here thinking about it
as a large scale development language rather than a scripting language.

I don't see a problem scaling my biggest project with, now, 65K lines of code.
What are the problems you're seeing for yours? In fact, the Python part of
this code is the easiest to deal with. And there's ctypes involved here,
which messes things up a bit since I need to keep C + Python in sync.

And if I once imagined I'd write that many LOC and would reach the millions of
LOC of *Python* code, then it would certainly make me feel comfortable knowing
that this approach *does* scale. At least to me and to the ones that work with
me and use the system... Implementing new features is fast and extremely
modular. There are modules specific to one client, modules specific to
another, modules shared between all clients, etc. It isn't a monolithic take
all or nothing. And even like that it works.

There are customizations on some features that only exist at one client's
branch, there are customizations that might be selected "on the fly" by
choosing something on a preferences screen, etc.

It is a "normal" (but rather complex) application on any aspect that we see
around. And it scales. I don't fear changing code. I don't fear adding new
features. It "simply works".
 
Jorge Godoy

Not sure I get what you mean; when I write tests, just as when I write
production code, I'm focused (not worried:) about the application
functionality I'm supposed to deliver. The language mostly "gets out of
my way" -- that's why I like Python, after all:).

semantics... ;-) Thanks for the correction.

That's the same reason why I like it. I believe it is not a coincidence that
we both like writing Python code.

But there are cases where investigating is more necessary than testing. This
is where I see the need of the interactive session. For program's features I
also write tests.
I do generally keep an interactive interpreter running in its own
window, and help and dir are probably the functions I call most often
there. If I need to microbenchmark for speed, I use timeit (which I
find far handier to use from the commandline). I wouldn't frame this as
"worried with how to best use the language" though; it's more akin to a
handy reference manual (I also keep a copy of the Nutshell handy for
exactly the same reason -- some things are best looked up on paper).

That's the same use -- and the same most used functions -- that I have here.
I believe that I wasn't clear on my previous post, and this is why you saw a
different meaning to it.
I don't really see "getting a bit big to setup" as the motivation for
writing automated, repeatable tests (including load-tests, if speed is
such a hot topic in your case); rather, the key issue is, will you ever

It's not for writing tests. It's for investigating things. If I have to open
database connections and make several queries to get to a point where I have
the object that I want to "dir()", it is easier for me to put all that in a
file. It isn't a test.

want to run this again? For example, say you want to check the relative
speeds of approaches A and B -- if you do that in a way that's not
automated and repeatable (i.e., not by writing scripts), then you'll
have to repeat those manual operations exactly every time you refactor
your code, upgrade Python or your OS or some library, switch to another
system (HW or SW), etc, etc. Even if it's only three or four steps, who
needs the aggravation? Almost anything worth doing (in the realm of
testing, measuring and variously characterizing software, at least) is
worth automating, to avoid any need for repeated manual labor; that's
how you get real productivity, by doing ever less work yourself and
pushing ever more work down to your computer.

I won't write a script for two commands that I rerun often. But I
would for some more -- let's say starting from 5 commands I might start
thinking about having them somewhere where I can at least Cut'n'Paste into the
interactive interpreter (even with readline's help).
 
sjdevnull

More work, like rewriting __import__ and reload??? :)

There's a point where you should blame the language, not the
programmer. Are you saying I'm lazy just because I don't want to mess
with __import__?


I don't; you do!

I was clearly talking about files and you assumed I was talking about
namespace. That's Pythonic thinking... and I don't mean that in a good
way!


Because I have written a project with 50,000 lines of Python and I'm
trying to organize it in such a way that it'll scale up cleanly by
another order of magnitude. Because I've worked on projects with
millions of lines of code and I know about how such things are
organized. It's funny, I'm a newbie to Python but it seems like I'm
one of the only people here thinking about it as a large scale
development language rather than a scripting language.

Martin


I'm still not clear on what your problem is or why you don't like
"from foo import bar". FWIW our current project is about 330,000
lines of Python code. I do a ton of work in the interpreter--I'll
often edit code and then send a few lines over to the interpreter to
be executed. For simple changes, reload() works fine; for more
complex cases we have a reset() function to clear out most of the
namespace and re-initialize. I don't really see how reload could be
expected to guess, in general, what we'd want reloaded and what we'd
want kept, so I have a hard time thinking of it as a language problem.
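The post doesn't show what its reset() does, but a plausible shape for that kind of helper (the name and behaviour here are guesses, not the poster's actual code) is to purge a project's entries from sys.modules and import the top-level package fresh:

```python
import importlib
import sys

def reset(prefix):
    """Forget every cached module whose name is `prefix` or lives under
    it, then import the top-level package fresh."""
    stale = [n for n in sys.modules
             if n == prefix or n.startswith(prefix + ".")]
    for name in stale:
        del sys.modules[name]
    return importlib.import_module(prefix)

# Demonstration with a stdlib package (re-importing json is harmless):
import json
before = sys.modules["json"]
json = reset("json")
assert json is not before        # a genuinely fresh module object
assert json.loads("[1]") == [1]  # and it still works
```

Unlike reload(), this re-imports the whole subtree, so from-imports between the project's own modules are re-executed too; anything *outside* the prefix that copied names out of it still keeps stale bindings.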
 
sjdevnull

I never "abandoned reload()", because it never even occurred to me to
use the interpreter for developing the code that I know is going to
end up in a file anyway. That's what my text editor is for.

It's most useful for debugging for me; I'll instantiate the objects of
a known bad test case, poke around, maybe put some more debugging code
into one of my classes and re-instantiate only those objects (but keep
the rest of the test objects as-is).

Even there I find that I'd rather use a scratch file in an editor to
set up the test cases and send a specified region to the interpreter
for the most part, only actually typing in the interpreter when I'm
poking at an object. I'll often wind up wanting to pull part of the
test case out either to go into the production code or to set up a
permanent unit test.

Once I figure out what's going on, the production code definitely gets
edited in the text editor.

Even though I use the interactive interpreter every day, though, I
haven't noticed reload being a major issue.
 
Dennis Lee Bieber

As far as I can tell, the moment you use "from foo_module import bar",
you've broken reload(). Reloading higher level packages doesn't help.
The only practical solution I can see is to rewrite __import__ and
reload.
And what behavior do you expect

import foo_module
bar = foo_module.bar
bat = bar
bar = baz        # baz: some other, unrelated object
# <edit foo_module file>
reload(foo_module)
to have... "from <> import <>" is equivalent...

Is bat supposed to now reference baz, since bar was rebound?

reload() brings in the changed module, and rebinds the internal
names to the new objects defined by execution of the module (class/def
statements are executed). The first bar would have been bound to the /object/
"foo_module.bar" defined at that time. bat is bound to the same object
in memory. bar is then bound to some other object, but that does not rebind
bat. Reloading foo_module creates new objects and binds them, but does
not look for other bindings to former objects of...
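The binding rules described above don't need modules at all to demonstrate; a rough sketch:

```python
# Two names bound to one and the same list object.
bar = [1, 2, 3]
bat = bar
assert bat is bar

# Rebinding bar points it at a new object; bat is untouched.
bar = "something else"
assert bat == [1, 2, 3]
assert bar == "something else"

# This is exactly why "from module import name" defeats reload():
# the importing module holds its own independent binding, and
# reloading the source module rebinds only the source module's name.
```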
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Terry Hancock

Martin said:
I'm using Python for what is becoming a sizeable project and I'm
already running into problems organizing code and importing packages.
I feel like the Python package system, in particular the isomorphism
between filesystem and namespace, doesn't seem very well suited for
big projects.

I've never worked on what you would call a "big project", but I *am*
kind of a neat-freak/control-freak about file organization of code, so I
have tinkered with the structure of source trees in Python quite a bit.

If you want to explode a module into a lot of smaller files, you create
a package. I find that this usually works best like this (this is what
the filesystem looks like):

package_name/
    package_pre.py   - contains globals for the package
    component_a.py   - a useful-sized collection of functionality
    component_b.py   - another
    component_c.py   - another
    package_post.py  - stuff that relies on the prior stuff
    __init__.py      - or you can put the "post" stuff here

Then __init__.py contains something like:

from package_pre import *
from component_a import *
from component_b import *
from component_c import *
from package_post import *

or you can explicitly load what you need:

from package_pre import *
from component_a import A, A1, A2
from component_a import A3 as A5
from component_b import B, B1
from component_c import C, C2, C5
from package_post import *

if you want to keep the namespace cleaner.

Also, instead of just dropping things into the module's global
namespace, use a named namespace, such as a class, or use the
"package_pre" module in the example above. That helps to keep things separable.

IOW, you can use __init__.py to set up the package's namespace any way
you want, breaking the actual code up into just about as many files as
you like (I also don't like reading long source files -- I find it
easier to browse directories than source files, even with outlining
extensions. It's rare for me to have more than 2-3 classes per file).

Of course, if you *really* want your namespace to be *completely*
different from the filesystem, then there's no actual reason that all of
these files have to be in the same directory. You can use Python's
relative import (standard in Python 2.5+, available using __future__ in
2.4, IIRC) to make this easier. There was an obnoxious hack used in Zope
which used code to extract the "package_path" and then prepend that to
get absolute import locations which was necessary in earlier versions --
but I can't recommend that, just use the newer version of Python.

So, you could do evil things like this in __init__.py:

from .other_package.fiddly_bit import dunsel

(i.e. grab a module from a neighboring package)

Of course, I really can't recommend that either. Python will happily do
it, but it's a great way to shoot yourself in the foot in terms of
keeping your code organized!

The only exception to that is that I often have a "util" or "utility"
package which has a collection of little extras I find useful throughout
my project.

As for relying heavily on reload(), it isn't that great of a feature for
debugging large projects. Any code of sufficient size to make reload()
problematic, though, needs formal unit testing, anyway. The cheapest and
easiest unit test method is doctests (IMHO), so you ought to give those
a try -- I think you'll like the easy relationship those have to working
in the interactive interpreter: just walk your objects through their
paces in the interpreter, then cut-and-paste.
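The cut-and-paste step produces something like the sketch below (the function itself is just an invented stand-in):

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence.

    An interpreter session pasted straight into the docstring
    becomes the test:

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # silent when everything passes; use -v for a report
```

Running the file directly checks every pasted session against the current code, which keeps the interactive exploration and the permanent test suite in one place.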

What reload() and the interactive interpreter are good for is
experimentation, not development.

If you need huge amounts of code to be loaded to be able to do any
useful experiments with the modules you are writing, then your code is
too tightly coupled to begin with. Try to solve that by using something
like "mock objects" to replace the full blown implementations of objects
you need for testing. I've never formally used any of the "mock"
packages, but I have done a number of tests using objects which are
dumbed-down versions of objects which are really supposed to be provided
from another module -- but I wanted to test the two separately (which is
essentially creating my own mock objects from scratch).
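A from-scratch stand-in of the kind described might look like this (all names are invented for the example):

```python
class FakeConnection:
    """A dumbed-down stand-in for a real database connection, so the
    code under test can be exercised without any database at all."""
    def __init__(self, rows):
        self.rows = rows
        self.closed = False

    def query(self, sql):
        # Ignore the SQL entirely; hand back canned data.
        return list(self.rows)

    def close(self):
        self.closed = True

def count_active(conn):
    """The unit under test: it only needs .query(), so any object
    providing that method will do."""
    return sum(1 for row in conn.query("SELECT * FROM users")
               if row["active"])

fake = FakeConnection([{"active": True},
                       {"active": False},
                       {"active": True}])
assert count_active(fake) == 2
```

Because count_active only depends on the query() method, the two modules can be tested separately, which is the decoupling being argued for above.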

HTH,
Terry
 
Michele Simionato

2) Importing and reloading. I want to be able to reload changes
without exiting the interpreter.

What about this?

$ cat reload_obj.py
"""
Reload a function or a class from the filesystem.

For instance, suppose you have a module

$ cat mymodule.py
def f():
    print 'version 1 of function f'

Suppose you are testing the function from the interactive interpreter:

>>> from reload_obj import reload_obj
>>> from mymodule import f
>>> f()
version 1 of function f

Then suppose you edit mymodule.py:

$ cat mymodule.py
def f():
    print 'version 2 of function f'

You can see the changes in the interactive interpreter simply by doing

>>> f = reload_obj(f)
>>> f()
version 2 of function f
"""

import inspect

def reload_obj(obj):
    assert inspect.isfunction(obj) or inspect.isclass(obj)
    mod = __import__(obj.__module__)
    reload(mod)
    return getattr(mod, obj.__name__)
Pretty simple, isn't it?

The issue is that if you have other objects depending on the previous
version of the function/class, they will keep depending on the previous
version, not on the reloaded one, but you cannot pretend miracles from
reload! ;)

You can also look at Michael Hudson's recipe

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/160164

for a clever approach to automatic reloading.
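For readers on Python 3, where reload() moved into importlib, a hedged equivalent of the recipe would be:

```python
import importlib
import inspect
import sys

def reload_obj(obj):
    """Reload the module that defines obj and hand back the fresh
    function or class (a Python 3 rendering of the recipe above)."""
    assert inspect.isfunction(obj) or inspect.isclass(obj)
    mod = importlib.reload(sys.modules[obj.__module__])
    return getattr(mod, obj.__name__)

# Example with a stdlib function -- reloading json is harmless:
import json
fresh_loads = reload_obj(json.loads)
assert fresh_loads('[1, 2]') == [1, 2]
```

Going through sys.modules rather than __import__ also side-steps the quirk that __import__("a.b") returns the top-level package a rather than the submodule.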

Michele Simionato
 
garylinux

package_name/
package_pre.py - contains globals for the package
component_a.py - a useful-sized collection of functionality
component_b.py - another
component_c.py - another
package_post.py - stuff that relies on the prior stuff
__init__.py - or you can put the "post" stuff here

Then __init__.py contains something like:

from package_pre import *
from component_a import *
from component_b import *
from component_c import *
from package_post import *

Anansi Spaceworks http://www.AnansiSpaceworks.com

Thank you! That is by far the clearest explanation of that I have ever seen.
I saved it and sent it on to a friend who is learning Python.
 
