Project organization and import

Chris Mellon

Then we're in total agreement. I'm not sure why you thought my
opinions were the result of baggage from other languages when you
don't seem to actually disagree with me.

Because you're advocating single class per file. A scan through the
standard library may be instructive, where there are some modules that
expose a single class (StringIO, pprint) and others that expose many,
and some that expose none at all. "smallest unit that it makes sense
to work on" and "single class" are totally different things. In any
case, as I hinted at, I prefer an organic, developer driven approach
to deciding these things, not handed down from above style guidelines.
You know your modules are broken up enough when you no longer have
conflicts.
I know all about incremental builds and I just don't think people use
small compilation units in C++ to make their builds faster. It has
certainly never been the reason why I subdivided a source file.

Faster compile/debug/edit cycle is the main justification I've heard
for single class per file. The others are variations of your RCS
argument, which I don't think is justifiable for the above reasons. It
smells of the kind of "my developers are stupid" short sighted
management that kills projects.
I don't think reload works for anything but trivial scripts. The
moment you use "from foo import bar" reload is broken.
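
For readers following along, here is a minimal sketch of the behaviour being argued about. It is written for Python 3, where the built-in reload() discussed in this thread lives at importlib.reload; the module name demo_foo and its contents are made up for illustration.

```python
import importlib
import pathlib
import sys
import tempfile

# Avoid writing .pyc files so the edited source is always recompiled.
sys.dont_write_bytecode = True

# Create a throwaway module (hypothetical name: demo_foo).
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "demo_foo.py").write_text("def bar():\n    return 'old'\n")
sys.path.insert(0, str(tmp))

import demo_foo
from demo_foo import bar  # copies the *current* function object into our namespace

# Simulate editing the file, then reload the module.
(tmp / "demo_foo.py").write_text("def bar():\n    return 'new, edited'\n")
importlib.reload(demo_foo)

stale = bar()            # our local name still points at the old function
fresh = demo_foo.bar()   # attribute lookup on the module sees the new one

print("from-imported name:", stale)
print("module attribute:  ", fresh)
```

The reload rebinds names inside the module object, but any name copied out earlier with "from ... import" keeps referring to the old object until that import statement is run again.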


I agree that there is some subtlety there, and I appreciate your
example. However the fact that Python's module system essentially
forces you to use "from foo import *" and that reload is almost
entirely incompatible with "from foo import *", I would say that
reload is essentially useless.

I'm still not sure why you believe this, since several counterexamples
were given. As an intellectual exercise, though, let's assume that
reload is totally broken and you just can't use it. Pretend it will
reformat your machine if you ever call it. Can you really think of no
other reason to use Python? You still haven't given any justification
for why a magic reload is essential to Python development when a) all
existing python development works fine without it and b) all existing
development in every other language works fine without it.
Well "from foo import Foo" is just a special case of "from foo import
*". :) It still breaks reload. It still means you're restarting your
interpreter even to do the most trivial development cycle.

You're totally fixated on reload. I don't understand this. I'm totally
positive that your traditional development experience has not been in
an environment where you could effortlessly slot in new code to a
running image. Why do you demand it from Python?

Also, the difference between "from foo import Bar" and "from foo
import *" is that the former is limited in scope (you're adding a
limited set of explicit names to your namespace) and is futureproof
(additional names exported from foo won't clash with vars in the
importing module with unknown effects). The reason why one is common
and accepted and the other is frowned upon has nothing to do with
reload().
I'm perfectly well aware that I'm not going to be able to reload a
widget in the middle of a running GUI app, for example. I'm not
looking for gotcha free, I'll settle for minimally useful.

Then reload() as is is what you want.
Here's an analogy. In C, you can do an incremental build and run your
modified application without having to first reboot your computer. In
Python, where reload() is essentially the incremental build process,
and the interpreter is essentially a virtual machine, you guys are
saying that my best option is to just "reboot" the virtual machine to
make sure I have a "clean slate". It may be the path of least
resistance, but to say that it is necessary or inevitable is 1960s
mainframe thinking.

But you do need to restart the application image. The python
interpreter is not an emulator. You're drawing incompatible analogies
and making unjustified assumptions based on them. reload() is not an
incremental build process, and starting a new Python instance is not
rebooting your machine. This is just not a justifiable comparison.
 
Alex Martelli

Martin Unsal said:
From the way you describe your workflow, it sounds like you spend very
little time working interactively in the interpreter. Is that the case
or have I misunderstood?

I often do have an interpreter open in its own window, to help me find
out something or other, but you're correct that it isn't where I "work";
I want all tests to be automated and repeatable, after all, so they're
better written as their own scripts and run in the test-framework. I
used to use a lot of doctests (often produced by copy and paste from an
interactive interpreter session), but these days I lean more and more
towards unittest and derivatives thereof.

Sometimes, when I don't immediately understand why a test is failing
(or, at times, why it's unexpectedly succeeding _before_ I have
implemented the feature it's supposed to test!-), I stick a
pdb.set_trace() call at the right spot to "look around" (and find out
how to fix the test and/or the code) -- I used to use "print" a lot for
such exploration, but the interactive interpreter started by pdb is
often handier (I can look at as many pieces of data as I need to find
out about the problem). I still prefer to run the test within the
test framework, getting interactive only at the point where I want to
be, rather than running the tests from within pdb to "set breakpoints"
manually -- not a big deal either way, I guess.
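
A rough sketch of that workflow, assuming a made-up function mean() under test. The DEBUG_TESTS environment switch is an addition for this example (so the test still runs unattended); the post itself simply places the pdb.set_trace() call by hand where it is needed.

```python
import os
import pdb
import unittest


def mean(values):
    """Hypothetical function under test."""
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_mean_of_three(self):
        result = mean([1, 2, 3])
        # Drop into the debugger right here, but only when asked to,
        # so automated runs never block waiting for input.
        if os.environ.get("DEBUG_TESTS"):
            pdb.set_trace()
        self.assertEqual(result, 2)


# Run the test inside the normal test framework, not from within pdb.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

With DEBUG_TESTS set, the run stops at the marked line and pdb's prompt allows inspecting result and any other local before continuing.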


Alex
 
Alex Martelli

Jorge Godoy said:
I believe this is a distinct case. When we write tests we're worried with the
system itself.

Not sure I get what you mean; when I write tests, just as when I write
production code, I'm focused (not worried:) about the application
functionality I'm supposed to deliver. The language mostly "gets out of
my way" -- that's why I like Python, after all:).

When using the interactive interpreter we're worried with how
to best use the language. There might be some feature of the system related
to that investigation, but there might be not. For example: "what are the
methods provided by this object?" or "which approach is faster for this loop?"

I do generally keep an interactive interpreter running in its own
window, and help and dir are probably the functions I call most often
there. If I need to microbenchmark for speed, I use timeit (which I
find far handier to use from the commandline). I wouldn't frame this as
"worried with how to best use the language" though; it's more akin to a
handy reference manual (I also keep a copy of the Nutshell handy for
exactly the same reason -- some things are best looked up on paper).

I won't write a test case to test loop speed. But I'd poke with the
interpreter and if the environment gets a bit big to setup then I'd go to the
text editor as I said.

I don't really see "getting a bit big to setup" as the motivation for
writing automated, repeatable tests (including load-tests, if speed is
such a hot topic in your case); rather, the key issue is, will you ever
want to run this again? For example, say you want to check the relative
speeds of approaches A and B -- if you do that in a way that's not
automated and repeatable (i.e., not by writing scripts), then you'll
have to repeat those manual operations exactly every time you refactor
your code, upgrade Python or your OS or some library, switch to another
system (HW or SW), etc, etc. Even if it's only three or four steps, who
needs the aggravation? Almost anything worth doing (in the realm of
testing, measuring and variously characterizing software, at least) is
worth automating, to avoid any need for repeated manual labor; that's
how you get real productivity, by doing ever less work yourself and
pushing ever more work down to your computer.
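
As an illustration of the "write it as a script so you can run it again" point, here is a small sketch using timeit. The two functions approach_a and approach_b are placeholders for whatever is being compared, and the repeat/number values are arbitrary.

```python
import timeit


def approach_a(n=1000):
    """Placeholder approach A: explicit loop with append."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def approach_b(n=1000):
    """Placeholder approach B: list comprehension."""
    return [i * i for i in range(n)]


def benchmark(repeat=3, number=200):
    """Time each approach and keep the best (minimum) of several runs."""
    results = {}
    for func in (approach_a, approach_b):
        runs = timeit.repeat(func, repeat=repeat, number=number)
        results[func.__name__] = min(runs)
    return results


timings = benchmark()
for name, seconds in sorted(timings.items()):
    print(f"{name}: {seconds:.4f}s for 200 calls")
```

Saved as a file, this can be rerun unchanged after a refactoring, a Python upgrade, or a move to different hardware.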


Alex
 
Martin Unsal

Bruno said:
<imho>
Which is not a problem. reload() is of very limited use for any
non-trivial stuff.
</imho>

Now that I've heard this from 5 different people it might be sinking
in. :) :) I really do appreciate all of you taking the time to explain
this to me.

When I started using Python a few years ago I was very excited about
the fact that it was an interpreted language and offered a more
interactive workflow than the old compile-link-test workflow. As my
project has grown to be pretty sizeable by Python standards, I tried
to continue taking advantage of the tight, reload-based, interpreted-
language workflow and it's become really cumbersome, which is
disappointing. However y'all are right, giving up on reload() doesn't
mean Python is inadequate for large projects, just that it doesn't
live up entirely to what I perceived as its initial promise. Once I
adjust my mindset and workflow for a life without reload(), I'll
probably be better off.

I'd like to point out something though. More than one of the people
who responded have implied that I am bringing my prior-language
mindset to Python, even suggesting that my brain isn't built for
Python. ;) In fact I think it's the other way around. I am struggling
to take full advantage of the fact that Python is an interpreted
language, to use Python in the most "Pythonic" way. You guys are
telling me that's broken and I should go back to a workflow that is
identical in spirit, and not necessarily any faster than I would use
with a compiled language. While that might be the right answer in
practice, I don't feel like it's a particularly "good" answer, and it
confirms my initial impression that Python package management is
broken.

I think you should be asking yourselves, "Did we all abandon reload()
because it is actually an inferior workflow, or just because it's
totally broken in Python?"

I have one question left but I'll ask that in a separate post.

Martin
 
Martin Unsal

Martin Unsal wrote:

Some of us still manage to do so without messing with PYTHONPATH.

How exactly do you manage it?

The only way I can see to do it is to have widgets/__init__.py look
something like this:

from common import util
from scrollbar import Scrollbar
from form import Form

Then Scrollbar.py doesn't have to worry about importing util, it just
assumes that util is already present in its namespace.

BUT ... this means that Scrollbar.py can only be loaded in the
interpreter as part of package "widgets". You can't run an interpreter
and type "import widgets.scrollbar.Scrollbar" and start going to town,
because Scrollbar doesn't import its own dependencies.

So what I want to clarify here: Do Python programmers try to design
packages so that each file in the package can be individually loaded
into the interpreter and will automatically import its own
dependencies? Or do you design packages so they can only be used by
importing from the top level and running the top level __init__.py?

I hope that made sense. :)

Martin
 
Diez B. Roggisch

I'd like to point out something though. More than one of the people
who responded have implied that I am bringing my prior-language
mindset to Python, even suggesting that my brain isn't built for
Python. ;) In fact I think it's the other way around. I am struggling
to take full advantage of the fact that Python is an interpreted
language, to use Python in the most "Pythonic" way. You guys are
telling me that's broken and I should go back to a workflow that is
identical in spirit, and not necessarily any faster than I would use
with a compiled language. While that might be the right answer in
practice, I don't feel like it's a particularly "good" answer, and it
confirms my initial impression that Python package management is
broken.

I think you should be asking yourselves, "Did we all abandon reload()
because it is actually an inferior workflow, or just because it's
totally broken in Python?"

Sorry, but I fail to see the point of your argumentation.

Reloading a module means that you obviously have some editor open in which
you code your module, and an interactive interpreter running where you somehow
have to make the

reload(module)

line (re-)appear, and then most probably (unless the pure reloading itself
triggers some testing code) some other line that e.g. instantiates a class
defined in "module"

Now how exactly does that differ from having a test.py file containing

import module
<do-something>

and a commandline sitting there with a

python test.py

waiting to be executed, easily brought back by a single key-stroke.

Especially if <do-something> becomes more than some easy lines brought back
by the command line history.

I've been writing python for a few years now, to programs the size of a few
K-lines, and _never_ felt the slightest need to reload anything. And as
there have been quite a few discussions like this in the past few years,
IMHO reload is a wart and should be removed.

Diez
 
Chris Mellon

How exactly do you manage it?

The only way I can see to do it is to have widgets/__init__.py look
something like this:

from common import util
from scrollbar import Scrollbar
from form import Form

Then Scrollbar.py doesn't have to worry about importing util, it just
assumes that util is already present in its namespace.

BUT ... this means that Scrollbar.py can only be loaded in the
interpreter as part of package "widgets". You can't run an interpreter
and type "import widgets.scrollbar.Scrollbar" and start going to town,
because Scrollbar doesn't import its own dependencies.

So what I want to clarify here: Do Python programmers try to design
packages so that each file in the package can be individually loaded
into the interpreter and will automatically import its own
dependencies? Or do you design packages so they can only be used by
importing from the top level and running the top level __init__.py?

I hope that made sense. :)

Scrollbar *can't* assume that util will be present in its namespace,
because it won't be unless it imports it. Scrollbar needs to import
its own dependencies. But why do you think that's a problem?
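
A small sketch of why that is so, written for Python 3. The package name demo_widgets and its files are invented for this example (they stand in for the widgets layout discussed above): importing util in the package's __init__.py does not put it into the namespace of the other modules in the package.

```python
import pathlib
import sys
import tempfile

# Build a throwaway package; "demo_widgets" and its contents are hypothetical.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_widgets"
pkg.mkdir()
(pkg / "util.py").write_text(
    "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n")
# __init__ imports util, but only into the package's own namespace.
(pkg / "__init__.py").write_text("from . import util\n")
# This module assumes util is "already there". It is not.
(pkg / "bad_scrollbar.py").write_text(
    "def position(x):\n    return util.clamp(x, 0, 100)\n")
# This module imports its own dependency.
(pkg / "good_scrollbar.py").write_text(
    "from demo_widgets import util\n\n"
    "def position(x):\n    return util.clamp(x, 0, 100)\n")

sys.path.insert(0, str(root))
from demo_widgets import bad_scrollbar, good_scrollbar

try:
    bad_scrollbar.position(150)
    bad_outcome = "no error"
except NameError as exc:
    bad_outcome = "NameError: %s" % exc

good_outcome = good_scrollbar.position(150)
print(bad_outcome)
print(good_outcome)
```

Each module has its own global namespace, so the version that imports util itself works regardless of how or from where it is loaded.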
 
Martin Unsal

Because you're advocating single class per file.

What I actually said was "Smallest practical functional block." I
never said one class per file, in fact I generally have more than one
class per file. Nonetheless I frequently have a class which has the
same name as the file it's contained in, which is where I start having
trouble.
What you said was A scan through the
standard library may be instructive, where there are some modules that
expose a single class (StringIO, pprint) and others that expose many,
and some that expose none at all.

AHA! Here we see the insidious Python package system at work! ;)

I said "file" and you assume that I am talking about the exposed
namespace. Files should not have to be isomorphic with namespace! A
package that exposes many classes may still use one class per file if
it wants to.
In any
case, as I hinted at, I prefer an organic, developer driven approach
to deciding these things, not handed down from above style guidelines.

PRECISELY. And in the case of Python, package structure is dictated,
not by a style guideline, but by the design flaws of Python's package
system.

Martin
 
Chris Mellon

What I actually said was "Smallest practical functional block." I
never said one class per file, in fact I generally have more than one
class per file. Nonetheless I frequently have a class which has the
same name as the file it's contained in, which is where I start having
trouble.

You do? Or do you only have trouble because you don't like using "from
foo import Foo" because you need to do more work to reload such an
import?
AHA! Here we see the insidious Python package system at work! ;)

I said "file" and you assume that I am talking about the exposed
namespace. Files should not have to be isomorphic with namespace! A
package that exposes many classes may still use one class per file if
it wants to.

What makes you think that the exposed namespace has to be isomorphic
with the filesystem? Further, why do you think doing so is bad? People
do it because it's convenient and simple, not because it's necessary.
Why don't you like filesystems?
PRECISELY. And in the case of Python, package structure is dictated,
not by a style guideline, but by the design flaws of Python's package
system.

What design flaws are those? Is it because you're trying to have
packages as part of your project without installing them on your
PYTHONPATH somewhere?

If you want to break a module internally into multiple files, then
make it a package. To an importer, they're almost indistinguishable.
If you want to break a module into multiple packages and then stick
the files that make up the package in bizarre spots all over the
filesystem, can you give a reason why?
 
Martin Unsal

Scrollbar *can't* assume that util will be present in its namespace,
because it won't be unless it imports it. Scrollbar needs to import
its own dependencies. But why do you think that's a problem?

OK, maybe I'm totally missing something here, but you can't do
"import ../util/common" in Python can you?

Look at the directory structure in my original post. How does
Scrollbar.py import its dependencies from common.py, without relying
on PYTHONPATH?

Martin
 
Chris Mellon

OK, maybe I'm totally missing something here, but you can't do
"import ../util/common" in Python can you?

Look at the directory structure in my original post. How does
Scrollbar.py import its dependencies from common.py, without relying
on PYTHONPATH?

It assumes that util.common is a module that's on the PYTHONPATH.

The common way to ensure that this is the case is either to handle
util as a separate project, and install it into the system
site-packages just as you would any third party package, or to have it
(and all your other application packages and modules) off a single
root which is where your application "base" scripts live.

This, and other intra-package import issues are affected by the
relative/absolute import changes that were begun in Python 2.5, you
can read about them here: http://www.python.org/dev/peps/pep-0328/

Note that using relative imports to import a package that "happens" to
share a common higher level directory would be frowned upon. The
"blessed" mechanism would still be to use an absolute import, and to
install the other package on the PYTHONPATH in one of any number of
ways.
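
To make the two import styles concrete, here is a sketch for Python 3 using an invented layout: a parent package demo_app with subpackages common and widgets (all names hypothetical). It covers only the case where both live under a genuine parent package whose root directory is on the import path; it does not endorse reaching across unrelated directories.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical layout:
#   demo_app/common/util.py
#   demo_app/widgets/scrollbar.py
root = pathlib.Path(tempfile.mkdtemp())
files = {
    "demo_app/__init__.py": "",
    "demo_app/common/__init__.py": "",
    "demo_app/common/util.py": "def greet():\n    return 'hello from util'\n",
    "demo_app/widgets/__init__.py": "",
    "demo_app/widgets/scrollbar.py": (
        # Absolute import: resolved from the import path.
        "from demo_app.common import util as util_abs\n"
        # Explicit relative import (PEP 328): resolved from this package.
        "from ..common import util as util_rel\n"
        "SAME_MODULE = util_abs is util_rel\n"
    ),
}
for rel_path, source in files.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

# Only the single project root goes on the path.
sys.path.insert(0, str(root))
scrollbar = importlib.import_module("demo_app.widgets.scrollbar")

print(scrollbar.util_abs.greet())
print("same module object:", scrollbar.SAME_MODULE)
```

Both spellings end up at the same module object; the absolute form depends on the root being importable, while the relative form depends on scrollbar being loaded as part of demo_app.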
 
Dennis Lee Bieber

On 6 Mar 2007 08:24:04 -0800, "Martin Unsal" <[email protected]>
declaimed the following in comp.lang.python:

<dropping into the thread late, with just a quick off-the-wall
comment>
Python. ;) In fact I think it's the other way around. I am struggling
to take full advantage of the fact that Python is an interpreted
language, to use Python in the most "Pythonic" way. You guys are

At its core, Python is compiled to byte-code. "import" is a dynamic
"link with optional compile (if no .pyc/.pyo file is found)". This
interpretation makes it closer to some of the "load&go" compilers found
in educational contexts (though my experience is only with the Sigma
FLAG -- FORTRAN Load And Go -- compiler; feed it the source file, watch
it compile, link, and execute... all from ONE command "flag
telling me that's broken and I should go back to a workflow that is
identical in spirit, and not necessarily any faster than I would use
with a compiled language. While that might be the right answer in
I think you should be asking yourselves, "Did we all abandon reload()
because it is actually an inferior workflow, or just because it's
totally broken in Python?"

The only usage I've ever made of "reload()" has been during
interactive debugging: Modify the module, then reload it at the
interactive prompt so I could create an instance of the modified code,
and manually manipulate it. I've never seen reload() as being useful for
"delivered" programs (ie, used programmatically).
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Martin Unsal

You do? Or do you only have trouble because you don't like using "from
foo import Foo" because you need to do more work to reload such an
import?

More work, like rewriting __import__ and reload??? :)

There's a point where you should blame the language, not the
programmer. Are you saying I'm lazy just because I don't want to mess
with __import__?
What makes you think that the exposed namespace has to be isomorphic
with the filesystem?

I don't; you do!

I was clearly talking about files and you assumed I was talking about
namespace. That's Pythonic thinking... and I don't mean that in a good
way!
If you want to break a module into multiple packages and then stick
the files that make up the package in bizarre spots all over the
filesystem, can you give a reason why?

Because I have written a project with 50,000 lines of Python and I'm
trying to organize it in such a way that it'll scale up cleanly by
another order of magnitude. Because I've worked on projects with
millions of lines of code and I know about how such things are
organized. It's funny, I'm a newbie to Python but it seems like I'm
one of the only people here thinking about it as a large scale
development language rather than a scripting language.

Martin
 
Martin Unsal

The only usage I've ever made of "reload()" has been during
interactive debugging: Modify the module, then reload it at the
interactive prompt so I could create an instance of the modified code,
and manually manipulate it.

That's exactly what I want to do. That's exactly what I'm having
trouble with.

Martin
 
Chris Mellon

More work, like rewriting __import__ and reload??? :)

There's a point where you should blame the language, not the
programmer. Are you saying I'm lazy just because I don't want to mess
with __import__?

You have to reload the importing module as well as the module that
changed. That doesn't require rewriting the import infrastructure.
It's only an issue because you're changing things at one level but
you're trying to use them at a level removed from that. I never work
that way, because I only have any need or desire to reload when I'm
working interactively and when I'm doing that I work directly with
the modules I'm changing. The interfaces are what my unit tests are
for. If you're doing stuff complicated and intricate enough in the
interpreter that you need reload() to do very much more than it's
doing, then you're working poorly - that sort of operation should be
in a file you can run and test automatically.
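
A sketch of the "reload the importing module as well" advice, for Python 3 (importlib.reload). The module names demo_low and demo_high are invented: demo_high pulls a name out of demo_low with a from-import.

```python
import importlib
import pathlib
import sys
import tempfile

# Avoid writing .pyc files so edited sources are always recompiled.
sys.dont_write_bytecode = True

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "demo_low.py").write_text("VALUE = 1\n")
(tmp / "demo_high.py").write_text(
    "from demo_low import VALUE\n\n"
    "def get():\n    return VALUE\n")
sys.path.insert(0, str(tmp))

import demo_low
import demo_high

before = demo_high.get()

# Simulate editing the lower-level module.
(tmp / "demo_low.py").write_text("VALUE = 22\n")

# Reloading only the changed module is not enough...
importlib.reload(demo_low)
after_low_only = demo_high.get()

# ...the module that did the from-import must be reloaded too,
# and after the module it depends on.
importlib.reload(demo_high)
after_both = demo_high.get()

print(before, after_low_only, after_both)
```

The order matters: reloading demo_high re-executes its from-import, which only picks up the new value if demo_low has already been reloaded.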
I don't; you do!

I was clearly talking about files and you assumed I was talking about
namespace. That's Pythonic thinking... and I don't mean that in a good
way!

All the files on the PYTHONPATH will map into the namespace. However,
you can have items in the namespace that do not map to files. The main
reasons to do so are related to deployment, not development though so
I wonder why you want to.
Because I have written a project with 50,000 lines of Python and I'm
trying to organize it in such a way that it'll scale up cleanly by
another order of magnitude. Because I've worked on projects with
millions of lines of code and I know about how such things are
organized. It's funny, I'm a newbie to Python but it seems like I'm
one of the only people here thinking about it as a large scale
development language rather than a scripting language.

That's not answering the question. Presumably you have some sort of
organization for your code in mind. What about that organization
doesn't work for Python? If you want multiple files to map to a single
module, make them a package.
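
One way this is commonly done, sketched for Python 3 with invented names (demo_shapes, Circle, Square): each class lives in its own file, and the package's __init__.py re-exports them so that importers see a single flat namespace.

```python
import pathlib
import sys
import tempfile

# Hypothetical package with one class per file.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_shapes"
pkg.mkdir()
(pkg / "_circle.py").write_text("class Circle:\n    sides = 0\n")
(pkg / "_square.py").write_text("class Square:\n    sides = 4\n")
# The package namespace is assembled by hand in __init__.py.
(pkg / "__init__.py").write_text(
    "from ._circle import Circle\n"
    "from ._square import Square\n"
    "__all__ = ['Circle', 'Square']\n")

sys.path.insert(0, str(root))
import demo_shapes
from demo_shapes import Circle, Square

# Importers never need to know which file a class came from.
print(Circle.__module__, Square.__module__)
print(demo_shapes.__all__)
```

The file layout and the exposed namespace are decoupled here: files can be split or merged later by editing only __init__.py.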
 
Bruno Desthuilliers

Diez B. Roggisch wrote:
Sorry, but I fail to see the point of your argumentation.

Reloading a module means that you obviously have some editor open in which
you code your module, and an interactive interpreter running where you somehow
have to make the

reload(module)

line (re-)appear, and then most probably (unless the pure reloading itself
triggers some testing code) some other line that e.g. instantiates a class
defined in "module"

Now how exactly does that differ from having a test.py file containing

import module
<do-something>

and a commandline sitting there with a

python test.py

Actually, make it
python -i test.py

Then you have test.py executed, and your interactive interpreter up and
ready in the desired state.
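
For illustration, a test.py in this style might look like the sketch below. The standard library's fractions module is used purely as a stand-in for the "module" being worked on, and the variable names are arbitrary.

```python
# test.py (hypothetical): run as `python -i test.py`
# The setup below executes first; the interpreter prompt then opens with
# every name defined here still available for poking around.
import fractions as module  # stand-in for the module under development

half = module.Fraction(1, 2)
third = module.Fraction(1, 3)
total = half + third

print("ready: half, third, total are defined; total =", total)
```

After an edit to the real module, quitting and rerunning the same command line rebuilds exactly this state in a fresh interpreter, with no reload() involved.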
 
Matthew Woodcraft

Martin Unsal said:
We could discuss this till we're blue in the face but it's beside the
point. For any given project, architecture, and workflow, the
developers are going to have a preference for how to organize the
code structurally into files, directories, packages, etc. The
language itself should not place constraints on them.

I agree.

For example, say you want to organize the widgets package as follows:
widgets/scrollbar/*.py
widgets/form/*.py
widgets/common/util.py

One possibility is to have one module for each namespace that you want,
and compose each module out of multiple files by using execfile().
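
A sketch of that idea for Python 3, where execfile() no longer exists and exec(compile(...)) plays the same role. The helper compose_module and the part_*.py file names are invented for this example; this is one possible reading of the suggestion, not an established recipe.

```python
import pathlib
import sys
import tempfile
import types

# Two hypothetical source files that should end up in one namespace.
src = pathlib.Path(tempfile.mkdtemp())
(src / "part_scrollbar.py").write_text("class Scrollbar:\n    kind = 'scrollbar'\n")
(src / "part_form.py").write_text("class Form:\n    kind = 'form'\n")


def compose_module(name, paths):
    """Execute several source files into a single new module object."""
    module = types.ModuleType(name)
    for path in paths:
        code = compile(path.read_text(), str(path), "exec")
        exec(code, module.__dict__)  # Python 3 equivalent of execfile()
    sys.modules[name] = module  # make it importable by name
    return module


demo = compose_module("demo_composed_widgets", sorted(src.glob("part_*.py")))
from demo_composed_widgets import Form, Scrollbar

print(Form.kind, Scrollbar.kind, Scrollbar.__module__)
```

All the files share one global namespace here, so they can see each other's names without importing, at the cost of losing per-file isolation.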

-M-
 
Martin Unsal

It assumes that util.common is a module that's on the PYTHONPATH.

Now we're getting somewhere. :)
The common way to ensure that this is the case is either to handle
util as a separate project, and install it into the system
site-packages just as you would any third party package,

This breaks if you ever need to test more than one branch of the same
code base. I use a release branch and a development branch. Only the
release branch goes into site-packages, but obviously I do most of my
work in the development branch.
or to have it
(and all your other application packages and modules) off a single
root which is where your application "base" scripts live.

This has SERIOUS scaling problems.
This, and other intra-package import issues are affected by the
relative/absolute import changes that were begun in Python 2.5, you
can read about them here: http://www.python.org/dev/peps/pep-0328/

Awesome! Thanks. I'll take a look.
Note that using relative imports to import a package that "happens" to
share a common higher level directory would be frowned upon.

What if it shares a common higher level directory by design? :)

Relative imports aren't ideal, but I think in some cases it's better
than relying on PYTHONPATH which is global state (an environment
variable no less).

Martin
 
Chris Mellon

Now we're getting somewhere. :)


This breaks if you ever need to test more than one branch of the same
code base. I use a release branch and a development branch. Only the
release branch goes into site-packages, but obviously I do most of my
work in the development branch.

There are a number of solutions. They do involve manipulation of
PYTHONPATH or creation of infrastructure, though. I find that I
generally work "against" only one version of a package at a time, so
it's not any trouble for me to create a local directory that has all
the versions I'm working against. Testing infrastructure manipulates
PYTHONPATH to ensure it's testing the version it's supposed to.
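
A sketch of what such test infrastructure could look like, assuming two checkouts named release and development that each contain a module demo_util (directory names, module name, and version strings are all invented). Each run gets its own PYTHONPATH through the child process environment, so nothing global is changed.

```python
import os
import pathlib
import subprocess
import sys
import tempfile

# Fake two branches of the same code base in separate directories.
base = pathlib.Path(tempfile.mkdtemp())
for branch, version in (("release", "1.0"), ("development", "1.1-dev")):
    (base / branch).mkdir()
    (base / branch / "demo_util.py").write_text(f"VERSION = {version!r}\n")


def run_against(branch):
    """Run a tiny check in a child interpreter that sees only one branch."""
    env = dict(os.environ, PYTHONPATH=str(base / branch))
    result = subprocess.run(
        [sys.executable, "-c", "import demo_util; print(demo_util.VERSION)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


versions = {branch: run_against(branch) for branch in ("release", "development")}
print(versions)
```

The same pattern extends to running a whole test suite per branch by swapping the -c snippet for the test runner command.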
This has SERIOUS scaling problems.

If you have lots of modules used by lots of "things" it can be. Not
necessarily though, it depends on how you package and deploy them.
It's often the best solution to the above issue when it comes to
testing, though.
Awesome! Thanks. I'll take a look.


What if it shares a common higher level directory by design? :)

Then it's a subpackage of a parent package. That's different than just
walking up to wherever you did your RCS checkout.
Relative imports aren't ideal, but I think in some cases it's better
than relying on PYTHONPATH which is global state (an environment
variable no less).

Environment and manipulation of it is the job of the top level
script/application/whatever. Modules/packages/whatever should rely on
PYTHONPATH being sane.
 
Martin Unsal

You have to reload the importing module as well as the module that
changed. That doesn't require rewriting the import infrastructure.

As far as I can tell, the moment you use "from foo_module import bar",
you've broken reload(). Reloading higher level packages doesn't help.
The only practical solution I can see is to rewrite __import__ and
reload.
That's not answering the question. Presumably you have some sort of
organization for your code in mind.

I already gave a simple example. I thought you were asking why I would
want to organize code that way, and the only short answer is
experience. I'd prefer not to try to formulate a long answer because
it would be time consuming and somewhat off topic, but we can go there
if necessary.

Martin
 
