Project organization and import


Martin Unsal

I'm using Python for what is becoming a sizeable project and I'm
already running into problems organizing code and importing packages.
I feel like the Python package system, in particular the isomorphism
between filesystem and namespace, isn't very well suited for
big projects. However, I might not really understand the Pythonic way.
I'm not sure if I have a specific question here, just a general plea
for advice.

1) Namespace. Python wants my namespace hierarchy to match my
filesystem hierarchy. I find that a well organized filesystem
hierarchy for a nontrivial project will be totally unwieldy as a
namespace. I'm either forced to use long namespace prefixes, or I'm
forced to use "from foo import *" and __all__, which has its own set
of problems.

1a) Module/class collision. I like to use the primary class in a file
as the name of the file. However this can lead to namespace collisions
between the module name and the class name. Also it means that I'm
going to be stuck with the odious and wasteful syntax foo.foo
everywhere, or forced to use "from foo import *".

1b) The Pythonic way seems to be to put more stuff in one file, but I
believe this is categorically the wrong thing to do in large projects.
The moment you have more than one developer along with a revision
control system, you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

2) Importing and reloading. I want to be able to reload changes
without exiting the interpreter. This pretty much excludes "from foo
import *", unless you resort to this sort of hack:

http://www.python.org/search/hypermail/python-1993/0448.html

Has anyone found a systematic way to solve the problem of reloading in
an interactive interpreter when using "from foo import *"?


I appreciate any advice I can get from the community.

Martin
 

Jorge Godoy

Martin Unsal said:
1) Namespace. Python wants my namespace hierarchy to match my filesystem
hierarchy. I find that a well organized filesystem hierarchy for a
nontrivial project will be totally unwieldy as a namespace. I'm either
forced to use long namespace prefixes, or I'm forced to use "from foo import
*" and __all__, which has its own set of problems.

I find it nice. You get an idea of where something is just from the import,
and you don't have to search for it everywhere. Isn't, e.g., Java like that?
(It's been so long since I last dealt with Java that I don't remember if
this is mandatory or just a convention...)

You might get bitten by that when moving files from one OS to another,
especially if one of them ignores case and the other is strict about
it.
1a) Module/class collision. I like to use the primary class in a file as the
name of the file. However this can lead to namespace collisions between the
module name and the class name. Also it means that I'm going to be stuck
with the odious and wasteful syntax foo.foo everywhere, or forced to use
"from foo import *".

Your classes should be CamelCased and start with an uppercase letter. So
you'd have foo.Foo, where "foo" is the package and "Foo" the class inside it.
1b) The Pythonic way seems to be to put more stuff in one file, but I
believe this is categorically the wrong thing to do in large projects. The
moment you have more than one developer along with a revision control
system, you're going to want files to contain the smallest practical
functional blocks. I feel pretty confident saying that "put more stuff in
one file" is the wrong answer, even if it is the Pythonic answer.

Why? RCS systems can merge changes. An RCS system is not a substitute for
design or programmer communication. You'll only have a problem if two people
change the same line of code, and if they are doing that (and worse: doing it
often) then you have a bigger problem than just the contents of the file.

Unit tests help ensure that one change doesn't break the project as a
whole, and for a big project you're surely going to have a lot of those tests.

If one change breaks another, then there is a disagreement on the application
design and more communication is needed between developers or a better
documentation of the API they're implementing / using.
2) Importing and reloading. I want to be able to reload changes without
exiting the interpreter. This pretty much excludes "from foo import *",
unless you resort to this sort of hack:

http://www.python.org/search/hypermail/python-1993/0448.html

Has anyone found a systematic way to solve the problem of reloading in an
interactive interpreter when using "from foo import *"?

I don't reload... When my investigative tests get bigger I write a script
and run it with the interpreter. It is easy since my text editor can call
Python on a buffer (I use Emacs).
I appreciate any advice I can get from the community.

This is just how I deal with it... My biggest "project" has several modules
now, each with its own namespace and package. The API is thoroughly
documented and took the most work to get done.

Using setuptools, entrypoints, etc. helps a lot as well.


The thing is that for big projects your design is the most important part.
Get it right and you won't have problems with namespaces and filenames. If
you don't dedicate enough time to this task you'll find yourself in trouble
really soon.
 

bruno.desthuilliers

I'm using Python for what is becoming a sizeable project and I'm
already running into problems organizing code and importing packages.
I feel like the Python package system, in particular the isomorphism
between filesystem and namespace,

It's not necessarily a 1:1 mapping. Remember that you can put code in
the __init__.py of a package, and that this code can import names from
sub-packages/modules, making the package's internal organisation
transparent to user code (I've quite often started with a simple
module, later turning it into a package as the source code was
growing too big).
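To make that concrete, here's a minimal sketch (the package and class names are made up for illustration, and the example builds the package in a temp directory just so it's self-contained): the package's __init__.py pulls a name up from a submodule, so user code never sees the internal layout.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway "widgets" package on disk; in a real project
# these files already exist in your source tree.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "widgets")
os.makedirs(pkg)

# widgets/scrollbar.py -- an internal module holding one class.
with open(os.path.join(pkg, "scrollbar.py"), "w") as f:
    f.write("class ScrollBar:\n    pass\n")

# widgets/__init__.py -- re-exports the class at package level.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from widgets.scrollbar import ScrollBar\n")

# Drop any cached module named "widgets" so we import our fresh one.
for name in list(sys.modules):
    if name == "widgets" or name.startswith("widgets."):
        del sys.modules[name]

sys.path.insert(0, root)
importlib.invalidate_caches()
import widgets

# Users write widgets.ScrollBar() and never care that the class
# actually lives in widgets/scrollbar.py.
bar = widgets.ScrollBar()
print(type(bar).__name__)
```

If scrollbar.py later splits into a subpackage, only __init__.py changes; callers keep writing widgets.ScrollBar.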
doesn't seem very well suited for
big projects. However, I might not really understand the Pythonic way.

cf above.
I'm not sure if I have a specific question here, just a general plea
for advice.

1) Namespace. Python wants my namespace hierarchy to match my
filesystem hierarchy. I find that a well organized filesystem
hierarchy for a nontrivial project will be totally unwieldy as a
namespace. I'm either forced to use long namespace prefixes, or I'm
forced to use "from foo import *" and __all__, which has its own set
of problems.

cf above. Also remember that you can "import as", ie:

import some_package.some_subpackage.some_module as some_module
1a) Module/class collision. I like to use the primary class in a file
as the name of the file.

Bad form IMHO. Packages and module names should be all_lower,
classnames CamelCased.
1b) The Pythonic way seems to be to put more stuff in one file,

Pythonic way is to group together highly related stuff. Not to "put
more stuff".
but I
believe this is categorically the wrong thing to do in large projects.

Oh yes ? Why ?
The moment you have more than one developer along with a revision
control system,

You *always* have a revision control system, don't you ? And having more than
one developer on a project - be it big or small - is quite common.
you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

Is this actually based on working experience ? It seems that there are
enough non-trivial Python projects around to prove that it works just
fine.
 

Martin Unsal

Remember that you can put code in
the __init__.py of a package, and that this code can import names from
sub-packages/modules, making the package's internal organisation
transparent to user code

Sure, but that doesn't solve the problem.

Say you have a package "widgets" with classes ScrollBar, Form, etc.
You want the end user to "import widgets" and then invoke
"widgets.ScrollBar()". As far as I know there are only two ways to do
this, both seriously flawed: 1) Put all your code in one module
widgets.py, 2) use "from scrollbar import *" in widgets/__init__.py,
which is semi-deprecated and breaks reload().
Also remember that you can "import as", ie:

import some_package.some_subpackage.some_module as some_module

Sure but that doesn't eliminate the unfortunate interaction between
Python class organization and filesystem hierarchy. For example, say
you want to organize the widgets package as follows:

widgets/scrollbar/*.py
widgets/form/*.py
widgets/common/util.py

Other than messing around with PYTHONPATH, which is horrible, I don't
see how to import util.py from the widget code.
Bad form IMHO. Packages and module names should be all_lower,
classnames CamelCased.

You're still stuck doing foo.Foo() everywhere in your client code,
which is ugly and wastes space, or using "from foo import *" which is
broken.
Oh yes ? Why ?

For myriad reasons, just one of them being the one I stated -- smaller
files with one functional unit each are more amenable to source code
management with multiple developers.

We could discuss this till we're blue in the face but it's beside the
point. For any given project, architecture, and workflow, the
developers are going to have a preference for how to organize the code
structurally into files, directories, packages, etc. The language
itself should not place constraints on them. The mere fact that it is
supposedly "Pythonic" to put more functionality in one file indicates
to me that the Python package system is obstructing some of its users
who have perfectly good reasons to organize their code differently.
Is this actually based on working experience ? It seems that there are
enough not-trivial Python projects around to prove that it works just
fine.

Yes. I've worked extensively on several projects in several languages
with multiple millions of lines of code, and they invariably have coding
styles that recommend one functional unit (such as a class), or at
most a few closely related functional units per file.

In Python, most of the large projects I've looked at use "from foo
import *" liberally.

I guess my question boils down to this. Is "from foo import *" really
deprecated or not? If everyone has to use "from foo import *" despite
the problems it causes, how do they work around those problems (such
as reloading)?

Martin
 

Martin Unsal

Jorge, thanks for your response. I replied earlier but I think my
response got lost. I'm trying again.

Why? RCS systems can merge changes. An RCS system is not a substitute for
design or programmer communication.

Text merges are an error-prone process. They can't be eliminated but
they are best avoided when possible.

When refactoring, it's much better to move small files around than to
move chunks of code between large files. In the former case your SCM
system can track integration history, which is a big win.
Unit tests help ensure that one change doesn't break the project as a
whole, and for a big project you're surely going to have a lot of those tests.

But unit tests are never an excuse for error prone workflow. "Oh,
don't worry, we'll catch that with unit tests" is never something you
want to say or hear.
I don't reload... When my investigative tests get bigger I write a script
and run it with the interpreter. It is easy since my text editor can call
Python on a buffer (I use Emacs).

That's interesting, is this workflow pretty universal in the Python
world?

I guess that seems unfortunate to me, one of the big wins for
interpreted languages is to make the development cycle as short and
interactive as possible. As I see it, the Python way should be to
reload a file and reinvoke the class directly, not to restart the
interpreter, load an entire package and then run a test script to set
up your test conditions again.

Martin
 

Chris Mellon

Jorge, thanks for your response. I replied earlier but I think my
response got lost. I'm trying again.



Text merges are an error-prone process. They can't be eliminated but
they are best avoided when possible.

When refactoring, it's much better to move small files around than to
move chunks of code between large files. In the former case your SCM
system can track integration history, which is a big win.


But unit tests are never an excuse for error prone workflow. "Oh,
don't worry, we'll catch that with unit tests" is never something you
want to say or hear.

That's actually the exact benefit of unit testing, but I don't feel
that you've actually made a case that this workflow is error prone.
You often have multiple developers working on the same parts of the
same module?
That's interesting, is this workflow pretty universal in the Python
world?

I guess that seems unfortunate to me, one of the big wins for
interpreted languages is to make the development cycle as short and
interactive as possible. As I see it, the Python way should be to
reload a file and reinvoke the class directly, not to restart the
interpreter, load an entire package and then run a test script to set
up your test conditions again.

If you don't do this, you aren't really testing your changes, you're
testing your reload() machinery. You seem to have a lot of views about
what the "Python way" should be and those are at odds with the actual
way people work with Python. I'm not (necessarily) saying you're
wrong, but you seem to be coming at this from a confrontational
standpoint.

Your claim, for example, that the language shouldn't place constraints
on how you manage your modules is questionable. I think it's more
likely that you've developed a workflow based around the constraints
(and abilities) of other languages and you're now expecting Python to
conform to that instead of its own.

I've copied some of your responses from your earlier post below:
Yes. I've worked extensively on several projects in several languages
with multiple millions of lines of code, and they invariably have coding
styles that recommend one functional unit (such as a class), or at
most a few closely related functional units per file.

I wonder if you've ever asked yourself why this is the case. I know
from my own experience why it's done in traditional C++/C environments
- it's because compiling is slow and breaking things into as many
files (with as few interdependencies) as possible speeds up the
compilation process. Absent this need (which doesn't exist in Python),
what benefit is there to separating out related functionality into
multiple files? Don't split them up just because you've done so in the
past - know why you did it in the past and if those conditions still
apply. Don't split them up until it makes sense for *this* project,
not the one you did last year or 10 years ago.
I guess my question boils down to this. Is "from foo import *" really
deprecated or not? If everyone has to use "from foo import *" despite
the problems it causes, how do they work around those problems (such
as reloading)?

from foo import * is a bad idea at the top level because it pollutes
your local namespace. In a package __init__, which exists expressly
for the purpose of exposing its interior namespaces as a single flat
one, it makes perfect sense. In some cases you don't want to export
everything, which is when __all__ starts to make sense. Clients of a
package (or a module) shouldn't use from foo import * without a good
reason. Nobody I know uses reload() for anything more than trivial "as
you work" testing in the interpreter. It's not reliable or recommended
for anything other than that. It's not hard to restart a shell,
especially if you use ipython (which can save and re-create a session)
or a script that's set up to create your testing environment. This is
still a much faster way than compiling any but the most trivial of
C/C++ modules. In fact, on my system startup time for the interpreter
is roughly the same as the "startup time" of my compiler (that is to
say, the amount of time it takes deciding what it's going to compile,
without actually compiling anything).
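A small runnable sketch of the __all__ point (the module and class names are invented for illustration): only names listed in __all__ are pulled in by a star-import.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module that exports only part of what it defines.
root = tempfile.mkdtemp()
with open(os.path.join(root, "shapes.py"), "w") as f:
    f.write(
        "__all__ = ['Circle']\n"      # the star-import contract
        "class Circle:\n    pass\n"
        "class Square:\n    pass\n"   # defined, but not exported by *
    )

sys.path.insert(0, root)
importlib.invalidate_caches()

# Simulate "from shapes import *" into a fresh namespace.
ns = {}
exec("from shapes import *", ns)
exported = sorted(k for k in ns if not k.startswith("__"))
print(exported)  # Square stays hidden; only Circle is exported
```

Square is still reachable as shapes.Square; __all__ only controls what the star-import copies.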
You're still stuck doing foo.Foo() everywhere in your client code,
which is ugly and wastes space, or using "from foo import *" which is
broken.

If you don't like working with explicit namespaces, you've probably
chosen the wrong language. If you have a specific name (or a few
names) which you use all the time from a module, then you can import
just those names into your local namespace to save on typing. You can
also alias deeply nested names to something more shallow.
For myriad reasons, just one of them being the one I stated -- smaller
files with one functional unit each are more amenable to source code
management with multiple developers.

I propose that the technique most amenable to source code management
is for a single file (or RCS level module, if you have a locking RCS)
to have everything that it makes sense to edit or change for a
specific feature. This is an impossible goal in practice (because you
will inevitably and necessarily have intermodule dependencies) but
your developers don't write code based around individual files. They
base it around the systems and the interfaces that compose your
project. It makes no more sense to arbitrarily break them into
multiple files than it does to arbitrarily leave them all in a single
file.

In summary: I think you've bound yourself to a style of source
management that made sense in the past without reanalyzing it to see
if it makes sense now. Trust your judgment and that of your developers
when it comes to modularization. When they end up needing to merge all
the time because they're conflicting with someone else's work, they'll
break things up into modules.

You're also placing far too much emphasis on reload. Focus yourself on
unit tests and environment scripts instead. These are more reliable
and easier to validate than reload() in a shell.
 

Martin Unsal

That's actually the exact benefit of unit testing, but I don't feel
that you've actually made a case that this workflow is error prone.
You often have multiple developers working on the same parts of the
same module?

Protecting your head is the exact benefit of bike helmets; that
doesn't mean you should bike more recklessly just because you're
wearing a helmet. :)

Doing text merges is more error prone than not doing them. :)

There are myriad other benefits of breaking up large files into
functional units. Integration history, refactoring, reuse, as I
mentioned. Better clarity of design. Easier communication and
coordination within a team. What's the down side? What's the advantage
of big files with many functional units?
If you don't do this, you aren't really testing your changes, you're
testing your reload() machinery.

Only because reload() is hard in Python! ;)
You seem to have a lot of views about
what the "Python way" should be and those are at odds with the actual
way people work with Python. I'm not (necessarily) saying you're
wrong, but you seem to be coming at this from a confrontational
standpoint.

When I refer to "Pythonic" all I'm talking about is what I've read
here and observed in other people's code. I'm here looking for more
information about how other people work, to see if there are good
solutions to the problems I see.

However when I talk about what I think is "wrong" with the Pythonic
way, obviously that's just my opinion formed by my own experience.
Your claim, for example, that the language shouldn't place constraints
on how you manage your modules is questionable. I think it's more
likely that you've developed a workflow based around the constraints
(and abilities) of other languages and you're now expecting Python to
conform to that instead of its own.

I don't think so; I'm observing things that are common to several
projects in several languages.
I wonder if you've ever asked yourself why this is the case. I know
from my own experience why it's done in traditional C++/C environments
- it's because compiling is slow and breaking things into as many
files (with as few interdependencies) as possible speeds up the
compilation process.

I don't think that's actually true. Fewer, bigger compilation units
actually compile faster in C, at least in my experience.
Absent this need (which doesn't exist in Python),

Python still takes time to load & "precompile". That time is becoming
significant for me even in a modest sized project; I imagine it would
be pretty awful in a multimillion line project.

No matter how fast it is, I'd rather reload one module than exit my
interpreter and reload the entire world.

This is not a problem for Python as a scripting language. It is a real
problem for Python as a world-class application development language.
In a package __init__, which exists expressly
for the purpose of exposing its interior namespaces as a single flat
one, it makes perfect sense.

OK! That's good info, thanks.
Nobody I know uses reload() for anything more than trivial "as
you work" testing in the interpreter. It's not reliable or recommended
for anything other than that.

That too... although I think that's unfortunate. If reload() were
reliable, would you use it? Do you think it's inherently unreliable,
that is, it couldn't be fixed without fundamentally breaking the
Python language core?
This is
still a much faster way than compiling any but the most trivial of
C/C++ modules.

I'm with you there! I love Python and I'd never go back to C/C++. That
doesn't change my opinion that Python's import mechanism is an
impediment to developing large projects in the language.
If you don't like working with explicit namespaces, you've probably
chosen the wrong language.

I never said that. I like foo.Bar(), I just don't like typing
foo.Foo() and bar.Bar(), which is a waste of space; syntax without
semantics.
I propose that the technique most amenable to source code management
is for a single file (or RCS level module, if you have a locking RCS)
to have everything that it makes sense to edit or change for a
specific feature.

Oh, I agree completely. I think we're using the exact same criterion.
A class is a self-contained feature with a well defined interface,
just what you'd want to put in its own file. (Obviously there are
trivial classes which don't implement features, and they don't need
their own files.)
You're also placing far too much emphasis on reload. Focus yourself on
unit tests and environment scripts instead. These are more reliable
and easier to validate than reload() in a shell.

I think this is the crux of my frustration. I think reload() is
unreliable and hard to validate because Python's package management is
broken. I appreciate your suggestion of alternatives and I think I
need to come to terms with the fact that reload() is just broken. That
doesn't mean it has to be that way or that Python is blameless in this
problem.

Martin
 

Chris Mellon

Protecting your head is the exact benefit of bike helmets, that
doesn't mean you should bike more recklessly just because you're
wearing a helmet. :)

Doing text merges is more error prone than not doing them. :)

There are myriad other benefits of breaking up large files into
functional units. Integration history, refactoring, reuse, as I
mentioned. Better clarity of design. Easier communication and
coordination within a team. What's the down side? What's the advantage
of big files with many functional units?


I never advocated big files with many functional units - just files
that are "just big enough". You'll know you've broken them down small
enough when you stop having to do text merges every time you commit.
Only because reload() is hard in Python! ;)


When I refer to "Pythonic" all I'm talking about is what I've read
here and observed in other people's code. I'm here looking for more
information about how other people work, to see if there are good
solutions to the problems I see.

However when I talk about what I think is "wrong" with the Pythonic
way, obviously that's just my opinion formed by my own experience.


I don't think so; I'm observing things that are common to several
projects in several languages.

... languages with similar runtime semantics and perhaps common
ancestry? All languages place limitations on how you handle modules,
either because they have infrastructure you need to use or because
they lack it and you're left on your own.
I don't think that's actually true. Fewer, bigger compilation units
actually compile faster in C, at least in my experience.

If you're doing whole project compilation. When you're working,
though, you want to be able to do incremental compilation (all modern
compilers I know of support this) so you just recompile the files
you've changed (and dependencies) and relink. Support for this is why
we have stuff like precompiled headers, shadow headers like Qt uses,
and why C++ project management advocates single class-per-file
structures. Fewer dependencies between compilation units means a
faster rebuild-test turnaround.
Python still takes time to load & "precompile". That time is becoming
significant for me even in a modest sized project; I imagine it would
be pretty awful in a multimillion line project.

No matter how fast it is, I'd rather reload one module than exit my
interpreter and reload the entire world.

Sure, but what's your goal here? If you're just testing something as
you work, then this works fine. If you're testing large changes, that
affect many modules, then you *need* to reload your world, because you
want to make sure that what you're testing is clean. I think this
might be related to your desire to have everything in lots of little
files. The more modules you load, the harder it is to track your
dependencies and make sure that the reload is correct.
This is not a problem for Python as a scripting language. It is a real
problem for Python as a world-class application development language.

Considering that no other "world class application development
language" supports reload even as well as Python does, I'm not sure I
can agree here. A perfect reload might be a nice thing to have, but
lack of it hardly tosses Python (or any language) out of the running.
OK! That's good info, thanks.


That too... although I think that's unfortunate. If reload() were
reliable, would you use it? Do you think it's inherently unreliable,
that is, it couldn't be fixed without fundamentally breaking the
Python language core?

The semantics of exactly what reload should do are tricky. Python's
reload works in a sensible but limited way. More complicated reloads
are generally considered more trouble than they are worth. I've wanted
different things from reload() at different times, so I'm not even
sure what I would consider it being "reliable".

Here's a trivial example - if you rename a class in a module and then
reload it, what should happen to instances of the class you renamed?
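A runnable illustration of the underlying gotcha (the module and class names are invented): after a reload, the module-level name is rebound to the new class object, but instances created earlier still point at the old one.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc complications

root = tempfile.mkdtemp()
path = os.path.join(root, "mymod.py")
with open(path, "w") as f:
    f.write("class Widget:\n    version = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import mymod

old = mymod.Widget()  # created before the "edit"

# Simulate editing the source, then reload.
with open(path, "w") as f:
    f.write("class Widget:\n    version = 2\n")
importlib.invalidate_caches()
importlib.reload(mymod)

print(mymod.Widget.version)       # the name now refers to the new class
print(old.version)                # the old instance still sees version 1
print(type(old) is mymod.Widget)  # False: two distinct class objects
```

Any code holding a reference to the old class (subclasses, instances, other modules) keeps using it, which is exactly why a general reload is so hard to get right.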
I'm with you there! I love Python and I'd never go back to C/C++. That
doesn't change my opinion that Python's import mechanism is an
impediment to developing large projects in the language.


I never said that. I like foo.Bar(), I just don't like typing
foo.Foo() and bar.Bar(), which is a waste of space; syntax without
semantics.

There's nothing that prevents there being a bar.Foo; the namespace
makes it clear where you're getting the object. This is again a
consequence of treating modules like classes. Some modules only expose
a single class (StringIO/cStringIO in the standard library is a good
example), but it's more common for them to expose a single set of
"functionality".

That said, nothing prevents you from using "from foo import Foo" if
Foo is all you need (or need most - you can combine this with import
foo).
Oh, I agree completely. I think we're using the exact same criterion.
A class is a self-contained feature with a well defined interface,
just what you'd want to put in its own file. (Obviously there are
trivial classes which don't implement features, and they don't need
their own files.)

Sure, if all your classes are that. But very few classes exist in
isolation - there's external and internal dependencies, and some
classes are tightly bound. There's no reason for these tightly bound
classes to be in external files (or an external namespace), because
when you work on one you'll need to work on them all.
I think this is the crux of my frustration. I think reload() is
unreliable and hard to validate because Python's package management is
broken. I appreciate your suggestion of alternatives and I think I
need to come to terms with the fact that reload() is just broken. That
doesn't mean it has to be that way or that Python is blameless in this
problem.

I wonder what environments you worked in before that actually had a
reliable and gotcha-free version of reload? I actually don't know of
any - Smalltalk is closest. It's not really "broken" when you
understand what it does. There's just an expectation that it does
something else, and when it doesn't meet that expectation it's assumed
to be broken. Now, that's a fair definition of "broken", but replacing
running instances in a live image is a very hard problem to solve
generally. Limiting reload() to straightforward, reliable behavior is
a reasonable design decision.
 

Dave Baum

Martin Unsal said:
That too... although I think that's unfortunate. If reload() were
reliable, would you use it? Do you think it's inherently unreliable,
that is, it couldn't be fixed without fundamentally breaking the
Python language core?

I wrote a module that wraps __import__ and tracks the dependencies of
imports. It then allows you to unload any modules whose source has
changed. That seemed to work out nicely for multi-module projects.

However, one problem I ran into was that dynamic libraries don't get
reloaded, so if you are doing hybrid C++/Python development this
doesn't help - you still have to restart the whole Python process to
pick up changes in your C++ code.

I also didn't do a ton of testing. It worked for a few small projects
I was working on, but I stopped using it once I ran into the dynamic
library thing, and at this point I'm used to just restarting python
each time. I'm sure there are some odd things that some python modules
could do that would interfere with the automatic reloading code I
wrote.
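For what it's worth, here is a minimal sketch of the mtime-tracking part of the idea Dave describes (the function names are mine, and this does not attempt his dependency tracking via __import__): record each module's source mtime, then reload the ones whose file has changed.

```python
import importlib
import os
import sys

_mtimes = {}

def snapshot():
    """Record the current mtime of every .py-backed module in sys.modules."""
    for name, mod in list(sys.modules.items()):
        path = getattr(mod, "__file__", None)
        if path and path.endswith(".py") and os.path.exists(path):
            _mtimes[name] = os.path.getmtime(path)

def reload_changed():
    """Reload snapshotted modules whose source file is newer than recorded."""
    changed = []
    for name, recorded in list(_mtimes.items()):
        mod = sys.modules.get(name)
        path = getattr(mod, "__file__", None)
        if not (path and os.path.exists(path)):
            continue
        current = os.path.getmtime(path)
        if current > recorded:
            importlib.reload(mod)
            _mtimes[name] = current
            changed.append(name)
    return changed
```

As Dave notes, none of this helps with extension modules, and the reload ordering between interdependent modules is the hard part his dependency tracking addressed.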

If you're interested in the code, drop me an email.

Dave
 

Bruno Desthuilliers

Martin Unsal wrote:
Sure, but that doesn't solve the problem.

Say you have a package "widgets" with classes ScrollBar, Form, etc.
You want the end user to "import widgets" and then invoke
"widgets.ScrollBar()". As far as I know there are only two ways to do
this, both seriously flawed: 1) Put all your code in one module
widgets.py, 2) use "from scrollbar import *" in widgets/__init__.py,
which is semi-deprecated

"deprecated" ? Didn't see any mention of this so far. But it's bad form,
since it makes it hard to know where some symbol comes from.

# widgets/__init__.py
from scrollbar import Scrollbar, SomeOtherStuff, some_function, SOME_CONST
and breaks reload().


Sure but that doesn't eliminate the unfortunate interaction between
Python class organization and filesystem hierarchy.

*class* organization ? It's not Java here. Nothing forces you to use
classes.
For example, say
you want to organize the widgets package as follows:

widgets/scrollbar/*.py
widgets/form/*.py
widgets/common/util.py

Other than messing around with PYTHONPATH, which is horrible, I don't
see how to import util.py from the widget code.

Some of us still manage to do so without messing with PYTHONPATH.
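One runnable way to do it, using the layout quoted above (all names illustrative, and the package is built in a temp directory only so the sketch is self-contained): as long as the directory containing widgets/ is on sys.path, which it is for any installed package, a submodule can import the shared helper by its full dotted path, with no PYTHONPATH fiddling.

```python
import importlib
import os
import sys
import tempfile

# Build the quoted layout from scratch.
root = tempfile.mkdtemp()
for sub in ("widgets", "widgets/common", "widgets/scrollbar"):
    os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, sub, "__init__.py"), "w").close()

# widgets/common/util.py -- the shared helper.
with open(os.path.join(root, "widgets", "common", "util.py"), "w") as f:
    f.write("def helper():\n    return 'shared'\n")

# widgets/scrollbar/core.py imports it by absolute package path.
with open(os.path.join(root, "widgets", "scrollbar", "core.py"), "w") as f:
    f.write(
        "from widgets.common import util\n"
        "def use():\n    return util.helper()\n"
    )

# Drop any cached module named "widgets" so we import our fresh one.
for name in list(sys.modules):
    if name == "widgets" or name.startswith("widgets."):
        del sys.modules[name]

sys.path.insert(0, root)
importlib.invalidate_caches()
from widgets.scrollbar import core
print(core.use())
```

The key point is that imports inside a package can name sibling subpackages through the top-level package; only the top-level package's parent directory needs to be importable.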
You're still stuck doing foo.Foo() everywhere in your client code,

from foo import Foo

But:
which is ugly

It's not ugly, it's informative. At least you know where Foo comes from.
and wastes space,

My. Three letters and a dot...
or using "from foo import *" which is
broken.

cf above.
For myriad reasons, just one of them being the one I stated -- smaller
files with one functional unit each

Oh. So you're proposing that each and every single function goes in a
separate file ?
are more amenable to source code
management with multiple developers.

This is not my experience.
We could discuss this till we're blue in the face but it's beside the
point. For any given project, architecture, and workflow, the
developers are going to have a preference for how to organize the code
structurally into files, directories, packages, etc. The language
itself should not place constraints on them. The mere fact that it is
supposedly "Pythonic" to put more functionality in one file indicates
to me that the Python package system is obstructing some of its users
who have perfectly good reasons to organize their code differently.

It has never been an issue for me so far.
Yes. I've worked extensively on several projects in several languages
with multi-million lines of code

I meant, based on working experience *with Python* ? I've still not seen
a "multi-million line" project in Python - unless of course you include
all the stdlib and the interpreter itself, and even then I doubt we get
so far.
and they invariably have coding
styles that recommend one functional unit (such as a class), or at
most a few closely related functional units per file.

Which is what I see in most Python packages I've seen so far. But we may
not have the same definition for "a few" and "closely related" ?
In Python, most of the large projects I've looked at use "from foo
import *" liberally.

I've seen few projects using this. And I wouldn't like having to
maintain such a project.
I guess my question boils down to this. Is "from foo import *" really
deprecated or not?

This syntax is only supposed to be a handy shortcut for quick testing
and exploration in an interactive session. Using it in production code
is considered bad form.
If everyone has to use "from foo import *"

I never did in 7 years.
despite
the problems it causes, how do they work around those problems (such
as reloading)?

Do you often have a need for "reloading" in production code ???

Martin, I'm not saying Python is perfect, but it really feels like
you're worrying about things that are not problems.
 

Bruno Desthuilliers

Martin Unsal a écrit :
(snip)
When refactoring, it's much better to move small files around than to
move chunks of code between large files.

Indeed. But having hundreds or thousands of files each with at most a
dozen lines of effective code is certainly not an ideal. Remember that
Python lets you say much more in a few lines than some mainstream
languages I won't name here.
That's interesting, is this workflow pretty universal in the Python
world?

I don't know, but that's also mostly how I do work.
I guess that seems unfortunate to me,

So I guess you don't understand what Jorge is talking about.
one of the big wins for
interpreted languages is to make the development cycle as short and
interactive as possible.

It's pretty short and interactive. Emacs' Python mode lets you fire up a
subinterpreter and eval either your whole buffer or a class or def block
or even a single expression - and play with the result in the
subinterpreter.
As I see it, the Python way should be to
reload a file and reinvoke the class directly, not to restart the
interpreter, load an entire package and then run a test script to set
up your test conditions again.

C-c C-! to start a new interpreter
C-c C-c to eval the whole module

Since the module takes care of "loading the entire package", you don't
have to worry about this. And since, once the script eval'd, you still
have your (interactive) interpreter opened, with all state set, you can
then explore at will. Try it by yourself. It's by far faster and easier
than trying to manually keep track of the interpreter state.
 

Bruno Desthuilliers

Martin Unsal a écrit :
There are myriad other benefits of breaking up large files into
functional units. Integration history, refactoring, reuse, as I
mentioned. Better clarity of design. Easier communication and
coordination within a team. What's the down side? What's the advantage
of big files with many functional units?

What is a "big file" ?

(snip)
However when I talk about what I think is "wrong" with the Pythonic
way, obviously that's just my opinion formed by my own experience.

Your own experience *with Python* ? or any close-enough language ? Or
your experience with C++ ?

(snip)
Python still takes time to load & "precompile".

compile. To byte-code, FWIW. Not "load & precompile". And - apart from
the top-level script - only modified modules get recompiled.
That time is becoming
significant for me even in a modest sized project;

On what hardware are you working ??? I have my interpreter up and
running in a couple milliseconds, and my box is a poor athlon xp1200/256.
I imagine it would
be pretty awful in a multimillion line project.

I still wait to see a multimillion line project in Python !-)

If you find yourself in this situation, then there's certainly
something totally wrong in the way you (and/or your team) design and code.

But anyway - remember that only the modified modules get recompiled.
No matter how fast it is, I'd rather reload one module than exit my
interpreter and reload the entire world.

Did you actually *try* it ?
This is not a problem for Python as scripting language. This is a real
problem for Python as world class application development language.

Sorry to have to say so, but this is total bullshit IMHO - which is
based on working experience.
That too... although I think that's unfortunate. If reload() were
reliable, would you use it?

I wouldn't. It's easier to rerun a simple test script and keep the
interpreter open with full state - and then you're sure you have the
correct desired state.
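The workflow Bruno describes can be sketched as a small script run with `python -i`: the script rebuilds your test fixtures from scratch, and the `-i` flag drops you into an interactive prompt afterwards with all that state in scope. The names below (`ScrollBar`, `build_test_state`) are hypothetical stand-ins.

```python
# A sketch of the "rerun a script, keep the state" workflow: run this
# file with  python -i setup_state.py  and the interpreter stays open
# with the fixtures already built, ready for interactive exploration.

class ScrollBar(object):
    def __init__(self):
        self.pos = 0

def build_test_state():
    """Recreate the test fixtures from scratch on every run."""
    sb = ScrollBar()
    sb.pos = 42
    return sb

sb = build_test_state()
print(sb.pos)  # prints 42; under -i you can now poke at sb by hand
```

Since the whole setup is re-executed each time, you never have to wonder whether a stale binding from a previous edit is still lurking in the session.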
I'm with you there! I love Python and I'd never go back to C/C++. That
doesn't change my opinion that Python's import mechanism is an
impediment to developing large projects in the language.

What about basing your opinion on facts ? What about going with the
language instead of fighting against it ?
I never said that. I like foo.Bar(), I just don't like typing
foo.Foo() and bar.Bar(), which is a waste of space; syntax without
semantics.

May I say that the problem here comes from your insistence on putting
each class in its own module ?
Oh, I agree completely. I think we're using the exact same criterion.

I really doubt you do. What Chris is talking about is grouping together
what usually needs to change together.
A class is a self-contained feature with a well defined interface,

So is a function. Should we put any single function in a separate module
then ?
just what you'd want to put in its own file. (Obviously there are
trivial classes which don't implement features, and they don't need
their own files.)




I think this is the crux of my frustration. I think reload() is
unreliable and hard to validate because Python's package management is
broken.

I think the "crux of your frustation" comes from your a priori. Fighting
against a language can only bring you into frustration. If the language
don't fit your brain - which is perfectly legitimate - then use another
one - but don't blame the language for it.
 

Alex Martelli

Bruno Desthuilliers said:
I don't know, but that's also mostly how I do work.

My favorite way of working: add a test (or a limited set of tests) for
the new or changed feature, run it, check that it fails, change the
code, rerun the test, check that the test now runs, rerun all tests to
see that nothing broke, add and run more tests to make sure the new code
is excellently covered, rinse, repeat. Occasionally, to ensure the code
stays clean, stop to refactor, rerunning tests as I go.
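The loop described above can be sketched with nothing but the stdlib unittest module; the `clamp` function and its test case below are hypothetical stand-ins for "the new or changed feature".

```python
# A minimal sketch of the test-first loop: write the tests, watch them
# fail, implement, rerun until green, then rerun everything.
import unittest

def clamp(x, lo, hi):
    """The hypothetical feature under development."""
    return max(lo, min(x, hi))

class TestClamp(unittest.TestCase):
    def test_inside_range(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_boundaries(self):
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(99, 0, 10), 10)

# Rerun this after every change; a green run means nothing broke.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestClamp)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # prints True
```

Because the whole cycle is "edit file, rerun script", there is never a reason to reach for reload(): every run starts from a clean interpreter anyway.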

I'm also keen on bigger tests (integration tests, as well as system
tests for regressions, acceptance, etc), but of course I don't run those
anywhere as frequently (they're not part of my daily workflow, iterated
multiple times per day -- more like a "nightly run" kind of thing, or
for special occasions such as just before committing into HEAD... I'm
somewhat of a stickler about HEAD *always* passing *all* tests...).

Not exactly TDD, please note -- I tend to start the cycle with a few
tests (not strictly just one), implement some large chunk of the
new/changed stuff, and add "coverage" and "boundary cases" tests towards
the end of the cycle (more often than not I don't need further changes
to satisfy the coverage and boundary-case tests, because of the "large
chunk" thing). So, a TDD purist would blast me for heresy.

Nevertheless, having tried everything from pure TDD to papertrail-heavy
waterfall (including the "toss the bits over the wall to QA", shudder!)
to typical Chaos Driven Development, in over a quarter century of
experience, this almost-TDD is what works best for me -- in Python, C,
and C++, at least (it's been a long time, if ever, since I did enough
production Java, Haskell, Ruby, SML, assembly, Perl, bash, Fortran,
Cobol, Objective C, Tcl, awk, Scheme, PL/I, Rexx, Forth, Pascal,
Modula-2, or Basic, to be sure that the same approach would work well in
each of these cases, though I have no reason to think otherwise).


Alex
 

Martin Unsal

I never advocated big files with many functional units - just files
that are "just big enough".

Then we're in total agreement. I'm not sure why you thought my
opinions were the result of baggage from other languages when you
don't seem to actually disagree with me.
Fewer dependencies between compilation units means a
faster rebuild-test turnaround.

I know all about incremental builds and I just don't think people use
small compilation units in C++ to make their builds faster. It has
certainly never been the reason why I subdivided a source file.
Sure, but whats your goal here? If you're just testing something as
you work, then this works fine. If you're testing large changes, that
affect many modules, then you *need* to reload your world, because you
want to make sure that what you're testing is clean.

I don't think reload works for anything but trivial scripts. The
moment you use "from foo import bar" reload is broken.
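The breakage Martin is pointing at is easy to demonstrate: the name bound by "from foo import bar" is a plain local reference to whatever object foo.bar was at import time, so reloading foo rebinds foo.bar but leaves the from-imported name pointing at the stale object. The sketch below builds a throwaway module in a temp directory to show this; it uses importlib.reload, which in Python 2 (the version current in this thread) was simply the builtin reload().

```python
# Why reload() and "from foo import bar" don't mix: the from-imported
# name keeps pointing at the old object after the module is reloaded.
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "foo.py")

with open(path, "w") as f:
    f.write("VERSION = 1\n")

import foo
from foo import VERSION            # snapshot of the current foo.VERSION

with open(path, "w") as f:         # simulate editing foo.py
    f.write("VERSION = 2\n")

# Bump the mtime so stale cached bytecode is never used on filesystems
# with coarse timestamp resolution.
st = os.stat(path)
os.utime(path, (st.st_atime, st.st_mtime + 10))
importlib.invalidate_caches()

importlib.reload(foo)
print(foo.VERSION)  # prints 2 -- the module attribute was updated
print(VERSION)      # prints 1 -- the from-imported name was not
```

The same applies to any client module that did the from-import: each one holds its own stale binding, which is why reload() helps only code that consistently goes through the `foo.` attribute lookup.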
The semantics of exactly what reload should do are tricky. Pythons
reload works in a sensible but limited way.

I agree that there is some subtlety there, and I appreciate your
example. However, given that Python's module system essentially
forces you to use "from foo import *", and that reload is almost
entirely incompatible with "from foo import *", I would say that
reload is essentially useless.
That said, nothing prevents you from using "from foo import Foo" if
Foo is all you need (or need most - you can combine this with import
foo).

Well "from foo import Foo" is just a special case of "from foo import
*". :) It still breaks reload. It still means you're restarting your
interpreter even to do the most trivial development cycle.
I wonder what environments you worked in before that actually had a
reliable and gotcha free version of reload?

I'm perfectly well aware that I'm not going to be able to reload a
widget in the middle of a running GUI app, for example. I'm not
looking for gotcha free, I'll settle for minimally useful.

Here's an analogy. In C, you can do an incremental build and run your
modified application without having to first reboot your computer. In
Python, where reload() is essentially the incremental build process,
and the interpreter is essentially a virtual machine, you guys are
saying that my best option is to just "reboot" the virtual machine to
make sure I have a "clean slate". It may be the path of least
resistance, but to say that it is necessary or inevitable is 1960s
mainframe thinking.

Martin
 

Martin Unsal

Your own experience *with Python* ?

No, my experience with Visual Basic. ;)

Of course my experience with Python!

Sorry, I can continue writing snarky replies to your snarky comments
but that won't get us anywhere productive. Instead I think the
following really gets to the crux of the issue.
May I say that the problem here comes from your insistence on putting
each class in its own module ?

No, it doesn't.

It really doesn't matter how many classes you have in a module; either
you use "from foo import bar", or you are stuck with a file structure
that is isomorphic to your design namespace.

The former breaks reload; the latter breaks large projects.

Martin
 

Martin Unsal

My favorite way of working: add a test (or a limited set of tests) for
the new or changed feature, run it, check that it fails, change the
code, rerun the test, check that the test now runs, rerun all tests to
see that nothing broke, add and run more tests to make sure the new code
is excellently covered, rinse, repeat. Occasionally, to ensure the code
stays clean, stop to refactor, rerunning tests as I go.
From the way you describe your workflow, it sounds like you spend very
little time working interactively in the interpreter. Is that the case
or have I misunderstood?

Martin
 

Gabriel Genellina

My favorite way of working: add a test (or a limited set of tests) for
the new or changed feature, run it, check that it fails, change the
code, rerun the test, check that the test now runs, rerun all tests to
[...]
From the way you describe your workflow, it sounds like you spend very
little time working interactively in the interpreter. Is that the case
or have I misunderstood?

FWIW, I only work interactively with the interpreter just to test some
constructs, or use timeit, or check code posted here... Never to develop
production code. That's why I don't care at all about reload(), by example.
 

Bruno Desthuilliers

Martin Unsal a écrit :
No, my experience with Visual Basic. ;)

Of course my experience with Python!

Sorry but this was really not obvious.
Sorry, I can continue writing snarky replies to your snarky comments
but that won't get us anywhere productive.

You're right - sorry.
Instead I think the
following really gets to the crux of the issue.


No, it doesn't.

It really doesn't matter how many classes you have in a module; either
you use "from foo import bar", or you are stuck with a file structure
that is isomorphic to your design namespace.

The former breaks reload;

<imho>
Which is not a problem. reload() is of very limited use for any
non-trivial stuff.
</imho>
 

Jorge Godoy

My favorite way of working: add a test (or a limited set of tests) for
the new or changed feature, run it, check that it fails, change the
code, rerun the test, check that the test now runs, rerun all tests to
see that nothing broke, add and run more tests to make sure the new code
is excellently covered, rinse, repeat. Occasionally, to ensure the code
stays clean, stop to refactor, rerunning tests as I go.

I believe this is a distinct case. When we write tests we're worried about the
system itself. When using the interactive interpreter we're worried about how
to best use the language. There might be some feature of the system related
to that investigation, but there might not be. For example: "what are the
methods provided by this object?" or "which approach is faster for this loop?"

I won't write a test case to test loop speed. But I'd poke with the
interpreter and if the environment gets a bit big to setup then I'd go to the
text editor as I said.
 

Jorge Godoy

Martin Unsal said:
Then we're in total agreement. I'm not sure why you thought my
opinions were the result of baggage from other languages when you
don't seem to actually disagree with me.

I believe the reason was that you were advocating one class per file. "big
enough" might be more classes. Or fewer... :)
I agree that there is some subtlety there, and I appreciate your
example. However, given that Python's module system essentially
forces you to use "from foo import *", and that reload is almost
entirely incompatible with "from foo import *", I would say that
reload is essentially useless.

They don't force you into that... There are many modules that do, but they are
generally gluing your Python code to some other language (usually C) written
code. This is common for GUI development, for example.

In fact, it is rare for me -- in mathematics, statistics, database, web
development, testing -- to use this construction. There are no modules
there that demand it.

And you can also write:

from foo import Bar, Baz

or even

from foo import Bar as B1, Baz as B2 # OUCH! ;-)
Well "from foo import Foo" is just a special case of "from foo import
*". :) It still breaks reload. It still means you're restarting your
interpreter even to do the most trivial development cycle.

That's what you get when you're working with instances of Foo... I believe
that for classmethods this would work right. So, again, it depends on your
code, how it is structured (and how it can be structured), etc.
Here's an analogy. In C, you can do an incremental build and run your
modified application without having to first reboot your computer. In
Python, where reload() is essentially the incremental build process,
and the interpreter is essentially a virtual machine, you guys are
saying that my best option is to just "reboot" the virtual machine to
make sure I have a "clean slate". It may be the path of least
resistance, but to say that it is necessary or inevitable is 1960s
mainframe thinking.

How can you reload C code that would affect already running code --
ie. existing data, pointers, etc. -- without reloading the full program? Even
changing and reloading a dynamic library wouldn't do that to already existing
code, so you'd have to "reboot" your application as well.
 
