Is there a "Large Scale Python Software Design" ?

Alex Martelli

Peter Hansen said:
> And you've reemphasized my point. "Testing" is not test-driven
> development. In fact, test-driven development is about *design*,
> not just about testing. The two are related, but definitely not

Hmmm... the way I see it, it's about how one (or, better!!!, two: pair
programming is a GREAT idea) proceeds to _implement_ a design. The fact
that the package (or other kind of component) I'm writing will offer a
class Foo with a no-parameters constructor, methods A, B, and C with
parameters thus and thus, etc, has hopefully been determined and agreed
beforehand -- the people who now write that package, and other teams who
write other code using the package, have presumably met and haggled
about it and coded one or more mock-up versions of the package (or used
other, lesser ways to clarify the specs), so that code depending on the
package can be tested-and-coded (with the mock-ups) even while the
package itself is being tested and coded...
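
To make this concrete, a minimal sketch of the kind of mock-up I mean
(Foo, A, B, and C are just the hypothetical interface from the paragraph
above; the canned behaviors are invented for illustration):

class MockFoo:
    """Stand-in for the agreed-upon class Foo, returning canned results."""

    def A(self, x):
        # Pretend the spec says A returns a non-negative integer.
        return abs(int(x))

    def B(self, items):
        # Pretend the spec says B filters out falsy entries.
        return [item for item in items if item]

    def C(self, key, default=None):
        # Pretend the spec says C behaves like a lookup with a default.
        return {"sample": 42}.get(key, default)

# Other teams' code can be tested-and-coded against the mock right now:
def summarize(foo, data):
    return foo.A(len(foo.B(data)))

assert summarize(MockFoo(), [0, 1, "", "x"]) == 2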

I know Kent Beck's book shows a much more 'exploratory' kind of TDD, but
in a large project that would lead to its own deleterious equivalent of
"waterfall": everything must proceed bottom-up because no mid-level
components can be coded until the low-level components are done, and so
forth. I don't think that's acceptable in this form, in general.

_Design_, in large scale software development, is mainly about sketching
reasonable boundaries between components to allow testing and coding to
proceed with full advantage of the fact that the team of developers is
of substantial size. Indeed there may be several sub-teams, or even
several full-fledged teams, though the latter situation (over, say,
around 20 developers, assuming colocation... that's the very maximum you
could possibly, sensibly cram into a single team) suddenly begets its
own sociopolitical AND technical problems... I have not been in that
situation with Python, yet, only with Fortran, C, C++.

Of course when both components are nominally done there comes
integration testing time, and the sparks fly;-). Designing integration
tests ahead of time, at the same time as the mock-ups, _would_ help, but
somehow or other it never really seems to happen (I'd love hearing
real-life experiences from somebody who DID manage to make it happen,
btw; maybe I'd learn how to "facilitate" its happening, too!-).
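
For what it's worth, a rough sketch of what "designing the integration
tests at mock-up time" might look like -- the component names (Parser,
Store) are invented for illustration, and the mocks stand in until the
real modules exist:

import unittest

class MockParser:
    """Mock-up agreed with the parsing team; invented for illustration."""
    def parse(self, text):
        return [{"name": field.split("=")[1]} for field in text.split(";")]

class MockStore:
    """Mock-up agreed with the storage team; invented for illustration."""
    def __init__(self):
        self._rows = []
    def save(self, row):
        self._rows.append(row)
    def count(self):
        return len(self._rows)

class TestParserFeedsStore(unittest.TestCase):
    def setUp(self):
        # Today these are the mock-ups; at integration time, swap in
        # the real Parser and Store and rerun the very same test.
        self.parser = MockParser()
        self.store = MockStore()

    def test_parsed_records_are_storable(self):
        for record in self.parser.parse("name=a;name=b"):
            self.store.save(record)
        self.assertEqual(self.store.count(), 2)

if __name__ == "__main__":
    unittest.main()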

If and when you're lucky there's some 'customer' (in the
extreme-programming sense) busy writing _acceptance_ tests for the
system, making the user-stories concrete, at the same time -- but good
acceptance tests are NOT the same thing as good integration tests... you
need both kinds (at least if the system is truly large). Anyway, at
integration-testing time and/or acceptance-testing time, there is
typically at least one iteration where the mock-ups/specs are updated to
take into account what we've learned while implementing the component
and consuming it, and it's back to the pair-programming parts with TDD.

But these fascinating AND crucially important issues are about far wider
concerns than "static type testing" can help with. "Design by
Contract", where the mock-up includes preconditions and postconditions
and invariants, can be helpful, but such DbC thingies are to be checked
at runtime, anyway (they're great in pinpointing more problems during
integration testing, etc, etc, they don't _substitute_ for testing
though, they simply amplify its effectiveness, which is good enough).
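
A minimal sketch of what such runtime-checked DbC might look like in
plain Python (no particular DbC library assumed -- just a homegrown
decorator):

import functools

def contract(pre=None, post=None):
    """Check a precondition on the arguments and a postcondition on
    the result, at runtime, on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed in {func.__name__}"
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result), f"postcondition failed in {func.__name__}"
            return result
        return wrapper
    return decorator

@contract(pre=lambda x: x >= 0, post=lambda result: result >= 1)
def grow(x):
    return x + 1

grow(3)     # fine
# grow(-1)  # AssertionError: precondition failed in grow -- exactly the
            # kind of pinpointing that helps at integration-testing time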

TDD may surely help in defining the internal structures and algorithms
within a single component, of course, if that's what you mean by design.

But (with decent partitioning) a single component should be _at most_ a
few thousand lines of Python code -- very offhand I'd say no more than
2-3 thousand lines of functional code, as much again of unit tests, and
a generous addition of comments, docstrings, and blank lines, for a total
line count, as "wc *.py" gives it, of no more than 6k, 7k tops. If it's
bigger, there are problems -- docstrings are trying to become user
reference manuals, comments are summarizing whole books on data
structures and algorithms rather than giving the URLs to them, or, most
likely, there was a mispartitioning and this poor "component" is being
asked to do far too much, way more than one cohesive set of
responsibilities which need to be well-coordinated. Time to call an
emergency team meeting and repartition a little bit.
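
The "wc *.py" check can be automated crudely; a rough sketch (the
classification below is deliberately naive -- docstrings, for instance,
get counted as code):

import glob

total = blank = comment = 0
for path in glob.glob("*.py"):
    for line in open(path, errors="ignore"):
        total += 1
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith("#"):
            comment += 1

print(f"total={total} blank={blank} comment={comment} "
      f"functional-ish={total - blank - comment}")
if total > 7000:
    print("this component is probably doing too much -- repartition!")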

Hmmm, this has relatively little to do with static type checks, but is
extremely relevant to the 'Subject' - indeed, it's one (or two;-) of the
many sets of issues that (IMHO) need to be addressed in a book or course
on large scale software development (not JUST design, mind you: the
process whereby the design is defined, how it's changed during the
development, and how it is implemented by the various components, is at
least as important as the technical characteristics that the design
itself, seen as a finished piece of work, should exhibit...).

> the same thing, and eliminating TDD with a wave of a hand intended
> to poo-poo mere testing is to miss the point. Once someone has

Absolutely -- I do fully agree with you on this.

> tried TDD, they are unlikely to lump it in with simple "unit testing"
> as it has other properties that aren't obvious on the surface.

It sure beats "retrofitting" unit tests post facto. But I'm not sure
what properties you have in mind here; care to expand?

> The topic of the thread was large projects with _large teams_,
> I thought, so I won't focus on my personal work. The team I

Yeah, I think the intended meaning was "in which you personally have
taken part" rather than any implication of "single-handedly" -- the
mention of "more than a handful of developers" being key.

BTW, a team with 5-6 colocated developers, plus support personnel for
GUI painting/design (in the graphical sense) and/or webpage/style ditto,
system administration, documentation, acceptance testing, etc, can build
QUITE a large system, if the team dynamics and the skills of the people
involved are right. So the "more than a handful of developers" doesn't
seem a necessary part of the definition of "large scale software
development". 5-6 full-time developers already require the kind of
coordination and component design (partitioning) that 10-12 will,
there's no big jump there in my experience. The jump does come when you
double again (or can't have colocation, even with just, say, 6 people),
because that's when the one team _must_ split into cooperating teams
(again in my experience: I _have_ seen -- thankfully not participated in
-- alleged single "teams" of 50 or more people, but I am not sure they
actually even managed to deploy any working code... whatever language
we're talking about, matters little, as here we're clashing with a
biological characteristic of human beings, probably connected to the
prehistorical size of optimal hunting bands or something!-).

> was leading worked on code that, if I recall, was somewhat over
> 100,000 lines of Python code including tests. I don't recall
> whether that number was the largest piece, or combining several
> separate applications which ran together but in a distributed
> system... I think there were close to 20 man years in the main
> bit.

I think this qualifies as large, assuming the separate applications had
to cooperate reasonably closely (i.e. acting as "components", even
though maybe in separate processes and "only" touching on the same
database or set of files or whatever).

> (And remembering that 1 line of Python code corresponds to
> some larger number, maybe five or ten, of C code, that should
> qualify it as a large project by many definitions.)

I agree. There IS a persistent idea that codebase size is all that
really matters, so 100,000 lines of code are just as difficult to
develop and maintain whether they're assembly, C, or Python. I think
this common idea overstates the case a bit (and even Capers Jones
agrees, though he tries to do so by distinguishing coding from other
development activities, which isn't _quite_ my motivation).

Part of why I recommend having no more than 2-3 k lines of functional
code in a single Python component (plus about as much again of unit
test, etc, to no more than 6-7k lines including blanks/cmts/docstrings,
as above explained) is that those (say) 2.5k lines can do a hell of a
_LOT_ of stuff, quite comparable in my experience to 10k-15k lines of
C++ or Java (and more than that of C, of course) -- on the order of
magnitude of 200-300 function points at least. If you go much above
that, keeping the characteristics of cohesion and coherence becomes way
too hard. So, a 100kSLOC Python project will have at least about 40
components, and 10k or so FPs, where a Java project with the same line
count might typically have 2-3K FPs spread into, say, 15 components.
(I'm thinking of functional effective lines, net of testing, comments,
docstrings, or any kind of code instrumentation for debug/profile/&c).

In other words: the Python project is _way_ bigger in functionality, and
therefore in needed/opportune internal granularity, than the Java one
with the same SLOCs. Jones' estimates for Java's language level are
"10 to 20 function points per staff month". He doesn't estimate Python,
but if I'm right and the language level (FP/SLOC) is about 4-5 times
Java's, nevertheless according to Jones' tables that, per se, would only
push productivity to "30 to 50 function points per staff month" -- a
factor of less than three.
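
Spelling out that arithmetic (the Java figures are Jones' quoted
estimates; the 4-5x multiplier for Python is my own assumption, not a
published number):

java_fp_per_month = (10, 20)     # Jones' quoted estimate for Java
python_fp_per_month = (30, 50)   # Jones' tables at 4-5x Java's level
                                 # (the 4-5x itself is my assumption)

low = python_fp_per_month[0] / java_fp_per_month[0]   # 3.0
high = python_fp_per_month[1] / java_fp_per_month[1]  # 2.5
print(f"productivity gain: {high}x to {low}x -- less than three")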

(( Of course, for both Java and Python, and also C, C++, etc,
superimposed on all of these productivity estimates there _is_ the
possibility of reuse of the huge libraries of code available for these
languages -- most of all for Python, which is well supplied with tools and
technologies to leech^H^H^H^H^H ahem, I mean, fruitfully reuse good
existing libraries almost regardless of what language the libraries were
originally made _for_. A reuse-oriented culture, particularly now that
so many good libraries are available under open-source terms, CAN in my
opinion easily boost overall productivity, in terms of functionality
delivered and deployed, by _AT LEAST_ a factor of 2 in any of these
languages. But this, in a way, is a different issue... ))

> Neither do I. The above project also involved some C and
> some assembly, plus some Javascript and possibly something else
> I've forgotten by now. We just made efforts to use Python *as
> much as possible* and it paid off.

Hmmmm, yes, assembly may be unusual these days, but C extensions are
very common, pyrex ones rightfully becoming more so, Javascript quite
typical when you need to serve webpages that are richly interactive
without requiring round-trips to the server, and we shouldn't ignore the
role of XSLT and friends too. And what large project is without some
SQL? Exceedingly few, I think.

But Python can fruitfully sit in the center and easily amount to 80% or
90% of the codebase even in projects needing all of these other
technologies for specialized purposes...

> But what if you already had tests which allowed you to do exactly
> the thing you describe? Is there a need for "better options"
> at that point? Are they really better? When I do TDD, I can
> *trivially* catch all the cases where Class.Foo is used
> because they are all exercised by the tests. Furthermore, I

Absolutely. The main role of the unit tests is exactly to define all
the use cases of Foo and the expected results of such uses. If the unit
tests are decent, and with TDD they _will_ be, they suffice to let you
change Foo's internals without breaking Foo's uses (refactoring).
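
A tiny illustration with a hypothetical Foo: the tests pin down every
supported use, so Foo's internals can be rewritten freely as long as
the suite keeps passing:

import unittest

class Foo:
    """Hypothetical component class; internals deliberately trivial."""
    def __init__(self):
        self._seen = set()
    def add(self, item):
        self._seen.add(item)
    def count(self):
        # Free to reimplement (a list with dedup-on-read, say) as long
        # as the tests below stay green.
        return len(self._seen)

class TestFooUses(unittest.TestCase):
    def test_duplicates_are_counted_once(self):
        foo = Foo()
        foo.add("a"); foo.add("a"); foo.add("b")
        self.assertEqual(foo.count(), 2)

    def test_empty_foo_counts_zero(self):
        self.assertEqual(Foo().count(), 0)

if __name__ == "__main__":
    unittest.main()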

One thing unit tests can't do, and Foo's documentation cannot either, is
to find out if any of Foo's abilities are _totally unused_ -- for that,
you do need to scour the codebase. Trimming functionality that had
originally seemed necessary and was negotiated to be included, but turns
out to be overdesigned, is not a crucial activity (it's sure not worth
distorting a language to make such trimming faster), but it's a nice
periodic exercise. Anything that's excised from the code, and tests,
and internal docs, is so much less to maintain in the future. Of
course, you can't do that anyway if you "publish" components for outside
consumption by code you can't check or control; and even in a single
team situation you still need to check with others that they weren't
planning to use, just tomorrow, one of the capabilities you'd like to
remove today.

One interesting possibility is to instrument Foo to record all the uses
it gets, tracing them into a file or wherever, then run the system
through its paces -- all the unit tests of every component that depends
(even indirectly) on the one containing Foo, and all the existing
integration and acceptance tests. A profiler can typically do it for
you, in any language, when used in "code coverage" mode. If any part of
Foo's code has 0 coverage _except_ possibly by Foo's own unit tests,
that _does_ tell you something. And it need have nothing to do with
typing, of course. One case I recall from many years ago was something
like:

int foo(int x, int y) {
    if (x < 23) {  /* small fast case, get out of the way quick */
        /* a dozen lines of code for the small fast case */
    } else {       /* the real thing, get to work! */
        /* six dozen lines of code for the real thing */
    }
}

where the whole 'real thing' _never_ happened to be exercised. With a
little checking around, changing this to return an error code if x>=23
(it should never have happened, just as it never did) was a really nice
_snip_ (excised code goes to a vault and a pointer to it is left in a
comment here, of course, in case it's needed again in the future; but
meanwhile it doesn't need to get maintained or tested, maybe for years,
maybe forever...).
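
For the record, a sketch of doing that "0 coverage" check with nothing
but the standard library's trace module (run_all_tests is a placeholder
for whatever drives your whole suite):

import trace

def run_all_tests():
    ...  # placeholder: drive the unit + integration + acceptance tests

tracer = trace.Trace(count=True, trace=False)
tracer.runfunc(run_all_tests)
tracer.results().write_results(show_missing=True, coverdir="coverage")
# In the coverage/*.cover files, lines prefixed with ">>>>>>" were never
# executed -- prime candidates for the kind of _snip_ described above.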

> can catch real bugs, not just typos and simple things involving
> using the wrong type. A superset of the bugs your statically
> typed language tools are letting you catch. But obviously
> I'm rehashing the argument, and one which has been discussed
> here many times, so I should let it go.

You surely won't get any disagreement from me about this -- and I don't
believe any static-typing enthusiast argues _against_ unit tests and
TDD; they just want BOTH, even though we claim (and C++/Java guru Robert
Martin himself strongly claims) that TDD and systematic unit testing
really make static-typing rather redundant... you keep paying all the
price for that language feature, but don't get much benefit in return.

> I assumed no such thing, just that you were unfamiliar with
> large projects in Python and yet were advising the OP on its
> suitability in that realm. You're bright and experienced, and
> your comments have substance, but until you've actually
> participated in a large project with Python and seen it fail
> gloriously *because it was not statically typed*, I wouldn't
> put much weight on your comments in this area if I were the
> OP. That's all I was saying...

I would gladly accept as relevant experiences with other languages that
are strictly but dynamically typed, such as, say, Smalltalk or Ruby or
Erlang, if project failures (or even, short of failures, severe
productivity hits) can indeed be traced, despite proper TDD/unit
testing, to the lack of statically checked typing. I try to keep up
with the relevant literature (can't possibly manage _all_ of it, of
course) and don't recall any such experiences, but of course I may well
have missed some, particularly since not everything gets published.


Alex
 
GerritM

Alex Martelli said:
> GerritM <[email protected]> wrote: [...]
>
> Not as much as one might hope, in my experience. Protocol Adaptation
> _would_ help (see PEP 246), but it would need to be widely deployed.
I think that I understand how PEP 246 might be an improvement over the
current situation. However, I think that Python 2.3 capabilities already
result in smaller programs and presumably fewer modules than their
equivalent in Java (or C++). The Objective-C system that we created
(360kloc in 1992, 600kloc in 1994) did have a significant number of
classes that are today covered by the standard built-ins. I expect that
Java and C++ suffer from the same problem. The packages that I used a
long time ago in Java were less natural than today's Python packages
(this might be entirely different today, I haven't touched Java for
centuries, ehh, years). My assumption is that integration problems are
at least proportional to the implementation size (in kloc). So my
unproven hypothesis is that, since Python programs tend to be smaller
than their Java equivalent, the integration problems are smaller,
purely due to the size.

> But the extreme difficulty in keeping track of what amount of memory
> goes where in what cases is a big minus. I recall similar problems with
> Java, in my limited experience with it, but for Java I see now there are
> commercial tools specifically to hunt down memory problems. In C++
> there were actual _leaks_ which were a terrible problem for us, but
> again pricey commercial technology came to the rescue.

In the same system mentioned above we built our own instrumentation. The
main part was based on inserting a small piece of administrative code at
every object creation and deletion. This Object Instantiation Tracing
proved to be a goldmine of information, including memory use. For
instance, the memory use of Lists and Dictionaries could be traced for
well-defined use cases. Besides this instrumentation we did the memory
management of "bulkdata", such as images, explicitly. This helps to keep
the memory consumption within specified boundaries and it helps to
prevent memory fragmentation problems.
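
A rough Python analogue of this Object Instantiation Tracing (a sketch
only; our original was Objective-C, and this metaclass approach covers
only classes that opt in to it):

import weakref
from collections import defaultdict

live = defaultdict(set)   # class -> weakrefs to its live instances

class Traced(type):
    """Metaclass recording every creation; a weakref callback
    records the matching deletion."""
    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        ref = weakref.ref(obj, lambda r, c=cls: live[c].discard(r))
        live[cls].add(ref)
        return obj

class Image(metaclass=Traced):
    def __init__(self, pixels):
        self.pixels = pixels

images = [Image(b"\x00" * 100) for _ in range(3)]
print({cls.__name__: len(refs) for cls, refs in live.items()})  # {'Image': 3}
del images  # once collected, the per-class counts drop back to zero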

> But memory is a _big_ problem, in my experience so far, with servers
> meant to run a long time and having very large code bases. I'm sure
> there IS a commercial niche for a _good_ general purpose Python tool to
> keep track of memory consumption, equivalent to those available for C,
> C++ and Java...

The investment in the tools mentioned above was relatively small. However,
this works only if the entire system is based on the same architectural
rules.

The additional challenge of Python relative to Objective-C is its garbage
collection. This indeed gives poorly predictable memory behavior.

Some of the design aspects mentioned here are described in this chapter of
my PhD thesis:
http://www.extra.research.philips.com/natlab/sysarch/MIconceptualViewPaper.pdf

kind regards, Gerrit
 
Peter Hansen

Alex Martelli wrote (a hell of a lot, as usual, and I do hope
he'll forgive me that I chose to skip/skim some material and
try merely to catch the highlights, doubtless missing some
interesting bits in the process):
> Hmmm... the way I see it, it's about how one (or, better!!!, two: pair
> programming is a GREAT idea) proceeds to _implement_ a design. The fact
> that the package (or other kind of component) I'm writing will offer a
> class Foo with a no-parameters constructor, methods A, B, and C with
> parameters thus and thus, etc, has hopefully been determined and agreed

We don't really disagree on this point. I'd clarify my comments just
by saying that depending on what stage you are looking at, there
is always a preceding decision that could be called "design" and some
subsequent work that implements that design. If you are figuring out
what requirements your system should have, you are "designing" it for
your eventual users in a sense. If you are analysing requirements
later on and blocking out the major architectural areas and interfaces,
you are doing design, but then the traditional "designers" might still
have to go to work. Those designers (being the ones we usually saddle
with the title) then do "detailed design" and specify interfaces and
such as you note above, but they aren't yet doing implementation. Along
comes the programmer pair and they "design" the implementation in their
heads as they come to a failing acceptance test case, then conceive of
some unit tests and some code, designing as they go.

In a nutshell, I was talking about that portion of design that occurs
when a good programmer goes to work figuring out just *how* she will
implement that method A with parameters x and y and a failing test
case that says it should act suchlike... Certainly TDD is not as
much about the more traditional design, the implementation of which
you refer to above.

> TDD may surely help in defining the internal structures and algorithms
> within a single component, of course, if that's what you mean by design.

Yep.. saw this while pruning your text. Had I read more thoroughly
the first time it would have saved all that typing, which I'm now
loath to remove. :-(

> It sure beats "retrofitting" unit tests post facto. But I'm not sure
> what properties you have in mind here; care to expand?

You've forgotten them at the moment, but I know you know about those
properties such as how TDD *forces* testability on the design/
implementation, and thus improves modularity, how it greatly reduces
the incentive and opportunity to gold-plate, how the most critical
tests are run hundreds or thousands of times during a project instead
of a handful of times just prior to shipping, and so forth.

> I think this qualifies as large, assuming the separate applications had
> to cooperate reasonably closely (i.e. acting as "components", even
> though maybe in separate processes and "only" touching on the same
> database or set of files or whatever).

It was a true distributed system, so yes the components
closely cooperated. Acceptance tests actually ran both pieces
simultaneously, for the more complex tests, and in some few cases
even involved a simulator of the 16-bit embedded devices so that
the test case spanned four levels (web browser, server, third
piece, and the simulator for smaller gadgets). The simulator, of
course, was written in Python...

-Peter
 
Alex Martelli

GerritM said:
> I think that I understand how PEP 246 might be an improvement over the
> current situation. However, I think that Python 2.3 capabilities already
> result in smaller programs and presumably fewer modules than their equivalent

If you are aiming at a given fixed amount of functionality, yes: smaller
programs, and fewer modules (not in proportion, because each module
tends to be smaller). Modules aren't really the problem in _system
integration_, though; the unit that's developed together, tested
together, released together, is something a bit less definite that is
sometimes called a "component". It could be a module, more likely it
will be a small number of modules, perhaps grouped into a package.

One of my ideas is that a component needs to be cohesive and coherent.
I'm not alone in thinking that, at any rate. Therefore, the number of
components in a system with a given number of FP is weakly affected by
the language level of the chosen implementation language, because
each component cannot/shouldn't really have more than X function points,
even if using a very high level language means each component is
reasonably small. To get concrete, already in my previous post I gave
some numbers (indicative ones, of course): 200-300 FP per component,
meaning about 2k-3k SLOCs in Python (functional _application_ code, net
of tests, instrumentation, docs, comments, etc -- about 6k-7k lines as
wc counts them might be a reasonable rule of thumb, about half of them
being tests).

So, if you're building a 5000-FP system, you're going to end up with
about 20 components to integrate -- even though in Python that means 50k
lines of application code, and in Java or C++ it might well be 200k or
more. The design problem (partitioning) and the system integration may
end up being in the same order of magnitude, or the Python benefit might
be 20%, 30% tops, nothing like the 4:1 or 5:1 advantage you get in the
coding and testing of the specific single components.
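
Spelling that out (all figures are the indicative estimates above, not
measurements):

system_fp = 5000
fp_per_component = 250            # middle of the 200-300 range above
components = system_fp // fp_per_component
print(components)                 # 20, regardless of language

python_sloc = components * 2500   # ~2-3k functional lines per component
java_sloc = python_sloc * 4       # the ~4:1 coding-level assumption
print(python_sloc, java_sloc)     # 50000 200000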

My numbers may well be off (I'm trying to be concrete because it's too
easy to handwave away too much, in this field;-) but even if you double
component size and thus halve number of components in each language the
relative ratio remains the same. Python may gain some advantage by
making components that are a bit richer than the optimal size for Java
or C++ coded ones, but it's still not a many-times-to-one ratio as it is
for the pure issue of coding, in my experience.

> I haven't touched Java for centuries, ehh, years). My assumption is that
> integration problems are at least proportional to the implementation size
> (in kloc). So my unproven hypothesis is that, since Python programs tend
> to be smaller than their Java equivalent, the integration problems are
> smaller, purely due to the size.

This is the crux of our disagreement. For a solid component built by
TDD, it's a second-order issue, from the POV of integrating it with the
other components with which it must interact in the overall system, how
big it is internally: the first order issue is, how rich is the
functionality the component supplies to other components, consumes from
them, internally implements. Integrating two components with the same
amount of functionality and equivalent interfaces between them, assuming
they're both developed solidly wrt the specs that are incarnated in each
component's unit-tests, is weakly dependent on the level of their
implementation languages.

Maybe I'm taking for granted a design approach that requires system
functionality to be well-partitioned among components interacting by
defined interfaces. But that's not a Python-specific issue: that's what
we were doing, albeit without a fully developed "ideology" to
support it, when in the 2nd half of the '90s Lakos'
milestone book (whose title is echoed in this thread's subject) arrived
to confirm and guide our thinking and practice on the subject. I'm sure
_survivable_ large systems must be developed along this kind of lines
(with many degrees of variation possible, of course) in any language.

> In the same system mentioned above we built our own instrumentation. The
> main part was based on inserting a small piece of administrative code at every
> object creation and deletion. This Object Instantiation Tracing proved to be ...
> The investment in the tools mentioned above was relatively small. However,
> this works only if the entire system is based on the same architectural
> rules.

Well, this last sentence might be the killer, since it looks like it
will in turn kill the project's ability to reuse the huge amount of good
code that's out there for the taking. If you have to invasively modify
the code you're reusing, reuse benefits drop and might disappear.

So I want instrumentation that need not be in the Python sources of
application and library and framework components (multiframework reuse
is also a crux for PEP 246), much as I have for coverage or profiling.
If all it takes is hacking on the Python internals to provide a mode
(perhaps a separate compilation) that calls some sys.newhook at every
creation, sys.delhook at every deletion, etc, then that would IMHO be a
quite reasonable price to pay, for example.
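
To sketch how such (purely hypothetical!) hooks might be used --
sys.newhook and sys.delhook do not exist in any real Python, they're
just my proposal above:

import sys
from collections import Counter

allocations = Counter()

def newhook(obj):
    allocations[type(obj).__name__] += 1

def delhook(obj):
    allocations[type(obj).__name__] -= 1

# Purely hypothetical API: these assignments do nothing in any real
# Python build today.
sys.newhook = newhook   # would be called at every object creation
sys.delhook = delhook   # would be called at every object deletion

# ... run the system through its paces, then:
# print(allocations.most_common(10))

The payoff would be per-type allocation statistics with zero changes to
application, library, or framework source -- exactly the non-invasive
property argued for above.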

> The additional challenge of Python relative to Objective-C is its garbage
> collection. This indeed gives poorly predictable memory behavior.

Obj-C uses mark-and-sweep, right? Like Java? I'm not sure why
(reference counting bugs in badly tested extensions apart) Python's mix
of RC normally plus MS occasionally should be a handicap here.

> Some of the design aspects mentioned here are described in this chapter of
> my PhD thesis:
> http://www.extra.research.philips.com/natlab/sysarch/MIconceptualViewPaper.pdf

Tx, I'll be happy to study this.


Alex
 
Aahz

> What is the biggest system you have built with Python personally? I'm
> happy to be proven wrong, but honestly, the most enthusiastic "testing
> solves all my problem" people I have seen haven't worked on anything
> "large" -- and my definition of large agrees with Alex's; over 100
> kloc, more than a handful of developers.

So you're saying that both attributes are necessary? (We're essentially
three programmers, but the codebase seems to be on the order of 150kloc,
about 2/3 of which is Python and the rest is HTML templates. I didn't
bother doing an exact check 'cause I'm in the middle of something else.)
 
Alan Gauld

> So you're saying that both attributes are necessary?

I think they are, because both people and code issues arise on
'large' projects (see The Mythical Man-Month for examples
of each type), although all things are relative. Our local
definition of project size is:

<100Kloc = small
100Kloc-1Mloc = medium
>1Mloc = large

We try to keep the large projects to less than 10 at any one
time...

Staffing sizes are 1-6 on small projects,
4-30 on medium, and typically 30-500 on large ones.
(I'd guess most large projects are actually around 2-3 MLoc
and have about 60-100 developers, inc. dedicated testers.)

Our most common project size is 200-300Kloc with about 10-20
developers (and the preferred methodology is DSDM). We probably
have about 30-50 such projects running at any one time.

On that scale I use Python for prototyping "components" on
the medium-large stuff but it all gets built in C++ or Java.
The small projects could be in Perl, VB/ASP, PL/SQL or Java.
(Sadly Python is not an approved language for production -
yet...I'm working on it :)

Alan G
Author of the Learn to Program website
http://www.freenetpages.co.uk/hp/alan.gauld/tutor2
 
Jack Diederich

> <100Kloc = small
> 100Kloc-1Mloc = medium
>
> We try to keep the large projects to less than 10 at any one
> time...
>
> Staffing sizes are 1-6 on small projects,
> 4-30 on medium, and typically 30-500 on large ones.
> (I'd guess most large projects are actually around 2-3 MLoc
> and have about 60-100 developers, inc. dedicated testers.)
>
> Our most common project size is 200-300Kloc with about 10-20
> developers (and the preferred methodology is DSDM). We probably
> have about 30-50 such projects running at any one time.

Holy such-and-such, how many developers do you have? And
isn't it more like thirty+ companies under one roof?

I've mainly worked for dot-coms (and most of them startups)
but your coordination overhead must be just staggering. I work/worked
for small companies because I prefer it, but whoa...

-Jack
 
