Thoughts on some stdlib modules

vegetax

I was messing around in Google looking for the available Python form
validation modules when I found this:
http://www.jorendorff.com/articles/python/path/, and I realized that it is very
similar to my Python fileutils module, which encapsulates path
operations, file operations, etc.

And those thoughts come to mind again: if Python is such a great language,
why is the stdlib so bloated with duplication, bad, bad library
design, clumsy-to-use modules, etc.?

Why do people have to put wrappers around about half of the standard
library modules? I have wrappers for urllib, urllib2, and urlparse in urlutils;
glob, shutil, os.path, os, filecmp, etc. in fileutils;
time, datetime.time, date, datetime.datetime, time.date, etc. in DateTime.
And so the list goes on.

I mean, is this normal? I don't think so. I haven't seen such a messy stdlib in
any language. Is it because of legacy code and backwards compatibility, or
because not many people in python-dev care about library design? I
admit the Python language design is really, really great, but the stdlib is
totally forgotten.

Are those issues being considered right now? I can't find any PEP addressing
the issue specifically, not even one cooking it for Python 3000.

Specific topics could be:

grouping related modules.
removing useless legacy modules.
refactoring duplicated functionality.
removing/redesigning poorly written modules.
adding a module versioning system.
 
Nick Efford

vegetax said:
And those thoughts come to mind again: if Python is such a great language,
why is the stdlib so bloated with duplication, bad, bad library
design, clumsy-to-use modules, etc.?
I mean, is this normal? I don't think so. I haven't seen such a messy stdlib in
any language.

Perl hardly covers itself with glory in this regard.

And what of Java? AWT & Swing, Date & Calendar, Streams, Readers
and java.nio... There's a lot of complex layering going on there,
with many older features being buried and then deprecated (actually
or effectively). The net result may be interesting for software
archaeologists, but hardly inspires the notion of a coherently
designed library.
Is it because of legacy code and backwards compatibility or

The full benefits and limitations of particular design decisions take
a while to emerge, after which point people are depending on the
code and you are limited to refactoring the implementation without
changing the interface - unless you are prepared for the howls
of protest from those whose code breaks. So to some extent the
problems you mention are unavoidable - but I think you overstate
your case.
because not many people in python-dev care about library design?

I doubt that.
I admit the Python language design is really, really great, but the stdlib is
totally forgotten.

This is a very extreme view. The standard library isn't perfect,
but it is far from being the mess you imply.

My own personal bugbear is the issue of consistency. Java's standard
library might be a huge and clumsy beast with more than its fair share
of overloading and obsolescence, but it at least has the virtue of more
consistently following conventions on how classes and methods are
named, for instance.


Nick
 
Ron_Adam

Are those issues being considered right now? I can't find any PEP addressing
the issue specifically, not even one cooking it for Python 3000.

Specific topics could be:

grouping related modules.
removing useless legacy modules.
refactoring duplicated functionality.
removing/redesigning poorly written modules.
adding a module versioning system.

I've been thinking that the lib directory could be better named and
rearranged a bit. I sometimes mistakenly open the libs directory
instead of lib because of the name similarity.

An alternative might be to use the name "packs" or "packages" in place
of "lib", which would emphasize the use of packages as the primary
method of extending python. The standard library could then be a
package called "stdlib" within this directory. Third party packages
would then be alongside "stdlib" and not within a directory that is
within the standard library.

It would be mostly a cosmetic change, but I believe it would be worth
doing if it could be done without breaking programs that may have hard-coded
path references to the library. :-/
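Hard-coded library paths can often be avoided by asking the running interpreter where its directories are. A minimal sketch, assuming a later Python than the one discussed here (the standard `sysconfig` module arrived in Python 2.7/3.2). The `library_locations` helper is hypothetical; `sysconfig.get_paths()` and its `"stdlib"` and `"purelib"` keys belong to the real module:

```python
import sysconfig


def library_locations():
    """Hypothetical helper: report where this interpreter keeps its libraries."""
    paths = sysconfig.get_paths()  # dict keyed by install-scheme names
    return {
        "stdlib": paths["stdlib"],    # the standard library proper
        "purelib": paths["purelib"],  # where pure-Python third-party packages go
    }


if __name__ == "__main__":
    for name, path in library_locations().items():
        print(f"{name}: {path}")
```

On a default CPython install the `purelib` directory (site-packages) usually sits inside the standard library directory, which is exactly the nesting described above; some Linux distributions put third-party packages elsewhere.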

Ron
 
Steve Holden

Ron_Adam said:
I've been thinking that the lib directory could be better named and
rearranged a bit. I sometimes mistakenly open the libs directory
instead of lib because of the name similarity.

An alternative might be to use the name "packs" or "packages" in place
of "lib", which would emphasize the use of packages as the primary
method of extending python. The standard library could then be a
package called "stdlib" within this directory. Third party packages
would then be alongside "stdlib" and not within a directory that is
within the standard library.

It would be mostly a cosmetic change, but I believe it would be worth
doing if it could be done without breaking programs that may have hard-coded
path references to the library. :-/

Ron
Ron:

You do a lot of thinking, don't you? :)

This is a *very large* change, not a cosmetic one, requiring changes to
many installation routines (including, probably, distutils) and causing
problems for software that attempts to operate with multiple versions of
Python - and those projects have problems enough as it is despite
Python's quite fine record of careful development.

This seems a rather high price to pay just to avoid having you
mistakenly open "libs" instead of "lib" - a distinction that is
only meaningful on Windows platforms anyway, I believe.

You are correct in suggesting that the library could be better organized
than it is, but I felt we would be better off deferring such change
until the emergence of Python 3.0, which is allowed to break backwards
compatibility. So, start working on your scheme now - PEP 3000 needs
contributions. My own current favorite idea is to have the current
standard library become the "stdlib" package, but I'm sure a lot of
people would find that suggestion at least as half-baked as yours.

(If an idea is more half-baked than something exactly half-baked, is it
0.4-baked or 0.6-baked? Does "more half-baked" actually mean "less baked"?)

regards
Steve
 
Ron_Adam

Ron:

You do a lot of thinking, don't you? :)

Just the way my mind works. ;-)
This is a *very large* change, not a cosmetic one, requiring changes to
many installation routines (including, probably, distutils) and causing
problems for software that attempts to operate with multiple versions of
Python - and those projects have problems enough as it is despite
Python's quite fine record of careful development.

I thought it might be more involved than it seemed.
This seems a rather high price to pay just to avoid having you
mistakenly open "libs" instead of "lib" - a distinction that is
only meaningful on Windows platforms anyway, I believe.

That's not surprising on Windows.
You are correct in suggesting that the library could be better organized
than it is, but I felt we would be better off deferring such change
until the emergence of Python 3.0, which is allowed to break backwards
compatibility. So, start working on your scheme now - PEP 3000 needs
contributions. My own current favorite idea is to have the current
standard library become the "stdlib" package, but I'm sure a lot of
people would find that suggestion at least as half-baked as yours.

Yes, I agree, the "stdlib" should be a package. So I don't find it
half-baked at all. Packages are part of python, so python should take
advantage of them.

As far as an organizing scheme goes, I've come to the conclusion that files
should be organized by who's responsible for them, as in who to
contact if something doesn't work correctly. And not allowing files
from different sources to be intermixed is definitely worth doing if
possible. Something Windows does very, very badly.

For Python, that would mean packages should be fully self-contained
and should not move any files to other directories if possible, which
simplifies installs, uninstalls, and upgrades. But it would require
much more than a cosmetic change, and more than the simple, or not so
simple, directory changes I suggested.

One of the tools I wrote in C (early '90s) was a makefile maker. I
still have the source code here somewhere. Starting with the main
source file and a template with the compile options in it, it searched
all included files recursively for references and built the makefile
using the template. It really made large projects easy. I don't
think that's anything new now. Dist tools should do something like
that to find needed files. It shouldn't matter what directories they
are in as long as it has read access rights to them, or they are in
the search path, or there's a direct or indirect reference to them in
the source code someplace.
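The same recursive search can be sketched for Python sources: start from the main script, parse out its imports, resolve each one to a file in a list of search directories, and repeat for every file found. A minimal sketch using the standard `ast` module; `find_dependencies` is a hypothetical function and deliberately simplified (it only follows top-level names that resolve to `<dir>/<name>.py`, ignoring packages, relative imports and C extensions). The standard `modulefinder` module covers far more of the real cases.

```python
import ast
import os


def find_dependencies(start_file, search_dirs):
    """Hypothetical sketch: return the .py files reachable from start_file (itself included).

    Simplified on purpose: only a top-level module name that exists as
    <dir>/<name>.py in one of search_dirs is followed.
    """
    seen = set()
    pending = [os.path.abspath(start_file)]  # worklist instead of recursion
    while pending:
        path = pending.pop()
        if path in seen:
            continue  # already scanned; this also stops import cycles
        seen.add(path)
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read(), filename=path)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]  # absolute "from x import y" only
            else:
                continue
            for name in names:
                top = name.split(".")[0]
                for d in search_dirs:
                    candidate = os.path.abspath(os.path.join(d, top + ".py"))
                    if os.path.isfile(candidate):
                        pending.append(candidate)
                        break  # first match on the search path wins
    return seen
```

Modules that are not found in `search_dirs` (the standard library, for instance) are simply skipped, which is roughly the "only what's in the search path" rule described above.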
(If an idea is more half-baked than something exactly half-baked, is it
0.4-baked or 0.6-baked? Does "more half-baked" actually mean "less baked"?)

regards
Steve

All new ideas are un-baked, they aren't fully baked until they are old
ideas which have been implemented. So 0.6 baked is more than half
baked, and 0.4 baked is ... pudding. ;-)

I'll consider working on that PEP. It sounds like it might be a good
project for me.

Cheers,
Ron
 
Martin v. Löwis

vegetax said:
Why do people have to put wrappers around about half of the standard
library modules? I have wrappers for urllib, urllib2, [... many more ...]

I mean, is this normal?

Not sure what "this" is :) Is it normal that people write wrappers
around libraries? Yes, most certainly so. I see people writing wrappers
around the C++ library, around Java collection classes (*), around .NET
classes, ... essentially for every library that somebody writes,
somebody else will write a wrapper - unless the library is useless.

Is it normal that *you* write this many wrappers? I don't know you
well enough to answer that question :)

Is it good that people write these wrappers? I don't think so. Is it
the library's fault? I don't think so, either. People write these
wrappers because they cannot remember how to use the library. This
is not because the library is hard to remember, but because people
cannot remember much at all - except for the things they have done
themselves (**). So they write a library wrapper, and doing so takes
them enough time to ingrain their own API into their own memory.
If somebody else wrote the very same wrapper for them, they still
would not like to use it.

So what can be done? Not much, I think - no matter what the
library reorganization is, people will continue to write wrappers
around it. That is not to say that a library reorganization couldn't
be helpful to some people - if people find enough energy, a library
reorganization will happen. Some will like it, some will hate it.

I personally try to avoid library wrappers like the plague. It makes
my code harder to write, but easier to read.

Regards,
Martin

(*) Just look at the Apache Commons library. People write wrappers
for stuff like

    boolean isEmptyString(String s) {
        return s == null || s.length() == 0;
    }

(**) of course, most people can only remember a subset of these,
as well.
 
Kay Schluehr

Martin said:
Is it good that people write these wrappers?
No.

I don't think so. Is it
the library's fault? I don't think so, either. People write these
wrappers because they cannot remember how to use the library.

I slightly disagree. It's a symptom. The example of the path module is
instructive because it actually represents path objects, not just a
bunch of functions in isolation that manipulate strings that can be
interpreted as paths. I started reading the lib.py documentation of
PyPy and again I found the reference to the path module. Different
people seem to share similar ideas and complaints about inconveniences. I
have been using the path module myself for a year. This indicates that there is
something wrong or missing in the std-library.
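To make the contrast concrete: with `os.path`, a path is a plain string handed to free functions, while a path object bundles the string with its operations. Below is a minimal, hypothetical `Path` class that wraps a few `os.path` functions to show that style; it is not Jason Orendorff's module, and its API is not meant to reproduce that one.

```python
import os


class Path(str):
    """Hypothetical minimal path object: a str subclass with os.path helpers as members."""

    def __truediv__(self, other):
        # p / "name" joins components, like os.path.join(p, "name")
        return Path(os.path.join(self, other))

    @property
    def name(self):
        return os.path.basename(self)

    @property
    def parent(self):
        return Path(os.path.dirname(self))

    @property
    def ext(self):
        return os.path.splitext(self)[1]

    def exists(self):
        return os.path.exists(self)

    def size(self):
        # Only meaningful for an existing file; os.path.getsize raises OSError otherwise.
        return os.path.getsize(self)


if __name__ == "__main__":
    # Function style: the path is a string passed to free functions.
    report = os.path.join("data", "2005", "report.txt")
    print(os.path.basename(report), os.path.splitext(report)[1])
    # Object style: the same operations as properties of the path itself.
    report = Path("data") / "2005" / "report.txt"
    print(report.name, report.ext)
```

Subclassing `str` keeps such objects usable wherever a plain path string is expected, which is one way a wrapper like this can coexist with `os.path`.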

Regards,
Kay
 
Kay Schluehr

Steve said:
You are correct in suggesting that the library could be better organized
than it is, but I felt we would be better off deferring such change
until the emergence of Python 3.0, which is allowed to break backwards
compatibility. So, start working on your scheme now - PEP 3000 needs
contributions.

I fear that Python 3.0 will become some kind of vaporware in the Python
community that paralyzes all redesign efforts on the std-lib. The
argument goes like this: one has to wait until the BDFL has made his
syntax/feature/builtin decisions. It is not useful to redesign the
std-library of a language that is somehow becoming deprecated - but the
BDFL thinks that Python 3.0 is still py-in-the-sky. PEP 3000 seems to
be nothing more than a summary of the BDFL's musings about Python warts
and some wishful but highly controversial features like type guards
that would have a great overall impact on the std-lib.

Regards,
Kay
 
Martin v. Löwis

Kay said:
I slightly disagree. It's a symptom. The example of the path module is
instructive because it actually represents path objects, not just a
bunch of functions in isolation that manipulate strings that can be
interpreted as paths. I started reading the lib.py documentation of
PyPy and again I found the reference to the path module. Different
people seem to share similar ideas and complaints about inconveniences. I
have been using the path module myself for a year. This indicates that there is
something wrong or missing in the std-library.

I personally find the notion of "path objects" confusing. Is this a
real path name on the local system, or merely a potential one? If
potential, why does it have a .size attribute (what is the size of
a non-existing file)? If real, how do I construct the name of a new
file?

What about relative paths? It says they are relative to the current
directory. Do I never need paths relative to some other directory?
What about the notion of "current drive" that Windows has?
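The existing function-based API already answers some of these questions, which shows what a path object would have to decide. A short illustration with standard `os.path` calls: a name can be built for a file that does not exist, asking for its size raises an error, and a relative path is resolved against the current directory unless it is explicitly joined onto another base. The `demo` function and the file names are made up for illustration; the Windows "current drive" case is not covered.

```python
import os
import tempfile


def demo(base_dir):
    # Building a name is pure string manipulation; the file need not exist.
    new_file = os.path.join(base_dir, "not_yet_created.txt")
    exists_before = os.path.exists(new_file)

    # Asking for the size of a non-existent file raises rather than returning a value.
    try:
        os.path.getsize(new_file)
        size_error = False
    except OSError:
        size_error = True

    # abspath() interprets a relative path against the current directory...
    rel = os.path.join("sub", "file.txt")
    against_cwd = os.path.abspath(rel)
    # ...while joining it onto another base makes it relative to that directory.
    against_base = os.path.normpath(os.path.join(base_dir, rel))
    return exists_before, size_error, against_cwd, against_base


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(d))
```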

That said, if the people who want a path object would get together
and contribute one, the library could be extended. I don't know
whether this would be an improvement, though - os.path could not
go away for backwards compatibility, so users would now have *two*
ways of dealing with path names.

I also think that one of the reasons why there isn't a path object
in the standard library yet is that nobody who has written one in
the past trusts his own code enough to make it "official". Jason
Orendorff writes "I find it a joy to use, and I hope you will too."
This is the standard open source attitude (and rightly so): if
you don't like it, don't use it. This attitude could no longer be
preserved if the module were in the standard library (or
else we would not have this discussion).

For a number of libraries added recently, I heard lots of complaints
how terrible they are to use, including, in particular, the XML
and logging libraries, and the Unicode type, and that "something
else" is so much easier to use. Why is it that the authors of
"something else" never contribute it to Python?

Regards,
Martin
 
Kay Schluehr

Martin said:
That said, if the people who want a path object would get together
and contribute one, the library could be extended. I don't know
whether this would be an improvement, though - os.path could not
go away for backwards compatibility, so users would now have *two*
ways of dealing with path names.

That's true. But it indicates that the idea of a "standard library" or
"batteries included" approach in Python, which collects the most common
and important computing ideas and expresses them in a mandatory
Pythonic implementation, is beginning to die, or at least is changing its
meaning into: an exposition library of the CPython interpreter, which
remains clearly indispensable.
I also think that one of the reasons why there isn't a path object
in the standard library yet is that nobody who has written one in
the past trusts his own code enough to make it "official". Jason
Orendorff writes "I find it a joy to use, and I hope you will too."
This is the standard open source attitude (and rightly so): if
you don't like it, don't use it. This attitude could no longer be
preserved if the module were in the standard library (or
else we would not have this discussion).

Jason Orendorff seems not to be the only one who finds it a "joy to
use".
For a number of libraries added recently, I heard lots of complaints
how terrible they are to use, including, in particular, the XML
and logging libraries, and the Unicode type, and that "something
else" is so much easier to use. Why is it that the authors of
"something else" never contribute it to Python?

You have already given the arguments in your discussion above. I
personally never use the standard-lib XML parser, but pyRXP/pyRXPU,
which is fast, stores objects in Pythonic list/tuple/dict structures
and provides access by lazy tagging, implemented in the small
xmlutils.py module which I extended for my own comfort. It dispenses with
both the slow Java-esque W3C DOM parser implementation and the
state-machine-oriented SAX. But does this module have a place in the
standard lib? No, because it promotes double implementation, i.e. not
only one way to do it. And throwing away old stuff is impossible
because of backwards incompatibilities. Neither of these axioms can be
skipped for a standard lib. So almost everyone deals with cute third-
party packages.

Regards,
Kay
 
Fredrik Lundh

Kay said:
You have already given the arguments in your discussion above. I
personally never use the standard-lib XML parser, but pyRXP/pyRXPU
which is fast, stores objects in pythonic list/tuple/dict structures
and provides access by lazy tagging

it's also GPL:ed, and the namespace support is totally broken. there
are faster solutions out there with Python-compatible licenses.

</F>
 
Fredrik Lundh

Kay said:
I fear that Python 3.0 will become some kind of vaporware in the Python
community that paralyzes all redesign efforts on the std-lib.

that, combined with the old observation that CPython developers,
when given a choice, prefer to write C code over Python code, is
making the standard library a lot less useful than it could be.

(if you look at recent releases, most standard lib additions are things
that are fun for language tinkerers and people looking for many ways
to write simple algorithms, but very little stuff that's useful for
scripters and application builders. a C implementation of _bisect. hello?)

if I were in charge, I'd separate 90% of the standard library from the
core distribution, make sure it ran on multiple implementations (at least
the two latest CPython implementations, plus what's needed to make
as much as possible available on the latest Jython and IronPython
releases), bundle a number of carefully selected external libraries
(without forcing developers to give up rights and lose control over
maintenance), refactor the test suite so it could be used both to test
the library and to see what parts worked properly on your platform,
and make new releases (for testers and early adopters) available
regularly.

</F>
 
Kay Schluehr

Fredrik said:
it's also GPL:ed, and the namespace support is totally broken. there
are faster solutions out there with Python-compatible licenses.

</F>

Interesting. Which implementation "out there" (i.e. not in the std-lib)
that maps the whole XML into an internal structure and makes it easily
accessible is currently faster? Remark: I have a clearly limited
perspective on this issue. I did not notice that namespaces are
broken in pyRXP because the XML docs I deal with in my company
are used for storing data for industrial production, where
one XML doc matches one system. Those XML files are tool-generated and used
as a replacement for initialization files (size ~1-5 MB) and are
never broken because they are self-contained.

Regards,
Kay
 
Fredrik Lundh

Kay said:
Interesting. Which implementation "out there" (i.e. not in the std-lib)
that maps the whole XML into an internal structure and makes it easily
accessible is currently faster?

here are three alternatives:

ltree (http://codespeak.net/lxml/)
libxml2 (http://xmlsoft.org/downloads.html)
celementtree (http://effbot.org/zone/celementtree.htm)

(ltree is an alternative binding to libxml2).

for raw parsing performance (disk to memory), see:

http://effbot.org/zone/celementtree.htm#benchmarks

as for access, celementtree builds a tree of Python objects, while
the others have to convert the internal structures to Python objects
as you access them. if you need to access large parts of the tree,
celementtree can be a lot faster (the yum developers report a 2-3x
speedup, for example). on the other hand, libxml2/ltree supports a
lot more XML standards (XPath, XSLT, various validation models,
etc) which can come in handy in many cases.
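For anyone who wants to try the tree-of-Python-objects approach, the ElementTree API was later added to the standard library as `xml.etree.ElementTree` (Python 2.5, with the C accelerator used automatically since Python 3.3). A small sketch that parses a made-up document containing a namespaced element and pulls values out of the resulting tree; the XML, the namespace URI and `machine_speeds` are invented for illustration:

```python
import xml.etree.ElementTree as ET

DOC = """<config xmlns:m="http://example.com/machine">
  <m:machine id="press-1"><param name="speed">120</param></m:machine>
  <m:machine id="press-2"><param name="speed">95</param></m:machine>
</config>"""

# ElementTree spells namespaced tags as "{uri}localname".
MACHINE = "{http://example.com/machine}machine"


def machine_speeds(xml_text):
    """Map each machine id to its integer 'speed' parameter."""
    root = ET.fromstring(xml_text)  # the whole document becomes a tree of Element objects
    speeds = {}
    for machine in root.iter(MACHINE):
        for param in machine.findall("param"):  # direct <param> children, no namespace
            if param.get("name") == "speed":
                speeds[machine.get("id")] = int(param.text)
    return speeds


if __name__ == "__main__":
    print(machine_speeds(DOC))
```

Because the namespace URI is part of the tag itself, lookups do not depend on which prefix a given document happens to use.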

</F>
 
Ron_Adam

Even if Python 3.0 never materializes, the documented PEP may still
have an impact on further development of Python. So it might also be
referred to as PEP __future__.

Has there been any suggestion of a timeline? If there is a new
release every 18 months, v2.4 to v3.0 would take 108 months (six more
releases: 2.5 through 2.9, then 3.0)? Or would there be a jump from
v2.5 or 2.6 to v3.0?
that, combined with the old observation that CPython developers,
when given a choice, prefer to write C code over Python code, is
making the standard library a lot less useful than it could be.

(if you look at recent releases, most standard lib additions are things
that are fun for language tinkerers and people looking for many ways
to write simple algorithms, but very little stuff that's useful for
scripters and application builders. a C implementation of _bisect. hello?)

"Fun" things and demos could be put in an "extras" package. That might do a
lot to clean up the library so that the rest of it can be put in
better perspective. Also, looking at my python24 directory, there is a
'tools' dir that could probably go in the extras package as well.
if I were in charge, I'd separate 90% of the standard library from the
core distribution, make sure it ran on multiple implementations (at least
the two latest CPython implementations, plus what's needed to make
as much as possible available on the latest Jython and IronPython
releases), bundle a number of carefully selected external libraries
(without forcing developers to give up rights and lose control over
maintenance), refactor the test suite so it could be used both to test
the library and to see what parts worked properly on your platform,
and make new releases (for testers and early adopters) available
regularly.

</F>

Larger utility packages could be moved from the "stdlib" but still be
included as separate packages that can optionally be installed from
the python installer. Idle, distutils, tcl/tk, .. ?

Probably a clearer definition of purpose for the different parts is
needed. I haven't seen anything documented on that specifically. Up
until recently, has it been 'more is better' as long as it doesn't break
anything? With the emphasis on growing the language?

What would definitions of 'purpose' be for?
'__builtin__'
'__builtins__'
packages:
'stdlib' ... 'stdlib23', 'stdlib24' # Versions?
'extras' # examples, demos, and fun stuff
other packages included in the install
packages available at 'pythonpacks.com' (possible?)

separately installed applications?

Is it possible to get some sort of overview on extending python? Does
one already exist?

Cheers,
Ron
 
Fredrik Lundh

Martin said:
For a number of libraries added recently, I heard lots of complaints
how terrible they are to use, including, in particular, the XML
and logging libraries, and the Unicode type, and that "something
else" is so much easier to use. Why is it that the authors of
"something else" never contribute it to Python?

because we're not willing to go through endless PEP processes,
generate patches that end up sitting on sourceforge for years, deal
with shitstorms initiated by developers of "competing" libraries, sign
over our copyrights to the PSF, lose control over the code base,
see the code being forked into several incompatible versions,
and so on...

(I'd rather spend that energy on developing new stuff.)

</F>
 
Martin v. Löwis

Fredrik said:
because we're not willing to go through endless PEP processes,
generate patches that end up sitting on sourceforge for years, deal
with shitstorms initiated by developers of "competing" libraries, sign
over our copyrights to the PSF, lose control over the code base,
see the code being forked into several incompatible versions,
and so on...

I can see why you are not willing to do some of these things. But
I can't see why that is for other things. For example, why are you
not willing to license your contribution to the PSF (nobody asks
you to "sign it over")?

As for patches sitting on sourceforge for years: regular contributors
can commit to the CVS without putting a patch on SF first (as you
certainly know), so an alternative to waiting for years is to become
a regular contributor.

As for "losing control" - you seem to have a notion of control that
truly makes it difficult to contribute to Python. In this specific
case, I don't think Python should change, though; it should be
possible to accept changes to your code before asking for
your permission to make that change.

Regards,
Martin
 
