pre-PEP: Object-oriented file module

Kenneth McDonald

I'd like to propose a new PEP [no, that isn't a redundant 'process'
in there :)--pre-PEP is a different process than PEP], for a
standard library module that deals with files and file paths in an
object-oriented manner. I believe this module should be included as
part of the standard Python distribution.

Background
==========
Some time ago, I wrote such a module for myself, and have found it
extremely useful. Recently, I found a reference to a similar module,
http://www.jorendorff.com/articles/python/path/ by Jeff Orendorff.
There are of course differences--I think mine is more comprehensive
but probably less stable--but the similarities in thought are
striking. Both work by creating a class representing file paths, and
then using that class to unify methods from shutil, os.path, and some
built-in functions such as 'open' (and maybe some other stuff I can't
remember).

I haven't looked at Jeff's code yet, but for my own, a major enabler
of the enhanced functionality has been the inclusion of generators in
Python. This allows, for example, a method which yields all of the
lines in a file and automatically closes that file afterwards. The
availability of attributes also makes certain things cleaner than was
the case in previous versions of Python.
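
As a rough illustration of the generator idea (this is not the code from either module; the function name and the use of a modern 'with' block are assumptions), a line iterator that closes its file could look like this:

```python
import os
import tempfile


def iterlines(path, encoding="utf-8"):
    """Yield each line of the file at `path`, closing the file afterwards."""
    # The `with` block closes the file when the generator is exhausted
    # or explicitly closed, so the caller never handles the file object.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


# Usage: write a small temporary file, then iterate over its lines.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "some.txt")
    with open(sample, "w", encoding="utf-8") as out:
        out.write("first\nsecond\n")
    lines = list(iterlines(sample))
```

Note that in this sketch a missing file raises its exception when iteration starts, not when iterlines() is called, because the body of a generator does not run until the first item is requested.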

Fit With Python Philosophy
==========================
One of the strengths of Python is that it is a highly object-oriented
language, but this is not true when it comes to handling files. As
far as Python is concerned, a file path is just a string, and there
are a bunch of things you can do with it, but they all have to be
done with function calls (not methods) since there is no concept of a
file path object. Even worse, these functions are spread out across
various modules, and often have cryptic names that hardly make it
obvious what they do.
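
To make the point concrete, here is a small sketch of the function-based style using only standard library calls; the file names are made up for illustration. One simple copy-and-inspect task touches a built-in, os.path, and shutil:

```python
import os.path
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "report.txt")   # joining paths: os.path
    with open(source, "w") as handle:           # opening: built-in function
        handle.write("hello\n")
    backup = source + ".bak"                    # the path itself is a plain string
    shutil.copy(source, backup)                 # copying: shutil
    name = os.path.basename(backup)             # last component: os.path
    stem, extension = os.path.splitext(name)    # splitting the suffix: os.path
    exists = os.path.exists(backup)             # existence check: os.path
```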

Given that two different people concluded that such a module was
desirable, and independently implemented modules that are actually
very similar, I suspect there is an 'object-oriented mindset' to
which this way of addressing files and file paths is natural. And
that should be part of Python.

Pragmatic Justification
=======================
I've been using my module for about a year and a half now. The ease
of use and uniformity make a huge (I'm tempted to say 'vast')
difference in dealing with files. I believe other users would
experience an increase in efficiency when dealing with files ranging
from 'significant' to 'very large' (in precise technical terms :) ).
Also, I think this type of API would be much easier for new users to
learn and use.

Examples
========
A few examples are in order. Again, these are from my own library,
since I'm not too familiar with Jeff's. Also, this is stuff I'm just
typing in right now as an illustration--there may be syntactic
errors. (However, all of this functionality is present.) And these by
no means represent the full functionality that is already defined.

# define a new path object
mydir = filepath("#&*$directory")

# Note that special characters are automatically escaped
# by filepath, as necessary for the current OS. If a character
# is illegal in a file name no matter what (cannot be escaped),
# an exception will be raised.


# A file in that directory
f = mydir / "some.txt"

# Go through the lines in the file. When all lines are done,
# the file will be closed automatically. If the file does not
# actually exist, an appropriate exception will be raised.
for line in f.iterlines():
    pass  # ...do something with line...

#The directory containing f is, of course, 'mydir'
assert f.parent == mydir

#Another path
aPath = filepath(....)

#In my module (not in Jeff's), a file path is considered
# semantically as a sequence of directory names terminated
# by the name of a file or directory. This makes it easy to
# obtain the name of the file at the end of a path:
theFile = aPath[-1]

# or the directory leading to that file
parentDir = aPath[0:-1]

# Of course, these two common indexes/slices are also accessible
# through attributes
theFile = aPath.basename
parentDir = aPath.parent

# A more powerful 'walk'-type method is included. Below,
# the 'recursive' indicates that directories should be recursively
# walked, and the 'preorder' indicates that directories should
# be included in the iteration _before_ their contents are given.
# There is also a 'postorder' argument, and both may be used to
# yield directories both before and after their contents.
aPath.iterfiles(recursive=True, preorder=True)

# With the advent of the 'itertools' module in Python, there is no
# need to provide an argument taking a function that is applied
# during the walk process, so in that sense, iterfiles is actually
# simpler than walk.

# ...and more. All of the various file capabilities available in
# Python are provided in a unified package in this module.
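
For readers who want something runnable, the following is a minimal sketch of how such a path class might be put together on top of os and os.path. It is not the module described above: the class name FilePath, the method bodies, and the handling of edge cases are all assumptions made for illustration. Drive and root handling in slices is omitted, and symlinked directories are followed without loop detection.

```python
import os
import tempfile


class FilePath:
    """Hypothetical sketch of a path object; not the module described above."""

    def __init__(self, path):
        self.path = os.path.normpath(str(path))

    def __eq__(self, other):
        return isinstance(other, FilePath) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def __truediv__(self, name):
        # mydir / "some.txt" joins a child name onto this path.
        return FilePath(os.path.join(self.path, name))

    def __getitem__(self, index):
        # Treat the path as a sequence of names: an integer index gives
        # one name, a slice gives a new path built from the selected names.
        parts = self.path.split(os.sep)
        if isinstance(index, slice):
            return FilePath(os.sep.join(parts[index]))
        return parts[index]

    @property
    def parent(self):
        return FilePath(os.path.dirname(self.path))

    @property
    def basename(self):
        return os.path.basename(self.path)

    def iterfiles(self, recursive=False, preorder=False, postorder=False):
        """Yield files under this directory; optionally yield each
        subdirectory before and/or after its contents."""
        for name in sorted(os.listdir(self.path)):
            child = self / name
            if os.path.isdir(child.path):
                if preorder:
                    yield child
                if recursive:
                    yield from child.iterfiles(recursive, preorder, postorder)
                if postorder:
                    yield child
            else:
                yield child


# Usage: build a tiny tree in a temporary directory and walk it.
with tempfile.TemporaryDirectory() as tmp:
    root = FilePath(tmp)
    os.mkdir((root / "sub").path)
    for target in (root / "a.txt", root / "sub" / "b.txt"):
        with open(target.path, "w") as handle:
            handle.write("x\n")
    f = root / "sub" / "b.txt"
    same_parent = f.parent == f[0:-1]
    last_name = f[-1]
    walked = [os.path.relpath(p.path, root.path) for p in
              root.iterfiles(recursive=True, preorder=True, postorder=True)]
```

With both preorder and postorder set, the directory 'sub' appears twice in the walk, once before and once after its contents, which matches the behaviour described in the comments above.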
 

Martin v. Löwis

Kenneth said:
I'd like to propose a new PEP [no, that isn't a redundant 'process' in
there :)--pre-PEP is a different process than PEP], for a standard
library module that deals with files and file paths in an object
oriented manner. I believe this module should be included as part of
the standard Python distribution.

See the discussions at

http://python.org/sf/1226256
http://mail.python.org/pipermail/python-dev/2005-June/054439.html
http://mail.python.org/pipermail/python-dev/2005-July/054535.html

I'd be personally curious as to how you would be dealing with
Unicode file names. How to access file attributes is also
an interesting question (e.g. how to find out whether it is
a symlink, whether it is a hidden file, what the POSIX ACL
is, and what the 8.3 short name is)
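
To show what a path class would have to wrap for a few of these, here is a small sketch using only the standard os and stat modules. The function name describe and the dot-prefix test for hidden files are assumptions (the latter is a POSIX convention only). POSIX ACLs and 8.3 short names are not covered here.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect a few attributes of `path` without following symlinks."""
    info = os.lstat(path)
    name = os.path.basename(path)
    return {
        "is_symlink": stat.S_ISLNK(info.st_mode),
        "is_dir": stat.S_ISDIR(info.st_mode),
        # Assumption: POSIX convention only; Windows uses a file attribute.
        "is_hidden": name.startswith("."),
        "size": info.st_size,
    }


with tempfile.TemporaryDirectory() as tmp:
    # A non-ASCII file name, passed to the OS as an ordinary str.
    target = os.path.join(tmp, "Löwis.txt")
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("hi")
    report = describe(target)
```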

Regards,
Martin
 

Martin v. Löwis

Kenneth said:
Why would any of the issues below be any more difficult than they are with
the current file functions? I'm not proposing a C replacement for current
functions, merely a Python module that wraps all of those functions (and
adds some additional ones) in an appropriate class.

I'm not saying they are difficult. I want to know how your library deals
with them. There is a good chance that some of these questions remain
unanswered in the PEP, and I just want to indicate that I would be
unhappy if they are. Specifying this API is a huge task, much more so
than coming up with an implementation that does "something".

This is one of the reasons why nothing like this has made it to the
standard library: as a library module, it would have to face many
more scenarios that the authors of the module originally did not
consider. Therefore, the documentation must be complete and consistent,
and there should be an agreement as to what this library can do and
what it cannot do.

Regards,
Martin
 
