K
Kenneth McDonald
I'd like to propose a new PEP [no, that isn't a redundant 'process'
in there --pre-PEP is a different process than PEP], for a
standard library module that deals with files and file paths in an
object oriented manner. I believe this module should be included as
part of the standard Python distribution.
Background
==========
Some time ago, I wrote such a module for myself, and have found it
extremely useful. Recently, I found a reference to a similar module,
http://www.jorendorff.com/articles/python/path/ by Jeff Orendorff.
There are of course differences--I think mine is more comprehensive
but probably less stable--but the similarities in thought are
striking. Both work by creating a class representing file paths, and
then using that class to unify methods from shutil, os.path, and some
builtin functions such as 'open' (and maybe some other stuff I can't
remember).
I haven't looked at Jeff's code yet, but for my own, a major enabler
of the enhanced functionality has been the inclusion of generators in
Python. This allows, for example, a method which yields all of the
lines in a file and automatically closes that file after. The
availability of attributes also makes certain things cleaner than was
the case in previous versions of python.
Fit With Python Philosophy
=========================
One of the strengths of Python is that it is a highly object-oriented
language, but this is not true when it comes to handling files. As
far as python is concerned a file path is just a string, and there
are a bunch of things you can do with it, but they all have to be
done with function calls (not methods) since there is no concept of a
file path object. Even worse, these functions are spread out across
various modules, and often have cryptic names that hardly make it
obvious what they do.
Given that two different people concluded that such a module was
desirable, and independently implemented modules that are actually
very similar, I suspect there is an 'object-oriented mindset' to
which this way of addressing files and file paths is natural. And
that should be part of Python.
Pragmatic Justification
=================
I've been using my module for about a year and a half now. The ease-
of-use and uniformity make a huge (I'm tempted to say 'vast')
difference in dealing with files. I believe other users would
experience an increase in efficiency when dealing with files ranging
from 'significant' to 'very large' (in precise technical terms )
Also, I think this type of API would be much easier for new users to
learn and use.
Examples
========
A few examples are in order. Again, these are from my own library,
since I'm not too familiar with Jeff's. Also, this is stuff I'm just
typing in right now as an illustration--there may be syntactic
errors. (However, all of this functionality is present.) And these by
no means represent the full functionality that is already defined.
# define a new path object
mydir = filepath("#&*$directory")
# Note that special characters are automatically escaped
# by filepath, as necessary for the current OS. If a character
# is illegal in a file name no matter what (cannot be escaped),
# an exception will be raised.
# A file in that directory
f = mydir / "some.txt"
# Go through the lines in the file. When all lines are done,
# the file will be closed automatically. If the file does not
# actually exist, an appropriate exception will be raised.
for line in f.iterlines():
...do something...
#The directory containing f is, of course, 'mydir'
assert f.parent == mydir
#Another path
aPath = filepath(....)
#In my module (not in Jeff's), a file path is considered
# semantically as a sequence of directory names terminated
# by the name of a file or directory. This makes it easy to
# obtain the name of the file at the end of a path:
theFile = aPath[-1]
# or the directory leading to that file
parentDir = aPath[0:-1]
#of course, these two common indexes/slices are accessible through
attributes
theFile = aPath.basename
parentDir = aPath.parent
# A more powerful 'walk'-type method is included. Below,
# the 'recursive' indicates that directories should be recursively
# walked, and the 'preorder' indicates that directories should
# be included in the iteration _before_ their contents are given.
# There is also a 'postorder' argument, and both may be used to
# yield directories both before and after their contents.
aPath.iterfiles(recursive=True, preorder=True)
# With the advent of the 'itertools' module in python, there is no
# need to provide an argument taking a function that is applied
# during the walk process, so in that sense, iterfiles is actually
simpler than walk.
#...and more. All of the various file capabilities available in
Python are provided
# in a unified package in this module.
in there --pre-PEP is a different process than PEP], for a
standard library module that deals with files and file paths in an
object oriented manner. I believe this module should be included as
part of the standard Python distribution.
Background
==========
Some time ago, I wrote such a module for myself, and have found it
extremely useful. Recently, I found a reference to a similar module,
http://www.jorendorff.com/articles/python/path/ by Jeff Orendorff.
There are of course differences--I think mine is more comprehensive
but probably less stable--but the similarities in thought are
striking. Both work by creating a class representing file paths, and
then using that class to unify methods from shutil, os.path, and some
builtin functions such as 'open' (and maybe some other stuff I can't
remember).
I haven't looked at Jeff's code yet, but for my own, a major enabler
of the enhanced functionality has been the inclusion of generators in
Python. This allows, for example, a method which yields all of the
lines in a file and automatically closes that file after. The
availability of attributes also makes certain things cleaner than was
the case in previous versions of python.
Fit With Python Philosophy
=========================
One of the strengths of Python is that it is a highly object-oriented
language, but this is not true when it comes to handling files. As
far as python is concerned a file path is just a string, and there
are a bunch of things you can do with it, but they all have to be
done with function calls (not methods) since there is no concept of a
file path object. Even worse, these functions are spread out across
various modules, and often have cryptic names that hardly make it
obvious what they do.
Given that two different people concluded that such a module was
desirable, and independently implemented modules that are actually
very similar, I suspect there is an 'object-oriented mindset' to
which this way of addressing files and file paths is natural. And
that should be part of Python.
Pragmatic Justification
=================
I've been using my module for about a year and a half now. The ease-
of-use and uniformity make a huge (I'm tempted to say 'vast')
difference in dealing with files. I believe other users would
experience an increase in efficiency when dealing with files ranging
from 'significant' to 'very large' (in precise technical terms )
Also, I think this type of API would be much easier for new users to
learn and use.
Examples
========
A few examples are in order. Again, these are from my own library,
since I'm not too familiar with Jeff's. Also, this is stuff I'm just
typing in right now as an illustration--there may be syntactic
errors. (However, all of this functionality is present.) And these by
no means represent the full functionality that is already defined.
# define a new path object
mydir = filepath("#&*$directory")
# Note that special characters are automatically escaped
# by filepath, as necessary for the current OS. If a character
# is illegal in a file name no matter what (cannot be escaped),
# an exception will be raised.
# A file in that directory
f = mydir / "some.txt"
# Go through the lines in the file. When all lines are done,
# the file will be closed automatically. If the file does not
# actually exist, an appropriate exception will be raised.
for line in f.iterlines():
...do something...
#The directory containing f is, of course, 'mydir'
assert f.parent == mydir
#Another path
aPath = filepath(....)
#In my module (not in Jeff's), a file path is considered
# semantically as a sequence of directory names terminated
# by the name of a file or directory. This makes it easy to
# obtain the name of the file at the end of a path:
theFile = aPath[-1]
# or the directory leading to that file
parentDir = aPath[0:-1]
#of course, these two common indexes/slices are accessible through
attributes
theFile = aPath.basename
parentDir = aPath.parent
# A more powerful 'walk'-type method is included. Below,
# the 'recursive' indicates that directories should be recursively
# walked, and the 'preorder' indicates that directories should
# be included in the iteration _before_ their contents are given.
# There is also a 'postorder' argument, and both may be used to
# yield directories both before and after their contents.
aPath.iterfiles(recursive=True, preorder=True)
# With the advent of the 'itertools' module in python, there is no
# need to provide an argument taking a function that is applied
# during the walk process, so in that sense, iterfiles is actually
simpler than walk.
#...and more. All of the various file capabilities available in
Python are provided
# in a unified package in this module.