PEP on path module for standard library

Scott David Daniels

Duncan said:
BTW, does it matter at all in practical use that the base class of path
varies between str and unicode depending on the platform?

Isn't it even worse than this?
On Win2K & XP, don't the file systems have something to do with the
encoding? So D: (a FAT drive) might naturally be str, while C:
(an NTFS drive) might naturally be unicode. Even worse, would be a
path that switches in the middle (which it might do if we get to a
ZIP file or use the newer dir-in-file file systems).

--Scott David Daniels
 
Michael Hoffman

John said:
However, a path as a sequence of characters has even less
meaning - I can't think of a use, while I have an application
where traversing a path as a sequence of path elements makes
perfect sense: I need to descend the directory structure, directory
by directory, looking for specific files and types.

I *have* used a path as a sequence of characters before. I had to deal
with a bunch of filenames that were formatted like "file_02832.a.txt"

I can see the case for a path as a sequence of elements, although in
practice, drive letters, extensions, and alternate streams complicate
things.

But as the discussion here unfolds I'm starting to feel that the
advantages of using a possibly more "meaningful" approach to path as a
sequence of elements are overwhelmed by the practical advantages of
using a basestring. Mainly, that you can use it anywhere a basestring
would be used today and it Just Works.
 
Terry Reedy

Glad I'm not the only oddball.
Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.

???? The internal equivalent of (simplified, omitting error checking,
etc.)

for dir in pathobject:
    if isdir(dir): cd(dir)

*is*, in essence, what the OS mainly does with paths (after splitting the
string representation into pieces).

Directory walks also work with paths as sequences (stacks, in particular).
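
A minimal sketch of the "path as a sequence of elements" idea, written for a current Python 3 rather than the Python 2 of this thread. The helper name `descend` is hypothetical, and `posixpath` is used so the result does not depend on the platform. It only manipulates strings, so the isdir()/cd() steps from the pseudocode above are left out.

```python
import posixpath

def descend(path):
    """Yield each intermediate location reached while following `path`.

    Hypothetical helper: it never touches the file system.
    """
    current = "/" if path.startswith("/") else ""
    for element in path.split("/"):
        if not element:  # skip empty pieces from leading or doubled slashes
            continue
        current = posixpath.join(current, element)
        yield current

steps = list(descend("/usr/local/lib"))
for step in steps:
    print(step)
```

Each iteration corresponds to one "cd" in the description above; a real walk would check that the location exists before moving on.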

Terry J. Reedy
 
Peter Hansen

Duncan said:
BTW, does it matter at all in practical use that the base class of path
varies between str and unicode depending on the platform?

I haven't seen any problem. I confess I can't even imagine exactly what
the problem might be, since they're both subclasses of basestring,
aren't they?

And current code should have exactly the same issues when using str or
unicode in all the calls that path() merely wraps.

So does it matter in practical use when one faces this issue and is
*not* using "path"?

-Peter
 
Peter Hansen

Stefan said:
(It would be nice to get `path`(s) easily from a `file`, at the moment
there is only file.name if I'm not mistaken).

When files are opened through a "path" object -- e.g.
path('name').open() -- then file.name returns the path object that was
used to open it.

-Peter
 
Peter Hansen

Duncan said:
As I said elsewhere I haven't
used path for anything real, so I'm still finding surprises such as why
this doesn't do what I expect:


path(u'a/bc/d')

Just a note, though you probably know, that this is intended to be
written this way with path:
path(u'a/b/c/d')

-Peter
 
Peter Hansen

Reinhold said:
FYI: I modified the path module a bit so that it fits many of the suggestions
from python-dev, and put the result in the Python CVS tree under
nondist/sandbox/path.

By the way, thanks for doing this, Reinhold!

Most prominent change is that it doesn't inherit from str/unicode anymore.
I found this distinction important, because as a str subclass the Path object
has many methods that don't make sense for it.

On this topic, has anyone asked the original author (Jason Orendorff)
whether he has some background on this decision that might benefit the
discussion? Given the elegance of the design of the path module, I
would think if he has an opinion on the matter it is probably based on
more thought than any of us have given it so far.

And maybe he would even say that it was a wrong decision at the time and
he'd do it differently the next time.

-Peter
 
Oliver Andrich

Hi,

2005/7/22 said:
What is this Java Path class? I have been STFWing and have found nothing
on it in the. Indeed if you search for "Java Path class" (with quotes)
almost half of the pages are this message from Guido. ;)

Any Java hackers here want to tell us of the wonders of the Java Path
class? I would be interested in seeing how other OO languages deal with
paths.

I guess the nearest Java class comparable with path is the File class.

http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html

And as I am a so called Java hacker, I highly appreciate path as a
module for my python projects and in my eyes it is the natural way to
address files/paths. At least it is more natural to me than the os,
os.path, etc. bundle that has grown over time.

I would love to see path inside Python's stdlib.

Best regards,
Oliver
 
Terry Reedy

Michael Hoffman said:

Very interesting. Java's File class is a system-independent "abstract
representation of file and directory pathnames" which is constructed from
and converted to system-dependent string-form pathnames (including URL/URI
file:... forms). A File consists of an optional prefix and a *sequence* of
zero or more string names.

In other words, Java's File class is what Duncan and I thought Python's
Path class might or possibly should be. So this internal representation
might be worth considering as an option. Of course, if the interface is
done right, it mostly should not make too much difference to the user.

Terry J. Reedy
 
Michael Hoffman

Scott said:
Isn't it even worse than this?
On Win2K & XP, don't the file systems have something to do with the
encoding? So D: (a FAT drive) might naturally be str, while C:
(an NTFS drive) might naturally be unicode.

The current path module handles these situations at least as well as the
libraries that come with Python do. ;-)
 
Michael Hoffman

Peter said:
When files are opened through a "path" object -- e.g.
path('name').open() -- then file.name returns the path object that was
used to open it.

Also works if you use file(path('name')) or open(path('name')).
 
Reinhold Birkenfeld

Andrew said:
I disagree. I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input. There is no
automatic coercion.

Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.

So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.
...     def __getattr__(self, name):
...         print "Want", repr(name)
...         raise AttributeError, name
...
Traceback (most recent call last):

Traceback (most recent call last):


The solutions to this are:
1) make the path object be derived from str or unicode. Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like


if isinstance(input, basestring):
    input = open(input, "rU")
for line in input:
    print line

I showed several places in the stdlib and in 3rd party packages
where this is used.
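
A rough sketch of why that pattern matters, assuming a current Python 3 where `str` stands in for the `basestring` of this thread. `StrPath`, `WrapperPath` and `read_lines` are hypothetical names made up for the illustration; they are not part of the path module.

```python
import os
import tempfile

class StrPath(str):
    """Hypothetical path type that IS a string."""

class WrapperPath:
    """Hypothetical path type that only HAS a string."""
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

def read_lines(source):
    """Accept a filename or an open file, in the style quoted above."""
    if isinstance(source, str):
        with open(source) as handle:
            return handle.read().splitlines()
    return [line.rstrip("\n") for line in source]

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "blah.txt")
    with open(filename, "w") as handle:
        handle.write("one\ntwo\n")

    # The str subclass passes the isinstance() check and is opened.
    from_str_subclass = read_lines(StrPath(filename))

    # The wrapper fails the check and is then treated as an open file.
    try:
        read_lines(WrapperPath(filename))
        wrapper_worked = True
    except TypeError:
        wrapper_worked = False

print(from_str_subclass, wrapper_worked)
```

The wrapper only works here if the caller writes `read_lines(str(WrapperPath(filename)))`, which is the explicit "cast" being debated.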

That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.
You are broadening the definition of a file path to include URIs?
That's making life more complicated. Eg, the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named "mail:[email protected]" and I
join that with "file://home/dalke/badfiles/".

Additionally, the actions done on URIs are different than on file
paths. What should os.listdir("http://www.python.org/") do?

I agree. Path is only for local filesystem paths (well, in UNIX they could
as well be remote, but that's thanks to the abstraction the filesystem
provides, not Python).
As I mentioned, I tried some classes which emulated file
paths. One was something like

class TempDir:
    """removes the directory when the refcount goes to 0"""
    def __init__(self):
        self.filename = ... use a function from the tempfile module
    def __del__(self):
        if os.path.exists(self.filename):
            shutil.rmtree(self.filename)
    def __str__(self):
        return self.filename

I could do

dirname = TempDir()

but then instead of

os.mkdir(dirname)
tmpfile = os.path.join(dirname, "blah.txt")

I needed to write it as

os.mkdir(str(dirname))
tmpfile = os.path.join(str(dirname), "blah.txt")

or have two variables, one which could delete the
directory and the other for the name. I didn't think
that was good design.

I can't follow. That's clearly not a Path but a custom object of yours.
However, I would have done it differently: provide a "name" property
for the object, and don't call the variable "dirname", which is confusing.
If I had derived from str/unicode then things would
have been cleaner.

Please note, btw, that some filesystems are unicode
based and others are not. As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time. My "str()" example above
does not and would fail on a Unicode filesystem aware
Python build.

There's no difference. The only points where the type of a Path
object's underlying string is decided are Path.cwd() and the
Path constructor.


Reinhold
 
Michael Hoffman

Reinhold said:
Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.

Except when you pass the path to a function written by someone else.
So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.

Where uncommon uses include passing the path object to any code you
don't control? The stdlib can be fixed, this other stuff can't.
That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.

But many functions were written expecting lists as arguments but also
work for strings, and do not require an explicit list(mystring) before
calling the function.
 
Michael Hoffman

Peter said:
On this topic, has anyone asked the original author (Jason Orendorff)
whether he has some background on this decision that might benefit the
discussion?

My impression is that he doesn't have a lot of spare cycles for this. He
didn't have anything to add to the python-dev discussion when I informed
him of it. I'd love to hear what he had to say about the design.
 
John Machin

Daniel said:
Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.

Try this:

A file-system is a maze of twisty little passages, all alike. Junction
== directory. Cul-de-sac == file. Fortunately it is signposted. You are
dropped off at one of the entrance points ("current directory", say).
You are given a route (a "path") to your destination. The route consists
of a list of intermediate destinations.

for element in pathobject:
    follow_sign_post_to(element)

Exception-handling strategy: Don't forget to pack a big ball of string.
Anecdotal evidence is that breadcrumbs are unreliable.

Cheers,
John
 
John Machin

Michael said:
I *have* used a path as a sequence of characters before. I had to deal
with a bunch of filenames that were formatted like "file_02832.a.txt"

Ya, ya, ya, only two days ago I had to lash up a quick script to find
all files matching r"\d{8,8}[A-Za-z]{0,3}$" and check that the expected
number were present in each sub-directory (don't ask!).

BUT you are NOT using "a path as a sequence of characters". Your
filename is a path consisting of one element. The *element* is an
instance of basestring, to which you can apply all the string methods
and the re module.
 
Peter Hansen

Michael said:
Also works if you use file(path('name')) or open(path('name')).

Since that's exactly what the path module does, it's not surprising.
Practically everything that path does, with a few useful exceptions, is
a thin wrapper around the existing calls. path.open, for example is
merely this:

def open(self, mode='r'):
    return file(self, mode)

-Peter
 
George Sakkis

Andrew Dalke said:
How did you decide it's "has-a" vs. "is-a"?

All C calls use a "char *" for filenames and paths,
meaning the C model file for the filesystem says
paths are strings.

Bringing up how C models files (or anything else other than primitive types for that matter) is not
a particularly strong argument in a discussion on OO design ;-)
Paths as strings fit the Liskov substitution principle
in that any path object can be used any time a
string is used (eg, "loading from " + filename)

Liskov substitution principle imposes a rather weak constraint on when inheritance should not be
used, i.e. it is a necessary condition, but not sufficient. Take for example the case where a
PhoneNumber class is subclass of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid phone number, just doesn't make
sense.
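
The PhoneNumber case can be sketched in a few lines of current Python 3. The class is hypothetical and exists only to illustrate the point; the numbers are made up.

```python
class PhoneNumber(int):
    """Hypothetical subclass used only to illustrate the point above."""

home = PhoneNumber(5551234)
work = PhoneNumber(5559876)

# Allowed, because a PhoneNumber is-an int, but the sum means nothing.
total = home + work
print(total, type(total).__name__)
```

Substitution works everywhere an int is accepted, yet the inherited arithmetic returns a plain int that no longer represents a phone number.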
Good information hiding suggests that a better API
is one that requires less knowledge. I haven't
seen an example of how deriving from (unicode)
string makes things more complicated than not doing so.

I wouldn't say more complicated, but perhaps less intuitive in a few cases, e.g.:
path(r'C:\Documents and Settings\Guest\Local Settings').split()
['C:\\Documents', 'and', 'Settings\\Guest\\Local', 'Settings']
instead of
['C:', 'Documents and Settings', 'Guest', 'Local Settings']

I just noted that conceptually a path is a composite object consisting of many properties (dirname,
extension, etc.) and its string representation is just one of them. Still, I'm not suggesting that a
'pure' solution is better than a more practical one that covers the most usual cases.
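
The difference between the two results can be reproduced with plain strings in a current Python 3. This is only a sketch: `path_elements` is a hypothetical helper, not a method of the path module, and `ntpath` is used so that Windows rules apply on any platform.

```python
import ntpath

def path_elements(path):
    """Split a Windows-style path into its drive and name elements."""
    drive, rest = ntpath.splitdrive(path)
    names = [name for name in rest.split("\\") if name]
    return ([drive] if drive else []) + names

raw = r"C:\Documents and Settings\Guest\Local Settings"

as_string = raw.split()           # inherited str.split(): splits on whitespace
as_elements = path_elements(raw)  # splits on the path separator instead

print(as_string)
print(as_elements)
```

A str-derived path keeps the first behaviour for split() unless the subclass overrides it, which is the surprise being described.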

George
 
Michael Hoffman

Peter said:
Practically everything that path does, with a few useful exceptions, is
a thin wrapper around the existing calls.

If the implementation is easy to explain, it may be a good idea.

OT: I just realized you can now type in "python -m this" at the command
line, which is convenient, but strange.
 
Neil Hodgson

Scott David Daniels:
Isn't it even worse than this?
On Win2K & XP, don't the file systems have something to do with the
encoding? So D: (a FAT drive) might naturally be str, while C:
(an NTFS drive) might naturally be unicode.

This is generally safe as Windows is using unicode internally and
provides full-fidelity access to the FAT drive using unicode strings.
You can produce failures if you try to create files with names that can
not be represented but you would see a similar failure with byte string
access.

Even worse, would be a
path that switches in the middle (which it might do if we get to a
ZIP file or use the newer dir-in-file file systems).

If you are promoting from byte strings with a known encoding to
unicode path objects then this should always work.

Neil
 
