PEP on path module for standard library

Duncan Booth

George said:
Or rather "I prefer a single existing mediocre solution than two
solutions (even if the second was better)".

Except that he is open to persuasion, so the PEP has to demonstrate that
the duplication is worth the benefit.

Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do. In other
words, to me a path represents something in a filesystem, the fact that it
has one, or indeed several string representations does not mean that the
path itself is simply a more specific type of string.

You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.
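
A minimal sketch of what such an explicit-conversion design might look like, assuming Python 3 and POSIX-style separators. `ExplicitPath` and its `as_relative`, `as_absolute` and `as_uri` methods are hypothetical names made up for illustration, not part of Jason's path module, and the URI form does no escaping of special characters:

```python
import posixpath


class ExplicitPath:
    """Hypothetical path type that is deliberately NOT a str subclass."""

    def __init__(self, *parts):
        # Store the path as a tuple of elements rather than as characters.
        self._parts = tuple(parts)

    def as_relative(self):
        # The caller has to ask for a relative string explicitly.
        return posixpath.join(*self._parts)

    def as_absolute(self, root="/"):
        # The caller has to say which root the path is resolved against.
        return posixpath.join(root, *self._parts)

    def as_uri(self):
        # Naive file URI; real code would also need to quote special characters.
        return "file://" + self.as_absolute()


p = ExplicitPath("home", "duncan", "notes.txt")
print(p.as_relative())
print(p.as_absolute())
print(p.as_uri())
```

Because there is no implicit string form, every call site has to pick one of the representations.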

It may even be that we need a hierarchy of path classes: URLs need similar
but not identical manipulations to file paths, so if we want to address the
failings of os.path perhaps we should also look at the failings of urlparse
at the same time.
 
Michael Hoffman

Duncan said:
You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.

Egad. I'm not sure if that will really make people's lives easier.
 
Thomas Heller

I really love Jason's 'path' module. Sadly, I've encountered a serious
problem with using it. When you try to 'freeze' an application module,
and Jason's 'path' module is present in any of the directories that are
looked at by freeze's module finder (your app doesn't have to import
it), freeze goes into an infinite loop of imports, eventually getting a
'maximum recursion depth' exception. This seems to be related to
freeze getting confused between 'os.path' and Jason's 'path'.

I encountered this using Jason's latest 'path' module and Python 2.3.2.
I was able to solve it for my use by renaming path.py to newpath.py
and using 'from newpath import path' in my modules.

I've just notified Jason about this. I presume a solution like mine
will be used, and look forward to seeing Jason's module in stdlib.

That was a bug in modulefinder, which was fixed in Python 2.4 and 2.3.4.
See http://www.python.org/sf/876278.

Thomas
 
Harald Armin Massa

When you try to 'freeze' an application module,
and Jason's 'path' module is present in any of the directories that are
looked at by freeze's module finder (your app doesn't have to import
it), freeze goes into an infinite loop of imports, eventually getting a
'maximum recursion depth' exception. This seems to be related to
freeze getting confused between 'os.path' and Jason's 'path'.

This is a bug in distutils. Thomas Heller's py2exe encounters the same
bug. As far as I remember our conversation, he submitted a patch to
distutils.

In the meantime I renamed path.py to jpath.py, using Jason's first
letter as a gesture of honour while circumventing this bug.

Harald
 
Harald Armin Massa

Having path descend from str/unicode is extremely useful since I can
then pass a path object to any function someone else wrote without
having to worry about whether they were checking for basestring.

I use path.py from Jason to encapsulate a lot of the Windows platform
peculiarities of path handling.
Being able to use path objects everywhere I would use str or
unicode is essential, because I often use Python to tame Excel and
Word. Opening files within these programs needs a "plain str" as the
"PATH" for the file (which, of course, can also be done by explicitly
"converting" PATH to STRING).

Harald
 
Peter Hansen

Reinhold said:
It made sense to me at the time I changed this, although at the moment
I can't exactly recall the reasons.

Along with some of the others (and as a fairly heavy user of "path"), I
would caution strongly against jumping to make this change.

Given that a strong part of the justification for path's inclusion in
the standard library (as expressed here in the past) is that it is
stable and widely used, making such a fundamental change at this late
stage just prior to its possible acceptance seems to me very risky.

I have noticed in a number of cases where a "path" was usable as a
drop-in replacement for strings that previously contained paths. I
can't say for sure, but I strongly suspect some of that code would be
broken if "paths" weren't basestrings. I'll attempt to check in my own
code.

Even if those uses don't actually break, it can *also* be useful to have
the string methods available on a path object, so I'm also uncertain
what you gain by removing that connection, though it's clear what things
you might be losing.

-2 for this idea.

-Peter
 
Peter Hansen

Duncan said:
Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do. In other
words, to me a path represents something in a filesystem, the fact that it
has one, or indeed several string representations does not mean that the
path itself is simply a more specific type of string.

You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.

Duncan, are you another formerly non-user of path who has this opinion,
or have you already attempted to use path extensively in your code?

I'm not saying I dismiss the opinions of those who haven't actually
tried working with a string-based path object, but it's worth
considering that you might adopt a different opinion after using it for
a while.

I did.

-Peter
 
George Sakkis

Duncan Booth said:
Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do. In other
words, to me a path represents something in a filesystem, the fact that it
has one, or indeed several string representations does not mean that the
path itself is simply a more specific type of string.

You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.

First off, I find this is a relatively small detail overall, that is, regardless of whether path
subclasses string or not, its addition in the standard library will be a step forward. Having said
that, I think the choice between subclassing or not is going to be a practicality-vs-purity
decision. You're right, conceptually a path HAS_A string description, not IS_A string, so from a
pure OO point of view, it should not inherit string. OTOH, people in favor of the subclassing point
out the convenience for many (or most) common cases. It's a tradeoff, so arguments for both cases
should be discussed.

George
 
John Roth

Duncan Booth said:
You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.

It may even be that we need a hierarchy of path classes: URLs need similar
but not identical manipulations to file paths, so if we want to address the
failings of os.path perhaps we should also look at the failings of urlparse
at the same time.

You have to start somewhere. One of the lessons that's beginning
to seep into people's minds is that getting something that works
out there is almost always preferable to (over) design by committee.

How to do a comprehensive, covers all the corner cases file
system object (or object hierarchy, etc) has been discussed before,
and nothing has ever come of it. Starting with an object that
actually does something some people want gives the designers a
chance to look at things in the wild.

John Roth
 
Stefan Rank

Duncan Booth said:
Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do. In other
words, to me a path represents something in a filesystem, the fact that it
has one, or indeed several string representations does not mean that the
path itself is simply a more specific type of string.
[snip]
practicality-vs-purity decision. You're right, conceptually a path
HAS_A string description, not IS_A string, so from a pure OO point of
view, it should not inherit string.

Java has `File` which mixes the concepts "an object in the filesystem"
and "a structured locator for such objects (in a hierarchical fs) that
might or might not correspond to an object that is actually there".

`file` and `path` separate that. I think this is very reasonable.

(It would be nice to get `path`(s) easily from a `file`, at the moment
there is only file.name if I'm not mistaken).

And a `path`, to me, actually IS_A string (unicode string) that happens
to have special behaviour (such as os dependent quirks like a
pathseparator that automatically get translated, comparable to '\n' used
internally in strings translated to '\n'|'\r\n')

stefan
 
Michael Hoffman

George said:
Having said that, I think
the choice between subclassing or not is going to be a
practicality-vs-purity decision. You're right, conceptually a path
HAS_A string description, not IS_A string, so from a pure OO point of
view, it should not inherit string. OTOH, people in favor of the
subclassing point out the convenience for many (or most) common cases.

It would be an entirely different matter if we were designing a language
from scratch. But we have to deal with an existing codebase that expects
strings.

Here's some code I just wrote seconds ago to construct a path for a scp
upload:

"""
DST_DIRPATH = path("host:~/destination")
RSS_EXT = "rss"

dst_filenamebase = os.extsep.join([postcode.lower(), RSS_EXT])
dst_filepath = DST_DIRPATH.joinpath(dst_filenamebase)
"""

With the current path implementation, this Just Works. If I were using
something that parsed and understood paths, the scp/rcp convention of
host:filename would either cause an error or have to be programmed in
separately. The current implementation is much more flexible.

What are the practical advantages and conveniences of *not* subclassing
from basestring?
 
Duncan Booth

Peter said:
Duncan, are you another formerly non-user of path who has this opinion,
or have you already attempted to use path extensively in your code?

I'm a currently non-user of path who would probably use it if it were in
the standard library but so far have been satisfied to use os.path.

Peter said:
I'm not saying I dismiss the opinions of those who haven't actually
tried working with a string-based path object, but it's worth
considering that you might adopt a different opinion after using it for
a while.

I fully accept that. My point is simply that as a non-user, it sounds to me
as though subclassing string is the wrong approach. I would have expected a
path object to be a sequence of path elements rather than a sequence of
characters. This is basically just a gut feeling though, so I'm perfectly
happy to be told that I'm wrong.

BTW, does it matter at all in practical use that the base class of path
varies between str and unicode depending on the platform?

John said:
You have to start somewhere. One of the lessons that's beginning
to seep into people's minds is that getting something that works
out there is almost always preferable to (over) design by committee.

Dead right, but once it goes into the standard library it has to pretty
well stop evolving, so it needs to be right, or as close as possible before
that happens.
 
Daniel Dittmar

Duncan said:
I would have expected a
path object to be a sequence of path elements rather than a sequence of
characters.

Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.

Daniel
 
John Roth

Duncan Booth said:
Dead right, but once it goes into the standard library it has to pretty
well stop evolving, so it needs to be right, or as close as possible before
that happens.

It has to stop evolving in incompatible directions, at least. Although
there is a precedent with the process functions, classes, module,
whatever it is. It's up to five versions now, isn't it?

AFAICT, from a very broad brush perspective, there is really
only one substantive issue: how to handle multiple path-like
"things". URLs have been mentioned in this thread, different
file systems and a possible in-memory file system have been
mentioned in other threads.

So whatever gets out there first shouldn't preempt the ability
to eventually fit into a wider structure without substantial
and incompatible modifications.

John Roth
 
John Roth

Daniel Dittmar said:
Maybe it's nitpicking, but I don't think that a path object should be a
'sequence of path elements' in an iterator context.

This means that

for element in pathobject:

has no intuitive meaning for me, so it shouldn't be allowed.

However, a path as a sequence of characters has even less
meaning - I can't think of a use, while I have an application
where traversing a path as a sequence of path elements makes
perfect sense: I need to descend the directory structure, directory
by directory, looking for specific files and types.
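
To make the two readings concrete, here is a small sketch, assuming Python 3 and "/" as the separator. `StrPath` and its `elements` method are hypothetical names used only for illustration; they are not the API of Jason's module:

```python
import posixpath


class StrPath(str):
    """Hypothetical string-derived path, used only to contrast the two views."""

    def elements(self):
        # Path elements, as opposed to the characters a str iterates over.
        return [part for part in self.split("/") if part]


p = StrPath("usr/local/lib")

# Iterating the object itself yields characters, because it is a str.
print(list(p)[:3])

# Iterating the elements yields directory names.
print(p.elements())

# Descending directory by directory means walking the growing prefixes.
prefixes = []
current = ""
for part in p.elements():
    current = posixpath.join(current, part)
    prefixes.append(current)
print(prefixes)
```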

John Roth
 
Duncan Booth

Michael said:
Here's some code I just wrote seconds ago to construct a path for a
scp upload:

"""
DST_DIRPATH = path("host:~/destination")
RSS_EXT = "rss"

dst_filenamebase = os.extsep.join([postcode.lower(), RSS_EXT])
dst_filepath = DST_DIRPATH.joinpath(dst_filenamebase)
"""

With the current path implementation, this Just Works.

It isn't at all obvious to me that it works:

>>> import os
>>> from path import path
>>> postcode = "AA2 9ZZ"
>>> DST_DIRPATH = path("host:~/destination")
>>> RSS_EXT = "rss"
>>> dst_filenamebase = os.extsep.join([postcode.lower(), RSS_EXT])
>>> dst_filepath = DST_DIRPATH.joinpath(dst_filenamebase)
>>> print dst_filepath
host:~/destination\aa2 9zz.rss

Michael said:
If I were using
something that parsed and understood paths, the scp/rcp convention of
host:filename would either cause an error or have to be programmed in
separately. The current implementation is much more flexible.

You still have to program your scp path separately from your filesystem
path in order to handle the different conventions for path separator
characters and maybe also escaping special characters in the path (I don't
use scp much so I don't know if this is required).

Michael said:
What are the practical advantages and conveniences of *not*
subclassing from basestring?

Simplification of the api: not having methods such as center, expandtabs
and zfill.

Not having the base class change from str to unicode depending on which
system you run your code?

Fewer undetected bugs (explicit is better than implicit)?
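
The first point can be checked with a short sketch, assuming Python 3. `StrPath` is a hypothetical stand-in for any string-derived path class, not Jason's actual implementation:

```python
class StrPath(str):
    """Hypothetical string-derived path class with no methods of its own."""


p = StrPath("a/b")

# Every str method comes along, whether or not it makes sense for a path.
inherited = [m for m in ("center", "expandtabs", "zfill") if hasattr(p, m)]
print(inherited)

# The inherited methods work, and their results are plain str objects,
# not StrPath, unless the subclass overrides them.
padded = p.zfill(6)
print(repr(padded))
print(type(padded) is str)
```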

Perhaps none of these matter in practice. As I said elsewhere I haven't
used path for anything real, so I'm still finding surprises such as why
this doesn't do what I expect:
path(u'a/bc/d')

If path didn't subclass string then either this would have been
implemented, and probably would Do The Right Thing, or it wouldn't be
implemented so I'd quickly realise I needed to do something else. Instead
it does something surprising.
 
Andrew Dalke

Duncan said:
Personally I think the concept of a specific path type is a good one, but
subclassing string just cries out to me as the wrong thing to do.

I disagree. I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input. There is no
automatic coercion.
...     def __getattr__(self, name):
...         print "Want", repr(name)
...         raise AttributeError, name
...
Traceback (most recent call last):
Traceback (most recent call last):

The solutions to this are:
1) make the path object be derived from str or unicode. Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like


if isinstance(input, basestring):
    input = open(input, "rU")
for line in input:
    print line

I showed several places in the stdlib and in 3rd party packages
where this is used.
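
A sketch of how that check treats the two kinds of path object, assuming Python 3, where `str` plays the role of `basestring`. `StrPath`, `PlainPath` and `takes_name_or_file` are hypothetical names, and nothing is actually opened:

```python
import io


class StrPath(str):
    """Hypothetical string-derived path class."""


class PlainPath:
    """Hypothetical path class that only wraps a string."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def takes_name_or_file(source):
    # Python 3 spelling of the isinstance(input, basestring) idiom.
    if isinstance(source, str):
        return "opened by name"
    return "treated as an open file"


print(takes_name_or_file(StrPath("data.txt")))
print(takes_name_or_file(PlainPath("data.txt")))
print(takes_name_or_file(io.StringIO("line\n")))
```

The wrapper falls through to the "already a file" branch, which is the kind of mismatch described above.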

Duncan said:
In other words, to me a path represents something in a filesystem,

Being picky - or something that could be in a filesystem.

Duncan said:
the fact that it
has one, or indeed several string representations does not mean that the
path itself is simply a more specific type of string.

I didn't follow this.

Duncan said:
You should need an explicit call to convert a path to a string and that
forces you when passing the path to something that requires a string to
think whether you wanted the string relative, absolute, UNC, uri etc.

You are broadening the definition of a file path to include URIs?
That's making life more complicated. Eg, the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named "mail:[email protected]" and I
join that with "file://home/dalke/badfiles/".

Additionally, the actions done on URIs are different than on file
paths. What should os.listdir("http://www.python.org/") do?

As I mentioned, I tried some classes which emulated file
paths. One was something like

class TempDir:
    """removes the directory when the refcount goes to 0"""
    def __init__(self):
        self.filename = ...  # use a function from the tempfile module
    def __del__(self):
        if os.path.exists(self.filename):
            shutil.rmtree(self.filename)
    def __str__(self):
        return self.filename

I could do

dirname = TempDir()

but then instead of

os.mkdir(dirname)
tmpfile = os.path.join(dirname, "blah.txt")

I needed to write it as

os.mkdir(str(dirname))
tmpfile = os.path.join(str(dirname), "blah.txt")

or have two variables, one which could delete the
directory and the other for the name. I didn't think
that was good design.


If I had derived from str/unicode then things would
have been cleaner.
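
A rough sketch of that difference, assuming Python 3 and POSIX-style joins. `NameWrapper` and `StrDir` are hypothetical names, and the cleanup logic of TempDir is deliberately left out:

```python
import posixpath


class NameWrapper:
    """Hypothetical wrapper that only offers __str__, like TempDir above."""

    def __init__(self, filename):
        self.filename = filename

    def __str__(self):
        return self.filename


class StrDir(str):
    """Hypothetical string-derived directory name."""


wrapped = NameWrapper("/tmp/scratch")
derived = StrDir("/tmp/scratch")

# The str subclass can be passed straight to the path functions.
print(posixpath.join(derived, "blah.txt"))

# The wrapper needs an explicit str() at every call site.
print(posixpath.join(str(wrapped), "blah.txt"))

# Without the str() call the join is rejected.
try:
    posixpath.join(wrapped, "blah.txt")
    needs_str = False
except TypeError:
    needs_str = True
print(needs_str)
```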

Please note, btw, that some filesystems are unicode
based and others are not. As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time. My "str()" example above
does not and would fail on a Unicode filesystem aware
Python build.

Duncan said:
It may even be that we need a hierarchy of path
classes: URLs need similar but not identical manipulations
to file paths, so if we want to address the failings
of os.path perhaps we should also look at the failings
of urlparse at the same time.

I've found that hierarchies are rarely useful compared
to the number of times they are proposed and used. One
of the joys to me of Python is its deemphasis of class
hierarchies.

I think the same is true here. File paths and URIs are
sufficiently different that there are only a few bits
of commonality between them. Consider 'split' which
for files creates (dirname, filename) while for urls
it creates (scheme, netloc, path, query, fragment)
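
The contrast can be seen directly with the standard library, assuming Python 3, where the old urlparse module lives in `urllib.parse`. The file path and URL below are made-up examples:

```python
import posixpath
from urllib.parse import urlsplit

# Splitting a file path gives a (dirname, filename) pair.
file_parts = posixpath.split("/home/dalke/badfiles/report.txt")
print(file_parts)

# Splitting a URL gives (scheme, netloc, path, query, fragment).
url_parts = tuple(urlsplit("http://www.python.org/doc/index.html?lang=en#top"))
print(url_parts)
```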

Andrew
 
Andrew Dalke

George said:
You're right, conceptually a path
HAS_A string description, not IS_A string, so from a pure OO point of
view, it should not inherit string.

How did you decide it's "has-a" vs. "is-a"?

All C calls use a "char *" for filenames and paths,
meaning the C model of the filesystem says
paths are strings.

Paths as strings fit the Liskov substitution principle
in that any path object can be used any time a
string is used (eg, "loading from " + filename)

Good information hiding suggests that a better API
is one that requires less knowledge. I haven't
seen an example of how deriving from (unicode)
string makes things more complicated than not doing so.

Andrew