PEP on path module for standard library

Andrew Dalke · Jul 23, 2005

George said:
Bringing up how C models files (or anything else other than primitive types
for that matter) is not a particularly strong argument in a discussion on
OO design ;-)

While I have worked with C libraries which had a well-developed
OO-like interface, I take your point.

Still, I think that the C model of a file system should be a
good fit, since after all C and Unix were developed hand in hand. If
there weren't a good match, then some of the C path APIs would be
confusing or complicated. Since I don't see that, it suggests that
"path is-a string" is at least reasonable.

Liskov substitution principle imposes a rather weak constraint

Agreed. I used that as an example of the direction I wanted to
go. What principles guide your intuition of what is an "is-a"
vs. a "has-a"?

Take for example the case where a PhoneNumber class is a subclass
of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid
phone number, just doesn't make sense.

Mmm, I don't think an integer is a good model of a phone number.
For example, in the US
00148762040828
will ring a mobile number in Sweden while
148762040828
will give a "this isn't a valid phone number" message.

Yet both have the same base-10 representation. (I'm not using
a syntax where a leading '0' indicates an octal number.)
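
A minimal sketch of that point in present-day Python 3 (the variable names are illustrative only): converting both dial strings to int collapses them into one value, while keeping them as strings preserves the difference.

```python
# Two dial strings that behave differently when dialled from the US.
sweden_mobile = "00148762040828"
invalid_number = "148762040828"

# As integers the leading zeros vanish, so the two become indistinguishable.
same_as_int = int(sweden_mobile) == int(invalid_number)

# As strings the distinction survives.
same_as_str = sweden_mobile == invalid_number

print(same_as_int, same_as_str)
```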

I wouldn't say more complicated, but perhaps less intuitive in a few cases, e.g.:

path(r'C:\Documents and Settings\Guest\Local Settings').split()

['C:\\Documents', 'and', 'Settings\\Guest\\Local', 'Settings']
instead of
['C:', 'Documents and Settings', 'Guest', 'Local Settings']

That is why the path module uses a different method to split
on pathsep vs. whitespace. I get what you are saying, I just think
it's roughly equivalent to appealing to LSP in terms of weight.
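
To make the contrast concrete, here is a small sketch in present-day Python 3. It uses pathlib, which joined the standard library long after this thread, purely as an illustration; it is not the path module under discussion.

```python
from pathlib import PureWindowsPath

raw = r'C:\Documents and Settings\Guest\Local Settings'

# str.split() with no argument splits on runs of whitespace.
by_whitespace = raw.split()

# Splitting on the Windows separator gives the path components instead.
by_separator = raw.split('\\')

# pathlib exposes the components as .parts; note that the drive and the
# root separator are kept together in the first element.
parts = PureWindowsPath(raw).parts

print(by_whitespace)
print(by_separator)
print(parts)
```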

Mmm, then there's a question of the usefulness of ".lower()" and
".expandtabs()" and similar methods. Hmmm....

I just noted that conceptually a path is a composite object consisting of
many properties (dirname, extension, etc.) and its string representation
is just one of them. Still, I'm not suggesting that a 'pure' solution is
better than a more practical one that covers most usual cases.

For some reason I think that

path.dirname()

is better than

path.dirname

Python has properties now so the implementation of the latter is
trivial - put a @property on the line before the "def dirname(self):".
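
For readers who have not seen the two spellings side by side, a minimal sketch follows. The class names `MethodPath` and `PropertyPath` are hypothetical, and posixpath is used only so the result does not depend on the platform.

```python
import posixpath


class MethodPath(str):
    """Hypothetical str subclass exposing dirname as a method."""

    def dirname(self):
        return posixpath.dirname(self)


class PropertyPath(str):
    """Hypothetical str subclass exposing dirname as a property."""

    @property
    def dirname(self):
        return posixpath.dirname(self)


m = MethodPath('/usr/local/bin/python')
p = PropertyPath('/usr/local/bin/python')

print(m.dirname())  # called like a function
print(p.dirname)    # read like an attribute
```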

I think that the string representation of a path is so important that
it *is* the path. The other things you call properties aren't quite
properties in my model of a path and are more like computable values.

I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

Andrew

George Sakkis · Jul 23, 2005

Andrew Dalke said:
[snipped]

Take for example the case where a PhoneNumber class is a subclass
of int. According to LSP, it is perfectly ok to add phone numbers
together, subtract them, etc, but the result, even if it's a valid
phone number, just doesn't make sense.

Mmm, I don't think an integer is a good model of a phone number.
For example, in the US
00148762040828
will ring a mobile number in Sweden while
148762040828
will give a "this isn't a valid phone number" message.

That's why phone numbers would be a subset of integers, i.e. not every integer would correspond to a
valid number, but with the exception of numbers starting with zeros, all valid numbers would be
integers. Regardless, this was not my point; the point was that adding two phone numbers or
subtracting them never makes sense semantically.

[snipped]

I just noted that conceptually a path is a composite object consisting of
many properties (dirname, extension, etc.) and its string representation
is just one of them. Still, I'm not suggesting that a 'pure' solution is
better than a more practical one that covers most usual cases.

For some reason I think that

path.dirname()

is better than

path.dirname

Python has properties now so the implementation of the latter is
trivial - put a @property on the line before the "def dirname(self):".

Sorry, I used the term 'property' in the broad sense, as the whole exposed API, not the specific
Python feature; I've no strong preference between path.dirname and path.dirname().

I think that the string representation of a path is so important that
it *is* the path.

There are (at least) two frequently used path string representations, the absolute and the relative
to the working directory. Which one *is* the path? Depending on the application, one of them would
be a more natural choice than the other.

I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

My intuition also happens to support subclassing string, but for practical reasons rather than
conceptual.

George

Andrew Dalke · Jul 23, 2005

George said:
That's why phone numbers would be a subset of integers, i.e. not every
integer would correspond to a valid number, but with the exception of
numbers starting with zeros, all valid numbers would be integers.

But it's that exception which violates the LSP.

With numbers, if x==y then (x,y) = (y,x) makes no difference.
If phone numbers are integers then 001... == 01... but swapping
those two numbers makes a difference. Hence they cannot be modeled
as integers.

Regardless, this was not my point; the point was that adding
two phone numbers or subtracting them never makes sense semantically.

I agree. But modeling them as integers doesn't make sense either.
Your example of adding phone numbers depends on them being represented
as integers. Since that representation doesn't work, it makes sense
that addition of phone numbers is suspect.

There are (at least) two frequently used path string representations,
the absolute and the relative to the working directory. Which one *is*
the path? Depending on the application, one of them would be a more
natural choice than the other.

Both. I don't know why one is more natural than the other.

My intuition also happens to support subclassing string, but for
practical reasons rather than conceptual.

As you may have read elsewhere in this thread, I give some examples
of why subclassing from string fits best with existing code.

Even if there was no code base, I think deriving from string is the
right approach. I have a hard time figuring out why though. I think
if the lowest level Python/C interface used a "get the filename"
interface then perhaps it wouldn't make a difference. Which means
I'm also more guided by practical reasons than conceptual.

Andrew

paul · Jul 23, 2005

Michael said:
Reinhold said:

Probably as Terry said: a path is both a list and a string.

[...]

One way to divide this is solely based on path separators:

['c:', 'windows', 'system32:altstream', 'test.dir',
'myfile.txt.zip:altstream']

I would argue that any proposed solution has to work with VMS
pathnames. ;-)

The current stdlib solution,
os.path.splitext(os.path.splitext(filename)[0])[0], is extremely clunky,
and I have long desired something better. (OK, using
filename.split(os.extsep) works a little better, but you get the idea.)
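
A short sketch of the two spellings mentioned above, run against the 'myfile.txt.zip' example, in present-day Python 3. The pathlib line is an addition for comparison only; pathlib did not exist when this thread was written.

```python
import os
import os.path
from pathlib import PurePosixPath

filename = 'myfile.txt.zip'

# Nested splitext: strip '.zip', then strip '.txt'.
stem_nested = os.path.splitext(os.path.splitext(filename)[0])[0]

# Splitting on the extension separator and taking the first piece.
stem_split = filename.split(os.extsep)[0]

# pathlib reports all the extensions at once.
suffixes = PurePosixPath(filename).suffixes

print(stem_nested, stem_split, suffixes)
```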

And also with unusual (e.g. RISC OS) filename extensions.

To do any justice to the existing solutions, any PEP should review at
least the following projects:

* The path module (of course):
http://www.jorendorff.com/articles/python/path/

* The py.path module (or at least the ideas for it):
http://codespeak.net/py/current/doc/future.html

* itools.uri:
http://www.ikaaro.org/itools

* Results from the "Object-Oriented File System Virtualisation"
project in the "Summer of Code" programme:
http://wiki.python.org/moin/SummerOfCode

And I hope that the latter project is reviewing some of the other work,
if only to avoid the "framework proliferation" that people keep
complaining about.

Paul

Peter Hansen · Jul 23, 2005

George said:
There are (at least) two frequently used path string representations,
the absolute and the relative to the working directory. Which one
*is* the path? Depending on the application, one of them would
be a more natural choice than the other.

Sorry, George, but that's not how it works.

Whether using the regular string-based Python paths or the new path
module, a path *is* either absolute or relative, but cannot be both at
the same time.

This is therefore not an issue of "representation" but one of state.
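
A small sketch of that distinction using the string-based functions, in present-day Python 3. posixpath is chosen only to keep the result platform-independent, and the example paths are made up.

```python
import posixpath

absolute = '/usr/local/bin'
relative = 'local/bin'

# A given path string is in exactly one of the two states.
absolute_is_abs = posixpath.isabs(absolute)
relative_is_abs = posixpath.isabs(relative)

# Turning a relative path into an absolute one needs extra state:
# the directory it is relative to.
resolved = posixpath.normpath(posixpath.join('/usr', relative))

print(absolute_is_abs, relative_is_abs, resolved)
```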

-Peter

Duncan Booth · Jul 23, 2005

Peter said:
Just a note, though you probably know, that this is intended to be
written this way with path:

path(u'a/b/c/d')

I know, but it really doesn't look right to me.

I think that my fundamental problem with all of this is that by making path
a subclass of str/unicode it inherits inappropriate definitions of some
common operations, most obviously addition, iteration and subscripting.

These operations have obvious meaning for paths which is not the same as
the meaning for string. Therefore (in my opinion) the two ought to be
distinct.
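
The inherited behaviour described above can be seen with a bare str subclass. This is a sketch in present-day Python 3; `StrPath` is a hypothetical stand-in, not the actual path class.

```python
class StrPath(str):
    """Hypothetical path type that simply subclasses str."""


p = StrPath('a/b/c/d')

# Addition is plain string concatenation, not "join a path component",
# and the result is an ordinary str rather than a StrPath.
concatenated = p + 'e'

# Subscripting and iteration work on characters, not on path elements.
first = p[0]
items = list(p)

print(repr(concatenated), type(concatenated).__name__)
print(first, items)
```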

Bengt Richter · Jul 23, 2005

Try this:

A file-system is a maze of twisty little passages, all alike. Junction
== directory. Cul-de-sac == file. Fortunately it is signposted. You are
dropped off at one of the entrance points ("current directory", say).
You are given a route (a "path") to your destination. The route consists
of a list of intermediate destinations.

for element in pathobject:
    follow_sign_post_to(element)

Exception-handling strategy: Don't forget to pack a big ball of string.
Anecdotal evidence is that breadcrumbs are unreliable.

<indulging what="my penchant for seeking the general behind the specific ;-)" >

ISTM a path is essentially a representation of a script whose interpretation
by an orderly choice of interpreters finally leads to access to some entity,
typically a serial data representation, through an object, perhaps a local proxy,
that has standard methods for accessing the ultimate object's desired info.

IOW, a path sequence is like a script text that has been .splitlines()'d and
the whole sequence fed to a local interpreter, which might chew through multiple
lines on its own, or might invoke interpreters on another network to deal with the
rest of the script, or might use local interpreters for various different kinds of
access (e.g., after seeing 'c:' vs 'http://' vs '/c' vs '//c' etc. on the platform
defining the interpretation of the head element).

Turning a single path string into a complete sequence of elements is not generally possible
unless you have local knowledge of the syntax of the entire tail beyond the prefix
you have to deal with. Therefore, a local platform-dependent Pathobject class should, I think,
only recognize prefixes that it knows how to process or delegate processing for, leaving
the interpretation of the tail to the next Pathobject instance, however selected and/or
located.

So say (this is just a sketch, mind ;-)

po = Pathobject(<string representation of whole path>)

results in a po that splits out (perhaps by regex) a prefix, a first separator/delimiter,
and the remaining tail. E.g., in class Pathobject,
    def __init__(self, pathstring=None):
        if pathstring is None:
            pass  # do useful default??
        self.pathstring = pathstring
        self.prefix, self.sep, self.tail = self.splitter(pathstring)
        if self.prefix in self.registered_prefixes:
            self.child = self.registered_prefixes[self.prefix](self.tail)
        else:
            self.child = []
        self.opened_obj = None

Then the loop inside a local pathobject's open method po.open()
might go something like

    def open(self, *mode, **kw):
        if self.child:
            self.opened_obj = self.child.open(self.tail, *mode, **kw)
        else:
            self.opened_obj = file(self.pathstring, *mode)
        return self

And closing would just go to the immediately apparent opened object, and
if that had complex closing to do, it would be its responsibility to deal
with itself and its child-derived objects.

    def close(self):
        self.opened_obj.close()

The point is that a given pathobject could produce a new or modified pathobject child
which might be parsing urls instead of windows file system path strings or could
yield an access object producing something entirely synthetic.

A synthetic capability could easily be introduced if the local element pathobject
instance looked for e.g., 'synthetic://' as a possible first element (prefix) string representation,
and then passed the tail to a subclass defining synthetic:// path interpretation.
E.g., 'synthetic://temp_free_diskspace' could be a platform-independent way to get such info as that.

Opening 'testdata:// ...' might be an interesting way to feed test suites, if pathobject subclasses
could be registered locally and found via the head element's string representation.
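
The registration idea in the paragraphs above can be sketched in present-day Python 3 roughly as follows. Everything here is an assumption made for illustration: the `PathObject.register` decorator, the regex used to split off the prefix, and the `CannedDataPath` handler are hypothetical and are not part of any proposal in this thread.

```python
import io
import re


class PathObject:
    """Hypothetical base class: splits off a 'scheme://' prefix and delegates."""

    registered_prefixes = {}  # maps prefix string -> handler class

    def __init__(self, pathstring):
        self.pathstring = pathstring
        match = re.match(r'([a-z_][a-z0-9_]*)://(.*)', pathstring)
        if match:
            self.prefix, self.tail = match.groups()
        else:
            self.prefix, self.tail = None, pathstring

    @classmethod
    def register(cls, prefix):
        """Class decorator that records a handler for the given prefix."""
        def decorator(handler):
            cls.registered_prefixes[prefix] = handler
            return handler
        return decorator

    def open(self, mode='r'):
        handler = self.registered_prefixes.get(self.prefix)
        if handler is None:
            # No registered handler: fall back to an ordinary file open.
            return open(self.pathstring, mode)
        # Hand the uninterpreted tail to the handler for this prefix.
        return handler(self.tail).open(mode)


@PathObject.register('testdata')
class CannedDataPath:
    """Hypothetical handler that serves canned strings to test suites."""

    samples = {'greeting': 'hello\n'}

    def __init__(self, tail):
        self.tail = tail

    def open(self, mode='r'):
        return io.StringIO(self.samples[self.tail])


text = PathObject('testdata://greeting').open().read()
print(repr(text))
```

The base class only recognises the prefix; what the tail means is left entirely to the registered handler, which is the division of labour the post argues for.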

One point from this is that a path string represents an ordered sequence of elements, but is heterogeneous,
and therefore has potentially heterogeneous syntax reflected in string tails with syntax that should be
interpreted differently from the prefix syntax.

Each successive element of a path string effectively requires an interpreter for that stage of access
pursuit, and the chain of processing may result in different path entities/objects/representations
on different systems, with different interpretations going on, sharing only that they are part of the
process of getting access to something and providing access services, if it's not a one-shot access.

This would also be a potential way to create access to a foreign file system in pure python if desired,
so long as there was a way of accessing the raw data to build on, e.g. a raw stuffit floppy, or a raw
hard disk if the required privileges are available. Also 'zip://' or 'bzip2://' could be defined
and registered by a particular script or in an automatic startup script. 'encrypted://' might be interesting.
Or if polluting the top namespace was a problem, a general serialized data access header element
might work, e.g., 'py_sda://encrypted/...'

This is very H[ot]OTTOMH (though it reflects some thoughts I've had before, so be kind ;-)

For compatibility with the current way of doing things, you might want to do an automatic open
in the Pathobject constructor, but I don't really like that. It's easy enough to tack on ".open()"

po = Pathobject('py_sda://encrypted/...')
po.open() # plain read_only text default file open, apparently, but encrypted does binary behind the scenes
print po.read()
po.close()

Say, how about

if Pathobject('gui://message_box/yn/continue processing?').open().read().lower()!='y':
    raise SystemExit, "Ok, really not continuing ;-)"

An appropriate registered subclass for the given platform, returned when the
Pathobject base class instantiates and looks at the first element on open() and delegates
would make that possible, and spelled platform-independently as in the code above.

</indulging>

Regards,
Bengt Richter

Daniel Dittmar · Jul 25, 2005

John said:
However, a path as a sequence of characters has even less
meaning - I can't think of a use, while I have an application

That's true. But the arguments for path objects as strings go more in
the direction of using existing functions that expect strings.

where traversing a path as a sequence of path elements makes
perfect sense: I need to descend the directory structure, directory
by directory, looking for specific files and types.

But then your loop doesn't need the individual path elements, but rather
sub-path objects

for p in pathobj.stepdown('/usr/local/bin'):
    if p.join(searchedFile):
        whatever

I'm not saying that there isn't any use for having a list of path
elements. But it isn't that common, so it should get a method name to
make it more explicit.
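
As a rough illustration of iterating over sub-paths rather than bare elements, here is a sketch in present-day Python 3. The `stepdown` function is a hypothetical free-function rendering of the method used in the loop above, built on pathlib, which postdates this thread.

```python
from pathlib import PurePosixPath


def stepdown(path):
    """Yield each ancestor of path from the root downwards, then path itself."""
    path = PurePosixPath(path)
    # .parents runs from the nearest ancestor up to the root, so reverse it.
    for ancestor in reversed(path.parents):
        yield ancestor
    yield path


steps = [str(p) for p in stepdown('/usr/local/bin')]
print(steps)
```

Each item is itself a usable path, so the loop body can join a file name onto it directly instead of reassembling the path from its elements.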

Daniel

Daniel Dittmar · Jul 25, 2005

Terry said:
for dir in pathobject:
    if isdir(dir): cd(dir)

*is*, in essence, what the OS mainly does with paths (after splitting the
string representation into pieces).

That's why there is rarely a need to do it in Python code.

Directory walks also work with paths as sequences (stacks, in particular).

I'd say it works with stacks of paths, not with stacks of path elements.
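
A sketch of what a walk over a stack of whole paths might look like, in present-day Python 3. The function name `walk_with_stack` and the temporary directory layout are made up for the example.

```python
import os
import tempfile


def walk_with_stack(top):
    """Return every directory under top, using an explicit stack of paths."""
    stack = [top]
    seen = []
    while stack:
        current = stack.pop()
        seen.append(current)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Push the complete path, not just the last element.
                    stack.append(entry.path)
    return seen


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b'))
    os.makedirs(os.path.join(root, 'c'))
    found = sorted(os.path.relpath(d, root) for d in walk_with_stack(root))

print(found)
```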

I'm not saying that there isn't any use for having a list of path
elements. But it isn't that common, so it should get a method name to
make it more explicit.

Daniel

Ron Adam · Jul 30, 2005

Bengt Richter wrote:

<indulging what="my penchant for seeking the general behind the specific ;-)" >

Say, how about

if Pathobject('gui://message_box/yn/continue processing?').open().read().lower()!='y':
    raise SystemExit, "Ok, really not continuing ;-)"

An appropriate registered subclass for the given platform, returned when the
Pathobject base class instantiates and looks at the first element on open() and delegates
would make that possible, and spelled platform-independently as in the code above.

I like it. ;-)

No reason why a path can't be associated with any tree-based object.

</indulging>

Regards,
Bengt Richter

<more indulging>

I wasn't sure what to comment on, but it does point out some interesting
possibilities I think.

A path could be associated to any file-like object in an external (or
internal) tree structure. I don't see any reason why not.

In the case of an internal file-like object, it could be a series of
keys in nested dictionaries. Why not use a path as a dictionary
interface?
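
One way to picture a path as a dictionary interface, as a sketch in present-day Python 3: the `lookup` helper, the '/' separator, and the sample tree are all assumptions made for illustration.

```python
from functools import reduce


def lookup(tree, path, sep='/'):
    """Follow a path string through nested dictionaries, one key per element."""
    keys = [key for key in path.split(sep) if key]
    return reduce(lambda node, key: node[key], keys, tree)


tree = {'etc': {'app': {'port': 8080}}}

port = lookup(tree, '/etc/app/port')
subtree = lookup(tree, '/etc/app')

print(port, subtree)
```

A missing key raises KeyError here, which plays the role a "no such file or directory" error would play for a file system path.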

So it sort of raises the question of how tightly a path object should be
associated to a data structure? When and where should the path object
determine what the final path form should be? And how smart should it
be as a potential file-like object?

Currently the device name is considered part of the path, but if instead
you treat the device as an object, it could open up more options.

(Which would extend the pattern of your example above. I think.)

(also a sketch.. so something like...)

# Initiate a device path object.
apath = device('C:').path(initial_path)

# Use it to get and put data
afile = apath.open(mode,position,units) # defaults ('r','line',next)
aline = afile.read().next() # read next unit, or as an iterator.
afile.write(line)
afile.close()

# Manually manipulate the path
apath.append('something') # add to end of path
apath.remove() # remove end of path
alist = apath.split() # convert to a list
apath.join(alist) # convert list to a path
astring = str(apath()) # get string from path
apath('astring') # set path to string
apath.validate() # make sure it's valid

# Iterate and navigate the path
apath.next() # iterate path objects
apath.next(search_string) # iterate with search string
apath.previous() # go back
apath.enter() # enter directory
apath.exit() # exit directory

# Close it when done.
apath.close()

etc...

With this you can iterate a file system as well as its files. ;-)

(Add more or fewer methods as needed of course.)

apath = device(dev_obj).path(some_path_string)
apath.open().write(data).close()

or if you like...

device(dev_obj).append(path_string).open().write(data).close()

Just a few thoughts,

Cheers,
Ron

qvx · Aug 1, 2005

There is a thing called "Asynchronous pluggable protocol". It is
Microsoft's technology (don't flame me now):

"""
Asynchronous pluggable protocols enable developers to create pluggable
protocol handlers, MIME filters, and namespace handlers that work with
Microsoft® Internet Explorer...

Applications can use pluggable protocol handlers to handle a custom
Uniform Resource Locator (URL) protocol scheme or filter data for a
designated MIME type.
"""

In other words you can develop your own plugin which would allow
Internet Explorer to open URLs like "rar://c/my/doc/book.rar". (I was
going to write a plugin for .rar in order to enable offsite browsing of
downloaded portions of web sites, all from an archive file).

You could give it a look. If only to see that it is Mycrosofthonic:

http://msdn.microsoft.com/workshop/networking/pluggable/overview/aplugprot_overviews_entry.asp.

Qvx
