Re: path module

Discussion in 'Python' started by Ian Bicking, Jul 8, 2003.

  1. Ian Bicking

    Ian Bicking Guest

    On Tue, 2003-07-08 at 04:32, holger krekel wrote:
    > I agree that something like Jason Orendorff's path module should go into
    > the standard library. I've coded a similar module and i think that
    > a discussion about certain design decisions would probably improve our
    > approaches.
    >
    > For example Jason lets the "path" object inherit from "str" (or unicode)
    > but i think it's better to provide a "__str__" method so that you can say
    >
    > str(pathinstance).endswith('.py')
    >
    > and *not* base the path object on str/unicode.
    >
    > unicode(pathinstance)
    >
    > would just fail if your platform doesn't support this. First, i tried
    > the inheritance approach, btw, but it is ambiguous (e.g. for the
    > join-method: str.join vs. os.path.join).


    I'm starting to think the same thing. Not so much because of join, but
    because subclassing str doesn't actually offer many advantages. Many
    functions that take a filename check "type(arg) is type('')", so you'd
    have to pass a real string object in anyway -- and people who say "but
    you should use isinstance(arg, str)" are forgetting that this wasn't
    possible until Python 2.2, and lots of code still uses type comparison
    at this point.
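
    A minimal sketch of the problem described above, written for a current
    Python. StrPath is a hypothetical stand-in for a str-subclassing path
    type, not Jason's actual class:

```python
class StrPath(str):
    """Hypothetical path type that subclasses str."""


def old_style_check(arg):
    # Exact type comparison, the idiom common in pre-2.2 code.
    return type(arg) is type('')


def new_style_check(arg):
    # Subclass-aware check.
    return isinstance(arg, str)


p = StrPath('/tmp/example.txt')
print(old_style_check(p))       # False: a subclass is not exactly str
print(new_style_check(p))       # True
print(old_style_check(str(p)))  # True: str() yields a plain string
```

    So code using the exact-type idiom rejects the subclass instance unless
    the caller converts it with str() first.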

    > Also, my module provides most of the os.path.* methods as "filters" so
    > you can say
    >
    > dirs = filter(isdir, list_obj_pathobjects)
    > fnames = filter(AND(nolink, isfile), list_obj_pathobjects)
    >
    > in addition to
    >
    > pathobject.isfile()
    > etc.


    That's not necessary with list comprehension, since you can just do:

    [p for p in list_obj_pathobjects if p.isdir()]

    > Recently, i also did some experimentation with "virtual-fs" features so
    > that you can transparently access http/ftp/svn files/directories. I even
    > got that to work with "<tab>-completion" but that was quite a hack :)
    >
    > I am pretty sure that virtual-fs-like-extensibility would be a big
    > "selling" point and would motivate the use of such a module and
    > finally the inclusion into the stdlib. Of course, the local-fs should
    > be the convenient case but it shouldn't be hard to use the same methods
    > for accessing remote "repositories".


    Yes, virtual filesystems are certainly an important idea here. Almost
    makes me wonder if path() should also recognize URLs by default...
    probably not, as that isn't always desired, and a URL is going to create
    a significantly different object than a mere filesystem path, even
    though its interface will be very similar.

    Ian
     
    Ian Bicking, Jul 8, 2003
    #1

  2. I wrote the 'path' module at:
    http://www.jorendorff.com/articles/python/path

    There was some discussion on it here:
    http://groups.google.com/groups?th=42ab4db337b60ce3

    Just a few comments:

    Ian and Holger wondered why 'path' should subclass 'str'. It's because
    a path is a string. Benefit: you can pass 'path' objects to functions
    that expect strings (like functions in 'win32file'). I find this
    really useful in practice.

    I agree with Just that 'path' shouldn't override '__iter__()'. I'll
    change this eventually.

    I think Just is the first to argue that 'path / filename' is confusing.
    I find it intuitive. Other people have chosen / for this purpose,
    independently: see the O'Reilly book _Python Cookbook_ [1], recipe 4.17,
    and the Boost path object [2].
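
    A rough sketch of how a 'path / filename' operator can be built on a str
    subclass. SketchPath is a hypothetical name and this is not the code
    from the 'path' module; on a current Python the hook is __truediv__
    (the Python of 2003 used __div__):

```python
import os.path


class SketchPath(str):
    """Hypothetical str subclass that joins path components with '/'."""

    def __truediv__(self, other):
        # Delegate to os.path.join so the platform separator is used.
        return SketchPath(os.path.join(self, other))


music = SketchPath('music')
track = music / 'albums' / 'song.mp3'
print(track == os.path.join('music', 'albums', 'song.mp3'))  # True
print(isinstance(track, str))                                # True
```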

    I do believe 'path' should be in the standard library (if not builtin).
    I enjoy it and I use it all the time. My perception is that the Python
    core dev team doesn't see any particular need for it. If anyone wants me
    to, I'll write the PEP.

    for f in path('/music').walkfiles('*.mp3'):
        play_mp3(f)

    Cheers,
    Jason

    [1] http://safari.oreilly.com/?xmlid=0-596-00167-3
    [2] http://www.boost.org/libs/filesystem/doc/path.htm#operator_slash
     
    Jason Orendorff, Jul 21, 2003
    #2

  3. Jason Orendorff wrote:
    > I wrote the 'path' module at:
    > http://www.jorendorff.com/articles/python/path
    >
    > There was some discussion on it here:
    > http://groups.google.com/groups?th=42ab4db337b60ce3
    >
    > Just a few comments:
    >
    > Ian and Holger wondered why 'path' should subclass 'str'. It's because
    > a path is a string. Benefit: you can pass 'path' objects to functions
    > that expect strings (like functions in 'win32file'). I find this
    > really useful in practice.


    IMO you'll almost never use the following string-methods on a 'Path' object:

    capitalize center count decode encode
    expandtabs find index isalnum isalpha isdigit
    islower isspace istitle isupper
    ljust lstrip rjust splitlines startswith
    swapcase title translate zfill

    and so these methods pollute a Path object's name-space quite a bit.
    Also 'join', '__contains__', startswith etc. produce some ambiguity.

    I think it's convenient enough to use "str(path)" if passing a 'path'
    instance as a string somewhere.

    cheers,

    holger
     
    holger krekel, Jul 25, 2003
    #3
  4. Ian Bicking

    Ian Bicking Guest

    On Fri, 2003-07-25 at 11:41, holger krekel wrote:
    > Yes, i think adding platform specific methods to a Path object makes sense.
    > A friend and me started working on (local and subversion) Path
    > implementations last week. Currently a Path instance provides
    > these "path-taking" methods
    >
    > open
    > read
    > write
    > visit (a recursive walker)
    > listdir
    > stat
    > load/save (unpickle/pickle object)
    > setmtime (set modification time, uses os.utime)


    I like read and write too -- I do:

    f = open(filename)
    contents = f.read()
    f.close()

    All the time (when I'm uninterested in streaming or performance, which
    is most of the time I deal with files). Or just open(filename).read()
    and let garbage collection fix it up, even if it seems a little messy.
    A single method to encapsulate that would be nice, and of course write
    gives symmetry. Hmmm... Jason's module distinguishes bytes (binary) and
    text (which is potentially encoded). I kind of like that distinction.
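
    A sketch of what such single-call read/write methods could look like,
    with the bytes/text split made explicit. FilePath and its method names
    are assumptions for illustration, not the API of either module
    discussed here:

```python
import os
import tempfile


class FilePath:
    """Hypothetical path wrapper; the method names are illustrative only."""

    def __init__(self, name):
        self.name = name

    def read_bytes(self):
        # Binary read: no decoding, no newline translation.
        with open(self.name, 'rb') as f:
            return f.read()

    def read_text(self, encoding='utf-8'):
        # Text read: decode with an explicit encoding.
        with open(self.name, 'r', encoding=encoding) as f:
            return f.read()

    def write_bytes(self, data):
        with open(self.name, 'wb') as f:
            f.write(data)

    def write_text(self, text, encoding='utf-8'):
        with open(self.name, 'w', encoding=encoding) as f:
            f.write(text)


tmpdir = tempfile.mkdtemp()
p = FilePath(os.path.join(tmpdir, 'note.txt'))
p.write_text('caf\u00e9')
print(p.read_text())
print(len(p.read_bytes()))  # 5: the accented letter takes two bytes in UTF-8
```

    Each call opens and closes the file itself, so nothing is left for
    garbage collection to clean up.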

    Jason had walkers both for all files, just non-directory files, and
    directory files. This seems useful to me, and by making it explicit I
    might just start distinguishing text from binary (which I don't now
    because I am forgetful). And a globbing walker, though I don't know how
    much of an advantage that would be over list comprehension. Actually,
    all his walkers have a globbing option.

    > apart from all the os.path.* stuff like 'exists', 'dirname' etc.
    > Providing these "path-taking" methods on the Path object is very important
    > because otherwise you'll have to convert back and fro for using those
    > os.* and os.path.* or builtin methods (which is evil).


    dirname isn't a good name, since it should return a path object, not a
    "name" (which to me implies a string). I think Jason's module uses a
    parent attribute, though it also supports dirname(), and a name
    attribute instead of basename() (though that does not return a path
    object). And things like dirname make less sense in some non-path
    situations, like a URL. Probably not too much renaming should occur,
    but at least a little may be appropriate.

    Ian
     
    Ian Bicking, Jul 25, 2003
    #4
  5. Ian Bicking

    Ian Bicking Guest

    On Mon, 2003-07-21 at 13:16, Jason Orendorff wrote:
    > Ian and Holger wondered why 'path' should subclass 'str'. It's because
    > a path is a string. Benefit: you can pass 'path' objects to functions
    > that expect strings (like functions in 'win32file'). I find this
    > really useful in practice.


    I feel like this would lead to some annoying behavior in some
    circumstances. Most particularly, I'm thinking of:

    def dosomething(file):
        if type(file) is type(""):
            file = open(file)
        ...

    This isn't uncommon in functions that take pathnames or file objects.
    While isinstance(path, str) works, it was not an option until 2.2. So
    you'd be forced to do str(pathname) sometimes anyway, to deal with this.

    Ideally, interfaces would be changed to use a .open() method on the path
    instead of opening the string representation (as Holger's implementation
    does), so in the long term it would be nice to abandon direct string
    representations entirely. It would also make it more clear when you had
    a real path object and when you just had a string.
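
    A sketch of that idea: a function that asks its argument to open itself
    instead of checking for a string. LocalPath and first_line are
    hypothetical names, and this is not Holger's implementation:

```python
import io
import os
import tempfile


class LocalPath:
    """Hypothetical non-str path object that knows how to open itself."""

    def __init__(self, name):
        self.name = name

    def open(self, mode='r'):
        return open(self.name, mode)


def first_line(source):
    """Accept either an object with .open() or an already-open file."""
    opener = getattr(source, 'open', None)
    if callable(opener):
        # Path-like object: let it open itself, and close it when done.
        with opener() as f:
            return f.readline()
    # Otherwise assume it is a file object.
    return source.readline()


tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, 'data.txt')
with open(filename, 'w') as f:
    f.write('hello\nworld\n')

print(first_line(LocalPath(filename)))
print(first_line(io.StringIO('from a file object\n')))
```

    No type comparison is involved, so the str-versus-subclass question
    never comes up inside the function.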

    Ian
     
    Ian Bicking, Jul 25, 2003
    #5
  6. Ian Bicking wrote:
    > Jason had walkers both for all files, just non-directory files, and
    > directory files. This seems useful to me, and by making it explicit I
    > might just start distinguishing text from binary (which I don't now
    > because I am forgetful). And a globbing walker, though I don't know how
    > much of an advantage that would be over list comprehension. Actually,
    > all his walkers have a globbing option.


    We currently only have one 'visit' method that accepts a filter for returning
    results and a filter for recursing into the tree. You can use and
    combine multiple filters like so:

    root = Path('...')
    for path in root.visit(AND(isdir, nolink)):
        ...  # iterates over all non-link dirs in the tree (breadth-first)

    or

    for path in root.visit(AND(isfile, endswith('.txt')), nodotfile):
        ...  # iterates over all '*.txt' files but not recursing into ".*"

    and so on. This proved to be flexible and convenient and mostly avoids
    the need for multiple walk-methods.
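
    For readers who want to experiment, here is a rough sketch of a
    breadth-first 'visit' with combinable filters. It works on plain
    strings via os and os.path rather than on Path objects, and every name
    and signature below is an assumption, not the code from our module:

```python
import os
from collections import deque


def AND(*filters):
    """Combine filters: the result accepts a path only if all of them do."""
    def combined(p):
        return all(f(p) for f in filters)
    return combined


def isdir(p):
    return os.path.isdir(p)


def isfile(p):
    return os.path.isfile(p)


def nolink(p):
    return not os.path.islink(p)


def endswith(suffix):
    # Filter factory: returns a filter bound to the given suffix.
    return lambda p: p.endswith(suffix)


def nodotfile(p):
    return not os.path.basename(p).startswith('.')


def visit(root, fil=None, rec=None):
    """Breadth-first walk: yield entries accepted by `fil`, and descend
    only into directories accepted by `rec`."""
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for name in sorted(os.listdir(current)):
            p = os.path.join(current, name)
            if fil is None or fil(p):
                yield p
            if isdir(p) and nolink(p) and (rec is None or rec(p)):
                queue.append(p)
```

    With these definitions, visit(root, AND(isfile, endswith('.txt')),
    nodotfile) yields the '*.txt' files without descending into
    dot-directories.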

    cheers,

    holger
     
    holger krekel, Jul 25, 2003
    #6
  7. Ian Bicking wrote:
    > On Fri, 2003-07-25 at 13:33, holger krekel wrote:
    > > We currently only have one 'visit' method that accepts a filter for returning
    > > results and a filter for recursing into the tree.
    > > ...
    > > This proved to be flexible and convenient and mostly avoids
    > > the need for multiple walk-methods.

    >
    > Yeah... but we know that's not going to get into the standard library.
    > It requires a big namespace, logic functions (AND, OR, etc.), and it
    > confuses functions with these filter objects, which are named the same
    > (and even if the filter objects can be used as functions, it's still
    > confusing). It's a style that doesn't exist in the standard library,
    > and it seems unlikely that it would get in here.


    Maybe right. This is not my first priority, anyway, but i also thought that
    functional style is just not liked among the builtins.

    Anyway, the "filter functions" are indeed just callables which accept
    Path objects. You could as well take the unbound method Path.isdir
    but this feels ugly and isn't flexible enough.

    I don't exactly know what you mean by "big namespace". The filters are
    all contained in a 'filter' submodule because they can apply to
    multiple Path implementations anyway.

    > The multiple walk methods would only be a shortcut anyway. Again, they
    > might be difficult in a situation like a URL where directory and file
    > are intermingled (and maybe ReiserFS 4...?) -- which maybe is okay, a
    > urlpath object simply wouldn't implement that walker.


    Yep, URL paths have no notion of directories and files. Thus a general
    URL path can't have a 'listdir' method and thus we can't recurse.
    You can easily special case it for Apache's "Indexes" view, though :)

    holger
     
    holger krekel, Jul 25, 2003
    #7
  8. holger krekel wrote:
    > IMO you'll almost never use the following string-methods on a 'Path' object:
    > capitalize center count decode encode [...]
    > and so these methods pollute a Path object's name-space quite a bit.
    > Also 'join', '__contains__', startswith etc. produce some ambiguity.


    I'm not worried about "namespace pollution", but you're right that
    strings and paths are generally used for different things. I also
    agree 'join()' is a wart.

    > I think it's convenient enough to use "str(path)" if passing a 'path'
    > instance as a string somewhere.


    Hmmm. If the plan were to convert the whole standard library to accept
    path objects for pathnames, I would likely agree. But when you say
    "str(p)" is "convenient enough", you're saying I need this rule in my head:

    Don't pass path objects to functions that take path arguments.
    Pass string objects instead.

    This is a type rule. Such a thing has no place in Python.

    Furthermore, this rule is counterlogical! I would have to change
    "mimetypes.guess_type(mypath)" to "mimetypes.guess_type(str(mypath))".

    -- j
     
    Jason Orendorff, Jul 25, 2003
    #8
  9. Andrew Dalke

    Andrew Dalke Guest

    holger krekel
    > We currently only have one 'visit' method that accepts a filter for
    > returning results and a filter for recursing into the tree.


    > for path in root.visit(AND(isdir, nolink)):


    > for path in root.visit(AND(isfile, endswith('.txt')), nodotfile):


    I've used the AND trick before, as well as tricks to support "isdir &&
    nolink". Still, as these things get more complicated, it's easier to
    just do

    for path in root.visit(lambda name: isfile(name) and name.endswith(".txt")):
        ...
    -or-
    def myfilter(name):
        return isfile(name) and name.endswith(".txt")
    for path in root.visit(myfilter):
        ...

    rather than use a prefix-style function interface.

    This doesn't introduce any new programming styles, which makes it
    easier to understand.

    The exception is if the result builds up some sort of parse tree which
    can be further analyzed for performance, which is not the case here.

    Andrew
     
    Andrew Dalke, Jul 26, 2003
    #9
  10. Ian Bicking

    Ian Bicking Guest

    On Fri, 2003-07-25 at 20:10, Van Gale wrote:
    > Interesting, I started a project modifying Jason's Path module to work
    > on subversion trees as well. I didn't get too far before putting the
    > project on a back-burner so I'm glad to hear someone else is thinking
    > the same way :)
    >
    > My extensions to Path included an additional argument to "open" that
    > included a version number, and a mechanism for retrieving some kind of
    > "metadata" associated with the file.


    It's interesting that different kinds of filesystems (or
    filesystem-like-things) have very different kinds of metadata
    available. Like last-modified, last-accessed, inode (identity),
    version, title, branch, mimetype, log message, etc. And then there's
    information that's not quite metadata... like <link ref> data, or the
    volume name, the host, etc.

    I feel like a common interface for these different filesystems should
    somehow degrade well in terms of metadata, or expedite introspection in
    some fashion.

    The differences on the client side are probably easier to handle, as
    they can be handled by the constructor, which might look different for
    different filesystems. Like url('http://whatever', user='bob',
    password='secret', proxy='http://myproxy'), or
    cvs(pserver='cvs.sourceforge.net', repository='python'). Or should
    there be a string-based representation (i.e., URIs)? Of course for
    symmetry then __str__ would always return a URI, but for many
    circumstances we'd prefer a more concise notation, like a filesystem
    path (though most other cases would be acceptable squeezed into URIs).


    I'd have placed the version in the object itself, not as an argument to
    open. Then you'd want to query for alternate versions, most recent
    version -- maybe some version identifier that meant most recent... a
    similar situation might be language negotiation with an HTTP file.

    Ian
     
    Ian Bicking, Jul 26, 2003
    #10
  11. Ian Bicking

    Ian Bicking Guest

    On Fri, 2003-07-25 at 14:56, holger krekel wrote:
    > > The multiple walk methods would only be a shortcut anyway. Again, they
    > > might be difficult in a situation like a URL where directory and file
    > > are intermingled (and maybe ReiserFS 4...?) -- which maybe is okay, a
    > > urlpath object simply wouldn't implement that walker.

    >
    > Yep, URL paths have no notion of directories and files. Thus a general
    > URL path can't have a 'listdir' method and thus we can't recurse.
    > You can easily special case it for Apache's "Indexes" view, though :)


    WebDAV does, though, doesn't it? But you can still edit the directory
    resource, so it gets overloaded. WebDAV's use of GET is messed up.

    And we should specify HTTP, of course, since FTP does have a notion of
    directories, and possibly other URL methods would as well.

    But this is a digression...

    Ian
     
    Ian Bicking, Jul 26, 2003
    #11
  12. On 25 Jul 2003 20:48:07 -0500, Ian Bicking <> wrote:

    >On Fri, 2003-07-25 at 20:10, Van Gale wrote:
    >> Interesting, I started a project modifying Jason's Path module to work
    >> on subversion trees as well. I didn't get too far before putting the
    >> project on a back-burner so I'm glad to hear someone else is thinking
    >> the same way :)
    >>
    >> My extensions to Path included an additional argument to "open" that
    >> included a version number, and a mechanism for retrieving some kind of
    >> "metadata" associated with the file.

    >
    >It's interesting that different kinds of filesystems (or
    >filesystem-like-things) have very different kinds of metadata
    >available. Like last-modified, last-accessed, inode (identity),
    >version, title, branch, mimetype, log message, etc. And then there's
    >information that's not quite metadata... like <link ref> data, or the
    >volume name, the host, etc.
    >
    >I feel like a common interface for these different filesystems should
    >somehow degrade well in terms of metadata, or expedite introspection in
    >some fashion.
    >

    IMO a mounted file system per se should be represented by an object, and then
    that object should have the methods to deliver generic or file-system-specific
    file and path and walking objects etc.

    After all, even NT can see DOS partitions as well as NTFS, plus raw
    floppy and HD images of potentially foreign formats. And my Slackware
    Linux box has a DOS partition that can be booted as an alternative, but
    that Slackware can also read via a mount.

    Cf. another post in this thread (which didn't get any response ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Jul 26, 2003
    #12
  13. Van Gale wrote:
    > holger krekel wrote:
    > > Yes, i think adding platform specific methods to a Path object makes sense.
    > > A friend and me started working on (local and subversion) Path
    > > implementations last week.
    > > ...

    >
    > Interesting, I started a project modifying Jason's Path module to work
    > on subversion trees as well. I didn't get too far before putting the
    > project on a back-burner so I'm glad to hear someone else is thinking
    > the same way :)


    It's even working, although i am not sure we will stay with the
    subversion-python bindings as they are fragile and incomplete in places.
    We might switch to using the commandline "svn" utility for the time being.

    > My extensions to Path included an additional argument to "open" that
    > included a version number, and a mechanism for retrieving some kind of
    > "metadata" associated with the file.


    We instantiate the Path like so

    path = SvnPath('http://codespeak.net/svn/vpath/trunk/dist', rev=X)

    where X is either -1 (default) meaning it should grab the latest
    revision or some positive revision number. When you 'visit' or
    'listdir' or 'open' on that 'path' you stay in the same revision
    and thus get a consistent view. This is obviously a nice property.
    Btw, via the above URL you'll get our current implementation with
    lots of unittests. You currently need subversion-python-bindings
    which are not exactly easy to get going unless you already have a
    server-side install.
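
    To illustrate the pinned-revision property, here is a toy sketch
    against an in-memory repository. FakeRepo and RevPath are hypothetical,
    nothing here talks to subversion, and resolving rev=-1 once at
    construction time is just one assumed way to get the consistent view
    (it is not necessarily how SvnPath does it):

```python
class FakeRepo:
    """In-memory stand-in: a list of snapshots, one per revision."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    def commit(self, changes):
        snapshot = dict(self.revisions[-1])
        snapshot.update(changes)
        self.revisions.append(snapshot)
        return len(self.revisions) - 1

    def latest(self):
        return len(self.revisions) - 1


class RevPath:
    """Hypothetical path bound to one revision of a FakeRepo."""

    def __init__(self, repo, path, rev=-1):
        self.repo = repo
        self.path = path
        # Resolve "latest" once, so later calls see the same snapshot.
        self.rev = repo.latest() if rev == -1 else rev

    def listdir(self):
        prefix = self.path.rstrip('/') + '/'
        names = set()
        for p in self.repo.revisions[self.rev]:
            if p.startswith(prefix):
                names.add(p[len(prefix):].split('/')[0])
        # Children inherit the parent's revision.
        return [RevPath(self.repo, prefix + n, self.rev) for n in sorted(names)]

    def read(self):
        return self.repo.revisions[self.rev][self.path]


repo = FakeRepo()
repo.commit({'/trunk/a': 'v1'})
pinned = RevPath(repo, '/trunk')
repo.commit({'/trunk/a': 'v2', '/trunk/b': 'new'})

print([child.path for child in pinned.listdir()])  # ['/trunk/a']
print(pinned.listdir()[0].read())                  # v1
print(RevPath(repo, '/trunk/a').read())            # v2
```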

    > I also made another Path module that implements a "poor mans cms" if
    > subversion/rcs/cvs are not available. It uses hidden files with version
    > numbers in the filename to emulate a real version control system.


    I thought about this too. Right now we just want to make it easy and
    complete enough.

    cheers,

    holger
     
    holger krekel, Jul 26, 2003
    #13
  14. Ian Bicking wrote:
    > On Fri, 2003-07-25 at 14:56, holger krekel wrote:
    > > > The multiple walk methods would only be a shortcut anyway. Again, they
    > > > might be difficult in a situation like a URL where directory and file
    > > > are intermingled (and maybe ReiserFS 4...?) -- which maybe is okay, a
    > > > urlpath object simply wouldn't implement that walker.

    > >
    > > Yep, URL paths have no notion of directories and files. Thus a general
    > > URL path can't have a 'listdir' method and thus we can't recurse.
    > > You can easily special case it for Apache's "Indexes" view, though :)

    >
    > WebDAV does, though, doesn't it? But you can still edit the directory
    > resource, so it gets overloaded. WebDAV's use of GET is messed up.


    I am not very familiar with the low-level details of WebDAV but i think
    determining if something is a directory is done by a PROPFIND request.

    cheers,

    holger
     
    holger krekel, Jul 26, 2003
    #14
  15. John J. Lee

    John J. Lee Guest

    holger krekel <> writes:

    > Jason Orendorff wrote:

    [...about passing path objects to library methods that expect a string...]
    > > This is a type rule. Such a thing has no place in Python.

    >
    > Oh, the stdlib has lots of places where it expects certain types in
    > certain places. Look for e.g. 'isinstance'.


    It's not even a strict type rule. It's just that a path object
    wouldn't implement the string interface. I don't know why that would
    have 'no place in Python', or be 'counterlogical'.


    John
     
    John J. Lee, Jul 27, 2003
    #15
