[path-PEP] Path inherits from basestring again

  • Thread starter Reinhold Birkenfeld
  • Start date
A

Andrew Dalke

Reinhold said:
Okay. While a path has its clear use cases and those don't need above methods,
it may be that some brain-dead functions needs them.

"brain-dead"?

Consider this code, which I think is not atypical.

import sys

def _read_file(filename):
if filename == "-":
# Can use '-' to mean stdin
return sys.stdin
else:
return open(filename, "rU")


def file_sum(filename):
total = 0
for line in _read_file(filename):
total += int(line)
return total

(Actually, I would probably write it

def _read_file(file):
if isinstance(file, basestring):
if filename == "-":
# Can use '-' to mean stdin
return sys.stdin
else:
return open(filename, "rU")
return file

)

Because the current sandbox Path doesn't support
the is-equal test with strings, the above function
won't work with a filename = path.Path("-"). It
will instead raise an exception saying
IOError: [Errno 2] No such file or directory: '-'

(Yes, the code as-is can't handle a file named '-'.
The usual workaround (and there are many programs
which support '-' as an alias for stdin) is to use "./-"

% cat > './-'
This is a file
% cat ./-
This is a file
% cat -
I'm typing directly into stdin.
^D
I'm typing directly into stdin.
%
)


If I start using the path.Path then in order to use
this function my upstream code must be careful on
input to distinguish between filenames which are
really filenames and which are special-cased pseudo
filenames.

Often the code using the API doesn't even know which
names are special. Even if it is documented,
the library developer may decide in the future to
extend the list of pseudo filenames to include, say,
environment variable style expansion, as
$HOME/.config

Perhaps the library developer should have come up
with a new naming system to include both types of
file naming schemes, but that's rather overkill.

As a programmer calling the API should I convert
all my path.Path objects to strings before using it?
Or to Unicode? How do I know which filenames will
be treated specially through time?

Is there a method to turn a path.Path into the actual
string? str() and unicode() don't work because I
want the result to be unicode if the OS&Python build
support it, otherwise string.

Is that library example I mentioned "brain-dead"?
I don't think so. Instead I think you are pushing
too much for purity and making changes that will
cause problems - and hard to fix problems - with
existing libraries.



Here's an example of code from an existing library
which will break in several ways if it's passed a
path object instead of a string. It comes from
spambayes/mboxutils.py

#################

This is mostly a wrapper around the various useful classes in the
standard mailbox module, to do some intelligent guessing of the
mailbox type given a mailbox argument.

+foo -- MH mailbox +foo
+foo,bar -- MH mailboxes +foo and +bar concatenated
+ALL -- a shortcut for *all* MH mailboxes
/foo/bar -- (existing file) a Unix-style mailbox
/foo/bar/ -- (existing directory) a directory full of .txt and .lorien
files
/foo/bar/ -- (existing directory with a cur/ subdirectory)
Maildir mailbox
/foo/Mail/bar/ -- (existing directory with /Mail/ in its path)
alternative way of spelling an MH mailbox

....

def getmbox(name):
"""Return an mbox iterator given a file/directory/folder name."""

if name == "-":
return [get_message(sys.stdin)]

if name.startswith("+"):
# MH folder name: +folder, +f1,f2,f2, or +ALL
name = name[1:]
import mhlib
mh = mhlib.MH()
if name == "ALL":
names = mh.listfolders()
elif ',' in name:
names = name.split(',')
else:
names = [name]
mboxes = []
mhpath = mh.getpath()
for name in names:
filename = os.path.join(mhpath, name)
mbox = mailbox.MHMailbox(filename, get_message)
mboxes.append(mbox)
if len(mboxes) == 1:
return iter(mboxes[0])
else:
return _cat(mboxes)

if os.path.isdir(name):
# XXX Bogus: use a Maildir if /cur is a subdirectory, else a MHMailbox
# if the pathname contains /Mail/, else a DirOfTxtFileMailbox.
if os.path.exists(os.path.join(name, 'cur')):
mbox = mailbox.Maildir(name, get_message)
elif name.find("/Mail/") >= 0:
mbox = mailbox.MHMailbox(name, get_message)
else:
mbox = DirOfTxtFileMailbox(name, get_message)
else:
fp = open(name, "rb")
mbox = mailbox.PortableUnixMailbox(fp, get_message)
return iter(mbox)



It breaks with the current sandbox path because:
- a path can't be compared to "-"
- range isn't supported, as "name = name[1:]"

note that this example uses __contains__ ("," in name)


Is this function brain-dead? Is it reasonable that people might
want to pass a path.Path() directly to it? If not, what's
the way to convert the path.Path() into the correct string
object?

Andrew
(e-mail address removed)
 
C

Carl Banks

Reinhold said:
Peter said:
Could you please expand on what this means? Are you referring to doing
< and >= type operations on Paths and strings, or == and != or all those
or something else entirely?

All of these. Do you need them?
[snip]

At the moment, I think about overriding certain string methods that make
absolutely no sense on a path and raising an exception from them.


Ick. This reeks of the sort of hubris from people who think they
anticipate all valid uses of something.

Is it a basestring or not? If it is, then let it be a basestring. It
is unreasonable to want to format a pathame for printing? We might
want to retain ljust and friends. Maybe there's a filenaming scheme
where files are related by having a character changed here or there.
So we might want to iterate though the characters in a pathname. How
do you know how people are going to use it? We're all supposed to be
adults here.

Let me suggest that wanting to remove all these methods/operations
suggests that one doesn't really think it ought to be a basestring.
The way I see it, the only compelling reason for it to be a basestring
is to accommodate poorly designed functions that test whether an
argument is a filename or a file object using isinstance(basestring,x)
on it. But the best thing to do is fix those interfaces, and let path
be what it should be, and not a hack to accommodate poor code.
 
R

Reinhold Birkenfeld

Reinhold said:
Hi,

the arguments in the previous thread were convincing enough, so I made the
Path class inherit from str/unicode again.

Current change:

* Add base() method for converting to str/unicode.
* Allow compare against normal strings.

Reinhold
 
P

Peter Hansen

Reinhold said:
Current change:

* Add base() method for converting to str/unicode.

Would basestring() be a better name? Partly because that seems to be
exactly what it's doing, but more because there are (or used to be?)
other things in Path that used the word "base", such as "basename".

-1 on that specific name if it could be easily confused with "basename"
types of things.

-Peter
 
R

Reinhold Birkenfeld

Peter said:
Would basestring() be a better name? Partly because that seems to be
exactly what it's doing, but more because there are (or used to be?)
other things in Path that used the word "base", such as "basename".

-1 on that specific name if it could be easily confused with "basename"
types of things.

Right, that was a concern of mine, too.
"tobase"?
"tostring"?
"tobasestring"?

Alternative is to set a class attribute "Base" of the Path class. Or export
PathBase as a name from the module (but that's not quite useful, because I
expect Path to be imported via "from os.path import Path").

Reinhold
 
P

Peter Hansen

> "tobase"?
> "tostring"?
> "tobasestring"?

Of these choices, the latter would be preferable.
> Alternative is to set a class attribute "Base" of the
> Path class. Or export PathBase as a name from the module
> (but that's not quite useful, because I
> expect Path to be imported via "from os.path import Path").

I don't understand how that would work. An attribute on the *class*?
What would it be, a callable? So mypath.Base(mypath) or something?
Please elaborate...

What about just .basestring, as a read-only attribute on the Path object?

-Peter
 
R

Reinhold Birkenfeld

Peter said:
Of these choices, the latter would be preferable.


I don't understand how that would work. An attribute on the *class*?
What would it be, a callable? So mypath.Base(mypath) or something?
Please elaborate...

[_base is str or unicode]

class Path:
Base = _base
[...]

So you could do "Path.Base(mypath)" or "mypath.Base(mypath)".
What about just .basestring, as a read-only attribute on the Path object?

Reasonable, though the term as such is preoccupied too.

Reinhold
 
S

skip

Reinhold> Right, that was a concern of mine, too.
Reinhold> "tobase"?
Reinhold> "tostring"?
Reinhold> "tobasestring"?

If we're on a filesystem that understands unicode, would somepath.tostring()
return a unicode object or a string object encoded with some
to-be-determined encoding?

Why not just add __str__ and __unicode__ methods to the class and let the
user use str(somepath) or unicode(somepath) as needed?

Or am I missing something fundamental about what the base() method is
supposed to do?

Skip
 
R

Reinhold Birkenfeld

Reinhold> Right, that was a concern of mine, too.
Reinhold> "tobase"?
Reinhold> "tostring"?
Reinhold> "tobasestring"?

If we're on a filesystem that understands unicode, would somepath.tostring()
return a unicode object or a string object encoded with some
to-be-determined encoding?

Whatever the base of the Path object is. It selects its base class based on
os.path.supports_unicode_filenames.
Why not just add __str__ and __unicode__ methods to the class and let the
user use str(somepath) or unicode(somepath) as needed?

Or am I missing something fundamental about what the base() method is
supposed to do?

It should provide an alternative way of spelling Path.__bases__[0](path).

Reinhold
 
A

Andrew Dalke

Now that [:] slicing works, and returns a string,
another way to convert from path.Path to str/unicode
is path[:]

Andrew
(e-mail address removed)
 
B

Bengt Richter

Right, that was a concern of mine, too.
"tobase"? -1
+1
-0

Alternative is to set a class attribute "Base" of the Path class. Or export
PathBase as a name from the module (but that's not quite useful, because I
expect Path to be imported via "from os.path import Path").

Reinhold

Regards,
Bengt Richter
 
N

NickC

[Re: how to get at the base class]

Do you really want to have a "only works for Path" way to get at the
base class, rather than using the canonical Path.__bases__[0]?

How about a new property in the os.path module instead? Something like
os.path.path_type.

Then os.path.path_type is unicode if and only if
os.path.supports_unicode_filenames is True. Otherwise,
os.path.path_type is str.

Then converting a Path to str or unicode is possible using:

as_str_or_unicode = os.path.path_type(some_path)

The other thing is that you can simply make Path inherit from
os.path.path_type.

Regards,
Nick C.
 
R

Reinhold Birkenfeld

NickC said:
[Re: how to get at the base class]

Do you really want to have a "only works for Path" way to get at the
base class, rather than using the canonical Path.__bases__[0]?

How about a new property in the os.path module instead? Something like
os.path.path_type.

Then os.path.path_type is unicode if and only if
os.path.supports_unicode_filenames is True. Otherwise,
os.path.path_type is str.

Then converting a Path to str or unicode is possible using:

as_str_or_unicode = os.path.path_type(some_path)

The other thing is that you can simply make Path inherit from
os.path.path_type.

That's what I suggested with Path.Base. It has the advantage that you don't have
to import os.path to get at it (Path is meant so that you can avoid os.path).

Reinhold
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,167
Latest member
SusanaSwan
Top