PRE-PEP: new Path class


Gerrit Holl

John said:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
can be found here:

http://tinyurl.com/2578q

I have updated a lot of things on the PEP in the past few days.
There are still larger and smaller open issues, though, besides the
usual 'I-can-change-my-mind-and-the-PEP-can-change-its-mind' things:

Quoting my own PEP:

Should path.__eq__ match path.samefile?

There are at least 2 possible ways to do it:

- Normalize both operands by checking to which actual file they
point (same (l)stat).
- Try to find out whether the paths point to the same filesystem
entry, without doing anything with the filesystem.

pro
- A path usually points to a certain place on the filesystem, and
two paths with different string representations may point to the same place,
which means they are essentially equal in usage.
con
- We would have to choose a way, so we should first decide which is
better and whether the difference is intuitive enough.
- It makes hashing more difficult/impossible.
conclusion
- I don't know.
links
- Bernhard Herzog `points out
<http://mail.python.org/pipermail/python-list/2004-January/201857.html>`__
that it would essentially make path-objects non-hashable.
- `Jason Orendorff's Path`_ inherits str.__eq__.
- `Mike C. Fletcher's Path`_ chooses the first variant.
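
The two variants listed above can be compared side by side. The following is only a minimal sketch, assuming plain strings instead of path objects; the helper names same_by_stat and same_by_string are made up for illustration and are not part of the pre-PEP.

```python
import os
import tempfile


def same_by_stat(a, b):
    """Variant 1: touch the filesystem and compare the entries found."""
    return os.path.samefile(a, b)


def same_by_string(a, b):
    """Variant 2: compare normalized strings without touching the filesystem."""
    return os.path.normpath(a) == os.path.normpath(b)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "data.txt")
    with open(target, "w") as handle:
        handle.write("x")
    alias = os.path.join(root, ".", "data.txt")

    # Both variants agree for an existing file reached via a redundant "."
    stat_result = same_by_stat(target, alias)
    string_result = same_by_string(target, alias)

    # Only the string variant can answer for a path that does not exist.
    missing = os.path.join(root, "missing.txt")
    string_on_missing = same_by_string(missing, missing)
    try:
        same_by_stat(missing, missing)
        stat_on_missing = "ok"
    except OSError:
        stat_on_missing = "error"
```

The stat-based variant needs the file to exist and depends on filesystem state, which is the root of the hashing concern raised under "con".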

Do we need to treat unicode filenames specially?

I have no idea.

links
- An `explanation
<http://mail.python.org/pipermail/python-list/2004-January/201418.html>`__
by Martin von Loewis.

- Should os.tempnam be included?
- Can normpath be coded using only os.path constants? (If so, it belongs
in the 'platform-independent' class; I think not.)
- Should normalize be called normalized or not?
- Should stat be defined in the platform-dependent or -independent class?
- Should we include chdir and chroot? If so, where?
- Should rename return a new path object?
- Should renames be included?

And one meta-question:

Shall I submit this as an official PEP? Or shall I first fill in more
open issues and perhaps give the possibility to change "closed" issues?

See also: http://people.nl.linux.org/~gerrit/creaties/path/pep-xxxx.html

yours,
Gerrit.
 

Christoph Becker-Freyseng

Gerrit said:
This is indeed a disadvantage. On the other hand, although p.openwith is
longer, I do think it is more readable. It occurs often that shorter is
not more readable: just think of all those 'obfuscated-oneliners'
contests in C and Perl.

Sure. However this case seems to be quite obvious both ways.
I think this is not a good idea. In my opinion, any __gt__ method should
always compare, no more, no less. Further, it's very unusual to call
something like this.
Additionally, getting the right documentation for these "operator-tricks"
is harder.
Another possibility is defining __call__:

path(f, *args) == f(str(path), *args)

which may be inconvenient as well, however. Is it intuitive to let
calling mean opening?
I like this one. What else could calling a path mean?
Hm, I think almost all file constructors have the path as the first
argument. Are there counter-examples?
The whole openwith (other than path.open) is IMO mainly for
"backward-compatibility" if the function doesn't know the path-class.
I think openwith or better __call__ could be used for other things, too
--- not only for opening a file. E.g. there could be some
"FileWatcher-Modules" that might only accept strings and have a call like:
watchFile(onChangeFunc, path_string)

For a different position of the path_string we could make a special case
if the path-object is given as an argument of the call:
    path(f, arg1, arg2, path, arg3, arg4, ...)
results in:
    f(arg1, arg2, str(path), arg3, arg4, ...)

Changing old code to use the new Path-class could be done with a minimal
amount of work then.
OLD:   result= f(arg1, arg2, path, arg3, arg4, ...) # path is/was a string here
.....  result= f,arg1, arg2, path, arg3, arg4, ...)
.....  result= (f, arg1, arg2, path, arg3, arg4, ...)
NEW:   result= path(f, arg1, arg2, path, arg3, arg4, ...)
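
The __call__ idea, including the special case for a path in a later argument position, could be sketched roughly as follows. This is an illustration under assumptions, not the pre-PEP's code: the Path class here is a bare stand-in, and watch_file is a hypothetical legacy function modelled on the watchFile example above.

```python
class Path:
    """Hypothetical stand-in for the proposed path class (not the PEP's code)."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    def __call__(self, func, *args):
        # If this path object appears among the arguments, replace it in place
        # with its string form; otherwise pass the string as the first argument.
        if any(arg is self for arg in args):
            converted = [str(arg) if arg is self else arg for arg in args]
            return func(*converted)
        return func(str(self), *args)


def watch_file(on_change, path_string):
    """Hypothetical legacy function that only accepts plain strings."""
    assert isinstance(path_string, str)
    return (on_change, path_string)


p = Path("/tmp/example.txt")
first_position = p(len)                   # same as len(str(p))
other_position = p(watch_file, print, p)  # same as watch_file(print, str(p))
```

One consequence of this design is that the path object shows up twice in the call when it is not the first argument, which may or may not read well in practice.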
Yes... I was planning to define it in a class which is able to 'touch'
the filesystem, so an FTPPath could subclass basepath without the need
to overload open, or subclass ReadablePath with this need.
Fine :)


Christoph Becker-Freyseng


P.S.: ((I'll post the other things in different Re:'s))
 

Christoph Becker-Freyseng

Gerrit said:
I think this makes validating a path essentially impossible to get
right. Let's say we can declare path to be invalid, but we can't declare
a path to be valid. Is it a good thing to add a method for it then? (I
think yes)




I agree that it's better. We should allow invalid paths after all.
Yes. path.isValid would make it possible to check better for situations
where calling things like mkdir and mkdirs (they directly depend on the
path being valid) causes trouble.
We could also add an InvalidPathException, which will at least help
debugging. isValid could have a default argument "raiseExc=False" to
make checking in these functions convenient, e.g.

    def mkdir(self):
        self.isValid(raiseExc=True)
        moreStuff ...

If the path is invalid it will stop with an InvalidPathException.

Also path.exists should depend on path.isValid (not the other way).
If the full-path doesn't exist one can go up all the parent-dirs until
one exists. Here we can check if the specified sub-path is valid by
getting some information about the filesystem where the parent-dir is
stored. *this implicitly makes isValid a reading method* --- however AFAIK
isValid is only needed for reading and writing methods.
Well, if path.exists(), path.isValid(). The question is - should
path.isValid() read the filesystem?
Yes as stated above.

path.exists should first check whether the path isValid at all. If it
isn't, a statement about its existence is meaningless. In this case it
should return None, which also evaluates to False but is different (it's a
"dreiwertige Logik" (three-valued logic) --- when you have 3 states (true,
false, unknown), what is this called in English?)

FIXME: We have to fine-tune the "recursive" behavior of isValid and
exists, otherwise we have a lot of unnecessary calls as exists and
isValid call each other going one dir up ...


Christoph Becker-Freyseng
 

Christoph Becker-Freyseng

Gerrit said:
I have thought about this, too. It should certainly not be fully mutable,
because if a path changes, it changes. But maybe we could have a
.normalise_inplace() which mutates the Path? What consequences would
this have for hashability?

I like paths to be hashable, so they probably should be immutable.
Yes. (already in the PEP)
While paths aren't strings they have a lot in common because paths (as I
now think of them) are not directly associated with files. (Paths can be
nonexistent or even invalid)

Moreover the basic operations like __eq__ shouldn't be reading methods!

__hash__ has to be compatible with __eq__.
hash(p1) == hash(p2) <<<=== p1 == p2

Also
hash(p1) == hash(p2) ===>>> p1 == p2
should be true as far as possible.

I think

    def __hash__(self):
        return hash(str(self.normalized()))

would do this fine.

So for __eq__ it follows naturally

    def __eq__(self, other):
        # FIXME: isinstance checking
        return (str(self.normalized()) == str(other.normalized()))

It handles nonexistent paths, too. (The samefile-solution won't ---
we might code a special case for it ...)
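
Put together, the proposal above might look like the sketch below. It assumes that normalized() is nothing more than os.path.normpath (the thread leaves this open), and the Path class is a hypothetical stand-in; the immutability guard is one possible way to keep hashes stable, not something the pre-PEP prescribes.

```python
import os.path


class Path:
    """Hypothetical immutable path; normalized() here is only os.path.normpath."""

    __slots__ = ("_text",)

    def __init__(self, text):
        # Bypass our own __setattr__ once, during construction.
        object.__setattr__(self, "_text", text)

    def __setattr__(self, name, value):
        raise AttributeError("Path objects are immutable")

    def __str__(self):
        return self._text

    def normalized(self):
        return Path(os.path.normpath(self._text))

    def __eq__(self, other):
        if not isinstance(other, Path):
            return NotImplemented
        return str(self.normalized()) == str(other.normalized())

    def __hash__(self):
        # Equal paths must hash equally, so hash the same normalized string.
        return hash(str(self.normalized()))


p1 = Path("docs//readme.txt")
p2 = Path("docs/./readme.txt")
lookup = {p1: "found"}
```

Because both methods go through the same normalized string, p2 can be used to look up an entry stored under p1, and neither method reads the filesystem.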


What about __cmp__?
I have to admit that __cmp__ comparing the file-sizes is nice (__eq__=
samefile is attractive, too --- they're both evil temptations :) )

However __eq__ and __cmp__ returning possibly different results is odd.
Finally, implementing __cmp__ that way would make it a reading method,
which is problematic for nonexistent paths.

I'd like an implementation of __cmp__ which is more path specific than
just string.__cmp__. But it should be consistent with __eq__.
Could we do something about parent and sub dirs?




Christoph Becker-Freyseng
 

Christoph Becker-Freyseng

I think the implementation should be changed for the "NormalFSPath".

    import errno
    import os

    def exists(self):
        try:
            os.stat(str(self))
            return True
        except OSError as exc:  # couldn't stat, so what's up
            if exc.errno == errno.ENOENT:  # it simply doesn't exist
                return False
            return None  # the path is invalid

    def isValid(self, raiseExc=False):
        # exists() returns None only when the path itself is invalid
        if self.exists() is None:
            if raiseExc:
                raise InvalidPath
            return False
        return True
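
To see the three return values of such an exists in action, here is a small self-contained sketch. It uses a plain function on strings rather than a method, and the "invalid" case relies on POSIX behaviour (ENOTDIR when a regular file is used as a directory component); other platforms may report a different error code, so treat the third result as an assumption.

```python
import errno
import os
import tempfile


def exists(path_string):
    """Return True, False, or None (path invalid / cannot be decided)."""
    try:
        os.stat(path_string)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return False
        return None


with tempfile.TemporaryDirectory() as root:
    regular = os.path.join(root, "file.txt")
    with open(regular, "w") as handle:
        handle.write("x")

    present = exists(regular)
    absent = exists(os.path.join(root, "nothing-here"))
    # A regular file used as a directory component: ENOTDIR on POSIX systems.
    through_file = exists(os.path.join(regular, "child"))
```

Callers have to test the result with "is None" to tell "invalid" apart from "does not exist", since both are false in a boolean context.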




Christoph Becker-Freyseng
 

Bernhard Herzog

Christoph Becker-Freyseng said:
So for __eq__ it follows naturally

    def __eq__(self, other):
        # FIXME: isinstance checking
        return (str(self.normalized()) == str(other.normalized()))

It handles nonexistent paths, too. (The samefile-solution won't ---
we might code a special case for it ...)

What exactly does normalized() do? If it's equivalent to
os.path.normpath, then p1 == p2 might be true even though they refer to
different files (on posix, a/../b is not necessarily the same file as
b). OTOH, if it also called os.path.realpath to take symlinks into
account, __eq__ would depend on the state of the filesystem which is
also bad.

IMO __eq__ should simply compare the strings without any modification.
If you want to compare normalized paths you should have to normalize
them explicitly.
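
The a/../b case can be reproduced with a symlink. The sketch below is only an illustration: the directory layout is made up, and it assumes a POSIX system where creating symlinks is permitted.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)
    os.makedirs(os.path.join(root, "real", "sub"))
    # "a" is a symlink to real/sub, so "a/.." resolves to "real", not to root.
    os.symlink(os.path.join(root, "real", "sub"), os.path.join(root, "a"))
    for name in ("b", os.path.join("real", "b")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write(name)

    tricky = os.path.join(root, "a", "..", "b")
    plain = os.path.join(root, "b")

    # Purely textual normalization collapses "a/.." and makes the strings equal ...
    strings_match = os.path.normpath(tricky) == os.path.normpath(plain)
    # ... but the operating system follows the symlink and reaches real/b instead.
    files_match = os.path.samefile(tricky, plain)
```

So an __eq__ built on normpath alone would report two paths as equal even though opening them yields different files.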


Bernhard
 

Christoph Becker-Freyseng

Bernhard said:
What exactly does normalized() do? If it's equivalent to
os.path.normpath, then p1 == p2 might be true even though they refer to
different files (on posix, a/../b is not necessarily the same file as
b).
IMO yes.
OTOH, if it also called os.path.realpath to take symlinks into
account, __eq__ would depend on the state of the filesystem which is
also bad.

IMO __eq__ should simply compare the strings without any modification.
If you want to compare normalized paths you should have to normalize
them explicitly.
I agree with that. While it would be nice if __eq__ could match such
things, it is ambiguous.
So it is better to let __eq__ be a bit too strict than faulty.



Christoph Becker-Freyseng
 

Christoph Becker-Freyseng

As I pointed out path.__cmp__ should not be used for e.g. comparing
filesizes.

But features like sorting on filesizes are very useful.
I'm not sure if Gerrit Holl already meant this in his conclusion on
"Comparing files" in the PEP.
I'll outline it a bit ...

I propose a callable singleton class whose only instance we assign to
sort_on (defined in the path-module).
It will have methods like: filesize, extension, filename, etc.
They will all be defined like:

    def filesize(self, path1, path2):
        try:
            return path1._cmp_filesize(path2)
        except XXX:
            # catch exceptions that are raised because path1 doesn't know
            # how to compare with path2 (for different path-subclasses)
            XXX
        try:
            # is this the best way to do this?
            return (-1) * path2._cmp_filesize(path1)
        except XXX:
            XXX
        # "path1 and path2 can't be compared on filesize; class1 and
        # class2 are not compatible"
        raise YYY

And

    def __call__(self, *args):
        if len(args) == 0:
            return self.filesize  # example!
        elif len(args) == 1:
            # allow comparing uncommon things for subclasses of path,
            # e.g. ServerName/IPs for FTPPath ...
            def cmp_x(path1, path2, what=str(args[0])):
                # like filesize but
                pathCmpFunc = getattr(path1, '_cmp_' + what)
                return pathCmpFunc(path2)
                # Catch exceptions ...
            return cmp_x
        elif len(args) == 2:  # default comparison
            return self.filesize(*args)  # example!
        else:
            raise TypeError("Won't work ... FIXME")




Then we can have things like:

l= [path1, path2, path3]
l.sort(path.sort_on.filesize)
l.sort(path.sort_on.extension)

.....
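
For readers on current Python, where list.sort no longer accepts a comparison function directly, the same idea can be tried with functools.cmp_to_key. This is a rough sketch only: the SortOn class is a simplified stand-in that works on plain path strings and skips the _cmp_* delegation and error handling outlined above.

```python
import functools
import os
import tempfile


class SortOn:
    """Hypothetical, simplified stand-in for the proposed sort_on object."""

    def filesize(self, path1, path2):
        size1, size2 = os.path.getsize(path1), os.path.getsize(path2)
        return (size1 > size2) - (size1 < size2)

    def extension(self, path1, path2):
        ext1, ext2 = os.path.splitext(path1)[1], os.path.splitext(path2)[1]
        return (ext1 > ext2) - (ext1 < ext2)


sort_on = SortOn()

with tempfile.TemporaryDirectory() as root:
    paths = []
    for name, payload in (("a.txt", "xxxx"), ("b.cfg", "x"), ("c.log", "xx")):
        full = os.path.join(root, name)
        with open(full, "w") as handle:
            handle.write(payload)
        paths.append(full)

    # Sizes must be read while the files still exist, so sort inside the block.
    paths.sort(key=functools.cmp_to_key(sort_on.filesize))
    names_by_size = [os.path.basename(p) for p in paths]
    paths.sort(key=functools.cmp_to_key(sort_on.extension))
    names_by_extension = [os.path.basename(p) for p in paths]
```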


I like this :)

What do you think?



Christoph Becker-Freyseng
 

Jp Calderone

As I pointed out path.__cmp__ should not be used for e.g. comparing
filesizes.

But features like sorting on filesizes are very useful.
I'm not sure if Gerrit Holl already meant this in his conclusion on
"Comparing files" in the PEP.
I'll outline it a bit ...

This seems to be covered by the new builtin DSU support which will exist
in 2.4. See the (many, many) posts on python-dev on the "groupby" iterator:

http://mail.python.org/pipermail/python-dev/2003-December/thread.html

In particular, the ones talking about `attrget'.
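
As a rough illustration of what that support looks like: Python 2.4 added a key argument to list.sort and sorted, and the helper discussed as `attrget' is available as operator.attrgetter. The Entry class below is a hypothetical record invented for this sketch; a real path class would expose its own attributes.

```python
import os
import tempfile
from operator import attrgetter


class Entry:
    """Hypothetical record holding a few attributes of one file."""

    def __init__(self, path_string):
        self.path = path_string
        self.size = os.path.getsize(path_string)
        self.extension = os.path.splitext(path_string)[1]


with tempfile.TemporaryDirectory() as root:
    entries = []
    for name, payload in (("big.txt", "x" * 30), ("tiny.log", "x"), ("mid.cfg", "x" * 5)):
        full = os.path.join(root, name)
        with open(full, "w") as handle:
            handle.write(payload)
        entries.append(Entry(full))

# The key function is called once per element (decorate-sort-undecorate).
by_size = [os.path.basename(e.path) for e in sorted(entries, key=attrgetter("size"))]
by_extension = [os.path.basename(e.path) for e in sorted(entries, key=attrgetter("extension"))]
```

With this, sorting on file size needs no dedicated comparison machinery in the path module.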

Jp
 

Christoph Becker-Freyseng

Gerrit said:
John said:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
[...]
Shall I submit this as an official PEP? Or shall I first fill in more
open issues and perhaps give the possibility to change "closed" issues?

I think there are still a lot of issues. I think letting things settle
down first is wiser, and then presenting a PEP on which many (even better,
all) contributors agree. (I didn't like the result of PEP308 very much ...)

In addition, there are still good points in the older discussions and
existing modules that should be integrated (you linked them in the prePEP).


Moreover I'd like to extend the pre-PEP (and PEP):
"PEP xxx: new path module"
Because there is more than just the Path-Class: BaseClasses, Exceptions,
Helper-Functions ...



Christoph Becker-Freyseng
 

Gerrit Holl

Christoph said:
Gerrit said:
John said:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which
[...]
Shall I submit this as an official PEP? Or shall I first fill in more
open issues and perhaps give the possibility to change "closed" issues?

I think there are still a lot of issues. I think letting things settle
down first is wiser, and then presenting a PEP on which many (even better,
all) contributors agree. (I didn't like the result of PEP308 very much ...)

Yes. But of course, a PEP being an official PEP does not mean there
can't be any more changes to it. So the question is, at what point does
a pre-PEP become a PEP? Some PEPs have a $Revision: 1.20$, after all.
In addition, there are still good points in the older discussions and
existing modules that should be integrated (you linked them in the prePEP).

Yes, that's true. I'll do that.

yours,
Gerrit.

--
49. If any one take money from a merchant, and give the merchant a
field tillable for corn or sesame and order him to plant corn or sesame in
the field, and to harvest the crop; if the cultivator plant corn or sesame
in the field, at the harvest the corn or sesame that is in the field shall
belong to the owner of the field and he shall pay corn as rent, for the
money he received from the merchant, and the livelihood of the cultivator
shall he give to the merchant.
-- 1780 BC, Hammurabi, Code of Law
 

Christoph Becker-Freyseng

I've found some links that might be interesting.

http://www.w3.org/2000/10/swap/uripath.py
http://dev.w3.org/cvsweb/2000/10/swap/uripath.html?rev=1.9 [the doc for it]

Links of the operator.attrgetter / "groupby" iterator discussion in
python-dev (Jp Calderone posted a hint on it already --- I chose some
exemplary posts)

http://mail.python.org/pipermail/python-dev/2003-December/040614.html
http://mail.python.org/pipermail/python-dev/2003-December/040628.html
http://mail.python.org/pipermail/python-dev/2003-December/040643.html



Christoph Becker-Freyseng
 
