finding file size

Sean Ross

Hi.

Recently I made a small script to do some file transferring (among other
things). I wanted to monitor the progress of the file transfer, so I needed
to know the size of the files I was transferring. Finding out how to get
this information took some time (reading the manuals - googling did not
prove worthwhile). Anyway, I did eventually figure out how to do it (there
are a few ways, including os.path.getsize(filename)).
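
For readers who land here with the same question, the "few ways" can be sketched roughly as below. This is a minimal illustration in current Python, not code from the original post; the temporary file and the variable names are assumptions made so the example is self-contained.

```python
import os
import tempfile

# Create a throwaway file with a known size so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1234)
    filename = tmp.name

# 1. Ask the filesystem by name.
size_by_name = os.path.getsize(filename)

# 2. The same number via os.stat, which also exposes timestamps and more.
size_by_stat = os.stat(filename).st_size

# 3. Ask about an already-open file through its file descriptor.
with open(filename, "rb") as f:
    size_by_fstat = os.fstat(f.fileno()).st_size

os.remove(filename)
print(size_by_name, size_by_stat, size_by_fstat)
```

All three values should agree for an ordinary file on disk.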

My question is this: Is there a reason why file objects could not have a
size method or property? So that you could then just ask the file how big it
is using fd.size or fd.size(). I'm just curious, because, well, it seems to
have obvious utility, and the way to find it is less than obvious (at least,
it was to me).

Thanks,
Sean
 
David M. Wilson

Sean Ross said:
My question is this: Is there a reason why file objects could not have a
size method or property? So that you could then just ask the file how big it
is using fd.size or fd.size(). I'm just curious, because, well, it seems to
have obvious utility, and the way to find it is less than obvious (at least,
it was to me).

Hey!

1) Using 'fd' as a name for a file object is a bad idea - you can get
fds from os.open. If you insist on C-ish names, how about 'fp'
instead? :)

2) There's nothing to stop the file object from having a size method,
except that file-like objects then have more to implement.

How about something like:

py> class SizedFile(file):
...     def __len__(self):
...         oldpos = self.tell()
...         self.seek(0, 2)
...         length = self.tell()
...         self.seek(oldpos)
...         return length
...
py> bleh = SizedFile("/etc/passwd")
py> len(bleh)
1520
py> len([ x for x in bleh ])
33

As I wrote this I realised it's wrong - size() would be better, since
the length of the sequence is not the number of bytes. Maybe it is in
binary mode? Dunno, me sleepy, goodnight..
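
On the binary-mode question: iterating over a file yields lines in either mode, so the list length stays a line count and does not turn into a byte count. A rough sketch of the same seek/tell idea as a plain function in current Python (where the `file` builtin no longer exists) might look like this; the helper name `size` and the sample data are assumptions for illustration.

```python
import os
import tempfile

def size(f):
    """Return the size in bytes of a seekable binary file, keeping its position."""
    oldpos = f.tell()
    f.seek(0, os.SEEK_END)   # jump to the end of the file
    length = f.tell()        # the offset at the end is the size in bytes
    f.seek(oldpos)           # restore the caller's position
    return length

with tempfile.TemporaryFile() as f:   # TemporaryFile opens in binary mode
    f.write(b"one\ntwo\nthree\n")
    f.seek(4)
    byte_count = size(f)              # number of bytes in the file
    position_after = f.tell()         # unchanged by size()
    f.seek(0)
    line_count = len(f.readlines())   # number of lines, even in binary mode
```

The byte count and the line count differ, which is why a separate size() reads better than overloading len().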


David.
 
Martin v. Loewis

Sean said:
My question is this: Is there a reason why file objects could not have a
size method or property?

Yes. In Python, file objects belong to the larger category of "file-like
objects", and not all file-like objects have the inherent notion of a
size. E.g. what would you think sys.stdin.size should return (which
actually is a proper file object - not just file-like)?

Other examples include the things returned from os.popen or socket.socket.
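
A small way to see the distinction on a live system is sketched below. It compares a pipe with an ordinary temporary file; the helper name `describe` is hypothetical, and only standard `os`, `stat` and `tempfile` calls are used.

```python
import os
import stat
import tempfile

def describe(f):
    """Return (kind, seekable) for an open binary file object."""
    mode = os.fstat(f.fileno()).st_mode
    if stat.S_ISREG(mode):
        kind = "regular file"
    elif stat.S_ISFIFO(mode):
        kind = "pipe"
    else:
        kind = "other"
    return kind, f.seekable()

# A pipe is a real file object, but it has no end to seek to.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as reader, os.fdopen(write_fd, "wb") as writer:
    pipe_info = describe(reader)

# An ordinary file on disk, for comparison.
with tempfile.TemporaryFile() as regular:
    file_info = describe(regular)
```

Only the regular file reports itself as seekable, so only there does a size have an obvious meaning.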

Regards,
Martin
 
Sean Ross

Martin v. Loewis said:
Yes. In Python, file objects belong to the larger category of "file-like
objects", and not all file-like objects have the inherent notion of a
size. E.g. what would you think sys.stdin.size should return (which
actually is a proper file object - not just file-like)?

Other examples include the things returned from os.popen or socket.socket.

Regards,
Martin

I see what you mean. I suppose the only option I could think of for
sys.stdin, os.popen, and socket.socket would be to return the number of
bytes written to these objects so far. But, then, those objects, or
something else, would have to track that information. Also, pipes and
sockets can be written to from two directions, so is the size the total
number of bytes written from both sides, the amount you have written
yourself, or the amount the other side has written? (Perhaps all three
would be nice.) Another option would be to return '-1',
or 'None', to let people know that the request is unsupported for this
file-like object. Still another option would be to raise an exception. And,
of course, there's the ever popular, leave-well-enough-alone option.
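
Two of those options, returning None or raising an exception, can be sketched as ordinary helper functions. The names `size_or_none` and `size_or_raise` are hypothetical, and the sketch assumes binary file objects that follow the current `io` interface (in particular, that they offer `seekable()`).

```python
import io
import os

def size_or_none(f):
    """Return the size in bytes, or None when the object has no usable size."""
    seekable = getattr(f, "seekable", None)
    if seekable is None or not seekable():
        return None
    oldpos = f.tell()
    try:
        # For binary files, seek() returns the new absolute position.
        return f.seek(0, os.SEEK_END)
    finally:
        f.seek(oldpos)

def size_or_raise(f):
    """Same as size_or_none, but signal 'unsupported' with an exception."""
    result = size_or_none(f)
    if result is None:
        raise io.UnsupportedOperation("object has no size")
    return result

in_memory_size = size_or_none(io.BytesIO(b"hello"))

read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as reader, os.fdopen(write_fd, "wb") as writer:
    pipe_size = size_or_none(reader)
```

Which of the two is friendlier depends on whether callers are expected to check the result or to let the error propagate.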

Anyway, thank you for your response. I see its merit.
Sean
 
Sean Ross

David M. Wilson said:
1) Using 'fd' as a name for a file object is a bad idea - you can get
fds from os.open. If you insist on C-ish names, how about 'fp'
instead? :)

or just f would work ...

2) There's nothing to stop the file object from having a size method,
except that file-like objects then have more to implement.

See Martin v. Loewis' post for some other rationale.

How about something like:

py> class SizedFile(file):
...     def __len__(self):
...         oldpos = self.tell()
...         self.seek(0, 2)
...         length = self.tell()
...         self.seek(oldpos)
...         return length
...
py> bleh = SizedFile("/etc/passwd")
py> len(bleh)
1520
py> len([ x for x in bleh ])
33

As I wrote this I realised it's wrong - size() would be better, since
the length of the sequence is not the number of bytes. Maybe it is in
binary mode? Dunno, me sleepy, goodnight..


David.

Right. size() is more apt. Also, while I appreciate the effort of
subclassing file, what I was looking for was to have the builtin file (or
file-like) objects expose this operation, not just custom implementations.

Thanks for your response,
Sean
 
Gerrit Holl

Hi,

I propose to add a "filename" type to Python.

Martin v. Loewis said:
Yes. In Python, file objects belong to the larger category of "file-like
objects", and not all file-like objects have the inherent notion of a
size. E.g. what would you think sys.stdin.size should return (which
actually is a proper file object - not just file-like)?

A different solution to this problem would be to introduce a "filename"
type to Python, a subclass of str. The "name" attribute of file would be of this
type. This type would inherit a lot of os.path stuff: getsize becomes
simpler, more readable, and more object oriented, as do other os.path
functions. I think the alternatives look a lot prettier:

OLD                                          NEW
os.path.realpath(fn)                         fn.realpath()
os.path.getmtime(fp.name)                    fp.name.getmtime()
os.path.ismount(os.path.dirname(fp.name))    fp.name.dirname().ismount()

It's more beautiful, simpler, flatter (#3), practical, obvious, easy.
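
As a rough illustration of the NEW column, a str subclass that simply delegates to os.path could look like the following. The class name `FileName` and the choice of methods are assumptions made for this sketch; it is not the implementation proposed in the thread.

```python
import os

class FileName(str):
    """Hypothetical str subclass exposing a few os.path functions as methods."""

    def realpath(self):
        return type(self)(os.path.realpath(self))

    def dirname(self):
        # Return the same type so that calls can be chained.
        return type(self)(os.path.dirname(self))

    def getmtime(self):
        return os.path.getmtime(self)

    def getsize(self):
        return os.path.getsize(self)

    def ismount(self):
        return os.path.ismount(self)

# "example.txt" does not need to exist for the name-only operations.
fn = FileName(os.path.join(os.getcwd(), "example.txt"))
parent = fn.dirname()
parent_is_mount = fn.dirname().ismount()
```

Because the object is still a str, it can be passed unchanged to open() or to any existing os.path function.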

problem: what to do with os.path constants?
solution: make them class attributes
problem: how to handle posixpath, ntpath, macpath?
solution: abstract Path class with NTPath, MacPath, PosixPath subclasses, one of which is the actual type of e.g. fn.name on a certain platform
problem: backwards compatibility
solution: same as string methods
problem: "/dev/null" reads as a Path but is a str
solution: path("/dev/null") is a little more typing for a lot more luxury
problem: what to do with commonprefix?
solution: don't know
problem: what to do with os.path.walk?
solution: use os.walk instead
problem: what to do with sameopenfile?
solution: make it a file method
problem: what to do with join, split?
solution: rename to joinpath, splitpath.

Any comments?

yours,
Gerrit.
 
Martin v. Loewis

Gerrit said:
Any comments?

It should be possible to implement that type without modifying
Python proper. It might make a good recipe for the cookbook.

Any volunteers?

Regards,
Martin
 
Peter Otten

Gerrit said:
I propose to add a "filename" type to Python.
A different solution to this problem would be to introduce a "filename"
type to Python, a subclass of str. The "name" attribute of file would be
of this type. This type would inherit a lot of os.path stuff: getsize
becomes simpler, more readable, and more object oriented, as do other
os.path functions. I think the alternatives look a lot prettier:

OLD                                          NEW
os.path.realpath(fn)                         fn.realpath()
os.path.getmtime(fp.name)                    fp.name.getmtime()
os.path.ismount(os.path.dirname(fp.name))    fp.name.dirname().ismount()

It's more beautiful, simpler, flatter (#3), practical, obvious, easy.

You might have a look at

http://mail.python.org/pipermail/python-list/2002-June/108425.html

http://members.rogers.com/mcfletch/programming/filepath.py

has an implementation of your proposal by Mike C. Fletcher. I think both
filename class and os.path functions can peacefully coexist.


Peter
 
Just

Gerrit Holl said:
I propose to add a "filename" type to Python.
[ ... ]
A different solution to this problem would be to introduce a "filename"
type to Python, a subclass of str. The "name" attribute of file would be of
this type. This type would inherit a lot of os.path stuff: getsize becomes
simpler, more readable, and more object oriented, as do other os.path
functions. I think the alternatives look a lot prettier:

OLD                                          NEW
os.path.realpath(fn)                         fn.realpath()
os.path.getmtime(fp.name)                    fp.name.getmtime()
os.path.ismount(os.path.dirname(fp.name))    fp.name.dirname().ismount()

It's more beautiful, simpler, flatter (#3), practical, obvious, easy.

This has been proposed a few times, and even implemented at least once:

http://www.jorendorff.com/articles/python/path/

I'm very much in favor of adding such an object, but I don't like Jason
Orendorff's design all that much. There has been a discussion about it
in the past:

http://groups.google.com/groups?q=g:thl1422628736d&dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=mailman.1057651032.22842.python-list%40python.org

Just
 
John Roth

Martin v. Loewis said:
Yes. In Python, file objects belong to the larger category of "file-like
objects", and not all file-like objects have the inherent notion of a
size. E.g. what would you think sys.stdin.size should return (which
actually is a proper file object - not just file-like)?

Other examples include the things returned from os.popen or socket.socket.

I think the issue here is that the abstract concept behind a "file-like
object" is that of something external that can be opened, read, written to
and closed. As you say, this does not include the notion of basic
implementation: a file on a file system is a different animal than a network
socket, which is different from a pipe, etc.

I think we need an object that encapsulates the notion of a file (or
directory) as a file system object. That object shouldn't support
"file-like" activities: it should have a method that returns a standard file
object to do that.
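
One possible reading of that idea is sketched below: an object that describes a name on disk and hands out ordinary file objects on request. The class name `FileSystemEntry` and its methods are hypothetical, chosen only to make the separation concrete.

```python
import os
import tempfile

class FileSystemEntry:
    """Hypothetical object describing a name on disk, not an open stream."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)

    def is_dir(self):
        return os.path.isdir(self.path)

    def size(self):
        return os.stat(self.path).st_size

    def open(self, mode="rb"):
        # Hand back a standard file object for the "file-like" activities.
        return open(self.path, mode)

with tempfile.TemporaryDirectory() as tmpdir:
    entry = FileSystemEntry(os.path.join(tmpdir, "data.bin"))
    with entry.open("wb") as f:
        f.write(b"12345")
    entry_size = entry.size()
    dir_is_dir = FileSystemEntry(tmpdir).is_dir()
```

Here size() belongs to the file system object, so pipes and sockets never have to answer the question.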

I like Gerrit Holl's filename suggestion as well, but it's not the same
as this suggestion.

John Roth
 
Gerrit Holl

Martin said:
It should be possible to implement that type without modifying
Python proper.

It should indeed. But it isn't what I had in mind, and it's not exactly
the same as a filename type in the language: for example, the name
attribute of a file will still be a string, just as the contents of
os.listdir, glob.glob, etc. (it seems glob follows listdir).

It might make a good recipe for the cookbook.

If the type would be created without changing python proper, the type
would probably just call os.path.foo for the filename.foo method. It
would be the other way around if the type would become part of the
language: os.path would only be there for backward compatibility, like
string. But in order for os.listdir (and probably more functions) to
return Path objects rather than strings, a C implementation would be
preferable (necessary?). On the other hand, would this type ever be
added, a python implementation would of course be a must.

Any volunteers?

I may have a look at it.
When thinking about it, a lot more issues than raised in my first post
need to be resolved, like what to do when the initializer is empty...
curdir? root?

I guess there would be a base class with all os-independent stuff, or stuff
that can be coded independently, e.g.:

import os

class Path(str):
    # sep and extsep are expected to be defined by platform subclasses.
    def split(self):
        return self.rsplit(self.sep, 1)
    def splitext(self):
        return self.rsplit(self.extsep, 1)
    def basename(self):
        return self.split()[1]
    def dirname(self):
        return self.split()[0]
    def getsize(self):
        return os.stat(self).st_size
    def getmtime(self):
        return os.stat(self).st_mtime
    def getatime(self):
        return os.stat(self).st_atime
    def getctime(self):
        return os.stat(self).st_ctime

where the subclasses define sep, extsep, etc.
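
To try the base class out, a concrete subclass has to supply sep and extsep. The sketch below assumes a hypothetical `PosixPath` subclass and copies only the splitting methods. It also shows one edge case: rsplit returns a one-element list when the separator is absent, so basename() as written raises IndexError for a bare name.

```python
class Path(str):
    # Abbreviated copy of the base class: only the splitting methods.
    def split(self):
        return self.rsplit(self.sep, 1)
    def splitext(self):
        return self.rsplit(self.extsep, 1)
    def basename(self):
        return self.split()[1]
    def dirname(self):
        return self.split()[0]

class PosixPath(Path):
    # Hypothetical platform subclass supplying the separators.
    sep = "/"
    extsep = "."

p = PosixPath("/usr/lib/python/os.py")
directory = p.dirname()
name = p.basename()
stem_and_ext = p.splitext()

# A name without any separator is not handled by the sketch.
try:
    PosixPath("README").basename()
    bare_name_ok = True
except IndexError:
    bare_name_ok = False
```

A fuller version would probably want to treat the no-separator case explicitly, the way os.path.split does.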

yours,
Gerrit.

--
168. If a man wish to put his son out of his house, and declare before
the judge: "I want to put my son out," then the judge shall examine into
his reasons. If the son be guilty of no great fault, for which he can be
rightfully put out, the father shall not put him out.
-- 1780 BC, Hammurabi, Code of Law
 
Mike C. Fletcher

Gerrit said:
Peter Otten wrote:



Thanks for the links.
(I think they don't, by the way)
You hawks, always seeing war where we see peace :) ;) .

Seriously, though, a path type would eventually have ~ the same relation
as the str type now does to the string module. Initial implementations
of a path type are going to use the os.path stuff, but to avoid code
duplication, the os.path module would eventually become a set of trivial
wrappers that dispatch on their first argument's method(s) (after
coercion to path type).
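
The "trivial wrappers" end state could look roughly like this. The lowercase class name `path` and the two methods are assumptions for the sketch; the module-level functions stand in for what os.path.exists and os.path.getsize might become.

```python
import os
import tempfile

class path(str):
    """Hypothetical path type that owns the real implementation."""

    def exists(self):
        try:
            os.stat(self)
        except OSError:
            return False
        return True

    def getsize(self):
        return os.stat(self).st_size

# Module-level functions: coerce the first argument, then dispatch to a method.
def exists(p):
    return path(p).exists()

def getsize(p):
    return path(p).getsize()

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "sample.bin")
    with open(target, "wb") as f:
        f.write(b"abc")
    found = exists(target)                              # plain str in, bool out
    missing = exists(os.path.join(tmpdir, "absent"))
    nbytes = getsize(target)
```

Existing callers keep passing plain strings, which is what would let the two styles coexist.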

Is that peaceful? I don't know. If there's a war, let's be honest,
os.path is going to take a good long while to defeat because it's there
and embedded directly into thousands upon thousands of scripts and
applications. We can fight a decent campaign, making a common module,
then getting it blessed into a standard module, encouraging newbies to
shun the dark old os.path way, encouraging maintainers to use the new
module throughout their code-base, et cetera, but os.path is going to
survive a good long while, and I'd imagine that being friendly toward it
would keep a few of our comrades off the floor.

Just as a note, however, we haven't had a *huge* outpouring of glee for
the current spike-tests/implementations. So it may be that we need to
get our own little army in shape before attacking the citadel :) .

Have fun,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
 
Gerrit Holl

[Peter Otten]
[Gerrit Holl (me)]
[Mike C. Fletcher]
Is that peaceful? I don't know. If there's a war, let's be honest,
os.path is going to take a good long while to defeat because it's there
and embedded directly into thousands upon thousands of scripts and
applications. We can fight a decent campaign, making a common module,
then getting it blessed into a standard module, encouraging newbies to
shun the dark old os.path way, encouraging maintainers to use the new
module throughout their code-base, et cetera, but os.path is going to
survive a good long while, and I'd imagine that being friendly toward it
would keep a few of our comrades off the floor.

Sure, I don't think os.path would die soon, it will surely take longer
than the string module to die. But I think there are a number of places
where Python could be more object-oriented than it is, and this is one
of them. The first step in making those modules more object-oriented is
providing an OO alternative; the second step is deprecating the old way,
and the third step is providing only the OO way. The third step will
surely not be made until Python 3.0.

The string module has made the first two steps. In my view, the time
module has made the first step, although I'm not sure whether that's
true. I would like to see a datetime module that makes the time module
totally redundant, because I never liked the time module: it doesn't fit
into my brain properly, because it's not object oriented. Now, I try to
use the datetime module whenever I can, but something like strptime
isn't there. PEP 321 solves this, so I'd like time to eventually be
deprecated once something DateUtil-like is included as well, but it
probably won't.
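
For context on the strptime gap: a common workaround was to parse with the time module and feed the result to the datetime constructor, and later Python versions (2.5 onward) added datetime.strptime directly. A small sketch of both follows; the sample timestamp and format string are made up for illustration.

```python
import time
from datetime import datetime

text = "2004-01-15 13:45:00"      # illustrative sample value
fmt = "%Y-%m-%d %H:%M:%S"

# Workaround: parse with time.strptime, then build a datetime from the
# first six fields (year, month, day, hour, minute, second).
via_time = datetime(*time.strptime(text, fmt)[:6])

# In later versions the class method does the same in one step.
direct = datetime.strptime(text, fmt)
```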

Hmm, the Zen of Python is not very clear about this:

Now is better than never.
Although never is often better than *right* now.

...so there must be a difference between 'now' and 'right now' :)

Just as a note, however, we haven't had a *huge* outpouring of glee for
the current spike-tests/implementations. So it may be that we need to
get our own little army in shape before attacking the citadel :) .

Sure :)

yours,
Gerrit.
 
Peter Otten

[Peter Otten]
[Gerrit Holl (me)]
[Mike C. Fletcher]
Is that peaceful? I don't know. If there's a war, let's be honest,
os.path is going to take a good long while to defeat because it's there
and embedded directly into thousands upon thousands of scripts and

[Gerrit Holl]
Sure, I don't think os.path would die soon, it will surely take longer
than the string module to die. But I think there are a number of places
where Python could be more object-oriented than it is, and this is one
of them. The first step in making those modules more object-oriented is
providing an OO alternative; the second step is deprecating the old way,
and the third step is providing only the OO way. The third step will
surely not be made until Python 3.0.

I don't think OO is a goal in itself. In addition to the os.path functions'
ubiquity there are practical differences between a path and the general str
class.

While a string is the default that you read from files and GUI widgets, a
filename will never be. So expect to replace e. g.

os.path.exists(somestring)

with

os.filename(somestring).exists()

which is slightly less compelling than somefile.exists().

Are unicode filenames something we should care about?
Should filename really be a subclass of str? I think somepath[-1] could
return the name as well.
Should files and directories really be of the same class?

These to me all seem real questions and at that point I'm not sure whether a
filename class that looks like a light wrapper around os.path (even if you
expect os.path to be implemented in terms of filename later) is the best
possible answer.
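
To make the somepath[-1] alternative concrete, here is a minimal sketch of a path that is a sequence of components rather than a str subclass. The class name `PathParts` and the fixed "/" separator are assumptions for the example.

```python
class PathParts:
    """Hypothetical path stored as a sequence of components, not as a str."""

    def __init__(self, text, sep="/"):
        self.sep = sep
        self.absolute = text.startswith(sep)
        # Drop empty pieces produced by leading or doubled separators.
        self.parts = tuple(part for part in text.split(sep) if part)

    def __getitem__(self, index):
        return self.parts[index]

    def __len__(self):
        return len(self.parts)

    def __str__(self):
        prefix = self.sep if self.absolute else ""
        return prefix + self.sep.join(self.parts)

somepath = PathParts("/home/user/notes.txt")
last = somepath[-1]        # the file name component
depth = len(somepath)      # number of components
as_text = str(somepath)    # back to a plain string when needed
```

The trade-off is that such an object can no longer be handed directly to code that expects a string.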

Peter
 
Gerrit Holl

Peter said:
While a string is the default that you read from files and GUI widgets, a
filename will never be.

I'm not so sure about that. A GUI where a file is selected from the list
could very well return a Path object - it won't for a while, of course,
but that's a different issue. But I agree that it often isn't. Just as
an integer is not something you read from a file, etc.

So expect to replace e. g.

os.path.exists(somestring)

with

os.filename(somestring).exists()

which is slightly less compelling than somefile.exists().

I would rather read:
path(somestring).exists()

which is better than os.filename(somestring).exists() and, IMO, better
than os.path.exists(somestring). I think path should be a builtin.

Are unicode filenames something we should care about?

That's a difficult issue. I don't know how to solve that.

Should filename really be a subclass of str? I think somepath[-1] could
return the name as well.

It could. But I don't think it should. This would mean that the index of
a path returns the respective directories. Explicit is better than
implicit: somepath[-1] is not very explicit as being a basename.

Should files and directories really be of the same class?

Directories could be a subclass, with some more features. But...

These to me all seem real questions and at that point I'm not sure whether a
filename class that looks like a light wrapper around os.path (even if you
expect os.path to be implemented in terms of filename later) is the best
possible answer.

...questions exist to be answered. I don't claim to know all answers,
but I think OO-ifying os.path is a good thing. How - that's another
issue, which is PEP-worthy.

From earlier discussions, I get the impression that most people are
sympathetic to OO-ifying os.path but don't agree on how to
do it. If we can agree on that, the only thing we need to do is
upgrade the BDFL's judgement from lukewarm to liking :)

I've written a Pre-PEP at: http://tinyurl.com/2578q
It is very unfinished but it is a rough draft. Comments welcome.

yours,
Gerrit.
 
Martin v. Loewis

Gerrit said:
That's a difficult issue. I don't know how to solve that.

It depends on the platform. There are:

1. platforms on which Unicode is the natural string type
   for file names, with byte strings obtained by conversion
   only. On these platforms, all filenames can be represented
   by a Unicode string, but some file names cannot
   be represented by a byte string.
   Windows NT+ is the class of such systems.
2. platforms on which Unicode and byte string filenames
   work equally well; they can be converted back and
   forth without any loss of accuracy or expressiveness.
   OS X is one such example; probably Plan 9 as well.
3. platforms on which byte strings are the natural string
   type for filenames. They often have only a weak notion
   of file name encoding, with the result that
   a) not all Unicode strings are available as filenames,
   b) not all byte string filenames are convertible to
      Unicode, and
   c) the conversion may depend on user settings, so for
      the same file, Unicode conversion may give different
      results for different users.
   POSIX systems fall in this category.

So if filenames were a datatype, I think they should be
able to use both Unicode strings and byte strings as their
own internal representation, and declare one of the two
as "accurate". Conversion of filenames to both Unicode
strings and byte strings should be supported, but may
fail at runtime (unless conversion into the "accurate"
type is attempted).
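
A very small sketch of such a datatype, in current Python where str is the Unicode type, is given below. The class name `DualFileName` is hypothetical, and the sketch assumes that the "accurate" representation is simply whichever type the name was created from; a real design might instead decide this per platform.

```python
import sys

class DualFileName:
    """Hypothetical file name that keeps its original representation."""

    def __init__(self, value, encoding=None):
        if not isinstance(value, (str, bytes)):
            raise TypeError("file name must be str or bytes")
        self.value = value
        # Assumption: fall back to the interpreter's file system encoding.
        self.encoding = encoding or sys.getfilesystemencoding()

    @property
    def accurate_type(self):
        # Conversion into this type never fails.
        return type(self.value)

    def as_text(self):
        if isinstance(self.value, str):
            return self.value
        return self.value.decode(self.encoding)    # strict: may raise

    def as_bytes(self):
        if isinstance(self.value, bytes):
            return self.value
        return self.value.encode(self.encoding)    # strict: may raise

from_text = DualFileName("caf\u00e9", encoding="utf-8")
encoded = from_text.as_bytes()

from_bytes = DualFileName(b"\xff", encoding="utf-8")
try:
    from_bytes.as_text()
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

The byte string b"\xff" is not valid UTF-8, so the conversion to text fails at runtime while as_bytes() still returns the accurate value.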

Regards,
Martin
 
