os.path.getsize() on Windows

Sean DiZazzo

Hi all,

I'm seeing some behavior that is confusing me. I often use a simple
function to tell if a file is growing, i.e. being copied into a certain
location. (I can't process it until it's complete.) My function is not
working on Windows, and I'm wondering if I am missing something
simple, or if I have just never tried this before. Here's what I'm
trying to do:

import os
import time

def isGrowing(f, timeout):
    ssize = os.path.getsize(f)
    time.sleep(timeout)
    esize = os.path.getsize(f)
    return esize != ssize

On Windows, this returns the size of the file as it _will be_, not the
size that it currently is. Is this a feature? What is the proper way
to get the current size of the file? I noticed
win32file.GetFileSize(). Does that behave the way I expect?

PS. I also tried os.stat()[6]

~Sean
 
Duncan Booth

Sean DiZazzo said:
On Windows, this returns the size of the file as it _will be_, not the
size that it currently is. Is this a feature? What is the proper way
to get the current size of the file? I noticed
win32file.GetFileSize(). Does that behave the way I expect?

PS. I also tried os.stat()[6]

I think all of those will return the current size of the file, but that may
be the same as the final size: just because the data hasn't been copied
doesn't mean the file space hasn't been allocated. You don't say how you
are copying the file, but I seem to remember that the Windows copy command
preallocates the file at its final size (so as to reduce fragmentation) and
then just copies the data after that.
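
To illustrate the preallocation point, here is a minimal sketch that is not tied to any particular copy tool. The helper name preallocate and the file name incoming.bin are made up for the example. It only shows that a file can report its final size before the data has been written, which is why comparing two os.path.getsize() readings can say "not growing" while a copy is still in progress.

```python
import os
import tempfile

def preallocate(path, final_size):
    """Create `path` at its final size without writing any real data yet."""
    with open(path, "wb") as f:
        # Extends the file to final_size; on most systems the new
        # area reads back as zero bytes.
        f.truncate(final_size)

# Hypothetical demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "incoming.bin")
    preallocate(target, 1024 * 1024)
    size_before_copy = os.path.getsize(target)  # already the final size

    with open(target, "r+b") as f:
        f.write(b"x" * 4096)  # "copy" the first chunk of data
    size_during_copy = os.path.getsize(target)  # unchanged by the write
```

Both readings are the same, so a size comparison alone cannot tell a preallocated, half-written file from a finished one.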

If you need to make sure you don't access a file until the copy has
finished, then get whatever is doing the copy to copy it to a temporary
filename in the same folder and rename it when complete. Then you just have
to check for the existence of the target file.
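
A rough sketch of that temporary-name-then-rename idea, for the case where you do control the copying side. The function name publish_copy and the ".part" suffix are assumptions for the example, not anything the thread prescribes.

```python
import os
import shutil
import tempfile

def publish_copy(src, dst):
    """Copy src to a temporary name next to dst, then rename it into place."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Create the temporary file in the SAME folder so the final step is a
    # plain rename rather than another copy.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".part")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp_path)
        os.replace(tmp_path, dst)  # the target name appears only when complete
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The reader then only needs os.path.exists(dst), and should ignore anything ending in ".part".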
 
Sean DiZazzo

Hmmm... The file could be copied in by several different sources of
which I have no control. I can't use your technique in my situation.
I also tried getting md5 hashes with some time in between on advice,
but the file is not opened for reading until the copy completes so I
can't get the hashes.

Any other ideas?
 
John Machin

Why not try to open the file for exclusive write/update access?
 
Steven D'Aprano

I'm seeing some behavior that is confusing me. I often use a simple
function to tell if a file is growing, i.e. being copied into a certain
location. (I can't process it until it's complete.)

Surely though, under Windows, while something else is writing to the file
you can't open it? So instead of this:


def wait_for_open(pathname):
    """Return file open for reading, or fail.
    If the file is busy, will wait forever.
    """
    import time
    while isGrowing(pathname, 0.2):  # defined elsewhere by you
        time.sleep(1)  # wait a bit
    # now we HOPE we can read the file
    return open(pathname, 'r')  # this can fail in many, many ways



do this:


def wait_for_open(pathname):
    """Return file open for reading, or fail.
    If the file is busy, will wait forever.
    """
    import time, errno
    while True:
        try:
            return open(pathname, 'r')
        except IOError as e:
            if e.errno == errno.EBUSY:
                time.sleep(1)
            else:
                raise


Note: I've made a guess that the error you get under Windows is
errno.EBUSY. You'll need to check that for yourself. This whole approach
assumes that Windows does the sensible thing of returning a unique error
code when you try to open a file for reading that is already open for
writing.
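
A variation on the same loop, offered as a sketch rather than a tested Windows recipe. As far as I know, the built-in open() on Windows reports a sharing violation as errno.EACCES, which is also what a genuine permissions problem looks like; that is an assumption worth checking on your own system. Because the two cases may be indistinguishable, this version retries for a bounded time instead of forever. The name open_with_retry and the parameters poll, timeout, retry_errnos and opener are made up for the example.

```python
import errno
import time

def open_with_retry(pathname, mode="r", poll=1.0, timeout=30.0,
                    retry_errnos=(errno.EACCES, errno.EBUSY), opener=open):
    """Retry opening on 'busy-looking' errors until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return opener(pathname, mode)
        except OSError as e:
            # Anything that does not look like "busy" is re-raised at once,
            # and so is a busy-looking error once the deadline has passed.
            if e.errno not in retry_errnos or time.monotonic() >= deadline:
                raise
            time.sleep(poll)
```

The opener parameter exists only so the loop can be exercised without a real locked file; in normal use it stays at its default.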
 
Duncan Booth

Steven D'Aprano said:
This whole approach
assumes that Windows does the sensible thing of returning a unique error
code when you try to open a file for reading that is already open for
writing.

So how would you use a file to share data then?

By default Python on Windows allows you to open a file for reading
unless you specify a sharing mode which prevents it: the easiest way is
probably to call win32file.CreateFile with appropriate parameters.

In one window:

[interpreter input not preserved]

and then while that other window is open:

[start of the win32file.CreateFile call not preserved]
...     win32file.GENERIC_WRITE,
...     0, # i.e. "not shared" is the default
...     None,
...     win32file.OPEN_ALWAYS,
...     win32file.FILE_ATTRIBUTE_NORMAL,
...     None)
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
pywintypes.error: (32, 'CreateFile', 'The process cannot access the file because it is being used by another process.')
[interpreter input not preserved]
'hello'

The CreateFile call was copied from
http://mail.python.org/pipermail/python-list/2002-January/122462.html
 
Steven D'Aprano

So how would you use a file to share data then?


I see I was a little unclear.

What I meant to say was that I assumed that Windows returned a specific
error code of "file is busy" as opposed to "you don't have permission to
access this file right now" without specifying whether this is a
permanent permissions error or a temporary file busy error.


By default Python on Windows allows you to open a file for reading
unless you specify a sharing mode which prevents it:

But the OP is talking about another process having opened the file for
WRITING, not reading. It's that other process that has exclusive access,
and the OP was trying to determine when it was safe to attempt opening
the file according to whether or not it was still growing.
 
Duncan Booth

Steven D'Aprano said:
But the OP is talking about another process having opened the file for
WRITING, not reading. It's that other process that has exclusive access,
and the OP was trying to determine when it was safe to attempt opening
the file according to whether or not it was still growing.

No, unless the other process has specified that it wants exclusive access
there is nothing stopping his process also opening the file. That's why he
has to specify when he opens it that he wants exclusive access: then it
doesn't matter what the other process does, he won't be able to open it
until the other process has closed the file.

This all of course assumes that the other process writes the file in one
single atomic chunk. If it were to create it and then separately open and
write to it then all bets are off.
 
Sean DiZazzo

Thanks for your input.

After trying again this morning, the file is opened for reading. I
must have had some wonky permissions on that file, so the error method
won't work. Trying to use the md5 technique won't work here either.
It takes quite a while to run one md5, let alone two on a growing
file. These files can be 20-50GB.

The overall idea is to be able to tell if a file has finished being
placed in a directory without any control over what is putting it
there. If I'm in control of the process, I know I can put it in a
temp area, etc. I use the method I mention in my original post
regularly without knowing how the file gets there, and was surprised
to see it didn't work on Windows.

In this case, there will be so few people touching the system, that I
think I can get away with having the copy be done from Unix, but it
would be nice to have a general way of knowing this on Windows.

~Sean
 
Duncan Booth

Sean DiZazzo said:
In this case, there will be so few people touching the system, that I
think I can get away with having the copy be done from Unix, but it
would be nice to have a general way of knowing this on Windows.

Doesn't the CreateFile call I posted earlier do what you want?
 
Steven D'Aprano

No, unless the other process has specified that it wants exclusive
access there is nothing stopping his process also opening the file.
That's why he has to specify when he opens it that he wants exclusive
access: then it doesn't matter what the other process does, he won't be
able to open it until the other process has closed the file.


I think you're confused. Or possibly I'm confused. Or both.

It seems to me that you're assuming that the OP has opened the file for
reading first, and *then* another process comes along and wants to open
it for writing. That's not how I read his post: he's trying to open a
file for reading while it is already being written to by another process.
Asking for exclusive access when reading isn't going to make any
difference, because the other process has already opened the file for
writing.

I suppose it is conceivable that the other process might have opened the
file for non-exclusive writing, assuming that such a thing is even
possible, but how likely is that?


This all of course assumes that the other process writes the file in one
single atomic chunk. If it were to create it and then separately open
and write to it then all bets are off.

The OP is repeatedly polling the file to see when the size stops
increasing. Obviously a single atomic write is *not* taking place.
 
Steven D'Aprano

After trying again this morning, the file is opened for reading. I must
have had some wonky permissions on that file, so the error method won't
work.

Then fix the permissions.
 
Tim Roberts

Sean DiZazzo said:
The overall idea is to be able to tell if a file has finished being
placed in a directory without any control over what is putting it
there.

There is simply no way to do this on Windows that works in the general
case.
 
Dennis Lee Bieber

The OP is repeatedly polling the file to see when the size stops
increasing. Obviously a single atomic write is *not* taking place.

Even worse... I've had times when attempting to open a file (via
Notepad, etc.) will fail... But doing a "cut&paste" of the file, and
opening the "paste" copy WILL succeed.

Apparently Notepad wants write access, but a cut and paste uses
read-only to copy whatever is in the file at that instant...
 
Duncan Booth

Steven D'Aprano said:
I think you're confused. Or possibly I'm confused. Or both.

I think it is you, but then I could be wrong.

It seems to me that you're assuming that the OP has opened the file
for reading first, and *then* another process comes along and wants to
open it for writing. That's not how I read his post: he's trying to
open a file for reading while it is already being written to by
another process. Asking for exclusive access when reading isn't going
to make any difference, because the other process has already opened
the file for writing.

I'm assuming the other process has already opened the file for writing. In
that case either it has asked for exclusive access so any attempt to open
it for reading will fail, or it hasn't in which case Python's default
'open' will succeed but opening it for exclusive access will fail.

Asking for exclusive access when opening will fail if another process
already has the file open for reading or writing.

I suppose it is conceivable that the other process might have opened
the file for non-exclusive writing, assuming that such a thing is even
possible, but how likely is that?

The usual situation is that the file is opened for writing but permits
reading while it is being written. Then opening it to read will succeed
unless you ask for exclusive access.

BTW, I did test the 'CreateFile' code I posted: I opened the file for
writing in one Python interpreter, just using open('...', 'w') wrote to it,
and called flush but didn't close it. Then in another interpreter I checked
that the CreateFile call threw an exception but open('...', 'r') succeeded
and I was able to read what had been written. After I closed the file in
the original interpreter the CreateFile call completed successfully.

Try this:

Session 1:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
[interpreter input not preserved]

Session 2:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
[...] 0,None,win32file.OPEN_ALWAYS,win32file.FILE_ATTRIBUTE_NORMAL,None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pywintypes.error: (32, 'CreateFile', 'The process cannot access the file because it is being used by another process.')
[...] 0,None,win32file.OPEN_ALWAYS,win32file.FILE_ATTRIBUTE_NORMAL,None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pywintypes.error: (32, 'CreateFile', 'The process cannot access the file because it is being used by another process.')

File open for writing in session 1: CreateFile throws an exception, open
succeeds.
File open for reading in session 1: CreateFile still throws an exception.
File closed in session 1: CreateFile succeeds.
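
Those three outcomes suggest a simple polling loop: keep probing with an exclusive open and treat the first success as "the writer has closed the file". A minimal sketch follows. The names wait_until_released and can_open_exclusively are hypothetical; the probe is whatever exclusive-open check you have (for example one built on the CreateFile call), passed in as a function that returns True on success.

```python
import time

def wait_until_released(path, can_open_exclusively, poll=1.0, timeout=None):
    """Poll until the exclusive-open probe succeeds; return False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not can_open_exclusively(path):
        # Probe failed: something still has the file open.
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

This only works if the writer keeps the file open for the whole copy. If it creates the file and later reopens it to write, the probe can succeed in between.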
 
Martin v. Löwis

def isGrowing(f, timeout):
    ssize = os.path.getsize(f)
    time.sleep(timeout)
    esize = os.path.getsize(f)
    return esize != ssize

On windows, this returns the size of the file as it _will be_, not the
size that it currently is.

Why do you say that? It most definitely returns what the size currently
is, not what it will be in the future (how could it know, anyway).

Regards,
Martin
 
Paul M¢Nett

Martin said:
Why do you say that? It most definitely returns what the size currently
is, not what it will be in the future (how could it know, anyway).

I've seen this before, when copying a file in Windows. Windows reports
the size the file will be after the copy is complete (it knows, after
all, the size of the source file). I always thought this meant that
Windows is just much smarter than me, so I ignored it.

Paul
 
Martin v. Löwis

Why do you say that? It most definitely returns what the size currently
is, not what it will be in the future (how could it know, anyway).

I've seen this before, when copying a file in Windows. Windows reports
the size the file will be after the copy is complete (it knows, after
all, the size of the source file). I always thought this meant that
Windows is just much smarter than me, so I ignored it.

No, I really think the target file has its size right from the
beginning.

Regards,
Martin
 