shutil.copyfile is incomplete (truncated)

Rob Schneider

Using Python 2.7.2 on OS X, I create a file in temp space, then call "shutil.copyfile(fn,loc+fname)" to copy it from "fn" to "loc+fname".

At the destination location, the file is truncated. About 10% of the file is lost. Original file is unchanged.

I added calls to "statinfo" immediately after the copy, and all looks ok (correct file size).

filecmp.cmp(fn,loc+fname)
print "Statinfo :"+fn+":\n", os.stat(fn)
print "Statinfo :"+loc+fname+":\n", os.stat(loc+fname)

But when I look at the file in Finder, the destination is smaller, and when I open it in a text editor it is truncated.

What could be causing this?
 
Neil Cerutti

> Using Python 2.7.2 on OSX, I have created a file in temp space,
> then use the function "shutil.copyfile(fn,loc+fname)" from "fn"
> to "loc+fname".
>
> At the destination location, the file is truncated. About 10%
> of the file is lost. Original file is unchanged.
>
> I added calls to "statinfo" immediately after the copy, and all
> looks ok (correct file size).
>
> filecmp.cmp(fn,loc+fname)
> print "Statinfo :"+fn+":\n", os.stat(fn)
> print "Statinfo :"+loc+fname+":\n", os.stat(loc+fname)
>
> But when I look at the file in Finder, destination is smaller
> and even looking at the file (with text editor) file is
> truncated.
>
> What could be causing this?

Could fn be getting some changes written after the copy is made?

Is the file flushed/closed before you copy it?
 
Rob Schneider

Thanks. Yes, there is a close function call before the copy is launched. No other writes.
Does Python wait for file close command to complete before proceeding?
 
Neil Cerutti

> Thanks. Yes, there is a close function call before the copy is
> launched. No other writes. Does Python wait for file close
> command to complete before proceeding?

The close method is defined as flushing and closing a file, so
it should not return until that's done.

What command are you using to create the temp file?
 
Steven D'Aprano

> The close method is defined as flushing and closing a file, so it
> should not return until that's done.

But note that "done" in this case means "the file system thinks it is
done", not *actually* done. Hard drives, especially the cheaper ones,
lie. They can say the file is written when in fact the data is still in
the hard drive's internal cache and not written to the disk platter.
Also, in my experience, hardware RAID controllers will eat your data, and
then your brains when you try to diagnose the problem.

I would consider the chance that the disk may be faulty, or the file
system is corrupt. Does the problem go away if you write to a different
file system or a different disk?
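
A minimal sketch of the layers described above, written for Python 3 (an assumption; the thread itself uses Python 2.7). flush() hands Python's buffer to the operating system, and os.fsync() asks the operating system to push its cache to the storage device. As noted above, a drive that misreports completion can still be holding the data in its own cache afterwards. The names write_and_sync and demo_path are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write bytes to path, then ask the OS to push them to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # Python's buffer -> operating system
        os.fsync(f.fileno())   # operating system cache -> storage device
    return os.path.getsize(path)


# Illustrative run in a scratch directory.
demo_path = os.path.join(tempfile.mkdtemp(), "synced.bin")
synced_size = write_and_sync(demo_path, b"x" * 47970)
```

This only narrows the window for data loss on a crash or power cut; it does not help if the file was never flushed or closed in the first place.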
 
Cameron Simpson

| On Thu, 11 Apr 2013 19:55:53 +0000, Neil Cerutti wrote:
| >> Thanks. Yes, there is a close function call before the copy is
| >> launched. No other writes. Does Python wait for file close command to
| >> complete before proceeding?
| >
| > The close method is defined as flushing and closing a file, so it
| > should not return until that's done.
|
| But note that "done" in this case means "the file system thinks it is
| done", not *actually* done.

Unless there's a reboot (or crash) in between, the view from the
app should be consistent and correct.

| Hard drives, especially the cheaper ones,
| lie. They can say the file is written when in fact the data is still in
| the hard drive's internal cache and not written to the disk platter.
| Also, in my experience, hardware RAID controllers will eat your data, and
| then your brains when you try to diagnose the problem.
|
| I would consider the chance that the disk may be faulty, or the file
| system is corrupt. Does the problem go away if you write to a different
| file system or a different disk?

Or that the filesystem may be full? Of course, that's usually obvious
more widely when it happens...

Question: is the size of the incomplete file a round number? (Like
a multiple of a decent-sized power of 2?)

Cheers,
 
Ned Deily

> Or that the filesystem may be full? Of course, that's usually obvious
> more widely when it happens...
>
> Question: is the size of the incomplete file a round number? (Like
> a multiple of a decent-sized power of 2?)

Also on what OS X file system type does the file being created reside,
in particular, is it a network file system?
 
Rob Schneider

> The close method is defined as flushing and closing a file, so
> it should not return until that's done.
>
> What command are you using to create the temp file?

re command to write the file:
f=open(fn,'w')
.... then create HTML text in a string
f.write(html)
f.close
 
Rob Schneider

> I would consider the chance that the disk may be faulty, or the file
> system is corrupt. Does the problem go away if you write to a different
> file system or a different disk?

It's a relatively new MacBook Pro with a solid state disk. I've not noticed any other disk problems. I did a "repair permissions" (for what it's worth). Maybe I'll have it tested at the Genius Bar. I don't have the full system on another computer to try that; but will work on that today.
 
Rob Schneider

> Also on what OS X file system type does the file being created reside,
> in particular, is it a network file system?

File system not full (2/3 of disk is free)

Source (correct one) is 47,970 bytes. Target after copy is 45,056 bytes. I've tried changing what gets written to change the file size. It is usually this sort of difference.

The file system is Mac OS Extended Journaled (default as out of the box).
 
Rob Schneider

> The file system is Mac OS Extended Journaled (default as out of the box).

I ran a disk repair. It found and fixed what it called "minor" problems, but the repair did not fix this problem. I just ran the program again: the source is 47,970 bytes and the target after the copy is 45,056.

Interestingly, just after the copy I run a file compare:

code:

if showproperties:
    print "Filecompare :",filecmp.cmp(fn,loc+fname)
    print "Statinfo :"+fn+":\n", os.stat(fn)
    print "Statinfo :"+loc+fname+":\n", os.stat(loc+fname)

results:

Filecompare : True
Statinfo :/var/folders/p_/n5lktj2n0r938_46jyqb52g40000gn/T/speakers.htm:
posix.stat_result(st_mode=33188, st_ino=32205850, st_dev=16777218L, st_nlink=1, st_uid=501, st_gid=20, st_size=45056, st_atime=1365749178, st_mtime=1365749178, st_ctime=1365749178)
Statinfo :/Users/rmschne/Documents/ScottishOilClub/SOC Board Doc Sharing Folder/Meetings/speakers.htm:
posix.stat_result(st_mode=33188, st_ino=32144179, st_dev=16777218L, st_nlink=1, st_uid=501, st_gid=20, st_size=45056, st_atime=1365749178, st_mtime=1365749178, st_ctime=1365749178)

It shows file size 45,056 on both source and target, which is the file size of the flawed target, and is not what Finder shows for source.

Sigh.
 
Chris Angelico

> re command to write the file:
> f=open(fn,'w')
> ... then create HTML text in a string
> f.write(html)
> f.close

Hold it one moment... You're not actually calling close. The file's
still open. Is that a copy/paste problem, or is that your actual code?

In Python, a function call ALWAYS has parentheses after it. Evaluating
a function's name like that returns the function (or method) object,
which you then do nothing with. (You could assign it someplace, for
instance, and call it later.) Try adding empty parens:

f.close()

and see if that solves the problem. Alternatively, look into the
'with' statement and the block syntax that it can give to I/O
operations.

ChrisA
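
A small sketch of the point above, written for Python 3 (an assumption; the thread uses Python 2.7, where the same distinction applies). It shows that naming the method without parentheses leaves the file open, that calling it closes the file, and that a 'with' block closes the file on exit. The path and variable names are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")
f.write("hello")
method = f.close             # only a reference to the bound method; nothing runs
still_open = not f.closed    # the file is still open at this point
method()                     # calling it (same as f.close()) closes the file
closed_after_call = f.closed

# The with statement closes the file when the block exits, even on error.
with open(path, "w") as g:
    g.write("hello again")
closed_by_with = g.closed
```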
 
Ned Deily

> It shows file size 45,056 on both source and target, which is the file size
> of the flawed target, and is not what Finder shows for source.

Perhaps the source file has an OS X resource fork or other extended
attribute metadata. shutil's copy functions won't handle those. One
way to see if that is the case is to examine the source file in a
terminal window with: ls -l@

$ ls -l@ test.jpg
-rw-r--r--@ 1 nad staff 40359 Jul 15 2009 test.jpg
com.apple.FinderInfo 32
com.apple.ResourceFork 899489
 
Cameron Simpson

| > > Question: is the size of the incomplete file a round number? (Like
| > > a multiple of a decent sized power of 2?)
[...]
| Source (correct one) is 47,970 bytes. Target after copy of 45,056
| bytes. I've tried changing what gets written to change the file
| size. It is usually this sort of difference.

45056 is exactly 11 * 4096. I'd say your I/O is using 4KB blocks,
and the last partial block (to make it up to 47970) didn't get
written (at the OS level).
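
The arithmetic behind that observation can be checked directly. This is only a worked calculation using the two sizes reported in the thread and the 4096-byte block size suggested above; the variable names are illustrative.

```python
BLOCK = 4096          # block size suggested above
source_size = 47970   # size Finder reports for the source file

# Split the source size into whole blocks plus a final partial block.
full_blocks, leftover = divmod(source_size, BLOCK)
truncated_size = full_blocks * BLOCK

print(full_blocks, leftover, truncated_size)
```

The whole-block portion matches the 45,056 bytes seen in the target, and the leftover is the partial block that never reached the file.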

Earlier you wrote:
| I have created a file in temp space, then use the function
| "shutil.copyfile(fn,loc+fname)" from "fn" to "loc+fname".
and:
| Yes, there is a close function call before the copy is launched. No other writes.
| Does Python wait for file close command to complete before proceeding?

Please show us the exact code used to make the temp file.

I would guess the temp file has not been closed (or flushed) before
the call to copyfile.

If you're copying data to a tempfile, it will only have complete
buffers (i.e. multiples of 4096 bytes) in it until the final flush
or close.

So I'm imagining something like:

tfp = open(tempfilename, "w")
... lots of tfp.write() ...
shutil.copyfile(tempfilename, newfilename)

Note above no flush or close of tfp. So the final incomplete I/O
buffer is still in Python's memory; it hasn't been actually written
to the temp file because the buffer has not been filled, and the file
has not been closed.
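
A hedged reproduction of that scenario, written for Python 3 (an assumption; the thread uses Python 2.7, and the exact point at which buffers are flushed differs between versions, so the truncated size need not be an exact multiple of 4096 here). It writes in small chunks through a 4096-byte buffer, copies while the file is still open, and only then closes it. The function name and file names are hypothetical.

```python
import filecmp
import os
import shutil
import tempfile


def demo_unflushed_copy(total_lines=1000, line=b"x" * 47 + b"\n"):
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "src.htm")
    dst = os.path.join(workdir, "dst.htm")
    expected = total_lines * len(line)

    f = open(src, "wb", buffering=4096)
    for _ in range(total_lines):
        f.write(line)
    f.close  # bug reproduced on purpose: no parentheses, so nothing is called

    shutil.copyfile(src, dst)              # copies only what is on disk so far
    size_src_open = os.path.getsize(src)
    size_dst = os.path.getsize(dst)
    same = filecmp.cmp(src, dst, shallow=False)

    f.close()                              # now the last partial buffer is written
    size_src_closed = os.path.getsize(src)
    shutil.rmtree(workdir)
    return expected, size_src_open, size_dst, same, size_src_closed
```

While the file is open, the source and the copy are both short and compare equal, which would be consistent with a file compare returning True and matching stat sizes right after the copy. The source only reaches its full size once the file is finally closed, for example when the program exits.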

Anyway, can you show us the relevant bits of code involved?

Cheers,
 
Rob Schneider

> | > > Question: is the size of the incomplete file a round number? (Like
> | > > a multiple of a decent sized power of 2?)
> [...]
> | Source (correct one) is 47,970 bytes. Target after copy of 45,056
> | bytes. I've tried changing what gets written to change the file
> | size. It is usually this sort of difference.
>
> 45056 is exactly 11 * 4096. I'd say your I/O is using 4KB blocks,
> and the last partial block (to make it up to 47970) didn't get
> written (at the OS level).
>
> Earlier you wrote:
> | I have created a file in temp space, then use the function
> | "shutil.copyfile(fn,loc+fname)" from "fn" to "loc+fname".
> and:
> | Yes, there is a close function call before the copy is launched. No other writes.
> | Does Python wait for file close command to complete before proceeding?
>
> Please show us the exact code used to make the temp file.
>
> I would guess the temp file has not been closed (or flushed) before
> the call to copyfile.
>
> If you're copying data to a tempfile, it will only have complete
> buffers (i.e. multiples of 4096 bytes) in it until the final flush
> or close.
>
> So I'm imagining something like:
>
> tfp = open(tempfilename, "w")
> ... lots of tfp.write() ...
> shutil.copyfile(tempfilename, newfilename)
>
> Note above no flush or close of tfp. So the final incomplete I/O
> buffer is still in Python's memory; it hasn't been actually written
> to the temp file because the buffer has not been filled, and the file
> has not been closed.
>
> Anyway, can you show us the relevant bits of code involved?
>
> Cheers,
> --
> Cameron Simpson <[email protected]>
>
> Processes are like potatoes. - NCR device driver manual

Thanks for the observation.

Code (simplified, but it shows the same flaw; it has a close, as far as I can tell):

def CreateSpeakerList1():
    import shutil
    import filecmp
    import os.path

    t=get_template('speaker_list.html')
    fn=TEMP_DIR+SOC_SPEAKER_LIST
    fn=tempfile.gettempdir()+"/"+SOC_SPEAKER_LIST
    f=open(fn,'w')
    speaker_list=Speaker.objects.order_by('status__order','targetmtg__date')
    print " Creating " + SOC_SPEAKER_LIST + " ..."
    html=(smart_str(t.render(Context(
        {
        'css_include_file':CSS_INCLUDE_FILE,
        'css_link':False,
        'title': ORG_NAME+" Speaker List",
        'speaker_list': speaker_list,
        }))))
    f.write(html)
    f.close
    print " Wrote "+fn
    shutil.copyfile(fn,SOC_GENERAL_OUTPUT_FOLDER+SOC_SPEAKER_LIST)
    print "Filecompare :",filecmp.cmp(fn,SOC_GENERAL_OUTPUT_FOLDER+SOC_SPEAKER_LIST)
    print "Statinfo :"+fn+":\n", os.stat(fn)
    print "Statinfo :"+SOC_GENERAL_OUTPUT_FOLDER+SOC_SPEAKER_LIST+"\n", os.stat(SOC_GENERAL_OUTPUT_FOLDER+SOC_SPEAKER_LIST)
    return

Output on latest run:

Creating speakers.htm ...
Wrote /var/folders/p_/n5lktj2n0r938_46jyqb52g40000gn/T/speakers.htm
Filecompare : True
Statinfo :/var/folders/p_/n5lktj2n0r938_46jyqb52g40000gn/T/speakers.htm:
posix.stat_result(st_mode=33188, st_ino=32332374, st_dev=16777218L, st_nlink=1, st_uid=501, st_gid=20, st_size=45056, st_atime=1365758139, st_mtime=1365758139, st_ctime=1365758139)
Statinfo :/Users/rmschne/Documents/ScottishOilClub/Output/speakers.htm
posix.stat_result(st_mode=33188, st_ino=32143886, st_dev=16777218L, st_nlink=1, st_uid=501, st_gid=20, st_size=45056, st_atime=1365758029, st_mtime=1365758139, st_ctime=1365758139)
 
Rob Schneider

> Yep, there's the problem! See my previous post for details. Change this to:
>
> f.close()
>
> and you should be sorted.
>
> ChrisA

Slapping forehead ... hard. Thanks!
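
For reference, a minimal sketch of the write-then-copy step with the file closed before the copy, written for Python 3 (an assumption; the thread uses Python 2.7). It uses a 'with' block so the close cannot be forgotten. The function name write_and_copy and its parameters are hypothetical and stand in for the template-rendering code posted earlier.

```python
import os
import shutil
import tempfile


def write_and_copy(text, src_path, dest_path):
    """Write text to src_path, close it, then copy it to dest_path."""
    with open(src_path, "w") as f:
        f.write(text)
    # The file is flushed and closed here, before the copy starts.
    shutil.copyfile(src_path, dest_path)
    return os.path.getsize(src_path), os.path.getsize(dest_path)


# Illustrative run in a scratch directory with made-up file names.
scratch = tempfile.mkdtemp()
sizes = write_and_copy(
    "x" * 47970,
    os.path.join(scratch, "speakers.htm"),
    os.path.join(scratch, "speakers_copy.htm"),
)
```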
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top