Safely modify a file in place -- am I doing it right?

steve+comp.lang.python

I have a script running under Python 2.5 that needs to modify files in
place. I want to do this with some level of assurance that I won't lose
data. E.g. this is not safe:

def unsafe_modify(filename):
    fp = open(filename, 'r')
    data = modify(fp.read())
    fp.close()
    fp = open(filename, 'w')  # <== original data lost here
    fp.write(data)
    fp.close()  # <== new data not saved until here

If something goes wrong writing the new data, I've lost the previous
contents.

I have come up with this approach:

import os, tempfile

def safe_modify(filename):
    fp = open(filename, 'r')
    data = modify(fp.read())
    fp.close()
    # Use a temporary file.
    loc = os.path.dirname(filename)
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    # In my real code, I need a proper Python file object,
    # not just a file descriptor.
    outfile = os.fdopen(fd, 'w')
    outfile.write(data)
    outfile.close()
    # Move the temp file over the original.
    os.rename(tmpname, filename)

os.rename is an atomic operation, at least under Linux and Mac, provided
source and destination are on the same filesystem (which is why the temp
file is created in the original file's directory), so if the move fails,
the original file should be untouched.

This seems to work for me, but is this the right way to do it? Is there a
better/safer way?
 

Grant Edwards

I have a script running under Python 2.5 that needs to modify files in
place. I want to do this with some level of assurance that I won't lose
data. E.g. this is not safe:

def unsafe_modify(filename):
    fp = open(filename, 'r')
    data = modify(fp.read())
    fp.close()
    fp = open(filename, 'w')  # <== original data lost here
    fp.write(data)
    fp.close()  # <== new data not saved until here

If something goes wrong writing the new data, I've lost the previous
contents.

I have come up with this approach:

import os, tempfile

def safe_modify(filename):
    fp = open(filename, 'r')
    data = modify(fp.read())
    fp.close()
    # Use a temporary file.
    loc = os.path.dirname(filename)
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    # In my real code, I need a proper Python file object,
    # not just a file descriptor.
    outfile = os.fdopen(fd, 'w')
    outfile.write(data)
    outfile.close()
    # Move the temp file over the original.
    os.rename(tmpname, filename)

os.rename is an atomic operation, at least under Linux and Mac, so if
the move fails, the original file should be untouched.

This seems to work for me, but is this the right way to do it?

That's how Unix programs have modified files "in place" since time
immemorial.
Is there a better/safer way?

Many programs rename the original file with a "backup" suffix (a tilde
is popular).
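A sketch of the backup-suffix variation Grant describes, combined with the
temp-file approach from the original post (the tilde suffix and function
name are illustrative, not a standard API):

```python
import os, tempfile

def safe_modify_with_backup(filename, data):
    # Write the new data to a temp file in the same directory,
    # then keep the old contents under a tilde-suffixed name.
    loc = os.path.dirname(filename) or '.'
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    outfile = os.fdopen(fd, 'w')
    try:
        outfile.write(data)
    finally:
        outfile.close()
    backup = filename + '~'
    if os.path.exists(backup):
        os.remove(backup)
    os.rename(filename, backup)   # original survives as filename~
    os.rename(tmpname, filename)  # new data takes the original name
```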
 

Chris Torek

I have a script running under Python 2.5 that needs to modify files in
place. I want to do this with some level of assurance that I won't lose
data. ... I have come up with this approach:

[create temp file in suitable directory, write new data, and
use os.rename() to atomically swap out the old file for the
new]

As Grant Edwards said, this is the right general idea. There
are lots of variations. If you want to make the original
be a backup, the sequence:

os.link(original_name, backup_name)
os.rename(new_synced_file, original_name)

should generally do the trick (rename will unlink the target
which means that the backup name will refer to the original
inode).
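Wrapped up as a function, that link-then-rename sequence might look like
this (a sketch; note os.link fails if backup_name already exists, so any
stale backup has to be removed first):

```python
import os

def replace_keeping_backup(original_name, new_synced_file, backup_name):
    # Link-then-rename: after os.link, both names refer to the
    # original inode; os.rename then points original_name at the
    # new file, leaving backup_name on the old contents.
    if os.path.exists(backup_name):
        os.remove(backup_name)
    os.link(original_name, backup_name)
    os.rename(new_synced_file, original_name)
```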
import os, tempfile

def safe_modify(filename):
    fp = open(filename, 'r')
    data = modify(fp.read())
    fp.close()
    # Use a temporary file.
    loc = os.path.dirname(filename)
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    # In my real code, I need a proper Python file object,
    # not just a file descriptor.
    outfile = os.fdopen(fd, 'w')
    outfile.write(data)
    outfile.close()

It is a good idea to use outfile.flush() and then os.fsync() before
doing the close, as well. Among other things, this *usually* gets
you some kind of notice-of-failure in the case of deferred writes
across a network (e.g., NFS). (While it would be nice for os.close()
to deliver failure notices, in practice the fsync() is at least
sometimes required. This is the OS's fault, not Python's. :) )
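The flush-then-fsync step can be sketched as a small helper (the function
name is illustrative; `outfile` is a file object as in the code above):

```python
import os

def write_and_sync(outfile, data):
    # flush() pushes Python's userspace buffer to the OS;
    # os.fsync() then asks the OS to commit the data to disk
    # before we close the file.
    outfile.write(data)
    outfile.flush()
    os.fsync(outfile.fileno())
    outfile.close()
```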
    # Move the temp file over the original.
    os.rename(tmpname, filename)

os.rename is an atomic operation, at least under Linux and Mac,
so if the move fails, the original file should be untouched.

This seems to work for me, but is this the right way to do it?
Is there a better/safer way?

For additional checking and cleanup purposes, you may want to catch
exceptions and delete the temporary file if the rename has not yet
been done (and therefore the original file is still intact).
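Putting that cleanup together with the fsync advice, a variant of the
original safe_modify might look like this (a sketch; `modify` is passed
in as a parameter here purely for illustration):

```python
import os, tempfile

def safe_modify(filename, modify):
    fp = open(filename, 'r')
    try:
        data = modify(fp.read())
    finally:
        fp.close()
    loc = os.path.dirname(filename) or '.'
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    try:
        outfile = os.fdopen(fd, 'w')
        try:
            outfile.write(data)
            outfile.flush()
            os.fsync(outfile.fileno())
        finally:
            outfile.close()
        os.rename(tmpname, filename)
    except:
        # Something failed before (or during) the rename: remove the
        # temp file and re-raise; the original file is still intact.
        if os.path.exists(tmpname):
            os.remove(tmpname)
        raise
```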

You will likely also need to fiddle with the permission bits
on the file resulting from the mkstemp() call (to make them
match those on the original file). Alternatively, you may want
to build your own mkstemp() (this can be a bit of a challenge!).
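The permission fix-up can be done after writing the temp file and before
the rename; mkstemp creates its file with mode 0600, so the original's
permission bits need to be copied over. A sketch:

```python
import os, stat

def copy_mode(src, dst):
    # Copy only the permission bits (not ownership or timestamps)
    # from the original file onto the temp file.
    mode = stat.S_IMODE(os.stat(src).st_mode)
    os.chmod(dst, mode)
```

(shutil.copymode in the standard library does the same thing.)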

Finally, as I implied above in talking about the os.link()-then-
os.rename() sequence, if the original file has multiple links to
it, note that this "breaks the links". If this is not what you
want, the problem has no fully general solution (but there are
various application-specific solutions).
 

Steven D'Aprano

I have a script running under Python 2.5 that needs to modify files in
place. I want to do this with some level of assurance that I won't lose
data. E.g. this is not safe:
[snip]

Thanks to all who replied, your comments were helpful.
 
