Save to a file, but avoid overwriting an existing file

zoom · Mar 12, 2014

Hi!

I would like to ensure that when writing to a file I do not overwrite an
existing file, but I'm unsure of the best way to approach this
problem. As far as I can see, there are at least two possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL),
which will fail if the file exists. However, I would prefer the
program to try saving under a different name in that case, instead of
discarding all the calculation done so far, but I'm not too good at
catching exceptions.

2. Alternatively, a unique string could be generated to ensure that no
file with the same name exists. One approach I can see is to include the
date and time in the file name. But this seems a bit clumsy to me, and it
is not unique, i.e. it could happen (at least in theory) that two
processes finish in the same second.

Any suggestions, please?

Skip Montanaro · Mar 12, 2014

This seems to be an application-level decision. If so, in your
application, why not just check to see if the file exists, and
implement whatever workaround you deem correct for your needs? For
example (to choose a simple, but rather silly, file naming strategy):

    import os
    import random

    fname = "x"
    while os.path.exists(fname):
        fname = "%s.%f" % (fname, random.random())
    fd = open(fname, "w")

It's clearly not going to be safe from race conditions, but I leave
solving that problem as an exercise for the reader.

Skip

Tim Chase · Mar 12, 2014

2. Alternatively, a unique string could be generated to ensure that
no file with the same name exists. One approach I can see is to
include the date and time in the file name. But this seems a bit
clumsy to me, and it is not unique, i.e. it could happen (at least
in theory) that two processes finish in the same second.

Python offers a "tempfile" module that gives this (and a whole lot
more) to you out of the box.
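
As a minimal sketch of that suggestion: tempfile.mkstemp creates a new file under a name that did not exist before and, unlike NamedTemporaryFile, leaves it on disk afterwards. The helper name save_unique and the "result-" prefix are made up for illustration.

```python
import os
import tempfile

def save_unique(data, directory=".", prefix="result-", suffix=".txt"):
    """Write data (a str) to a newly created, uniquely named file; return its path."""
    # mkstemp opens the file with O_EXCL, so it never reuses an existing name.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix,
                                dir=directory, text=True)
    # Wrap the raw descriptor in a file object so it is closed reliably.
    with os.fdopen(fd, "w") as f:
        f.write(data)
    return path
```

The price is that the middle of the name is random, so this fits best when the exact file name does not matter.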

-tkc

Dave Angel · Mar 12, 2014

zoom said:
Hi!

I would like to ensure that when writing to a file I do not overwrite an
existing file, but I'm unsure of the best way to approach this
problem. As far as I can see, there are at least two possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL),
which will fail if the file exists. However, I would prefer the
program to try saving under a different name in that case, instead of
discarding all the calculation done so far, but I'm not too good at
catching exceptions.

The tempfile module is your best answer, but if you really need
to keep the file afterwards, you'll have the same problem when
you rename it later.
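
One possible way around the rename problem, sketched here under assumptions that are not part of this thread: write to a temporary file in the target directory, then use os.link to claim the final name. os.link fails if the final name already exists, so nothing is overwritten. This assumes a filesystem that supports hard links, and the function name publish_without_overwrite is hypothetical.

```python
import os
import tempfile

def publish_without_overwrite(data, final_path):
    """Write data to a temp file, then claim final_path only if it is free."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file must live on the same filesystem as final_path.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        # os.link raises OSError (EEXIST) if final_path already exists,
        # so an existing file is never replaced.
        os.link(tmp_path, final_path)
    finally:
        # Drop the temporary name whether or not the link succeeded.
        os.unlink(tmp_path)
```

On failure the caller still has the data in memory and can retry under another name.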

I suggest you learn about try/except. For simple cases it's not
that tough, though if you want to ask about it, you'll need to
specify your Python version.

Emile van Sebille · Mar 12, 2014

2. Alternatively, a unique string could be generated to ensure that no
file with the same name exists. One approach I can see is to include the
date and time in the file name. But this seems a bit clumsy to me, and it
is not unique, i.e. it could happen (at least in theory) that two
processes finish in the same second.

I tend to use this method -- prepending the job name or targeting
different directories per job precludes duplication. Unless you're
running the same job at the same time, in which case tempfile is the way
to go (which I use for archiving spooled print files which can occur
simultaneously.)
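
A rough sketch of such a naming scheme, for illustration only: the function name job_filename and the exact format are hypothetical, and appending the process ID is an extra precaution that is not mentioned in this thread.

```python
import os
import time

def job_filename(job_name, ext=".out"):
    """Build a name from the job name, a timestamp and the process ID."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Two processes running at the same time have different PIDs, so they
    # get different names even if they finish in the same second.
    return "%s-%s-%d%s" % (job_name, stamp, os.getpid(), ext)
```

Names built this way can still collide in principle (for example, across machines writing to a shared directory), so opening the file with O_EXCL remains a useful safety net.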

Emile

Cameron Simpson · Mar 12, 2014

I would like to ensure that when writing to a file I do not
overwrite an existing file, but I'm unsure of the best way to
approach this problem. As far as I can see, there are at least two
possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL),
which will fail if the file exists. However, I would prefer the
program to try saving under a different name in that case, instead
of discarding all the calculation done so far, but I'm not too
good at catching exceptions.

Others have mentioned tempfile, though of course you have the same collision
issue when you come to rename the temp file if you are keeping it.

I would run with option 1 for your task.

Just iterate until os.open succeeds.

However, you need to distinguish _why_ an open fails. For example,
if you were trying to make files in a directory to which you do not
have write permission, or just a directory that did not exist,
os.open would fail no matter what name you used, so your loop would
run forever.

Therefore you need to continue _only_ if you get EEXIST. Otherwise abort.

So you'd have some code like this (totally untested):

    # at top of script
    import errno
    import os

    # where you make the file
    def open_new(primary_name):
        # first try the requested name itself
        try:
            fd = os.open(primary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except OSError as e:
            # anything other than "file exists" is a real error
            if e.errno != errno.EEXIST:
                raise
        else:
            return primary_name, fd
        # the name is taken: try primary_name.1, primary_name.2, ...
        n = 1
        while True:
            secondary_name = "%s.%d" % (primary_name, n)
            try:
                fd = os.open(secondary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise
            else:
                return secondary_name, fd
            n += 1

    # where you need the file
    path, fd = open_new("x")

That gets you a function you can reuse which returns the file's
name and the file descriptor.
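
Since os.open returns a raw file descriptor rather than a file object, a natural follow-up is to wrap it with os.fdopen. The sketch below restates the loop above in compact form so that it is self-contained; the helper save_text is a hypothetical name, and this is as untested in production as the original.

```python
import errno
import os

def open_new(primary_name):
    """Compact restatement of the loop above: try name, name.1, name.2, ..."""
    n = 0
    while True:
        name = primary_name if n == 0 else "%s.%d" % (primary_name, n)
        try:
            fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except OSError as e:
            # Retry only when the name is taken; re-raise everything else.
            if e.errno != errno.EEXIST:
                raise
            n += 1
        else:
            return name, fd

def save_text(primary_name, text):
    """Write text to the first free name and return the name actually used."""
    path, fd = open_new(primary_name)
    # os.fdopen turns the descriptor into a regular file object.
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path
```

Returning the path matters here, because the caller cannot otherwise know which name was finally chosen.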

Cheers,

Mark Lawrence · Mar 12, 2014

I haven't looked, but would things be easier if the new exception
hierarchy were used?
http://docs.python.org/3.3/whatsnew/3.3.html#pep-3151-reworking-the-os-and-io-exception-hierarchy
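
For illustration, here is roughly how the same loop could look on Python 3.3 or later, where PEP 3151 provides FileExistsError and the built-in open accepts the exclusive-creation mode "x". The function name open_new_py3 is made up for this sketch.

```python
import itertools

def open_new_py3(primary_name):
    """Return (name, file object) for the first name that did not exist yet."""
    for n in itertools.count():
        name = primary_name if n == 0 else "%s.%d" % (primary_name, n)
        try:
            # Mode "x" creates the file and fails if it already exists.
            return name, open(name, "x")
        except FileExistsError:
            # Only "file exists" triggers a retry; PermissionError,
            # FileNotFoundError and other errors propagate to the caller.
            continue
```

The errno comparison disappears because the exception class itself says why the open failed.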
