Save to a file, but avoid overwriting an existing file

zoom

Hi!

I would like to ensure that when writing to a file I do not overwrite an
existing file, but I'm unsure of the best way to approach this
problem. As far as I can see, there are at least two possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL),
which will fail if the file exists. However, I would prefer that the
program try to save under a different name in this case, instead of
discarding all the calculation done so far, but I'm not very good at
catching exceptions.

2. Alternatively, a unique string could be generated to ensure that no
file with the same name exists. One approach I can see is to include the
date and time in the file name. But this seems a bit clumsy to me, and it
is not unique, i.e. it could happen (at least in theory) that two
processes finish in the same second.

Any suggestions, please?
 
Skip Montanaro

This seems to be an application-level decision. If so, in your
application, why not just check to see if the file exists, and
implement whatever workaround you deem correct for your needs? For
example (to choose a simple, but rather silly, file naming strategy):

import os
import random

fname = "x"
while os.path.exists(fname):
    fname = "%s.%f" % (fname, random.random())
fd = open(fname, "w")

It's clearly not going to be safe from race conditions, but I leave
solving that problem as an exercise for the reader.

Skip
 
Tim Chase

zoom said:
2. Alternatively, a unique string could be generated to ensure that
no file with the same name exists. One approach I can see is to include
the date and time in the file name. But this seems a bit clumsy to me,
and it is not unique, i.e. it could happen (at least in theory) that
two processes finish in the same second.

Python offers a "tempfile" module that gives this (and a whole lot
more) to you out of the box.
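
As a rough illustration of the tempfile suggestion, here is a minimal sketch using tempfile.mkstemp, which creates a new, uniquely named file and leaves it in place afterwards. The function name save_unique and the default prefix, suffix and directory are assumptions made for this example, not something from the thread.

```python
import os
import tempfile


def save_unique(data, directory=".", prefix="result-", suffix=".txt"):
    """Write data to a newly created, uniquely named file and return its path."""
    # mkstemp picks a name that does not exist yet and creates the file
    # exclusively, so an existing file is never opened or truncated.
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix,
                                dir=directory, text=True)
    # Wrap the raw descriptor in a file object; closing it closes fd too.
    with os.fdopen(fd, "w") as f:
        f.write(data)
    return path
```

Note that mkstemp creates the file readable and writable only by the current user, and the generated names are random rather than meaningful, which may or may not suit the application.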

-tkc
 
Dave Angel

zoom said:
Hi!

I would like to ensure that when writing to a file I do not overwrite an
existing file, but I'm unsure of the best way to approach this
problem. As far as I can see, there are at least two possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL),
which will fail if the file exists. However, I would prefer that the
program try to save under a different name in this case, instead of
discarding all the calculation done so far, but I'm not very good at
catching exceptions.

The tempfile module is your best answer, but if you really need
to keep the file afterwards, you'll have the same problem when
you rename it later.

I suggest you learn about try/except. For simple cases it's not
that tough, though if you want to ask about it, you'll need to
specify your Python version.
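
One possible way around the rename problem, sketched here under the assumption of Python 3.3 or later and a filesystem that supports hard links, is to give the temporary file its final name with os.link instead of os.rename. On POSIX systems os.rename silently replaces an existing target, whereas os.link raises FileExistsError. The function name publish_without_overwrite is made up for this example.

```python
import os
import tempfile


def publish_without_overwrite(data, final_name):
    """Write data to a temp file, then link it to final_name if that name is free."""
    # Create the temp file next to the target so both are on one filesystem.
    directory = os.path.dirname(os.path.abspath(final_name))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        # os.link refuses to replace an existing name and raises
        # FileExistsError, which the caller can catch to pick another name.
        os.link(tmp_path, final_name)
    finally:
        # Remove the temporary name whether or not the link succeeded.
        os.unlink(tmp_path)
```

A caller would wrap this in try/except FileExistsError and retry with a different final_name, so the computed data is not lost.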
 
Emile van Sebille

zoom said:
2. Alternatively, a unique string could be generated to ensure that no
file with the same name exists. One approach I can see is to include the
date and time in the file name. But this seems a bit clumsy to me, and it
is not unique, i.e. it could happen (at least in theory) that two
processes finish in the same second.

I tend to use this method -- prepending the job name or targeting
different directories per job precludes duplication. Unless you're
running the same job at the same time, in which case tempfile is the way
to go (which I use for archiving spooled print files which can occur
simultaneously.)
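
A naming scheme along these lines might look like the sketch below. The function name job_filename, the exact name layout and the use of the process ID are assumptions added for illustration; the process ID only separates processes on the same machine, and nothing here guards against one process saving twice within the same second.

```python
import os
import time


def job_filename(job_name, directory=".", ext=".out"):
    """Build a file name from the job name, a timestamp and the process ID."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # The PID distinguishes two processes that finish in the same second.
    name = "%s-%s-%d%s" % (job_name, stamp, os.getpid(), ext)
    return os.path.join(directory, name)
```

Because this only builds a name and does not create the file, it is still worth opening the result with an exclusive-create call rather than assuming the name is free.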

Emile
 
Cameron Simpson

zoom said:
I would like to ensure that when writing to a file I do not
overwrite an existing file, but I'm unsure of the best way to
approach this problem. As far as I can see, there are at least two
possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL),
which will fail if the file exists. However, I would prefer that the
program try to save under a different name in this case, instead
of discarding all the calculation done so far, but I'm not very
good at catching exceptions.

Others have mentioned tempfile, though of course you have the same collision
issue when you come to rename the temp file if you are keeping it.

I would run with option 1 for your task.

Just iterate until os.open succeeds.

However, you need to distinguish _why_ an open fails. For example,
if you were trying to make files in a directory to which you do not
have write permission, or in a directory that did not exist,
os.open would fail no matter what name you used, so your loop would
run forever.

Therefore you need to continue _only_ if you get EEXIST. Otherwise abort.

So you'd have some code like this (totally untested):

# at top of script
import errno
import os

# where you make the file
def open_new(primary_name):
    try:
        fd = os.open(primary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError as e:
        # carry on only if the name is taken; re-raise anything else
        if e.errno != errno.EEXIST:
            raise
    else:
        return primary_name, fd
    n = 1
    while True:
        secondary_name = "%s.%d" % (primary_name, n)
        try:
            fd = os.open(secondary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        else:
            return secondary_name, fd
        n += 1

# where you need the file
path, fd = open_new("x")

That gets you a function you can reuse which returns the file's
name and the file descriptor.
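
Since os.open returns a raw file descriptor rather than a file object, one way to write to it is to wrap it with os.fdopen. The following is a small self-contained sketch; the scratch directory and the text written are placeholders for illustration only.

```python
import os
import tempfile

# Hypothetical target path inside a scratch directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "x")

# The same exclusive-create call used in the function above.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)

# os.fdopen wraps the descriptor in a regular file object and takes
# ownership of it, so leaving the with block also closes fd.
with os.fdopen(fd, "w") as f:
    f.write("results\n")
```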

Cheers,
--
Cameron Simpson <[email protected]>
 
Mark Lawrence

I haven't looked but would things be easier if the new exception
hierarchy were used
http://docs.python.org/3.3/whatsnew/3.3.html#pep-3151-reworking-the-os-and-io-exception-hierarchy
?
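
For what it is worth, with the PEP 3151 hierarchy (Python 3.3+) the "already exists" case has its own exception class, FileExistsError, so the errno comparison is no longer needed. Python 3.3 also added the "x" mode to the built-in open(), which requests exclusive creation. A possible rewrite, as a sketch only, that returns a file object instead of a descriptor:

```python
def open_new(primary_name):
    """Return (name, file object) for the first candidate name that was free."""
    name = primary_name
    n = 0
    while True:
        try:
            # Mode "x" means exclusive creation: fail if the file exists.
            return name, open(name, "x")
        except FileExistsError:
            # Only the "name is taken" case is retried. PermissionError,
            # FileNotFoundError and other OSError subclasses still propagate.
            n += 1
            name = "%s.%d" % (primary_name, n)
```

Used as name, f = open_new("x"), this tries "x", then "x.1", "x.2" and so on, and a missing directory or a permission problem still ends the loop with an exception instead of spinning forever.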
 
