Using "with open(filename, 'ab'):" and calling code only if the fileis new?

V

Victor Hooi

Hi,

I have a CSV file that I will repeatedly appending to.

I'm using the following to open the file:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
fieldnames = (...)
csv_writer = DictWriter(output, filednames)
# Call csv_writer.writeheader() if file is new.
csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only if the file is new?

My understanding is that I don't want to use os.path.exist(), since that opens me up to race conditions.

I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.

Is there another way I can execute code only if the file is new?

Cheers,
Victor
 
J

Joseph L. Casale

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as
output:
fieldnames = (...)
csv_writer = DictWriter(output, filednames)
# Call csv_writer.writeheader() if file is new.
csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only if the file is
new?

My understanding is that I don't want to use os.path.exist(), since that opens
me up to race conditions.

What stops you from checking before and setting a flag?
 
D

Dave Angel

On 29/10/2013 21:42, Joseph L. Casale wrote:

You forgot the attribution line: "Victor says"
What stops you from checking before and setting a flag?

Like Victor says, that opens him up to race conditions.

Victor:

You need to more completely specify your environment. Are there
multiple instances of this or similar program running simultaneously?
If so, you've got lots more problems than a missing or duplicated header
line. You could get partial lines intermixing between the two outputs.

Chances are if you really need to support more than one program at the
same time, you'll need to use a lower-level open, perhaps from the os
module. Some form of locking is called for. And if the data SHOULD be
interleaved, you'll have to arrange it so it gets done in whole number
increments.
 
J

Joseph L. Casale

Like Victor says, that opens him up to race conditions.

Slim chance, it's no more possible than it happening in the time try/except
takes to recover an alternative procedure.

with open('in_file') as in_file, open('out_file', 'ab') as outfile_file:
if os.path.getsize('out_file'):
print('file not empty')
else:
#write header
print('file was empty')

And if that's still not acceptable (you did say new) than open the out_file'r+' an seek
and read to check for a header.

But if your file is not new and lacks a header, then what?
jlc
 
V

Victor Hooi

Hi,

In theory, it *should* just be our script writing to the output CSV file.

However, I wanted it to be robust - e.g. in case somebody spins up two copies of this script running concurrently.

I suppose the timing would have to be pretty unlucky to hit a race condition there, right?

As in, somebody would have have to open the new file and write to it somewhere in between the check line (os.path.getsize) and the following line (writeheaders).

However, you're saying the only way to be completely safe is some kind of file locking?

Another person (Zachary Ware) suggested using .tell() on the file as well - I suppose that's similar enough to using os.path.getsize(), right?

But basically, I can call .tell() or os.path.getsize() on the file to see if it's zero, and then just call writeheaders() on the following line.

In the future - we may be moving to storing results in something like SQLite, or MongoDB and outputting a CSV directly from there.

Cheers,
Victor
 
Z

Zachary Ware

Hi,

I have a CSV file that I will repeatedly appending to.

I'm using the following to open the file:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
fieldnames = (...)
csv_writer = DictWriter(output, filednames)
# Call csv_writer.writeheader() if file is new.
csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only if the file is new?

My understanding is that I don't want to use os.path.exist(), since that opens me up to race conditions.

I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.

Is there another way I can execute code only if the file is new?

Cheers,
Victor

I've not tested, but you might try

with ... open(...) as output:
...
if output.tell() == 0:
csv_writer.writeheader()
....

HTH
 
A

Antoon Pardon

Op 30-10-13 02:02, Victor Hooi schreef:
Hi,

I have a CSV file that I will repeatedly appending to.

I'm using the following to open the file:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
fieldnames = (...)
csv_writer = DictWriter(output, filednames)
# Call csv_writer.writeheader() if file is new.
csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only if the file is new?

If you are using 3.3 you could use something like this:

with open(self.full_path, 'r') as input:
try:
output = open(self.output_csv, 'abx')
new_file = True
except FileExistsError:
output = open(self.output_csv, 'ab')
new_file = False
fieldnames = (...)
csv_writer = DictWriter(output, filednames)
if new_file:
csv_writer.writeheader()
csv_writer.writerows(my_dict)
 
N

Neil Cerutti

Hi,

I have a CSV file that I will repeatedly appending to.

I'm using the following to open the file:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
fieldnames = (...)
csv_writer = DictWriter(output, filednames)
# Call csv_writer.writeheader() if file is new.
csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only
if the file is new?

My understanding is that I don't want to use os.path.exist(),
since that opens me up to race conditions.

I'm guessing I can't use try-except with IOError, since the
open(..., 'ab') will work whether the file exists or not.

Is there another way I can execute code only if the file is new?

A heavy-duty approach involves prepending the old contents to a
temporary file.

fieldnames = (...)

with tempfile.TempDirectory() as temp:
tempname = os.path.join(temp, 'output.csv')
with open(tempname, 'wb') as output:
writer = csv.DictWriter(output, fieldnames=fieldnames)
writer.writeheader()
try:
with open(self.output_csv, 'b') old_data:
reader = csv.DictReader(old_data)
for rec in reader:
writer.writerow(rec)
except IOError:
pass
with open(self.full_path, 'b') as infile:
# etc...
shutil.copy(tempname, self.output_csv)

This avoids clobbering output_csv unless new data is succesfully
written. I believe TempDirectory isn't available in Python 2, so
some other way of creating that path will be needed, and I'm too
lazy to look up how. ;)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,127
Latest member
CyberDefense
Top