Using "with open(filename, 'ab'):" and calling code only if the file is new?

Victor Hooi · Oct 30, 2013

Hi,

I have a CSV file that I will be repeatedly appending to.

I'm using the following to open the file:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
    fieldnames = (...)
    csv_writer = DictWriter(output, fieldnames)
    # Call csv_writer.writeheader() if file is new.
    csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only if the file is new?

My understanding is that I don't want to use os.path.exists(), since that opens me up to race conditions.

I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.

Is there another way I can execute code only if the file is new?

Cheers,
Victor

Joseph L. Casale · Oct 30, 2013

I'm wondering what's the best way of calling writeheader() only if the file is
new?

My understanding is that I don't want to use os.path.exists(), since that opens
me up to race conditions.

What stops you from checking before and setting a flag?

Dave Angel · Oct 30, 2013

On 29/10/2013 21:42, Joseph L. Casale wrote:

You forgot the attribution line: "Victor says"

What stops you from checking before and setting a flag?

Like Victor says, that opens him up to race conditions.

Victor:

You need to specify your environment more completely. Are there
multiple instances of this or a similar program running simultaneously?
If so, you've got lots more problems than a missing or duplicated header
line. You could get partial lines intermixing between the two outputs.

Chances are if you really need to support more than one program at the
same time, you'll need to use a lower-level open, perhaps from the os
module. Some form of locking is called for. And if the data SHOULD be
interleaved, you'll have to arrange it so it gets done in whole number
increments.
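
One possible reading of the locking suggestion, as a rough sketch for Unix-like systems only: take an exclusive advisory lock with fcntl.flock before checking the size, so the emptiness check and the header write cannot be interleaved with another cooperating writer. The helper name append_rows and its parameters are made up for illustration, and the code assumes Python 3 (text mode with newline='').

```python
import csv
import fcntl
import os


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV, writing the header only if the file is empty.

    Hypothetical helper. Advisory locks (Unix only) protect only against
    other writers that take the same lock.
    """
    with open(path, 'a', newline='') as output:
        # Block until we hold an exclusive lock on the file.
        fcntl.flock(output.fileno(), fcntl.LOCK_EX)
        try:
            writer = csv.DictWriter(output, fieldnames)
            # Check the size only after the lock is held.
            if os.fstat(output.fileno()).st_size == 0:
                writer.writeheader()
            writer.writerows(rows)
            # Push buffered data to the OS before releasing the lock.
            output.flush()
        finally:
            fcntl.flock(output.fileno(), fcntl.LOCK_UN)
```

Because each call writes all of its rows while holding the lock, rows from two cooperating processes end up in whole-call blocks rather than as intermixed partial lines.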

Joseph L. Casale · Oct 30, 2013

Like Victor says, that opens him up to race conditions.

Slim chance, it's no more possible than it happening in the time try/except
takes to recover an alternative procedure.

with open('in_file') as in_file, open('out_file', 'ab') as outfile_file:
    if os.path.getsize('out_file'):
        print('file not empty')
    else:
        # write header
        print('file was empty')

And if that's still not acceptable (you did say new), then open out_file in 'r+' mode and seek
and read to check for a header.

But if your file is not new and lacks a header, then what?
jlc
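
A minimal sketch of the "read to check for a header" idea, assuming Python 3. The helper name has_header is hypothetical, and it opens the file read-only ('r') rather than 'r+', since checking the first row does not need write access. It only reports whether the first row matches the expected field names; what to do with an existing file that lacks a header is left to the caller.

```python
import csv


def has_header(path, fieldnames):
    """Return True if the first CSV row of `path` equals `fieldnames`.

    Hypothetical helper; a missing or empty file counts as having no header.
    """
    try:
        with open(path, 'r', newline='') as f:
            # next() with a default avoids StopIteration on an empty file.
            first_row = next(csv.reader(f), None)
    except FileNotFoundError:
        return False
    return first_row == list(fieldnames)
```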

Victor Hooi · Oct 30, 2013

Hi,

In theory, it *should* just be our script writing to the output CSV file.

However, I wanted it to be robust - e.g. in case somebody spins up two copies of this script running concurrently.

I suppose the timing would have to be pretty unlucky to hit a race condition there, right?

As in, somebody would have to open the new file and write to it somewhere in between the check line (os.path.getsize) and the following line (writeheader).

However, you're saying the only way to be completely safe is some kind of file locking?

Another person (Zachary Ware) suggested using .tell() on the file as well - I suppose that's similar enough to using os.path.getsize(), right?

But basically, I can call .tell() or os.path.getsize() on the file to see if it's zero, and then just call writeheader() on the following line.

In the future - we may be moving to storing results in something like SQLite, or MongoDB and outputting a CSV directly from there.

Cheers,
Victor

Zachary Ware · Oct 30, 2013

I've not tested, but you might try

with ... open(...) as output:
    ...
    if output.tell() == 0:
        csv_writer.writeheader()
    ....

HTH
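
For reference, a self-contained sketch of the tell() idea, assuming Python 3, where the csv module wants a text-mode file opened with newline=''. The helper name append_with_header and its parameters are invented for the example. It narrows the window compared with a separate existence check but is still not safe against a second writer running at the same moment.

```python
import csv


def append_with_header(path, fieldnames, rows):
    """Hypothetical helper: write the header only when the append position is 0."""
    with open(path, 'a', newline='') as output:
        writer = csv.DictWriter(output, fieldnames)
        # In CPython 3 a file opened for appending starts positioned at its
        # end, so tell() == 0 here means the file is empty.
        if output.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)
```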

Antoon Pardon · Oct 30, 2013

On 30-10-13 02:02, Victor Hooi wrote:

I'm wondering what's the best way of calling writeheader() only if the file is new?

If you are using 3.3 you could use something like this:

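with open(self.full_path, 'r') as input:
    try:
        # 'x' creates the file and fails if it already exists; it cannot be
        # combined with 'a'. Python 3's csv needs text mode with newline=''.
        output = open(self.output_csv, 'x', newline='')
        new_file = True
    except FileExistsError:
        output = open(self.output_csv, 'a', newline='')
        new_file = False
    with output:
        fieldnames = (...)
        csv_writer = DictWriter(output, fieldnames)
        if new_file:
            csv_writer.writeheader()
        csv_writer.writerows(my_dict)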

Neil Cerutti · Oct 30, 2013

A heavy-duty approach involves prepending the old contents to a
temporary file.

fieldnames = (...)

with tempfile.TemporaryDirectory() as temp:
    tempname = os.path.join(temp, 'output.csv')
    with open(tempname, 'w', newline='') as output:
        writer = csv.DictWriter(output, fieldnames=fieldnames)
        writer.writeheader()
        try:
            # Copy any existing records into the temporary file first.
            with open(self.output_csv, 'r', newline='') as old_data:
                reader = csv.DictReader(old_data)
                for rec in reader:
                    writer.writerow(rec)
        except IOError:
            pass
        with open(self.full_path, 'r', newline='') as infile:
            # etc...
    shutil.copy(tempname, self.output_csv)

This avoids clobbering output_csv unless new data is successfully
written. I believe TemporaryDirectory isn't available in Python 2, so
some other way of creating that path will be needed, and I'm too
lazy to look up how.
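
One way the temporary path could be created without the context manager is tempfile.mkdtemp(), which exists in both Python 2 and 3, paired with shutil.rmtree() for cleanup. The sketch below is only an illustration of that idea: the helper name rewrite_with_header and its parameters are hypothetical, the file handling as written targets Python 3 (text mode with newline=''), and it assumes any existing output file already starts with a header row.

```python
import csv
import os
import shutil
import tempfile


def rewrite_with_header(output_csv, fieldnames, new_rows):
    """Hypothetical helper: rebuild the CSV in a temp dir, then copy it back."""
    temp = tempfile.mkdtemp()  # also available in Python 2
    try:
        tempname = os.path.join(temp, 'output.csv')
        with open(tempname, 'w', newline='') as output:
            writer = csv.DictWriter(output, fieldnames=fieldnames)
            writer.writeheader()
            try:
                with open(output_csv, 'r', newline='') as old_data:
                    # Assumes the old file already starts with a header row.
                    for rec in csv.DictReader(old_data):
                        writer.writerow(rec)
            except IOError:
                pass  # no existing file: start from just the header
            writer.writerows(new_rows)
        # Only reached if everything above succeeded.
        shutil.copy(tempname, output_csv)
    finally:
        # Remove the temporary directory whether or not the copy happened.
        shutil.rmtree(temp)
```

If writing the new rows raises, the copy is never reached and the existing output file is left as it was.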
