Newbie question: replacing nulls in CSV with preceding value

Matt Waite · Feb 1, 2007

My first post, my first real python use, please be gentle:

I have a CSV file, exported from Excel, that has blank records in it,
and I need to fill them in with the values from the record just above
it until it hits a non-blank value. Here's an example of the data,
which is in a file called test2.csv:

Zone,City,Event
1,Anytown,Event
,,Event1
,,Event2
,,Event44
2,Anothertown,Event3
,,Event5
,,Event7

What I need it to look like is:

Zone,City,Event
1,Anytown,Event1
1,Anytown,Event2
1,Anytown,Event44
2,Anothertown,Event3
2,Anothertown,Event5
2,Anothertown,Event7

Pretty much everything I've tried has failed, and I've been searching
for hours for something similar and haven't found anything. The best
I've come up with --which is half baked and I don't even know if it
works -- is this:

import csv
citynew=''
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
for row in reader:
row['CITY'] == citynew
else:
citynew=row['CITY']

The logic here -- in this case trying to fill in the city information
-- would seem to work, but I'm not sure. And I'm not sure how to write
the results of this back to the file.

Any help anyone can offer is greatly appreciated. I'm trying hard to
learn, but have been frustrated by this problem.

Matt

Matthew Waite
www.mattwaite.com

skip · Feb 1, 2007

Matt> I have a CSV file, exported from Excel, that has blank records in
Matt> it, and I need to fill them in with the values from the record
Matt> just above it until it hits a non-blank value.

Try something like:

#!/usr/bin/env python

import sys
import csv

last = {}
reader = csv.DictReader(open("test1.csv", "rb"))
writer = csv.DictWriter(open("test2.csv", "wb"),
sys.stdout, fieldnames="Zone City Event".split())
for row in reader:
for key in row:
if not row[key]:
row[key] = last.get(key, "")
writer.writerow(row)
last = row

Skip

Laszlo Nagy · Feb 1, 2007

import csv
citynew=''
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
for row in reader:
row['CITY'] == citynew
else:
citynew=row['CITY']

The logic here -- in this case trying to fill in the city information
-- would seem to work, but I'm not sure. And I'm not sure how to write
the results of this back to the file.

I recommend that you create a writer and a new file. After you processed
the input, you can delete the original file and replace it with the new
one. My experience is that this is highly efficient and works with very
large files. I would do something like this (not tested)

import csv
import os
import shutil
fin = file("/home/mwaite/test/test2.csv", "rb")
reader = csv.DictReader(fin)
fout = file(fin.name + '.tmp',"wb+")
writer = csv.writer(fout)
prev = {}
columns = ('CITY','ZIP','NAME')
for curr in reader:
foreach col in columns:
if (not curr.has_key(col)) or (curr[col] == ''):
if prev.has_key(col):
curr[col] = prev[col]
else:
curr[col] = ''
row = [curr[col] for col in columns ]
writer.write(row)
prev = curr
os.unlink(fin.name)
fout.close()
shutil.rename(fout.name,fin.name)

I really did not test it, but you can get the idea.

Best,

Laszlo

Bruno Desthuilliers · Feb 1, 2007

Matt Waite a écrit :

My first post, my first real python use, please be gentle:

I have a CSV file, exported from Excel, that has blank records in it,
and I need to fill them in with the values from the record just above
it until it hits a non-blank value. Here's an example of the data,
which is in a file called test2.csv:

Zone,City,Event
1,Anytown,Event
,,Event1
,,Event2
,,Event44
2,Anothertown,Event3
,,Event5
,,Event7

What I need it to look like is:

Zone,City,Event
1,Anytown,Event1
1,Anytown,Event2
1,Anytown,Event44
2,Anothertown,Event3
2,Anothertown,Event5
2,Anothertown,Event7

Pretty much everything I've tried has failed, and I've been searching
for hours for something similar and haven't found anything. The best
I've come up with --which is half baked and I don't even know if it
works -- is this:

import csv
citynew=''
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
for row in reader:
row['CITY'] == citynew
else:
citynew=row['CITY']

The logic here -- in this case trying to fill in the city information
-- would seem to work, but I'm not sure.

Did you actually tried it ? And itf yes, do you really think it 'works'?

Ok, the solution is quite easy - assuming the very first data row has no
blank values, and that either both Zone and City are blank or both are
filled:

prev_zone = prev_city = ''
for row in reader:
if row['Zone']:
prev_zone, prev_city = row['Zone'], row['City']
else:
row['Zone'], row['City'] = prev_zone, prev_city
# now do something with row...

And I'm not sure how to write
the results of this back to the file.

look for csv.DictWriter.writerow

NB : note that you cannot "write back to the file" - just write to
*another* file (then eventually os.rename(olfile, newfile))

Matt Waite · Feb 1, 2007

Thanks everyone for your help. I got Skip's answer to work (mine is
pasted below):

import sys
import csv

last = {}
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
writer = csv.DictWriter(open("/home/mwaite/test/test3.csv", "wb"),
['ZONE','CITY','EVENT'], dialect='excel')
for row in reader:
for key in row:
if not row[key]:
row[key] = last.get(key, "")
writer.writerow(row)
last = row

Newbie question: Tuples and reading csv files	3	Mar 29, 2010
Dealing with \r in CSV fields in Python2.4	0	Sep 4, 2013
csv read clean up and write out to csv	2	Nov 2, 2012
.csv to .txt after adding columns	7	Sep 17, 2013
CSV confusion newbie question	1	Dec 5, 2009
Question on the "csv" library	17	Aug 27, 2009
simple file flow question with csv.reader	4	Nov 2, 2011
Implementing a Q-Learning Algorithm with Logistic Regression Normalization in C++	0	Jun 4, 2025

Newbie question: replacing nulls in CSV with preceding value

Matt Waite

skip

Laszlo Nagy

Bruno Desthuilliers

Matt Waite

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads