Newbie question: replacing nulls in CSV with preceding value

M

Matt Waite

My first post, my first real python use, please be gentle:

I have a CSV file, exported from Excel, that has blank records in it,
and I need to fill them in with the values from the record just above
it until it hits a non-blank value. Here's an example of the data,
which is in a file called test2.csv:

Zone,City,Event
1,Anytown,Event
,,Event1
,,Event2
,,Event44
2,Anothertown,Event3
,,Event5
,,Event7

What I need it to look like is:

Zone,City,Event
1,Anytown,Event1
1,Anytown,Event2
1,Anytown,Event44
2,Anothertown,Event3
2,Anothertown,Event5
2,Anothertown,Event7

Pretty much everything I've tried has failed, and I've been searching
for hours for something similar and haven't found anything. The best
I've come up with --which is half baked and I don't even know if it
works -- is this:

import csv
citynew=''
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
for row in reader:
row['CITY'] == citynew
else:
citynew=row['CITY']

The logic here -- in this case trying to fill in the city information
-- would seem to work, but I'm not sure. And I'm not sure how to write
the results of this back to the file.

Any help anyone can offer is greatly appreciated. I'm trying hard to
learn, but have been frustrated by this problem.

Matt

Matthew Waite
www.mattwaite.com
 
S

skip

Matt> I have a CSV file, exported from Excel, that has blank records in
Matt> it, and I need to fill them in with the values from the record
Matt> just above it until it hits a non-blank value.

Try something like:

#!/usr/bin/env python

import sys
import csv

last = {}
reader = csv.DictReader(open("test1.csv", "rb"))
writer = csv.DictWriter(open("test2.csv", "wb"),
sys.stdout, fieldnames="Zone City Event".split())
for row in reader:
for key in row:
if not row[key]:
row[key] = last.get(key, "")
writer.writerow(row)
last = row

Skip
 
L

Laszlo Nagy

import csv
citynew=''
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
for row in reader:
row['CITY'] == citynew
else:
citynew=row['CITY']

The logic here -- in this case trying to fill in the city information
-- would seem to work, but I'm not sure. And I'm not sure how to write
the results of this back to the file.
I recommend that you create a writer and a new file. After you processed
the input, you can delete the original file and replace it with the new
one. My experience is that this is highly efficient and works with very
large files. I would do something like this (not tested)

import csv
import os
import shutil
fin = file("/home/mwaite/test/test2.csv", "rb")
reader = csv.DictReader(fin)
fout = file(fin.name + '.tmp',"wb+")
writer = csv.writer(fout)
prev = {}
columns = ('CITY','ZIP','NAME')
for curr in reader:
foreach col in columns:
if (not curr.has_key(col)) or (curr[col] == ''):
if prev.has_key(col):
curr[col] = prev[col]
else:
curr[col] = ''
row = [curr[col] for col in columns ]
writer.write(row)
prev = curr
os.unlink(fin.name)
fout.close()
shutil.rename(fout.name,fin.name)

I really did not test it, but you can get the idea.

Best,

Laszlo
 
B

Bruno Desthuilliers

Matt Waite a écrit :
My first post, my first real python use, please be gentle:

I have a CSV file, exported from Excel, that has blank records in it,
and I need to fill them in with the values from the record just above
it until it hits a non-blank value. Here's an example of the data,
which is in a file called test2.csv:

Zone,City,Event
1,Anytown,Event
,,Event1
,,Event2
,,Event44
2,Anothertown,Event3
,,Event5
,,Event7

What I need it to look like is:

Zone,City,Event
1,Anytown,Event1
1,Anytown,Event2
1,Anytown,Event44
2,Anothertown,Event3
2,Anothertown,Event5
2,Anothertown,Event7

Pretty much everything I've tried has failed, and I've been searching
for hours for something similar and haven't found anything. The best
I've come up with --which is half baked and I don't even know if it
works -- is this:

import csv
citynew=''
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
for row in reader:
row['CITY'] == citynew
else:
citynew=row['CITY']

The logic here -- in this case trying to fill in the city information
-- would seem to work, but I'm not sure.

Did you actually tried it ? And itf yes, do you really think it 'works'?

Ok, the solution is quite easy - assuming the very first data row has no
blank values, and that either both Zone and City are blank or both are
filled:

prev_zone = prev_city = ''
for row in reader:
if row['Zone']:
prev_zone, prev_city = row['Zone'], row['City']
else:
row['Zone'], row['City'] = prev_zone, prev_city
# now do something with row...
And I'm not sure how to write
the results of this back to the file.

look for csv.DictWriter.writerow

NB : note that you cannot "write back to the file" - just write to
*another* file (then eventually os.rename(olfile, newfile))
 
M

Matt Waite

Thanks everyone for your help. I got Skip's answer to work (mine is
pasted below):

import sys
import csv

last = {}
reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
writer = csv.DictWriter(open("/home/mwaite/test/test3.csv", "wb"),
['ZONE','CITY','EVENT'], dialect='excel')
for row in reader:
for key in row:
if not row[key]:
row[key] = last.get(key, "")
writer.writerow(row)
last = row
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top